Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data
Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.
2013-01-01
Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
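The likelihood equivalence mentioned in this abstract can be sketched in a few lines. The notation below is an assumption made for illustration and is not taken from the paper: subject i has observed time t_i, event indicator d_i (0 or 1), covariates x_i, and constant hazard lambda_i.

```latex
\begin{align*}
\ell_{\mathrm{exp}}(\beta)
  &= \sum_i \left[ d_i \log\lambda_i - \lambda_i t_i \right],
  \qquad \lambda_i = \exp(x_i^\top \beta), \\
\ell_{\mathrm{Pois}}(\beta)
  &= \sum_i \left[ d_i \log\mu_i - \mu_i - \log d_i! \right],
  \qquad \log\mu_i = x_i^\top \beta + \log t_i, \\
  &= \sum_i \left[ d_i \log\lambda_i - \lambda_i t_i \right]
     + \sum_i d_i \log t_i
  \qquad (\log d_i! = 0 \text{ for } d_i \in \{0, 1\}), \\
  &= \ell_{\mathrm{exp}}(\beta) + \text{const}.
\end{align*}
```

Because the extra term does not depend on beta, treating d_i as a Poisson count with a log(t_i) offset yields the same estimates as the exponential survival likelihood with censoring.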
NASA Astrophysics Data System (ADS)
Haris, A.; Nafian, M.; Riyanto, A.
2017-07-01
The Danish North Sea fields consist of several formations (Ekofisk, Tor, and Cromer Knoll) whose ages range from Paleocene to Miocene. In this study, seismic and well log data sets are integrated to determine the chalk sand distribution in the Danish North Sea field. The integration is performed using seismic inversion analysis and seismic multi-attribute analysis. The seismic inversion algorithm used to derive acoustic impedance (AI) is a model-based technique. The derived AI is then used as an external attribute for input to the multi-attribute analysis. The multi-attribute analysis is in turn used to generate linear and non-linear transformations among well log properties. For the linear model, the selected transformation is obtained by weighted step-wise linear regression (SWR), while the non-linear model is obtained using probabilistic neural networks (PNN). The porosity estimated by PNN matches the well log data better than that estimated by SWR. This result is understandable because PNN performs non-linear regression, so the relationship between the attribute data and the predicted log data can be optimized. The distribution of chalk sand has been successfully identified and is characterized by porosity values ranging from 23% to 30%.
Kilian, Reinhold; Matschinger, Herbert; Löeffler, Walter; Roick, Christiane; Angermeyer, Matthias C
2002-03-01
Transformation of the dependent cost variable is often used to solve the problems of heteroscedasticity and skewness in linear ordinary least squares (OLS) regression of health service cost data. However, transformation may cause difficulties in the interpretation of regression coefficients and the retransformation of predicted values. The study compares the advantages and disadvantages of different methods to estimate regression-based cost functions using data on the annual costs of schizophrenia treatment. Annual costs of psychiatric service use and clinical and socio-demographic characteristics of the patients were assessed for a sample of 254 patients with a diagnosis of schizophrenia (ICD-10 F 20.0) living in Leipzig. The clinical characteristics of the participants were assessed by means of the BPRS 4.0, the GAF, and the CAN for service needs. Quality of life was measured by WHOQOL-BREF. A linear OLS regression model with non-parametric standard errors, a log-transformed OLS model and a generalized linear model with a log-link and a gamma distribution were used to estimate service costs. For the estimation of robust non-parametric standard errors, the variance estimator by White and a bootstrap estimator based on 2000 replications were employed. Models were evaluated by the comparison of the R2 and the root mean squared error (RMSE). RMSE of the log-transformed OLS model was computed with three different methods of bias-correction. The 95% confidence intervals for the differences between the RMSE were computed by means of bootstrapping. A split-sample cross-validation procedure was used to forecast the costs for one half of the sample on the basis of a regression equation computed for the other half of the sample. All three methods showed significant positive influences of psychiatric symptoms and met psychiatric service needs on service costs.
Only the log-transformed OLS model showed a significant negative impact of age, and only the GLM showed significant negative influences of employment status and partnership on costs. All three models provided an R2 of about 0.31. The residuals of the linear OLS model revealed significant deviations from normality and homoscedasticity. The residuals of the log-transformed model were normally distributed but still heteroscedastic. The linear OLS model provided the lowest prediction error and the best forecast of the dependent cost variable. The log-transformed model provided the lowest RMSE if the heteroscedastic bias correction was used. The RMSE of the GLM with a log link and a gamma distribution was higher than those of the linear OLS model and the log-transformed OLS model. The difference between the RMSE of the linear OLS model and that of the log-transformed OLS model without bias correction was significant at the 95% level. As a result of the cross-validation procedure, the linear OLS model provided the lowest RMSE, followed by the log-transformed OLS model with a heteroscedastic bias correction. The GLM again showed the weakest model fit. None of the differences between the RMSE resulting from the cross-validation procedure were found to be significant. The comparison of the fit indices of the different regression models revealed that the linear OLS model provided a better fit than the log-transformed model and the GLM, but the differences between the models' RMSE were not significant. Owing to the small number of cases in the study, the lack of significance does not sufficiently prove that the differences between the RMSE for the different models are zero, and the superiority of the linear OLS model cannot be generalized. The lack of significant differences among the alternative estimators may reflect a lack of sample size adequate to detect important differences among the estimators employed.
Further studies with larger case numbers are necessary to confirm the results. Specification of an adequate regression model requires a careful examination of the characteristics of the data. Estimation of standard errors and confidence intervals by nonparametric methods that are robust against deviations from the normal distribution and from homoscedasticity of residuals is a suitable alternative to the transformation of the skewed dependent variable.
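The retransformation problem described in this abstract can be illustrated with a small sketch. The abstract does not name its three bias corrections, so the two shown here (a normal-theory correction and Duan's smearing factor) are assumptions chosen because they are commonly used; the function names are hypothetical.

```python
import math

def naive_retransform(log_prediction):
    """Back-transform a log-scale prediction with no bias correction."""
    return math.exp(log_prediction)

def normal_theory_retransform(log_prediction, residuals, n_params=0):
    """Assumes normal, homoscedastic log-scale errors: exp(mu + sigma^2 / 2)."""
    dof = len(residuals) - n_params
    sigma2 = sum(r * r for r in residuals) / dof
    return math.exp(log_prediction + sigma2 / 2.0)

def duan_smearing_retransform(log_prediction, residuals):
    """Duan's smearing estimator: scale by the mean of exp(residual)."""
    smear = sum(math.exp(r) for r in residuals) / len(residuals)
    return math.exp(log_prediction) * smear

# Hypothetical log-scale residuals, for illustration only.
example_residuals = [-0.5, 0.0, 0.5]
corrected = duan_smearing_retransform(8.0, example_residuals)
uncorrected = naive_retransform(8.0)
```

With any non-degenerate set of residuals that average to zero, the smearing factor exceeds 1, which is why the naive back-transformation tends to underestimate mean costs.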
Minimizing bias in biomass allometry: Model selection and log transformation of data
Joseph Mascaro; Flint Hughes; Amanda Uowolo; Stefan A. Schnitzer
2011-01-01
Nonlinear regression is increasingly used to develop allometric equations for forest biomass estimation (i.e., as opposed to the traditional approach of log-transformation followed by linear regression). Most statistical software packages, however, assume additive errors by default, violating a key assumption of allometric theory and possibly producing spurious models....
USING LINEAR AND POLYNOMIAL MODELS TO EXAMINE THE ENVIRONMENTAL STABILITY OF VIRUSES
The article presents the development of model equations for describing the fate of viral infectivity in environmental samples. Most of the models were based upon the use of a two-step linear regression approach. The first step employs regression of log base 10 transformed viral t...
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.
Kong, Shengchun; Nan, Bin
2014-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso
Kong, Shengchun; Nan, Bin
2013-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses. PMID:24516328
Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally
2018-02-01
1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. The area under the plasma concentration versus time curve (AUCinf) of dalbavancin is a key parameter, and the AUCinf/MIC ratio is a critical pharmacodynamic marker. 2. Using the end of intravenous infusion concentration (i.e. Cmax), the Cmax versus AUCinf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. The predictions of AUCinf were performed using published Cmax data by application of the regression equations. The quotient of observed/predicted values rendered the fold difference. The mean absolute error (MAE), root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. The Cmax versus AUCinf relationship exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with an RMSE < 10.3%. The external data evaluation showed that the models predicted AUCinf with an RMSE of 3.02-27.46%, with the fold difference largely contained within 0.64-1.48. 5. Regardless of the regression model, a single time point strategy of using Cmax (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUCinf of dalbavancin in patients.
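The assessment metrics named in this abstract can be sketched as follows. These are the standard textbook definitions, assumed here for illustration; the abstract reports RMSE as a percentage, and how the authors normalized it is not stated. The numbers in the example are made up and are not study data.

```python
import math

def fold_differences(observed, predicted):
    """Quotient of observed over predicted, one value per pair."""
    return [o / p for o, p in zip(observed, predicted)]

def mean_absolute_error(observed, predicted):
    """Average absolute deviation between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def root_mean_square_error(observed, predicted):
    """Square root of the average squared deviation."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical observed and predicted values, for illustration only.
observed_auc = [10.0, 20.0]
predicted_auc = [8.0, 25.0]
folds = fold_differences(observed_auc, predicted_auc)
mae = mean_absolute_error(observed_auc, predicted_auc)
rmse = root_mean_square_error(observed_auc, predicted_auc)
```

A fold difference close to 1 indicates a prediction close to the observed value, which is the sense in which the abstract reports a range such as 0.84 to 1.14.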
An empirical model for estimating annual consumption by freshwater fish populations
Liao, H.; Pierce, C.L.; Larscheid, J.G.
2005-01-01
Population consumption is an important process linking predator populations to their prey resources. Simple tools are needed to enable fisheries managers to estimate population consumption. We assembled 74 individual estimates of annual consumption by freshwater fish populations and their mean annual population size, 41 of which also included estimates of mean annual biomass. The data set included 14 freshwater fish species from 10 different bodies of water. From this data set we developed two simple linear regression models predicting annual population consumption. Log-transformed population size explained 94% of the variation in log-transformed annual population consumption. Log-transformed biomass explained 98% of the variation in log-transformed annual population consumption. We quantified the accuracy of our regressions and three alternative consumption models as the mean percent difference from observed (bioenergetics-derived) estimates in a test data set. Predictions from our population-size regression matched observed consumption estimates poorly (mean percent difference = 222%). Predictions from our biomass regression matched observed consumption reasonably well (mean percent difference = 24%). The biomass regression was superior to an alternative model, similar in complexity, and comparable to two alternative models that were more complex and difficult to apply. Our biomass regression model, log10(consumption) = 0.5442 + 0.9962 × log10(biomass), will be a useful tool for fishery managers, enabling them to make reasonably accurate annual population consumption predictions from mean annual biomass estimates. © Copyright by the American Fisheries Society 2005.
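The biomass regression quoted in this abstract can be applied with a few lines of code. This is a minimal sketch: the coefficients come from the abstract, but the function name is hypothetical, the units are whatever the original study used (the abstract does not state them), and no retransformation bias correction is applied.

```python
import math

# Coefficients as quoted in the abstract:
# log10(consumption) = 0.5442 + 0.9962 * log10(biomass)
INTERCEPT = 0.5442
SLOPE = 0.9962

def predict_annual_consumption(mean_annual_biomass):
    """Back-transform the log10-log10 regression to the original scale."""
    if mean_annual_biomass <= 0:
        raise ValueError("biomass must be positive to take log10")
    log10_consumption = INTERCEPT + SLOPE * math.log10(mean_annual_biomass)
    return 10.0 ** log10_consumption

# Hypothetical input value, in the same units as the study's biomass data.
estimate = predict_annual_consumption(10.0)
```

Because the slope is close to 1, predicted consumption is nearly proportional to biomass over the fitted range.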
On the equivalence of case-crossover and time series methods in environmental epidemiology.
Lu, Yun; Zeger, Scott L
2007-04-01
The case-crossover design was introduced in epidemiology 15 years ago as a method for studying the effects of a risk factor on a health event using only cases. The idea is to compare a case's exposure immediately prior to or during the case-defining event with that same person's exposure at otherwise similar "reference" times. An alternative approach to the analysis of daily exposure and case-only data is time series analysis. Here, log-linear regression models express the expected total number of events on each day as a function of the exposure level and potential confounding variables. In time series analyses of air pollution, smooth functions of time and weather are the main confounders. Time series and case-crossover methods are often viewed as competing methods. In this paper, we show that case-crossover using conditional logistic regression is a special case of time series analysis when there is a common exposure such as in air pollution studies. This equivalence provides computational convenience for case-crossover analyses and a better understanding of time series models. Time series log-linear regression accounts for overdispersion of the Poisson variance, while case-crossover analyses typically do not. This equivalence also permits model checking for case-crossover data using standard log-linear model diagnostics.
Rosenblum, Michael; van der Laan, Mark J.
2010-01-01
Models, such as logistic regression and Poisson regression models, are often used to estimate treatment effects in randomized trials. These models leverage information in variables collected before randomization, in order to obtain more precise estimates of treatment effects. However, there is the danger that model misspecification will lead to bias. We show that certain easy to compute, model-based estimators are asymptotically unbiased even when the working model used is arbitrarily misspecified. Furthermore, these estimators are locally efficient. As a special case of our main result, we consider a simple Poisson working model containing only main terms; in this case, we prove the maximum likelihood estimate of the coefficient corresponding to the treatment variable is an asymptotically unbiased estimator of the marginal log rate ratio, even when the working model is arbitrarily misspecified. This is the log-linear analog of ANCOVA for linear models. Our results demonstrate one application of targeted maximum likelihood estimation. PMID:20628636
Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.
2008-01-01
Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. 
A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?
Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping
2013-01-01
Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
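The LR approach discussed in this abstract amounts to ordinary least squares on log-transformed data. The sketch below assumes a power-law form, biomass = exp(a) * diameter^b, and uses hypothetical function names and synthetic data; it is not the authors' code.

```python
import math

def fit_log_log(diameters, biomasses):
    """Ordinary least squares of log(biomass) on log(diameter).

    Returns (a, b) for the model log(biomass) = a + b * log(diameter).
    """
    log_x = [math.log(v) for v in diameters]
    log_y = [math.log(v) for v in biomasses]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic, noise-free data generated from biomass = 2 * diameter^2.5.
synthetic_diameters = [1.0, 2.0, 4.0, 8.0]
synthetic_biomasses = [2.0 * d ** 2.5 for d in synthetic_diameters]
a_hat, b_hat = fit_log_log(synthetic_diameters, synthetic_biomasses)
```

Fitting on the log scale implies multiplicative errors on the original scale, whereas NLR with default settings typically assumes additive errors; that difference in error structure is the core of the comparison the abstract describes.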
ELASTIC NET FOR COX'S PROPORTIONAL HAZARDS MODEL WITH A SOLUTION PATH ALGORITHM.
Wu, Yichao
2012-01-01
For least squares regression, Efron et al. (2004) proposed an efficient solution path algorithm, the least angle regression (LAR). They showed that a slight modification of the LAR leads to the whole LASSO solution path. Both the LAR and LASSO solution paths are piecewise linear. Recently Wu (2011) extended the LAR to generalized linear models and the quasi-likelihood method. In this work we extend the LAR further to handle Cox's proportional hazards model. The goal is to develop a solution path algorithm for the elastic net penalty (Zou and Hastie (2005)) in Cox's proportional hazards model. This goal is achieved in two steps. First we extend the LAR to optimizing the log partial likelihood plus a fixed small ridge term. Then we define a path modification, which leads to the solution path of the elastic net regularized log partial likelihood. Our solution path is exact and piecewise determined by ordinary differential equation systems.
Sun, Lili; Zhou, Liping; Yu, Yu; Lan, Yukun; Li, Zhiliang
2007-01-01
Polychlorinated diphenyl ethers (PCDEs) have attracted increasing concern as a group of ubiquitous potential persistent organic pollutants (POPs). By using the molecular electronegativity distance vector (MEDV-4), multiple linear regression (MLR) models are developed for sub-cooled liquid vapor pressures (P(L)), n-octanol/water partition coefficients (K(OW)) and sub-cooled liquid water solubilities (S(W,L)) of 209 PCDEs and diphenyl ether. The correlation coefficients (R) and the leave-one-out cross-validation (LOO) correlation coefficients (R(CV)) of all the 6-descriptor models for logP(L), logK(OW) and logS(W,L) are more than 0.98. By using stepwise multiple regression (SMR), the descriptors are selected and the resulting models are a 5-descriptor model for logP(L), a 4-descriptor model for logK(OW), and a 6-descriptor model for logS(W,L), respectively. All these models exhibit excellent estimation capabilities for the internal sample set and good predictive capabilities for the external sample set. The consistency between observed and estimated/predicted values for logP(L) is the best (R=0.996, R(CV)=0.996), followed by logK(OW) (R=0.992, R(CV)=0.992) and logS(W,L) (R=0.983, R(CV)=0.980). By using MEDV-4 descriptors, the QSPR models can be used for prediction and the model predictions can hence extend the current database of experimental values.
Three-parameter modeling of the soil sorption of acetanilide and triazine herbicide derivatives.
Freitas, Mirlaine R; Matias, Stella V B G; Macedo, Renato L G; Freitas, Matheus P; Venturin, Nelson
2014-02-01
Herbicides have widely variable toxicity and many of them are persistent soil contaminants. The acetanilide and triazine families of herbicides are widely used, but there is increasing interest in developing new herbicides to increase their effectiveness and to diminish environmental hazard. The environmental risk of new herbicides can be assessed by estimating their soil sorption (logKoc), which is usually correlated to the octanol/water partition coefficient (logKow). However, earlier findings have shown that this correlation is not valid for some acetanilide and triazine herbicides. Thus, easily accessible quantitative structure-property relationship models are required to predict logKoc of analogues of these compounds. Octanol/water partition coefficient, molecular weight and volume were calculated and then regressed against logKoc for two series of acetanilide and triazine herbicides using multiple linear regression, resulting in predictive and validated models.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used to select important predictor variables; diagnostics are used to check that the assumptions are valid, which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers; and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographic profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in the family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
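The logit relationship described in this abstract, log odds as a linear combination of predictors, can be sketched as follows. The function names and the coefficient values are hypothetical and are not estimates from the study.

```python
import math

def predicted_probability(intercept, coefficients, predictors):
    """Invert the logit: p = 1 / (1 + exp(-log_odds))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds_ratio(coefficient):
    """Multiplicative change in the odds per one-unit increase in a predictor."""
    return math.exp(coefficient)

# Hypothetical model with a single binary predictor (coefficient 2.0).
p_exposed = predicted_probability(-1.0, [2.0], [1.0])
p_unexposed = predicted_probability(-1.0, [2.0], [0.0])
```

An odds ratio above 1 corresponds to a positive coefficient, which is how a statement such as "the odds are higher for a student having a family history of obesity" maps back onto the fitted model.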
Rothenberg, Stephen J; Rothenberg, Jesse C
2005-09-01
Statistical evaluation of the dose-response function in lead epidemiology is rarely attempted. Economic evaluation of health benefits of lead reduction usually assumes a linear dose-response function, regardless of the outcome measure used. We reanalyzed a previously published study, an international pooled data set combining data from seven prospective lead studies examining contemporaneous blood lead effect on IQ (intelligence quotient) of 7-year-old children (n = 1,333). We constructed alternative linear multiple regression models with linear blood lead terms (linear-linear dose response) and natural-log-transformed blood lead terms (log-linear dose response). We tested the two lead specifications for nonlinearity in the models, compared the two lead specifications for significantly better fit to the data, and examined the effects of possible residual confounding on the functional form of the dose-response relationship. We found that a log-linear lead-IQ relationship was a significantly better fit than was a linear-linear relationship for IQ (p = 0.009), with little evidence of residual confounding of included model variables. We substituted the log-linear lead-IQ effect in a previously published health benefits model and found that the economic savings due to U.S. population lead decrease between 1976 and 1999 (from 17.1 µg/dL to 2.0 µg/dL) was 2.2 times (319 billion dollars) that calculated using a linear-linear dose-response function (149 billion dollars). The Centers for Disease Control and Prevention action limit of 10 µg/dL for children fails to protect against most damage and economic cost attributable to lead exposure.
NASA Technical Reports Server (NTRS)
McKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables: 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Demonstration of the Web-based Interspecies Correlation Estimation (Web-ICE) modeling application
The Web-based Interspecies Correlation Estimation (Web-ICE) modeling application is available to the risk assessment community through a user-friendly internet platform (http://epa.gov/ceampubl/fchain/webice/). ICE models are log-linear least square regressions that predict acute...
Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P
2014-06-26
To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
Farsa, Oldřich
2013-01-01
The log BB parameter is the logarithm of the ratio of a compound's equilibrium concentrations in the brain tissue versus the blood plasma. This parameter is a useful descriptor in assessing the ability of a compound to permeate the blood-brain barrier. The aim of this study was to develop a Hansch-type linear regression QSAR model that correlates the parameter log BB and the retention time of drugs and other organic compounds on a reversed-phase HPLC containing an embedded amide moiety. The retention time was expressed by the capacity factor log k'. The second aim was to estimate the brain's absorption of 2-(azacycloalkyl)acetamidophenoxyacetic acids, which are analogues of piracetam, nefiracetam, and meclofenoxate. Notably, these acids may be novel nootropics. Two simple regression models that relate log BB and log k' were developed from an assay performed using a reversed-phase HPLC that contained an embedded amide moiety. Both the quadratic and linear models yielded statistical parameters comparable to previously published models of log BB dependence on various structural characteristics. The models predict that four members of the substituted phenoxyacetic acid series have a strong chance of permeating the barrier and being absorbed in the brain. The results of this study show that a reversed-phase HPLC system containing an embedded amide moiety is a functional in vitro surrogate of the blood-brain barrier. These results suggest that racetam-type nootropic drugs containing a carboxylic moiety could be more poorly absorbed than analogues devoid of the carboxyl group, especially if the compounds penetrate the barrier by a simple diffusion mechanism.
Reliability Analysis of the Gradual Degradation of Semiconductor Devices.
1983-07-20
under the heading of linear models or linear statistical models.3,4 We have not used this material in this report. Assuming catastrophic failure when...assuming a catastrophic model. In this treatment we first modify our system loss formula and then proceed to the actual analysis. II. ANALYSIS OF...Failure Time 1 Ti Ti 2 T2 T2 n Tn n and are easily analyzed by simple linear regression. Since we have assumed a log normal/Arrhenius activation
ELASTIC NET FOR COX’S PROPORTIONAL HAZARDS MODEL WITH A SOLUTION PATH ALGORITHM
Wu, Yichao
2012-01-01
For least squares regression, Efron et al. (2004) proposed an efficient solution path algorithm, the least angle regression (LAR). They showed that a slight modification of the LAR leads to the whole LASSO solution path. Both the LAR and LASSO solution paths are piecewise linear. Recently Wu (2011) extended the LAR to generalized linear models and the quasi-likelihood method. In this work we extend the LAR further to handle Cox’s proportional hazards model. The goal is to develop a solution path algorithm for the elastic net penalty (Zou and Hastie (2005)) in Cox’s proportional hazards model. This goal is achieved in two steps. First we extend the LAR to optimizing the log partial likelihood plus a fixed small ridge term. Then we define a path modification, which leads to the solution path of the elastic net regularized log partial likelihood. Our solution path is exact and piecewise determined by ordinary differential equation systems. PMID:23226932
Using nonlinear quantile regression to estimate the self-thinning boundary curve
Quang V. Cao; Thomas J. Dean
2015-01-01
The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...
Farsa, Oldřich
2013-01-01
The log BB parameter is the logarithm of the ratio of a compound’s equilibrium concentrations in the brain tissue versus the blood plasma. This parameter is a useful descriptor in assessing the ability of a compound to permeate the blood-brain barrier. The aim of this study was to develop a Hansch-type linear regression QSAR model that correlates the parameter log BB and the retention time of drugs and other organic compounds on a reversed-phase HPLC containing an embedded amide moiety. The retention time was expressed by the capacity factor log k′. The second aim was to estimate the brain’s absorption of 2-(azacycloalkyl)acetamidophenoxyacetic acids, which are analogues of piracetam, nefiracetam, and meclofenoxate. Notably, these acids may be novel nootropics. Two simple regression models that relate log BB and log k′ were developed from an assay performed using a reversed-phase HPLC that contained an embedded amide moiety. Both the quadratic and linear models yielded statistical parameters comparable to previously published models of log BB dependence on various structural characteristics. The models predict that four members of the substituted phenoxyacetic acid series have a strong chance of permeating the barrier and being absorbed in the brain. The results of this study show that a reversed-phase HPLC system containing an embedded amide moiety is a functional in vitro surrogate of the blood-brain barrier. These results suggest that racetam-type nootropic drugs containing a carboxylic moiety could be more poorly absorbed than analogues devoid of the carboxyl group, especially if the compounds penetrate the barrier by a simple diffusion mechanism. PMID:23641330
Linear and nonlinear methods in modeling the aqueous solubility of organic compounds.
Catana, Cornel; Gao, Hua; Orrenius, Christian; Stouten, Pieter F W
2005-01-01
Solubility data for 930 diverse compounds have been analyzed using linear Partial Least Squares (PLS) and nonlinear PLS methods, Continuum Regression (CR), and Neural Networks (NN). 1D and 2D descriptors from the MOE package in combination with E-state or ISIS keys have been used. The best model was obtained using linear PLS for a combination of 22 MOE descriptors and 65 ISIS keys. It has a correlation coefficient (r2) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility (log S(w)). The model validated on a test set of 177 compounds not included in the training set has an r2 of 0.911 and an RMSE of 0.475 log S(w). The descriptors were ranked according to their importance, and the 22 MOE descriptors were found at the top of the list. The CR model produced results as good as PLS, and because of the way in which cross-validation was done it is expected to be a valuable prediction tool alongside the PLS model. The statistics obtained using nonlinear methods did not surpass those obtained with linear ones. The good statistics obtained for linear PLS and CR recommend these models for prediction when it is difficult or impossible to make experimental measurements, for virtual screening, combinatorial library design, and efficient lead optimization.
Ma, Wan-Li; Sun, De-Zhi; Shen, Wei-Guo; Yang, Meng; Qi, Hong; Liu, Li-Yan; Shen, Ji-Min; Li, Yi-Fan
2011-07-01
A comprehensive sampling campaign was carried out to study atmospheric concentrations of polycyclic aromatic hydrocarbons (PAHs) in Beijing and to evaluate the effectiveness of source control strategies in reducing PAH pollution after the 29th Olympic Games. The sub-cooled liquid vapor pressure (logP(L)(o))-based model and the octanol-air partition coefficient (K(oa))-based model were applied to each seasonal dataset. Regression analysis among log K(P), logP(L)(o) and log K(oa) exhibited highly significant correlations for all four seasons. Source factors were identified by principal component analysis and their contributions were further estimated by multiple linear regression. Pyrogenic sources and coke oven emissions were identified as major sources for both the non-heating and heating seasons. Compared with values reported in the literature, the mean PAH concentrations before and after the 29th Olympic Games were reduced by more than 60%, indicating that the source control measures were effective in reducing PAH pollution in Beijing. Copyright © 2011 Elsevier Ltd. All rights reserved.
Latent log-linear models for handwritten digit classification.
Deselaers, Thomas; Gass, Tobias; Heigold, Georg; Ney, Hermann
2012-06-01
We present latent log-linear models, an extension of log-linear models incorporating latent variables, and we propose two applications thereof: log-linear mixture models and image deformation-aware log-linear models. The resulting models are fully discriminative, can be trained efficiently, and the model complexity can be controlled. Log-linear mixture models offer additional flexibility within the log-linear modeling framework. Unlike previous approaches, the image deformation-aware model directly considers image deformations and allows for a discriminative training of the deformation parameters. Both are trained using alternating optimization. For certain variants, convergence to a stationary point is guaranteed and, in practice, even variants without this guarantee converge and find models that perform well. We tune the methods on the USPS data set and evaluate on the MNIST data set, demonstrating the generalization capabilities of our proposed models. Our models, although using significantly fewer parameters, are able to obtain competitive results with models proposed in the literature.
Golmohammadi, Hassan
2009-11-30
A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of the molecule (mu), maximum antibonding contribution of a molecular orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results showed the ability of the developed artificial neural network to predict partition coefficients of organic compounds. The results also revealed the superiority of the ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.
[Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].
Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L
2017-03-10
To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence in relation to caregivers' recognition of risk signs of diarrhea in their infants using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of their infant's risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. We also compared the point and interval estimates of this PR, and the convergence of three models (model 1: not adjusting for covariates; model 2: adjusting for duration of caregivers' education; model 3: adjusting for distance between village and township and child age in months, in addition to model 2), between the Bayesian log-binomial regression model and the conventional log-binomial regression model. All three Bayesian log-binomial regression models converged, and the estimated PRs were 1.130 (95% CI: 1.005-1.265), 1.128 (95% CI: 1.001-1.264) and 1.132 (95% CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, with PRs of 1.130 (95% CI: 1.055-1.206) and 1.126 (95% CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate the PR, which was 1.125 (95% CI: 1.051-1.200). The point and interval estimates of the PRs from the three Bayesian log-binomial regression models differed slightly from those of the conventional log-binomial regression models, but the two approaches showed good consistency in estimating the PR. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with fewer convergence failures and has advantages in application over the conventional log-binomial regression model.
Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert
2012-01-01
Purpose: To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods: Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R2. Hypothesis testing of coefficients over the patient population was performed. Results: Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost ~ 0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost ~ FDGpre^0.93, p<0.001). Univariate mixture model fits of FDGpre improved R2 from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions: Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748
Log-normal frailty models fitted as Poisson generalized linear mixed models.
Hirsch, Katharina; Wienke, Andreas; Kuss, Oliver
2016-12-01
The equivalence of a survival model with a piecewise constant baseline hazard function and a Poisson regression model has been known for decades. As shown in recent studies, this equivalence carries over to clustered survival data: A frailty model with a log-normal frailty term can be interpreted and estimated as a generalized linear mixed model with a binary response, a Poisson likelihood, and a specific offset. Proceeding this way, statistical theory and software for generalized linear mixed models are readily available for fitting frailty models. This gain in flexibility comes at the small price of (1) having to fix the number of pieces for the baseline hazard in advance and (2) having to "explode" the data set by the number of pieces. In this paper we extend the simulations of former studies by using a more realistic baseline hazard (Gompertz) and by comparing the model under consideration with competing models. Furthermore, the SAS macro %PCFrailty is introduced to apply the Poisson generalized linear mixed approach to frailty models. The simulations show good results for the shared frailty model. Our new %PCFrailty macro provides proper estimates, especially in the case of 4 events per piece. The suggested Poisson generalized linear mixed approach for log-normal frailty models based on the %PCFrailty macro provides several advantages in the analysis of clustered survival data with respect to more flexible modelling of fixed and random effects, exact (in the sense of non-approximate) maximum likelihood estimation, and standard errors and different types of confidence intervals for all variance parameters. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
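The "explode" step mentioned above can be sketched as follows. This is an illustrative Python sketch, not the %PCFrailty macro: the function name `explode`, the cut points, and the follow-up data are hypothetical. Each subject contributes one row per piece of the baseline hazard in which they were at risk, carrying the exposure time in that piece and a binary event indicator.

```python
import numpy as np

def explode(time, event, cuts):
    """Split each subject's follow-up at the cut points.

    Returns rows (subject, piece, exposure, d), where d is 1 only in the
    piece in which the subject's event occurs.
    """
    edges = np.concatenate([[0.0], np.asarray(cuts, dtype=float), [np.inf]])
    rows = []
    for i, (t, d) in enumerate(zip(time, event)):
        for j in range(len(edges) - 1):
            lo, hi = edges[j], edges[j + 1]
            if t <= lo:                      # follow-up ended before this piece
                break
            exposure = min(t, hi) - lo       # time at risk inside the piece
            died_here = int(d == 1 and t <= hi)
            rows.append((i, j, exposure, died_here))
    return rows

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored).
time = [2.0, 5.0, 7.5, 3.0, 9.0]
event = [1, 0, 1, 1, 0]
rows = explode(time, event, cuts=[3.0, 6.0])

# Without covariates or frailty, the Poisson MLE of each piece's hazard is
# simply events divided by exposure in that piece.
n_pieces = 3
events = np.zeros(n_pieces)
exposure = np.zeros(n_pieces)
for _, j, e, d in rows:
    events[j] += d
    exposure[j] += e
hazard = events / exposure
```

In a full analysis the exploded rows would be passed to a Poisson mixed model with `log(exposure)` as the offset and a cluster-level random intercept playing the role of the log-normal frailty; that fitting step is omitted here.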
Inflammation, homocysteine and carotid intima-media thickness.
Baptista, Alexandre P; Cacdocar, Sanjiva; Palmeiro, Hugo; Faísca, Marília; Carrasqueira, Herménio; Morgado, Elsa; Sampaio, Sandra; Cabrita, Ana; Silva, Ana Paula; Bernardo, Idalécio; Gome, Veloso; Neves, Pedro L
2008-01-01
Cardiovascular disease is the main cause of morbidity and mortality in chronic renal patients. Carotid intima-media thickness (CIMT) is one of the most accurate markers of atherosclerosis risk. In this study, the authors set out to evaluate a population of chronic renal patients to determine which factors are associated with an increase in intima-media thickness. We included 56 patients (F=22, M=34), with a mean age of 68.6 years, and an estimated glomerular filtration rate of 15.8 ml/min (calculated by the MDRD equation). Various laboratory and inflammatory parameters (hsCRP, IL-6 and TNF-alpha) were evaluated. All subjects underwent measurement of internal carotid artery intima-media thickness by high-resolution real-time B-mode ultrasonography using a 10 MHz linear transducer. Intima-media thickness was used as a dependent variable in a simple linear regression model, with the various laboratory parameters as independent variables. Only parameters showing a significant correlation with CIMT were evaluated in a multiple regression model: age (p=0.001), hemoglobin (p=0.03), logCRP (p=0.042), logIL-6 (p=0.004) and homocysteine (p=0.002). In the multiple regression model we found that age (p=0.001) and homocysteine (p=0.027) were independently correlated with CIMT. LogIL-6 did not reach statistical significance (p=0.057), probably due to the small population size. The authors conclude that age and homocysteine correlate with carotid intima-media thickness, and thus can be considered as markers/risk factors in chronic renal patients.
Vucicevic, J; Popovic, M; Nikolic, K; Filipic, S; Obradovic, D; Agbaba, D
2017-03-01
For this study, 31 compounds, including 16 imidazoline/α-adrenergic receptor (IRs/α-ARs) ligands and 15 central nervous system (CNS) drugs, were characterized in terms of the retention factors (k) obtained using biopartitioning micellar and classical reversed phase chromatography (log k_BMC and log k_wRP, respectively). Based on the retention factor (log k_wRP) and slope of the linear curve (S), the isocratic parameter (φ_0) was calculated. Obtained retention factors were correlated with experimental log BB values for the group of examined compounds. High correlations were obtained between the logarithm of the biopartitioning micellar chromatography (BMC) retention factor and effective permeability (r(log k_BMC/log BB): 0.77), while for the RP-HPLC system the correlations were lower (r(log k_wRP/log BB): 0.58; r(S/log BB): -0.50; r(φ_0/P_e): 0.61). Based on the log k_BMC retention data and calculated molecular parameters of the examined compounds, quantitative structure-permeability relationship (QSPR) models were developed using partial least squares, stepwise multiple linear regression, support vector machine and artificial neural network methodologies. The high degree of structural diversity of the analysed IRs/α-ARs ligands and CNS drugs provides a wide applicability domain of the QSPR models for estimating blood-brain barrier penetration of related compounds.
Effect of Malmquist bias on correlation studies with IRAS data base
NASA Technical Reports Server (NTRS)
Verter, Frances
1993-01-01
The relationships between galaxy properties in the sample of Trinchieri et al. (1989) are reexamined with corrections for Malmquist bias. The linear correlations are tested and linear regressions are fit for log-log plots of L(FIR), L(H-alpha), and L(B) as well as ratios of these quantities. The linear correlations are corrected for Malmquist bias using the method of Verter (1988), in which each galaxy observation is weighted by the inverse of its sampling volume. The linear regressions are corrected for Malmquist bias by a new method invented here in which each galaxy observation is weighted by its sampling volume. The results of correlations and regressions among the sample are significantly changed in the anticipated sense that the corrected correlation confidences are lower and the corrected slopes of the linear regressions are lower. The elimination of Malmquist bias eliminates the nonlinear rise in luminosity that has caused some authors to hypothesize additional components of FIR emission.
Huang, Jian; Zhang, Cun-Hui
2013-01-01
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
A Comparison of Strategies for Estimating Conditional DIF
ERIC Educational Resources Information Center
Moses, Tim; Miao, Jing; Dorans, Neil J.
2010-01-01
In this study, the accuracies of four strategies were compared for estimating conditional differential item functioning (DIF), including raw data, logistic regression, log-linear models, and kernel smoothing. Real data simulations were used to evaluate the estimation strategies across six items, DIF and No DIF situations, and four sample size…
A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications.
Austin, Peter C
2017-08-01
Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log-log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data consisting of patients hospitalised with a heart attack. We illustrate the application of these methods using three statistical programming languages (R, SAS and Stata).
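The equivalence between the piecewise exponential model and Poisson regression noted above rests on a short likelihood comparison. The notation below is introduced here for illustration and is not taken from the tutorial: subject i spends time t_ij at risk in interval j, d_ij indicates whether the event occurs there, and the hazard in that interval is λ_ij.

```latex
\begin{align*}
L_{ij}^{\text{PE}} &= \lambda_{ij}^{d_{ij}} \exp(-\lambda_{ij} t_{ij}),
  \qquad \lambda_{ij} = \exp(\alpha_j + x_i^\top \beta), \\
L_{ij}^{\text{Pois}} &= \frac{\mu_{ij}^{d_{ij}} \exp(-\mu_{ij})}{d_{ij}!},
  \qquad \mu_{ij} = \lambda_{ij} t_{ij}, \\
 &= \frac{t_{ij}^{d_{ij}}}{d_{ij}!} \, \lambda_{ij}^{d_{ij}} \exp(-\lambda_{ij} t_{ij})
  \;\propto\; L_{ij}^{\text{PE}}, \\
\log \mu_{ij} &= \log t_{ij} + \alpha_j + x_i^\top \beta .
\end{align*}
```

The factor t_ij^{d_ij}/d_ij! does not involve the parameters, so both likelihoods are maximized by the same estimates, and log t_ij enters the Poisson model as an offset. Adding a cluster-specific random intercept to the linear predictor then gives the generalised linear mixed model formulation.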
Rothenberg, Stephen J.; Rothenberg, Jesse C.
2005-01-01
Statistical evaluation of the dose–response function in lead epidemiology is rarely attempted. Economic evaluation of health benefits of lead reduction usually assumes a linear dose–response function, regardless of the outcome measure used. We reanalyzed a previously published study, an international pooled data set combining data from seven prospective lead studies examining contemporaneous blood lead effect on IQ (intelligence quotient) of 7-year-old children (n = 1,333). We constructed alternative linear multiple regression models with linear blood lead terms (linear–linear dose response) and natural-log–transformed blood lead terms (log-linear dose response). We tested the two lead specifications for nonlinearity in the models, compared the two lead specifications for significantly better fit to the data, and examined the effects of possible residual confounding on the functional form of the dose–response relationship. We found that a log-linear lead–IQ relationship was a significantly better fit than was a linear–linear relationship for IQ (p = 0.009), with little evidence of residual confounding of included model variables. We substituted the log-linear lead–IQ effect in a previously published health benefits model and found that the economic savings due to U.S. population lead decrease between 1976 and 1999 (from 17.1 μg/dL to 2.0 μg/dL) was 2.2 times ($319 billion) that calculated using a linear–linear dose–response function ($149 billion). The Centers for Disease Control and Prevention action limit of 10 μg/dL for children fails to protect against most damage and economic cost attributable to lead exposure. PMID:16140626
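The comparison of the two lead specifications can be sketched in a stripped-down form: fit the outcome once against blood lead and once against its natural log, then compare residual sums of squares. The values below are hypothetical and noise-free, not the pooled study data, and the sketch omits the covariates and formal tests that the reanalysis relied on.

```python
import numpy as np

def ols_rss(x, y):
    """Ordinary least squares of y on an intercept and x; returns (coef, RSS)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)

# Hypothetical blood lead values (ug/dL) and an exactly log-linear response.
lead = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 30.0])
iq = 105.0 - 3.0 * np.log(lead)

coef_lin, rss_lin = ols_rss(lead, iq)          # linear-linear specification
coef_log, rss_log = ols_rss(np.log(lead), iq)  # log-linear specification
```

Under a log-linear response the loss per unit of lead is steepest at low exposure, which is why substituting it for a linear function can change estimated benefits of reductions at low blood lead levels.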
A FORTRAN program for multivariate survival analysis on the personal computer.
Mulder, P G
1988-01-01
In this paper a FORTRAN program is presented for multivariate survival or life table regression analysis in a competing risks' situation. The relevant failure rate (for example, a particular disease or mortality rate) is modelled as a log-linear function of a vector of (possibly time-dependent) explanatory variables. The explanatory variables may also include the variable time itself, which is useful for parameterizing piecewise exponential time-to-failure distributions in a Gompertz-like or Weibull-like way as a more efficient alternative to Cox's proportional hazards model. Maximum likelihood estimates of the coefficients of the log-linear relationship are obtained from the iterative Newton-Raphson method. The program runs on a personal computer under DOS; running time is quite acceptable, even for large samples.
On the use of log-transformation vs. nonlinear regression for analyzing biological power laws.
Xiao, Xiao; White, Ethan P; Hooten, Mevin B; Durham, Susan L
2011-10-01
Power-law relationships are among the most well-studied functional relationships in biology. Recently the common practice of fitting power laws using linear regression (LR) on log-transformed data has been criticized, calling into question the conclusions of hundreds of studies. It has been suggested that nonlinear regression (NLR) is preferable, but no rigorous comparison of these two methods has been conducted. Using Monte Carlo simulations, we demonstrate that the error distribution determines which method performs better, with NLR better characterizing data with additive, homoscedastic, normal error and LR better characterizing data with multiplicative, heteroscedastic, lognormal error. Analysis of 471 biological power laws shows that both forms of error occur in nature. While previous analyses based on log-transformation appear to be generally valid, future analyses should choose methods based on a combination of biological plausibility and analysis of the error distribution. We provide detailed guidelines and associated computer code for doing so, including a model averaging approach for cases where the error structure is uncertain.
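The two fitting strategies compared above can be sketched side by side. This is an illustrative sketch, not the authors' published code: the function names, the Gauss-Newton solver used for NLR, and the noise-free data are assumptions made here. LR fits a straight line to the log-transformed data, while NLR minimizes squared error on the original scale.

```python
import numpy as np

def fit_loglog(x, y):
    """LR on log-transformed data: log y = log a + b log x."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_a), b

def fit_nlr(x, y, a0, b0, n_iter=100, tol=1e-12):
    """NLR on the original scale by Gauss-Newton: y = a * x**b + error."""
    a, b = float(a0), float(b0)
    for _ in range(n_iter):
        pred = a * x ** b
        resid = y - pred
        # Jacobian of the prediction with respect to (a, b)
        J = np.column_stack([x ** b, pred * np.log(x)])
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        a, b = a + step[0], b + step[1]
        if np.max(np.abs(step)) < tol:
            break
    return a, b

# Hypothetical noise-free power law y = 2 * x**0.75.
x = np.linspace(1.0, 10.0, 25)
y = 2.0 * x ** 0.75

a_lr, b_lr = fit_loglog(x, y)
a_nlr, b_nlr = fit_nlr(x, y, a_lr, b_lr)   # LR estimates used as starting values
```

On noise-free data both methods recover the same parameters. They diverge once error is added, because LR implicitly assumes multiplicative error on the original scale and NLR assumes additive error, which is the distinction the abstract draws.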
Yu, S; Gao, S; Gan, Y; Zhang, Y; Ruan, X; Wang, Y; Yang, L; Shi, J
2016-04-01
Quantitative structure-property relationship modelling can be a valuable alternative method to replace or reduce experimental testing. In particular, some endpoints such as octanol-water (KOW) and organic carbon-water (KOC) partition coefficients of polychlorinated biphenyls (PCBs) are easier to predict and various models have been already developed. In this paper, two different methods, which are multiple linear regression based on the descriptors generated using Dragon software and hologram quantitative structure-activity relationships, were employed to predict suspended particulate matter (SPM) derived log KOC and generator column, shake flask and slow stirring method derived log KOW values of 209 PCBs. The predictive ability of the derived models was validated using a test set. The performances of all these models were compared with EPI Suite™ software. The results indicated that the proposed models were robust and satisfactory, and could provide feasible and promising tools for the rapid assessment of the SPM derived log KOC and generator column, shake flask and slow stirring method derived log KOW values of PCBs.
Linearly Supporting Feature Extraction for Automated Estimation of Stellar Atmospheric Parameters
NASA Astrophysics Data System (ADS)
Li, Xiangru; Lu, Yu; Comte, Georges; Luo, Ali; Zhao, Yongheng; Wang, Yongjun
2015-05-01
We describe a scheme to extract linearly supporting (LSU) features from stellar spectra to automatically estimate the atmospheric parameters T_eff, log g, and [Fe/H]. "Linearly supporting" means that the atmospheric parameters can be accurately estimated from the extracted features through a linear model. The successive steps of the process are as follows: first, decompose the spectrum using a wavelet packet (WP) and represent it by the derived decomposition coefficients; second, detect representative spectral features from the decomposition coefficients using the proposed method Least Absolute Shrinkage and Selection Operator (LARS)_bs; third, estimate the atmospheric parameters T_eff, log g, and [Fe/H] from the detected features using a linear regression method. One prominent characteristic of this scheme is its ability to evaluate quantitatively the contribution of each detected feature to the atmospheric parameter estimate and also to trace back the physical significance of that feature. This work also shows that the usefulness of a component depends on both the wavelength and frequency. The proposed scheme has been evaluated on both real spectra from the Sloan Digital Sky Survey (SDSS)/SEGUE and synthetic spectra calculated from Kurucz's NEWODF models. On real spectra, we extracted 23 features to estimate T_eff, 62 features for log g, and 68 features for [Fe/H]. Tests of consistency between our estimates and those provided by the Spectroscopic Parameter Pipeline of SDSS show that the mean absolute errors (MAEs) are 0.0062 dex for log T_eff (83 K for T_eff), 0.2345 dex for log g, and 0.1564 dex for [Fe/H]. For the synthetic spectra, the MAE test accuracies are 0.0022 dex for log T_eff (32 K for T_eff), 0.0337 dex for log g, and 0.0268 dex for [Fe/H].
Wockner, Leesa F; Hoffmann, Isabell; O'Rourke, Peter; McCarthy, James S; Marquart, Louise
2017-08-25
The efficacy of vaccines aimed at inhibiting the growth of malaria parasites in the blood can be assessed by comparing the growth rate of parasitaemia in the blood of subjects treated with a test vaccine with that of controls. In studies using induced blood stage malaria (IBSM), a type of controlled human malaria infection, parasite growth rate has been measured using models with the intercept on the y-axis fixed to the inoculum size. A set of statistical models was evaluated to determine an optimal methodology to estimate parasite growth rate in IBSM studies. Parasite growth rates were estimated using data from 40 subjects published in three IBSM studies. Data were fitted using 12 statistical models: log-linear, sine-wave with the period either fixed to 48 h or not fixed; these models were fitted with the intercept either fixed to the inoculum size or not fixed. All models were fitted by individual, and overall by study using a mixed effects model with a random effect for the individual. Log-linear models and sine-wave models, with the period fixed or not fixed, resulted in similar parasite growth rate estimates (within 0.05 log10 parasites per mL/day). Average parasite growth rate estimates for models fitted by individual with the intercept fixed to the inoculum size were substantially lower, by an average of 0.17 log10 parasites per mL/day (range 0.06-0.24), compared with non-fixed intercept models. Variability of parasite growth rate estimates across the three studies analysed was substantially higher (3.5 times) for fixed-intercept models compared with non-fixed intercept models. The same tendency was observed in models fitted overall by study. Modelling data by individual or overall by study had minimal effect on parasite growth estimates. The analyses presented in this report confirm that fixing the intercept to the inoculum size influences parasite growth estimates.
The most appropriate statistical model to estimate the growth rate of blood-stage parasites in IBSM studies appears to be a log-linear model fitted by individual and with the intercept estimated in the log-linear regression. Future studies should use this model to estimate parasite growth rates.
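The effect of fixing the intercept can be illustrated with a small least-squares sketch. The data, function names, and the assumed log10 inoculum value are hypothetical and are not taken from the IBSM studies. With a free intercept the slope comes from ordinary linear regression; with a fixed intercept the line is forced through the point (0, intercept) and only the slope is estimated.

```python
import numpy as np

def slope_free_intercept(t, y):
    """Log-linear fit with both slope and intercept estimated."""
    slope, intercept = np.polyfit(t, y, 1)
    return slope, intercept

def slope_fixed_intercept(t, y, intercept):
    """Least-squares slope when the line is forced through (0, intercept)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(t * (y - intercept)) / np.sum(t * t)

# Hypothetical days since inoculation and log10 parasites per mL,
# lying exactly on a line with slope 0.8 and intercept 0.5.
t = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
y = 0.5 + 0.8 * t

free_slope, free_intercept = slope_free_intercept(t, y)
# Assumed log10 inoculum size of 2.0, above the intercept implied by the data.
fixed_slope = slope_fixed_intercept(t, y, intercept=2.0)
```

When the fixed intercept sits above the one implied by the observations, the forced line has to be flatter to reach the data, so the growth rate estimate drops. This matches the direction of the difference reported above, although the magnitude here depends entirely on the made-up numbers.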
ERIC Educational Resources Information Center
Denham, Bryan E.
2009-01-01
Grounded conceptually in social cognitive theory, this research examines how personal, behavioral, and environmental factors are associated with risk perceptions of anabolic-androgenic steroids. Ordinal logistic regression and logit log-linear models applied to data gathered from high-school seniors (N = 2,160) in the 2005 Monitoring the Future…
An approach to checking case-crossover analyses based on equivalence with time-series methods.
Lu, Yun; Symons, James Morel; Geyh, Alison S; Zeger, Scott L
2008-03-01
The case-crossover design has been increasingly applied to epidemiologic investigations of acute adverse health effects associated with ambient air pollution. The correspondence of the design to that of matched case-control studies makes it inferentially appealing for epidemiologic studies. Case-crossover analyses generally use conditional logistic regression modeling. This technique is equivalent to time-series log-linear regression models when there is a common exposure across individuals, as in air pollution studies. Previous methods for obtaining unbiased estimates for case-crossover analyses have assumed that time-varying risk factors are constant within reference windows. In this paper, we rely on the connection between case-crossover and time-series methods to illustrate model-checking procedures from log-linear model diagnostics for time-stratified case-crossover analyses. Additionally, we compare the relative performance of the time-stratified case-crossover approach to time-series methods under 3 simulated scenarios representing different temporal patterns of daily mortality associated with air pollution in Chicago, Illinois, during 1995 and 1996. Whenever a model, be it time-series or case-crossover, fails to account appropriately for fluctuations in time that confound the exposure, the effect estimate will be biased. It is therefore important to perform model-checking in time-stratified case-crossover analyses rather than assume the estimator is unbiased.
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM10) levels. These levels are often increased over urban areas, becoming one of the main air pollution concerns. This is the case in the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7-year historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative (log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity) and qualitative (working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation). Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following Sawa's Bayesian Information Criterion (INT-BIC). Each type of model is calculated selecting two different periods: the training dataset (6 years) and the testing dataset (1 year). The results of each type of model show that the INT-BIC-based model (R² = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the-art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 for seasonal means) with respect to shorter periods.
Moran, John L; Solomon, Patricia J
2012-05-16
For the analysis of length-of-stay (LOS) data, which is characteristically right-skewed, a number of statistical estimators have been proposed as alternatives to the traditional ordinary least squares (OLS) regression with log dependent variable. Using a cohort of patients identified in the Australian and New Zealand Intensive Care Society Adult Patient Database, 2008-2009, 12 different methods were used for estimation of intensive care (ICU) length of stay. These encompassed risk-adjusted regression analysis of firstly: log LOS using OLS, linear mixed model [LMM], treatment effects, skew-normal and skew-t models; and secondly: unmodified (raw) LOS via OLS, generalised linear models [GLMs] with log-link and 4 different distributions [Poisson, gamma, negative binomial and inverse-Gaussian], extended estimating equations [EEE] and a finite mixture model including a gamma distribution. A fixed covariate list and ICU-site clustering with robust variance were utilised for model fitting with split-sample determination (80%) and validation (20%) data sets, and model simulation was undertaken to establish over-fitting (Copas test). Indices of model specification using Bayesian information criterion [BIC: lower values preferred] and residual analysis as well as predictive performance (R2, concordance correlation coefficient (CCC), mean absolute error [MAE]) were established for each estimator. The data-set consisted of 111663 patients from 131 ICUs; with mean(SD) age 60.6(18.8) years, 43.0% were female, 40.7% were mechanically ventilated and ICU mortality was 7.8%. ICU length-of-stay was 3.4(5.1) (median 1.8, range (0.17-60)) days and demonstrated marked kurtosis and right skew (29.4 and 4.4 respectively). BIC showed considerable spread, from a maximum of 509801 (OLS-raw scale) to a minimum of 210286 (LMM). R2 ranged from 0.22 (LMM) to 0.17 and the CCC from 0.334 (LMM) to 0.149, with MAE 2.2-2.4. Superior residual behaviour was established for the log-scale estimators. 
There was a general tendency for over-prediction (negative residuals) and for over-fitting, the exception being the GLM negative binomial estimator. The mean-variance function was best approximated by a quadratic function, consistent with log-scale estimation; the link function was estimated (EEE) as 0.152(0.019, 0.285), consistent with a fractional-root function. For ICU length of stay, log-scale estimation, in particular the LMM, appeared to be the most consistently performing estimator(s). Neither the GLM variants nor the skew-regression estimators dominated.
Estimation of octanol/water partition coefficients using LSER parameters
Luehrs, Dean C.; Hickey, James P.; Godbole, Kalpana A.; Rogers, Tony N.
1998-01-01
The logarithms of octanol/water partition coefficients, logKow, were regressed against the linear solvation energy relationship (LSER) parameters for a training set of 981 diverse organic chemicals. The standard deviation for logKow was 0.49. The regression equation was then used to estimate logKow for a test set of 146 chemicals, which included pesticides and other diverse polyfunctional compounds. Thus the octanol/water partition coefficient may be estimated by LSER parameters without elaborate software, but only moderate accuracy should be expected.
Asquith, William H.; Thompson, David B.
2008-01-01
The U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, investigated a refinement of the regional regression method and developed alternative equations for estimation of peak-streamflow frequency for undeveloped watersheds in Texas. A common model for estimation of peak-streamflow frequency is based on the regional regression method. The current (2008) regional regression equations for 11 regions of Texas are based on log10 transformations of all regression variables (drainage area, main-channel slope, and watershed shape). Exclusive use of log10-transformation does not fully linearize the relations between the variables. As a result, some systematic bias remains in the current equations. The bias results in overestimation of peak streamflow for both the smallest and largest watersheds. The bias increases with increasing recurrence interval. The primary source of the bias is the discernible curvilinear relation in log10 space between peak streamflow and drainage area. Bias is demonstrated by selected residual plots with superimposed LOWESS trend lines. To address the bias, a statistical framework based on minimization of the PRESS statistic through power transformation of drainage area is described and implemented, and the resulting regression equations are reported. Compared to log10-exclusive equations, the equations derived from PRESS minimization have PRESS statistics and residual standard errors less than the log10 exclusive equations. Selected residual plots for the PRESS-minimized equations are presented to demonstrate that systematic bias in regional regression equations for peak-streamflow frequency estimation in Texas can be reduced. Because the overall error is similar to the error associated with previous equations and because the bias is reduced, the PRESS-minimized equations reported here provide alternative equations for peak-streamflow frequency estimation.
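As a rough illustration of PRESS minimization through a power transformation of drainage area, the sketch below uses a single predictor and a small grid of candidate exponents. This is a simplified assumption: the report's equations use several regression variables, and the data, the grid, and the function names `press_simple` and `best_power` are hypothetical. PRESS is computed from the ordinary residuals and leverages, so no refitting is needed for each left-out point.

```python
import math


def press_simple(x, y):
    """PRESS statistic for the simple regression y = a + b*x.

    Uses the identity: leave-one-out residual = e_i / (1 - h_ii),
    where h_ii is the leverage of observation i.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    press = 0.0
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - mean_x) ** 2 / sxx
        residual = yi - (a + b * xi)
        press += (residual / (1.0 - leverage)) ** 2
    return press


def best_power(area, log10_q, lambdas):
    """Return the exponent lam whose transform area**lam gives the smallest PRESS."""
    return min(lambdas, key=lambda lam: press_simple([a ** lam for a in area], log10_q))


# Hypothetical watersheds: log10 peak flow is exactly linear in area**(-0.2),
# so it is curved when plotted against log10(area).
area = [1.0, 5.0, 20.0, 100.0, 500.0, 2000.0]
log10_q = [4.0 - 3.0 * a ** (-0.2) for a in area]

lambdas = [-0.5, -0.4, -0.3, -0.2, -0.1, 0.1, 0.2, 0.3]  # 0 excluded: area**0 is constant
best = best_power(area, log10_q, lambdas)
press_log10 = press_simple([math.log10(a) for a in area], log10_q)
press_power = press_simple([a ** best for a in area], log10_q)
print(best, press_log10, press_power)
```

On these constructed data the grid search recovers the exponent used to generate them, and the log10-only fit has a larger PRESS because it cannot follow the curvature.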
Jones, Andrew M; Lomas, James; Moore, Peter T; Rice, Nigel
2016-10-01
We conduct a quasi-Monte-Carlo comparison of the recent developments in parametric and semiparametric regression methods for healthcare costs, both against each other and against standard practice. The population of English National Health Service hospital in-patient episodes for the financial year 2007-2008 (summed for each patient) is randomly divided into two equally sized subpopulations to form an estimation set and a validation set. Evaluating out-of-sample using the validation set, a conditional density approximation estimator shows considerable promise in forecasting conditional means, performing best for accuracy of forecasting and among the best four for bias and goodness of fit. The best performing model for bias is linear regression with square-root-transformed dependent variables, whereas a generalized linear model with square-root link function and Poisson distribution performs best in terms of goodness of fit. Commonly used models utilizing a log-link are shown to perform badly relative to other models considered in our comparison.
Defining a Family of Cognitive Diagnosis Models Using Log-Linear Models with Latent Variables
ERIC Educational Resources Information Center
Henson, Robert A.; Templin, Jonathan L.; Willse, John T.
2009-01-01
This paper uses log-linear models with latent variables (Hagenaars, in "Loglinear Models with Latent Variables," 1993) to define a family of cognitive diagnosis models. In doing so, the relationship between many common models is explicitly defined and discussed. In addition, because the log-linear model with latent variables is a general model for…
ERIC Educational Resources Information Center
Si, Yajuan; Reiter, Jerome P.
2013-01-01
In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…
On the null distribution of Bayes factors in linear regression
USDA-ARS's Scientific Manuscript database
We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...
A hierarchical model for estimating change in American Woodcock populations
Sauer, J.R.; Link, W.A.; Kendall, W.L.; Kelley, J.R.; Niven, D.K.
2008-01-01
The Singing-Ground Survey (SGS) is a primary source of information on population change for American woodcock (Scolopax minor). We analyzed the SGS using a hierarchical log-linear model and compared the estimates of change and annual indices of abundance to a route regression analysis of SGS data. We also grouped SGS routes into Bird Conservation Regions (BCRs) and estimated population change and annual indices using BCRs within states and provinces as strata. Based on the hierarchical model-based estimates, we concluded that woodcock populations were declining in North America between 1968 and 2006 (trend = -0.9%/yr, 95% credible interval: -1.2, -0.5). Singing-Ground Survey results are generally similar between analytical approaches, but the hierarchical model has several important advantages over the route regression. Hierarchical models better accommodate changes in survey efficiency over time and space by treating strata, years, and observers as random effects in the context of a log-linear model, providing trend estimates that are derived directly from the annual indices. We also conducted a hierarchical model analysis of woodcock data from the Christmas Bird Count and the North American Breeding Bird Survey. All surveys showed general consistency in patterns of population change, but the SGS had the shortest credible intervals. We suggest that population management and conservation planning for woodcock involving interpretation of the SGS use estimates provided by the hierarchical model.
A kinetic energy model of two-vehicle crash injury severity.
Sobhani, Amir; Young, William; Logan, David; Bahrololoom, Sareh
2011-05-01
An important part of any model of vehicle crashes is the development of a procedure to estimate crash injury severity. After reviewing existing models of crash severity, this paper outlines the development of a modelling approach aimed at measuring the injury severity of people in two-vehicle road crashes. This model can be incorporated into a discrete event traffic simulation model, using simulation model outputs as its input. The model can then serve as an integral part of a simulation model estimating the crash potential of components of the traffic system. The model is developed using Newtonian Mechanics and Generalised Linear Regression. The factors contributing to the speed change (ΔV(s)) of a subject vehicle are identified using the law of conservation of momentum. A Log-Gamma regression model is fitted to measure speed change (ΔV(s)) of the subject vehicle based on the identified crash characteristics. The kinetic energy applied to the subject vehicle is calculated by the model, which in turn uses a Log-Gamma Regression Model to estimate the Injury Severity Score of the crash from the calculated kinetic energy, crash impact type, presence of airbag and/or seat belt and occupant age. Copyright © 2010 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
de Andrés, Javier; Landajo, Manuel; Lorca, Pedro; Labra, Jose; Ordóñez, Patricia
Artificial neural networks have proven to be useful tools for solving financial analysis problems such as financial distress prediction and audit risk assessment. In this paper we focus on the performance of robust (least absolute deviation-based) neural networks on measuring liquidity of firms. The problem of learning the bivariate relationship between the components (namely, current liabilities and current assets) of the so-called current ratio is analyzed, and the predictive performance of several modelling paradigms (namely, linear and log-linear regressions, classical ratios and neural networks) is compared. An empirical analysis is conducted on a representative data base from the Spanish economy. Results indicate that classical ratio models are largely inadequate as a realistic description of the studied relationship, especially when used for predictive purposes. In a number of cases, especially when the analyzed firms are microenterprises, the linear specification is improved by considering the flexible non-linear structures provided by neural networks.
On the use of log-transformation vs. nonlinear regression for analyzing biological power laws
Xiao, X.; White, E.P.; Hooten, M.B.; Durham, S.L.
2011-01-01
Power-law relationships are among the most well-studied functional relationships in biology. Recently the common practice of fitting power laws using linear regression (LR) on log-transformed data has been criticized, calling into question the conclusions of hundreds of studies. It has been suggested that nonlinear regression (NLR) is preferable, but no rigorous comparison of these two methods has been conducted. Using Monte Carlo simulations, we demonstrate that the error distribution determines which method performs better, with NLR better characterizing data with additive, homoscedastic, normal error and LR better characterizing data with multiplicative, heteroscedastic, lognormal error. Analysis of 471 biological power laws shows that both forms of error occur in nature. While previous analyses based on log-transformation appear to be generally valid, future analyses should choose methods based on a combination of biological plausibility and analysis of the error distribution. We provide detailed guidelines and associated computer code for doing so, including a model averaging approach for cases where the error structure is uncertain. © 2011 by the Ecological Society of America.
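The two fitting routes compared in the abstract above can be sketched for a power law y = a·x^b. This is not the authors' code: the function names and data are hypothetical, and the nonlinear fit is a deliberately simple grid search over b (for each b the best a has a closed form), standing in for a proper nonlinear least-squares routine. LR minimizes squared error on the log scale, which matches multiplicative lognormal error; NLR minimizes squared error on the original scale, which matches additive normal error.

```python
import math


def fit_lr_loglog(x, y):
    """Linear regression of log(y) on log(x); returns (a, b) for y = a * x**b."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    return math.exp(mean_y - b * mean_x), b


def fit_nlr_grid(x, y, b_grid):
    """Least squares on the original scale: search b on a grid, solve a exactly."""
    best = None
    for b in b_grid:
        xb = [v ** b for v in x]
        a = sum(p * q for p, q in zip(xb, y)) / sum(p * p for p in xb)
        sse = sum((q - a * p) ** 2 for p, q in zip(xb, y))
        if best is None or sse < best[0]:
            best = (sse, a, b)
    return best[1], best[2]


# Hypothetical noise-free data from y = 2 * x**1.5; with no error both routes agree.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * v ** 1.5 for v in x]

a_lr, b_lr = fit_lr_loglog(x, y)
b_grid = [0.5 + 0.01 * k for k in range(201)]  # candidate exponents 0.5 .. 2.5
a_nlr, b_nlr = fit_nlr_grid(x, y, b_grid)
print(a_lr, b_lr, a_nlr, b_nlr)
```

The two routes diverge only once noise is added, and which one is closer to the truth then depends on whether that noise is multiplicative or additive, which is the point the abstract makes.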
ERIC Educational Resources Information Center
Xu, Xueli; von Davier, Matthias
2008-01-01
The general diagnostic model (GDM) utilizes located latent classes for modeling a multidimensional proficiency variable. In this paper, the GDM is extended by employing a log-linear model for multiple populations that assumes constraints on parameters across multiple groups. This constrained model is compared to log-linear models that assume…
Al-Chalabi, Ammar; Calvo, Andrea; Chio, Adriano; Colville, Shuna; Ellis, Cathy M; Hardiman, Orla; Heverin, Mark; Howard, Robin S; Huisman, Mark H B; Keren, Noa; Leigh, P Nigel; Mazzini, Letizia; Mora, Gabriele; Orrell, Richard W; Rooney, James; Scott, Kirsten M; Scotton, William J; Seelen, Meinie; Shaw, Christopher E; Sidle, Katie S; Swingler, Robert; Tsuda, Miho; Veldink, Jan H; Visser, Anne E; van den Berg, Leonard H; Pearce, Neil
2014-11-01
Amyotrophic lateral sclerosis shares characteristics with some cancers, such as onset being more common in later life, progression usually being rapid, the disease affecting a particular cell type, and showing complex inheritance. We used a model originally applied to cancer epidemiology to investigate the hypothesis that amyotrophic lateral sclerosis is a multistep process. We generated incidence data by age and sex from amyotrophic lateral sclerosis population registers in Ireland (registration dates 1995-2012), the Netherlands (2006-12), Italy (1995-2004), Scotland (1989-98), and England (2002-09), and calculated age and sex-adjusted incidences for each register. We regressed the log of age-specific incidence against the log of age with least squares regression. We did the analyses within each register, and also did a combined analysis, adjusting for register. We identified 6274 cases of amyotrophic lateral sclerosis from a catchment population of about 34 million people. We noted a linear relationship between log incidence and log age in all five registers: England r(2)=0·95, Ireland r(2)=0·99, Italy r(2)=0·95, the Netherlands r(2)=0·99, and Scotland r(2)=0·97; overall r(2)=0·99. All five registers gave similar estimates of the linear slope ranging from 4·5 to 5·1, with overlapping confidence intervals. The combination of all five registers gave an overall slope of 4·8 (95% CI 4·5-5·0), with similar estimates for men (4·6, 4·3-4·9) and women (5·0, 4·5-5·5). A linear relationship between the log incidence and log age of onset of amyotrophic lateral sclerosis is consistent with a multistage model of disease. The slope estimate suggests that amyotrophic lateral sclerosis is a six-step process. Identification of these steps could lead to preventive and therapeutic avenues. 
UK Medical Research Council; UK Economic and Social Research Council; Ireland Health Research Board; The Netherlands Organisation for Health Research and Development (ZonMw); the Ministry of Health and Ministry of Education, University, and Research in Italy; the Motor Neurone Disease Association of England, Wales, and Northern Ireland; and the European Commission (Seventh Framework Programme). Copyright © 2014 Elsevier Ltd. All rights reserved.
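The step count quoted in the abstract above follows from the slope under the standard multistage form, in which incidence at age t is proportional to t raised to the power n − 1 for an n-step process. Using this form is an assumption here; the abstract says only that the model was originally applied to cancer epidemiology.

```latex
I(t) \propto t^{\,n-1}
\;\Longrightarrow\;
\log I(t) = (n-1)\,\log t + c ,
\qquad
n = \text{slope} + 1 = 4.8 + 1 = 5.8 \approx 6 .
```

Applying the same shift to the reported 95% CI for the slope (4.5-5.0) gives roughly 5.5 to 6.0 steps.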
Decomposition and model selection for large contingency tables.
Dahinden, Corinne; Kalisch, Markus; Bühlmann, Peter
2010-04-01
Large contingency tables summarizing categorical variables arise in many areas. One example is in biology, where large numbers of biomarkers are cross-tabulated according to their discrete expression level. Interactions of the variables are of great interest and are generally studied with log-linear models. The structure of a log-linear model can be visually represented by a graph from which the conditional independence structure can then be easily read off. However, since the number of parameters in a saturated model grows exponentially in the number of variables, this generally comes with a heavy computational burden. Even if we restrict ourselves to models of lower-order interactions or other sparse structures, we are faced with the problem of a large number of cells which play the role of sample size. This is in sharp contrast to high-dimensional regression or classification procedures because, in addition to a high-dimensional parameter, we also have to deal with the analogue of a huge sample size. Furthermore, high-dimensional tables naturally feature a large number of sampling zeros which often leads to the nonexistence of the maximum likelihood estimate. We therefore present a decomposition approach, where we first divide the problem into several lower-dimensional problems and then combine these to form a global solution. Our methodology is computationally feasible for log-linear interaction models with many categorical variables each or some of them having many levels. We demonstrate the proposed method on simulated data and apply it to a bio-medical problem in cancer research.
Lamm, Steven H; Ferdosi, Hamid; Dissen, Elisabeth K; Li, Ji; Ahn, Jaeil
2015-12-07
High levels (> 200 µg/L) of inorganic arsenic in drinking water are known to be a cause of human lung cancer, but the evidence at lower levels is uncertain. We have sought the epidemiological studies that have examined the dose-response relationship between arsenic levels in drinking water and the risk of lung cancer over a range that includes both high and low levels of arsenic. Regression analysis, based on six studies identified from an electronic search, examined the relationship between the log of the relative risk and the log of the arsenic exposure over a range of 1-1000 µg/L. The best-fitting continuous meta-regression model was sought and found to be a no-constant linear-quadratic analysis where both the risk and the exposure had been logarithmically transformed. This yielded both a statistically significant positive coefficient for the quadratic term and a statistically significant negative coefficient for the linear term. Sub-analyses by study design yielded results that were similar for both ecological studies and non-ecological studies. Statistically significant X-intercepts consistently found no increased level of risk at approximately 100-150 µg/L arsenic. PMID:26690190
New method for calculating a mathematical expression for streamflow recession
Rutledge, Albert T.
1991-01-01
An empirical method has been devised to calculate the master recession curve, which is a mathematical expression for streamflow recession during times of negligible direct runoff. The method is based on the assumption that the storage-delay factor, which is the time per log cycle of streamflow recession, varies linearly with the logarithm of streamflow. The resulting master recession curve can be nonlinear. The method can be executed by a computer program that reads a data file of daily mean streamflow, then allows the user to select several near-linear segments of streamflow recession. The storage-delay factor for each segment is one of the coefficients of the equation that results from linear least-squares regression. Using results for each recession segment, a mathematical expression of the storage-delay factor as a function of the log of streamflow is determined by linear least-squares regression. The master recession curve, which is a second-order polynomial expression for time as a function of log of streamflow, is then derived using the coefficients of this function.
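The last step of the abstract above, going from a storage-delay factor that is linear in log streamflow to a second-order polynomial for time, can be sketched as follows. The coefficient names `a`, `b`, `c` and their values are hypothetical. The sketch assumes the storage-delay factor K is the time elapsed per unit decrease in log Q, so K = −dt/d(log Q); integrating K = a + b·log Q then gives t = c − a·log Q − (b/2)·(log Q)².

```python
def storage_delay(log_q, a, b):
    """Storage-delay factor K (time per log cycle), assumed linear in log Q."""
    return a + b * log_q


def master_recession_time(log_q, a, b, c=0.0):
    """Time on the master recession curve as a quadratic in log Q.

    Obtained by integrating dt/d(log Q) = -(a + b*log Q); c is the
    integration constant that fixes the time origin.
    """
    return c - a * log_q - 0.5 * b * log_q ** 2


# Hypothetical coefficients: K = 30 + 20 * log10(Q), in days per log cycle.
a, b = 30.0, 20.0
log_q = 1.0
step = 1e-3

# Numerical check that -dt/d(log Q) reproduces the storage-delay factor.
numeric_k = -(master_recession_time(log_q + step, a, b)
              - master_recession_time(log_q - step, a, b)) / (2 * step)
print(numeric_k, storage_delay(log_q, a, b))
```

When b is zero the curve reduces to a straight line in log Q, which is the classical single-constant recession; a nonzero b is what makes the master recession curve nonlinear.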
Quantum algorithm for linear regression
NASA Astrophysics Data System (ADS)
Wang, Guoming
2017-07-01
We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Differently from previous algorithms which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in the classical form. So by running it once, one completely determines the fitted model and then can use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly(log2(N), d, κ, 1/ε), where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ε is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary. Thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding this fit, and can be used to check whether the given data set qualifies for linear regression in the first place.
NASA Astrophysics Data System (ADS)
Paul, Suman; Ali, Muhammad; Chatterjee, Rima
2018-01-01
Velocity of compressional wave (Vp) of coal and non-coal lithology is predicted from five wells from the Bokaro coalfield (CF), India. Shear sonic travel time logs are not recorded for all wells in the study area. Shear wave velocity (Vs) is available only for two wells: one from east and the other from west Bokaro CF. The major lithologies of this CF are dominated by coal and shaly coal of the Barakar formation. This paper focuses on (a) the relationship between Vp and Vs, (b) prediction of Vp using regression and neural network modeling and (c) estimation of maximum horizontal stress from image log. Coal is characterized by low acoustic impedance (AI) compared to the overlying and underlying strata. The cross-plot between AI and Vp/Vs is able to identify coal, shaly coal, shale and sandstone from wells in Bokaro CF. The relationship between Vp and Vs is obtained with excellent goodness of fit (R²) ranging from 0.90 to 0.93. Linear multiple regression and multi-layered feed-forward neural network (MLFN) models are developed for predicting Vp from two wells using four input log parameters: gamma ray, resistivity, bulk density and neutron porosity. Vp predicted by the regression model shows poor (R² = 0.28) to good (R² = 0.79) fit with the observed velocity. Vp predicted by the MLFN model shows satisfactory to good R² values varying from 0.62 to 0.92 with the observed velocity. Maximum horizontal stress orientation from a well at west Bokaro CF is studied from Formation Micro-Imager (FMI) log. Breakouts and drilling-induced fractures (DIFs) are identified from the FMI log. Breakout length of 4.5 m is oriented towards N60°W whereas the orientation of DIFs for a cumulative length of 26.5 m is varying from N15°E to N35°E. The mean maximum horizontal stress in this CF is towards N28°E.
Flow-covariate prediction of stream pesticide concentrations.
Mosquin, Paul L; Aldworth, Jeremy; Chen, Wenlin
2018-01-01
Potential peak functions (e.g., maximum rolling averages over a given duration) of annual pesticide concentrations in the aquatic environment are important exposure parameters (or target quantities) for ecological risk assessments. These target quantities require accurate concentration estimates on nonsampled days in a monitoring program. We examined stream flow as a covariate via universal kriging to improve predictions of maximum m-day (m = 1, 7, 14, 30, 60) rolling averages and the 95th percentiles of atrazine concentration in streams where data were collected every 7 or 14 d. The universal kriging predictions were evaluated against the target quantities calculated directly from the daily (or near daily) measured atrazine concentration at 32 sites (89 site-yr) as part of the Atrazine Ecological Monitoring Program in the US corn belt region (2008-2013) and 4 sites (62 site-yr) in Ohio by the National Center for Water Quality Research (1993-2008). Because stream flow data are strongly skewed to the right, 3 transformations of the flow covariate were considered: log transformation, short-term flow anomaly, and normalized Box-Cox transformation. The normalized Box-Cox transformation resulted in predictions of the target quantities that were comparable to those obtained from log-linear interpolation (i.e., linear interpolation on the log scale) for 7-d sampling. However, the predictions appeared to be negatively affected by variability in regression coefficient estimates across different sample realizations of the concentration time series. Therefore, revised models incorporating seasonal covariates and partially or fully constrained regression parameters were investigated, and they were found to provide much improved predictions in comparison with those from log-linear interpolation for all rolling average measures. Environ Toxicol Chem 2018;37:260-273. © 2017 SETAC.
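The baseline that the kriging predictions are compared against, log-linear interpolation followed by a maximum m-day rolling average, can be sketched as below. The sampling days, concentrations, and function names are hypothetical, and the sketch assumes integer sampling days and strictly positive concentrations (the logarithm is undefined otherwise).

```python
import math


def loglinear_fill(sample_days, sample_conc):
    """Daily series from sparse samples by linear interpolation on the log scale."""
    daily = []
    points = list(zip(sample_days, sample_conc))
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        for t in range(t0, t1):
            w = (t - t0) / (t1 - t0)
            daily.append(math.exp((1 - w) * math.log(c0) + w * math.log(c1)))
    daily.append(sample_conc[-1])  # last sampled day
    return daily


def max_rolling_mean(series, m):
    """Maximum of the m-day rolling averages of a daily series."""
    return max(sum(series[i:i + m]) / m for i in range(len(series) - m + 1))


# Hypothetical 7-d sampling around a concentration peak.
days = [0, 7, 14]
conc = [1.0, 100.0, 1.0]

daily = loglinear_fill(days, conc)
peak_1d = max_rolling_mean(daily, 1)
peak_7d = max_rolling_mean(daily, 7)
print(len(daily), peak_1d, peak_7d)
```

Interpolation can never produce a value above the highest sampled concentration, so any peak that falls between sampling days is missed; that limitation is what motivates bringing in a covariate such as stream flow.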
USDA-ARS's Scientific Manuscript database
Using linear regression models, we studied the main and two-way interaction effects of the predictor variables gender, age, BMI, and 64 folate/vitamin B-12/homocysteine/lipid/cholesterol-related single nucleotide polymorphisms (SNP) on log-transformed plasma homocysteine normalized by red blood cell...
Detecting trends in raptor counts: power and type I error rates of various statistical tests
Hatfield, J.S.; Gould, W.R.; Hoover, B.A.; Fuller, M.R.; Lindquist, E.L.
1996-01-01
We conducted simulations that estimated power and type I error rates of statistical tests for detecting trends in raptor population count data collected from a single monitoring site. Results of the simulations were used to help analyze count data of bald eagles (Haliaeetus leucocephalus) from 7 national forests in Michigan, Minnesota, and Wisconsin during 1980-1989. Seven statistical tests were evaluated, including simple linear regression on the log scale and linear regression with a permutation test. Using 1,000 replications each, we simulated n = 10 and n = 50 years of count data and trends ranging from -5 to 5% change/year. We evaluated the tests at 3 critical levels (alpha = 0.01, 0.05, and 0.10) for both upper- and lower-tailed tests. Exponential count data were simulated by adding sampling error with a coefficient of variation of 40% from either a log-normal or autocorrelated log-normal distribution. Not surprisingly, tests performed with 50 years of data were much more powerful than tests with 10 years of data. Positive autocorrelation inflated alpha-levels upward from their nominal levels, making the tests less conservative and more likely to reject the null hypothesis of no trend. Of the tests studied, Cox and Stuart's test and Pollard's test clearly had lower power than the others. Surprisingly, the linear regression t-test, Collins' linear regression permutation test, and the nonparametric Lehmann's and Mann's tests all had similar power in our simulations. Analyses of the count data suggested that bald eagles had increasing trends on at least 2 of the 7 national forests during 1980-1989.
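One of the seven tests, simple linear regression on the log scale, can be put through a small simulation of the kind described above. This is a sketch under stated assumptions rather than the authors' procedure: errors are independent lognormal (no autocorrelation), the starting count of 100 and the seed are arbitrary, and the critical value 1.860 is the upper 5% point of Student's t with 8 degrees of freedom, so it is valid only for n = 10 years.

```python
import math
import random

T_CRIT_DF8_UPPER_05 = 1.860  # upper-tailed, alpha = 0.05, df = n - 2 = 8


def slope_t_stat(y):
    """t statistic for the slope of y regressed on year index 0..n-1."""
    n = len(y)
    x = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b / se_b


def rejection_rate(trend, n_years=10, cv=0.40, reps=1000, seed=1):
    """Share of simulated series in which an increasing trend is declared."""
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1 + cv ** 2))  # log-scale SD giving the stated CV
    hits = 0
    for _ in range(reps):
        log_counts = [math.log(100.0) + t * math.log(1 + trend) + rng.gauss(0, sigma)
                      for t in range(n_years)]
        if slope_t_stat(log_counts) > T_CRIT_DF8_UPPER_05:
            hits += 1
    return hits / reps


type_one = rejection_rate(0.0)     # no trend: estimates the type I error rate
power_5pct = rejection_rate(0.05)  # +5% change/year: estimates power
print(type_one, power_5pct)
```

With independent errors the no-trend rejection rate should sit near the nominal 0.05; adding positive autocorrelation to the errors, as the abstract notes, would push it above that level.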
Wu, Zilan; Lin, Tian; Li, Zhongxia; Jiang, Yuqing; Li, Yuanyuan; Yao, Xiaohong; Gao, Huiwang; Guo, Zhigang
2017-11-01
We measured 15 parent polycyclic aromatic hydrocarbons (PAHs) in atmosphere and water during a research cruise from the East China Sea (ECS) to the northwestern Pacific Ocean (NWP) in the spring of 2015 to investigate the occurrence, air-sea gas exchange, and gas-particle partitioning of PAHs with a particular focus on the influence of East Asian continental outflow. The gaseous PAH composition and identification of sources were consistent with PAHs from the upwind area, indicating that the gaseous PAHs (three- to five-ring PAHs) were influenced by upwind land pollution. In addition, air-sea exchange fluxes of gaseous PAHs were estimated to be -54.2 to 107.4 ng m⁻² d⁻¹, and were indicative of variations of land-based PAH inputs. The logarithmic gas-particle partition coefficient (log K_p) of PAHs regressed linearly against the logarithmic subcooled liquid vapor pressure (log P_L^0), with a slope of -0.25. This was significantly larger than the theoretical value (-1), implying disequilibrium between the gaseous and particulate PAHs over the NWP. The non-equilibrium of PAH gas-particle partitioning was shielded from the volatilization of three-ring gaseous PAHs from seawater and lower soot concentrations in particular when the oceanic air masses prevailed. Modeling PAH absorption into organic matter and adsorption onto soot carbon revealed that the status of PAH gas-particle partitioning deviated more from the modeling K_p for oceanic air masses than those for continental air masses, which coincided with higher volatilization of three-ring PAHs and confirmed the influence of air-sea exchange. Meanwhile, significant linear regressions between log K_p and log K_oa (log K_sa) for PAHs were observed for continental air masses, suggesting the dominant effect of East Asian continental outflow on atmospheric PAHs over the NWP during the sampling campaign. Copyright © 2017 Elsevier Ltd. All rights reserved.
Athanasopoulos, Leonidas V; Dritsas, Athanasios; Doll, Helen A; Cokkinos, Dennis V
2010-08-01
This study was conducted to explain the variance in quality of life (QoL) and activity capacity of patients with congestive heart failure from pathophysiological changes as estimated by laboratory data. Peak oxygen consumption (peak VO2) and ventilation (VE)/carbon dioxide output (VCO2) slope derived from cardiopulmonary exercise testing, plasma N-terminal prohormone of B-type natriuretic peptide (NT-proBNP), and echocardiographic markers [left atrium (LA), left ventricular ejection fraction (LVEF)] were measured in 62 patients with congestive heart failure, who also completed the Minnesota Living with Heart Failure Questionnaire and the Specific Activity Questionnaire. All regression models were adjusted for age and sex. On linear regression analysis, peak VO2 with P value less than 0.001, VE/VCO2 slope with P value less than 0.01, LVEF with P value less than 0.001, LA with P=0.001, and logNT-proBNP with P value less than 0.01 were found to be associated with QoL. On stepwise multiple linear regression, peak VO2 and LVEF continued to be predictive, accounting for 40% of the variability in Minnesota Living with Heart Failure Questionnaire score. On linear regression analysis, peak VO2 with P value less than 0.001, VE/VCO2 slope with P value less than 0.001, LVEF with P value less than 0.05, LA with P value less than 0.001, and logNT-proBNP with P value less than 0.001 were found to be associated with activity capacity. On stepwise multiple linear regression, peak VO2 and LA continued to be predictive, accounting for 53% of the variability in Specific Activity Questionnaire score. Peak VO2 is independently associated both with QoL and activity capacity. In addition to peak VO2, LVEF is independently associated with QoL, and LA with activity capacity.
Smooth individual level covariates adjustment in disease mapping.
Huque, Md Hamidul; Anderson, Craig; Walton, Richard; Woolford, Samuel; Ryan, Louise
2018-05-01
Spatial models for disease mapping should ideally account for covariates measured both at individual and area levels. The newly available "indiCAR" model fits the popular conditional autoregressive (CAR) model by accommodating both individual and group level covariates while adjusting for spatial correlation in the disease rates. This algorithm has been shown to be effective but assumes log-linear associations between individual level covariates and outcome. In many studies, the relationship between individual level covariates and the outcome may be non-log-linear, and methods to track such nonlinearity between individual level covariate and outcome in spatial regression modeling are not well developed. In this paper, we propose a new algorithm, smooth-indiCAR, to fit an extension to the popular conditional autoregressive model that can accommodate both linear and nonlinear individual level covariate effects while adjusting for group level covariates and spatial correlation in the disease rates. In this formulation, the effect of a continuous individual level covariate is accommodated via penalized splines. We describe a two-step estimation procedure to obtain reliable estimates of individual and group level covariate effects where both individual and group level covariate effects are estimated separately. This distributed computing framework enhances its application in the Big Data domain with a large number of individual/group level covariates. We evaluate the performance of smooth-indiCAR through simulation. Our results indicate that the smooth-indiCAR method provides reliable estimates of all regression and random effect parameters. We illustrate our proposed methodology with an analysis of data on neutropenia admissions in New South Wales (NSW), Australia. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Evaluation of third-degree and fourth-degree laceration rates as quality indicators.
Friedman, Alexander M; Ananth, Cande V; Prendergast, Eri; D'Alton, Mary E; Wright, Jason D
2015-04-01
To examine the patterns and predictors of third-degree and fourth-degree laceration in women undergoing vaginal delivery. We identified a population-based cohort of women in the United States who underwent a vaginal delivery between 1998 and 2010 using the Nationwide Inpatient Sample. Multivariable log-linear regression models were developed to account for patient, obstetric, and hospital factors related to lacerations. Between-hospital variability of laceration rates was calculated using generalized log-linear mixed models. Among 7,096,056 women who underwent vaginal delivery in 3,070 hospitals, 3.3% (n=232,762) had a third-degree laceration and 1.1% (n=76,347) had a fourth-degree laceration. In an adjusted model for fourth-degree lacerations, important risk factors included shoulder dystocia and forceps and vacuum deliveries with and without episiotomy. Other demographic, obstetric, medical, and hospital variables, although statistically significant, were not major determinants of lacerations. Risk factors in a multivariable model for third-degree lacerations were similar to those in the fourth-degree model. Regression analysis of hospital rates (n=3,070) of lacerations demonstrated limited between-hospital variation. Risk of third-degree and fourth-degree laceration was most strongly related to operative delivery and shoulder dystocia. Between-hospital variation was limited. Given these findings and that the most modifiable practice related to lacerations would be reduction in operative vaginal deliveries (and a possible increase in cesarean delivery), third-degree and fourth-degree laceration rates may be a quality metric of limited utility.
Rampersaud, E; Morris, R W; Weinberg, C R; Speer, M C; Martin, E R
2007-01-01
Genotype-based likelihood-ratio tests (LRT) of association that examine maternal and parent-of-origin effects have been previously developed in the framework of log-linear and conditional logistic regression models. In the situation where parental genotypes are missing, the expectation-maximization (EM) algorithm has been incorporated in the log-linear approach to allow incomplete triads to contribute to the LRT. We present an extension to this model which we call the Combined_LRT that incorporates additional information from the genotypes of unaffected siblings to improve assignment of incompletely typed families to mating type categories, thereby improving inference of missing parental data. Using simulations involving a realistic array of family structures, we demonstrate the validity of the Combined_LRT under the null hypothesis of no association and provide power comparisons under varying levels of missing data and using sibling genotype data. We demonstrate the improved power of the Combined_LRT compared with the family-based association test (FBAT), another widely used association test. Lastly, we apply the Combined_LRT to a candidate gene analysis in Autism families, some of which have missing parental genotypes. We conclude that the proposed log-linear model will be an important tool for future candidate gene studies, for many complex diseases where unaffected siblings can often be ascertained and where epigenetic factors such as imprinting may play a role in disease etiology.
TENSOR DECOMPOSITIONS AND SPARSE LOG-LINEAR MODELS
Johndrow, James E.; Bhattacharya, Anirban; Dunson, David B.
2017-01-01
Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions. PMID:29332971
Multivariate regression model for predicting yields of grade lumber from yellow birch sawlogs
Andrew F. Howard; Daniel A. Yaussy
1986-01-01
A multivariate regression model was developed to predict green board-foot yields for the common grades of factory lumber processed from yellow birch factory-grade logs. The model incorporates the standard log measurements of scaling diameter, length, proportion of scalable defects, and the assigned USDA Forest Service log grade. Differences in yields between band and...
Janik, Leslie J; Forrester, Sean T; Soriano-Disla, José M; Kirby, Jason K; McLaughlin, Michael J; Reimann, Clemens
2015-02-01
The authors' aim was to develop rapid and inexpensive regression models for the prediction of partitioning coefficients (Kd), defined as the ratio of the total or surface-bound metal/metalloid concentration of the solid phase to the total concentration in the solution phase. Values of Kd were measured for boric acid (B[OH]3(0)) and selected added soluble oxoanions: molybdate (MoO4(2-)), antimonate (Sb[OH](6-)), selenate (SeO4(2-)), tellurate (TeO4(2-)) and vanadate (VO4(3-)). Models were developed using approximately 500 spectrally representative soils of the Geochemical Mapping of Agricultural Soils of Europe (GEMAS) program. These calibration soils represented the major properties of the entire 4813 soils of the GEMAS project. Multiple linear regression (MLR) from soil properties, partial least-squares regression (PLSR) using mid-infrared diffuse reflectance Fourier-transformed (DRIFT) spectra, and models using DRIFT spectra plus analytical pH values (DRIFT + pH), were compared with predicted log K(d + 1) values. Apart from selenate (R² = 0.43), the DRIFT + pH calibrations resulted in marginally better models to predict log K(d + 1) values (R² = 0.62-0.79), compared with those from PLSR-DRIFT (R² = 0.61-0.72) and MLR (R² = 0.54-0.79). The DRIFT + pH calibrations were applied to the prediction of log K(d + 1) values in the remaining 4313 soils. An example map of predicted log K(d + 1) values for added soluble MoO4(2-) in soils across Europe is presented. The DRIFT + pH PLSR models provided a rapid and inexpensive tool to assess the risk of mobility and potential availability of boric acid and selected oxoanions in European soils. For these models to be used in the prediction of log K(d + 1) values in soils globally, additional research will be needed to determine if soil variability is accounted for in the calibration. © 2014 SETAC.
van Os-Medendorp, Harmieke; van Leent-de Wit, Ilse; de Bruin-Weller, Marjolein; Knulst, André
2015-05-23
Two online self-management programs for patients with atopic dermatitis (AD) or food allergy (FA) were developed with the aim of helping patients cope with their condition, follow the prescribed treatment regimen, and deal with the consequences of their illness in daily life. Both programs consist of several modules containing information, personal stories by fellow patients, videos, and exercises with feedback. Health care professionals can refer their patients to the programs. However, the use of the program in daily practice is unknown. The aim of this study was to explore the use and characteristics of users of the online self-management programs "Living with eczema," and "Living with food allergy," and to investigate factors related to the use of the trainings. A cross-sectional design was carried out in which the outcome parameters were the number of log-ins by patients, the number of hits on the system's core features, disease severity, quality of life, and domains of self-management. Descriptive statistics were used to summarize sample characteristics and to describe number of log-ins and hits per module and per functionality. Correlation and regression analyses were used to explore the relation between the number of log-ins and patient characteristics. Since the start, 299 adult patients have been referred to the online AD program; 173 logged in for at least one occasion. Data from 75 AD patients were available for analyses. Mean number of log-ins was 3.1 (range 1-11). Linear regression with the number of log-ins as dependent variable showed that age and quality of life contributed most to the model, with betas of .35 ( P=.002) and .26 (P=.05), respectively, and an R(2) of .23. Two hundred fourteen adult FA patients were referred to the online FA training, 124 logged in for at least one occasion and data from 45 patients were available for analysis. Mean number of log-ins was 3.0 (range 1-11). 
Linear regression with the number of log-ins as dependent variable revealed that adding the self-management domain "social integration and support" to the model led to an R(2) of .13. The modules with information about the disease, diagnosis, and treatment were most visited. Most hits were on the information parts of the modules (55-58%), followed by exercises (30-32%). The online self-management programs "Living with eczema" and "Living with food allergy" were used by patients in addition to the usual face-to-face care. Almost 60% of all referred patients logged in, with an average of three log-ins. All modules seemed to be relevant, but there is room for improvement in the use of the training. Age, quality of life, and lower social integration and support were related to the use of the training, but only part of the variance in use could be explained by these variables.
Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models
Jiang, Dingfeng; Huang, Jian
2013-01-01
Recent studies have demonstrated theoretical attractiveness of a class of concave penalties in variable selection, including the smoothly clipped absolute deviation and minimax concave penalties. The computation of the concave penalized solutions in high-dimensional models, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm for computing the concave penalized solutions in generalized linear models. In contrast to the existing algorithms that use local quadratic or local linear approximation to the penalty function, the MMCD seeks to majorize the negative log-likelihood by a quadratic loss, but does not use any approximation to the penalty. This strategy makes it possible to avoid the computation of a scaling factor in each update of the solutions, which improves the efficiency of coordinate descent. Under certain regularity conditions, we establish theoretical convergence property of the MMCD. We implement this algorithm for a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD works sufficiently fast for the penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size. PMID:25309048
Accounting for measurement error in log regression models with applications to accelerated testing.
Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M
2018-01-01
In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
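The idea of approximating a log regression with additive error by a weighted regression can be sketched as follows. This is an illustration under stated assumptions, not the authors' estimator: it assumes the model y = exp(a + b·x) + ε with constant-variance additive ε, uses a first-order (delta-method) argument that Var(log y) is roughly proportional to 1/μ², and ignores measurement error in x entirely. The function names are hypothetical.

```python
import math


def weighted_line_fit(x, z, w):
    """Weighted least squares fit of z = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
    b = sxz / sxx
    return zbar - b * xbar, b


def irls_log_regression(x, y, n_iter=20):
    """Iteratively re-weighted least squares on log(y).

    Assumes y = exp(a + b*x) + additive noise, so the variance of log(y)
    is approximately proportional to 1 / mu**2 with mu = exp(a + b*x).
    Each pass refits the line with weights mu**2 from the previous fit.
    """
    z = [math.log(yi) for yi in y]
    w = [1.0] * len(x)  # start from ordinary least squares
    a, b = weighted_line_fit(x, z, w)
    for _ in range(n_iter):
        w = [math.exp(2.0 * (a + b * xi)) for xi in x]
        a, b = weighted_line_fit(x, z, w)
    return a, b
```

The weights favor observations with large fitted means, where additive noise distorts the logarithm least. A fixed iteration count is used for brevity; a convergence tolerance on (a, b) would be the more usual stopping rule.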
Comparison of Survival Models for Analyzing Prognostic Factors in Gastric Cancer Patients
Habibi, Danial; Rafiei, Mohammad; Chehrei, Ali; Shayan, Zahra; Tafaqodi, Soheil
2018-03-27
Objective: There are a number of models for determining risk factors for survival of patients with gastric cancer. This study was conducted to select the model showing the best fit with available data. Methods: Cox regression and parametric models (Exponential, Weibull, Gompertz, Log normal, Log logistic and Generalized Gamma) were utilized in unadjusted and adjusted forms to detect factors influencing mortality of patients. Comparisons were made with the Akaike Information Criterion (AIC) using STATA 13 and R 3.1.3 software. Results: The results of this study indicated that all parametric models outperform the Cox regression model. The Log normal, Log logistic and Generalized Gamma provided the best performance in terms of AIC values (179.2, 179.4 and 181.1, respectively). On unadjusted analysis, the results of the Cox regression and parametric models indicated stage, grade, largest diameter of metastatic nest, largest diameter of LM, number of involved lymph nodes and the largest ratio of metastatic nests to lymph nodes, to be variables influencing the survival of patients with gastric cancer. On adjusted analysis, according to the best model (log normal), grade was found to be the significant variable. Conclusion: The results suggested that all parametric models outperform the Cox model. The log normal model provides the best fit and is a good substitute for Cox regression.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lipfert, F.W.
1992-11-01
1980 data from up to 149 metropolitan areas were used to define cross-sectional associations between community air pollution and excess human mortality. The regression model proposed by Oezkaynak and Thurston, which accounted for age, race, education, poverty, and population density, was evaluated and several new models were developed. The new models also accounted for population change, drinking water hardness, and smoking, and included a more detailed description of race. Cause-of-death categories analyzed include all causes, all non-external causes, major cardiovascular diseases, and chronic obstructive pulmonary diseases (COPD). Both annual mortality rates and their logarithms were analyzed. The data on particulates were averaged across all monitoring stations available for each SMSA and the TSP data were restricted to the year 1980. The associations between mortality and air pollution were found to be dependent on the socioeconomic factors included in the models, the specific locations included in the data set, and the type of statistical model used. Statistically significant associations were found between TSP and mortality due to non-external causes with log-linear models, but not with a linear model, and between TSP and COPD mortality for both linear and log-linear models. When the sulfate contribution to TSP was subtracted, the relationship with COPD mortality was strengthened. Scatter plots and quintile analyses suggested a TSP threshold for COPD mortality at around 65 µg/m³ (annual average). SO₄²⁻, Mn, PM15, and PM2.5 were not significantly associated with mortality using the new models.
Nasari, Masoud M; Szyszkowicz, Mieczysław; Chen, Hong; Crouse, Daniel; Turner, Michelle C; Jerrett, Michael; Pope, C Arden; Hubbell, Bryan; Fann, Neal; Cohen, Aaron; Gapstur, Susan M; Diver, W Ryan; Stieb, David; Forouzanfar, Mohammad H; Kim, Sun-Young; Olives, Casey; Krewski, Daniel; Burnett, Richard T
2016-01-01
The effectiveness of regulatory actions designed to improve air quality is often assessed by predicting changes in public health resulting from their implementation. Risk of premature mortality from long-term exposure to ambient air pollution is the single most important contributor to such assessments and is estimated from observational studies generally assuming a log-linear, no-threshold association between ambient concentrations and death. There has been only limited assessment of this assumption in part because of a lack of methods to estimate the shape of the exposure-response function in very large study populations. In this paper, we propose a new class of variable coefficient risk functions capable of capturing a variety of potentially non-linear associations which are suitable for health impact assessment. We construct the class by defining transformations of concentration as the product of either a linear or log-linear function of concentration multiplied by a logistic weighting function. These risk functions can be estimated using hazard regression survival models with currently available computer software and can accommodate large population-based cohorts which are increasingly being used for this purpose. We illustrate our modeling approach with two large cohort studies of long-term concentrations of ambient air pollution and mortality: the American Cancer Society Cancer Prevention Study II (CPS II) cohort and the Canadian Census Health and Environment Cohort (CanCHEC). We then estimate the number of deaths attributable to changes in fine particulate matter concentrations over the 2000 to 2010 time period in both Canada and the USA using both linear and non-linear hazard function models.
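The construction described in this abstract, a transformation of concentration formed as a linear or logarithmic function multiplied by a logistic weight, can be sketched as below. The parameter names (`mu`, `tau`, `theta`), the use of log(z + 1) for the logarithmic form, and the exact parameterization of the logistic weight are assumptions for illustration and may differ from the authors' definitions.

```python
import math


def logistic_weight(z, mu, tau):
    """Logistic weighting function rising from 0 to 1 around concentration mu.

    mu  : location (concentration at which the weight equals 0.5), assumed name
    tau : scale controlling how sharply the weight rises, assumed name
    """
    return 1.0 / (1.0 + math.exp(-(z - mu) / tau))


def transformed_concentration(z, mu, tau, log_form=False):
    """Product of a linear (z) or logarithmic (log(z + 1)) term and the logistic weight."""
    f = math.log(z + 1.0) if log_form else z
    return f * logistic_weight(z, mu, tau)


def hazard_ratio(z, theta, mu, tau, log_form=False):
    """Hazard ratio relative to zero concentration for a log-hazard coefficient theta."""
    return math.exp(theta * transformed_concentration(z, mu, tau, log_form))
```

In a hazard regression, the transformed concentration would enter as an ordinary covariate for fixed `mu` and `tau`, so the shape parameters could be chosen by refitting over a grid and comparing fits. That grid search is a design suggestion here, not something the abstract states.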
Normal reference values for bladder wall thickness on CT in a healthy population.
Fananapazir, Ghaneh; Kitich, Aleksandar; Lamba, Ramit; Stewart, Susan L; Corwin, Michael T
2018-02-01
To determine normal bladder wall thickness on CT in patients without bladder disease. Four hundred and nineteen patients presenting for trauma with normal CTs of the abdomen and pelvis were included in our retrospective study. Bladder wall thickness was assessed, and bladder volume was measured using both the ellipsoid formula and an automated technique. Patient age, gender, and body mass index were recorded. Linear regression models were created to account for bladder volume, age, gender, and body mass index, and the multiple correlation coefficient with bladder wall thickness was computed. Bladder volume and bladder wall thickness were log-transformed to achieve approximate normality and homogeneity of variance. Variables that did not contribute substantively to the model were excluded, and a parsimonious model was created and the multiple correlation coefficient was calculated. Expected bladder wall thickness was estimated for different bladder volumes, and 1.96 standard deviations above expected provided the upper limit of normal on the log scale. Age, gender, and bladder volume were associated with bladder wall thickness (p = 0.049, 0.024, and < 0.001, respectively). The linear regression model had an R² of 0.52. Age and gender were negligible in contribution to the model, and a parsimonious model using only volume was created for both the ellipsoid and automated volumes (R² = 0.52 and 0.51, respectively). Bladder wall thickness correlates with bladder volume. The study provides reference bladder wall thicknesses on CT utilizing both the ellipsoid formula and automated bladder volumes.
Cook, James P; Mahajan, Anubha; Morris, Andrew P
2017-02-01
Linear mixed models are increasingly used for the analysis of genome-wide association studies (GWAS) of binary phenotypes because they can efficiently and robustly account for population stratification and relatedness through inclusion of random effects for a genetic relationship matrix. However, the utility of linear (mixed) models in the context of meta-analysis of GWAS of binary phenotypes has not been previously explored. In this investigation, we present simulations to compare the performance of linear and logistic regression models under alternative weighting schemes in a fixed-effects meta-analysis framework, considering designs that incorporate variable case-control imbalance, confounding factors and population stratification. Our results demonstrate that linear models can be used for meta-analysis of GWAS of binary phenotypes, without loss of power, even in the presence of extreme case-control imbalance, provided that one of the following schemes is used: (i) effective sample size weighting of Z-scores or (ii) inverse-variance weighting of allelic effect sizes after conversion onto the log-odds scale. Our conclusions thus provide essential recommendations for the development of robust protocols for meta-analysis of binary phenotypes with linear models.
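The two weighting schemes named in this abstract follow standard fixed-effects meta-analysis formulas, sketched below. The function names are illustrative, and the conversion of linear-model effect sizes onto the log-odds scale is not shown; the inverse-variance function assumes its inputs are already on a common scale.

```python
import math


def effective_sample_size(n_cases, n_controls):
    """Effective sample size of a case-control study: 4 / (1/cases + 1/controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def z_score_meta(z_scores, n_cases, n_controls):
    """Combine per-study Z-scores with weights sqrt(effective sample size)."""
    weights = [
        math.sqrt(effective_sample_size(a, b)) for a, b in zip(n_cases, n_controls)
    ]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def inverse_variance_meta(betas, ses):
    """Combine per-study effect sizes with weights 1 / SE^2; returns (beta, se)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se
```

The effective sample size shows why case-control imbalance matters: a study with 100 cases and 9,900 controls contributes as much as a balanced study of 396 subjects, because 4 / (1/100 + 1/9900) = 396.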
Competing regression models for longitudinal data.
Alencar, Airlane P; Singer, Julio M; Rocha, Francisco Marcelo M
2012-03-01
The choice of an appropriate family of linear models for the analysis of longitudinal data is often a matter of concern for practitioners. To attenuate such difficulties, we discuss some issues that emerge when analyzing this type of data via a practical example involving pretest-posttest longitudinal data. In particular, we consider log-normal linear mixed models (LNLMM), generalized linear mixed models (GLMM), and models based on generalized estimating equations (GEE). We show how some special features of the data, like a nonconstant coefficient of variation, may be handled in the three approaches and evaluate their performance with respect to the magnitude of standard errors of interpretable and comparable parameters. We also show how different diagnostic tools may be employed to identify outliers and comment on available software. We conclude by noting that the results are similar, but that GEE-based models may be preferable when the goal is to compare the marginal expected responses. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Wu, Z.; Guo, Z.
2017-12-01
We measured 15 parent polycyclic aromatic hydrocarbons (PAHs) in atmosphere and water during a research cruise from the East China Sea (ECS) to the northwestern Pacific Ocean (NWP) in the spring of 2015 to investigate the occurrence, air-sea gas exchange, and gas-particle partitioning of PAHs with a particular focus on the influence of East Asian continental outflow. The gaseous PAH composition and identification of sources were consistent with PAHs from the upwind area, indicating that the gaseous PAHs (three- to five-ring PAHs) were influenced by upwind land pollution. In addition, air-sea exchange fluxes of gaseous PAHs were estimated to be -54.2 to 107.4 ng m-2 d-1, and were indicative of variations of land-based PAH inputs. The logarithmic gas-particle partition coefficient (logKp) of PAHs regressed linearly against the logarithmic subcooled liquid vapor pressure, with a slope of -0.25. This was significantly larger than the theoretical value (-1), implying disequilibrium between the gaseous and particulate PAHs over the NWP. The non-equilibrium of PAH gas-particle partitioning was shielded from the volatilization of three-ring gaseous PAHs from seawater and lower soot concentrations in particular when the oceanic air masses prevailed. Modeling PAH absorption into organic matter and adsorption onto soot carbon revealed that the status of PAH gas-particle partitioning deviated more from the modeling Kp for oceanic air masses than those for continental air masses, which coincided with higher volatilization of three-ring PAHs and confirmed the influence of air-sea exchange. Meanwhile, significant linear regressions between logKp and logKoa (logKsa) for PAHs were observed for continental air masses, suggesting the dominant effect of East Asian continental outflow on atmospheric PAHs over the NWP during the sampling campaign.
Brummel, Sean S; Singh, Kumud K; Maihofer, Adam X.; Farhad, Mona; Qin, Min; Fenton, Terry; Nievergelt, Caroline M.; Spector, Stephen A.
2015-01-01
Background: Ancestry informative markers (AIMs) measure genetic admixtures within an individual beyond self-reported racial/ethnic (SRR) groups. Here, we used genetically determined ancestry (GDA) across SRR groups and examine associations between GDA and HIV-1 RNA and CD4+ counts in HIV-positive children in the US. Methods: 41 AIMs, developed to distinguish 7 continental regions, were detected by real-time PCR in 994 HIV-positive, antiretroviral naïve children. GDA was estimated comparing each individual’s genotypes to allele frequencies found in a large set of reference individuals originating from global populations using STRUCTURE. The means of GDA were calculated for each category of SRR. Linear regression was used to model GDA on CD4+ count and log10 RNA, adjusting for SRR and age. Results: Subjects were 61% Black, 25% Hispanic, 13% White and 1.3% Unknown. The mean age was 2.3 years (45% male), mean CD4+ count 981 cells/mm3, and mean log10 RNA 5.11. Marked heterogeneity was found for all SRR groups with high admixture for Hispanics. In adjusted linear regression models, subjects with 100% European ancestry were estimated to have 0.33 higher log10 RNA levels (95% CI: 0.03 to 0.62, p = 0.028) and 253 cells/mm3 lower CD4+ count (95% CI: −517 to 11, p = 0.06), compared to subjects with 100% African ancestry. Conclusion: Marked continental admixture was found among this cohort of HIV-infected children from the US. GDA contributed to differences in RNA and CD4+ counts beyond SRR, and should be considered when outcomes associated with HIV infection are likely to have a genetic component. PMID:26536313
Brummel, Sean S; Singh, Kumud K; Maihofer, Adam X; Farhad, Mona; Qin, Min; Fenton, Terry; Nievergelt, Caroline M; Spector, Stephen A
2016-04-15
Ancestry informative markers (AIMs) measure genetic admixtures within an individual beyond self-reported racial/ethnic (SRR) groups. Here, we used genetically determined ancestry (GDA) across SRR groups and examine associations between GDA and HIV-1 RNA and CD4 counts in HIV-positive children in the United States. Forty-one AIMs, developed to distinguish 7 continental regions, were detected by real-time PCR in 994 HIV-positive, antiretroviral naive children. GDA was estimated comparing each individual's genotypes to allele frequencies found in a large set of reference individuals originating from global populations using STRUCTURE. The means of GDA were calculated for each category of SRR. Linear regression was used to model GDA on CD4 count and log10 RNA, adjusting for SRR and age. Subjects were 61% black, 25% Hispanic, 13% white, and 1.3% Unknown. The mean age was 2.3 years (45% male), mean CD4 count of 981 cells per cubic millimeter, and mean log10 RNA of 5.11. Marked heterogeneity was found for all SRR groups with high admixture for Hispanics. In adjusted linear regression models, subjects with 100% European ancestry were estimated to have 0.33 higher log10 RNA levels (95% CI: 0.03 to 0.62, P = 0.028) and 253 CD4 cells per cubic millimeter lower (95% CI: -517 to 11, P = 0.06) in CD4 count, compared to subjects with 100% African ancestry. Marked continental admixture was found among this cohort of HIV-infected children from the United States. GDA contributed to differences in RNA and CD4 counts beyond SRR and should be considered when outcomes associated with HIV infection are likely to have a genetic component.
Ciura, Krzesimir; Belka, Mariusz; Kawczak, Piotr; Bączek, Tomasz; Markuszewski, Michał J; Nowakowska, Joanna
2017-09-05
The objective of this paper is to build QSRR/QSAR models for predicting blood-brain barrier (BBB) permeability. The obtained models are based on salting-out thin layer chromatography (SOTLC) constants and calculated molecular descriptors. Among chromatographic methods, SOTLC was chosen because its mobile phases are free of organic solvent. As a consequence, they are less toxic and have a lower environmental impact than classical reversed-phase liquid chromatography (RPLC). During the study, three stationary phases (silica gel, cellulose plates, and neutral aluminum oxide) were examined. The model set of solutes presents a wide range of log BB values, containing compounds that cross the BBB readily and molecules poorly distributed to the brain, including drugs acting on the nervous system as well as peripherally acting drugs. Additionally, three regression models were compared: multiple linear regression (MLR), partial least squares (PLS), and orthogonal partial least squares (OPLS). The designed QSRR/QSAR models could be useful for predicting the BBB permeability of systematically synthesized new compounds in the drug development pipeline and are attractive alternatives to time-consuming and demanding direct methods for log BB measurement. The study also showed that significant differences in model performance, measured by R² and Q², can be obtained among regression techniques; hence it is strongly suggested to evaluate all available options, such as MLR, PLS, and OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.
Background stratified Poisson regression analysis of cohort data.
Richardson, David B; Langholz, Bryan
2012-03-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
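The idea of treating stratum coefficients as nuisance parameters can be sketched for the log-linear case. The notation below (counts d_ij, person-time PY_ij, exposure z_ij, stratum intercept alpha_i, exposure coefficient beta) is assumed for illustration and is not taken from the abstract. It shows one standard route by which stratum terms drop out, namely conditioning on the stratum totals D_i; whether this matches the authors' exact expression is not stated in the source.

```latex
% Unconditional model: cell j of background stratum i
d_{ij} \sim \mathrm{Poisson}\!\left( PY_{ij}\, e^{\alpha_i + \beta z_{ij}} \right),
\qquad D_i = \sum_j d_{ij}

% Conditioning on D_i gives a multinomial likelihood in which e^{\alpha_i} cancels
L_c(\beta) \;\propto\; \prod_i \prod_j
\left( \frac{PY_{ij}\, e^{\beta z_{ij}}}{\sum_k PY_{ik}\, e^{\beta z_{ik}}} \right)^{d_{ij}}
```

Because alpha_i no longer appears, the number of strata does not add to the number of parameters that must be estimated.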
Jardínez, Christiaan; Vela, Alberto; Cruz-Borbolla, Julián; Alvarez-Mendez, Rodrigo J; Alvarado-Rodríguez, José G
2016-12-01
The relationship between the chemical structure and biological activity (log IC50) of 40 derivatives of 1,4-dihydropyridines (DHPs) was studied using density functional theory (DFT) and multiple linear regression analysis methods. With the aim of improving the quantitative structure-activity relationship (QSAR) model, the reduced density gradient s(r) of the optimized equilibrium geometries was used as a descriptor to include weak non-covalent interactions. The QSAR model highlights the correlation between log IC50 and the highest occupied molecular orbital energy (E_HOMO), molecular volume (V), partition coefficient (log P), non-covalent interactions NCI(H4-G), and the dual descriptor [Δf(r)]. The model yielded values of R² = 79.57 and Q² = 69.67 that were validated with the following four internal analytical validations, DK = 0.076, DQ = -0.006, RP = 0.056, and RN = 0.000, and the external validation Q²boot = 64.26. The QSAR model found can be used to estimate biological activity with high reliability in new compounds based on a DHP series. Graphical abstract: The good correlation between log IC50 and the NCI(H4-G) estimated by the reduced density gradient approach for the DHP derivatives.
Anstey, Chris M
2005-06-01
Currently, three strong ion models exist for the determination of plasma pH. Mathematically, they vary in their treatment of weak acids, and this study was designed to determine whether any significant differences exist in the simulated performance of these models. The models were subjected to a "metabolic" stress either in the form of variable strong ion difference and fixed weak acid effect, or vice versa, and compared over the range 25 ≤ PCO2 ≤ 135 Torr. The predictive equations for each model were iteratively solved for pH at each PCO2 step, and the results were plotted as a series of log(PCO2)-pH titration curves. The results were analyzed for linearity by using ordinary least squares regression and for collinearity by using correlation. In every case, the results revealed a linear relationship between log(PCO2) and pH over the range 6.8 ≤ pH ≤ 7.8, and no significant difference between the curve predictions under metabolic stress. The curves were statistically collinear. Ultimately, their clinical utility will be determined both by acceptance of the strong ion framework for describing acid-base physiology and by the ease of measurement of the independent model parameters.
Spencer, Monique E; Jain, Alka; Matteini, Amy; Beamer, Brock A; Wang, Nae-Yuh; Leng, Sean X; Punjabi, Naresh M; Walston, Jeremy D; Fedarko, Neal S
2010-08-01
Neopterin, a GTP metabolite expressed by macrophages, is a marker of immune activation. We hypothesize that levels of this serum marker alter with donor age, reflecting increased chronic immune activation in normal aging. In addition to age, we assessed gender, race, body mass index (BMI), and percentage of body fat (%fat) as potential covariates. Serum was obtained from 426 healthy participants whose age ranged from 18 to 87 years. Anthropometric measures included %fat and BMI. Neopterin concentrations were measured by competitive ELISA. The paired associations between neopterin and age, BMI, or %fat were analyzed by Spearman's correlation or by linear regression of log-transformed neopterin, whereas overall associations were modeled by multiple regression of log-transformed neopterin as a function of age, gender, race, BMI, %fat, and interaction terms. Across all participants, neopterin exhibited a positive association with age, BMI, and %fat. Multiple regression modeling of neopterin in women and men as a function of age, BMI, and race revealed that each covariate contributed significantly to neopterin values and that optimal modeling required an interaction term between race and BMI. The covariate %fat was highly correlated with BMI and could be substituted for BMI to yield similar regression coefficients. The association of age and gender with neopterin levels and their modification by race, BMI, or %fat reflect the biology underlying chronic immune activation and perhaps gender differences in disease incidence, morbidity, and mortality.
(Draft) Community air pollution and mortality: Analysis of 1980 data from US metropolitan areas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lipfert, F.W.
1992-11-01
1980 data from up to 149 metropolitan areas were used to define cross-sectional associations between community air pollution and "excess" human mortality. The regression model proposed by Ozkaynak and Thurston (1987), which accounted for age, race, education, poverty, and population density, was evaluated and several new models were developed. The new models also accounted for migration, drinking water hardness, and smoking, and included a more detailed description of race. Cause-of-death categories analyzed include all causes, all "non-external" causes, major cardiovascular diseases, and chronic obstructive pulmonary diseases (COPD). Both annual mortality rates and their logarithms were analyzed. Air quality data were obtained from the EPA AIRS database (TSP, SO₄²⁻, Mn, and ozone) and from the inhalable particulate network (PM15, PM2.5, and SO₄²⁻, for 63 locations). The data on particulates were averaged across all monitoring stations available for each SMSA and the TSP data were restricted to the year 1980. The associations between mortality and air pollution were found to be dependent on the socioeconomic factors included in the models, the specific locations included in the data set, and the type of statistical model used. Statistically significant associations were found as follows: between TSP and mortality due to non-external causes with log-linear models, but not with a linear model; between estimated 10-year average (1980-90) ozone levels and 1980 non-external and cardiovascular deaths; and between TSP and COPD mortality for both linear and log-linear models. When the sulfate contribution to TSP was subtracted, the relationship with COPD mortality was strengthened.
The prisoner's dilemma as a cancer model.
West, Jeffrey; Hasnain, Zaki; Mason, Jeremy; Newton, Paul K
2016-09-01
Tumor development is an evolutionary process in which a heterogeneous population of cells with different growth capabilities compete for resources in order to gain a proliferative advantage. What are the minimal ingredients needed to recreate some of the emergent features of such a developing complex ecosystem? What is a tumor doing before we can detect it? We outline a mathematical model, driven by a stochastic Moran process, in which cancer cells and healthy cells compete for dominance in the population. Each are assigned payoffs according to a Prisoner's Dilemma evolutionary game where the healthy cells are the cooperators and the cancer cells are the defectors. With point mutational dynamics, heredity, and a fitness landscape controlling birth and death rates, natural selection acts on the cell population and simulated 'cancer-like' features emerge, such as Gompertzian tumor growth driven by heterogeneity, the log-kill law which (linearly) relates therapeutic dose density to the (log) probability of cancer cell survival, and the Norton-Simon hypothesis which (linearly) relates tumor regression rates to tumor growth rates. We highlight the utility, clarity, and power that such models provide, despite (and because of) their simplicity and built-in assumptions.
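The core mechanism described above, a Moran process whose birth probabilities come from Prisoner's Dilemma payoffs, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' model: the payoff values, the selection-strength parameter `w`, the fitness mapping `1 - w + w * payoff`, and all function names are hypothetical, and mutation and heredity are omitted.

```python
import random

# Hypothetical Prisoner's Dilemma payoffs with T > R > P > S (not from the source).
R, S, T, P = 3.0, 0.0, 5.0, 1.0


def average_payoffs(n_cancer, n_total):
    """Mean payoff of a healthy cell (cooperator) and a cancer cell (defector)
    when each cell meets a random other cell in a well-mixed population."""
    n_healthy = n_total - n_cancer
    others = n_total - 1
    healthy = (R * (n_healthy - 1) + S * n_cancer) / others
    cancer = (T * n_healthy + P * (n_cancer - 1)) / others
    return healthy, cancer


def moran_step(n_cancer, n_total, w, rng):
    """One birth-death event: birth is fitness-weighted, death is uniform."""
    pay_h, pay_c = average_payoffs(n_cancer, n_total)
    fit_h = 1.0 - w + w * pay_h  # assumed linear payoff-to-fitness mapping
    fit_c = 1.0 - w + w * pay_c
    n_healthy = n_total - n_cancer
    total_fitness = n_cancer * fit_c + n_healthy * fit_h
    birth_is_cancer = rng.random() < n_cancer * fit_c / total_fitness
    death_is_cancer = rng.random() < n_cancer / n_total
    return n_cancer + int(birth_is_cancer) - int(death_is_cancer)


def simulate(n_total=50, n_cancer=1, w=0.5, max_steps=10000, seed=0):
    """Track the number of cancer cells until fixation, extinction, or max_steps."""
    rng = random.Random(seed)
    trajectory = [n_cancer]
    while 0 < trajectory[-1] < n_total and len(trajectory) <= max_steps:
        trajectory.append(moran_step(trajectory[-1], n_total, w, rng))
    return trajectory


if __name__ == "__main__":
    path = simulate()
    print("steps:", len(path) - 1, "final cancer cells:", path[-1])
```

The population size stays fixed because every step pairs one birth with one death, which is the defining feature of a Moran process; defectors gain ground because their payoff exceeds the cooperators' payoff at every population composition.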
NASA Technical Reports Server (NTRS)
Nelson, Ross; Margolis, Hank; Montesano, Paul; Sun, Guoqing; Cook, Bruce; Corp, Larry; Andersen, Hans-Erik; DeJong, Ben; Pellat, Fernando Paz; Fickel, Thaddeus;
2016-01-01
Existing national forest inventory plots, an airborne lidar scanning (ALS) system, and a space profiling lidar system (ICESat-GLAS) are used to generate circa 2005 estimates of total aboveground dry biomass (AGB) in forest strata, by state, in the continental United States (CONUS) and Mexico. The airborne lidar is used to link ground observations of AGB to space lidar measurements. Two sets of models are generated, the first relating ground estimates of AGB to airborne laser scanning (ALS) measurements and the second set relating ALS estimates of AGB (generated using the first model set) to GLAS measurements. GLAS then, is used as a sampling tool within a hybrid estimation framework to generate stratum-, state-, and national-level AGB estimates. A two-phase variance estimator is employed to quantify GLAS sampling variability and, additively, ALS-GLAS model variability in this current, three-phase (ground-ALS-space lidar) study. The model variance component characterizes the variability of the regression coefficients used to predict ALS-based estimates of biomass as a function of GLAS measurements. Three different types of predictive models are considered in CONUS to determine which produced biomass totals closest to ground-based national forest inventory estimates - (1) linear (LIN), (2) linear-no-intercept (LNI), and (3) log-linear. For CONUS at the national level, the GLAS LNI model estimate (23.95 +/- 0.45 Gt AGB), agreed most closely with the US national forest inventory ground estimate, 24.17 +/- 0.06 Gt, i.e., within 1%. The national biomass total based on linear ground-ALS and ALS-GLAS models (25.87 +/- 0.49 Gt) overestimated the national ground-based estimate by 7.5%. The comparable log-linear model result (63.29 +/-1.36 Gt) overestimated ground results by 261%. All three national biomass GLAS estimates, LIN, LNI, and log-linear, are based on 241,718 pulses collected on 230 orbits. 
The US national forest inventory (ground) estimates are based on 119,414 ground plots. At the US state level, the average absolute value of the deviation of LNI GLAS estimates from the comparable ground estimate of total biomass was 18.8% (range: Oregon, -40.8% to North Dakota, 128.6%). Log-linear models produced gross overestimates in the continental US, i.e., about 2.6x, and the use of this model to predict regional biomass using GLAS data in temperate, western hemisphere forests is not appropriate. The best model form, LNI, is used to produce biomass estimates in Mexico. The average biomass density in Mexican forests is 53.10 +/- 0.88 t/ha, and the total biomass for the country, given a total forest area of 688,096 sq km, is 3.65 +/- 0.06 Gt. In Mexico, our GLAS biomass total underestimated a 2005 FAO estimate (4.152 Gt) by 12% and overestimated a 2007/8 radar study's figure (3.06 Gt) by 19%.
Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies.
Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre
2018-03-15
Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate to analyze such data, because they usually consist of nonnegative and skew-distributed variables. Therefore, we recommend use of a statistical methodology specific to duration data. We propose a methodology based on Cox mixed models and written under the R language. This semiparametric model is indeed flexible enough to fit duration data. To compare log-linear and Cox mixed models in terms of goodness-of-fit on real data sets, we also provide a procedure based on simulations and quantile-quantile plots. We present two examples from a data set of speech and gesture interactions, which illustrate the limitations of linear and log-linear mixed models, as compared to Cox models. The linear models are not validated on our data, whereas Cox models are. Moreover, in the second example, the Cox model exhibits a significant effect that the linear model does not. We provide methods to select the best-fitting models for repeated duration data and to compare statistical methodologies. In this study, we show that Cox models are best suited to the analysis of our data set.
Aspects of porosity prediction using multivariate linear regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Byrnes, A.P.; Wilson, M.D.
1991-03-01
Highly accurate multiple linear regression models have been developed for sandstones of diverse compositions. Porosity reduction or enhancement processes are controlled by the fundamental variables, Pressure (P), Temperature (T), Time (t), and Composition (X), where composition includes mineralogy, size, sorting, fluid composition, etc. The multiple linear regression equation, of which all linear porosity prediction models are subsets, takes the generalized form: Porosity = C_0 + C_1(P) + C_2(T) + C_3(X) + C_4(t) + C_5(PT) + C_6(PX) + C_7(Pt) + C_8(TX) + C_9(Tt) + C_10(Xt) + C_11(PTX) + C_12(PXt) + C_13(PTt) + C_14(TXt) + C_15(PTXt). The first four primary variables are often interactive, thus requiring terms involving two or more primary variables (the form shown implies interaction and not necessarily multiplication). The final terms used may also involve simple mathematical transforms such as log X, e^T, X^2, or more complex transformations such as the Time-Temperature Index (TTI). The X term in the equation above represents a suite of compositional variables and, therefore, a fully expanded equation may include a series of terms incorporating these variables. Numerous published bivariate porosity prediction models involving P (or depth) or Tt (TTI) are effective to a degree, largely because of the high degree of collinearity between P and TTI. However, all such bivariate models ignore the unique contributions of P and Tt, as well as various X terms. These simpler models become poor predictors in regions where collinear relations change, where important variables have been ignored, or where the database does not include a sufficient range or weight distribution for the critical variables.
Multivariate regression model for predicting lumber grade volumes of northern red oak sawlogs
Daniel A. Yaussy; Robert L. Brisbin
1983-01-01
A multivariate regression model was developed to predict green board-foot yields for the seven common factory lumber grades processed from northern red oak (Quercus rubra L.) factory grade logs. The model uses the standard log measurements of grade, scaling diameter, length, and percent defect. It was validated with an independent data set. The model...
Normality of raw data in general linear models: The most widespread myth in statistics
Kery, Marc; Hatfield, Jeff S.
2003-01-01
In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
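The central point, that normality concerns the residuals and not the raw response, can be checked with a small simulation. The sketch below is illustrative only and uses assumed values (two groups with means 0 and 10, unit-variance normal errors, a fixed seed); none of these come from the article. Excess kurtosis is used here as a simple numeric summary of shape, which is a choice made for this example.

```python
import random
import statistics


def excess_kurtosis(values):
    """Population excess kurtosis: about 0 for normal data, strongly negative for bimodal data."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(1)

# Assumed design: two groups with very different means and normal errors.
group_means = {"A": 0.0, "B": 10.0}
data = [(g, mu + rng.gauss(0.0, 1.0)) for g, mu in group_means.items() for _ in range(2000)]

# Fit the one-way model: the fitted value is simply the sample mean of each group.
fitted = {g: statistics.fmean([y for gg, y in data if gg == g]) for g in group_means}

response = [y for _, y in data]
residuals = [y - fitted[g] for g, y in data]

raw_kurt = excess_kurtosis(response)
resid_kurt = excess_kurtosis(residuals)

print("raw response excess kurtosis:", round(raw_kurt, 2))
print("residual excess kurtosis:", round(resid_kurt, 2))
```

The pooled response is clearly bimodal, so a normality test on the raw data would reject, yet the residuals from the group-means model behave like normal draws and t and F tests are appropriate.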
Yelland, Lisa N; Salter, Amy B; Ryan, Philip
2011-10-15
Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
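For intuition about why the robust variance matters, consider the simplest possible case: independent observations and a single binary exposure. In that special case the Poisson log-link estimate of the relative risk is the ratio of observed risks, and both the model-based and the sandwich (HC0) variances have closed forms. The sketch below uses that special case only; it does not implement the clustered GEE version evaluated in the article, and the function name and counts are hypothetical.

```python
import math


def modified_poisson_rr(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Relative risk for a binary exposure with model-based and robust standard errors.

    Assumes independent binary outcomes and a saturated two-group Poisson model
    with log link, for which the estimates below are available in closed form.
    """
    p1 = events_exposed / n_exposed
    p0 = events_unexposed / n_unexposed
    log_rr = math.log(p1 / p0)

    # Model-based Poisson variance treats each count as if variance equals the mean.
    model_var = 1.0 / events_exposed + 1.0 / events_unexposed

    # Sandwich variance uses squared residuals of the binary outcome instead,
    # which shrinks each term by a factor of (1 - p).
    robust_var = (1.0 - p1) / events_exposed + (1.0 - p0) / events_unexposed

    robust_se = math.sqrt(robust_var)
    return {
        "rr": math.exp(log_rr),
        "model_se": math.sqrt(model_var),
        "robust_se": robust_se,
        "ci95": (math.exp(log_rr - 1.96 * robust_se), math.exp(log_rr + 1.96 * robust_se)),
    }


if __name__ == "__main__":
    # Hypothetical counts: 30/100 events among exposed, 15/100 among unexposed.
    print(modified_poisson_rr(30, 100, 15, 100))
```

The model-based standard error is always the larger of the two for binary outcomes, which is why an unmodified Poisson fit gives overly wide confidence intervals for relative risks.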
Linard, Joshua I.
2013-01-01
Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications
Austin, Peter C.
2017-01-01
Summary Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log–log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data consisting of patients hospitalised with a heart attack. We illustrate the application of these methods using three statistical programming languages (R, SAS and Stata). PMID:29307954
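The stated equivalence between the piecewise exponential model and Poisson regression can be made explicit. The notation below is assumed for illustration: within one interval, subject i has exposure time t_i, event indicator d_i (0 or 1), and constant hazard lambda_i = exp(x_i' beta). Random effects are omitted to keep the sketch short.

```latex
% Exponential (constant-hazard) survival log-likelihood
\ell_{\mathrm{exp}}(\beta) = \sum_i \left[ d_i \log \lambda_i - \lambda_i t_i \right]

% Poisson log-likelihood for d_i with mean \mu_i = \lambda_i t_i,
% i.e. a log link with offset \log t_i
\ell_{\mathrm{Pois}}(\beta)
  = \sum_i \left[ d_i \log(\lambda_i t_i) - \lambda_i t_i - \log d_i! \right]
  = \ell_{\mathrm{exp}}(\beta) + \sum_i d_i \log t_i

% The extra term does not involve \beta, so both likelihoods
% yield the same estimates and the same information matrix.
```

This is why the duration of exposure enters the Poisson model as an offset rather than as a covariate with an estimated coefficient.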
Log-Multiplicative Association Models as Item Response Models
ERIC Educational Resources Information Center
Anderson, Carolyn J.; Yu, Hsiu-Ting
2007-01-01
Log-multiplicative association (LMA) models, which are special cases of log-linear models, have interpretations in terms of latent continuous variables. Two theoretical derivations of LMA models based on item response theory (IRT) arguments are presented. First, we show that Anderson and colleagues (Anderson & Vermunt, 2000; Anderson & Bockenholt,…
Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello
2016-05-01
Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite the advantages of turbidimetric bioassays when compared to other methods, they have limitations concerning the linearity and range of the dose-response curve determination. Here, we proposed to use partial least squares (PLS) regression to overcome these limitations and to improve the prediction of relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramycin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growth was measured as absorbance up to 180 and 300 min for the apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbances or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression; however, there were significant deviations from linearity. Thus, they could not be used for relative potency estimations. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting, and it improved the linear range of the turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from official agar diffusion methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics with significant advantages when compared to conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Kunina-Habenicht, Olga; Rupp, Andre A.; Wilhelm, Oliver
2012-01-01
Using a complex simulation study we investigated parameter recovery, classification accuracy, and performance of two item-fit statistics for correct and misspecified diagnostic classification models within a log-linear modeling framework. The basic manipulated test design factors included the number of respondents (1,000 vs. 10,000), attributes (3…
Statistical analysis of dendritic spine distributions in rat hippocampal cultures
2013-01-01
Background Dendritic spines serve as key computational structures in brain plasticity. Much remains to be learned about their spatial and temporal distribution among neurons. Our aim in this study was to perform exploratory analyses based on the population distributions of dendritic spines with regard to their morphological characteristics and period of growth in dissociated hippocampal neurons. We fit a log-linear model to the contingency table of spine features such as spine type and distance from the soma to first determine which features were important in modeling the spines, as well as the relationships between such features. A multinomial logistic regression was then used to predict the spine types using the features suggested by the log-linear model, along with neighboring spine information. Finally, an important variant of Ripley’s K-function applicable to linear networks was used to study the spatial distribution of spines along dendrites. Results Our study indicated that in the culture system, (i) dendritic spine densities were "completely spatially random", (ii) spine type and distance from the soma were independent quantities, and most importantly, (iii) spines had a tendency to cluster with other spines of the same type. Conclusions Although these results may vary with other systems, our primary contribution is the set of statistical tools for morphological modeling of spines which can be used to assess neuronal cultures following gene manipulation such as RNAi, and to study induced pluripotent stem cells differentiated to neurons. PMID:24088199
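Finding (ii), that spine type and distance from the soma were independent, corresponds to an independence log-linear model fitting a two-way contingency table well. A minimal sketch of that check follows; the function name and the example counts are hypothetical, and the study's actual tables and software are not described in the abstract.

```python
import math


def independence_fit(table):
    """Fit the independence log-linear model to a two-way table of counts.

    Returns the fitted (expected) counts, the likelihood-ratio statistic G^2,
    and its degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # Under independence, expected count = row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    # G^2 = 2 * sum(observed * log(observed / expected)); empty cells contribute 0.
    g2 = 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, g2, df


if __name__ == "__main__":
    # Hypothetical counts: rows are spine types, columns are distance bins.
    counts = [[30, 10], [10, 30]]
    _, g2, df = independence_fit(counts)
    print("G^2 =", round(g2, 2), "on", df, "df")
```

A small G² relative to the chi-square distribution with the returned degrees of freedom supports independence; a large value indicates that an interaction term between the two features is needed.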
Linear models for assessing mechanisms of sperm competition: the trouble with transformations.
Eggert, Anne-Katrin; Reinhardt, Klaus; Sakaluk, Scott K
2003-01-01
Although sperm competition is a pervasive selective force shaping the reproductive tactics of males, the mechanisms underlying different patterns of sperm precedence remain obscure. Parker et al. (1990) developed a series of linear models designed to identify two of the more basic mechanisms: sperm lotteries and sperm displacement; the models can be tested experimentally by manipulating the relative numbers of sperm transferred by rival males and determining the paternity of offspring. Here we show that tests of the model derived for sperm lotteries can result in misleading inferences about the underlying mechanism of sperm precedence because the required inverse transformations may lead to a violation of fundamental assumptions of linear regression. We show that this problem can be remedied by reformulating the model using the actual numbers of offspring sired by each male, and log-transforming both sides of the resultant equation. Reassessment of data from a previous study (Sakaluk and Eggert 1996) using the corrected version of the model revealed that we should not have excluded a simple sperm lottery as a possible mechanism of sperm competition in decorated crickets, Gryllodes sigillatus.
Predicting clicks of PubMed articles
Mao, Yuqing; Lu, Zhiyong
2013-01-01
Predicting the popularity or access usage of an article has the potential to improve the quality of PubMed searches. We can model the click trend of each article as its access changes over time by mining the PubMed query logs, which contain the previous access history for all articles. In this article, we examine the access patterns produced by PubMed users in two years (July 2009 to July 2011). We explore the time series of accesses for each article in the query logs, model the trends with regression approaches, and subsequently use the models for prediction. We show that the click trends of PubMed articles are best fitted with a log-normal regression model. This model allows the number of accesses an article receives and the time since it first becomes available in PubMed to be related via quadratic and logistic functions, with the model parameters to be estimated via maximum likelihood. Our experiments predicting the number of accesses for an article based on its past usage demonstrate that the mean absolute error and mean absolute percentage error of our model are 4.0% and 8.1% lower than the power-law regression model, respectively. The log-normal distribution is also shown to perform significantly better than a previous prediction method based on a human memory theory in cognitive science. This work warrants further investigation on the utility of such a log-normal regression approach towards improving information access in PubMed. PMID:24551386
Tang, Ronggui; Ding, Changfeng; Ma, Yibing; Wan, Mengxue; Zhang, Taolin; Wang, Xingxiang
2018-06-02
To explore the main controlling factors in soil and build a predictive model between the lead concentrations in earthworms (Pb_earthworm) and the soil physicochemical parameters, 13 soils with low level of lead contamination were used to conduct toxicity experiments using earthworms. The results indicated that a relatively high bioaccumulation factor appeared in the soils with low pH values. The lead concentrations between earthworms and soils after log transformation had a significantly positive correlation (R² = 0.46, P < 0.0001, n = 39). Stepwise multiple linear regression analysis derived a fitting empirical model between Pb_earthworm and the soil physicochemical properties: log(Pb_earthworm) = 0.96 log(Pb_soil) - 0.74 log(OC) - 0.22 pH + 0.95 (R² = 0.66, n = 39). Furthermore, path analysis confirmed that the Pb concentrations in the soil (Pb_soil), soil pH, and soil organic carbon (OC) were the primary controlling factors of Pb_earthworm with high pathway parameters (0.71, -0.51, and -0.49, respectively). The predictive model based on Pb_earthworm in a nationwide range of soils with low-level lead contamination could provide a reference for the establishment of safety thresholds in Pb-contaminated soils from the perspective of soil-animal systems.
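The empirical model reported in this abstract can be evaluated directly. The following is a minimal sketch, assuming the logarithms are base 10 and that inputs are supplied in the units the authors used (the abstract states neither). The function name and the example inputs are hypothetical.

```python
import math


def predict_pb_earthworm(pb_soil, oc, ph):
    """Evaluate log(Pb_earthworm) = 0.96 log(Pb_soil) - 0.74 log(OC) - 0.22 pH + 0.95.

    Assumption: base-10 logarithms; units as in the original study (not given
    in the abstract).
    """
    if pb_soil <= 0 or oc <= 0:
        raise ValueError("pb_soil and oc must be positive for the log transform")
    log_pb = 0.96 * math.log10(pb_soil) - 0.74 * math.log10(oc) - 0.22 * ph + 0.95
    return 10 ** log_pb  # back-transform from the log scale


# Hypothetical inputs: same soil Pb and organic carbon, two pH values.
acidic = predict_pb_earthworm(pb_soil=50.0, oc=2.0, ph=5.0)
neutral = predict_pb_earthworm(pb_soil=50.0, oc=2.0, ph=7.0)
print(acidic > neutral)  # the negative pH coefficient predicts more uptake at low pH
```

The negative coefficients on log(OC) and pH match the abstract's observation that bioaccumulation was relatively high in low-pH soils.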
Christensen, A L; Lundbye-Christensen, S; Dethlefsen, C
2011-12-01
Several statistical methods of assessing seasonal variation are available. Brookhart and Rothman [3] proposed a second-order moment-based estimator based on the geometrical model derived by Edwards [1], and reported that this estimator is superior in estimating the peak-to-trough ratio of seasonal variation compared with Edwards' estimator with respect to bias and mean squared error. Alternatively, seasonal variation may be modelled using a Poisson regression model, which provides flexibility in modelling the pattern of seasonal variation and adjustments for covariates. Based on a Monte Carlo simulation study three estimators, one based on the geometrical model, and two based on log-linear Poisson regression models, were evaluated in regards to bias and standard deviation (SD). We evaluated the estimators on data simulated according to schemes varying in seasonal variation and presence of a secular trend. All methods and analyses in this paper are available in the R package Peak2Trough[13]. Applying a Poisson regression model resulted in lower absolute bias and SD for data simulated according to the corresponding model assumptions. Poisson regression models had lower bias and SD for data simulated to deviate from the corresponding model assumptions than the geometrical model. This simulation study encourages the use of Poisson regression models in estimating the peak-to-trough ratio of seasonal variation as opposed to the geometrical model. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Three-Dimensional City Determinants of the Urban Heat Island: A Statistical Approach
NASA Astrophysics Data System (ADS)
Chun, Bum Seok
There is no doubt that the Urban Heat Island (UHI) is a mounting problem in built-up environments, due to the energy retention by the surface materials of dense buildings, leading to increased temperatures, air pollution, and energy consumption. Much of the earlier research on the UHI has used two-dimensional (2-D) information, such as land uses and the distribution of vegetation. In the case of homogeneous land uses, it is possible to predict surface temperatures with reasonable accuracy with 2-D information. However, three-dimensional (3-D) information is necessary to analyze more complex sites, including dense building clusters. Recent research on the UHI has started to consider multi-dimensional models. The purpose of this research is to explore the urban determinants of the UHI, using 2-D/3-D urban information with statistical modeling. The research includes the following stages: (a) estimating urban temperature, using satellite images, (b) developing a 3-D city model by LiDAR data, (c) generating geometric parameters with regard to 2-/3-D geospatial information, and (d) conducting different statistical analyses: OLS and spatial regressions. The research area is part of the City of Columbus, Ohio. To effectively and systematically analyze the UHI, hierarchical grid scales (480m, 240m, 120m, 60m, and 30m) are proposed, together with linear and the log-linear regression models. The non-linear OLS models with Log(AST) as dependent variable have the highest R2 among all the OLS-estimated models. However, both SAR and GSM models are estimated for the 480m, 240m, 120m, and 60m grids to reduce their spatial dependency. Most GSM models have R2s higher than 0.9, except for the 240m grid. Overall, the urban characteristics having high impacts in all grids are embodied in solar radiation, 3-D open space, greenery, and water streams. These results demonstrate that it is possible to mitigate the UHI, providing guidelines for policies aiming to reduce the UHI.
Schüle, Steffen Andreas; Gabriel, Katharina M A; Bolte, Gabriele
2017-06-01
The environmental justice framework states that resources, as well as environmental burdens, may be socially unequally distributed on both the individual and the neighbourhood level. This ecological study investigated whether neighbourhood socioeconomic position (SEP) was associated with neighbourhood public green space availability in a large German city with more than 1 million inhabitants. Two different measures were defined for green space availability. Firstly, percentage of green space within neighbourhoods was calculated with the additional consideration of various buffers around the boundaries. Secondly, percentage of green space was calculated based on various radii around the neighbourhood centroid. An index of neighbourhood SEP was calculated with principal component analysis. Log-gamma regression from the group of generalized linear models was applied in order to consider the non-normal distribution of the response variable. All models were adjusted for population density. Low neighbourhood SEP was associated with decreasing neighbourhood green space availability including 200 m up to 1000 m buffers around the neighbourhood boundaries. Low neighbourhood SEP was also associated with decreasing green space availability based on catchment areas measured from neighbourhood centroids with different radii (1000 m up to 3000 m). With an increasing radius the strength of the associations decreased. Socially unequally distributed green space may amplify environmental health inequalities in an urban context. Thus, the identification of vulnerable neighbourhoods and population groups plays an important role for epidemiological research and healthy city planning. As a methodological aspect, log-gamma regression offers an adequate parametric modelling strategy for positively distributed environmental variables. Copyright © 2017 Elsevier GmbH. All rights reserved.
Yuan, Jintao; Yu, Shuling; Zhang, Ting; Yuan, Xuejie; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu
2016-06-01
Octanol/water (K(OW)) and octanol/air (K(OA)) partition coefficients are two important physicochemical properties of organic substances. In current practice, K(OW) and K(OA) values of some polychlorinated biphenyls (PCBs) are measured using generator column method. Quantitative structure-property relationship (QSPR) models can serve as a valuable alternative method of replacing or reducing experimental steps in the determination of K(OW) and K(OA). In this paper, two different methods, i.e., multiple linear regression based on dragon descriptors and hologram quantitative structure-activity relationship, were used to predict generator-column-derived log K(OW) and log K(OA) values of PCBs. The predictive ability of the developed models was validated using a test set, and the performances of all generated models were compared with those of three previously reported models. All results indicated that the proposed models were robust and satisfactory and can thus be used as alternative models for the rapid assessment of the K(OW) and K(OA) of PCBs. Copyright © 2016 Elsevier Inc. All rights reserved.
Anderson, S.C.; Kupfer, J.A.; Wilson, R.R.; Cooper, R.J.
2000-01-01
The purpose of this research was to develop a model that could be used to provide a spatial representation of uneven-aged silvicultural treatments on forest crown area. We began by developing species-specific linear regression equations relating tree DBH to crown area for eight bottomland tree species at White River National Wildlife Refuge, Arkansas, USA. The relationships were highly significant for all species, with coefficients of determination (r(2)) ranging from 0.37 for Ulmus crassifolia to nearly 0.80 for Quercus nuttallii and Taxodium distichum. We next located and measured the diameters of more than 4000 stumps from a single tree-group selection timber harvest. Stump locations were recorded with respect to an established grid point system and entered into a Geographic Information System (ARC/INFO). The area occupied by the crown of each logged individual was then estimated by using the stump dimensions (adjusted to DBHs) and the regression equations relating tree DBH to crown area. Our model projected that the selection cuts removed roughly 300 m(2) of basal area from the logged sites, resulting in the loss of approximately 55 000 m(2) of crown area. The model developed in this research represents a tool that can be used in conjunction with remote sensing applications to assist in forest inventory and management, as well as to estimate the impacts of selective timber harvest on wildlife.
Andrić, Filip; Šegan, Sandra; Dramićanin, Aleksandra; Majstorović, Helena; Milojković-Opsenica, Dušanka
2016-08-05
The soil-water partition coefficient normalized to the organic carbon content (KOC) is one of the crucial properties influencing the fate of organic compounds in the environment. Chromatographic methods are a well-established alternative to the direct sorption techniques used for KOC determination. The present work proposes reversed-phase thin-layer chromatography (RP-TLC) as a simpler, yet equally accurate, method compared with the officially recommended HPLC technique. Several TLC systems were studied, including octadecyl-(RP18) and cyano-(CN) modified silica layers in combination with methanol-water and acetonitrile-water mixtures as mobile phases. In total, 50 compounds of different molecular shape and size and with varying ability to establish specific interactions were selected (phenols, benzodiazepines, triazine herbicides, and polyaromatic hydrocarbons). A calibration set of 29 compounds with known logKOC values determined by sorption experiments was used to build simple univariate calibrations, Principal Component Regression (PCR) and Partial Least Squares (PLS) models between logKOC and TLC retention parameters. The models exhibit good statistical performance, indicating that CN-layers contribute better to logKOC modeling than RP18-silica. The most promising TLC methods, the officially recommended HPLC method, and four in silico estimation approaches were compared by the non-parametric Sum of Ranking Differences approach (SRD). The best estimations of logKOC values were achieved by simple univariate calibration of TLC retention data involving CN-silica layers and a moderate content of methanol (40-50% v/v). They were ranked far better than the officially recommended HPLC method, which was ranked in the middle. The worst estimates were obtained from in silico computations based on the octanol-water partition coefficient. A Linear Solvation Energy Relationship study revealed that increased polarity of CN-layers over RP18 in combination with methanol-water mixtures is the key to better modeling of logKOC through significant diminishing of dipolar and proton accepting influence of the mobile phase as well as enhancing molar refractivity in excess of the chromatographic systems. Copyright © 2016 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Bourke, Sarah A.; Hermann, Kristian J.; Hendry, M. Jim
2017-11-01
Elevated groundwater salinity associated with produced water, leaching from landfills or secondary salinity can degrade arable soils and potable water resources. Direct-push electrical conductivity (EC) profiling enables rapid, relatively inexpensive, high-resolution in-situ measurements of subsurface salinity, without requiring core collection or installation of groundwater wells. However, because the direct-push tool measures the bulk EC of both solid and liquid phases (ECa), incorporation of ECa data into regional or historical groundwater data sets requires the prediction of pore water EC (ECw) or chloride (Cl-) concentrations from measured ECa. Statistical linear regression and physically based models for predicting ECw and Cl- from ECa profiles were tested on a brine plume in central Saskatchewan, Canada. A linear relationship between ECa/ECw and porosity was more accurate for predicting ECw and Cl- concentrations than a power-law relationship (Archie's Law). Despite clay contents of up to 96%, the addition of terms to account for electrical conductance in the solid phase did not improve model predictions. In the absence of porosity data, statistical linear regression models adequately predicted ECw and Cl- concentrations from direct-push ECa profiles (ECw = 5.48 ECa + 0.78, R 2 = 0.87; Cl- = 1,978 ECa - 1,398, R 2 = 0.73). These statistical models can be used to predict ECw in the absence of lithologic data and will be particularly useful for initial site assessments. The more accurate linear physically based model can be used to predict ECw and Cl- as porosity data become available and the site-specific ECw-Cl- relationship is determined.
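The two statistical regression equations quoted in this abstract can be applied to a direct-push profile as follows. This is a minimal sketch: the function names and the sample ECa readings are hypothetical, and the measurement units are whatever the original study used (the abstract does not state them).

```python
def predict_ecw(eca):
    """Pore-water EC from bulk EC: ECw = 5.48 * ECa + 0.78 (R2 = 0.87 in the abstract)."""
    return 5.48 * eca + 0.78


def predict_chloride(eca):
    """Chloride from bulk EC: Cl- = 1,978 * ECa - 1,398 (R2 = 0.73 in the abstract)."""
    return 1978.0 * eca - 1398.0


# Hypothetical ECa readings down a profile (units as in the original study).
profile = [0.8, 1.0, 2.5]
for eca in profile:
    print(eca, round(predict_ecw(eca), 2), round(predict_chloride(eca), 1))
```

Because of its negative intercept, the chloride equation returns negative values for ECa below about 0.71 (1,398 / 1,978), so it is only meaningful inside the range of the data it was fitted to.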
Green lumber grade yields from factory grade logs of three oak species
Daniel A. Yaussy
1986-01-01
Multivariate regression models were developed to predict green board foot yields for the seven common factory lumber grades processed from white, black, and chestnut oak factory grade logs. These models use the standard log measurements of grade, scaling diameter, log length, and proportion of scaling defect. Any combination of lumber grades (such as 1 Common and...
Job strain and resting heart rate: a cross-sectional study in a Swedish random working sample.
Eriksson, Peter; Schiöler, Linus; Söderberg, Mia; Rosengren, Annika; Torén, Kjell
2016-03-05
Numerous studies have reported an association between stressing work conditions and cardiovascular disease. However, more evidence is needed, and the etiological mechanisms are unknown. Elevated resting heart rate has emerged as a possible risk factor for cardiovascular disease, but little is known about the relation to work-related stress. This study therefore investigated the association between job strain, job control, and job demands and resting heart rate. We conducted a cross-sectional survey of randomly selected men and women in Västra Götalandsregionen, Sweden (West county of Sweden) (n = 1552). Information about job strain, job demands, job control, heart rate and covariates was collected during the period 2001-2004 as part of the INTERGENE/ADONIX research project. Six different linear regression models were used with adjustments for gender, age, BMI, smoking, education, and physical activity in the fully adjusted model. Job strain was operationalized as the log-transformed ratio of job demands over job control in the statistical analyses. No associations were seen between resting heart rate and job demands. Job strain was associated with elevated resting heart rate in the unadjusted model (linear regression coefficient 1.26, 95 % CI 0.14 to 2.38), but not in any of the extended models. Low job control was associated with elevated resting heart rate after adjustments for gender, age, BMI, and smoking (linear regression coefficient -0.18, 95 % CI -0.30 to -0.02). However, there were no significant associations in the fully adjusted model. Low job control and job strain, but not job demands, were associated with elevated resting heart rate. However, the observed associations were modest and may be explained by confounding effects.
NASA Astrophysics Data System (ADS)
Vásquez Lavín, F. A.; Hernandez, J. I.; Ponce, R. D.; Orrego, S. A.
2017-07-01
During recent decades, water demand estimation has gained considerable attention from scholars. From an econometric perspective, the most used functional forms include log-log and linear specifications. Despite the advances in this field and the relevance for policymaking, little attention has been paid to the functional forms used in these estimations, and most authors have not provided justifications for their selection of functional forms. A discrete continuous choice model of the residential water demand is estimated using six functional forms (log-log, full-log, log-quadratic, semilog, linear, and Stone-Geary), and the expected consumption and price elasticity are evaluated. From a policy perspective, our results highlight the relevance of functional form selection for both the expected consumption and price elasticity.
Prenatal Lead Exposure and Fetal Growth: Smaller Infants Have Heightened Susceptibility
Rodosthenous, Rodosthenis S.; Burris, Heather H.; Svensson, Katherine; Amarasiriwardena, Chitra J.; Cantoral, Alejandra; Schnaas, Lourdes; Mercado-García, Adriana; Coull, Brent A.; Wright, Robert O.; Téllez-Rojo, Martha M.; Baccarelli, Andrea A.
2016-01-01
Background As population lead levels decrease, the toxic effects of lead may be distributed to more sensitive populations, such as infants with poor fetal growth. Objectives To determine the association of prenatal lead exposure and fetal growth; and to evaluate whether infants with poor fetal growth are more susceptible to lead toxicity than those with normal fetal growth. Methods We examined the association of second trimester maternal blood lead levels (BLL) with birthweight-for-gestational age (BWGA) z-score in 944 mother-infant participants of the PROGRESS cohort. We determined the association between maternal BLL and BWGA z-score by using both linear and quantile regression. We estimated odds ratios for small-for-gestational age (SGA) infants between maternal BLL quartiles using logistic regression. Maternal age, body mass index, socioeconomic status, parity, household smoking exposure, hemoglobin levels, and infant sex were included as confounders. Results While linear regression showed a negative association between maternal BLL and BWGA z-score (β=−0.06 z-score units per log2 BLL increase; 95% CI: −0.13, 0.003; P=0.06), quantile regression revealed larger magnitudes of this association in the <30th percentiles of BWGA z-score (β range [−0.08, −0.13] z-score units per log2 BLL increase; all P values <0.05). Mothers in the highest BLL quartile had an odds ratio of 1.62 (95% CI: 0.99–2.65) for having a SGA infant compared to the lowest BLL quartile. Conclusions While both linear and quantile regression showed a negative association between prenatal lead exposure and birthweight, quantile regression revealed that smaller infants may represent a more susceptible subpopulation. PMID:27923585
Prenatal lead exposure and fetal growth: Smaller infants have heightened susceptibility.
Rodosthenous, Rodosthenis S; Burris, Heather H; Svensson, Katherine; Amarasiriwardena, Chitra J; Cantoral, Alejandra; Schnaas, Lourdes; Mercado-García, Adriana; Coull, Brent A; Wright, Robert O; Téllez-Rojo, Martha M; Baccarelli, Andrea A
2017-02-01
As population lead levels decrease, the toxic effects of lead may be distributed to more sensitive populations, such as infants with poor fetal growth. To determine the association of prenatal lead exposure and fetal growth; and to evaluate whether infants with poor fetal growth are more susceptible to lead toxicity than those with normal fetal growth. We examined the association of second trimester maternal blood lead levels (BLL) with birthweight-for-gestational age (BWGA) z-score in 944 mother-infant participants of the PROGRESS cohort. We determined the association between maternal BLL and BWGA z-score by using both linear and quantile regression. We estimated odds ratios for small-for-gestational age (SGA) infants between maternal BLL quartiles using logistic regression. Maternal age, body mass index, socioeconomic status, parity, household smoking exposure, hemoglobin levels, and infant sex were included as confounders. While linear regression showed a negative association between maternal BLL and BWGA z-score (β=-0.06 z-score units per log2 BLL increase; 95% CI: -0.13, 0.003; P=0.06), quantile regression revealed larger magnitudes of this association in the <30th percentiles of BWGA z-score (β range [-0.08, -0.13] z-score units per log2 BLL increase; all P values <0.05). Mothers in the highest BLL quartile had an odds ratio of 1.62 (95% CI: 0.99-2.65) for having a SGA infant compared to the lowest BLL quartile. While both linear and quantile regression showed a negative association between prenatal lead exposure and birthweight, quantile regression revealed that smaller infants may represent a more susceptible subpopulation. Copyright © 2016 Elsevier Ltd. All rights reserved.
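A coefficient expressed "per log2 BLL increase" is the expected change in the outcome for each doubling of blood lead. The sketch below illustrates that reading with the linear-regression estimate from the abstract (β = -0.06); the function name is hypothetical, and the calculation assumes the fitted relationship is linear in log2(BLL) over the range considered.

```python
import math


def zscore_change(fold_increase, beta=-0.06):
    """Expected change in BWGA z-score for a given fold increase in maternal BLL.

    In a model that is linear in log2(BLL), a k-fold increase in BLL shifts the
    outcome by beta * log2(k).
    """
    if fold_increase <= 0:
        raise ValueError("fold_increase must be positive")
    return beta * math.log2(fold_increase)


print(zscore_change(2))              # one doubling: the coefficient itself
print(zscore_change(4))              # two doublings
print(zscore_change(2, beta=-0.13))  # largest-magnitude quantile-regression estimate in the abstract
```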
Bao, Jie; Hou, Zhangshuan; Huang, Maoyi; ...
2015-12-04
Here, effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash-Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field observed values. Four sensitivity analysis (SA) approaches, including analysis of variance based on the generalized linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine, are investigated. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.
Regional flow duration curves: Geostatistical techniques versus multivariate regression
Pugliese, Alessio; Farmer, William H.; Castellarin, Attilio; Archfield, Stacey A.; Vogel, Richard M.
2016-01-01
A period-of-record flow duration curve (FDC) represents the relationship between the magnitude and frequency of daily streamflows. Prediction of FDCs is of great importance for locations characterized by sparse or missing streamflow observations. We present a detailed comparison of two methods which are capable of predicting an FDC at ungauged basins: (1) an adaptation of the geostatistical method, Top-kriging, employing a linear weighted average of dimensionless empirical FDCs, standardised with a reference streamflow value; and (2) regional multiple linear regression of streamflow quantiles, perhaps the most common method for the prediction of FDCs at ungauged sites. In particular, Top-kriging relies on a metric for expressing the similarity between catchments computed as the negative deviation of the FDC from a reference streamflow value, which we termed total negative deviation (TND). Comparisons of these two methods are made in 182 largely unregulated river catchments in the southeastern U.S. using a three-fold cross-validation algorithm. Our results reveal that the two methods perform similarly throughout flow-regimes, with average Nash-Sutcliffe Efficiencies of 0.566 and 0.662 (0.883 and 0.829 on log-transformed quantiles) for the geostatistical and the linear regression models, respectively. The differences between the two methods in reproducing FDCs occurred mostly for low flows with exceedance probability (i.e. duration) above 0.98.
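The Nash-Sutcliffe Efficiency used to score both methods is one minus the ratio of the squared prediction error to the variance of the observations about their mean. The following sketch uses that standard definition; the function name, the natural-log choice for the transformed variant, and the sample quantiles are assumptions for illustration and are not taken from the study.

```python
import math


def nash_sutcliffe(observed, simulated, log_transform=False):
    """Nash-Sutcliffe Efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    With log_transform=True the score is computed on log-transformed values,
    which gives low flows more weight (natural log assumed here).
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    if log_transform:
        observed = [math.log(x) for x in observed]
        simulated = [math.log(x) for x in simulated]
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error / spread


# Hypothetical streamflow quantiles (high flow to low flow).
obs = [120.0, 45.0, 12.0, 3.0, 0.8]
sim = [110.0, 50.0, 10.0, 4.0, 0.5]
print(nash_sutcliffe(obs, sim), nash_sutcliffe(obs, sim, log_transform=True))
```

A score of 1 indicates a perfect match, and a score of 0 means the prediction is no better than the mean of the observations.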
NASA Astrophysics Data System (ADS)
Jarzyna, Jadwiga A.; Krakowska, Paulina I.; Puskarczyk, Edyta; Wawrzyniak-Guz, Kamila; Zych, Marcin
2018-03-01
More than 70 rock samples from so-called sweet spots, i.e. the Ordovician Sa Formation and Silurian Ja Member of Pa Formation from the Baltic Basin (North Poland) were examined in the laboratory to determine bulk and grain density, total and effective/dynamic porosity, absolute permeability, pore diameters size, total surface area, and natural radioactivity. Results of the pyrolysis, i.e., TOC (Total Organic Carbon) together with S1 and S2 - parameters used to determine the hydrocarbon generation potential of rocks, were also considered. Elemental composition from chemical analyses and mineral composition from XRD measurements were also included. SCAL analysis, NMR experiments, Pressure Decay Permeability measurements together with water immersion porosimetry and adsorption/ desorption of nitrogen vapors method were carried out along with the comprehensive interpretation of the outcomes. Simple and multiple linear statistical regressions were used to recognize mutual relationships between parameters. Observed correlations and in some cases big dispersion of data and discrepancies in the property values obtained from different methods were the basis for building shale gas rock model for well logging interpretation. The model was verified by the result of the Monte Carlo modelling of spectral neutron-gamma log response in comparison with GEM log results.
Lee, Ho-Won; Muniyappa, Ranganath; Yan, Xu; Yue, Lilly Q.; Linden, Ellen H.; Chen, Hui; Hansen, Barbara C.
2011-01-01
The euglycemic glucose clamp is the reference method for assessing insulin sensitivity in humans and animals. However, clamps are ill-suited for large studies because of extensive requirements for cost, time, labor, and technical expertise. Simple surrogate indexes of insulin sensitivity/resistance including quantitative insulin-sensitivity check index (QUICKI) and homeostasis model assessment (HOMA) have been developed and validated in humans. However, validation studies of QUICKI and HOMA in both rats and mice suggest that differences in metabolic physiology between rodents and humans limit their value in rodents. Rhesus monkeys are a species more similar to humans than rodents. Therefore, in the present study, we evaluated data from 199 glucose clamp studies obtained from a large cohort of 86 monkeys with a broad range of insulin sensitivity. Data were used to evaluate simple surrogate indexes of insulin sensitivity/resistance (QUICKI, HOMA, Log HOMA, 1/HOMA, and 1/Fasting insulin) with respect to linear regression, predictive accuracy using a calibration model, and diagnostic performance using receiver operating characteristic. Most surrogates had modest linear correlations with SIClamp (r ≈ 0.4–0.64) with comparable correlation coefficients. Predictive accuracy determined by calibration model analysis demonstrated better predictive accuracy of QUICKI than HOMA and Log HOMA. Receiver operating characteristic analysis showed equivalent sensitivity and specificity of most surrogate indexes to detect insulin resistance. Thus, unlike in rodents but similar to humans, surrogate indexes of insulin sensitivity/resistance including QUICKI and log HOMA may be reasonable to use in large studies of rhesus monkeys where it may be impractical to conduct glucose clamp studies. PMID:21209021
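For reference, the two main surrogate indexes can be computed from fasting glucose and insulin. The sketch below uses the standard published definitions of QUICKI and HOMA; the abstract does not restate them, so the formulas and units are an assumption about what the study used, and the function names and sample values are hypothetical.

```python
import math


def quicki(glucose_mg_dl, insulin_uU_ml):
    """QUICKI = 1 / (log10(fasting insulin, uU/mL) + log10(fasting glucose, mg/dL))."""
    return 1.0 / (math.log10(insulin_uU_ml) + math.log10(glucose_mg_dl))


def homa(glucose_mmol_l, insulin_uU_ml):
    """HOMA = fasting glucose (mmol/L) * fasting insulin (uU/mL) / 22.5."""
    return glucose_mmol_l * insulin_uU_ml / 22.5


# Hypothetical fasting values: glucose 90 mg/dL (5.0 mmol/L), insulin 12 uU/mL.
print(quicki(90.0, 12.0))
print(homa(5.0, 12.0))
print(1.0 / homa(5.0, 12.0))  # the 1/HOMA variant listed in the abstract
```

QUICKI rises with insulin sensitivity while HOMA rises with insulin resistance, so the two are expected to correlate with clamp-derived sensitivity in opposite directions.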
NASA Astrophysics Data System (ADS)
Zounemat-Kermani, Mohammad
2012-08-01
In this study, the ability of two models, multiple linear regression (MLR) and a Levenberg-Marquardt (LM) feed-forward neural network, to estimate the hourly dew point temperature was examined. Dew point temperature is the temperature at which water vapor in the air condenses into liquid. This temperature can be useful in estimating meteorological variables such as fog, rain, snow, dew, and evapotranspiration and in investigating agronomical issues such as stomatal closure in plants. The availability of hourly records of climatic data (air temperature, relative humidity and pressure) which could be used to predict dew point temperature initiated the practice of modeling. Additionally, the wind vector (wind speed magnitude and direction) and conceptual input of weather condition were employed as other input variables. The three quantitative standard statistical performance evaluation measures, i.e. the root mean squared error, mean absolute error, and absolute logarithmic Nash-Sutcliffe efficiency coefficient (|Log(NS)|), were employed to evaluate the performances of the developed models. The results showed that applying wind vector and weather condition as input vectors along with meteorological variables could slightly increase the ANN and MLR predictive accuracy. The results also revealed that LM-NN was superior to the MLR model and the best performance was obtained by considering all potential input variables in terms of different evaluation criteria.
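The evaluation measures mentioned above can be illustrated with a minimal sketch. It computes the root mean squared error, the mean absolute error, and the standard Nash-Sutcliffe efficiency; the exact definition of the absolute logarithmic variant is not given in the abstract, so it is not reproduced. The function names and the sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error between paired observations and predictions."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def nash_sutcliffe(observed, predicted):
    """Standard Nash-Sutcliffe efficiency: 1 - SS_residual / SS_about_the_mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical hourly dew point temperatures (degrees C), not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
scores = (rmse(obs, pred), mae(obs, pred), nash_sutcliffe(obs, pred))
```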
Vandenhove, H; Van Hees, M; Wouters, K; Wannijn, J
2007-01-01
The present study aims to quantify the influence of soil parameters on soil solution uranium concentration for (238)U spiked soils. Eighteen soils collected under pasture were selected such that they covered a wide range for those parameters hypothesised as being potentially important in determining U sorption. Maximum soil solution uranium concentrations were observed at alkaline pH, high inorganic carbon content and low cation exchange capacity, organic matter content, clay content, amorphous Fe and phosphate levels. Except for the significant correlation between the solid-liquid distribution coefficients (K(d), L kg(-1)) and the organic matter content (R(2)=0.70) and amorphous Fe content (R(2)=0.63), there was no single soil parameter significantly explaining the soil solution uranium concentration (which varied 100-fold). Above pH=6, log(K(d)) was linearly related with pH [log(K(d))=-1.18 pH+10.8, R(2)=0.65]. Multiple linear regression analysis did result in improved predictions of the soil solution uranium concentration but the model was complex.
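The reported regression, log(K(d)) = -1.18 pH + 10.8 for pH above 6, can be evaluated directly. The sketch below assumes the logarithm is base 10, which is conventional for distribution coefficients but not stated in the abstract; the function names are illustrative.

```python
def log10_kd(ph):
    """Evaluate the reported fit log(Kd) = -1.18 * pH + 10.8 (valid above pH 6)."""
    if ph <= 6.0:
        raise ValueError("the reported linear relation applies only above pH 6")
    return -1.18 * ph + 10.8

def kd_l_per_kg(ph):
    """Distribution coefficient in L/kg, assuming a base-10 logarithm."""
    return 10.0 ** log10_kd(ph)

# Each unit increase in pH lowers Kd by a factor of 10**1.18 (roughly 15).
ratio_per_ph_unit = kd_l_per_kg(7.0) / kd_l_per_kg(8.0)
```

Because R(2) is 0.65, the fit leaves about a third of the variance in log(K(d)) unexplained, so point predictions from it are rough.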
Schønning, Kristian; Johansen, Kim; Nielsen, Lone Gilmor; Weis, Nina; Westh, Henrik
2018-07-01
Quantification of HBV DNA is used for initiating and monitoring antiviral treatment. Analytical test performance consequently impacts treatment decisions. The objective was to compare the analytical performance of the Aptima HBV Quant Assay (Aptima) and the COBAS Ampliprep/COBAS TaqMan HBV Test v2.0 (CAPCTMv2) for the quantification of HBV DNA in plasma samples. The performance of the two tests was compared on 129 prospective plasma samples, and on 63 archived plasma samples of which 53 were genotyped. Linearity of the two assays was assessed on dilution series of three clinical samples (Genotype B, C, and D). Bland-Altman analysis of 120 clinical samples, which were quantified in both tests, showed an average quantification bias (Aptima - CAPCTMv2) of -0.19 Log IU/mL (SD: 0.33 Log IU/mL). A single sample quantified more than three standard deviations higher in Aptima than in CAPCTMv2. Only minor differences were observed between genotype A (N = 4; average difference -0.01 Log IU/mL), B (N = 8; -0.13 Log IU/mL), C (N = 8; -0.31 Log IU/mL), D (N = 25; -0.22 Log IU/mL), and E (N = 7; -0.03 Log IU/mL). Deming regression showed that the two tests were excellently correlated (slope of the regression line 1.03; 95% CI: 0.998-1.068). Linearity of the tests was evaluated on dilution series and showed an excellent correlation of the two tests. Both tests were precise with %CV less than 3% for HBV DNA ≥3 Log IU/mL. The Aptima and CAPCTMv2 tests are highly correlated, and both tests are useful for monitoring patients chronically infected with HBV. Copyright © 2018 Elsevier B.V. All rights reserved.
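A Bland-Altman comparison of the kind summarized above reduces to the mean and standard deviation of paired differences. The sketch below is a minimal illustration with made-up values; the 1.96 multiplier for the limits of agreement is the usual convention and is an assumption here, as are the function and variable names.

```python
import statistics

def bland_altman(assay_a, assay_b):
    """Return (bias, SD of differences, limits of agreement) for paired log values."""
    diffs = [a - b for a, b in zip(assay_a, assay_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, sd, limits

# Hypothetical paired results in Log IU/mL, not data from the study.
assay_a_values = [3.0, 4.0, 5.0]
assay_b_values = [3.2, 4.1, 5.3]
bias, sd, limits = bland_altman(assay_a_values, assay_b_values)
```

A negative bias, as in the abstract, means the first assay reads lower on average than the second.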
NASA Astrophysics Data System (ADS)
Bloomfield, J. P.; Allen, D. J.; Griffiths, K. J.
2009-06-01
Linear regression methods can be used to quantify geological controls on baseflow index (BFI). This is illustrated using an example from the Thames Basin, UK. Two approaches have been adopted. The areal extents of geological classes based on lithostratigraphic and hydrogeological classification schemes have been correlated with BFI for 44 'natural' catchments from the Thames Basin. When regression models are built using lithostratigraphic classes that include a constant term then the model is shown to have some physical meaning and the relative influence of the different geological classes on BFI can be quantified. For example, the regression constants for two such models, 0.64 and 0.69, are consistent with the mean observed BFI (0.65) for the Thames Basin, and the signs and relative magnitudes of the regression coefficients for each of the lithostratigraphic classes are consistent with the hydrogeology of the Basin. In addition, regression coefficients for the lithostratigraphic classes scale linearly with estimates of log10 hydraulic conductivity for each lithological class. When a regression is built using a hydrogeological classification scheme with no constant term, the model does not have any physical meaning, but it has a relatively high adjusted R2 value and because of the continuous coverage of the hydrogeological classification scheme, the model can be used for predictive purposes. A model calibrated on the 44 'natural' catchments and using four hydrogeological classes (low-permeability surficial deposits, consolidated aquitards, fractured aquifers and intergranular aquifers) is shown to perform as well as a model based on a hydrology of soil types (BFIHOST) scheme in predicting BFI in the Thames Basin. Validation of this model using 110 other 'variably impacted' catchments in the Basin shows that there is a correlation between modelled and observed BFI.
Where the observed BFI is significantly higher than modelled BFI the deviations can be explained by an exogenous factor, catchment urban area. It is inferred that this may be due to influences from sewage discharge, mains leakage, and leakage from septic tanks.
Garcés-Vega, Francisco; Marks, Bradley P
2014-08-01
In the last 20 years, the use of microbial reduction models has expanded significantly, including inactivation (linear and nonlinear), survival, and transfer models. However, a major constraint for model development is the impossibility to directly quantify the number of viable microorganisms below the limit of detection (LOD) for a given study. Different approaches have been used to manage this challenge, including ignoring negative plate counts, using statistical estimations, or applying data transformations. Our objective was to illustrate and quantify the effect of negative plate count data management approaches on parameter estimation for microbial reduction models. Because it is impossible to obtain accurate plate counts below the LOD, we performed simulated experiments to generate synthetic data for both log-linear and Weibull-type microbial reductions. We then applied five different, previously reported data management practices and fit log-linear and Weibull models to the resulting data. The results indicated a significant effect (α = 0.05) of the data management practices on the estimated model parameters and performance indicators. For example, when the negative plate counts were replaced by the LOD for log-linear data sets, the slope of the subsequent log-linear model was, on average, 22% smaller than for the original data, the resulting model underpredicted lethality by up to 2.0 log, and the Weibull model was erroneously selected as the most likely correct model for those data. The results demonstrate that it is important to explicitly report LODs and related data management protocols, which can significantly affect model results, interpretation, and utility. Ultimately, we recommend using only the positive plate counts to estimate model parameters for microbial reduction curves and avoiding any data value substitutions or transformations when managing negative plate counts to yield the most accurate model parameters.
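The effect described above, where substituting the LOD for negative plate counts flattens the fitted log-linear slope, can be reproduced with a small deterministic sketch. The survival curve, the LOD of 2 log, and the sampling times are hypothetical and noise-free; the study itself used simulated experiments with variability, which this sketch does not attempt to replicate.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical log-linear inactivation: 8 log at time 0, losing 1 log per time unit.
times = list(range(9))
true_log_counts = [8.0 - 1.0 * t for t in times]
LOD = 2.0  # assumed limit of detection, in log units

# Practice A: replace every value below the LOD with the LOD itself.
substituted = [max(value, LOD) for value in true_log_counts]
slope_substituted = ols_slope(times, substituted)

# Practice B: keep only the positive (detectable) plate counts.
kept = [(t, value) for t, value in zip(times, true_log_counts) if value >= LOD]
slope_positive_only = ols_slope([t for t, _ in kept], [value for _, value in kept])
```

In this toy case the substituted fit gives a slope of about -0.82 against a true slope of -1, while the positive-only fit recovers -1, which is consistent in direction with the abstract's recommendation.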
Taslimitehrani, Vahid; Dong, Guozhu; Pereira, Naveen L; Panahiazar, Maryam; Pathak, Jyotishman
2016-04-01
Computerized survival prediction in healthcare, which identifies the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) with the probabilistic loss function, to develop and validate prognostic risk models to predict 1-, 2-, and 5-year survival in heart failure (HF) using data from electronic health records (EHRs) at Mayo Clinic. The CPXR(Log) constructs a pattern aided logistic regression model defined by several patterns and corresponding local logistic regression models. One of the models generated by CPXR(Log) achieved an AUC and accuracy of 0.94 and 0.91, respectively, and significantly outperformed prognostic models reported in prior studies. Data extracted from EHRs allowed incorporation of patient co-morbidities into our models which helped improve the performance of the CPXR(Log) models (15.9% AUC improvement), although it did not improve the accuracy of the models built by other classifiers. We also propose a probabilistic loss function to determine the large error and small error instances. The new loss function used in the algorithm outperforms other functions used in the previous studies by 1% improvement in the AUC. This study revealed that using EHR data to build prediction models can be very challenging using existing classification methods due to the high dimensionality and complexity of EHR data. The risk models developed by CPXR(Log) also reveal that HF is a highly heterogeneous disease, i.e., different subgroups of HF patients require different types of considerations with their diagnosis and treatment.
Our risk models provided two valuable insights for application of predictive modeling techniques in biomedicine: Logistic risk models often make systematic prediction errors, and it is prudent to use subgroup based prediction models such as those given by CPXR(Log) when investigating heterogeneous diseases. Copyright © 2016 Elsevier Inc. All rights reserved.
Oh, Eric J; Shepherd, Bryan E; Lumley, Thomas; Shaw, Pamela A
2018-04-15
For time-to-event outcomes, a rich literature exists on the bias introduced by covariate measurement error in regression models, such as the Cox model, and methods of analysis to address this bias. By comparison, less attention has been given to understanding the impact or addressing errors in the failure time outcome. For many diseases, the timing of an event of interest (such as progression-free survival or time to AIDS progression) can be difficult to assess or reliant on self-report and therefore prone to measurement error. For linear models, it is well known that random errors in the outcome variable do not bias regression estimates. With nonlinear models, however, even random error or misclassification can introduce bias into estimated parameters. We compare the performance of 2 common regression models, the Cox and Weibull models, in the setting of measurement error in the failure time outcome. We introduce an extension of the SIMEX method to correct for bias in hazard ratio estimates from the Cox model and discuss other analysis options to address measurement error in the response. A formula to estimate the bias induced into the hazard ratio by classical measurement error in the event time for a log-linear survival model is presented. Detailed numerical studies are presented to examine the performance of the proposed SIMEX method under varying levels and parametric forms of the error in the outcome. We further illustrate the method with observational data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic. Copyright © 2017 John Wiley & Sons, Ltd.
Protocol Analysis as a Tool in Function and Task Analysis
1999-10-01
Autocontingency The use of log-linear and logistic regression methods to analyse sequential data seems appealing , and is strongly advocated by...collection and analysis of observational data. Behavior Research Methods, Instruments, and Computers, 23(3), 415-429. Patrick, J. D. (1991). Snob : A
Support vector regression to predict porosity and permeability: Effect of sample size
NASA Astrophysics Data System (ADS)
Al-Anazi, A. F.; Gates, I. D.
2012-02-01
Porosity and permeability are key petrophysical parameters obtained from laboratory core analysis. Cores, obtained from drilled wells, are often few in number for most oil and gas fields. Porosity and permeability correlations based on conventional techniques such as linear regression or neural networks trained with core and geophysical logs suffer from poor generalization to wells with only geophysical logs. The generalization problem of correlation models often becomes pronounced when the training sample size is small. This is attributed to the underlying assumption that conventional techniques employing the empirical risk minimization (ERM) inductive principle converge asymptotically to the true risk values as the number of samples increases. In small sample size estimation problems, the available training samples must span the complexity of the parameter space so that the model is able both to match the available training samples reasonably well and to generalize to new data. This is achieved using the structural risk minimization (SRM) inductive principle by matching the capability of the model to the available training data. One method that uses SRM is the support vector regression (SVR) network. In this research, the capability of SVR to predict porosity and permeability in a heterogeneous sandstone reservoir under the effect of small sample size is evaluated. In particular, the impact of Vapnik's ɛ-insensitivity loss function and least-modulus loss function on generalization performance was empirically investigated. The results are compared to the multilayer perceptron (MLP) neural network, a widely used regression method, which operates under the ERM principle. The mean square error and correlation coefficients were used to measure the quality of predictions. The results demonstrate that SVR yields consistently better predictions of the porosity and permeability with small sample size than the MLP method.
Also, the performance of SVR depends on both kernel function type and loss functions used.
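The two loss functions compared in the abstract above differ only in how they treat small residuals. The sketch below gives the textbook forms, max(0, |r| - ɛ) for the ɛ-insensitive loss and |r| for the least-modulus loss; the function names and the sample porosity values are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Vapnik's loss: residuals inside the epsilon tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def least_modulus_loss(y_true, y_pred):
    """Least-modulus (absolute error) loss: the epsilon = 0 special case."""
    return abs(y_true - y_pred)

# Hypothetical porosity fractions, chosen only to show the tube effect.
inside_tube = epsilon_insensitive_loss(0.20, 0.21, epsilon=0.02)
outside_tube = epsilon_insensitive_loss(0.20, 0.25, epsilon=0.02)
absolute = least_modulus_loss(0.20, 0.25)
```

Ignoring residuals smaller than ɛ is what makes the SVR solution sparse: only samples on or outside the tube become support vectors.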
Devos, Stefanie; Cox, Bianca; van Lier, Tom; Nawrot, Tim S; Putman, Koen
2016-09-01
We used log-linear and log-log exposure-response (E-R) functions to model the association between PM2.5 exposure and non-elective hospitalizations for pneumonia, and estimated the attributable hospital costs by using the effect estimates obtained from both functions. We used hospital discharge data on 3519 non-elective pneumonia admissions from UZ Brussels between 2007 and 2012 and we combined a case-crossover design with distributed lag models. The annual averted pneumonia hospitalization costs for a reduction in PM2.5 exposure from the mean (21.4μg/m(3)) to the WHO guideline for annual mean PM2.5 (10μg/m(3)) were estimated and extrapolated for Belgium. Non-elective hospitalizations for pneumonia were significantly associated with PM2.5 exposure in both models. Using a log-linear E-R function, the estimated risk reduction for pneumonia hospitalization associated with a decrease in mean PM2.5 exposure to 10μg/m(3) was 4.9%. The corresponding estimate for the log-log model was 10.7%. These estimates translate to an annual pneumonia hospital cost saving in Belgium of €15.5 million and almost €34 million for the log-linear and log-log E-R function, respectively. Although further research is required to assess the shape of the association between PM2.5 exposure and pneumonia hospitalizations, we demonstrated that estimates for health effects and associated costs heavily depend on the assumed E-R function. These results are important for policy making, as supra-linear E-R associations imply that significant health benefits may still be obtained from additional pollution control measures in areas where PM levels have already been reduced. Copyright © 2016 Elsevier Ltd. All rights reserved.
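The contrast between the two exposure-response shapes can be made concrete with a sketch. It assumes the simplest single-coefficient forms, relative risk exp(β·Δx) for the log-linear function and (x_to / x_from)^β for the log-log function; the study's distributed lag models are more elaborate. The coefficients below are back-calculated from the reported 4.9% and 10.7% reductions purely for illustration and are not values taken from the paper.

```python
import math

MEAN_PM25 = 21.4      # reported mean exposure, ug/m3
WHO_GUIDELINE = 10.0  # WHO annual guideline cited in the abstract, ug/m3

def reduction_log_linear(beta, x_from, x_to):
    """Fractional risk reduction when log(risk) is linear in exposure."""
    return 1.0 - math.exp(beta * (x_to - x_from))

def reduction_log_log(beta, x_from, x_to):
    """Fractional risk reduction when log(risk) is linear in log(exposure)."""
    return 1.0 - (x_to / x_from) ** beta

# Illustrative coefficients implied by the reported reductions (an assumption).
beta_linear = -math.log(1.0 - 0.049) / (MEAN_PM25 - WHO_GUIDELINE)
beta_loglog = math.log(1.0 - 0.107) / math.log(WHO_GUIDELINE / MEAN_PM25)

# Hypothetical further step from 10 to 5 ug/m3 under each assumed shape.
further_linear = reduction_log_linear(beta_linear, 10.0, 5.0)
further_loglog = reduction_log_log(beta_loglog, 10.0, 5.0)
```

Under these assumptions the log-log form predicts a noticeably larger benefit from the further reduction than the log-linear form, which is the supra-linear behaviour the abstract highlights for already cleaner areas.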
Model synthesis in frequency analysis of Missouri floods
Hauth, Leland D.
1974-01-01
Synthetic flood records for 43 small-stream sites aided in definition of techniques for estimating the magnitude and frequency of floods in Missouri. The long-term synthetic flood records were generated by use of a digital computer model of the rainfall-runoff process. A relatively short period of concurrent rainfall and runoff data observed at each of the 43 sites was used to calibrate the model, and rainfall records covering from 66 to 78 years for four Missouri sites and pan-evaporation data were used to generate the synthetic records. Flood magnitude and frequency characteristics of both the synthetic records and observed long-term flood records available for 109 large-stream sites were used in a multiple-regression analysis to define relations for estimating future flood characteristics at ungaged sites. That analysis indicated that drainage basin size and slope were the most useful estimating variables. It also indicated that a more complex regression model than the commonly used log-linear one was needed for the range of drainage basin sizes available in this study.
Deformation-Aware Log-Linear Models
NASA Astrophysics Data System (ADS)
Gass, Tobias; Deselaers, Thomas; Ney, Hermann
In this paper, we present a novel deformation-aware discriminative model for handwritten digit recognition. Unlike previous approaches, our model directly considers image deformations and allows discriminative training of all parameters, including those accounting for non-linear transformations of the image. This is achieved by extending a log-linear framework to incorporate a latent deformation variable. The resulting model has an order of magnitude fewer parameters than competing approaches to handling image deformations. We tune and evaluate our approach on the USPS task and show its generalization capabilities by applying the tuned model to the MNIST task. We gain interesting insights and achieve highly competitive results on both tasks.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Model-based Bayesian inference for ROC data analysis
NASA Astrophysics Data System (ADS)
Lei, Tianhu; Bae, K. Ty
2013-03-01
This paper presents a study of model-based Bayesian inference for Receiver Operating Characteristic (ROC) data. The model is a simple version of the general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a covariate taking the two values zero and one to express binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by the Markov Chain Monte Carlo (MCMC) method carried out by Bayesian inference Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach considers model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, posterior distributions of parameters as well as the parameters themselves can be accurately estimated. MCMC-based BUGS adopts the Adaptive Rejection Sampling (ARS) protocol, which requires that the probability density function (pdf) from which samples are drawn be log concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log concave with respect to its scale parameter. Therefore, the ARS requirement is satisfied and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are assessed with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.
Yoon, Hyunjoo; Lee, Joo-Yeon; Suk, Hee-Jin; Lee, Sunah; Lee, Heeyoung; Lee, Soomin; Yoon, Yohan
2012-12-01
This study developed models to predict the growth probabilities and kinetic behavior of Salmonella enterica strains on cutting boards. Polyethylene coupons (3 by 5 cm) were rubbed with pork belly, and pork purge was then sprayed on the coupon surface, followed by inoculation of a five-strain Salmonella mixture onto the surface of the coupons. These coupons were stored at 13 to 35°C for 12 h, and total bacterial and Salmonella cell counts were enumerated on tryptic soy agar and xylose lysine deoxycholate (XLD) agar, respectively, every 2 h, which produced 56 combinations. The combinations that had growth of ≥0.5 log CFU/cm(2) of Salmonella bacteria recovered on XLD agar were given the value 1 (growth), and the combinations that had growth of <0.5 log CFU/cm(2) were assigned the value 0 (no growth). These growth response data from XLD agar were analyzed by logistic regression for producing growth/no growth interfaces of Salmonella bacteria. In addition, a linear model was fitted to the Salmonella cell counts to calculate the growth rate (log CFU per square centimeter per hour) and initial cell count (log CFU per square centimeter), following secondary modeling with the square root model. All of the models developed were validated with observed data, which were not used for model development. Growth of total bacteria and Salmonella cells was observed at 28, 30, 33, and 35°C, but there was no growth detected below 20°C within the time frame investigated. Moreover, various indices indicated that the performance of the developed models was acceptable. The results suggest that the models developed in this study may be useful in predicting the growth/no growth interface and kinetic behavior of Salmonella bacteria on polyethylene cutting boards.
NASA Astrophysics Data System (ADS)
Al-Mudhafar, W. J.
2013-12-01
Precise prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships used to estimate the properties in non-cored intervals. It also helps to accurately identify the spatial facies distribution to build an accurate reservoir model for optimal future reservoir performance. In this paper, facies estimation has been done through multinomial logistic regression (MLR) with respect to the well logs and core data in a well in the upper sandstone formation of the South Rumaila oil field. The independent variables are gamma ray, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability. First, the Robust Sequential Imputation Algorithm has been used to impute the missing data. This algorithm starts from a complete subset of the dataset and estimates sequentially the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix. Then, the observation is added to the complete data matrix and the algorithm continues with the next observation with missing values. The MLR has been chosen to estimate the maximum likelihood and minimize the standard error for the nonlinear relationships between facies and the core and log data. The MLR is used to predict the probabilities of the different possible facies given each independent variable by constructing a linear predictor function having a set of weights that are linearly combined with the independent variables by using a dot product. A beta distribution of facies has been considered as prior knowledge, and the resulting predicted probability (posterior) has been estimated from MLR based on Bayes' theorem, which represents the relationship between the predicted probability (posterior), the conditional probability, and the prior knowledge.
To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly drawing datasets with replacement from the training data. Each sample has the same size of the original training set and it can be conducted N times to produce N bootstrap datasets to re-fit the model accordingly to decrease the squared difference between the estimated and observed categorical variables (facies) leading to decrease the degree of uncertainty.
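The linear predictor described above, a dot product of weights and independent variables per facies followed by normalization into probabilities, can be sketched as follows. The softmax normalization is the standard multinomial logistic form; the weights, intercepts, facies count, and feature values are hypothetical and are not fitted to the South Rumaila data.

```python
import math

def facies_probabilities(weights, intercepts, features):
    """Multinomial logistic probabilities from one linear predictor per facies."""
    scores = [
        b + sum(w * x for w, x in zip(class_weights, features))
        for class_weights, b in zip(weights, intercepts)
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights for three facies and two standardized log readings.
example_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
example_intercepts = [0.0, 0.0, 0.0]
example_features = [2.0, 0.5]
probs = facies_probabilities(example_weights, example_intercepts, example_features)
predicted_facies = probs.index(max(probs))
```

A bootstrap assessment, as the abstract describes, would refit such weights on resampled training sets and compare predicted with observed facies; that step is omitted here.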
Breivik, Cathrine Nansdal; Nilsen, Roy Miodini; Myrseth, Erling; Pedersen, Paal Henning; Varughese, Jobin K; Chaudhry, Aqeel Asghar; Lund-Johansen, Morten
2013-07-01
There are few reports about the course of vestibular schwannoma (VS) patients following gamma knife radiosurgery (GKRS) compared with the course following conservative management (CM). In this study, we present prospectively collected data of 237 patients with unilateral VS extending outside the internal acoustic canal who received either GKRS (113) or CM (124). The aim was to measure the effect of GKRS compared with the natural course on tumor growth rate and hearing loss. Secondary end points were postinclusion additional treatment, quality of life (QoL), and symptom development. The patients underwent magnetic resonance imaging scans, clinical examination, and QoL assessment by SF-36 questionnaire. Statistics were performed by using Spearman correlation coefficient, Kaplan-Meier plot, Poisson regression model, mixed linear regression models, and mixed logistic regression models. Mean follow-up time was 55.0 months (26.1 standard deviation, range 10-132). Thirteen patients were lost to follow-up. Serviceable hearing was lost in 54 of 71 (76%) (CM) and 34 of 53 (64%) (GKRS) patients during the study period (not significant, log-rank test). There was a significant reduction in tumor volume over time in the GKRS group. The need for treatment following initial GKRS or CM differed at highly significant levels (log-rank test, P < .001). Symptom and QoL development did not differ significantly between the groups. In VS patients, GKRS reduces the tumor growth rate and thereby the incidence rate of new treatment about tenfold. Hearing is lost at similar rates in both groups. Symptoms and QoL seem not to be significantly affected by GKRS.
Ding, H; Chen, C; Zhang, X
2016-01-01
The linear solvation energy relationship (LSER) was applied to predict the adsorption coefficient (K) of synthetic organic compounds (SOCs) on single-walled carbon nanotubes (SWCNTs). A total of 40 log K values were used to develop and validate the LSER model. The adsorption data for 34 SOCs were collected from 13 published articles and the other six were obtained in our experiment. The optimal model composed of four descriptors was developed by a stepwise multiple linear regression (MLR) method. The adjusted r(2) (r(2)adj) and root mean square error (RMSE) were 0.84 and 0.49, respectively, indicating good fitness. The leave-one-out cross-validation Q(2) was 0.79, suggesting the robustness of the model was satisfactory. The external Q(2) and RMSE (RMSEext) were 0.72 and 0.50, respectively, showing the model's strong predictive ability. Hydrogen bond donating interaction (bB) and cavity formation and dispersion interactions (vV) stood out as the two most influential factors controlling the adsorption of SOCs onto SWCNTs. The equilibrium concentration would affect the fitness and predictive ability of the model, while the coefficients varied slightly.
LAS bioconcentration is isomer specific
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tolls, J.; Haller, M.; Graaf, I. de
1995-12-31
The authors measured parent compound specific bioconcentration data for linear alkylbenzene sulfonates in Pimephales promelas. They did so by using cold, custom synthesized sulfophenyl alkanes. They observed that, within homologous series of isomers, the uptake rate constants (k{sub 1}) and the bioconcentration factor (BCF) increase with increasing number of carbon atoms in the alkyl chain (n{sub C-atoms}). In contrast, the elimination rate constant k{sub 2} appears to be independent of the alkyl chain length. Regressions of log BCF vs n{sub C-atoms} yielded different slopes for the homologous groups of the 5- and the 2-sulfophenyl alkane isomers. Regression of all log BCF-data vs log 1/CMC yielded a good description of the data. However, when regressing the data for both homologous series separately again very different slopes are obtained. The results therefore indicate that hydrophobicity-bioconcentration relationships may be different for different homologous groups of sulfophenyl alkanes.
Kinetics of hydrogen peroxide decomposition by catalase: hydroxylic solvent effects.
Raducan, Adina; Cantemir, Anca Ruxandra; Puiu, Mihaela; Oancea, Dumitru
2012-11-01
The effect of water-alcohol (methanol, ethanol, propan-1-ol, propan-2-ol, ethane-1,2-diol and propane-1,2,3-triol) binary mixtures on the kinetics of hydrogen peroxide decomposition in the presence of bovine liver catalase is investigated. In all solvents, the activity of catalase is smaller than in water. The results are discussed on the basis of a simple kinetic model. The kinetic constants for product formation through enzyme-substrate complex decomposition and for inactivation of catalase are estimated. The organic solvents are characterized by several physical properties: dielectric constant (D), hydrophobicity (log P), concentration of hydroxyl groups ([OH]), polarizability (α), Kamlet-Taft parameter (β) and Kosower parameter (Z). The relationships between the initial rate, kinetic constants and medium properties are analyzed by linear and multiple linear regression.
Statistical considerations in the analysis of data from replicated bioassays
USDA-ARS's Scientific Manuscript database
Multiple-dose bioassay is generally the preferred method for characterizing virulence of insect pathogens. Linear regression of probit mortality on log dose enables estimation of LD50/LC50 and slope, the latter having substantial effect on LD90/95s (doses of considerable interest in pest management)...
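To illustrate why the slope matters for LD90/95 estimates, the following minimal Python sketch computes lethal doses from a probit line on log10 dose. The function name `lethal_dose` and the numeric inputs are hypothetical and do not come from the manuscript; the sketch assumes the probit is expressed on the standard normal scale (the classical "probit + 5" convention gives the same doses).

```python
from statistics import NormalDist

def lethal_dose(p: float, log10_ld50: float, slope: float) -> float:
    """Dose expected to kill a fraction p, given a probit line on log10(dose).

    The line is written as z = slope * (log10(dose) - log10_ld50),
    where z is the standard normal quantile of the mortality fraction.
    """
    z = NormalDist().inv_cdf(p)  # z = 0 at p = 0.5
    return 10 ** (log10_ld50 + z / slope)

if __name__ == "__main__":
    # Illustrative values only: same LD50 (10 units), two different slopes.
    for slope in (1.0, 2.0):
        ld50 = lethal_dose(0.50, 1.0, slope)
        ld90 = lethal_dose(0.90, 1.0, slope)
        print(f"slope={slope}: LD50={ld50:.1f}, LD90={ld90:.1f}")
```

With the LD50 held fixed, a shallower slope pushes the LD90 upward, which is the effect the abstract alludes to.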
Tiwari, Anjani K; Ojha, Himanshu; Kaul, Ankur; Dutta, Anupama; Srivastava, Pooja; Shukla, Gauri; Srivastava, Rakesh; Mishra, Anil K
2009-07-01
Nuclear magnetic resonance imaging is a very useful tool in modern medical diagnostics, especially when gadolinium (III)-based contrast agents are administered to the patient with the aim of increasing the image contrast between normal and diseased tissues. Using soft modelling techniques such as quantitative structure-activity relationship/quantitative structure-property relationship analysis, after a suitable description of their molecular structure, we have studied a series of phosphonic acids for designing new MRI contrast agents. Quantitative structure-property relationship studies with multiple linear regression analysis were applied to find correlations between different calculated molecular descriptors of the phosphonic acid-based chelating agents and their stability constants. The final quantitative structure-property relationship models for the phosphonic acid series were: Model 1: log K(ML) = 5.00243 (±0.7102) − 0.0263 (±0.540) MR; n = 12, |r| = 0.942, s = 0.183, F = 99.165. Model 2: log K(ML) = 5.06280 (±0.3418) − 0.0252 (±0.198) MR; n = 12, |r| = 0.956, s = 0.186, F = 99.256.
Hoffman, Jennifer C.; Anton, Peter A.; Baldwin, Gayle Cocita; Elliott, Julie; Anisman-Posner, Deborah; Tanner, Karen; Grogan, Tristan; Elashoff, David; Sugar, Catherine; Yang, Otto O.
2014-01-01
Seminal plasma HIV-1 RNA level is an important determinant of the risk of HIV-1 sexual transmission. We investigated potential associations between seminal plasma cytokine levels and viral concentration in the seminal plasma of HIV-1-infected men. This was a prospective, observational study of paired blood and semen samples from 18 HIV-1 chronically infected men off antiretroviral therapy. HIV-1 RNA levels and cytokine levels in seminal plasma and blood plasma were measured and analyzed using simple linear regressions to screen for associations between cytokines and seminal plasma HIV-1 levels. Forward stepwise regression was performed to construct the final multivariate model. The median HIV-1 RNA concentrations were 4.42 log10 copies/ml (IQR 2.98, 4.70) and 2.96 log10 copies/ml (IQR 2, 4.18) in blood and seminal plasma, respectively. In stepwise multivariate linear regression analysis, blood HIV-1 RNA level (p<0.0001) was most strongly associated with seminal plasma HIV-1 RNA level. After controlling for blood HIV-1 RNA level, seminal plasma HIV-1 RNA level was positively associated with interferon (IFN)-γ (p=0.03) and interleukin (IL)-17 (p=0.03) and negatively associated with IL-5 (p=0.0007) in seminal plasma. In addition to blood HIV-1 RNA level, cytokine profiles in the male genital tract are associated with HIV-1 RNA levels in semen. The Th1 and Th17 cytokines IFN-γ and IL-17 are associated with increased seminal plasma HIV-1 RNA, while the Th2 cytokine IL-5 is associated with decreased seminal plasma HIV-1 RNA. These results support the importance of genital tract immunomodulation in HIV-1 transmission. PMID:25209674
Factors relating to windblown dust in associations between ...
Introduction: In effect estimates of city-specific PM2.5-mortality associations across United States (US), there exists a substantial amount of spatial heterogeneity. Some of this heterogeneity may be due to mass distribution of PM; areas where PM2.5 is likely to be dominated by large size fractions (above 1 micron; e.g., the contribution of windblown dust), may have a weaker association with mortality. Methods: Log rate ratios (betas) for the PM2.5-mortality association—derived from a model adjusting for time, an interaction with age-group, day of week, and natural splines of current temperature, current dew point, and unconstrained temperature at lags 1, 2, and 3, for 313 core-based statistical areas (CBSA) and their metropolitan divisions (MD) over 1999-2005—were used as the outcome. Using inverse variance weighted linear regression, we examined change in log rate ratios in association with PM10-PM2.5 correlation as a marker of windblown dust/higher PM size fraction; linearity of associations was assessed in models using splines with knots at quintile values. Results: Weighted mean PM2.5 association (0.96 percent increase in total non-accidental mortality for a 10 ug/m3 increment in PM2.5) increased by 0.34 (95% confidence interval: 0.20, 0.48) per interquartile change (0.25) in the PM10-PM2.5 correlation, and explained approximately 8% of the observed heterogeneity; the association was linear based on spline analysis. Conclusions: Preliminary results pro
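The inverse variance weighted linear regression mentioned in the Methods can be sketched in a few lines of Python. This is only an illustration under simple assumptions (one predictor, weights equal to 1/SE²); the function name `ivw_linear_fit` and the numbers are made up and are not the study's data or code.

```python
def ivw_linear_fit(x, y, se):
    """Weighted least squares fit of y = intercept + slope * x.

    Each observation is weighted by the inverse of its variance (1 / se**2),
    so precisely estimated betas count more than noisy ones.
    """
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

if __name__ == "__main__":
    # Hypothetical area-level inputs: PM10-PM2.5 correlation, beta, and its SE.
    corr = [0.1, 0.3, 0.5, 0.7]
    beta = [0.62, 0.86, 1.10, 1.34]
    se = [0.20, 0.10, 0.15, 0.30]
    print(ivw_linear_fit(corr, beta, se))
```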
Analysing Twitter and web queries for flu trend prediction.
Santos, José Carlos; Matos, Sérgio
2014-05-07
Social media platforms encourage people to share diverse aspects of their daily life. Among these, shared health related information might be used to infer health status and incidence rates for specific conditions or symptoms. In this work, we present an infodemiology study that evaluates the use of Twitter messages and search engine query logs to estimate and predict the incidence rate of influenza like illness in Portugal. Based on a manually classified dataset of 2704 tweets from Portugal, we selected a set of 650 textual features to train a Naïve Bayes classifier to identify tweets mentioning flu or flu-like illness or symptoms. We obtained a precision of 0.78 and an F-measure of 0.83, based on cross validation over the complete annotated set. Furthermore, we trained a multiple linear regression model to estimate the health-monitoring data from the Influenzanet project, using as predictors the relative frequencies obtained from the tweet classification results and from query logs, and achieved a correlation ratio of 0.89 (p<0.001). These classification and regression models were also applied to estimate the flu incidence in the following flu season, achieving a correlation of 0.72. Previous studies addressing the estimation of disease incidence based on user-generated content have mostly focused on the English language. Our results further validate those studies and show that by changing the initial steps of data preprocessing and feature extraction and selection, the proposed approaches can be adapted to other languages. Additionally, we investigated whether the predictive model created can be applied to data from the subsequent flu season. In this case, although the prediction result was good, an initial phase to adapt the regression model could be necessary to achieve more robust results.
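The abstract reports precision and F-measure but not recall. As a small worked example, the Python sketch below back-calculates the recall these two figures would imply, assuming the F-measure is the balanced F1 score computed on the same basis as the precision; that assumption and the function name `implied_recall` are not from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R."""
    return f1 * precision / (2 * precision - f1)

if __name__ == "__main__":
    r = implied_recall(0.78, 0.83)
    print(f"implied recall ~ {r:.3f}")
    print(f"round-trip F1  ~ {f1_score(0.78, r):.3f}")
```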
Morikawa, Go; Suzuka, Chihiro; Shoji, Atsushi; Shibusawa, Yoichi; Yanagida, Akio
2016-01-05
A high-throughput method for determining the octanol/water partition coefficient (P(o/w)) of a large variety of compounds exhibiting a wide range in hydrophobicity was established. The method combines a simple shake-flask method with a novel two-phase solvent system comprising an acetonitrile-phosphate buffer (0.1 M, pH 7.4)-1-octanol (25:25:4, v/v/v; AN system). The AN system partition coefficients (K(AN)) of 51 standard compounds for which log P(o/w) (at pH 7.4; log D) values had been reported were determined by single two-phase partitioning in test tubes, followed by measurement of the solute concentration in both phases using an automatic flow injection-ultraviolet detection system. The log K(AN) values were closely related to reported log D values, and the relationship could be expressed by the following linear regression equation: log D=2.8630 log K(AN) -0.1497(n=51). The relationship reveals that log D values (+8 to -8) for a large variety of highly hydrophobic and/or hydrophilic compounds can be estimated indirectly from the narrow range of log K(AN) values (+3 to -3) determined using the present method. Furthermore, log K(AN) values for highly polar compounds for which no log D values have been reported, such as amino acids, peptides, proteins, nucleosides, and nucleotides, can be estimated using the present method. The wide-ranging log D values (+5.9 to -7.5) of these molecules were estimated for the first time from their log K(AN) values and the above regression equation. Copyright © 2015 Elsevier B.V. All rights reserved.
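As a minimal sketch of how the reported regression line could be applied, the Python snippet below converts a measured log K(AN) value into an estimated log D. The slope and intercept are taken from the abstract; the function name `estimate_log_d` is hypothetical.

```python
SLOPE = 2.8630       # slope of the regression reported in the abstract
INTERCEPT = -0.1497  # intercept of the regression reported in the abstract

def estimate_log_d(log_k_an: float) -> float:
    """Estimate log D (pH 7.4) from a measured log K(AN) value."""
    return SLOPE * log_k_an + INTERCEPT

if __name__ == "__main__":
    # The narrow log K(AN) range of about -3 to +3 maps onto a much wider log D range.
    for log_k in (-3.0, 0.0, 3.0):
        print(f"log K(AN) = {log_k:+.1f} -> log D ~ {estimate_log_d(log_k):+.2f}")
```

Because the slope is close to 3, a span of about six log units in K(AN) stretches to roughly seventeen log units in D, consistent with the ranges quoted above.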
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Arnould, V M-R; Hammami, H; Soyeurt, H; Gengler, N
2010-09-01
Random regression test-day models using Legendre polynomials are commonly used for the estimation of genetic parameters and genetic evaluation for test-day milk production traits. However, some researchers have reported that these models present some undesirable properties such as the overestimation of variances at the edges of lactation. Describing genetic variation of saturated fatty acids expressed in milk fat might require the testing of different models. Therefore, 3 different functions were used and compared to take into account the lactation curve: (1) Legendre polynomials with the same order as currently applied in the genetic model for production traits; (2) linear splines with 10 knots; and (3) linear splines with the same 10 knots reduced to 3 parameters. The criteria used were Akaike's information and Bayesian information criteria, percentage square biases, and log-likelihood function. These criteria identified the Legendre polynomial model and the model using linear splines with 10 knots reduced to 3 parameters as the most useful. Reducing more complex models using eigenvalues seemed appealing because the resulting models are less time demanding and can reduce convergence difficulties, as convergence properties also seemed to be improved. Finally, the results showed that the reduced spline model was very similar to the Legendre polynomials model. Copyright (c) 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Daniel A. Yaussy
1989-01-01
Multivariate regression models were developed to predict green board-foot yields (1 board ft. = 2.360 dm³) for the standard factory lumber grades processed from black cherry (Prunus serotina Ehrh.) and red maple (Acer rubrum L.) factory grade logs sawed at band and circular sawmills. The models use log...
Marrero-Ponce, Yovani; Martínez-Albelo, Eugenio R; Casañola-Martín, Gerardo M; Castillo-Garit, Juan A; Echevería-Díaz, Yunaimy; Zaldivar, Vicente Romero; Tygat, Jan; Borges, José E Rodriguez; García-Domenech, Ramón; Torrens, Francisco; Pérez-Giménez, Facundo
2010-11-01
Novel bond-level molecular descriptors are proposed, based on linear maps similar to the ones defined in algebra theory. The kth edge-adjacency matrix (E(k)) denotes the matrix of bond linear indices (non-stochastic) with regard to the canonical basis set. The kth stochastic edge-adjacency matrix, ES(k), is here proposed as a new molecular representation easily calculated from E(k). Then, the kth stochastic bond linear indices are calculated using ES(k) as operators of linear transformations. In both cases, the bond-type formalism is developed. The kth non-stochastic and stochastic total linear indices are calculated by adding the kth non-stochastic and stochastic bond linear indices, respectively, of all bonds in the molecule. First, the new bond-based molecular descriptors (MDs) are tested for suitability for QSPRs by analyzing regressions of the novel indices for selected physicochemical properties of octane isomers (first round). General performance of the new descriptors in these QSPR studies is evaluated with regard to the well-known sets of 2D/3D MDs. From the analysis, we can conclude that the non-stochastic and stochastic bond-based linear indices have an overall good modeling capability, proving their usefulness in QSPR studies. Later, the novel bond-level MDs are also used for the description and prediction of the boiling point of 28 alkyl-alcohols (second round), and for the modeling of the specific rate constant (log k), partition coefficient (log P), as well as the antibacterial activity of 34 derivatives of 2-furylethylenes (third round). The comparison with other approaches (edge- and vertices-based connectivity indices, total and local spectral moments, and quantum chemical descriptors as well as E-state/biomolecular encounter parameters) shows that our method performs well in these QSPR studies. Finally, the approach described in this study appears to be a very promising structural invariant, useful not only for QSPR studies but also for similarity/diversity analysis and drug discovery protocols.
Caraviello, D Z; Weigel, K A; Gianola, D
2004-05-01
Predicted transmitting abilities (PTA) of US Jersey sires for daughter longevity were calculated using a Weibull proportional hazards sire model and compared with predictions from a conventional linear animal model. Culling data from 268,008 Jersey cows with first calving from 1981 to 2000 were used. The proportional hazards model included time-dependent effects of herd-year-season contemporary group and parity by stage of lactation interaction, as well as time-independent effects of sire and age at first calving. Sire variances and parameters of the Weibull distribution were estimated, providing heritability estimates of 4.7% on the log scale and 18.0% on the original scale. The PTA of each sire was expressed as the expected risk of culling relative to daughters of an average sire. Risk ratios (RR) ranged from 0.7 to 1.3, indicating that the risk of culling for daughters of the best sires was 30% lower than for daughters of average sires and nearly 50% lower than for daughters of the poorest sires. Sire PTA from the proportional hazards model were compared with PTA from a linear model similar to that used for routine national genetic evaluation of length of productive life (PL) using cross-validation in independent samples of herds. Models were compared using logistic regression of daughters' stayability to second, third, fourth, or fifth lactation on their sires' PTA values, with alternative approaches for weighting the contribution of each sire. Models were also compared using logistic regression of daughters' stayability to 36, 48, 60, 72, and 84 mo of life. The proportional hazards model generally yielded more accurate predictions according to these criteria, but differences in predictive ability between methods were smaller when using a Kullback-Leibler distance than with other approaches. Results of this study suggest that survival analysis methodology may provide more accurate predictions of genetic merit for longevity than conventional linear models.
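The percentages quoted for the risk ratios follow from simple arithmetic, sketched here in Python. The function name `relative_risk_reduction` is hypothetical; the only inputs taken from the abstract are the RR extremes of 0.7 and 1.3 and the average-sire reference of 1.0.

```python
def relative_risk_reduction(rr_group: float, rr_reference: float) -> float:
    """Fractional reduction in culling risk of one group relative to another."""
    return 1.0 - rr_group / rr_reference

if __name__ == "__main__":
    best, average, poorest = 0.7, 1.0, 1.3
    print(f"best vs average sires: {relative_risk_reduction(best, average):.0%} lower")
    print(f"best vs poorest sires: {relative_risk_reduction(best, poorest):.0%} lower")
```

The second figure comes out at about 46%, which the abstract rounds to "nearly 50%".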
Nirouei, Mahyar; Ghasemi, Ghasem; Abdolmaleki, Parviz; Tavakoli, Abdolreza; Shariati, Shahab
2012-06-01
The antiviral drugs that inhibit human immunodeficiency virus (HIV) entry to the target cells are already in different phases of clinical trials. They prevent viral entry and have a highly specific mechanism of action with a low toxicity profile. Few QSAR studies have been performed on this group of inhibitors. This study was performed to develop a quantitative structure-activity relationship (QSAR) model of the biological activity of indole glyoxamide derivatives as inhibitors of the interaction between HIV glycoprotein gp120 and host cell CD4 receptors. Forty different indole glyoxamide derivatives were selected as a sample set and geometrically optimized using Gaussian 98W. Different combinations of multiple linear regression (MLR), genetic algorithms (GA) and artificial neural networks (ANN) were then utilized to construct the QSAR models. These models were also utilized to select the most efficient subsets of descriptors in a cross-validation procedure for non-linear log (1/EC50) prediction. The results that were obtained using GA-ANN were compared with MLR-MLR and MLR-ANN models. A high predictive ability was observed for the MLR, MLR-ANN and GA-ANN models, with root mean sum square errors (RMSE) of 0.99, 0.91 and 0.67, respectively (N = 40). In summary, machine learning methods were highly effective in designing QSAR models when compared to statistical methods.
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model that combines a multiple linear regression model with the fuzzy c-means method. This research examined the relationship between paddy yield at standard fertilizer rates and 20 variates of the top soil analyzed prior to planting. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally distributed without multicollinearity among the independent variables. Fuzzy c-means analysis clustered the paddy yield into two clusters before the multiple linear regression model was applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model, with a lower mean square error.
An hourly regression model for ultrafine particles in a near-highway urban area
Patton, Allison P.; Collins, Caitlin; Naumova, Elena N.; Zamore, Wig; Brugge, Doug; Durant, John L.
2015-01-01
Estimating ultrafine particle number concentrations (PNC) near highways for exposure assessment in chronic health studies requires models capable of capturing PNC spatial and temporal variations over the course of a full year. The objectives of this work were to describe the relationship between near-highway PNC and potential predictors, and to build and validate hourly log-linear regression models. PNC was measured near Interstate 93 (I-93) in Somerville, MA (USA) using a mobile monitoring platform driven for 234 hours on 43 days between August 2009 and September 2010. Compared to urban background, PNC levels were consistently elevated within 100–200 m of I-93, with gradients impacted by meteorological and traffic conditions. Temporal and spatial variables including wind speed and direction, temperature, highway traffic, and distance to I-93 and major roads contributed significantly to the full regression model. Cross-validated model R2 values ranged from 0.38–0.47, with higher values achieved (0.43–0.53) when short-duration PNC spikes were removed. The model predicts highest PNC near major roads and on cold days with low wind speeds. The model allows estimation of hourly ambient PNC at 20-m resolution in a near-highway neighborhood. PMID:24559198
The association of genetic variants of type 2 diabetes with kidney function.
Franceschini, Nora; Shara, Nawar M; Wang, Hong; Voruganti, V Saroja; Laston, Sandy; Haack, Karin; Lee, Elisa T; Best, Lyle G; Maccluer, Jean W; Cochran, Barbara J; Dyer, Thomas D; Howard, Barbara V; Cole, Shelley A; North, Kari E; Umans, Jason G
2012-07-01
Type 2 diabetes is highly prevalent and is the major cause of progressive chronic kidney disease in American Indians. Genome-wide association studies identified several loci associated with diabetes but their impact on susceptibility to diabetic complications is unknown. We studied the association of 18 type 2 diabetes genome-wide association single-nucleotide polymorphisms (SNPs) with estimated glomerular filtration rate (eGFR; MDRD equation) and urine albumin-to-creatinine ratio in 6958 Strong Heart Study family and cohort participants. Center-specific residuals of eGFR and log urine albumin-to-creatinine ratio, obtained from linear regression models adjusted for age, sex, and body mass index, were regressed onto SNP dosage using variance component models in family data and linear regression in unrelated individuals. Estimates were then combined across centers. Four diabetic loci were associated with eGFR and one locus with urine albumin-to-creatinine ratio. A SNP in the WFS1 gene (rs10010131) was associated with higher eGFR in younger individuals and with increased albuminuria. SNPs in the FTO, KCNJ11, and TCF7L2 genes were associated with lower eGFR, but not albuminuria, and were not significant in prospective analyses. Our findings suggest a shared genetic risk for type 2 diabetes and its kidney complications, and a potential role for WFS1 in early-onset diabetic nephropathy in American Indian populations.
Excess adiposity, inflammation, and iron-deficiency in female adolescents.
Tussing-Humphreys, Lisa M; Liang, Huifang; Nemeth, Elizabeta; Freels, Sally; Braunschweig, Carol A
2009-02-01
Iron deficiency is more prevalent in overweight children and adolescents but the mechanisms that underlie this condition remain unclear. The purpose of this cross-sectional study was to assess the relationship between iron status and excess adiposity, inflammation, menarche, diet, physical activity, and poverty status in female adolescents included in the National Health and Nutrition Examination Survey 2003-2004 dataset. Descriptive and simple comparative statistics (t test, chi-square) were used to assess differences between normal-weight (5th ≤ body mass index [BMI] percentile < 85th) and heavier-weight girls (≥ 85th percentile for BMI) for demographic, biochemical, dietary, and physical activity variables. In addition, logistic regression analyses predicting iron deficiency and linear regression predicting serum iron levels were performed. Heavier-weight girls had an increased prevalence of iron deficiency compared to those with normal weight. Dietary iron, age of and time since first menarche, poverty status, and physical activity were similar between the two groups and were not independent predictors of iron deficiency or log serum iron levels. Logistic modeling revealed that the odds of iron deficiency more than doubled with a BMI ≥ 85th percentile and with each 1 mg/dL increase in C-reactive protein. The best-fit linear model to predict serum iron levels included both serum transferrin receptor and C-reactive protein following log-transformation for normalization of these variables. Findings indicate that heavier-weight female adolescents are at greater risk for iron deficiency and that inflammation stemming from excess adipose tissue contributes to this phenomenon. Food and nutrition professionals should consider elevated BMI as an additional risk factor for iron deficiency in female adolescents.
Spatial Bayesian Latent Factor Regression Modeling of Coordinate-based Meta-analysis Data
Montagna, Silvia; Wager, Tor; Barrett, Lisa Feldman; Johnson, Timothy D.; Nichols, Thomas E.
2017-01-01
Now over 20 years old, functional MRI (fMRI) has a large and growing literature that is best synthesised with meta-analytic tools. As most authors do not share image data, only the peak activation coordinates (foci) reported in the paper are available for Coordinate-Based Meta-Analysis (CBMA). Neuroimaging meta-analysis is used to 1) identify areas of consistent activation; and 2) build a predictive model of task type or cognitive process for new studies (reverse inference). To simultaneously address these aims, we propose a Bayesian point process hierarchical model for CBMA. We model the foci from each study as a doubly stochastic Poisson process, where the study-specific log intensity function is characterised as a linear combination of a high-dimensional basis set. A sparse representation of the intensities is guaranteed through latent factor modeling of the basis coefficients. Within our framework, it is also possible to account for the effect of study-level covariates (meta-regression), significantly expanding the capabilities of the current neuroimaging meta-analysis methods available. We apply our methodology to synthetic data and neuroimaging meta-analysis datasets. PMID:28498564
Lawrence, Stephen J.
2012-01-01
Regression analyses show that E. coli density in samples was strongly related to turbidity, streamflow characteristics, and season at both sites. The regression equation chosen for the Norcross data showed that 78 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), streamflow event (dry-weather flow or stormflow), season (cool or warm), and an interaction term that is the cross product of streamflow event and turbidity. The regression equation chosen for the Atlanta data showed that 76 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), water temperature, streamflow event, and an interaction term that is the cross product of streamflow event and turbidity. Residual analysis and model confirmation using new data indicated the regression equations selected at both sites predicted E. coli density within the 90 percent prediction intervals of the equations and could be used to predict E. coli density in real time at both sites.
The relative toxic response of 27 selected phenols in the 96-hr acute flowthrough Pimephales promelas (fathead minnow) and the 48- to 60-hr chronic static Tetrahymena pyriformis (ciliate protozoan) test systems was evaluated. Log Kow-dependent linear regression analyses revealed ...
Microbial Transformation of Esters of Chlorinated Carboxylic Acids
Paris, D. F.; Wolfe, N. L.; Steen, W. C.
1984-01-01
Two groups of compounds were selected for microbial transformation studies. In the first group were carboxylic acid esters having a fixed aromatic moiety and an increasing length of the alkyl component. Ethyl esters of chlorine-substituted carboxylic acids were in the second group. Microorganisms from environmental waters and a pure culture of Pseudomonas putida U were used. The bacterial populations were monitored by plate counts, and disappearance of the parent compound was followed by gas-liquid chromatography as a function of time. The products of microbial hydrolysis were the respective carboxylic acids. Octanol-water partition coefficients (Kow) for the compounds were measured. These values spanned three orders of magnitude, whereas microbial transformation rate constants (kb) varied only 50-fold. The microbial rate constants of the carboxylic acid esters with a fixed aromatic moiety increased with an increasing length of alkyl substituents. The regression coefficient for the linear relationships between log kb and log Kow was high for group 1 compounds, indicating that these parameters correlated well. The regression coefficient for the linear relationships for group 2 compounds, however, was low, indicating that these parameters correlated poorly. PMID:16346459
NASA Astrophysics Data System (ADS)
Alam, N. M.; Sharma, G. C.; Moreira, Elsa; Jana, C.; Mishra, P. K.; Sharma, N. K.; Mandal, D.
2017-08-01
Markov chain and 3-dimensional log-linear models were used to model drought class transitions derived from a newly developed drought index, the Standardized Precipitation Evapotranspiration Index (SPEI), at a 12-month time scale for six major drought-prone areas of India. A log-linear modelling approach was used to investigate differences in drought class transitions using SPEI-12 time series derived from 48 years of monthly rainfall and temperature data. In this study, the probabilities of drought class transition, the mean residence time, the 1, 2 or 3 months ahead prediction of average transition time between drought classes and the drought severity class have been derived. Seasonality of precipitation has been derived for non-homogeneous Markov chains, which could be used to explain the effect of the potential retreat of drought. Quasi-association and quasi-symmetry log-linear models have been fitted to the drought class transitions derived from the SPEI-12 time series. The estimates of odds along with their confidence intervals were obtained to explain the progression of drought and the estimation of drought class transition probabilities. For the initial months, the calculated odds are lower as drought severity increases, and the odds decrease for the succeeding months. This indicates that the ratio of expected frequencies of transition from a drought class to the non-drought class decreases, compared with transition to any drought class, when the drought severity of the present class increases. From the 3-dimensional log-linear model it is clear that during the last 24 years the drought probability has increased for almost all of the six regions. The findings from the present study will help to assess the impact of drought on gross primary production and to develop future contingency planning in similar regions worldwide.
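The Markov chain quantities named in this abstract can be illustrated with a short Python sketch. It assumes a homogeneous first-order chain with monthly steps, for which the mean residence time in a class is 1 / (1 - p_stay); the class labels and the sequence are invented for illustration and are not SPEI data, and the function names are hypothetical.

```python
from collections import Counter

def transition_matrix(states, classes):
    """Estimate first-order transition probabilities from a class sequence."""
    counts = Counter(zip(states[:-1], states[1:]))  # consecutive month pairs
    matrix = {}
    for a in classes:
        row_total = sum(counts[(a, b)] for b in classes)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in classes
        }
    return matrix

def mean_residence_time(p_stay: float) -> float:
    """Expected number of consecutive steps spent in a class (geometric sojourn)."""
    return float("inf") if p_stay >= 1.0 else 1.0 / (1.0 - p_stay)

if __name__ == "__main__":
    # "N" = non-drought, "D" = drought; a made-up monthly sequence.
    seq = ["N", "N", "N", "D", "D", "N", "N", "D", "N"]
    P = transition_matrix(seq, ["N", "D"])
    for c in ("N", "D"):
        print(c, P[c], f"residence ~ {mean_residence_time(P[c][c]):.2f} months")
```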
Spreco, A; Eriksson, O; Dahlström, Ö; Timpka, T
2017-07-01
Methods for the detection of influenza epidemics and prediction of their progress have seldom been comparatively evaluated using prospective designs. This study aimed to perform a prospective comparative trial of algorithms for the detection and prediction of increased local influenza activity. Data on clinical influenza diagnoses recorded by physicians and syndromic data from a telenursing service were used. Five detection and three prediction algorithms previously evaluated in public health settings were calibrated and then evaluated over 3 years. When applied to diagnostic data, only detection using the Serfling regression method and prediction using the non-adaptive log-linear regression method showed acceptable performance during winter influenza seasons. For the syndromic data, none of the detection algorithms displayed a satisfactory performance, while non-adaptive log-linear regression was the best performing prediction method. We conclude that available algorithms for influenza detection and prediction display satisfactory performance when applied to local diagnostic data during winter influenza seasons. When applied to local syndromic data, the evaluated algorithms did not display consistent performance. Further evaluations and research on combinations of methods of these types in public health information infrastructures for 'nowcasting' (integrated detection and prediction) of influenza activity are warranted.
Fatigue shifts and scatters heart rate variability in elite endurance athletes.
Schmitt, Laurent; Regnard, Jacques; Desmarets, Maxime; Mauny, Fréderic; Mourot, Laurent; Fouillot, Jean-Pierre; Coulmy, Nicolas; Millet, Grégoire
2013-01-01
This longitudinal study aimed at comparing heart rate variability (HRV) in elite athletes identified either in 'fatigue' or in 'no-fatigue' state in 'real life' conditions. 57 elite Nordic-skiers were surveyed over 4 years. R-R intervals were recorded supine (SU) and standing (ST). A fatigue state was quoted with a validated questionnaire. A multilevel linear regression model was used to analyze relationships between heart rate (HR) and HRV descriptors [total spectral power (TP), power in low (LF) and high frequency (HF) ranges expressed in ms(2) and normalized units (nu)] and the status without and with fatigue. The variables not distributed normally were transformed by taking their common logarithm (log10). 172 trials were identified as in a 'fatigue' and 891 as in 'no-fatigue' state. All supine HR and HRV parameters (Beta±SE) were significantly different (P<0.0001) between 'fatigue' and 'no-fatigue': HRSU (+6.27±0.61 bpm), logTPSU (-0.36±0.04), logLFSU (-0.27±0.04), logHFSU (-0.46±0.05), logLF/HFSU (+0.19±0.03), HFSU(nu) (-9.55±1.33). Differences were also significant (P<0.0001) in standing: HRST (+8.83±0.89), logTPST (-0.28±0.03), logLFST (-0.29±0.03), logHFST (-0.32±0.04). Also, intra-individual variance of HRV parameters was larger (P<0.05) in the 'fatigue' state (logTPSU: 0.26 vs. 0.07, logLFSU: 0.28 vs. 0.11, logHFSU: 0.32 vs. 0.08, logTPST: 0.13 vs. 0.07, logLFST: 0.16 vs. 0.07, logHFST: 0.25 vs. 0.14). HRV was significantly lower in 'fatigue' vs. 'no-fatigue' but accompanied with larger intra-individual variance of HRV parameters in 'fatigue'. The broader intra-individual variance of HRV parameters might encompass different changes from no-fatigue state, possibly reflecting different fatigue-induced alterations of HRV pattern.
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
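The linear route mentioned above can be sketched for Ho's pseudo-second-order expression, q_t = k·q_e²·t / (1 + k·q_e·t), whose common linearized form is t/q_t = 1/(k·q_e²) + t/q_e. The data below are hypothetical and noise-free (they are not the safranin measurements), and the helper names are illustrative.

```python
def pso_uptake(t, qe, k):
    """Ho's pseudo-second-order model: q_t = k*qe^2*t / (1 + k*qe*t)."""
    return k * qe ** 2 * t / (1.0 + k * qe * t)


def fit_linearized(times, uptakes):
    """Least-squares fit of t/q_t = 1/(k*qe^2) + t/qe; returns (qe, k)."""
    xs = list(times)
    ys = [t / q for t, q in zip(times, uptakes)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # slope = 1/qe
    intercept = mean_y - slope * mean_x  # intercept = 1/(k*qe^2)
    qe = 1.0 / slope
    k = slope ** 2 / intercept
    return qe, k


# Hypothetical parameters and sampling times, chosen only for illustration.
true_qe, true_k = 50.0, 0.002
times = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
uptakes = [pso_uptake(t, true_qe, true_k) for t in times]

qe_hat, k_hat = fit_linearized(times, uptakes)
print(f"qe = {qe_hat:.3f}, k = {k_hat:.6f}")
```

With noise-free data the linearized fit recovers the parameters exactly. With real measurements the transformation t/q_t changes the error structure, which is one plausible reason the abstract reports non-linear regression as the better way to obtain the parameters.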
Effect of Stress Corrosion and Cyclic Fatigue on Fluorapatite Glass-Ceramic
NASA Astrophysics Data System (ADS)
Joshi, Gaurav V.
2011-12-01
Objective: The objective of this study was to test the following hypotheses: 1. Both cyclic degradation and stress corrosion mechanisms result in subcritical crack growth in a fluorapatite glass-ceramic. 2. There is an interactive effect of stress corrosion and cyclic fatigue to cause subcritical crack growth (SCG) for this material. 3. The material that exhibits rising toughness curve (R-curve) behavior also exhibits a cyclic degradation mechanism. Materials and Methods: The material tested was a fluorapatite glass-ceramic (IPS e.max ZirPress, Ivoclar-Vivadent). Rectangular beam specimens with dimensions of 25 mm x 4 mm x 1.2 mm were fabricated using the press-on technique. Two groups of specimens (N=30) with polished (15 μm) or air abraded surface were tested under rapid monotonic loading. Additional polished specimens were subjected to cyclic loading at two frequencies, 2 Hz (N=44) and 10 Hz (N=36), and at different stress amplitudes. All tests were performed using a fully articulating four-point flexure fixture in deionized water at 37°C. The SCG parameters were determined by using a statistical approach by Munz and Fett (1999). The fatigue lifetime data were fit to a general log-linear model in ALTA PRO software (Reliasoft). Fractographic techniques were used to determine the critical flaw sizes to estimate fracture toughness. To determine the presence of R-curve behavior, non-linear regression was used. Results: Increasing the frequency of cycling did not cause a significant decrease in lifetime. The parameters of the general log-linear model showed that only stress corrosion has a significant effect on lifetime. The parameters are presented in the following table.* SCG parameters (n=19-21) were similar for both frequencies. The regression model showed that the fracture toughness was significantly dependent (p<0.05) on critical flaw size. Conclusions: 1. Cyclic fatigue does not have a significant effect on the SCG in the fluorapatite glass-ceramic IPS e.max ZirPress. 2. There was no interactive effect between cyclic degradation and stress corrosion for this material. 3. The material exhibited a low level of R-curve behavior. It did not exhibit cyclic degradation. *Please refer to dissertation for table.
Dai, James Y.; Chan, Kwun Chuen Gary; Hsu, Li
2014-01-01
Instrumental variable regression is one way to overcome unmeasured confounding and estimate causal effect in observational studies. Built on structural mean models, considerable work has recently been developed for consistent estimation of causal relative risk and causal odds ratio. Such models can sometimes suffer from identification issues for weak instruments. This has hampered the applicability of Mendelian randomization analysis in genetic epidemiology. When there are multiple genetic variants available as instrumental variables, and causal effect is defined in a generalized linear model in the presence of unmeasured confounders, we propose to test concordance between instrumental variable effects on the intermediate exposure and instrumental variable effects on the disease outcome, as a means to test the causal effect. We show that a class of generalized least squares estimators provide valid and consistent tests of causality. For causal effect of a continuous exposure on a dichotomous outcome in logistic models, the proposed estimators are shown to be asymptotically conservative. When the disease outcome is rare, such estimators are consistent due to the log-linear approximation of the logistic function. Optimality of such estimators relative to the well-known two-stage least squares estimator and the double-logistic structural mean model is further discussed. PMID:24863158
Barth, Nancy A.; Veilleux, Andrea G.
2012-01-01
The U.S. Geological Survey (USGS) is currently updating at-site flood frequency estimates for USGS streamflow-gaging stations in the desert region of California. The at-site flood-frequency analysis is complicated by short record lengths (less than 20 years is common) and numerous zero flows/low outliers at many sites. Estimates of the three parameters (mean, standard deviation, and skew) required for fitting the log Pearson Type 3 (LP3) distribution are likely to be highly unreliable based on the limited and heavily censored at-site data. In a generalization of the recommendations in Bulletin 17B, a regional analysis was used to develop regional estimates of all three parameters (mean, standard deviation, and skew) of the LP3 distribution. A regional skew value of zero from a previously published report was used with a new estimated mean squared error (MSE) of 0.20. A weighted least squares (WLS) regression method was used to develop both a regional standard deviation and a mean model based on annual peak-discharge data for 33 USGS stations throughout California’s desert region. At-site standard deviation and mean values were determined by using an expected moments algorithm (EMA) method for fitting the LP3 distribution to the logarithms of annual peak-discharge data. Additionally, a multiple Grubbs-Beck (MGB) test, a generalization of the test recommended in Bulletin 17B, was used for detecting multiple potentially influential low outliers in a flood series. The WLS regression found that no basin characteristics could explain the variability of standard deviation. Consequently, a constant regional standard deviation model was selected, resulting in a log-space value of 0.91 with a MSE of 0.03 log units. Yet drainage area was found to be statistically significant at explaining the site-to-site variability in mean. The linear WLS regional mean model based on drainage area had a pseudo-R² of 51 percent and a MSE of 0.32 log units. The regional parameter estimates were then used to develop a set of equations for estimating flows with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities for ungaged basins. The final equations are functions of drainage area. Average standard errors of prediction for these regression equations range from 214.2 to 856.2 percent.
The mathematical formulation of a generalized Hooke's law for blood vessels.
Zhang, Wei; Wang, Chong; Kassab, Ghassan S
2007-08-01
It is well known that the stress-strain relationship of blood vessels is highly nonlinear. To linearize the relationship, the Hencky strain tensor is generalized to a logarithmic-exponential (log-exp) strain tensor to absorb the nonlinearity. A quadratic nominal strain potential is proposed to derive the second Piola-Kirchhoff stresses by differentiating the potential with respect to the log-exp strains. The resulting constitutive equation is a generalized Hooke's law. Ten material constants are needed for the three-dimensional orthotropic model. The nondimensional constant used in the log-exp strain definition is interpreted as a nonlinearity parameter. The other nine constants are the elastic moduli with respect to the log-exp strains. In this paper, the proposed linear stress-strain relation is shown to represent the pseudoelastic Fung model very well.
Salmonella Inactivation During Extrusion of an Oat Flour Model Food.
Anderson, Nathan M; Keller, Susanne E; Mishra, Niharika; Pickens, Shannon; Gradl, Dana; Hartter, Tim; Rokey, Galen; Dohl, Christopher; Plattner, Brian; Chirtel, Stuart; Grasso-Kelley, Elizabeth M
2017-03-01
Little research exists on Salmonella inactivation during extrusion processing, yet many outbreaks associated with low water activity foods since 2006 were linked to extruded foods. The aim of this research was to study Salmonella inactivation during extrusion of a model cereal product. Oat flour was inoculated with Salmonella enterica serovar Agona, an outbreak strain isolated from puffed cereals, and processed using a single-screw extruder at a feed rate of 75 kg/h and a screw speed of 500 rpm. Extrudate samples were collected from the barrel outlet in sterile bags and immediately cooled in an ice-water bath. Populations were determined using standard plate count methods or a modified most probable number when populations were low. Reductions in population were determined and analyzed using a general linear model. The regression model obtained for the response surface tested was Log(N_R/N_O) = 20.50 + 0.82T − 141.16a_w − 0.0039T² + 87.91a_w² (R² = 0.69). The model showed significant (p < 0.05) linear and quadratic effects of a_w and temperature and enabled an assessment of critical control parameters. Reductions of 0.67 ± 0.14 to 7.34 ± 0.02 log CFU/g were observed over ranges of a_w (0.72 to 0.96) and temperature (65 to 100 °C) tested. Processing conditions above 82 °C and 0.89 a_w achieved on average greater than a 5-log reduction of Salmonella. Results indicate that extrusion is an effective means for reducing Salmonella as most processes commonly employed to produce cereals and other low water activity foods exceed these parameters. Thus, contamination of an extruded food product would most likely occur postprocessing as a result of environmental contamination or through the addition of coatings and flavorings. © 2017 Institute of Food Technologists®.
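The response-surface expression quoted in the abstract can be evaluated directly. The sketch below assumes that T is the temperature in °C, that a_w is water activity, and that the returned value is read as a log reduction (an interpretation, since the abstract does not state the sign convention explicitly). The function name and the range guard are illustrative additions.

```python
def predicted_log_reduction(temp_c, a_w):
    """Evaluate the quadratic response surface quoted in the abstract.

    Only defined here inside the tested ranges (65-100 degrees C, a_w 0.72-0.96);
    extrapolating a fitted polynomial outside those ranges is not meaningful.
    """
    if not (65.0 <= temp_c <= 100.0 and 0.72 <= a_w <= 0.96):
        raise ValueError("inputs outside the ranges tested in the study")
    return (20.50
            + 0.82 * temp_c
            - 141.16 * a_w
            - 0.0039 * temp_c ** 2
            + 87.91 * a_w ** 2)


# Conditions named in the abstract as the threshold for a >5-log reduction.
value = predicted_log_reduction(82.0, 0.89)
print(f"Predicted value at 82 C, a_w 0.89: {value:.2f}")
```

At 82 °C and a_w 0.89 the expression gives a value a little above 5, which is consistent with the abstract's statement about conditions achieving a greater than 5-log reduction. With R² = 0.69, individual predictions carry considerable uncertainty.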
Cronin, Matthew A.; Amstrup, Steven C.; Durner, George M.; Noel, Lynn E.; McDonald, Trent L.; Ballard, Warren B.
1998-01-01
There is concern that caribou (Rangifer tarandus) may avoid roads and facilities (i.e., infrastructure) in the Prudhoe Bay oil field (PBOF) in northern Alaska, and that this avoidance can have negative effects on the animals. We quantified the relationship between caribou distribution and PBOF infrastructure during the post-calving period (mid-June to mid-August) with aerial surveys from 1990 to 1995. We conducted four to eight surveys per year with complete coverage of the PBOF. We identified active oil field infrastructure and used a geographic information system (GIS) to construct ten 1 km wide concentric intervals surrounding the infrastructure. We tested whether caribou distribution is related to distance from infrastructure with a chi-squared habitat utilization-availability analysis and log-linear regression. We considered bulls, calves, and total caribou of all sex/age classes separately. The habitat utilization-availability analysis indicated there was no consistent trend of attraction to or avoidance of infrastructure. Caribou frequently were more abundant than expected in the intervals close to infrastructure, and this trend was more pronounced for bulls and for total caribou of all sex/age classes than for calves. Log-linear regression (with Poisson error structure) of numbers of caribou and distance from infrastructure were also done, with and without combining data into the 1 km distance intervals. The analysis without intervals revealed no relationship between caribou distribution and distance from oil field infrastructure, or between caribou distribution and Julian date, year, or distance from the Beaufort Sea coast. The log-linear regression with caribou combined into distance intervals showed the density of bulls and total caribou of all sex/age classes declined with distance from infrastructure. 
Our results indicate that during the post-calving period: 1) caribou distribution is largely unrelated to distance from infrastructure; 2) caribou regularly use habitats in the PBOF; 3) caribou often occur close to infrastructure; and 4) caribou do not appear to avoid oil field infrastructure.
The word frequency effect during sentence reading: A linear or nonlinear effect of log frequency?
White, Sarah J; Drieghe, Denis; Liversedge, Simon P; Staub, Adrian
2016-10-20
The effect of word frequency on eye movement behaviour during reading has been reported in many experimental studies. However, the vast majority of these studies compared only two levels of word frequency (high and low). Here we assess whether the effect of log word frequency on eye movement measures is linear, in an experiment in which a critical target word in each sentence was at one of three approximately equally spaced log frequency levels. Separate analyses treated log frequency as a categorical or a continuous predictor. Both analyses showed only a linear effect of log frequency on the likelihood of skipping a word, and on first fixation duration. Ex-Gaussian analyses of first fixation duration showed similar effects on distributional parameters in comparing high- and medium-frequency words, and medium- and low-frequency words. Analyses of gaze duration and the probability of a refixation suggested a nonlinear pattern, with a larger effect at the lower end of the log frequency scale. However, the nonlinear effects were small, and Bayes Factor analyses favoured the simpler linear models for all measures. The possible roles of lexical and post-lexical factors in producing nonlinear effects of log word frequency during sentence reading are discussed.
A Linearized Model for Flicker and Contrast Thresholds at Various Retinal Illuminances
NASA Technical Reports Server (NTRS)
Ahumada, Albert; Watson, Andrew
2015-01-01
We previously proposed a flicker visibility metric for bright displays, based on psychophysical data collected at a high mean luminance. Here we extend the metric to other mean luminances. This extension relies on a linear relation between log sensitivity and critical fusion frequency, and a linear relation between critical fusion frequency and log retinal illuminance. Consistent with our previous metric, the extended flicker visibility metric is measured in just-noticeable differences (JNDs).
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
Log-Linear Modeling of Agreement among Expert Exposure Assessors
Hunt, Phillip R.; Friesen, Melissa C.; Sama, Susan; Ryan, Louise; Milton, Donald
2015-01-01
Background: Evaluation of expert assessment of exposure depends, in the absence of a validation measurement, upon measures of agreement among the expert raters. Agreement is typically measured using Cohen’s Kappa statistic, however, there are some well-known limitations to this approach. We demonstrate an alternate method that uses log-linear models designed to model agreement. These models contain parameters that distinguish between exact agreement (diagonals of agreement matrix) and non-exact associations (off-diagonals). In addition, they can incorporate covariates to examine whether agreement differs across strata. Methods: We applied these models to evaluate agreement among expert ratings of exposure to sensitizers (none, likely, high) in a study of occupational asthma. Results: Traditional analyses using weighted kappa suggested potential differences in agreement by blue/white collar jobs and office/non-office jobs, but not case/control status. However, the evaluation of the covariates and their interaction terms in log-linear models found no differences in agreement with these covariates and provided evidence that the differences observed using kappa were the result of marginal differences in the distribution of ratings rather than differences in agreement. Differences in agreement were predicted across the exposure scale, with the likely moderately exposed category more difficult for the experts to differentiate from the highly exposed category than from the unexposed category. Conclusions: The log-linear models provided valuable information about patterns of agreement and the structure of the data that were not revealed in analyses using kappa. The models’ lack of dependence on marginal distributions and the ease of evaluating covariates allow reliable detection of observational bias in exposure data. PMID:25748517
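For reference, the unweighted Cohen's kappa that the abstract contrasts with log-linear agreement models can be computed from a rater-by-rater table as follows. The 3x3 table (none, likely, high) is hypothetical and is not data from the study.

```python
def cohens_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table (rows: rater A, columns: rater B)."""
    size = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of items on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / n
    # Chance agreement: built entirely from the marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings by two experts: none, likely, high.
ratings = [
    [20, 5, 0],
    [3, 15, 2],
    [0, 4, 11],
]
kappa = cohens_kappa(ratings)
print(f"kappa = {kappa:.3f}")
```

The chance-agreement term depends only on the row and column totals, so two strata with the same pattern of agreement but different rating distributions can yield different kappa values. That is the dependence on marginal distributions the abstract describes.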
Zheng, Han; Kimber, Alan; Goodwin, Victoria A; Pickering, Ruth M
2018-01-01
A common design for a falls prevention trial is to assess falling at baseline, randomize participants into an intervention or control group, and ask them to record the number of falls they experience during a follow-up period of time. This paper addresses how best to include the baseline count in the analysis of the follow-up count of falls in negative binomial (NB) regression. We examine the performance of various approaches in simulated datasets where both counts are generated from a mixed Poisson distribution with shared random subject effect. Including the baseline count after log-transformation as a regressor in NB regression (NB-logged) or as an offset (NB-offset) resulted in greater power than including the untransformed baseline count (NB-unlogged). Cook and Wei's conditional negative binomial (CNB) model replicates the underlying process generating the data. In our motivating dataset, a statistically significant intervention effect resulted from the NB-logged, NB-offset, and CNB models, but not from NB-unlogged, and large, outlying baseline counts were overly influential in NB-unlogged but not in NB-logged. We conclude that there is little to lose by including the log-transformed baseline count in standard NB regression compared to CNB for moderate to larger sized datasets. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
1974-01-01
REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenha;? Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two- dimensional form adequately describes the linear model . A dis- cretization is performed by using quadrature methods. By trans
Schmidt, Rebecca J; Hansen, Robin L; Hartiala, Jaana; Allayee, Hooman; Sconberg, Jaime L; Schmidt, Linda C; Volk, Heather E; Tassone, Flora
2015-08-01
Vitamin D is essential for proper neurodevelopment and cognitive and behavioral function. We examined associations between autism spectrum disorder (ASD) and common, functional polymorphisms in vitamin D pathways. Children aged 24-60 months enrolled from 2003 to 2009 in the population-based CHARGE case-control study were evaluated clinically and confirmed to have ASD (n=474) or typical development (TD, n=281). Maternal, paternal, and child DNA samples for 384 (81%) families of children with ASD and 234 (83%) families of TD children were genotyped for: TaqI, BsmI, FokI, and Cdx2 in the vitamin D receptor (VDR) gene, and CYP27B1 rs4646536, GC rs4588, and CYP2R1 rs10741657. Case-control logistic regression, family-based log-linear, and hybrid log-linear analyses were conducted to produce risk estimates and 95% confidence intervals (CI) for each allelic variant. Paternal VDR TaqI homozygous variant genotype was significantly associated with ASD in case-control analysis (odds ratio [OR] [CI]: 6.3 [1.9-20.7]) and there was a trend towards increased risk associated with VDR BsmI (OR [CI]: 4.7 [1.6-13.4]). Log-linear triad analyses detected parental imprinting, with greater effects of paternally-derived VDR alleles. Child GC AA-genotype/A-allele was associated with ASD in log-linear and ETDT analyses. A significant association between decreased ASD risk and child CYP2R1 AA-genotype was found in hybrid log-linear analysis. There were limitations of low statistical power for less common alleles due to missing paternal genotypes. This study provides preliminary evidence that paternal and child vitamin D metabolism could play a role in the etiology of ASD; further research in larger study populations is warranted. Copyright © 2015. Published by Elsevier Ireland Ltd.
Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K
2014-10-01
Quantitative Structure-Activity Relationship (QSAR) models for binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of GABA (A) receptor complex were calculated using the machine learning methods: artificial neural network (ANN) and support vector machine (SVM) techniques. The models obtained were compared with those obtained using multiple linear regression (MLR) analysis. The descriptor selection and model building were performed with 10-fold cross-validation using the training data set. The SVM and MLR coefficient of determination values are 0.944 and 0.879, respectively, for the training set and are higher than those of ANN models. Though the SVM model shows improvement of training set fitting, the ANN model was superior to SVM and MLR in predicting the test set. Randomization test is employed to check the suitability of the models.
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
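The method of least squares mentioned above has a closed-form solution for a single predictor. The following is a minimal sketch using a small made-up dataset; the variable names and values are illustrative only.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up predictor and outcome values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The fitted line always passes through the point of means, which is why the intercept is obtained from the slope and the two sample means.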
Franchignoni, F; Tesio, L; Martino, M T; Benevolo, E; Castagna, M
1998-01-01
A model for prediction of length of stay (LOS, in days) of stroke rehabilitation inpatients was developed, based on patients' age (years) and function at admission (scored on the Functional Independence Measure, FIM(SM)). One hundred and twenty-nine cases, consecutively admitted to three free-standing rehabilitation centres in Italy, were analyzed. A multiple linear regression using forward stepwise selection procedure was adopted. Median admission and discharge scores were: 57 and 75 for the total FIM score, 29 and 48 for the 13-item motor FIM subscore, 29 and 30 for the 5-item cognitive FIM subscore (potential range: 18-126, 13-91, 5-35, respectively). Median LOS was 44 days (interquartile range 30-62). The logLOS predictive model included three FIM items ("toilet transfer", TTr; "social interaction"; "expression") and patient's age (R² = 0.48). TTr alone explained 31.3% of the variance of logLOS. These results are consistent with previous American studies, showing that FIM scores at admission are strong predictors of patients' LOS, with the transfer items having the greatest predictive power.
Unit Price Scaling Trends for Chemical Products
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qi, Wei; Sathre, Roger; William R. Morrow, III
2015-08-01
To facilitate early-stage life-cycle techno-economic modeling of emerging technologies, here we identify scaling relations between unit price and sales quantity for a variety of chemical products of three categories - metal salts, organic compounds, and solvents. We collect price quotations for lab-scale and bulk purchases of chemicals from both U.S. and Chinese suppliers. We apply a log-log linear regression model to estimate the price discount effect. Using the median discount factor of each category, one can infer bulk prices of products for which only lab-scale prices are available. We conduct out-of-sample tests showing that most of the price proxies deviate from their actual reference prices by a factor less than ten. We also apply the bootstrap method to determine if a sample median discount factor should be accepted for price approximation. We find that appropriate discount factors for metal salts and for solvents are both -0.56, while that for organic compounds is -0.67 and is less representative due to greater extent of product heterogeneity within this category.
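One way to read the log-log model is that the discount factor is the slope of log(unit price) against log(purchase quantity), so unit price scales as quantity raised to that factor. The sketch below assumes that interpretation, which the abstract does not spell out, and uses a hypothetical lab-scale quotation.

```python
def bulk_unit_price(lab_unit_price, lab_quantity, bulk_quantity, discount_factor):
    """Scale a lab-scale unit price to a bulk quantity under a power-law price model.

    Assumes log(unit price) is linear in log(quantity) with slope `discount_factor`.
    """
    return lab_unit_price * (bulk_quantity / lab_quantity) ** discount_factor


# Hypothetical quotation: 100 currency units per kg when buying 1 kg.
# -0.56 is the discount factor the abstract reports for metal salts and solvents.
estimate = bulk_unit_price(100.0, 1.0, 1000.0, -0.56)
print(f"Estimated bulk unit price: {estimate:.2f} per kg")
```

Under these assumptions a thousandfold increase in quantity lowers the unit price by roughly a factor of 48. Given the reported out-of-sample deviations of up to a factor of ten, such an estimate is an order-of-magnitude proxy only.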
The role of NT-proBNP in explaining the variance in anaerobic threshold and VE/VCO(2) slope.
Athanasopoulos, Leonidas V; Dritsas, Athanasios; Doll, Helen A; Cokkinos, Dennis V
2011-01-01
We investigated whether anaerobic threshold (AT) and ventilatory efficiency (minute ventilation/carbon dioxide production slope, VE/VCO2 slope), both significantly associated with mortality, can be predicted by questionnaire scores and/or other laboratory measurements. Anaerobic threshold and VE/VCO(2) slope, plasma N-terminal pro-brain natriuretic peptide (NT-proBNP), and the echocardiographic markers left ventricular ejection fraction (LVEF) and left atrial (LA) diameter were measured in 62 patients with heart failure (HF), who also completed the Minnesota Living with Heart Failure Questionnaire (MLHF), and the Specific Activity Questionnaire (SAQ). Linear regression models, adjusting for age and gender, were fitted. While the etiology of HF, SAQ score, MLHF score, LVEF, LA diameter, and logNT-proBNP were each significantly predictive of both AT and VE/VCO2 slope on stepwise multiple linear regression, only SAQ score (P < .001) and logNT-proBNP (P = .001) were significantly predictive of AT, explaining 56% of the variability (adjusted R(2) = 0.525), while logNT-proBNP (P < .001) and etiology of HF (P = .003) were significantly predictive of VE/VCO(2) slope, explaining 49% of the variability (adjusted R(2) = 0.45). The area under the ROC curve for NT-proBNP to identify patients with a VE/VCO(2) slope greater than 34 and AT less than 11 mL · kg(-1) · min(-1) was 0.797; P < .001 and 0.712; P = .044, respectively. A plasma concentration greater than 429.5 pg/mL (sensitivity: 78%; specificity: 70%) and greater than 674.5 pg/mL (sensitivity: 77.8%; specificity: 65%) identified a VE/VCO(2) slope greater than 34 and AT lower than 11 mL · kg(-1) · min(-1), respectively. NT-proBNP is independently related to both AT and VE/VCO(2) slope. Specific Activity Questionnaire score is independently related only to AT and the etiology of HF only to VE/VCO(2) slope.
Matilla-Santander, Nuria; Valvi, Damaskini; Lopez-Espinosa, Maria-Jose; Manzano-Salgado, Cyntia B.; Ballester, Ferran; Ibarluzea, Jesús; Santa-Marina, Loreto; Schettgen, Thomas; Guxens, Mònica; Sunyer, Jordi
2017-01-01
Background: Exposure to perfluoroalkyl substances (PFASs) may increase risk for metabolic diseases; however, epidemiologic evidence is lacking at the present time. Pregnancy is a period of enhanced tissue plasticity for the fetus and the mother and may be a critical window of PFAS exposure susceptibility. Objective: We evaluated the associations between PFAS exposures and metabolic outcomes in pregnant women. Methods: We analyzed 1,240 pregnant women from the Spanish INMA [Environment and Childhood Project (INfancia y Medio Ambiente)] birth cohort study (recruitment period: 2003–2008) with measured first pregnancy trimester plasma concentrations of four PFASs (in nanograms/milliliter). We used logistic regression models to estimate associations of PFASs (log10-transformed and categorized into quartiles) with impaired glucose tolerance (IGT) and gestational diabetes mellitus (GDM), and we used linear regression models to estimate associations with first-trimester serum levels of triglycerides, total cholesterol, and C-reactive protein (CRP). Results: Perfluorooctane sulfonate (PFOS) and perfluorohexane sulfonate (PFHxS) were positively associated with IGT (137 cases) [OR per log10-unit increase=1.99 (95% CI: 1.06, 3.78) and OR=1.65 (95% CI: 0.99, 2.76), respectively]. PFOS and PFHxS associations with GDM (53 cases) were in a similar direction, but less precise. PFOS and perfluorononanoate (PFNA) were negatively associated with triglyceride levels [percent median change per log10-unit increase=−5.86% (95% CI: −9.91%, −1.63%) and percent median change per log10-unit increase=−4.75% (95% CI: −8.16%, −0.61%), respectively], whereas perfluorooctanoate (PFOA) was positively associated with total cholesterol [percent median change per log10-unit increase=1.26% (95% CI: 0.01%, 2.54%)]. PFASs were not associated with CRP in the subset of the population with available data (n=640). 
Conclusions: Although further confirmation is required, the findings from this study suggest that PFAS exposures during pregnancy may influence lipid metabolism and glucose tolerance and thus may impact the health of the mother and her child. https://doi.org/10.1289/EHP1062 PMID:29135438
Matilla-Santander, Nuria; Valvi, Damaskini; Lopez-Espinosa, Maria-Jose; Manzano-Salgado, Cyntia B; Ballester, Ferran; Ibarluzea, Jesús; Santa-Marina, Loreto; Schettgen, Thomas; Guxens, Mònica; Sunyer, Jordi; Vrijheid, Martine
2017-11-13
Exposure to perfluoroalkyl substances (PFASs) may increase risk for metabolic diseases; however, epidemiologic evidence is lacking at the present time. Pregnancy is a period of enhanced tissue plasticity for the fetus and the mother and may be a critical window of PFAS exposure susceptibility. We evaluated the associations between PFAS exposures and metabolic outcomes in pregnant women. We analyzed 1,240 pregnant women from the Spanish INMA [Environment and Childhood Project (INfancia y Medio Ambiente)] birth cohort study (recruitment period: 2003-2008) with measured first pregnancy trimester plasma concentrations of four PFASs (in nanograms/milliliter). We used logistic regression models to estimate associations of PFASs (log10-transformed and categorized into quartiles) with impaired glucose tolerance (IGT) and gestational diabetes mellitus (GDM), and we used linear regression models to estimate associations with first-trimester serum levels of triglycerides, total cholesterol, and C-reactive protein (CRP). Perfluorooctane sulfonate (PFOS) and perfluorohexane sulfonate (PFHxS) were positively associated with IGT (137 cases) [OR per log10-unit increase=1.99 (95% CI: 1.06, 3.78) and OR=1.65 (95% CI: 0.99, 2.76), respectively]. PFOS and PFHxS associations with GDM (53 cases) were in a similar direction, but less precise. PFOS and perfluorononanoate (PFNA) were negatively associated with triglyceride levels [percent median change per log10-unit increase=-5.86% (95% CI: -9.91%, -1.63%) and percent median change per log10-unit increase=-4.75% (95% CI: -8.16%, -0.61%), respectively], whereas perfluorooctanoate (PFOA) was positively associated with total cholesterol [percent median change per log10-unit increase=1.26% (95% CI: 0.01%, 2.54%)]. PFASs were not associated with CRP in the subset of the population with available data (n=640). 
Although further confirmation is required, the findings from this study suggest that PFAS exposures during pregnancy may influence lipid metabolism and glucose tolerance and thus may impact the health of the mother and her child. https://doi.org/10.1289/EHP1062.
Nistal-Nuño, Beatriz
2017-03-31
In Chile, a new law introduced in March 2012 lowered the blood alcohol concentration (BAC) limit for impaired drivers from 0.1% to 0.08% and the BAC limit for driving under the influence of alcohol from 0.05% to 0.03%, but its effectiveness remains uncertain. The goal of this investigation was to evaluate the effects of this enactment on road traffic injuries and fatalities in Chile. This was a retrospective cohort study. Deaths and injuries were analyzed descriptively and with generalized linear models of the Poisson regression type, fitted as a series of additive log-linear models accounting for the effects of law implementation, month, a linear time trend and population exposure. National databases in Chile were reviewed from 2003 to 2014 to evaluate the monthly rates of traffic fatalities and injuries, both alcohol-related and in total. The monthly rate of alcohol-related traffic fatalities decreased by 28.1 percent compared with the period before the law (P<0.001). With a linear time trend added as a predictor, the decrease was 20.9 percent (P<0.001). The monthly rate of alcohol-related traffic injuries fell by 10.5 percent compared with the period before the law (P<0.001). With a linear time trend added as a predictor, the decrease was 24.8 percent (P<0.001). Positive results followed from this new 'zero-tolerance' law implemented in 2012 in Chile. Chile experienced a significant reduction in alcohol-related traffic fatalities and injuries, making the law a successful public health intervention.
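In a Poisson log-linear model with a population offset, the coefficient on a before/after indicator is the log of a rate ratio, so a reported percent change and a coefficient can be converted into each other. The sketch below shows only that conversion; the function names are hypothetical, and the coefficient is back-computed from the reported 28.1 percent decrease rather than taken from the paper.

```python
import math

def percent_change_from_coef(beta):
    """Percent change in the rate implied by a log-linear coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)

def coef_from_percent_change(pct):
    """Log-linear coefficient implied by a percent change in the rate."""
    return math.log(1.0 + pct / 100.0)

# Back-computed from the abstract's reported decrease (an illustration,
# not a coefficient published by the author).
beta_law = coef_from_percent_change(-28.1)
print("implied coefficient:", beta_law)
print("round trip (percent):", percent_change_from_coef(beta_law))
```

A negative coefficient corresponds to a rate ratio below 1, which is why the percent change comes out negative.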
Palanichamy, A; Jayas, D S; Holley, R A
2008-01-01
The Canadian Food Inspection Agency required the meat industry to ensure Escherichia coli O157:H7 does not survive (experiences a ≥ 5 log CFU/g reduction) in dry fermented sausage (salami) during processing after a series of foodborne illness outbreaks resulting from this pathogenic bacterium occurred. The industry is in need of an effective technique like predictive modeling for estimating bacterial viability, because traditional microbiological enumeration is a time-consuming and laborious method. The accuracy and speed of artificial neural networks (ANNs) make them an attractive alternative (developed from predictive microbiology) for this purpose, especially for on-line processing in industry. Data from a study of interactive effects of different levels of pH, water activity, and the concentrations of allyl isothiocyanate at various times during sausage manufacture in reducing numbers of E. coli O157:H7 were collected. Data were used to develop predictive models using a general regression neural network (GRNN), a form of ANN, and a statistical linear polynomial regression technique. Both models were compared for their predictive error, using various statistical indices. GRNN predictions for training and test data sets had less serious errors when compared with the statistical model predictions. GRNN models were better and slightly better for training and test sets, respectively, than was the statistical model. Also, GRNN accurately predicted the level of allyl isothiocyanate required, ensuring a 5-log reduction, when an appropriate production set was created by interpolation. Because they are simple to generate, fast, and accurate, ANN models may be of value for industrial use in dry fermented sausage manufacture to reduce the hazard associated with E. coli O157:H7 in fresh beef and permit production of consistently safe products from this raw material.
Linear regression crash prediction models : issues and proposed solutions.
DOT National Transportation Integrated Search
2010-05-01
The paper develops a linear regression model approach that can be applied to crash data to predict vehicle crashes. The proposed approach involves novice data aggregation to satisfy linear regression assumptions; namely error structure normality ...
Developing and applying metamodels of high resolution ...
As defined by Wikipedia (https://en.wikipedia.org/wiki/Metamodeling), “(a) metamodel or surrogate model is a model of a model, and metamodeling is the process of generating such metamodels.” The goals of metamodeling include, but are not limited to (1) developing functional or statistical relationships between a model’s input and output variables for model analysis, interpretation, or information consumption by users’ clients; (2) quantifying a model’s sensitivity to alternative or uncertain forcing functions, initial conditions, or parameters; and (3) characterizing the model’s response or state space. Using five existing models developed by US Environmental Protection Agency, we generate a metamodeling database of the expected environmental and biological concentrations of 644 organic chemicals released into nine US rivers from wastewater treatment works (WTWs) assuming multiple loading rates and sizes of populations serviced. The chemicals of interest have log n-octanol/water partition coefficients ranging from 3 to 14, and the rivers of concern have mean annual discharges ranging from 1.09 to 3240 m3/s. Log linear regression models are derived to predict mean annual dissolved and total water concentrations and total sediment concentrations of chemicals of concern based on their log n-octanol/water partition coefficients, Henry’s Law Constant, and WTW loading rate and on the mean annual discharges of the receiving rivers. Metamodels are also derived to predict mean annual chemical
NASA Astrophysics Data System (ADS)
Farahi, Arya; Evrard, August E.; McCarthy, Ian; Barnes, David J.; Kay, Scott T.
2018-05-01
Using tens of thousands of halos realized in the BAHAMAS and MACSIS simulations produced with a consistent astrophysics treatment that includes AGN feedback, we validate a multi-property statistical model for the stellar and hot gas mass behavior in halos hosting groups and clusters of galaxies. The large sample size allows us to extract fine-scale mass-property relations (MPRs) by performing local linear regression (LLR) on individual halo stellar mass (Mstar) and hot gas mass (Mgas) as a function of total halo mass (Mhalo). We find that: 1) both the local slope and variance of the MPRs run with mass (primarily) and redshift (secondarily); 2) the conditional likelihood, p(Mstar, Mgas| Mhalo, z) is accurately described by a multivariate, log-normal distribution, and; 3) the covariance of Mstar and Mgas at fixed Mhalo is generally negative, reflecting a partially closed baryon box model for high mass halos. We validate the analytical population model of Evrard et al. (2014), finding sub-percent accuracy in the log-mean halo mass selected at fixed property, ⟨ln Mhalo|Mgas⟩ or ⟨ln Mhalo|Mstar⟩, when scale-dependent MPR parameters are employed. This work highlights the potential importance of allowing for running in the slope and scatter of MPRs when modeling cluster counts for cosmological studies. We tabulate LLR fit parameters as a function of halo mass at z = 0, 0.5 and 1 for two popular mass conventions.
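Local linear regression, as used above to let slope and scatter run with mass, fits a kernel-weighted straight line around each point of interest. The following is a minimal sketch of that idea under assumptions: a Gaussian kernel, a hand-picked bandwidth, and a synthetic noise-free relation. None of these choices, nor the function name, come from the paper.

```python
import math

def local_linear_fit(x, y, x0, bandwidth):
    """Kernel-weighted straight-line fit centred on x0.

    Returns (fitted value at x0, local slope). Gaussian kernel assumed.
    """
    w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar + slope * (x0 - xbar), slope

# Synthetic relation whose slope runs with x: y = x + 0.1 x^2 (hypothetical,
# standing in for a log property versus log halo mass).
x = [i * 0.1 for i in range(31)]
y = [xi + 0.1 * xi ** 2 for xi in x]
for x0 in (0.5, 1.5, 2.5):
    value, slope = local_linear_fit(x, y, x0, bandwidth=0.2)
    print(x0, value, slope)
```

The local slope grows with x0, which a single global straight line could not capture.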
Simplified large African carnivore density estimators from track indices.
Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J
2016-01-01
The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than a linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for β in the linear model y = αx + β, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant (P > 0.05). The other four models with intercept and the six models through the origin were all significant (P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for β and the null hypothesis that β = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Squares Residual. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas.
The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km² or higher. To improve the current models, we need independent data to validate the models and data to test for a non-linear relationship between track indices and true density at low densities.
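The comparison described above, a line with intercept against a line forced through the origin, can be sketched with closed-form least squares. The data below are invented for illustration (they are not the survey data), and the helper names are hypothetical; only the two model forms and the Standard Error of Estimate criterion are taken from the abstract.

```python
import math

def fit_with_intercept(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def fit_through_origin(x, y):
    """Least squares for y = slope * x, with no intercept term."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def standard_error_of_estimate(residuals, n_params):
    """sqrt(residual sum of squares / residual degrees of freedom)."""
    return math.sqrt(sum(r * r for r in residuals) / (len(residuals) - n_params))

# Illustrative values only: carnivore density and track density.
density = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
tracks = [1.8, 3.1, 5.0, 6.4, 9.9, 13.0]

slope_i, intercept = fit_with_intercept(density, tracks)
slope_o = fit_through_origin(density, tracks)
res_i = [t - (slope_i * d + intercept) for d, t in zip(density, tracks)]
res_o = [t - slope_o * d for d, t in zip(density, tracks)]
print("with intercept:", slope_i, intercept, standard_error_of_estimate(res_i, 2))
print("through origin:", slope_o, standard_error_of_estimate(res_o, 1))
```

The origin model spends one fewer degree of freedom, so it can show a smaller Standard Error of Estimate even when its residual sum of squares is slightly larger.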
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Adsorptive removal of pharmaceuticals from water by commercial and waste-based carbons.
Calisto, Vânia; Ferreira, Catarina I A; Oliveira, João A B P; Otero, Marta; Esteves, Valdemar I
2015-04-01
This work describes the single adsorption of seven pharmaceuticals (carbamazepine, oxazepam, sulfamethoxazole, piroxicam, cetirizine, venlafaxine and paroxetine) from water onto a commercially available activated carbon and a non-activated carbon produced by pyrolysis of primary paper mill sludge. Kinetics and equilibrium adsorption studies were performed using a batch experimental approach. For all pharmaceuticals, both carbons presented fast kinetics (equilibrium times varying from less than 5 min to 120 min), mainly described by a pseudo-second order model. Equilibrium data were appropriately described by the Langmuir and Freundlich isotherm models, the last one giving slightly higher correlation coefficients. The fitted parameters obtained for both models were quite different for the seven pharmaceuticals under study. In order to evaluate the influence of water solubility, log Kow, pKa, polar surface area and number of hydrogen bond acceptors of pharmaceuticals on the adsorption parameters, multiple linear regression analysis was performed. The variability is mainly due to log Kow followed by water solubility, in the case of the waste-based carbon, and due to water solubility in the case of the commercial activated carbon. Copyright © 2015 Elsevier Ltd. All rights reserved.
Daily magnesium intake and serum magnesium concentration among Japanese people.
Akizawa, Yoriko; Koizumi, Sadayuki; Itokawa, Yoshinori; Ojima, Toshiyuki; Nakamura, Yosikazu; Tamura, Tarou; Kusaka, Yukinori
2008-01-01
The vitamins and minerals that are deficient in the daily diet of a normal adult remain unknown. To answer this question, we conducted a population survey focusing on the relationship between dietary magnesium intake and serum magnesium level. The subjects were 62 individuals from Fukui Prefecture who participated in the 1998 National Nutrition Survey. The survey investigated the physical status, nutritional status, and dietary data of the subjects. Holidays and special occasions were avoided, and a day when people are most likely to be on an ordinary diet was selected as the survey date. The mean (± standard deviation) daily magnesium intake was 322 (±132), 323 (±163), and 322 (±147) mg/day for men, women, and the entire group, respectively. The mean (± standard deviation) serum magnesium concentration was 20.69 (±2.83), 20.69 (±2.88), and 20.69 (±2.83) ppm for men, women, and the entire group, respectively. The distribution of serum magnesium concentration was normal. Dietary magnesium intake showed a log-normal distribution, which was then transformed by logarithmic conversion for examining the regression coefficients. The slope of the regression line between the serum magnesium concentration (Y ppm) and daily magnesium intake (X mg) was determined using the formula Y = 4.93 log10(X) + 8.49. The coefficient of correlation (r) was 0.29. A regression line (Y = 14.65X + 19.31) was observed between the daily intake of magnesium (Y mg) and serum magnesium concentration (X ppm). The coefficient of correlation was 0.28. The daily magnesium intake correlated with serum magnesium concentration, and a linear regression model between them was proposed.
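To make the reported log-linear relation concrete, the sketch below evaluates Y = 4.93 log10(X) + 8.49 at the mean intake given in the abstract. The function name is hypothetical; the coefficients and the 322 mg/day figure are the abstract's own.

```python
import math

def predicted_serum_mg_ppm(intake_mg_per_day):
    """Serum magnesium (ppm) from the reported line Y = 4.93 log10(X) + 8.49."""
    return 4.93 * math.log10(intake_mg_per_day) + 8.49

at_mean_intake = predicted_serum_mg_ppm(322)
# Doubling intake adds 4.93 * log10(2) ppm, regardless of the starting level.
gain_from_doubling = predicted_serum_mg_ppm(644) - at_mean_intake
print(at_mean_intake, gain_from_doubling)
```

At the mean intake the line gives about 20.85 ppm, close to the reported mean serum concentration of 20.69 ppm, and doubling intake shifts the prediction by roughly 1.5 ppm.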
Fast estimation of diffusion tensors under Rician noise by the EM algorithm.
Liu, Jia; Gasbarra, Dario; Railavo, Juha
2016-01-15
Diffusion tensor imaging (DTI) is widely used to characterize, in vivo, the white matter of the central nervous system (CNS). This biological tissue contains much anatomic, structural and orientational information about the fibers in the human brain. Spectral data from the displacement distribution of water molecules located in the brain tissue are collected by a magnetic resonance scanner and acquired in the Fourier domain. After the Fourier inversion, the noise distribution is Gaussian in both real and imaginary parts and, as a consequence, the recorded magnitude data are corrupted by Rician noise. Statistical estimation of diffusion leads to a non-linear regression problem. In this paper, we present a fast computational method for maximum likelihood estimation (MLE) of diffusivities under the Rician noise model based on the expectation maximization (EM) algorithm. By using data augmentation, we are able to transform a non-linear regression problem into the generalized linear modeling framework, dramatically reducing the computational cost. The Fisher-scoring method is used for achieving fast convergence of the tensor parameter. The new method is implemented and applied using both synthetic and real data in a wide range of b-amplitudes up to 14,000 s/mm². Higher accuracy and precision of the Rician estimates are achieved compared with other log-normal based methods. In addition, we extend the maximum likelihood (ML) framework to the maximum a posteriori (MAP) estimation in DTI under the aforementioned scheme by specifying the priors. We describe how numerically close the estimators of model parameters obtained through MLE and MAP estimation are. Copyright © 2015 Elsevier B.V. All rights reserved.
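The step from Gaussian noise in the real and imaginary channels to Rician noise in the magnitude can be seen in a small simulation. This is only an illustration of the noise model, not the authors' EM estimator; the signal levels, noise scale, sample size and function name are all assumptions.

```python
import math
import random

def rician_magnitudes(true_signal, sigma, n, seed=0):
    """Magnitudes of a real signal plus complex Gaussian noise."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        real = true_signal + rng.gauss(0.0, sigma)  # noisy real channel
        imag = rng.gauss(0.0, sigma)                # noise-only imaginary channel
        samples.append(math.hypot(real, imag))
    return samples

for signal in (0.0, 1.0, 5.0):
    mags = rician_magnitudes(signal, sigma=1.0, n=20000)
    print(signal, sum(mags) / len(mags))
```

The mean magnitude sits above the true signal, most visibly when the signal is weak relative to the noise, which is the regime reached at high b-amplitudes and the reason a Gaussian or log-normal approximation can bias the estimates.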
Sieve analysis using the number of infecting pathogens.
Follmann, Dean; Huang, Chiung-Yu
2017-12-14
Assessment of vaccine efficacy as a function of the similarity of the infecting pathogen to the vaccine is an important scientific goal. Characterization of pathogen strains for which vaccine efficacy is low can increase understanding of the vaccine's mechanism of action and offer targets for vaccine improvement. Traditional sieve analysis estimates differential vaccine efficacy using a single identifiable pathogen for each subject. The similarity between this single entity and the vaccine immunogen is quantified, for example, by exact match or number of mismatched amino acids. With new technology, we can now obtain the actual count of genetically distinct pathogens that infect an individual. Let F be the number of distinct features of a species of pathogen. We assume a log-linear model for the expected number of infecting pathogens with feature "f," f=1,…,F. The model can be used directly in studies with passive surveillance of infections where the count of each type of pathogen is recorded at the end of some interval, or active surveillance where the time of infection is known. For active surveillance, we additionally assume that a proportional intensity model applies to the time of potentially infectious exposures and derive product and weighted estimating equation (WEE) estimators for the regression parameters in the log-linear model. The WEE estimator explicitly allows for waning vaccine efficacy and time-varying distributions of pathogens. We give conditions where sieve parameters have a per-exposure interpretation under passive surveillance. We evaluate the methods by simulation and analyze a phase III trial of a malaria vaccine. © 2017, The International Biometric Society.
ERIC Educational Resources Information Center
Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.
2010-01-01
Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
ERIC Educational Resources Information Center
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
Integrating models that depend on variable data
NASA Astrophysics Data System (ADS)
Banks, A. T.; Hill, M. C.
2016-12-01
Models of human-Earth systems are often developed with the goal of predicting the behavior of one or more dependent variables from multiple independent variables, processes, and parameters. Often dependent variable values range over many orders of magnitude, which complicates evaluation of the fit of the dependent variable values to observations. Many metrics and optimization methods have been proposed to address dependent variable variability, with little consensus being achieved. In this work, we evaluate two such methods: log transformation (based on the dependent variable being log-normally distributed with a constant variance) and error-based weighting (based on a multi-normal distribution with variances that tend to increase as the dependent variable value increases). Error-based weighting has the advantage of encouraging model users to carefully consider data errors, such as measurement and epistemic errors, while log-transformations can be a black box for typical users. Placing the log-transformation into the statistical perspective of error-based weighting has not formerly been considered, to the best of our knowledge. To make the evaluation as clear and reproducible as possible, we use multiple linear regression (MLR). Simulations are conducted with MATLAB. The example represents stream transport of nitrogen with up to eight independent variables. The single dependent variable in our example has values that range over 4 orders of magnitude. Results are applicable to any problem for which individual or multiple data types produce a large range of dependent variable values. For this problem, the log transformation produced good model fit, while some formulations of error-based weighting worked poorly. Results support previous suggestions that error-based weighting derived from a constant coefficient of variation overemphasizes low values and degrades model fit to high values.
Applying larger weights to the high values is inconsistent with the log-transformation. Greater consistency is obtained by imposing smaller (by up to a factor of 1/35) weights on the smaller dependent-variable values. From an error-based perspective, the small weights are consistent with large standard deviations. This work considers the consequences of these two common ways of addressing variable data.
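The scale of the imbalance that constant-coefficient-of-variation weighting can create is easy to compute. The abstract does not give its weighting formula, so the sketch assumes the common convention weight = 1 / sd² with sd = cv × value; the values, the cv of 0.2 and the function name are hypothetical.

```python
def constant_cv_weights(values, cv):
    """Weights 1 / sd^2 with sd = cv * value (assumed convention)."""
    return [1.0 / (cv * v) ** 2 for v in values]

# Hypothetical dependent-variable values spanning 4 orders of magnitude.
values = [0.01, 0.1, 1.0, 10.0, 100.0]
weights = constant_cv_weights(values, cv=0.2)
weight_ratio = weights[0] / weights[-1]
print(weights)
print("smallest-value weight / largest-value weight:", weight_ratio)
```

Under this convention the ratio depends only on the range of the values, not on the cv: four orders of magnitude in the data become eight orders of magnitude in the weights, which gives a sense of how strongly the low values can dominate a weighted fit.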
Rosenbaum, Paula F; Weinstock, Ruth S; Silverstone, Allen E; Sjödin, Andreas; Pavuk, Marian
2017-11-01
The Anniston Community Health Survey, a cross-sectional study, was undertaken in 2005-2007 to study environmental exposure to polychlorinated biphenyl (PCB) and organochlorine (OC) pesticides and health outcomes among residents of Anniston, AL, United States. The examination of potential risks between these pollutants and metabolic syndrome, a cluster of cardiovascular risk factors (i.e., hypertension, central obesity, dyslipidemia and dysglycemia) was the focus of this analysis. Participants were 548 adults who completed the survey and a clinic visit, were free of diabetes, and had a serum sample for clinical laboratory parameters as well as PCB and OC pesticide concentrations. Associations between summed concentrations of 35 PCB congeners and 9 individual pesticides and metabolic syndrome were examined using generalized linear modeling and logistic regression; odds ratios (OR) and 95% confidence intervals (CI) are reported. Pollutants were evaluated as quintiles and as log transformations of continuous serum concentrations. Participants were mostly female (68%) with a mean age (SD) of 53.6 (16.2) years. The racial distribution was 56% white and 44% African American; 49% met the criteria for metabolic syndrome. In unadjusted logistic regression, statistically significant and positive associations across the majority of quintiles were noted for seven individually modeled pesticides (p,p'-DDT, p,p'-DDE, HCB, β-HCCH, oxychlor, tNONA, Mirex). Following adjustment for covariables (i.e., age, sex, race, education, marital status, current smoking, alcohol consumption, positive family history of diabetes or cardiovascular disease, liver disease, BMI), significant elevations in risk were noted for p,p'-DDT across multiple quintiles (range of ORs 1.61 to 2.36), for tNONA (range of ORs 1.62-2.80) and for p,p'-DDE [OR (95% CI)] of 2.73 (1.09-6.88) in the highest quintile relative to the first. 
Significant trends were observed in adjusted logistic models for log10 HCB [OR=6.15 (1.66-22.88)], log10 oxychlor [OR=2.09 (1.07-4.07)] and log10 tNONA [OR=3.19 (1.45-7.00)]. Summed PCB concentrations were significantly and positively associated with metabolic syndrome only in unadjusted models; adjustment resulted in attenuation of the ORs in both the quintile and log-transformed models. In conclusion, several OC pesticides were found to have significant associations with metabolic syndrome in the Anniston study population while no association was observed for PCBs. Copyright © 2017 Elsevier Ltd. All rights reserved.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Log-linear human chorionic gonadotropin elimination in cases of retained placenta percreta.
Stitely, Michael L; Gerard Jackson, M; Holls, William H
2014-02-01
To describe the human chorionic gonadotropin (hCG) elimination rate in patients with intentionally retained placenta percreta. Medical records for cases of placenta percreta with intentional retention of the placenta were reviewed. The natural log of the hCG levels was plotted versus time and then the elimination rate equations were derived. The hCG elimination rate equations were log-linear in three cases individually (R² = 0.96-0.99) and in aggregate (R² = 0.92). The mean half-life of hCG elimination was 146.3 h (6.1 days). The elimination of hCG in patients with intentionally retained placenta percreta is consistent with a two-compartment elimination model. The hCG elimination in retained placenta percreta is predictable in a log-linear manner that is similar to other reports of retained abnormally adherent placentae treated with or without methotrexate.
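Log-linear elimination means ln(hCG) falls on a straight line against time, and the half-life follows from the slope as ln(2) / (-slope). The sketch below fits that line to synthetic, noise-free data generated from the reported 146.3 h half-life; the starting level, the sampling times and the function names are hypothetical and are not patient data.

```python
import math

def fit_log_linear(times_h, levels):
    """Least-squares line through (time, ln(level)); returns (intercept, slope)."""
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    tbar, lbar = sum(times_h) / n, sum(logs) / n
    stt = sum((t - tbar) ** 2 for t in times_h)
    stl = sum((t - tbar) * (l - lbar) for t, l in zip(times_h, logs))
    slope = stl / stt
    return lbar - slope * tbar, slope

def half_life_hours(slope):
    """Half-life implied by a negative log-linear slope (per hour)."""
    return math.log(2.0) / (-slope)

# Synthetic decay built from the reported mean half-life of 146.3 h.
rate = math.log(2.0) / 146.3
times = [0.0, 48.0, 96.0, 168.0, 336.0, 504.0]   # hours (hypothetical schedule)
levels = [10000.0 * math.exp(-rate * t) for t in times]  # hypothetical start level

intercept, slope = fit_log_linear(times, levels)
print("slope per hour:", slope)
print("half-life (h):", half_life_hours(slope))
```

Because the synthetic data are exactly exponential, the fit recovers the half-life used to generate them; with real measurements the R² of this line is what the abstract reports.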
Random forest models to predict aqueous solubility.
Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O
2007-01-01
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 °C gave r² = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
Supek, Fran; Ramljak, Tatjana Šumanovac; Marjanović, Marko; Buljubašić, Maja; Kragol, Goran; Ilić, Nataša; Smuc, Tomislav; Zahradka, Davor; Mlinarić-Majerski, Kata; Kralj, Marijeta
2011-08-01
18-crown-6 ethers are known to exert their biological activity by transporting K⁺ ions across cell membranes. Using non-linear Support Vector Machines regression, we searched for structural features that influence antiproliferative activity in a diverse set of 19 known oxa-, monoaza- and diaza-18-crown-6 ethers. Here, we show that the logP of the molecule is the most important molecular descriptor, among ∼1300 tested descriptors, in determining biological potency (cross-validated R² = 0.704). The optimal logP was at 5.5 (Ghose-Crippen ALOGP estimate) while both higher and lower values were detrimental to biological potency. After controlling for logP, we found that the antiproliferative activity of the molecule was generally not affected by side chain length, molecular symmetry, or presence of side chain amide links. To validate this QSAR model, we synthesized six novel, highly lipophilic diaza-18-crown-6 derivatives with adamantane moieties attached to the side arms. These compounds have near-optimal logP values and consequently exhibit strong growth inhibition in various human cancer cell lines and a bacterial system. The bioactivities of different diaza-18-crown-6 analogs in Bacillus subtilis and cancer cells were correlated, suggesting conserved molecular features may be mediating the cytotoxic response. We conclude that relying primarily on the logP is a sensible strategy in preparing future 18-crown-6 analogs with optimized biological activity. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
Advances in nowcasting influenza-like illness rates using search query logs
NASA Astrophysics Data System (ADS)
Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian
2015-08-01
User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
Spatial Bayesian latent factor regression modeling of coordinate-based meta-analysis data.
Montagna, Silvia; Wager, Tor; Barrett, Lisa Feldman; Johnson, Timothy D; Nichols, Thomas E
2018-03-01
Now over 20 years old, functional MRI (fMRI) has a large and growing literature that is best synthesised with meta-analytic tools. As most authors do not share image data, only the peak activation coordinates (foci) reported in the article are available for Coordinate-Based Meta-Analysis (CBMA). Neuroimaging meta-analysis is used to (i) identify areas of consistent activation; and (ii) build a predictive model of task type or cognitive process for new studies (reverse inference). To simultaneously address these aims, we propose a Bayesian point process hierarchical model for CBMA. We model the foci from each study as a doubly stochastic Poisson process, where the study-specific log intensity function is characterized as a linear combination of a high-dimensional basis set. A sparse representation of the intensities is guaranteed through latent factor modeling of the basis coefficients. Within our framework, it is also possible to account for the effect of study-level covariates (meta-regression), significantly expanding the capabilities of the current neuroimaging meta-analysis methods available. We apply our methodology to synthetic data and neuroimaging meta-analysis datasets. © 2017, The International Biometric Society.
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
ERIC Educational Resources Information Center
Lee, Young-Jin
2015-01-01
This study investigates whether information saved in the log files of a computer-based tutor can be used to predict the problem solving performance of students. The log files of a computer-based physics tutoring environment called Andes Physics Tutor was analyzed to build a logistic regression model that predicted success and failure of students'…
Weighted SGD for ℓ p Regression with Randomized Preconditioning.
Yang, Jiyan; Chow, Yin-Lam; Ré, Christopher; Mahoney, Michael W
2016-01-01
In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. SGD methods are easy to implement and applicable to a wide range of convex optimization problems. In contrast, RLA algorithms provide much stronger performance guarantees but are applicable to a narrower class of problems. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., ℓ2 and ℓ1 regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. By rewriting a deterministic ℓp regression problem as a stochastic optimization problem, we connect pwSGD to several existing ℓp solvers including RLA methods with algorithmic leveraging (RLA for short). We prove that pwSGD inherits faster convergence rates that only depend on the lower dimension of the linear system, while maintaining low computation complexity. Such SGD convergence rates are superior to other related SGD algorithms such as the weighted randomized Kaczmarz algorithm. Particularly, when solving ℓ1 regression with size n by d, pwSGD returns an approximate solution with ε relative error in the objective value in 𝒪(log n · nnz(A) + poly(d)/ε²) time. This complexity is uniformly better than that of RLA methods in terms of both ε and d when the problem is unconstrained. In the presence of constraints, pwSGD only has to solve a sequence of much simpler and smaller optimization problems over the same constraints. In general this is more efficient than solving the constrained subproblem required in RLA. For ℓ2 regression, pwSGD returns an approximate solution with ε relative error in the objective value and the solution vector measured in prediction norm in 𝒪(log n · nnz(A) + poly(d) log(1/ε)/ε) time. We show that for unconstrained ℓ2 regression, this complexity is comparable to that of RLA and is asymptotically better over several state-of-the-art solvers in the regime where the desired accuracy ε, high dimension n and low dimension d satisfy d ≥ 1/ε and n ≥ d²/ε. We also provide lower bounds on the coreset complexity for more general regression problems, indicating that new ideas will still be needed to extend similar RLA preconditioning ideas to weighted SGD algorithms for more general regression problems. Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets, and the results are consistent with our theoretical findings and demonstrate that pwSGD converges to a medium-precision solution, e.g., ε = 10⁻³, more quickly.
Weighted SGD for ℓp Regression with Randomized Preconditioning*
Yang, Jiyan; Chow, Yin-Lam; Ré, Christopher; Mahoney, Michael W.
2018-01-01
In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. SGD methods are easy to implement and applicable to a wide range of convex optimization problems. In contrast, RLA algorithms provide much stronger performance guarantees but are applicable to a narrower class of problems. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., ℓ2 and ℓ1 regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. By rewriting a deterministic ℓp regression problem as a stochastic optimization problem, we connect pwSGD to several existing ℓp solvers including RLA methods with algorithmic leveraging (RLA for short). We prove that pwSGD inherits faster convergence rates that only depend on the lower dimension of the linear system, while maintaining low computation complexity. Such SGD convergence rates are superior to other related SGD algorithms such as the weighted randomized Kaczmarz algorithm. Particularly, when solving ℓ1 regression with size n by d, pwSGD returns an approximate solution with ε relative error in the objective value in 𝒪(log n · nnz(A) + poly(d)/ε²) time. This complexity is uniformly better than that of RLA methods in terms of both ε and d when the problem is unconstrained. In the presence of constraints, pwSGD only has to solve a sequence of much simpler and smaller optimization problems over the same constraints. In general this is more efficient than solving the constrained subproblem required in RLA. For ℓ2 regression, pwSGD returns an approximate solution with ε relative error in the objective value and the solution vector measured in prediction norm in 𝒪(log n · nnz(A) + poly(d) log(1/ε)/ε) time. We show that for unconstrained ℓ2 regression, this complexity is comparable to that of RLA and is asymptotically better over several state-of-the-art solvers in the regime where the desired accuracy ε, high dimension n and low dimension d satisfy d ≥ 1/ε and n ≥ d²/ε. We also provide lower bounds on the coreset complexity for more general regression problems, indicating that new ideas will still be needed to extend similar RLA preconditioning ideas to weighted SGD algorithms for more general regression problems. Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets, and the results are consistent with our theoretical findings and demonstrate that pwSGD converges to a medium-precision solution, e.g., ε = 10⁻³, more quickly. PMID:29782626
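The precondition-then-sample idea in the abstract above can be illustrated with a small, dependency-free Python sketch for unconstrained ℓ2 regression. This is not the authors' implementation: the function names, the uniform row sample standing in for an RLA sketch, and the fixed step size are all assumptions made for illustration. With row-norm importance sampling and a step size of 1/(2·||U||_F²), each update reduces to a weighted randomized Kaczmarz projection on the preconditioned system.

```python
import random
from itertools import accumulate


def r_factor(M):
    """Upper-triangular R from a thin QR of M (modified Gram-Schmidt)."""
    n, d = len(M), len(M[0])
    Q = [row[:] for row in M]
    R = [[0.0] * d for _ in range(d)]
    for j in range(d):
        R[j][j] = sum(Q[i][j] ** 2 for i in range(n)) ** 0.5
        for i in range(n):
            Q[i][j] /= R[j][j]
        for k in range(j + 1, d):
            R[j][k] = sum(Q[i][j] * Q[i][k] for i in range(n))
            for i in range(n):
                Q[i][k] -= R[j][k] * Q[i][j]
    return R


def row_times_r_inverse(a, R):
    """Solve u R = a for the row vector u (forward substitution)."""
    u = [0.0] * len(a)
    for j in range(len(a)):
        u[j] = (a[j] - sum(u[k] * R[k][j] for k in range(j))) / R[j][j]
    return u


def r_inverse_times(y, R):
    """Solve R x = y for x (back substitution)."""
    d = len(y)
    x = [0.0] * d
    for j in reversed(range(d)):
        x[j] = (y[j] - sum(R[j][k] * x[k] for k in range(j + 1, d))) / R[j][j]
    return x


def pwsgd_l2(A, b, sketch_rows=50, steps=2000, seed=0):
    """Illustrative preconditioned weighted SGD for min_x ||A x - b||_2."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    # Step 1 (assumption): a uniform row sample stands in for the RLA sketch;
    # it must contain at least d linearly independent rows.
    sample = [A[i] for i in rng.sample(range(n), min(sketch_rows, n))]
    R = r_factor(sample)
    # Step 2: preconditioned rows U = A R^{-1} and row-norm sampling weights.
    U = [row_times_r_inverse(a, R) for a in A]
    weights = [sum(v * v for v in u) for u in U]
    cum_weights = list(accumulate(weights))
    # Step 3: weighted SGD on min_y ||U y - b||^2.
    y = [0.0] * d
    for _ in range(steps):
        i = rng.choices(range(n), cum_weights=cum_weights)[0]
        resid = sum(U[i][k] * y[k] for k in range(d)) - b[i]
        scale = resid / weights[i]
        y = [y[k] - scale * U[i][k] for k in range(d)]
    # Map back to the original variables: x = R^{-1} y.
    return r_inverse_times(y, R)
```

On a consistent system (b = A x exactly) the iterates converge to the true solution. On noisy data a constant step size only reaches a neighbourhood of the least-squares solution, which is one reason a practical method would tune or decay the step.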
An Expert System for the Evaluation of Cost Models
1990-09-01
contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423...normal. (Reference: Applied Linear Regression Models by John Neter - page 125) ...over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (Reference: Applied Linear Regression Models by John
Separate-channel analysis of two-channel microarrays: recovering inter-spot information.
Smyth, Gordon K; Altman, Naomi S
2013-05-26
Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate channel observations. Mixed models have been proposed to analyse intensities from the two channels as separate observations, but such models can be complex to use and the gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models yield test statistics for which the null distributions can be specified only approximately, and some approaches do not borrow strength between genes. This article reformulates the mixed model to clarify the relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values. The relative efficiency of the log-ratio analysis is shown to depend on the size of the intra-spot correlation. A new separate channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This approach permits the mixed model to be transformed into an ordinary linear model, allowing the data analysis to use a well-understood empirical Bayes analysis pipeline for linear modeling of microarray data. This yields statistically powerful test statistics that have an exact distributional theory. The log-ratio, mixed model and common correlation methods are compared using three case studies. The results show that separate channel analyses that borrow strength between genes are more powerful than log-ratio analyses. The common correlation analysis is the most powerful of all. The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.
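The M-value and A-value transformation described in the abstract above can be written out in a few lines of Python. The function names are hypothetical, and base-2 logarithms are assumed because they are the usual microarray convention; the abstract itself does not state the base. The round trip shows that the pair (M, A) carries the same information as the two log-intensities, so an analysis of M alone discards whatever A contains.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its two channel intensities."""
    log_r, log_g = math.log2(red), math.log2(green)
    m = log_r - log_g        # M-value: log-ratio of the two channels
    a = (log_r + log_g) / 2  # A-value: average log-expression of the spot
    return m, a


def log_channels_from_ma(m, a):
    """Invert the transform: recover (log2 red, log2 green) from (M, A)."""
    return a + m / 2, a - m / 2
```

For example, intensities of 1024 and 256 give log2 values of 10 and 8, hence M = 2 and A = 9, and the inverse recovers 10 and 8.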
Discordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08
Casey, Martin F.; Neidell, Matthew
2013-01-01
Background Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods And Findings We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205
Fujisawa, Seiichiro; Kadoma, Yoshinori
2012-01-01
We investigated the quantitative structure-activity relationships between hemolytic activity (log 1/H₅₀) or in vivo mouse intraperitoneal (ip) LD₅₀ using reported data for α,β-unsaturated carbonyl compounds such as (meth)acrylate monomers and their ¹³C-NMR β-carbon chemical shift (δ). The log 1/H₅₀ value for methacrylates was linearly correlated with the δCβ value. That for (meth)acrylates was linearly correlated with log P, an index of lipophilicity. The ipLD₅₀ for (meth)acrylates was linearly correlated with δCβ but not with log P. For (meth)acrylates, the δCβ value, which is dependent on the π-electron density on the β-carbon, was linearly correlated with PM3-based theoretical parameters (chemical hardness, η; electronegativity, χ; electrophilicity, ω), whereas log P was linearly correlated with heat of formation (HF). Also, the interaction between (meth)acrylates and DPPC liposomes in cell membrane molecular models was investigated using ¹H-NMR spectroscopy and differential scanning calorimetry (DSC). The log 1/H₅₀ value was related to the difference in chemical shift (ΔδHa) (Ha: H (trans) attached to the β-carbon) between the free monomer and the DPPC liposome-bound monomer. Monomer-induced DSC phase transition properties were related to HF for monomers. NMR chemical shifts may represent a valuable parameter for investigating the biological mechanisms of action of (meth)acrylates.
Fujisawa, Seiichiro; Kadoma, Yoshinori
2012-01-01
We investigated the quantitative structure-activity relationships between hemolytic activity (log 1/H50) or in vivo mouse intraperitoneal (ip) LD50 using reported data for α,β-unsaturated carbonyl compounds such as (meth)acrylate monomers and their 13C-NMR β-carbon chemical shift (δ). The log 1/H50 value for methacrylates was linearly correlated with the δCβ value. That for (meth)acrylates was linearly correlated with log P, an index of lipophilicity. The ipLD50 for (meth)acrylates was linearly correlated with δCβ but not with log P. For (meth)acrylates, the δCβ value, which is dependent on the π-electron density on the β-carbon, was linearly correlated with PM3-based theoretical parameters (chemical hardness, η; electronegativity, χ; electrophilicity, ω), whereas log P was linearly correlated with heat of formation (HF). Also, the interaction between (meth)acrylates and DPPC liposomes in cell membrane molecular models was investigated using 1H-NMR spectroscopy and differential scanning calorimetry (DSC). The log 1/H50 value was related to the difference in chemical shift (ΔδHa) (Ha: H (trans) attached to the β-carbon) between the free monomer and the DPPC liposome-bound monomer. Monomer-induced DSC phase transition properties were related to HF for monomers. NMR chemical shifts may represent a valuable parameter for investigating the biological mechanisms of action of (meth)acrylates. PMID:22312284
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.
Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.
Friesen, Melissa C; Demers, Paul A; Spinelli, John J; Lorenzi, Maria F; Le, Nhu D
2007-04-01
The association between coal tar-derived substances, a complex mixture of polycyclic aromatic hydrocarbons, and cancer is well established. However, the specific aetiological agents are unknown. To compare the dose-response relationships for two common measures of coal tar-derived substances, benzene-soluble material (BSM) and benzo(a)pyrene (BaP), and to evaluate which among these is more strongly related to the health outcomes. The study population consisted of 6423 men with ≥3 years of work experience at an aluminium smelter (1954-97). Three health outcomes identified from national mortality and cancer databases were evaluated: incidence of bladder cancer (n = 90), incidence of lung cancer (n = 147) and mortality due to acute myocardial infarction (AMI, n = 184). The shape, magnitude and precision of the dose-response relationships and cumulative exposure levels for BSM and BaP were evaluated. Two model structures were assessed, where ln(relative risk) increased with cumulative exposure (log-linear model) or with log-transformed cumulative exposure (log-log model). The BaP and BSM cumulative exposure metrics were highly correlated (r = 0.94). The increase in model precision using BaP over BSM was 14% for bladder cancer and 5% for lung cancer; no difference was observed for AMI. The log-linear BaP model provided the best fit for bladder cancer. The log-log dose-response models, where risk of disease plateaus at high exposure levels, were the best-fitting models for lung cancer and AMI. BaP and BSM were both strongly associated with bladder and lung cancer and modestly associated with AMI. Similar conclusions regarding the associations could be made regardless of the exposure metric.
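The two model structures named in the abstract above differ only in how cumulative exposure enters ln(relative risk). The short Python sketch below contrasts their shapes. The function names, the coefficient values used in any call, and the +1 offset inside the logarithm (which keeps zero exposure at a relative risk of 1) are assumptions for illustration, not values or choices taken from the study.

```python
import math


def rr_log_linear(exposure, beta):
    """Log-linear model: ln(RR) = beta * exposure."""
    return math.exp(beta * exposure)


def rr_log_log(exposure, beta):
    """Log-log model: ln(RR) = beta * ln(exposure + 1).

    The +1 offset is an illustrative assumption so that RR = 1 at zero exposure.
    """
    return (exposure + 1.0) ** beta
```

Under the log-linear form every additional unit of exposure multiplies the relative risk by the same factor. Under the log-log form with a coefficient below 1, each additional unit adds progressively less, which produces the flattening at high exposure that the abstract describes.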
Fatigue Shifts and Scatters Heart Rate Variability in Elite Endurance Athletes
Schmitt, Laurent; Regnard, Jacques; Desmarets, Maxime; Mauny, Fréderic; Mourot, Laurent; Fouillot, Jean-Pierre; Coulmy, Nicolas; Millet, Grégoire
2013-01-01
Purpose This longitudinal study aimed at comparing heart rate variability (HRV) in elite athletes identified either in ‘fatigue’ or in ‘no-fatigue’ state in ‘real life’ conditions. Methods 57 elite Nordic-skiers were surveyed over 4 years. R-R intervals were recorded supine (SU) and standing (ST). A fatigue state was quoted with a validated questionnaire. A multilevel linear regression model was used to analyze relationships between heart rate (HR) and HRV descriptors [total spectral power (TP), power in low (LF) and high frequency (HF) ranges expressed in ms² and normalized units (nu)] and the status without and with fatigue. The variables not distributed normally were transformed by taking their common logarithm (log10). Results 172 trials were identified as in a ‘fatigue’ and 891 as in ‘no-fatigue’ state. All supine HR and HRV parameters (Beta±SE) were significantly different (P<0.0001) between ‘fatigue’ and ‘no-fatigue’: HRSU (+6.27±0.61 bpm), logTPSU (−0.36±0.04), logLFSU (−0.27±0.04), logHFSU (−0.46±0.05), logLF/HFSU (+0.19±0.03), HFSU(nu) (−9.55±1.33). Differences were also significant (P<0.0001) in standing: HRST (+8.83±0.89), logTPST (−0.28±0.03), logLFST (−0.29±0.03), logHFST (−0.32±0.04). Also, intra-individual variance of HRV parameters was larger (P<0.05) in the ‘fatigue’ state (logTPSU: 0.26 vs. 0.07, logLFSU: 0.28 vs. 0.11, logHFSU: 0.32 vs. 0.08, logTPST: 0.13 vs. 0.07, logLFST: 0.16 vs. 0.07, logHFST: 0.25 vs. 0.14). Conclusion HRV was significantly lower in 'fatigue' vs. 'no-fatigue' but accompanied with larger intra-individual variance of HRV parameters in 'fatigue'. The broader intra-individual variance of HRV parameters might encompass different changes from no-fatigue state, possibly reflecting different fatigue-induced alterations of HRV pattern. PMID:23951198
Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J
2016-05-01
Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of link function chosen. We generalize the Tsiatis GOF statistic originally developed for logistic GLMCCs (TG), so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J²) statistics can be applied directly. In a simulation study, TG, HL, and J² were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J² were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, TG had more power than HL or J². © 2015 John Wiley & Sons Ltd/London School of Economics.
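Because the Hosmer-Lemeshow statistic is computed only from observed outcomes and fitted probabilities, it can be evaluated for a model fitted under any link, which is the property the abstract above relies on. The Python sketch below is a minimal version using equal-sized groups ordered by fitted probability. The grouping rule and the function name are assumptions, and the paper's common grouping method may differ.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic from binary outcomes y and fitted probabilities p.

    Sums (O - E)^2 / (E * (1 - E / n)) over groups formed by sorting on p.
    Assumes 0 < E < n in every group; otherwise the term is undefined.
    """
    order = sorted(range(len(y)), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * len(y) // groups:(g + 1) * len(y) // groups]
        if not idx:
            continue
        n = len(idx)
        observed = sum(y[i] for i in idx)   # observed events in the group
        expected = sum(p[i] for i in idx)   # expected events under the model
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat
```

For logistic models the statistic is conventionally compared with a chi-square distribution on groups − 2 degrees of freedom. Whether that reference distribution remains adequate under other links is the kind of question the simulation study in the abstract addresses.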
Aldosterone and glomerular filtration--observations in the general population.
Hannemann, Anke; Rettig, Rainer; Dittmann, Kathleen; Völzke, Henry; Endlich, Karlhans; Nauck, Matthias; Wallaschofski, Henri
2014-03-10
Increasing evidence suggests that aldosterone promotes renal damage. Since data on the association between aldosterone and renal function in the general population are sparse, we chose to address this issue. We investigated the associations between the plasma aldosterone concentration (PAC) or the aldosterone-to-renin ratio (ARR) and the estimated glomerular filtration rate (eGFR) in a sample of adult men and women from Northeast Germany. A study population of 1921 adult men and women who participated in the first follow-up of the Study of Health in Pomerania was selected. None of the subjects used drugs that alter PAC or ARR. The eGFR was calculated according to the four-variable Modification of Diet in Renal Disease formula. Chronic kidney disease (CKD) was defined as an eGFR < 60 ml/min/1.73 m². Linear regression models, adjusted for sex, age, waist circumference, diabetes mellitus, smoking status, systolic and diastolic blood pressures, serum triglyceride concentrations and time of blood sampling, revealed inverse associations of PAC or ARR with eGFR (β-coefficient for log-transformed PAC -3.12, p < 0.001; β-coefficient for log-transformed ARR -3.36, p < 0.001). Logistic regression models revealed increased odds for CKD with increasing PAC (odds ratio for a one standard deviation increase in PAC: 1.35, 95% confidence interval: 1.06-1.71). There was no statistically significant association between ARR and CKD. Our study demonstrates that PAC and ARR are inversely associated with the glomerular filtration rate in the general population.
Blakely, William F; Bolduc, David L; Debad, Jeff; Sigal, George; Port, Matthias; Abend, Michael; Valente, Marco; Drouet, Michel; Hérodin, Francis
2018-07-01
Use of plasma proteomic and hematological biomarkers represents a promising approach to provide useful diagnostic information for assessment of the severity of hematopoietic acute radiation syndrome. Eighteen baboons were evaluated in a radiation model that underwent total-body and partial-body irradiations at doses of ⁶⁰Co gamma rays from 2.5 to 15 Gy at dose rates of 6.25 cGy min⁻¹ and 32 cGy min⁻¹. Hematopoietic acute radiation syndrome severity levels determined by an analysis of blood count changes measured up to 60 d after irradiation were used to gauge overall hematopoietic acute radiation syndrome severity classifications. A panel of protein biomarkers was measured on plasma samples collected at 0 to 28 d after exposure using electrochemiluminescence-detection technology. The database was split into two distinct groups (i.e., "calibration," n = 11; "validation," n = 7). The calibration database was used in an initial stepwise regression multivariate model-fitting approach followed by down selection of biomarkers for identification of subpanels of hematopoietic acute radiation syndrome-responsive biomarkers for three time windows (i.e., 0-2 d, 2-7 d, 7-28 d). Model 1 (0-2 d) includes log C-reactive protein (p < 0.0001), log interleukin-13 (p < 0.0054), and procalcitonin (p < 0.0316) biomarkers; model 2 (2-7 d) includes log CD27 (p < 0.0001), log FMS-related tyrosine kinase 3 ligand (p < 0.0001), log serum amyloid A (p < 0.0007), and log interleukin-6 (p < 0.0002); and model 3 (7-28 d) includes log CD27 (p < 0.0012), log serum amyloid A (p < 0.0002), log erythropoietin (p < 0.0001), and log CD177 (p < 0.0001). The predicted risk of radiation injury categorization values, representing the hematopoietic acute radiation syndrome severity outcome for the three models, produced least squares multiple regression fit confidences of R = 0.73, 0.82, and 0.75, respectively. The resultant algorithms support the proof of concept that plasma proteomic biomarkers can supplement clinical signs and symptoms to assess hematopoietic acute radiation syndrome risk severity.
A Constrained Linear Estimator for Multiple Regression
ERIC Educational Resources Information Center
Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.
2010-01-01
"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
Liang, Chao; Han, Shu-ying; Qiao, Jun-qin; Lian, Hong-zhen; Ge, Xin
2014-11-01
A strategy to utilize neutral model compounds for lipophilicity measurement of ionizable basic compounds by reversed-phase high-performance liquid chromatography is proposed in this paper. The applicability of the novel protocol was justified by theoretical derivation. Meanwhile, the linear relationships between logarithm of apparent n-octanol/water partition coefficients (logKow″) and logarithm of retention factors corresponding to the 100% aqueous fraction of mobile phase (logkw) were established for a basic training set, a neutral training set and a mixed training set of these two. As proved in theory, the good linearity and external validation results indicated that the logKow″-logkw relationships obtained from a neutral model training set were always reliable regardless of mobile phase pH. Afterwards, the above relationships were adopted to determine the logKow of harmaline, a weakly dissociable alkaloid. As far as we know, this is the first report on experimental logKow data for harmaline (logKow = 2.28 ± 0.08). Introducing neutral compounds into a basic model training set or using neutral model compounds alone is recommended to measure the lipophilicity of weakly ionizable basic compounds, especially those with high hydrophobicity, for the advantages of more suitable model compound choices and convenient mobile phase pH control. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Estimating linear temporal trends from aggregated environmental monitoring data
Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.
2017-01-01
Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs is often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variations to be separated. We used simulated time-series to compare linear trend estimations from three state-space models, a simple linear regression model, and an auto-regressive model. We also compared the performance of these five models to estimate trends from a long term monitoring program. We specifically estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the given models because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregression and state-space models when used to analyze aggregated environmental monitoring data.
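The simple linear regression that the abstract above found most reliable for trend estimation amounts to an ordinary least-squares slope fitted to the yearly aggregated values. A minimal Python sketch follows. The function names and the data layout (a mapping from year to a list of replicate samples) are assumptions for illustration.

```python
def yearly_means(samples_by_year):
    """Aggregate replicate samples into one mean per year (sorted by year)."""
    years = sorted(samples_by_year)
    means = [sum(samples_by_year[y]) / len(samples_by_year[y]) for y in years]
    return years, means


def linear_trend(years, values):
    """Ordinary least-squares fit of values on years; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Averaging before fitting is exactly the step that mixes sampling variation with process variation, so the slope is a trend estimate only. As the abstract notes, the same model did the worst job of estimating the population in any given year.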
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Log-linear model based behavior selection method for artificial fish swarm algorithm.
Huang, Zhehuang; Chen, Yidong
2015-01-01
Artificial fish swarm algorithm (AFSA) is a population-based optimization technique inspired by the social behavior of fish. In the past several years, AFSA has been successfully applied in many research and application areas. The behavior of the fish has a crucial impact on the performance of AFSA, such as global exploration ability and convergence speed. How to construct and select the behaviors of the fish is an important task. To solve these problems, an improved artificial fish swarm algorithm based on a log-linear model is proposed and implemented in this paper. There are three main contributions. Firstly, we proposed a new behavior selection algorithm based on a log-linear model, which can enhance the decision-making ability of behavior selection. Secondly, an adaptive movement behavior based on adaptive weight is presented, which can dynamically adjust according to the diversity of the fish. Finally, some new behaviors are defined and introduced into the artificial fish swarm algorithm for the first time to improve global optimization capability. The experiments on high-dimensional function optimization showed that the improved algorithm has more powerful global exploration ability and reasonable convergence speed compared with the standard artificial fish swarm algorithm.
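As a hedged illustration of what a log-linear behavior-selection rule can look like, the sketch below scores each candidate behavior with a weighted sum of features and normalizes the exponentiated scores into selection probabilities. The abstract does not give the authors' features, weights, or behavior set, so the behavior names, feature values, and weights here are hypothetical placeholders, not the published method.

```python
import math
import random


def behavior_probabilities(features, weights):
    """Log-linear model: p(behavior) is proportional to exp(w . f(behavior)).

    features: dict mapping a behavior name to its list of feature values
              (hypothetical features, e.g. local fitness gain, crowding).
    weights:  list of weights, one per feature (hypothetical values).
    """
    scores = {
        name: sum(w * f for w, f in zip(weights, feats))
        for name, feats in features.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exp_scores.values())
    return {name: e / total for name, e in exp_scores.items()}


def select_behavior(features, weights, rng=random):
    """Sample one behavior according to the log-linear probabilities."""
    probs = behavior_probabilities(features, weights)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical feature vectors for three commonly described AFSA behaviors.
example_features = {
    "prey": [0.2, 0.1],
    "swarm": [0.5, 0.3],
    "follow": [0.9, 0.4],
}
example_weights = [1.0, 2.0]
example_probs = behavior_probabilities(example_features, example_weights)
chosen = select_behavior(example_features, example_weights)
```

Sampling from the probabilities, rather than always taking the highest-scoring behavior, is one possible design choice that keeps some exploration in the search.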
Reimus, Paul W; Callahan, Timothy J; Ware, S Doug; Haga, Marc J; Counce, Dale A
2007-08-15
Diffusion cell experiments were conducted to measure nonsorbing solute matrix diffusion coefficients in forty-seven different volcanic rock matrix samples from eight different locations (with multiple depth intervals represented at several locations) at the Nevada Test Site. The solutes used in the experiments included bromide, iodide, pentafluorobenzoate (PFBA), and tritiated water ((3)HHO). The porosity and saturated permeability of most of the diffusion cell samples were measured to evaluate the correlation of these two variables with tracer matrix diffusion coefficients divided by the free-water diffusion coefficient (D(m)/D*). To investigate the influence of fracture coating minerals on matrix diffusion, ten of the diffusion cells represented paired samples from the same depth interval in which one sample contained a fracture surface with mineral coatings and the other sample consisted of only pure matrix. The log of (D(m)/D*) was found to be positively correlated with both the matrix porosity and the log of matrix permeability. A multiple linear regression analysis indicated that both parameters contributed significantly to the regression at the 95% confidence level. However, the log of the matrix diffusion coefficient was more highly-correlated with the log of matrix permeability than with matrix porosity, which suggests that matrix diffusion coefficients, like matrix permeabilities, have a greater dependence on the interconnectedness of matrix porosity than on the matrix porosity itself. The regression equation for the volcanic rocks was found to provide satisfactory predictions of log(D(m)/D*) for other types of rocks with similar ranges of matrix porosity and permeability as the volcanic rocks, but it did a poorer job predicting log(D(m)/D*) for rocks with lower porosities and/or permeabilities. The presence of mineral coatings on fracture walls did not appear to have a significant effect on matrix diffusion in the ten paired diffusion cell experiments.
NASA Astrophysics Data System (ADS)
Reimus, Paul W.; Callahan, Timothy J.; Ware, S. Doug; Haga, Marc J.; Counce, Dale A.
2007-08-01
Diffusion cell experiments were conducted to measure nonsorbing solute matrix diffusion coefficients in forty-seven different volcanic rock matrix samples from eight different locations (with multiple depth intervals represented at several locations) at the Nevada Test Site. The solutes used in the experiments included bromide, iodide, pentafluorobenzoate (PFBA), and tritiated water ( 3HHO). The porosity and saturated permeability of most of the diffusion cell samples were measured to evaluate the correlation of these two variables with tracer matrix diffusion coefficients divided by the free-water diffusion coefficient ( Dm/ D*). To investigate the influence of fracture coating minerals on matrix diffusion, ten of the diffusion cells represented paired samples from the same depth interval in which one sample contained a fracture surface with mineral coatings and the other sample consisted of only pure matrix. The log of ( Dm/ D*) was found to be positively correlated with both the matrix porosity and the log of matrix permeability. A multiple linear regression analysis indicated that both parameters contributed significantly to the regression at the 95% confidence level. However, the log of the matrix diffusion coefficient was more highly-correlated with the log of matrix permeability than with matrix porosity, which suggests that matrix diffusion coefficients, like matrix permeabilities, have a greater dependence on the interconnectedness of matrix porosity than on the matrix porosity itself. The regression equation for the volcanic rocks was found to provide satisfactory predictions of log( Dm/ D*) for other types of rocks with similar ranges of matrix porosity and permeability as the volcanic rocks, but it did a poorer job predicting log( Dm/ D*) for rocks with lower porosities and/or permeabilities. The presence of mineral coatings on fracture walls did not appear to have a significant effect on matrix diffusion in the ten paired diffusion cell experiments.
Gupta, Deepak K; Claggett, Brian; Wells, Quinn; Cheng, Susan; Li, Man; Maruthur, Nisa; Selvin, Elizabeth; Coresh, Josef; Konety, Suma; Butler, Kenneth R; Mosley, Thomas; Boerwinkle, Eric; Hoogeveen, Ron; Ballantyne, Christie M; Solomon, Scott D
2015-01-01
Background Natriuretic peptides promote natriuresis, diuresis, and vasodilation. Experimental deficiency of natriuretic peptides leads to hypertension (HTN) and cardiac hypertrophy, conditions more common among African Americans. Hospital-based studies suggest that African Americans may have reduced circulating natriuretic peptides, as compared to Caucasians, but definitive data from community-based cohorts are lacking. Methods and Results We examined plasma N-terminal pro B-type natriuretic peptide (NTproBNP) levels according to race in 9137 Atherosclerosis Risk in Communities (ARIC) Study participants (22% African American) without prevalent cardiovascular disease at visit 4 (1996–1998). Multivariable linear and logistic regression analyses were performed adjusting for clinical covariates. Among African Americans, percent European ancestry was determined from genetic ancestry informative markers and then examined in relation to NTproBNP levels in multivariable linear regression analysis. NTproBNP levels were significantly lower in African Americans (median, 43 pg/mL; interquartile range [IQR], 18, 88) than Caucasians (median, 68 pg/mL; IQR, 36, 124; P<0.0001). In multivariable models, adjusted log NTproBNP levels were 40% lower (95% confidence interval [CI], −43, −36) in African Americans, compared to Caucasians, which was consistent across subgroups of age, gender, HTN, diabetes, insulin resistance, and obesity. African-American race was also significantly associated with having nondetectable NTproBNP (adjusted OR, 5.74; 95% CI, 4.22, 7.80). In multivariable analyses in African Americans, a 10% increase in genetic European ancestry was associated with a 7% (95% CI, 1, 13) increase in adjusted log NTproBNP. Conclusions African Americans have lower levels of plasma NTproBNP than Caucasians, which may be partially owing to genetic variation. 
Low natriuretic peptide levels in African Americans may contribute to the greater risk for HTN and its sequelae in this population. PMID:25999400
Ku, Po-Wen; Steptoe, Andrew; Liao, Yung; Hsueh, Ming-Chun; Chen, Li-Jung
2018-05-25
The appropriate limit to the amount of daily sedentary time (ST) required to minimize mortality is uncertain. This meta-analysis aimed to quantify the dose-response association between daily ST and all-cause mortality and to explore the cut-off point above which health is impaired in adults aged 18-64 years old. We also examined whether there are differences between studies using self-report ST and those with device-based ST. Prospective cohort studies providing effect estimates of daily ST (exposure) on all-cause mortality (outcome) were identified via MEDLINE, PubMed, Scopus, Web of Science, and Google Scholar databases until January 2018. Dose-response relationships between daily ST and all-cause mortality were examined using random-effects meta-regression models. Based on the pooled data for more than 1 million participants from 19 studies, the results showed a log-linear dose-response association between daily ST and all-cause mortality. Overall, more time spent in sedentary behaviors is associated with increased mortality risks. However, the method of measuring ST moderated the association between daily ST and mortality risk (p < 0.05). The cut-off of daily ST in studies with self-report ST was 7 h/day in comparison with 9 h/day for those with device-based ST. Higher amounts of daily ST are log-linearly associated with increased risk of all-cause mortality in adults. On the basis of a limited number of studies using device-based measures, the findings suggest that it may be appropriate to encourage adults to engage in less sedentary behaviors, with fewer than 9 h a day being relevant for all-cause mortality.
NASA Astrophysics Data System (ADS)
Cranganu, Constantin
2007-10-01
Many sedimentary basins throughout the world exhibit areas with abnormal pore-fluid pressures (higher or lower than normal or hydrostatic pressure). Predicting pore pressure and other parameters (depth, extension, magnitude, etc.) in such areas are challenging tasks. The compressional acoustic (sonic) log (DT) is often used as a predictor because it responds to changes in porosity or compaction produced by abnormal pore-fluid pressures. Unfortunately, the sonic log is not commonly recorded in most oil and/or gas wells. We propose using an artificial neural network to synthesize sonic logs by identifying the mathematical dependency between DT and the commonly available logs, such as normalized gamma ray (GR) and deep resistivity logs (REID). The artificial neural network process can be divided into three steps: (1) supervised training of the neural network; (2) confirmation and validation of the model by blind-testing the results in wells that contain both the predictor (GR, REID) and the target values (DT) used in the supervised training; and (3) applying the predictive model to all wells containing the required predictor data and verifying the accuracy of the synthetic DT data by comparing the back-predicted synthetic predictor curves (GRNN, REIDNN) to the recorded predictor curves used in training (GR, REID). Artificial neural networks offer significant advantages over traditional deterministic methods. They do not require a precise mathematical model equation that describes the dependency between the predictor values and the target values and, unlike linear regression techniques, neural network methods do not overpredict mean values and thereby preserve original data variability. One of their most important advantages is that their predictions can be validated and confirmed through back-prediction of the input data. This procedure was applied to predict the presence of overpressured zones in the Anadarko Basin, Oklahoma. The results are promising and encouraging.
NASA Technical Reports Server (NTRS)
Bigger, J. T. Jr; Steinman, R. C.; Rolnitzky, L. M.; Fleiss, J. L.; Albrecht, P.; Cohen, R. J.
1996-01-01
BACKGROUND. The purposes of the present study were (1) to establish normal values for the regression of log(power) on log(frequency) for RR-interval fluctuations in healthy middle-aged persons, (2) to determine the effects of myocardial infarction on the regression of log(power) on log(frequency), (3) to determine the effect of cardiac denervation on the regression of log(power) on log(frequency), and (4) to assess the ability of power law regression parameters to predict death after myocardial infarction. METHODS AND RESULTS. We studied three groups: (1) 715 patients with recent myocardial infarction; (2) 274 healthy persons age and sex matched to the infarct sample; and (3) 19 patients with heart transplants. Twenty-four-hour RR-interval power spectra were computed using fast Fourier transforms and log(power) was regressed on log(frequency) between 10^-4 and 10^-2 Hz. There was a power law relation between log(power) and log(frequency). That is, the function described a descending straight line that had a slope of approximately -1 in healthy subjects. For the myocardial infarction group, the regression line for log(power) on log(frequency) was shifted downward and had a steeper negative slope (-1.15). The transplant (denervated) group showed a larger downward shift in the regression line and a much steeper negative slope (-2.08). The correlation between traditional power spectral bands and slope was weak, and that with log(power) at 10^-4 Hz was only moderate. Slope and log(power) at 10^-4 Hz were used to predict mortality and were compared with the predictive value of traditional power spectral bands. Slope and log(power) at 10^-4 Hz were excellent predictors of all-cause mortality or arrhythmic death. To optimize the prediction of death, we calculated a log(power) intercept that was uncorrelated with the slope of the power law regression line. We found that the combination of slope and zero-correlation log(power) was an outstanding predictor, with a relative risk of > 10, and was better than any combination of the traditional power spectral bands. The combination of slope and log(power) at 10^-4 Hz also was an excellent predictor of death after myocardial infarction. CONCLUSIONS. Myocardial infarction or denervation of the heart causes a steeper slope and decreased height of the power law regression relation between log(power) and log(frequency) of RR-interval fluctuations. Individually and, especially, combined, the power law regression parameters are excellent predictors of death of any cause or arrhythmic death and predict these outcomes better than the traditional power spectral bands.
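The power law regression described in this abstract can be sketched as an ordinary least-squares fit of log10(power) on log10(frequency), restricted to the band between 1e-4 and 1e-2 Hz. The example below is a minimal sketch under stated assumptions: the spectrum is synthetic (an exact 1/f law with an arbitrary constant), and the function names are illustrative rather than taken from the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def power_law_regression(freqs_hz, powers, f_lo=1e-4, f_hi=1e-2):
    """Regress log10(power) on log10(frequency) within [f_lo, f_hi] Hz."""
    points = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs_hz, powers)
        if f_lo <= f <= f_hi and p > 0
    ]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return fit_line(xs, ys)


# Synthetic spectrum following power = 2 / f (assumed data, not from the study).
freqs = [10 ** (-4 + 0.1 * k) for k in range(21)]
powers = [2.0 / f for f in freqs]
slope, intercept = power_law_regression(freqs, powers)
```

For this synthetic input the fitted slope is -1, the value the abstract reports as typical of healthy subjects; steeper slopes such as -1.15 or -2.08 would correspond to spectra that fall off faster with frequency.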
A log-linear model approach to estimation of population size using the line-transect sampling method
Anderson, D.R.; Burnham, K.P.; Crain, B.R.
1978-01-01
The technique of estimating wildlife population size and density using the belt or line-transect sampling method has been used in many past projects, such as the estimation of density of waterfowl nestling sites in marshes, and is being used currently in such areas as the assessment of Pacific porpoise stocks in regions of tuna fishing activity. A mathematical framework for line-transect methodology has only emerged in the last 5 yr. In the present article, we extend this mathematical framework to a line-transect estimator based upon a log-linear model approach.
Modeling groundwater nitrate concentrations in private wells in Iowa
Wheeler, David C.; Nolan, Bernard T.; Flory, Abigail R.; DellaValle, Curt T.; Ward, Mary H.
2015-01-01
Contamination of drinking water by nitrate is a growing problem in many agricultural areas of the country. Ingested nitrate can lead to the endogenous formation of N-nitroso compounds, potent carcinogens. We developed a predictive model for nitrate concentrations in private wells in Iowa. Using 34,084 measurements of nitrate in private wells, we trained and tested random forest models to predict log nitrate levels by systematically assessing the predictive performance of 179 variables in 36 thematic groups (well depth, distance to sinkholes, location, land use, soil characteristics, nitrogen inputs, meteorology, and other factors). The final model contained 66 variables in 17 groups. Some of the most important variables were well depth, slope length within 1 km of the well, year of sample, and distance to nearest animal feeding operation. The correlation between observed and estimated nitrate concentrations was excellent in the training set (r-square = 0.77) and was acceptable in the testing set (r-square = 0.38). The random forest model had substantially better predictive performance than a traditional linear regression model or a regression tree. Our model will be used to investigate the association between nitrate levels in drinking water and cancer risk in the Iowa participants of the Agricultural Health Study cohort.
Modeling groundwater nitrate concentrations in private wells in Iowa.
Wheeler, David C; Nolan, Bernard T; Flory, Abigail R; DellaValle, Curt T; Ward, Mary H
2015-12-01
Contamination of drinking water by nitrate is a growing problem in many agricultural areas of the country. Ingested nitrate can lead to the endogenous formation of N-nitroso compounds, potent carcinogens. We developed a predictive model for nitrate concentrations in private wells in Iowa. Using 34,084 measurements of nitrate in private wells, we trained and tested random forest models to predict log nitrate levels by systematically assessing the predictive performance of 179 variables in 36 thematic groups (well depth, distance to sinkholes, location, land use, soil characteristics, nitrogen inputs, meteorology, and other factors). The final model contained 66 variables in 17 groups. Some of the most important variables were well depth, slope length within 1 km of the well, year of sample, and distance to nearest animal feeding operation. The correlation between observed and estimated nitrate concentrations was excellent in the training set (r-square=0.77) and was acceptable in the testing set (r-square=0.38). The random forest model had substantially better predictive performance than a traditional linear regression model or a regression tree. Our model will be used to investigate the association between nitrate levels in drinking water and cancer risk in the Iowa participants of the Agricultural Health Study cohort. Copyright © 2015 Elsevier B.V. All rights reserved.
Daily Magnesium Intake and Serum Magnesium Concentration among Japanese People
Akizawa, Yoriko; Koizumi, Sadayuki; Itokawa, Yoshinori; Ojima, Toshiyuki; Nakamura, Yosikazu; Tamura, Tarou; Kusaka, Yukinori
2008-01-01
Background The vitamins and minerals that are deficient in the daily diet of a normal adult remain unknown. To answer this question, we conducted a population survey focusing on the relationship between dietary magnesium intake and serum magnesium level. Methods The subjects were 62 individuals from Fukui Prefecture who participated in the 1998 National Nutrition Survey. The survey investigated the physical status, nutritional status, and dietary data of the subjects. Holidays and special occasions were avoided, and a day when people are most likely to be on an ordinary diet was selected as the survey date. Results The mean (±standard deviation) daily magnesium intake was 322 (±132), 323 (±163), and 322 (±147) mg/day for men, women, and the entire group, respectively. The mean (±standard deviation) serum magnesium concentration was 20.69 (±2.83), 20.69 (±2.88), and 20.69 (±2.83) ppm for men, women, and the entire group, respectively. The distribution of serum magnesium concentration was normal. Dietary magnesium intake showed a log-normal distribution, which was then transformed by logarithmic conversion for examining the regression coefficients. The slope of the regression line between the serum magnesium concentration (Y ppm) and daily magnesium intake (X mg) was determined using the formula Y = 4.93 (log10X) + 8.49. The coefficient of correlation (r) was 0.29. A regression line (Y = 14.65X + 19.31) was observed between the daily intake of magnesium (Y mg) and serum magnesium concentration (X ppm). The coefficient of correlation was 0.28. Conclusion The daily magnesium intake correlated with serum magnesium concentration, and a linear regression model between them was proposed. PMID:18635902
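To make the first reported regression line concrete, the following sketch evaluates Y = 4.93 (log10 X) + 8.49 for a given daily magnesium intake. The coefficients come from the abstract; the function name and the choice of example intakes are illustrative assumptions, and with r = 0.29 the line describes only a weak average tendency, not an individual prediction.

```python
import math


def predicted_serum_mg_ppm(daily_intake_mg):
    """Serum magnesium (ppm) from the reported line Y = 4.93*log10(X) + 8.49."""
    if daily_intake_mg <= 0:
        raise ValueError("intake must be positive to take a logarithm")
    return 4.93 * math.log10(daily_intake_mg) + 8.49


# Evaluate at the reported mean intake for the whole group (322 mg/day)
# and at double that intake, to show the effect of the log scale.
at_mean = predicted_serum_mg_ppm(322.0)
at_double = predicted_serum_mg_ppm(644.0)
increase_per_doubling = at_double - at_mean
```

Because the predictor is on a log10 scale, every doubling of intake adds the same amount, 4.93 × log10(2), or about 1.48 ppm, regardless of the starting intake.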
[From clinical judgment to linear regression model].
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line Y = a + bX, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when the variable "X" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
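The quantities named in this abstract (intercept a, slope b, and the coefficient of determination) can be computed directly from their least-squares definitions. The sketch below is a minimal illustration; the data points are made up for the example and do not come from the article.

```python
def simple_linear_regression(xs, ys):
    """Fit Y = a + b*X by least squares and return (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x  # intercept: value of Y when X equals 0
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return a, b, r_squared


# Made-up example data lying close to the line Y = 2X.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, r_squared = simple_linear_regression(xs, ys)
```

Here the slope comes out near 2 and R² close to 1, reflecting that almost all of the variation in Y is accounted for by X in this artificial data set.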
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Estimation of Compaction Parameters Based on Soil Classification
NASA Astrophysics Data System (ADS)
Lubis, A. S.; Muis, Z. A.; Hastuty, I. P.; Siregar, I. M.
2018-02-01
Factors that must be considered in soil compaction work are the type of soil material, field control, maintenance, and availability of funds. These problems raised the idea of how to estimate the density of the soil with a proper, fast, and economical implementation system. This study aims to estimate the compaction parameters, i.e., the maximum dry unit weight (γdmax) and optimum water content (wopt), based on soil classification. Each of 30 samples was tested for its index properties and compaction characteristics. All of the data from the laboratory test results were used to estimate the compaction parameter values by using linear regression and the Goswami Model. From the research results, the soil types were A-4, A-6, and A-7 according to AASHTO and SC, SC-SM, and CL based on USCS. By linear regression, the equation for estimation of the maximum dry unit weight is γdmax* = 1.862 - 0.005*FINES - 0.003*LL and the equation for estimation of the optimum water content is wopt* = -0.607 + 0.362*FINES + 0.161*LL. By the Goswami Model (with equation Y = m Log G + k), the estimation of the maximum dry unit weight (γdmax*) uses m = -0.376 and k = 2.482, and the estimation of the optimum water content (wopt*) uses m = 21.265 and k = -32.421. For both of these equations a 95% confidence interval was obtained.
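The two linear regression equations reported in this abstract can be evaluated directly, as in the sketch below, where the coefficients are written with decimal points. The abstract does not state the units, so the code assumes FINES (fines content) and LL (liquid limit) are entered as percentages; the function names and the example soil are hypothetical, and the Goswami Model is left out because its variable G is not defined in the abstract.

```python
def estimate_max_dry_unit_weight(fines, liquid_limit):
    """Reported regression: gamma_dmax* = 1.862 - 0.005*FINES - 0.003*LL."""
    return 1.862 - 0.005 * fines - 0.003 * liquid_limit


def estimate_optimum_water_content(fines, liquid_limit):
    """Reported regression: w_opt* = -0.607 + 0.362*FINES + 0.161*LL."""
    return -0.607 + 0.362 * fines + 0.161 * liquid_limit


# Hypothetical soil: 50 % fines and a liquid limit of 40 % (assumed inputs).
gamma_dmax = estimate_max_dry_unit_weight(50.0, 40.0)
w_opt = estimate_optimum_water_content(50.0, 40.0)
```

The signs of the coefficients carry the usual qualitative message: finer, more plastic soils are estimated to compact to a lower maximum dry unit weight at a higher optimum water content.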
Development of a pharmacogenetic-guided warfarin dosing algorithm for Puerto Rican patients.
Ramos, Alga S; Seip, Richard L; Rivera-Miranda, Giselle; Felici-Giovanini, Marcos E; Garcia-Berdecia, Rafael; Alejandro-Cowan, Yirelia; Kocherla, Mohan; Cruz, Iadelisse; Feliu, Juan F; Cadilla, Carmen L; Renta, Jessica Y; Gorowski, Krystyna; Vergara, Cunegundo; Ruaño, Gualberto; Duconge, Jorge
2012-12-01
This study was aimed at developing a pharmacogenetic-driven warfarin-dosing algorithm in 163 admixed Puerto Rican patients on stable warfarin therapy. A multiple linear-regression analysis was performed using log-transformed effective warfarin dose as the dependent variable, and combining CYP2C9 and VKORC1 genotyping with other relevant nongenetic clinical and demographic factors as independent predictors. The model explained more than two-thirds of the observed variance in the warfarin dose among Puerto Ricans, and also produced significantly better 'ideal dose' estimates than two pharmacogenetic models and clinical algorithms published previously, with the greatest benefit seen in patients ultimately requiring <7 mg/day. We also assessed the clinical validity of the model using an independent validation cohort of 55 Puerto Rican patients from Hartford, CT, USA (R(2) = 51%). Our findings provide the basis for planning prospective pharmacogenetic studies to demonstrate the clinical utility of genotyping warfarin-treated Puerto Rican patients.
On comparison of net survival curves.
Pavlič, Klemen; Perme, Maja Pohar
2017-05-02
Relative survival analysis is a subfield of survival analysis where competing risks data are observed, but the causes of death are unknown. A first step in the analysis of such data is usually the estimation of a net survival curve, possibly followed by regression modelling. Recently, a log-rank type test for comparison of net survival curves has been introduced and the goal of this paper is to explore its properties and put this methodological advance into the context of the field. We build on the association between the log-rank test and the univariate or stratified Cox model and show the analogy in the relative survival setting. We study the properties of the methods using both theoretical arguments and simulations. We provide an R function to enable practical usage of the log-rank type test. Both the log-rank type test and its model alternatives perform satisfactorily under the null, even if the correlation between their p-values is rather low, implying that both approaches cannot be used simultaneously. The stratified version has a higher power in case of non-homogeneous hazards, but also carries a different interpretation. The log-rank type test and its stratified version can be interpreted in the same way as the results of an analogous semi-parametric additive regression model despite the fact that no direct theoretical link can be established between the test statistics.
Economic policy optimization based on both one stochastic model and the parametric control theory
NASA Astrophysics Data System (ADS)
Ashimov, Abdykappar; Borovskiy, Yuriy; Onalbekov, Mukhit
2016-06-01
A nonlinear dynamic stochastic general equilibrium model with financial frictions is developed to describe two interacting national economies in the environment of the rest of the world. Parameters of the nonlinear model are estimated based on its log-linearization by the Bayesian approach. The nonlinear model is verified by retroprognosis, estimation of stability indicators of mappings specified by the model, and estimation of the degree of coincidence for results of internal and external shocks' effects on macroeconomic indicators on the basis of the estimated nonlinear model and its log-linearization. On the basis of the nonlinear model, the parametric control problems of economic growth and volatility of macroeconomic indicators of Kazakhstan are formulated and solved for two exchange rate regimes (free floating and managed floating exchange rates).
Nualkaekul, Sawaminee; Salmeron, Ivan; Charalampopoulos, Dimitris
2011-12-01
The survival of Bifidobacterium longum NCIMB 8809 was studied during refrigerated storage for 6 weeks in model solutions, based on which a mathematical model was constructed describing cell survival as a function of pH, citric acid, protein and dietary fibre. A Central Composite Design (CCD) was developed studying the influence of four factors at three levels, i.e., pH (3.2-4), citric acid (2-15 g/l), protein (0-10 g/l), and dietary fibre (0-8 g/l). In total, 31 experimental runs were carried out. Analysis of variance (ANOVA) of the regression model demonstrated that the model fitted the data well. From the regression coefficients it was deduced that all four factors had a statistically significant (P<0.05) negative effect on the log decrease [log10 N(0 week) - log10 N(6 week)], with the pH and citric acid being the most influential ones. Cell survival during storage was also investigated in various types of juices, including orange, grapefruit, blackcurrant, pineapple, pomegranate and strawberry. The highest cell survival (less than 0.4 log decrease) after 6 weeks of storage was observed in orange and pineapple, both of which had a pH of about 3.8. Although the pH of grapefruit and blackcurrant was similar (pH ∼3.2), the log decrease of the former was ∼0.5 log, whereas that of the latter was ∼0.7 log. One reason for this could be the fact that grapefruit contained a high amount of citric acid (15.3 g/l). The log decrease in pomegranate and strawberry juices was extremely high (∼8 logs). The mathematical model was able to predict adequately the cell survival in orange, grapefruit, blackcurrant, and pineapple juices. However, the model failed to predict the cell survival in pomegranate and strawberry, most likely due to the very high levels of phenolic compounds in these two juices. Copyright © 2011 Elsevier Ltd. All rights reserved.
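The response variable used in this abstract, the log decrease, is the difference between the base-10 logarithms of the viable counts at week 0 and week 6. The sketch below shows the calculation; the cell counts are hypothetical values chosen for illustration, and the function name is an assumption.

```python
import math


def log_decrease(count_week0, count_week6):
    """Log decrease = log10(N at week 0) - log10(N at week 6)."""
    if count_week0 <= 0 or count_week6 <= 0:
        raise ValueError("counts must be positive to take a logarithm")
    return math.log10(count_week0) - math.log10(count_week6)


def surviving_fraction(log_dec):
    """Fraction of cells still viable for a given log decrease."""
    return 10.0 ** (-log_dec)


# Hypothetical counts (e.g. CFU/ml): a tenfold loss is a 1-log decrease.
one_log = log_decrease(1e8, 1e7)
# A 0.4-log decrease, the upper bound reported for orange and pineapple juice,
# still leaves roughly 40 % of the cells viable.
fraction_at_0_4 = surviving_fraction(0.4)
```

Read this way, the roughly 8-log decrease reported for pomegranate and strawberry juices corresponds to a surviving fraction on the order of one cell in a hundred million.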
Andersen, Claus E; Raaschou-Nielsen, Ole; Andersen, Helle Primdal; Lind, Morten; Gravesen, Peter; Thomsen, Birthe L; Ulbak, Kaare
2007-01-01
A linear regression model has been developed for the prediction of indoor (222)Rn in Danish houses. The model provides proxy radon concentrations for about 21,000 houses in a Danish case-control study on the possible association between residential radon and childhood cancer (primarily leukaemia). The model was calibrated against radon measurements in 3116 houses. An independent dataset with 788 house measurements was used for model performance assessment. The model includes nine explanatory variables, of which the most important ones are house type and geology. All explanatory variables are available from central databases. The model was fitted to log-transformed radon concentrations and it has an R(2) of 40%. The uncertainty associated with individual predictions of (untransformed) radon concentrations is about a factor of 2.0 (one standard deviation). The comparison with the independent test data shows that the model makes sound predictions and that errors of radon predictions are only weakly correlated with the estimates themselves (R(2) = 10%).
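Because the model in this abstract was fitted to log-transformed concentrations, an uncertainty of "a factor of 2.0 (one standard deviation)" is naturally read as multiplicative on the original scale. The sketch below works through that interpretation; the predicted value is hypothetical, the function name is illustrative, and treating the factor as exp(standard deviation on the natural-log scale) is an assumption about how the authors define it.

```python
import math


def one_sd_interval(predicted_concentration, sd_factor=2.0):
    """Back-transform a +/- one-SD band from the log scale.

    sd_factor is the multiplicative uncertainty; on the natural-log scale
    the corresponding standard deviation is ln(sd_factor).
    """
    log_sd = math.log(sd_factor)
    log_pred = math.log(predicted_concentration)
    lower = math.exp(log_pred - log_sd)
    upper = math.exp(log_pred + log_sd)
    return lower, upper


# Hypothetical predicted indoor radon concentration (arbitrary units).
lower, upper = one_sd_interval(50.0)
```

Under this reading the band is asymmetric around the prediction, running from half the predicted value to twice it, which is the typical shape of uncertainty for quantities modelled on a log scale.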
Technology diffusion in hospitals: a log odds random effects regression model.
Blank, Jos L T; Valdmanis, Vivian G
2015-01-01
This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describe the diffusion of technologies. We distinguish a number of determinants, such as service, physician, and environmental, financial and organizational characteristics of the 60 Dutch hospitals in our sample. On the basis of this data set on Dutch general hospitals over the period 1995-2002, we conclude that there is a relation between a number of determinants and the diffusion of innovations underlining conclusions from earlier research. Positive effects were found on the basis of the size of the hospitals, competition and a hospital's commitment to innovation. It appears that if a policy is developed to further diffuse innovations, the external effects of demand and market competition need to be examined, which would de facto lead to an efficient use of technology. For the individual hospital, instituting an innovations office appears to be the most prudent course of action. © 2013 The Authors. International Journal of Health Planning and Management published by John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael
2011-01-01
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
Individual and Group-Based Engagement in an Online Physical Activity Monitoring Program in Georgia.
Smith, Matthew Lee; Durrett, Nicholas K; Bowie, Maria; Berg, Alison; McCullick, Bryan A; LoPilato, Alexander C; Murray, Deborah
2018-06-07
Given the rising prevalence of obesity in the United States, innovative methods are needed to increase physical activity (PA) in community settings. Evidence suggests that individuals are more likely to engage in PA if they are given a choice of activities and have support from others (for encouragement, motivation, and accountability). The objective of this study was to describe the use of the online Walk Georgia PA tracking platform according to whether the user was an individual user or group user. Walk Georgia is a free, interactive online tracking platform that enables users to log PA by duration, activity, and perceived difficulty, and then converts these data into points based on metabolic equivalents. Users join individually or in groups and are encouraged to set weekly PA goals. Data were examined for 6,639 users (65.8% were group users) over 28 months. We used independent sample t tests and Mann-Whitney U tests to compare means between individual and group users. Two linear regression models were fitted to identify factors associated with activity logging. Users logged 218,766 activities (15,119,249 minutes of PA spanning 592,714 miles [41,858,446 points]). On average, group users had created accounts more recently than individual users (P < .001); however, group users logged more activities (P < .001). On average, group users logged more minutes of PA (P < .001) and earned more points (P < .001). Being in a group was associated with a larger proportion of weeks in which 150 minutes or more of weekly PA was logged (B = 20.47, P < .001). Use of Walk Georgia was significantly higher among group users than among individual users. To expand use and dissemination of online tracking of PA, programs should target naturally occurring groups (eg, workplaces, schools, faith-based groups).
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.
Ferrari, Alberto; Comelli, Mario
2016-12-01
In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. These clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods are available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions, namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; in addition, we describe results from the application of these methods to data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
Konrad, Stephanie; Paduraru, Peggy; Romero-Barrios, Pablo; Henderson, Sarah B; Galanis, Eleni
2017-08-31
Vibrio parahaemolyticus (Vp) is a naturally occurring bacterium found in marine environments worldwide. It can cause gastrointestinal illness in humans, primarily through raw oyster consumption. Water temperatures, and potentially other environmental factors, play an important role in the growth and proliferation of Vp in the environment. Quantifying the relationships between environmental variables and indicators or incidence of Vp illness is valuable for public health surveillance to inform and enable suitable preventative measures. This study aimed to assess the relationship between environmental parameters and Vp in British Columbia (BC), Canada. The study used Vp counts in oyster meat from 2002-2015 and laboratory confirmed Vp illnesses from 2011-2015 for the province of BC. The data were matched to environmental parameters from publicly available sources, including remote sensing measurements of nighttime sea surface temperature (SST) obtained from satellite readings at a spatial resolution of 1 km. Using three separate models, this paper assessed the relationship between (1) daily SST and Vp counts in oyster meat, (2) weekly mean Vp counts in oysters and weekly Vp illnesses, and (3) weekly mean SST and weekly Vp illnesses. The effects of salinity and chlorophyll a were also evaluated. Linear regression was used to quantify the relationship between SST and Vp, and piecewise regression was used to identify SST thresholds of concern. A total of 2327 oyster samples and 293 laboratory confirmed illnesses were included. In model 1, both SST and salinity were significant predictors of log(Vp) counts in oyster meat. In model 2, the mean log(Vp) count in oyster meat was a significant predictor of Vp illnesses. In model 3, weekly mean SST was a significant predictor of weekly Vp illnesses. 
The piecewise regression models identified an SST threshold of approximately 14 °C for both models 1 and 3, indicating increased risk of Vp in oyster meat and Vp illnesses at higher temperatures. Monitoring of SST, particularly through readily accessible remote sensing data, could serve as a warning signal for Vp and help inform the introduction and cessation of preventative or control measures.
Posterior propriety for hierarchical models with log-likelihoods that have norm bounds
Michalak, Sarah E.; Morris, Carl N.
2015-07-17
Statisticians often use improper priors to express ignorance or to provide good frequency properties, requiring that posterior propriety be verified. Our paper addresses generalized linear mixed models, GLMMs, when Level I parameters have Normal distributions, with many commonly-used hyperpriors. It provides easy-to-verify sufficient posterior propriety conditions based on dimensions, matrix ranks, and exponentiated norm bounds, ENBs, for the Level I likelihood. Since many familiar likelihoods have ENBs, which is often verifiable via log-concavity and MLE finiteness, our novel use of ENBs permits unification of posterior propriety results and posterior MGF/moment results for many useful Level I distributions, including those commonly used with multilevel generalized linear models, e.g., GLMMs and hierarchical generalized linear models, HGLMs. Furthermore, those who need to verify existence of posterior distributions or of posterior MGFs/moments for a multilevel generalized linear model given a proper or improper multivariate F prior as in Section 1 should find the required results in Sections 1 and 2 and Theorem 3 (GLMMs), Theorem 4 (HGLMs), or Theorem 5 (posterior MGFs/moments).
NASA Astrophysics Data System (ADS)
Pan, Chengbin; Miranda, Enrique; Villena, Marco A.; Xiao, Na; Jing, Xu; Xie, Xiaoming; Wu, Tianru; Hui, Fei; Shi, Yuanyuan; Lanza, Mario
2017-06-01
Despite the enormous interest raised by graphene and related materials, global concern about their real usefulness in industry has recently arisen, as there is a worrying lack of electronic devices based on 2D materials in the market. Moreover, analytical tools capable of describing and predicting the behavior of the devices (which are necessary before facing mass production) are very scarce. In this work we synthesize a resistive random access memory (RRAM) using graphene/hexagonal-boron-nitride/graphene (G/h-BN/G) van der Waals structures, and we develop a compact model that accurately describes its functioning. The devices were fabricated using scalable methods (i.e. CVD for material growth and shadow mask for electrode patterning), and they show reproducible resistive switching (RS). The measured characteristics during the forming, set and reset processes were fitted using the model developed. The model is based on the nonlinear Landauer approach for mesoscopic conductors, in this case atomic-sized filaments formed within the 2D materials system. Besides providing excellent overall fitting results (which have been corroborated in log-log, log-linear and linear-linear plots), the model is able to explain the cycle-to-cycle dispersion of the data in terms of the particular features of the filamentary paths, mainly their confinement potential barrier height.
Suárez-Ortegón, M F; Arbeláez, A; Mosquera, M; Méndez, F; Aguilar-de Plata, C
2012-08-01
Ferritin levels have been associated with metabolic syndrome and insulin resistance. The aim of the present study was to evaluate the prediction of ferritin levels by variables related to cardiometabolic disease risk in a multivariate analysis. For this aim, 123 healthy women (72 premenopausal and 51 postmenopausal) were recruited. Data were collected through anthropometric measurements, questionnaires on personal and family history, dietary intake assessment (24-h recall), and biochemical determinations (ferritin, C reactive protein (CRP), glucose, insulin, and lipid profile) in blood serum samples. Multiple linear regression analysis was used, and variables that were not normally distributed were log-transformed for this analysis. In premenopausal women, a model to explain log-ferritin levels was found with log-CRP levels, family history of heart attack, and waist circumference as independent predictors. Ferritin behaves like other cardiovascular markers in that its levels are predicted by documented predictors of cardiometabolic disease and related disorders. This is the first report of a relationship between family history of heart attack and ferritin levels. Further research is required to evaluate the mechanism that explains the relationship of central body fat and family history of heart attack with body iron stores.
Inactivation of Mycobacterium avium subsp. paratuberculosis during cooking of hamburger patties.
Hammer, Philipp; Walte, Hans-Georg C; Matzen, Sönke; Hensel, Jann; Kiesner, Christian
2013-07-01
The role of Mycobacterium avium subsp. paratuberculosis (MAP) in Crohn's disease in humans has been debated for many years. Milk and milk products have been suggested as possible vectors for transmission since the beginning of this debate, whereas recent publications show that slaughtered cattle and their carcasses, meat, and organs can also serve as reservoirs for MAP transmission. The objective of this study was to generate heat-inactivation data for MAP during the cooking of hamburger patties. Hamburger patties of lean ground beef weighing 70 and 50 g, which had been sterilized by irradiation and spiked with three different MAP strains at levels between 10² and 10⁶ CFU/ml, were cooked for 6, 5, 4, 3, and 2 min. Single-sided cooking with one flip was applied, and the temperatures within the patties were recorded by seven thermocouples. Counting of the surviving bacteria was performed by direct plating onto Herrold's egg yolk medium and a three-vial most-probable-number method by using modified Dubos medium. There was considerable variability in temperature throughout the patties during frying. In addition, the log reduction in MAP numbers showed strong variations. In patties weighing 70 g, considerable bacterial reduction of 4 log or larger could only be achieved after 6 min of cooking. For all other cooking times, the bacterial reduction was less than 2 log. Patties weighing 50 g showed a 5-log or larger reduction after cooking times of 5 and 6 min. To determine the inactivation kinetics, a log-linear regression model was used, showing a constant decrease of MAP numbers over cooking time.
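A log-linear inactivation model of the kind mentioned at the end of the abstract above can be sketched as a straight-line fit of log10 counts against cooking time. This is only an illustration under stated assumptions: the function name and the data points are hypothetical and do not come from the study, and the D-value (minutes per 1-log reduction) is a standard derived quantity rather than something the abstract reports.

```python
import numpy as np


def fit_log_linear(times_min, log10_counts):
    """Fit log10 N(t) = intercept + slope * t by ordinary least squares.

    Returns (intercept, slope, d_value), where d_value = -1 / slope is the
    time in minutes needed for a 1-log reduction.
    """
    slope, intercept = np.polyfit(times_min, log10_counts, 1)
    d_value = -1.0 / slope
    return float(intercept), float(slope), float(d_value)


# Hypothetical, noise-free data: counts fall by 0.5 log per minute.
times = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
logs = 6.0 - 0.5 * times
intercept, slope, d_value = fit_log_linear(times, logs)
```

With these made-up numbers the fit recovers a slope of -0.5 log per minute, that is, a D-value of 2 min. Real plate counts would scatter around the line, which is consistent with the strong variation in log reduction that the abstract describes.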
Comparing The Effectiveness of a90/95 Calculations (Preprint)
2006-09-01
Nachtsheim, John Neter, William Li, Applied Linear Statistical Models, 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear. For example is a linear model; is not. 2. Uniform variance (homoscedasticity
Aircraft Airframe Cost Estimation Using a Random Coefficients Model
1979-12-01
approach will also be used here. 2 Model Formulation Several different types of equations could be used for the basic form of the CER, such as linear ...5) Marcotte developed several CER’s for fighter aircraft airframes using the log-linear model. A plot of the residuals from the CER for recurring...of the natural logarithm. Ordinary Least Squares The ordinary least squares procedure starts with the equation for the general linear model. The
2015-07-15
Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...COMPARISON OF NEURAL NETWORK AND LINEAR REGRESSION MODELS IN STATISTICALLY PREDICTING MENTAL AND PHYSICAL HEALTH STATUS OF BREAST...34Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors
Categorical Data Analysis Using a Skewed Weibull Regression Model
NASA Astrophysics Data System (ADS)
Caron, Renault; Sinha, Debajyoti; Dey, Dipak; Polpo, Adriano
2018-03-01
In this paper, we present a Weibull link (skewed) model for categorical response data arising from binomial as well as multinomial models. We show that, for such types of categorical data, the most commonly used models (logit, probit and complementary log-log) can be obtained as limiting cases. We further compare the proposed model with some other asymmetrical models. The Bayesian as well as frequentist estimation procedures for binomial and multinomial data responses are presented in detail. Two data sets are analyzed to show the efficiency of the proposed model.
Kinetics of Hydrothermal Inactivation of Endotoxins
Li, Lixiong; Wilbur, Chris L.; Mintz, Kathryn L.
2011-01-01
A kinetic model was established for the inactivation of endotoxins in water at temperatures ranging from 210°C to 270°C and a pressure of 6.2 × 10⁶ Pa. Data were generated using a bench scale continuous-flow reactor system to process feed water spiked with endotoxin standard (Escherichia coli O113:H10). Product water samples were collected and quantified by the Limulus amebocyte lysate assay. At 250°C, 5-log endotoxin inactivation was achieved in about 1 s of exposure, followed by a lower inactivation rate. This non-log-linear pattern is similar to reported trends in microbial survival curves. Predictions and parameters of several non-log-linear models are presented. In the fast-reaction zone (3- to 5-log reduction), the Arrhenius rate constant fits well at temperatures ranging from 120°C to 250°C on the basis of data from this work and the literature. Both biphasic and modified Weibull models are comparable to account for both the high and low rates of inactivation in terms of prediction accuracy and the number of parameters used. A unified representation of thermal resistance curves for a 3-log reduction and a 3 D value associated with endotoxin inactivation and microbial survival, respectively, is presented. PMID:21193667
Serum Spot 14 concentration is negatively associated with thyroid-stimulating hormone level
Chen, Yen-Ting; Tseng, Fen-Yu; Chen, Pei-Lung; Chi, Yu-Chao; Han, Der-Sheng; Yang, Wei-Shiung
2016-01-01
Spot 14 (S14) is a protein involved in fatty acid synthesis and was shown to be induced by thyroid hormone in rat liver. However, the presence of S14 in human serum and its relations with thyroid function status have not been investigated. The objectives of this study were to compare serum S14 concentrations in patients with hyperthyroidism or euthyroidism and to evaluate the associations between serum S14 and free thyroxine (fT4) or thyroid-stimulating hormone (TSH) levels. We set up an immunoassay for human serum S14 concentrations and compared its levels between hyperthyroid and euthyroid subjects. Twenty-six hyperthyroid patients and 29 euthyroid individuals were recruited. Data of all patients were pooled for the analysis of the associations between the levels of S14 and fT4, TSH, or quartile of TSH. The hyperthyroid patients had significantly higher serum S14 levels than the euthyroid subjects (median [Q1, Q3]: 975 [669, 1612] ng/mL vs 436 [347, 638] ng/mL, P < 0.001). In univariate linear regression, the log-transformed S14 level (logS14) was positively associated with fT4 but negatively associated with creatinine (Cre), total cholesterol (T-C), triglyceride (TG), low-density lipoprotein cholesterol (LDL-C), and TSH. The positive associations between logS14 and fT4 and the negative associations between logS14 and Cre, TG, T-C, or TSH remained significant after adjustment for sex and age. These associations were prominent in females but not in males. The logS14 levels were negatively associated with the TSH levels grouped by quartile (β = −0.3020, P < 0.001). The association between logS14 and TSH quartile persisted after adjustment for sex and age (β = −0.2828, P = 0.001). In stepwise multivariate regression analysis, only TSH grouped by quartile remained significantly associated with logS14 level. We developed an ELISA to measure serum S14 levels in humans. 
Female patients with hyperthyroidism had higher serum S14 levels than the female subjects with euthyroidism. The serum logS14 concentrations were negatively associated with TSH levels. Changes of serum S14 level in the whole thyroid function spectrum deserve further investigation. PMID:27749565
Passino, Dora R.M.; Hickey, James P.; Frank, Anthony M.
1988-01-01
In the Laurentian Great Lakes, more than 300 contaminants have been identified in fish, other biota, water, and sediment. Current hazard assessment of these chemicals by the National Fisheries Research Center-Great Lakes is based on their toxicity, occurrence in the environment, and source. Although scientists at the Center have tested over 70 chemicals with the crustacean Daphnia pulex, the amount of experimental data needed to screen the huge array of chemicals in the Great Lakes exceeds the practical capabilities of conducting bioassays. This limitation can be partly circumvented, however, by using mathematical models based on quantitative structure-activity relationships (QSAR) to provide rapid, inexpensive estimates of toxicity. Many properties of chemicals, including toxicity, bioaccumulation and water solubility are well correlated and can be predicted by equations of the generalized linear solvation energy relationships (LSER). The equation we used to model solute toxicity is $\text{Toxicity} = \text{constant} + m V_I/100 + s(\pi^* + d\delta) + b\beta_m + a\alpha_m$, where $V_I$ = intrinsic (van der Waals) molar volume; $\pi^*$ = molecular dipolarity/polarizability; $\delta$ = polarizability 'correction term'; $\beta_m$ = solute hydrogen bond acceptor basicity; and $\alpha_m$ = solute hydrogen bond donor acidity. The subscript m designates solute monomer values for $\alpha$ and $\beta$. We applied the LSER model to 48-h acute toxicity data (measured as immobilization) for six classes of chemicals detected in Great Lakes fish. 
The following regression was obtained for Daphnia pulex (concentration = μM): $\log \mathrm{EC}_{50} = 4.86 - 4.35\,V_I/100$ (N = 38, r² = 0.867, sd = 0.403). We also used the LSER modeling approach to analyze a large published data set of 24-h acute toxicity for Daphnia magna; the following regression resulted for eight classes of compounds (concentration = mM): $\log \mathrm{EC}_{50} = 3.88 - 4.52\,V_I/100 - 1.62\,\pi^* + 1.66\,\beta_m - 0.916\,\alpha_m$ (N = 62, r² = 0.859, sd = 0.375). In addition, we developed computer software that identifies chemical structures, estimates the LSER parameters, and predicts toxicity. The LSER models promise to be effective in differentiating between reactive and nonreactive toxicity behavior where other models have failed. Contaminants with reactive behavior are generally the most toxic and rank highest in hazard assessment of environmental chemicals.
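The two fitted LSER regressions quoted above can be evaluated directly once the solute parameters are known. The sketch below only transcribes the published coefficients; the function names are hypothetical, the input values in the usage lines are made up, and the conversion from log EC50 to EC50 assumes a base-10 logarithm, which the abstract does not state.

```python
def log_ec50_daphnia_pulex(v_i):
    """48-h regression for Daphnia pulex; result is log EC50 with EC50 in uM."""
    return 4.86 - 4.35 * v_i / 100.0


def log_ec50_daphnia_magna(v_i, pi_star, beta_m, alpha_m):
    """24-h regression for Daphnia magna; result is log EC50 with EC50 in mM."""
    return (3.88
            - 4.52 * v_i / 100.0
            - 1.62 * pi_star
            + 1.66 * beta_m
            - 0.916 * alpha_m)


def ec50_from_log(log_ec50):
    """Back-transform, assuming the regressions use base-10 logarithms."""
    return 10.0 ** log_ec50


# Hypothetical solute with V_I = 100 and all other parameters set to zero.
pulex_value = log_ec50_daphnia_pulex(100.0)
magna_value = log_ec50_daphnia_magna(100.0, 0.0, 0.0, 0.0)
```

In both equations the coefficient on V_I/100 is negative, so larger solutes are predicted to have a lower EC50, that is, to be more toxic.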
ERIC Educational Resources Information Center
Rocconi, Louis M.
2011-01-01
Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Watanabe, Hiroyuki; Miyazaki, Hiroyasu
2006-01-01
Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
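For reference, the two fixed-exponent corrections named in the abstract above have simple closed forms: Bazett divides QT by the square root of the RR interval and Fridericia by its cube root, with RR expressed in seconds. The sketch below shows only these two formulas; the function names and the sample values are hypothetical, and the Loess-based correction studied in the paper is not reproduced here.

```python
def qtc_bazett(qt_ms, rr_s):
    """Bazett correction: QTc = QT / RR**(1/2), with RR in seconds."""
    return qt_ms / rr_s ** 0.5


def qtc_fridericia(qt_ms, rr_s):
    """Fridericia correction: QTc = QT / RR**(1/3), with RR in seconds."""
    return qt_ms / rr_s ** (1.0 / 3.0)


# Hypothetical reading: QT = 320 ms at RR = 0.64 s (about 94 beats/min).
bazett = qtc_bazett(320.0, 0.64)
fridericia = qtc_fridericia(320.0, 0.64)
```

Both formulas leave QT unchanged at RR = 1 s and diverge as the heart rate moves away from 60 beats/min, which is one way to see why a single fixed exponent can over- or under-correct at fast and slow rates.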
Poverty and prevalence of antimicrobial resistance in invasive isolates.
Alvarez-Uria, Gerardo; Gandra, Sumanth; Laxminarayan, Ramanan
2016-11-01
To evaluate the association between the income status of a country and the prevalence of antimicrobial resistance (AMR) in the three most common bacteria causing infections in hospitals and in the community: third-generation cephalosporin (3GC)-resistant Escherichia coli, methicillin-resistant Staphylococcus aureus (MRSA), and 3GC-resistant Klebsiella species. Using 2013-2014 country-specific data from the ResistanceMap repository and the World Bank, the association between the prevalence of AMR in invasive samples and the gross national income (GNI) per capita was investigated through linear regression with robust standard errors. To account for non-linear association with the dependent variable, GNI per capita was log-transformed. The models predicted an 11.3% (95% confidence interval (CI) 6.5-16.2%), 18.2% (95% CI 11-25.5%), and 12.3% (95% CI 5.5-19.1%) decrease in the prevalence of 3GC-resistant E. coli, 3GC-resistant Klebsiella species, and MRSA, respectively, for each one-unit increase in log GNI per capita. The association was stronger for 3GC-resistant E. coli and Klebsiella species than for MRSA. A significant negative association between GNI per capita and the prevalence of MRSA and 3GC-resistant E. coli and Klebsiella species was found. These results underscore the urgent need for new policies aimed at reducing AMR in resource-poor settings. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Mathematical modeling of tetrahydroimidazole benzodiazepine-1-one derivatives as an anti HIV agent
NASA Astrophysics Data System (ADS)
Ojha, Lokendra Kumar
2017-07-01
The goal of the present work is the study of drug-receptor interaction via QSAR (Quantitative Structure-Activity Relationship) analysis for a set of 89 TIBO (Tetrahydroimidazole Benzodiazepine-1-one) derivatives. The MLR (Multiple Linear Regression) method is utilized to generate predictive models of quantitative structure-activity relationships between a set of molecular descriptors and biological activity (IC50). The best QSAR model selected has a correlation coefficient (r) of 0.9299, a Standard Error of Estimation (SEE) of 0.5022, a Fisher Ratio (F) of 159.822 and a Quality factor (Q) of 1.852. This model is statistically significant and strongly favours the substitution of a sulphur atom, IS, i.e. the indicator parameter for the -Z position of the TIBO derivatives. Two other parameters, logP (octanol-water partition coefficient) and SAG (Surface Area Grid), also played a vital role in the generation of the best QSAR model. All three descriptors show very good stability towards data variation in leave-one-out (LOO) validation.
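The reported statistics appear to be consistent with a quality factor defined as the ratio of the correlation coefficient to the standard error of estimation, Q = r/SEE. The abstract does not define Q, so this definition is an assumption; the sketch below simply checks the arithmetic with a hypothetical helper.

```python
def quality_factor(r, see):
    """Quality factor under the assumed definition Q = r / SEE."""
    return r / see


# Values quoted in the abstract: r = 0.9299, SEE = 0.5022.
q = quality_factor(0.9299, 0.5022)
q_rounded = round(q, 3)
```

Under that assumed definition, the quoted r and SEE reproduce the reported Q of 1.852 to three decimal places.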
Madison, Matthew J; Bradshaw, Laine P
2015-06-01
Diagnostic classification models are psychometric models that aim to classify examinees according to their mastery or non-mastery of specified latent characteristics. These models are well-suited for providing diagnostic feedback on educational assessments because of their practical efficiency and increased reliability when compared with other multidimensional measurement models. A priori specifications of which latent characteristics or attributes are measured by each item are a core element of the diagnostic assessment design. This item-attribute alignment, expressed in a Q-matrix, precedes and supports any inference resulting from the application of the diagnostic classification model. This study investigates the effects of Q-matrix design on classification accuracy for the log-linear cognitive diagnosis model. Results indicate that classification accuracy, reliability, and convergence rates improve when the Q-matrix contains isolated information from each measured attribute.
Apalasamy, Yamunah Devi; Ming, Moy Foong; Rampal, Sanjay; Bulgiba, Awang; Mohamed, Zahurin
2015-03-01
Recent findings have shown that the rs1042714 (Gln27Glu) single-nucleotide polymorphism (SNP) on the β2-adrenoceptor gene may predispose to obesity. The findings from other studies carried out on different populations, however, have been inconsistent. The authors investigated the association of the rs1042714 SNP with obesity-related parameters. DNA of 672 Malaysian Malays was analyzed using real-time polymerase chain reaction. Univariate and multivariate linear regression analyses revealed significant associations between rs1042714 and diastolic blood pressure in the pooled Malaysian Malay subjects under additive and recessive models. After gender stratification, however, significant associations were found between rs1042714 and triglyceride levels and between rs1042714 and log-transformed high-density lipoprotein cholesterol levels in Malaysian Malay men. No significant association was found between the SNP and log-transformed body mass index. This polymorphism may have an important role in the development of obesity-related traits in Malaysian Malays. Gender is an effect modifier for the effect of the rs1042714 polymorphism on obesity-related traits in Malaysian Malays. © 2011 APJPH.
Log-Linear Model Based Behavior Selection Method for Artificial Fish Swarm Algorithm
Huang, Zhehuang; Chen, Yidong
2015-01-01
The artificial fish swarm algorithm (AFSA) is a population-based optimization technique inspired by the social behavior of fish. In the past several years, AFSA has been successfully applied in many research and application areas. The behavior of the fish has a crucial impact on the performance of AFSA, such as its global exploration ability and convergence speed. How to construct and select the behaviors of the fish is therefore an important task. To solve these problems, an improved artificial fish swarm algorithm based on a log-linear model is proposed and implemented in this paper. There are three main contributions. Firstly, we propose a new behavior selection algorithm based on a log-linear model, which can enhance the decision-making ability of behavior selection. Secondly, an adaptive movement behavior based on adaptive weights is presented, which can adjust dynamically according to the diversity of the fish. Finally, some new behaviors are defined and introduced into the artificial fish swarm algorithm for the first time to improve its global optimization capability. The experiments on high-dimensional function optimization showed that the improved algorithm has more powerful global exploration ability and reasonable convergence speed compared with the standard artificial fish swarm algorithm. PMID:25691895
Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.
2009-01-01
In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
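The model-selection logic in this abstract can be sketched as a small decision function. The MSPE values and p-value are assumed to be computed elsewhere; the threshold, the significance level `alpha`, the fallback to the simple model, and the name `choose_ssc_model` are assumptions made for illustration, since the abstract does not state them.

```python
def choose_ssc_model(mspe_simple, mspe_criterion, streamflow_p_value=None,
                     mspe_multiple=None, alpha=0.05):
    """Pick the turbidity-only or the turbidity-streamflow regression.

    mspe_criterion and alpha are hypothetical inputs; the abstract gives
    no numeric values for either.
    """
    # Step 1: keep the simple linear regression if it meets the criterion.
    if mspe_simple <= mspe_criterion:
        return "simple"
    # Step 2: otherwise use the multiple regression only if streamflow is
    # statistically significant AND the uncertainty improves.
    if (streamflow_p_value is not None and mspe_multiple is not None
            and streamflow_p_value < alpha and mspe_multiple < mspe_simple):
        return "multiple"
    # Assumed fallback: the abstract does not say what to do in this case.
    return "simple"
```

For example, `choose_ssc_model(30.0, 20.0, streamflow_p_value=0.01, mspe_multiple=22.0)` selects the multiple regression under these assumptions.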
Modeling surficial sand and gravel deposits
Bliss, J.D.; Page, N.J.
1994-01-01
Mineral-deposit models are an integral part of quantitative mineral-resource assessment. As the focus of mineral-deposit modeling has moved from metals to industrial minerals, procedure has been modified and may be sufficient to model surficial sand and gravel deposits. Sand and gravel models are needed to assess resource-supply analyses for planning future development and renewal of infrastructure. Successful modeling of sand and gravel deposits must address (1) deposit volumes and geometries, (2) sizes of fragments within the deposits, (3) physical characteristics of the material, and (4) chemical composition and chemical reactivity of the material. Several models of sand and gravel volumes and geometries have been prepared and suggest the following: Sand and gravel deposits in alluvial fans have a median volume of 35 million m3. Deposits in all other geologic settings have a median volume of 5.4 million m3, a median area of 120 ha, and a median thickness of 4 m. The area of a sand and gravel deposit can be predicted from volume using a regression model (log [area (ha)] = 1.47 + 0.79 log [volume (million m3)]). In similar fashion, the volume of a sand and gravel deposit can be predicted from area using the regression (log [volume (million m3)] = -1.45 + 1.07 log [area (ha)]). Classifying deposits by fragment size can be done using models of the percentage of sand, gravel, and silt within deposits. A classification scheme based on fragment size is sufficiently general to be applied anywhere. © 1994 Oxford University Press.
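The two regressions quoted in this abstract can be evaluated directly. The sketch below assumes the logarithms are base 10, which the abstract does not state; the function names are illustrative.

```python
import math

def area_ha_from_volume(volume_million_m3):
    """log10(area, ha) = 1.47 + 0.79 * log10(volume, million m3)."""
    return 10 ** (1.47 + 0.79 * math.log10(volume_million_m3))

def volume_million_m3_from_area(area_ha):
    """log10(volume, million m3) = -1.45 + 1.07 * log10(area, ha)."""
    return 10 ** (-1.45 + 1.07 * math.log10(area_ha))

# Area predicted for the reported median volume of 5.4 million m3.
median_area = area_ha_from_volume(5.4)
```

Under the base-10 assumption the predicted area for the median volume is about 112 ha, close to the reported median area of 120 ha. The two equations are separate regressions (area on volume and volume on area), so they are not algebraic inverses of each other.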
Tolls, Johannes; Müller, Martin; Willing, Andreas; Steber, Josef
2009-07-01
Many consumer products contain lipophilic, poorly soluble ingredients representing large-volume substances whose aquatic toxicity cannot be adequately determined with standard methods for a number of reasons. In such cases, a recently developed approach can be used to define an aquatic exposure threshold of no concern (ETNCaq; i.e., a concentration below which no adverse effects on the environment are to be expected). A risk assessment can be performed by comparing the ETNCaq value with the aquatic exposure levels of poorly soluble substances. Accordingly, the aquatic exposure levels of substances with water solubility below the ETNCaq will not exceed the ecotoxicological no-effect concentration; therefore, their risk can be assessed as being negligible. The ETNCaq value relevant for substances with a narcotic mode of action is 1.9 microg/L. To apply the above risk assessment strategy, the solubility in water needs to be known. Most frequently, this parameter is estimated by means of quantitative structure/activity relationships based on the log octanol-water partition coefficient (log Kow). The predictive value of several calculation models for water solubility has been investigated by this method with the use of more recent experimental solubility data for lipophilic compounds. A linear regression model was shown to be the most suitable for providing correct predictions without underestimation of real water solubility. To define a log Kow threshold suitable for reliably predicting a water solubility of less than 1.9 microg/L, a confidence limit was established by statistical comparison of the experimental solubility data with their log Kow. It was found that a threshold of log Kow = 7 generally allows discrimination between substances with solubility greater than and less than 1.9 microg/L. Accordingly, organic substances with a baseline toxicity and log Kow > 7 do not require further testing to prove that they have low environmental risk. 
In applying this concept, the uncertainty of the prediction of water solubility can be accounted for. If the predicted solubility in water is to be below ETNCaq with a probability of 95%, the corresponding log Kow value is 8.
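The screening rule stated in this abstract reduces to a threshold comparison on log Kow. The sketch below applies only to substances with baseline (narcotic) toxicity, as in the abstract; the function name and the `conservative` flag are illustrative choices.

```python
def needs_further_testing(log_kow, conservative=False):
    """Return True if the log Kow screen does not waive further testing.

    Threshold 7: solubility generally below the ETNCaq of 1.9 microg/L.
    Threshold 8: predicted solubility below ETNCaq with 95% probability.
    """
    threshold = 8.0 if conservative else 7.0
    return log_kow <= threshold
```

A substance with log Kow of 7.5 would pass the screen at the general threshold but not at the 95% probability threshold.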
Assessment of passive drag in swimming by numerical simulation and analytical procedure.
Barbosa, Tiago M; Ramos, Rui; Silva, António J; Marinho, Daniel A
2018-03-01
The aim was to compare the passive drag during underwater gliding as estimated by a numerical simulation and an analytical procedure. An Olympic swimmer was scanned by computed tomography and modelled gliding at a 0.75-m depth in the streamlined position. Steady-state computational fluid dynamics (CFD) analyses were performed on Fluent. A set of analytical procedures was selected concurrently. Friction drag (Df), pressure drag (Dpr), total passive drag force (Df+pr) and drag coefficient (CD) were computed between 1.3 and 2.5 m·s-1 by both techniques. Df+pr ranged from 45.44 to 144.06 N with CFD, from 46.03 to 167.06 N with the analytical procedure (differences: from 1.28% to 13.77%). CD ranged between 0.698 and 0.622 by CFD, 0.657 and 0.644 by analytical procedures (differences: 0.40-6.30%). Linear regression models showed a very high association for Df+pr plotted in absolute values (R2 = 0.98) and after log-log transformation (R2 = 0.99). The CD also obtained a very high adjustment for both absolute (R2 = 0.97) and log-log plots (R2 = 0.97). The bias for the Df+pr was 8.37 N and 0.076 N after logarithmic transformation. Df represented between 15.97% and 18.82% of the Df+pr by the CFD, 14.66% and 16.21% by the analytical procedures. Therefore, despite the bias, analytical procedures offer a feasible way of gathering insight on one's hydrodynamics characteristics.
Shoeib, Mahiba; Harner, Tom
2002-05-01
Octanol-air partition coefficients (Koa) were measured directly for 19 organochlorine (OC) pesticides over the temperature range of 5 to 35 degrees C. Values of log Koa at 25 degrees C ranged over three orders of magnitude, from 7.4 for hexachlorobenzene to 10.1 for 1,1-dichloro-2,2-bis(p-chlorophenyl) ethane. Measured values were compared to values calculated as KowRT/H (where R is the ideal gas constant [8.314 J mol(-1) K(-1)], T is absolute temperature, and H is Henry's law constant) and were, in general, larger. Discrepancies of up to three orders of magnitude were observed, highlighting the need for direct measurements of Koa. Plots of Koa versus inverse absolute temperature exhibited a log-linear correlation. Enthalpies of phase transition between octanol and air (deltaHoa) were determined from the temperature slopes and were in the range of 56 to 105 kJ mol(-1) K(-1). Activity coefficients in octanol (gamma(o)) were determined from Koa and reported supercooled liquid vapor pressures (pL(o)), and these were in the range of 0.3 to 12, indicating near-ideal solution behavior. Differences in Koa values for structural isomers of hexachlorocyclohexane were also explored. A Koa-based model was described for predicting the partitioning of OC pesticides to aerosols and used to calculate particulate fractions at 25 and -10 degrees C. The model also agreed well with experimental results for several OC pesticides that were equilibrated with urban aerosols in the laboratory. A log-log regression of the particle-gas partition coefficient versus Koa had a slope near unity, indicating that octanol is a good surrogate for the aerosol organic matter.
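The calculated estimate that the measurements were compared against, Koa = Kow·R·T/H, can be written as a short function. The sketch assumes H is expressed in Pa m3 mol(-1) so that Koa comes out dimensionless; the abstract does not give the units of H, and the function name is illustrative.

```python
import math

R = 8.314  # ideal gas constant, J mol^-1 K^-1 (value quoted in the abstract)

def log_koa_from_kow(log_kow, henry_pa_m3_per_mol, temp_k=298.15):
    """Estimate log10(Koa) as log10(Kow * R * T / H).

    henry_pa_m3_per_mol: Henry's law constant, assumed to be in Pa m3/mol.
    """
    koa = (10 ** log_kow) * R * temp_k / henry_pa_m3_per_mol
    return math.log10(koa)
```

Comparing this estimate with a directly measured log Koa gives the discrepancy the abstract refers to, which it reports as reaching three orders of magnitude for some compounds.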
Effect of temperature and humidity on formaldehyde emissions in temporary housing units.
Parthasarathy, Srinandini; Maddalena, Randy L; Russell, Marion L; Apte, Michael G
2011-06-01
The effect of temperature and humidity on formaldehyde emissions from samples collected from temporary housing units (THUs) was studied. The THUs were supplied by the U.S. Federal Emergency Management Administration (FEMA) to families that lost their homes in Louisiana and Mississippi during the Hurricane Katrina and Rita disasters. On the basis of a previous study, four of the composite wood surface materials that dominated contributions to indoor formaldehyde were selected to analyze the effects of temperature and humidity on the emission factors. Humidity equilibration experiments were carried out on two of the samples to determine how long the samples take to equilibrate with the surrounding environmental conditions. Small chamber experiments were then conducted to measure emission factors for the four surface materials at various temperature and humidity conditions. The samples were analyzed for formaldehyde via high-performance liquid chromatography. The experiments showed that increases in temperature or humidity contributed to an increase in emission factors. A linear regression model was built using the natural log of the percent relative humidity (RH) and inverse of temperature (in K) as independent variables and the natural log of emission factors as the dependent variable. The coefficients for the inverse of temperature and log RH with log emission factor were found to be statistically significant for all of the samples at the 95% confidence level. This study should assist in retrospectively estimating indoor formaldehyde exposure of occupants of THUs.
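The regression form described in this abstract, ln(EF) = b0 + b_RH·ln(RH) + b_T·(1/T), can be turned into a prediction function. The abstract reports no coefficient values, so the numbers below are hypothetical placeholders chosen only so that emissions rise with temperature and humidity, as the experiments found.

```python
import math

def emission_factor(rh_percent, temp_k, b0, b_rh, b_inv_t):
    """Predict an emission factor from ln(EF) = b0 + b_rh*ln(RH) + b_inv_t/T."""
    return math.exp(b0 + b_rh * math.log(rh_percent) + b_inv_t / temp_k)

# Hypothetical coefficients (NOT from the study): a positive humidity
# coefficient and a negative inverse-temperature coefficient.
coeffs = dict(b0=25.0, b_rh=0.8, b_inv_t=-7000.0)

ef_cool_dry = emission_factor(30.0, 288.15, **coeffs)
ef_warm_humid = emission_factor(70.0, 303.15, **coeffs)
```

With fitted coefficients for a given material, the same function could be used to back-calculate emission factors for recorded indoor temperature and humidity conditions.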
Aldosterone and glomerular filtration – observations in the general population
2014-01-01
Background Increasing evidence suggests that aldosterone promotes renal damage. Since data on the association between aldosterone and renal function in the general population are sparse, we chose to address this issue. We investigated the associations between the plasma aldosterone concentration (PAC) or the aldosterone-to-renin ratio (ARR) and the estimated glomerular filtration rate (eGFR) in a sample of adult men and women from Northeast Germany. Methods A study population of 1921 adult men and women who participated in the first follow-up of the Study of Health in Pomerania was selected. None of the subjects used drugs that alter PAC or ARR. The eGFR was calculated according to the four-variable Modification of Diet in Renal Disease formula. Chronic kidney disease (CKD) was defined as an eGFR <60 ml/min/1.73 m2. Results Linear regression models, adjusted for sex, age, waist circumference, diabetes mellitus, smoking status, systolic and diastolic blood pressures, serum triglyceride concentrations and time of blood sampling revealed inverse associations of PAC or ARR with eGFR (ß-coefficient for log-transformed PAC −3.12, p < 0.001; ß-coefficient for log-transformed ARR −3.36, p < 0.001). Logistic regression models revealed increased odds for CKD with increasing PAC (odds ratio for a one standard deviation increase in PAC: 1.35, 95% confidence interval: 1.06-1.71). There was no statistically significant association between ARR and CKD. Conclusion Our study demonstrates that PAC and ARR are inversely associated with the glomerular filtration rate in the general population. PMID:24612948
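The four-variable MDRD calculation and the CKD cut-off used in this abstract can be sketched as follows. The abstract does not say which calibration constant was used; the sketch assumes the commonly cited 175 for standardized creatinine assays (the original equation used 186), with serum creatinine in mg/dL.

```python
def egfr_mdrd4(serum_creatinine_mg_dl, age_years, female, black=False,
               constant=175.0):
    """Four-variable MDRD eGFR in ml/min/1.73 m2.

    constant=175.0 is an assumption (standardized creatinine); pass 186.0
    for the original, non-standardized form of the equation.
    """
    egfr = constant * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr

def has_ckd(egfr):
    """CKD as defined in the abstract: eGFR below 60 ml/min/1.73 m2."""
    return egfr < 60.0
```

This is an illustration of the formula only and not a clinical tool.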
"Geo-statistics methods and neural networks in geophysical applications: A case study"
NASA Astrophysics Data System (ADS)
Rodriguez Sandoval, R.; Urrutia Fucugauchi, J.; Ramirez Cruz, L. C.
2008-12-01
The study focuses on the Ebano-Panuco basin of northeastern Mexico, which is being explored for hydrocarbon reservoirs. These reservoirs are in limestones, and there is interest in determining porosity and permeability in the carbonate sequences. The porosity maps presented in this study are estimated by applying multiattribute and neural network techniques, which combine geophysical logs and 3-D seismic data by means of statistical relationships. The multiattribute analysis is a process to predict a volume of any underground petrophysical measurement from well-log and seismic data. The data consist of a series of target logs from wells which tie a 3-D seismic volume. The target logs are neutron porosity logs. From the 3-D seismic volume a series of sample attributes is calculated. The objective of this study is to derive a relationship between a set of attributes and the target log values. The selected set is determined by a process of forward stepwise regression. The analysis can be linear or nonlinear. In the linear mode the method consists of a series of weights derived by least-squares minimization. In the nonlinear mode, a neural network is trained using the selected attributes as inputs. In this case we used a probabilistic neural network (PNN). The method is applied to a real data set from PEMEX. For better reservoir characterization the porosity distribution was estimated using both techniques. The case showed a continuous improvement in the prediction of the porosity from the multiattribute to the neural network analysis. The improvement is in the training and the validation, which are important indicators of the reliability of the results. The neural network showed an improvement in resolution over the multiattribute analysis. The final maps provide more realistic results of the porosity distribution.
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE) and a linear pseudo model for a nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However, the present research paper introduces an innovative method to compute the NLSE using principles of multivariate calculus. This study is concerned with very new optimization techniques used to compute the MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to get a linear pseudo model for a nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for a nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of the LSE (least-squares estimation) for the fitting of a nonlinear regression function in 2006. In Jae Myung [13] provided a good conceptual introduction to maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation
Liu, Peter Y; Takahashi, Paul Y; Roebuck, Pamela D; Iranmanesh, Ali; Veldhuis, Johannes D
2005-09-01
Pulsatile and thus total testosterone (Te) secretion declines in older men, albeit for unknown reasons. Analytical models forecast that aging may reduce the capability of endogenous luteinizing hormone (LH) pulses to stimulate Leydig cell steroidogenesis. This notion has been difficult to test experimentally. The present study used graded doses of a selective gonadotropin releasing hormone (GnRH)-receptor antagonist to yield four distinct strata of pulsatile LH release in each of 18 healthy men ages 23-72 yr. Deconvolution analysis was applied to frequently sampled LH and Te concentration time series to quantitate pulsatile Te secretion over a 16-h interval. Log-linear regression was used to relate pulsatile LH secretion to attendant pulsatile Te secretion (LH-Te drive) across the four stepwise interventions in each subject. Linear regression of the 18 individual estimates of LH-Te feedforward dose-response slopes on age disclosed a strongly negative relationship (r = -0.721, P < 0.001). Accordingly, the present data support the thesis that aging in healthy men attenuates amplitude-dependent LH drive of burst-like Te secretion. The experimental strategy of graded suppression of neuroglandular outflow may have utility in estimating dose-response adaptations in other endocrine systems.
NASA Astrophysics Data System (ADS)
Bartiko, Daniel; Chaffe, Pedro; Bonumá, Nadia
2017-04-01
Floods may be strongly affected by climate, land-use, land-cover and water infrastructure changes. However, it is common to model this process as stationary. This approach has been questioned, especially when it involves estimating the frequency and magnitude of extreme events for designing and maintaining hydraulic structures, such as those responsible for flood control and dam safety. Brazil is the third largest producer of hydroelectricity in the world, and many of the country's dams are located in the Southern Region. It therefore seems appropriate to investigate the presence of non-stationarity in the inflows to these plants. In our study, we used historical flood data from the Brazilian National Grid Operator (ONS) to explore trends in annual maxima in river flow of the 38 main rivers flowing to Southern Brazilian reservoirs (records range from 43 to 84 years). In the analysis, we assumed a two-parameter log-normal distribution, and a linear regression model was applied in order to allow the mean to vary with time. We computed recurrence reduction factors to characterize changes in the return period of an initially estimated 100-year flood by a log-normal stationary model. To evaluate whether or not a particular site exhibits a positive trend, we only considered data series with statistically significant linear regression slope coefficients (p < 0.05). The significance level was calculated using the one-sided Student's t-test. The trend model residuals were analyzed using the Anderson-Darling normality test, the Durbin-Watson test for independence and the Breusch-Pagan test for heteroscedasticity. Our results showed that 22 of the 38 data series analyzed have a significant positive trend. The trends were mainly in three large basins: Iguazu, Uruguay and Paranapanema, which have undergone changes in land use and flow regularization in recent years. 
The calculated return period for the series that presented positive trend varied from 50 to 77 years for a 100 year-flood estimated by stationary model when considering a planning horizon equal to ten years. We conclude that attention should be given for future projects developed in this area, including the incorporation of non-stationarity analysis, search for answers to such changes and incorporation of new data to increase the reliability of the estimates.
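How a trend in the log-mean shortens the return period of a stationary 100-year flood can be illustrated with a small calculation. This is a simplified sketch under stated assumptions (log-normal annual maxima, constant log standard deviation, log-mean shifted linearly by `slope` per year, return period taken as the inverse exceedance probability at the end of the horizon); the parameter values are hypothetical and the authors' recurrence reduction factor may be defined differently.

```python
import math
from statistics import NormalDist

def stationary_quantile(mu, sigma, return_period=100.0):
    """Flood magnitude with the given return period under ln(Q) ~ N(mu, sigma)."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    return math.exp(mu + sigma * z)

def updated_return_period(q, mu0, slope, sigma, years_ahead):
    """Return period of magnitude q after the log-mean has drifted."""
    mu_t = mu0 + slope * years_ahead
    p_exceed = 1.0 - NormalDist(mu_t, sigma).cdf(math.log(q))
    return 1.0 / p_exceed

# Hypothetical parameters: log-mean 7.0, log-sd 0.4, trend 0.005 per year.
q100 = stationary_quantile(7.0, 0.4)
t_new = updated_return_period(q100, 7.0, 0.005, 0.4, years_ahead=10)
```

With a zero slope the function returns the original 100 years; any positive slope gives a shorter return period for the same flood magnitude.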
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. 
We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed-effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves than on the coefficients. Moreover, use of cubic regression splines provides biologically meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
Correlation of Respirator Fit Measured on Human Subjects and a Static Advanced Headform
Bergman, Michael S.; He, Xinjian; Joseph, Michael E.; Zhuang, Ziqing; Heimbuch, Brian K.; Shaffer, Ronald E.; Choe, Melanie; Wander, Joseph D.
2015-01-01
This study assessed the correlation of N95 filtering face-piece respirator (FFR) fit between a Static Advanced Headform (StAH) and 10 human test subjects. Quantitative fit evaluations were performed on test subjects who made three visits to the laboratory. On each visit, one fit evaluation was performed on eight different FFRs of various model/size variations. Additionally, subject breathing patterns were recorded. Each fit evaluation comprised three two-minute exercises: “Normal Breathing,” “Deep Breathing,” and again “Normal Breathing.” The overall test fit factors (FF) for human tests were recorded. The same respirator samples were later mounted on the StAH and the overall test manikin fit factors (MFF) were assessed utilizing the recorded human breathing patterns. Linear regression was performed on the mean log10-transformed FF and MFF values to assess the relationship between the values obtained from humans and the StAH. This is the first study to report a positive correlation of respirator fit between a headform and test subjects. The linear regression by respirator resulted in R2 = 0.95, indicating a strong linear correlation between FF and MFF. For all respirators the geometric mean (GM) FF values were consistently higher than those of the GM MFF. For 50% of respirators, GM FF and GM MFF values were significantly different between humans and the StAH. For data grouped by subject/respirator combinations, the linear regression resulted in R2 = 0.49. A weaker correlation (R2 = 0.11) was found using only data paired by subject/respirator combination where both the test subject and StAH had passed a real-time leak check before performing the fit evaluation. For six respirators, the difference in passing rates between the StAH and humans was < 20%, while two respirators showed a difference of 29% and 43%. For data by test subject, GM FF and GM MFF values were significantly different for 40% of the subjects. 
Overall, the advanced headform system has potential for assessing fit for some N95 FFR model/sizes. PMID:25265037
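The analysis in this abstract (linear regression on log10-transformed fit factors, plus geometric means) can be sketched with the standard library. The fit-factor values below are invented for illustration, and treating MFF as the predictor and FF as the response is an assumption; the abstract does not say which variable was on which axis.

```python
import math
from statistics import geometric_mean

def log10_regression(ff, mff):
    """Ordinary least squares of log10(FF) on log10(MFF); returns slope, intercept, R2."""
    x = [math.log10(v) for v in mff]
    y = [math.log10(v) for v in ff]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

# Hypothetical mean fit factors for four respirator models (not study data).
ff = [20.0, 45.0, 100.0, 210.0]    # human subjects
mff = [10.0, 22.0, 50.0, 105.0]    # headform (manikin)
slope, intercept, r2 = log10_regression(ff, mff)
gm_ff, gm_mff = geometric_mean(ff), geometric_mean(mff)
```

Fit factors are usually summarized on the log scale because they span orders of magnitude, which is why geometric rather than arithmetic means are compared.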
Use of Log-Linear Models in Classification Problems.
1981-12-01
polynomials. The second example involves infant hypoxic trauma, and many cells are empty. The existence conditions are used to find a model for which esti...mates of cell frequencies exist and are in good agreement with the observed data. Key Words: Classification problem, log-difference models, minimum 8...variates define k states, which are labeled consecutively. Thus, while MB define cells in their tables by an I-vector Z, we simply take Z to be a
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Sang-Kwun; Keener, T.C.; Cook, J.L.
1993-12-31
The experimental data of lime sorbent attrition obtained from attrition tests in a circulating fluidized bed absorber (CFBA) are presented. The results are interpreted as both the weight-based attrition rate and the size-based attrition rate. The weight-based attrition rate constants are obtained from a modified second-order attrition model, incorporating a minimum fluidization weight, W_min, and excess velocity. Furthermore, this minimum fluidization weight, W_min, was found to be a function of both particle size and velocity. A plot of the natural log of the overall weight-based attrition rate constants (ln K_a) for Lime 1 (903 MMD) at superficial gas velocities of 2 m/s, 2.35 m/s, and 2.69 m/s and for Lime 2 (1764 MMD) at superficial gas velocities of 2 m/s, 3 m/s, 4 m/s and 5 m/s versus the energy term, 1/(U - U_mf)^2, yielded a linear relationship. A regression coefficient of 0.9386 for the linear regression confirms that K_a may be expressed in Arrhenius form. In addition, an unsteady-state population model is presented to predict the changes in size distribution of bed materials during fluidization. The unsteady-state population model was verified experimentally, and the solid size distribution predicted by the model agreed well with the corresponding experimental size distributions. The model may be applicable to the batch and continuous operation of fluidized beds in which the solids size reduction results predominantly from attrition and elutriation. Such significance of mechanical attrition and elutriation is frequently seen in a fast fluidized bed as well as in a circulating fluidized bed.
ERIC Educational Resources Information Center
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Body burden levels of dioxin, furans, and PCBs among frequent consumers of Great Lakes sport fish
DOE Office of Scientific and Technical Information (OSTI.GOV)
Falk, C.; Hanrahan, L.; Anderson, H.A.
1999-02-01
Dioxins, furans, and polychlorinated biphenyls (PCBs) are toxic, persist in the environment, and bioaccumulate to concentrations that can be harmful to humans. The Health Departments of five GL states, Wisconsin, Michigan, Ohio, Illinois, and Indiana, formed a consortium to study body burden levels of chemical residues in fish consumers of Lakes Michigan, Huron, and Erie. In Fall 1993, a telephone survey was administered to sport angler households to obtain fish consumption habits and demographics. A blood sample was obtained from a portion of the study subjects. One hundred serum samples were analyzed for 8 dioxin, 10 furan, and 4 coplanar PCB congeners. Multiple linear regression was conducted to assess the predictability of the following covariates: GL sport fish species, age, BMI, gender, years sport fish consumed, and lake. Median total dioxin toxic equivalents (TEq), total furan TEq, and total coplanar PCB TEq were higher among all men than all women (P = 0.0001). Lake trout, salmon, age, BMI, and gender were significant regression predictors of log (total coplanar PCBs). Lake trout, age, gender, and lake were significant regression predictors of log (total furans). Age was the only significant predictor of total dioxin levels.
Ferrarini, Luca; Veer, Ilya M; van Lew, Baldur; Oei, Nicole Y L; van Buchem, Mark A; Reiber, Johan H C; Rombouts, Serge A R B; Milles, J
2011-06-01
In recent years, graph theory has been successfully applied to study functional and anatomical connectivity networks in the human brain. Most of these networks have shown small-world topological characteristics: high efficiency in long distance communication between nodes, combined with highly interconnected local clusters of nodes. Moreover, functional studies performed at high resolutions have presented convincing evidence that resting-state functional connectivity networks exhibit (exponentially truncated) scale-free behavior. Such evidence, however, was mostly presented qualitatively, in terms of linear regressions of the degree distributions on log-log plots. Even when quantitative measures were given, these were usually limited to the r(2) correlation coefficient. However, the r(2) statistic is not an optimal estimator of explained variance when dealing with (truncated) power-law models. Recent developments in statistics have introduced new non-parametric approaches, based on the Kolmogorov-Smirnov test, for the problem of model selection. In this work, we have built on this idea to statistically tackle the issue of model selection for the degree distribution of functional connectivity at rest. The analysis, performed at voxel level and in a subject-specific fashion, confirmed the superiority of a truncated power-law model, showing high consistency across subjects. Moreover, the most highly connected voxels were found to be consistently part of the default mode network. Our results provide statistically sound support to the evidence previously presented in literature for a truncated power-law model of resting-state functional connectivity. Copyright © 2010 Elsevier Inc. All rights reserved.
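The alternative to log-log regression that this abstract alludes to can be illustrated with a maximum-likelihood fit and a Kolmogorov-Smirnov distance. The sketch below is a simplification: it fits a pure continuous power law (not the exponentially truncated model the study favors) to synthetic samples, and the function name and parameter values are illustrative assumptions.

```python
import math
import random

def fit_power_law(samples, x_min):
    """Continuous power-law MLE for the tail x >= x_min, plus the KS distance."""
    tail = sorted(x for x in samples if x >= x_min)
    n = len(tail)
    # Maximum-likelihood exponent: alpha = 1 + n / sum(ln(x / x_min)).
    alpha = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    # KS distance: largest gap between the empirical and the fitted CDF.
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return alpha, ks

# Synthetic data drawn by inverse-transform sampling from a power law.
rng = random.Random(0)
true_alpha = 2.5
samples = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(5000)]
alpha_hat, ks_distance = fit_power_law(samples, x_min=1.0)
```

Competing models (for example a truncated power law or an exponential) can then be compared by their KS distances or by bootstrap p-values, rather than by the r(2) of a straight line on a log-log plot.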
Development of quantitative screen for 1550 chemicals with GC-MS.
Bergmann, Alan J; Points, Gary L; Scott, Richard P; Wilson, Glenn; Anderson, Kim A
2018-05-01
With hundreds of thousands of chemicals in the environment, effective monitoring requires high-throughput analytical techniques. This paper presents a quantitative screening method for 1550 chemicals based on statistical modeling of responses with identification and integration performed using deconvolution reporting software. The method was evaluated with representative environmental samples. We tested biological extracts, low-density polyethylene, and silicone passive sampling devices spiked with known concentrations of 196 representative chemicals. A multiple linear regression (R2 = 0.80) was developed with molecular weight, logP, polar surface area, and fractional ion abundance to predict chemical responses within a factor of 2.5. Linearity beyond the calibration had R2 > 0.97 for three orders of magnitude. Median limits of quantitation were estimated to be 201 pg/μL (1.9× standard deviation). The number of detected chemicals and the accuracy of quantitation were similar for environmental samples and standard solutions. To our knowledge, this is the most precise method for the largest number of semi-volatile organic chemicals lacking authentic standards. Accessible instrumentation and software make this method cost effective in quantifying a large, customizable list of chemicals. When paired with silicone wristband passive samplers, this quantitative screen will be very useful for epidemiology where binning of concentrations is common. Graphical abstract: A multiple linear regression of chemical responses measured with GC-MS allowed quantitation of 1550 chemicals in samples such as silicone wristbands.
Thermal inactivation of Salmonella spp. in pork burger patties.
Gurman, P M; Ross, T; Holds, G L; Jarrett, R G; Kiermeier, A
2016-02-16
Predictive models to estimate the reduction in Escherichia coli O157:H7 concentration in beef burgers have been developed to inform risk management decisions; no analogous model exists for Salmonella spp. in pork burgers. In this study, "Extra Lean" and "Regular" fat pork minces were inoculated with Salmonella spp. (Salmonella 4,[5],12,i:-, Salmonella Senftenberg and Salmonella Typhimurium) and formed into pork burger patties. Patties were cooked on an electric skillet (to imitate home cooking) to one of seven internal temperatures (46, 49, 52, 55, 58, 61, 64 °C) and Salmonella enumerated. A generalised linear logistic regression model was used to develop a predictive model for the Salmonella concentration based on the internal endpoint temperature. It was estimated that in pork mince with a fat content of 6.1%, Salmonella survival will decrease by 0.2407 log10 CFU/g for a 1 °C increase in internal endpoint temperature, with a 5-log10 reduction in Salmonella concentration estimated to occur when the geometric centre temperature reaches 63 °C. The fat content influenced the rate of Salmonella inactivation (P=0.043), with Salmonella survival increasing as fat content increased, though this effect became negligible as the temperature approached 62 °C. Fat content increased the time required for patties to achieve a specified internal temperature (P=0.0106 and 0.0309 for linear and quadratic terms respectively), indicating that reduced fat pork mince may reduce the risk of salmonellosis from consumption of pork burgers. Salmonella serovar did not significantly affect the model intercepts (P=0.86) or slopes (P=0.10) of the fitted logistic curve. This predictive model can be applied to estimate the reduction in Salmonella in pork burgers after cooking to a specific endpoint temperature and hence to assess food safety risk. Crown Copyright © 2015. Published by Elsevier B.V. All rights reserved.
Virji, M. Abbas; Trapnell, Bruce C.; Carey, Brenna; Healey, Terrance; Kreiss, Kathleen
2014-01-01
Rationale: Occupational exposure to indium compounds, including indium–tin oxide, can result in potentially fatal indium lung disease. However, the early effects of exposure on the lungs are not well understood. Objectives: To determine the relationship between short-term occupational exposures to indium compounds and the development of early lung abnormalities. Methods: Among indium–tin oxide production and reclamation facility workers, we measured plasma indium, respiratory symptoms, pulmonary function, chest computed tomography, and serum biomarkers of lung disease. Relationships between plasma indium concentration and health outcome variables were evaluated using restricted cubic spline and linear regression models. Measurements and Main Results: Eighty-seven (93%) of 94 indium–tin oxide facility workers (median tenure, 2 yr; median plasma indium, 1.0 μg/l) participated in the study. Spirometric abnormalities were not increased compared with the general population, and few subjects had radiographic evidence of alveolar proteinosis (n = 0), fibrosis (n = 2), or emphysema (n = 4). However, in internal comparisons, participants with plasma indium concentrations ≥ 1.0 μg/l had more dyspnea, lower mean FEV1 and FVC, and higher median serum Krebs von den Lungen-6 and surfactant protein-D levels. Spline regression demonstrated nonlinear exposure response, with significant differences occurring at plasma indium concentrations as low as 1.0 μg/l compared with the reference. Associations between health outcomes and the natural log of plasma indium concentration were evident in linear regression models. Associations were not explained by age, smoking status, facility tenure, or prior occupational exposures. Conclusions: In indium–tin oxide facility workers with short-term, low-level exposure, plasma indium concentrations lower than previously reported were associated with lung symptoms, decreased spirometric parameters, and increased serum biomarkers of lung disease. 
PMID:25295756
Serum Vitamin D Levels and Markers of Severity of Childhood Asthma in Costa Rica
Brehm, John M.; Celedón, Juan C.; Soto-Quiros, Manuel E.; Avila, Lydiana; Hunninghake, Gary M.; Forno, Erick; Laskey, Daniel; Sylvia, Jody S.; Hollis, Bruce W.; Weiss, Scott T.; Litonjua, Augusto A.
2009-01-01
Rationale: Maternal vitamin D intake during pregnancy has been inversely associated with asthma symptoms in early childhood. However, no study has examined the relationship between measured vitamin D levels and markers of asthma severity in childhood. Objectives: To determine the relationship between measured vitamin D levels and both markers of asthma severity and allergy in childhood. Methods: We examined the relation between 25-hydroxyvitamin D levels (the major circulating form of vitamin D) and markers of allergy and asthma severity in a cross-sectional study of 616 Costa Rican children between the ages of 6 and 14 years. Linear, logistic, and negative binomial regressions were used for the univariate and multivariate analyses. Measurements and Main Results: Of the 616 children with asthma, 175 (28%) had insufficient levels of vitamin D (<30 ng/ml). In multivariate linear regression models, vitamin D levels were significantly and inversely associated with total IgE and eosinophil count. In multivariate logistic regression models, a log10 unit increase in vitamin D levels was associated with reduced odds of any hospitalization in the previous year (odds ratio [OR], 0.05; 95% confidence interval [CI], 0.004–0.71; P = 0.03), any use of antiinflammatory medications in the previous year (OR, 0.18; 95% CI, 0.05–0.67; P = 0.01), and increased airway responsiveness (a ≤8.58-μmol provocative dose of methacholine producing a 20% fall in baseline FEV1 [OR, 0.15; 95% CI, 0.024–0.97; P = 0.05]). Conclusions: Our results suggest that vitamin D insufficiency is relatively frequent in an equatorial population of children with asthma. In these children, lower vitamin D levels are associated with increased markers of allergy and asthma severity. PMID:19179486
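The odds ratios above are reported per log10 unit of vitamin D, that is, per 10-fold increase. As a hedged illustration, assuming the fitted logistic model is linear in log10 vitamin D, the same coefficient can be restated for a smaller contrast such as a doubling. The function name `rescale_odds_ratio` is hypothetical and does not come from the study.

```python
import math

def rescale_odds_ratio(or_per_log10_unit: float, fold_change: float) -> float:
    """Convert an odds ratio per 10-fold increase into one per `fold_change`-fold increase.

    Assumes the log-odds are linear in log10 of the exposure.
    """
    beta = math.log(or_per_log10_unit)            # log-odds change per log10 unit
    return math.exp(beta * math.log10(fold_change))

# OR for any hospitalization in the previous year, as quoted in the abstract
or_doubling = rescale_odds_ratio(0.05, 2.0)
print(round(or_doubling, 2))
```

Under that linearity assumption, an odds ratio of 0.05 per 10-fold increase corresponds to roughly 0.41 per doubling of vitamin D level.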
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
MIXOR: a computer program for mixed-effects ordinal regression analysis.
Hedeker, D; Gibbons, R D
1996-03-01
MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.
Grantz, Erin; Haggard, Brian; Scott, J Thad
2018-06-12
We calculated four median datasets (chlorophyll a, Chl a; total phosphorus, TP; and transparency) using multiple approaches to handling censored observations, including substituting fractions of the quantification limit (QL; dataset 1 = 1QL, dataset 2 = 0.5QL) and statistical methods for censored datasets (datasets 3-4) for approximately 100 Texas, USA reservoirs. Trend analyses of differences between dataset 1 and 3 medians indicated percent difference increased linearly above thresholds in percent censored data (%Cen). This relationship was extrapolated to estimate medians for site-parameter combinations with %Cen > 80%, which were combined with dataset 3 as dataset 4. Changepoint analysis of Chl a- and transparency-TP relationships indicated threshold differences up to 50% between datasets. Recursive analysis identified secondary thresholds in dataset 4. Threshold differences show that information introduced via substitution or missing due to limitations of statistical methods biased values, underestimated error, and inflated the strength of TP thresholds identified in datasets 1-3. Analysis of covariance identified differences in linear regression models relating transparency-TP between datasets 1, 2, and the more statistically robust datasets 3-4. Study findings identify high-risk scenarios for biased analytical outcomes when using substitution. These include high probability of median overestimation when %Cen > 50-60% for a single QL, or when %Cen is as low as 16% for multiple QLs. Changepoint analysis was uniquely vulnerable to substitution effects when using medians from sites with %Cen > 50%. Linear regression analysis was less sensitive to substitution and missing data effects, but differences in model parameters for transparency cannot be discounted and could be magnified by log-transformation of the variables.
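A small sketch may make the substitution risk concrete: once more than half of the observations are censored, the median is simply whatever value was substituted for them. The data, the quantification limit, and the helper name `substituted_median` below are hypothetical and chosen only for illustration; this is not the study's procedure for datasets 3-4.

```python
import statistics

def substituted_median(detected, n_censored, ql, fraction):
    """Median after replacing each censored observation with fraction * QL."""
    values = list(detected) + [fraction * ql] * n_censored
    return statistics.median(values)

ql = 10.0                      # hypothetical quantification limit
detected = [12.0, 15.0, 30.0]  # hypothetical observations above the QL
n_censored = 7                 # 7 of 10 observations (70%) are below the QL

median_full_ql = substituted_median(detected, n_censored, ql, 1.0)  # dataset 1 style
median_half_ql = substituted_median(detected, n_censored, ql, 0.5)  # dataset 2 style
print(median_full_ql, median_half_ql)
```

With 70% censoring, the two medians differ by a factor of two purely because of the substitution choice, which is the kind of artifact the abstract warns about for %Cen > 50%.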
NASA Technical Reports Server (NTRS)
Asner, Gregory P.; Keller, Michael M.; Silva, Jose Natalino; Zweede, Johan C.; Pereira, Rodrigo, Jr.
2002-01-01
Major uncertainties exist regarding the rate and intensity of logging in tropical forests worldwide: these uncertainties severely limit economic, ecological, and biogeochemical analyses of these regions. Recent sawmill surveys in the Amazon region of Brazil show that the area logged is nearly equal to total area deforested annually, but conversion of survey data to forest area, forest structural damage, and biomass estimates requires multiple assumptions about logging practices. Remote sensing could provide an independent means to monitor logging activity and to estimate the biophysical consequences of this land use. Previous studies have demonstrated that the detection of logging in Amazon forests is difficult and no studies have developed either the quantitative physical basis or remote sensing approaches needed to estimate the effects of various logging regimes on forest structure. A major reason for these limitations has been a lack of sufficient, well-calibrated optical satellite data, which in turn, has impeded the development and use of physically-based, quantitative approaches for detection and structural characterization of forest logging regimes. We propose to use data from the EO-1 Hyperion imaging spectrometer to greatly increase our ability to estimate the presence and structural attributes of selective logging in the Amazon Basin. Our approach is based on four "biogeophysical indicators" not yet derived simultaneously from any satellite sensor: 1) green canopy leaf area index; 2) degree of shadowing; 3) presence of exposed soil and; 4) non-photosynthetic vegetation material. Airborne, field and modeling studies have shown that the optical reflectance continuum (400-2500 nm) contains sufficient information to derive estimates of each of these indicators. Our ongoing studies in the eastern Amazon basin also suggest that these four indicators are sensitive to logging intensity. 
Satellite-based estimates of these indicators should provide a means to quantify both the presence and degree of structural disturbance caused by various logging regimes. Our quantitative assessment of Hyperion hyperspectral and ALI multi-spectral data for the detection and structural characterization of selective logging in Amazonia will benefit from data collected through an ongoing project run by the Tropical Forest Foundation, within which we have developed a study of the canopy and landscape biophysics of conventional and reduced-impact logging. We will add to our base of forest structural information in concert with an EO-1 overpass. Using a photon transport model inversion technique that accounts for non-linear mixing of the four biogeophysical indicators, we will estimate these parameters across a gradient of selective logging intensity provided by conventional and reduced impact logging sites. We will also compare our physically-based approach to both conventional (e.g., NDVI) and novel (e.g., SWIR-channel) vegetation indices as well as to linear mixture modeling methods. We will cross-compare these approaches using Hyperion and ALI imagers to determine the strengths and limitations of these two sensors for applications of forest biophysics. This effort will yield the first physically-based, quantitative analysis of the detection and intensity of selective logging in Amazonia, comparing hyperspectral and improved multi-spectral approaches as well as inverse modeling, linear mixture modeling, and vegetation index techniques.
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
Neary, M; Lamorde, M; Olagunju, A; Darin, K M; Merry, C; Byakika-Kibwika, P; Back, D J; Siccardi, M; Owen, A; Scarsi, K K
2017-09-01
Reduced levonorgestrel concentrations from the levonorgestrel contraceptive implant were previously seen when given concomitantly with efavirenz. We sought to assess whether single nucleotide polymorphisms (SNPs) in genes involved in efavirenz and nevirapine metabolism were linked to these changes in levonorgestrel concentration. SNPs in CYP2B6, CYP2A6, NR1I2, and NR1I3 were analyzed. Associations of participant demographics and genotype with levonorgestrel pharmacokinetics were evaluated in HIV-positive women using the levonorgestrel implant plus efavirenz- or nevirapine-based antiretroviral therapy (ART), in comparison to ART-naïve women using multivariate linear regression. Efavirenz group: CYP2B6 516G>T was associated with lower levonorgestrel log10 Cmax and log10 AUC. CYP2B6 15582C>T was associated with lower log10 AUC. Nevirapine group: CYP2B6 516G>T was associated with higher log10 Cmax and lower log10 Cmin. Pharmacogenetic variations influenced subdermal levonorgestrel pharmacokinetics in HIV-positive women, indicating that the magnitude of the interaction with non-nucleoside reverse transcriptase inhibitors (NNRTIs) is influenced by host genetics. © 2017 American Society for Clinical Pharmacology and Therapeutics.
van der Lee, R; Pfaffendorf, M; van Zwieten, P A
2000-11-01
To investigate a possible relationship between the time courses of action of various calcium antagonists and their lipophilicity, characterized as log P-values. The functional experiments were performed in vitro in human small subcutaneous arteries (internal diameter 591 ± 51 μm, n = 7 for each concentration), obtained from cosmetic surgery (mamma reduction and abdominoplasty). The vessels were investigated in an isometric wire myograph. The vasodilator effect of the calcium antagonists was quantified by means of log IC50-values, and the onset of the vasodilator effect for each concentration studied was expressed as time to Eeq90-values (time to reach 90% of the maximal effect). Log IC50-values were -8.46 ± 0.09, -8.33 ± 0.25 and -8.72 ± 0.16 for nifedipine, felodipine and (S)-lercanidipine, respectively (not significant). On average, nifedipine reached time to Eeq90 in 11 ± 1 min. For felodipine and (S)-lercanidipine the corresponding values were 60 ± 11 min and 99 ± 9 min, respectively. The differences between these values were statistically significant (P < 0.01). In spite of these differences in the in-vitro human vascular model, the three calcium antagonists are equipotent with regard to their vasodilator effects. Linear regression analysis of the correlation between the logarithm of the membrane partition coefficient (log P-values) of the calcium antagonists tested [2.50, 4.46 and 6.88 for nifedipine, felodipine and (S)-lercanidipine, respectively] and their respective values found for time to Eeq90 was highly significant. It appears that a higher log P-value is correlated with a slower onset of action.
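The relationship described above can be sketched with an ordinary least-squares line through the three mean values quoted in the abstract. This is only an illustration on the reported means; the authors' regression was presumably run on the underlying measurements, and the helper `least_squares_line` is a hypothetical name.

```python
def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Mean values quoted in the abstract: nifedipine, felodipine, (S)-lercanidipine
log_p = [2.50, 4.46, 6.88]
time_to_eeq90_min = [11.0, 60.0, 99.0]

slope, intercept = least_squares_line(log_p, time_to_eeq90_min)
print(f"{slope:.1f} min per log P unit, intercept {intercept:.1f} min")
```

On these three points the fitted slope is close to 20 min per unit of log P, consistent with the abstract's conclusion that higher lipophilicity goes with a slower onset of action.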
Enumeration of verocytotoxigenic Escherichia coli (VTEC) O157 and O26 in milk by quantitative PCR.
Mancusi, Rocco; Trevisani, Marcello
2014-08-01
Quantitative real-time polymerase chain reaction (qPCR) can be a convenient alternative to the Most Probable Number (MPN) methods to count VTEC in milk. The number of VTEC is normally very low in milk; therefore with the aim of increasing the method sensitivity a qPCR protocol that relies on preliminary enrichment was developed. The growth pattern of six VTEC strains (serogroups O157 and O26) was studied using enrichment in Buffered Peptone Water (BPW) with or without acriflavine for 4-24h. Milk samples were inoculated with these strains over a five Log concentration range between 0.24-0.50 and 4.24-4.50 Log CFU/ml. DNA was extracted from the enriched samples in duplicate and each extract was analysed in duplicate by qPCR using pairs of primers specific for the serogroups O157 and O26. When samples were pre-enriched in BPW at 37°C for 8h, the relationship between threshold cycles (CT values) and VTEC Log numbers was linear over a five Log concentration range. The regression of PCR threshold cycle numbers on VTEC Log CFU/ml had a slope coefficient equal to -3.10 (R(2)=0.96) which is indicative of a 10-fold difference of the gene copy numbers between samples (with a 100 ± 10% PCR efficiency). The same 10-fold proportion used for inoculating the milk samples with VTEC was observed, therefore, also in the enriched samples at 8h. A comparison of the CT values of milk samples and controls revealed that the strains inoculated in milk grew with 3 Log increments in the 8h enrichment period. Regression lines that fitted the qPCR and MPN data revealed that the error of the qPCR estimates is lower than the error of the estimated MPN (r=0.982, R(2)=0.965 vs. r=0.967, R(2)=0.935). The growth rates of VTEC strains isolated from milk should be comparatively assessed before qPCR estimates based on the regression model are considered valid. 
Comparative assessment of the growth rates can be done using spectrophotometric measurements of standardized cultures of isolates and reference strains cultured in BPW at 37°C for 8h. The method developed for the serogroups O157 and O26 can be easily adapted to the other VTEC serogroups that are relevant for human health. The qPCR method is less laborious and faster than the standard MPN method and has been shown to be a good technique for quantifying VTEC in milk. Copyright © 2014 Elsevier B.V. All rights reserved.
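The link between the standard-curve slope and PCR efficiency mentioned above follows the usual relation E = 10^(-1/slope) - 1, where the slope is that of CT regressed on log10 quantity. The sketch below applies it to the reported slope of -3.10; the function name `pcr_efficiency` is an illustrative assumption, not part of the published method.

```python
import math

def pcr_efficiency(slope: float) -> float:
    """Amplification efficiency (1.0 means 100%) from the slope of CT vs log10 quantity."""
    return 10.0 ** (-1.0 / slope) - 1.0

reported_slope = -3.10                 # slope quoted in the abstract
ideal_slope = -1.0 / math.log10(2.0)   # about -3.32, perfect doubling every cycle

print(f"efficiency at slope {reported_slope}: {pcr_efficiency(reported_slope):.0%}")
print(f"efficiency at slope {ideal_slope:.2f}: {pcr_efficiency(ideal_slope):.0%}")
```

A slope of -3.10 gives an efficiency of about 110%, which sits at the upper edge of the 100 ± 10% range quoted in the abstract.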
Hyperspectral scattering profiles for prediction of the microbial spoilage of beef
NASA Astrophysics Data System (ADS)
Peng, Yankun; Zhang, Jing; Wu, Jianhu; Hang, Hui
2009-05-01
Spoilage in beef is the result of decomposition and the formation of metabolites caused by the growth and enzymatic activity of microorganisms. There is still no technology for the rapid, accurate and non-destructive detection of bacterially spoiled or contaminated beef. In this study, a hyperspectral imaging technique was exploited to measure biochemical changes within fresh beef. Fresh beef rump steaks were purchased from a commercial plant and left to spoil in a refrigerator at 8°C. Every 12 hours, hyperspectral scattering profiles over the spectral region between 400 nm and 1100 nm were collected directly from the sample surface in reflection mode in order to develop an optimal model for prediction of beef spoilage; in parallel, the total viable count (TVC) per gram of beef was obtained by classical microbiological plating methods. The spectral scattering profiles at individual wavelengths were fitted accurately by a two-parameter Lorentzian distribution function. TVC prediction models were developed, using multi-linear regression, by relating individual Lorentzian parameters and their combinations at different wavelengths to the log10(TVC) value. The best predictions were obtained with r² = 0.96 and SEP = 0.23 for log10(TVC). The research demonstrated that the hyperspectral imaging technique is a valid tool for real-time and non-destructive detection of bacterial spoilage in beef.
Element enrichment factor calculation using grain-size distribution and functional data regression.
Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R
2015-01-01
In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
García-Esquinas, Esther; Pérez-Gómez, Beatriz; Fernández, Mario Antonio; Pérez-Meixeira, Ana María; Gil, Elisa; de Paz, Concha; Iriso, Andrés; Sanz, Juan Carlos; Astray, Jenaro; Cisneros, Margot; de Santos, Amparo; Asensio, Angel; García-Sagredo, José Miguel; García, José Frutos; Vioque, Jesus; Pollán, Marina; López-Abente, Gonzalo; González, Maria José; Martínez, Mercedes; Bohigas, Pedro Arias; Pastor, Roberto; Aragonés, Nuria
2011-09-01
Although breastfeeding is the ideal way of nurturing infants, it can be a source of exposure to toxicants. This study reports the concentration of Hg, Pb and Cd in breast milk from a sample of women drawn from the general population of the Madrid Region, and explores the association between metal levels and socio-demographic factors, lifestyle habits, diet and environmental exposures, including tobacco smoke, exposure at home and occupational exposures. Breast milk was obtained from 100 women (20 mL) at around the third week postpartum. Pb, Cd and Hg levels were determined using Atomic Absorption Spectrometry. Metal levels were log-transformed due to non-normal distribution. Their association with the variables collected by questionnaire was assessed using linear regression models. Separate models were fitted for Hg, Pb and Cd, using univariate linear regression in a first step. Secondly, multivariate linear regression models were adjusted introducing potential confounders specific for each metal. Finally, a test for trend was performed in order to evaluate possible dose-response relationships between metal levels and changes in variables categories. Geometric mean Hg, Pb and Cd content in milk were 0.53 μg L(-1), 15.56 μg L(-1), and 1.31 μg L(-1), respectively. Decreases in Hg levels in older women and in those with a previous history of pregnancies and lactations suggested clearance of this metal over lifetime, though differences were not statistically significant, probably due to limited sample size. Lead concentrations increased with greater exposure to motor vehicle traffic and higher potato consumption. Increased Cd levels were associated with type of lactation and tended to increase with tobacco smoking. Surveillance for the presence of heavy metals in human milk is needed. Smoking and dietary habits are the main factors linked to heavy metal levels in breast milk. 
Our results reinforce the need to strengthen national food safety programs and to further promote avoidance of unhealthy behaviors such as smoking during pregnancy. Copyright © 2011 Elsevier Ltd. All rights reserved.
Wiley, J.B.; Atkins, John T.; Tasker, Gary D.
2000-01-01
Multiple and simple least-squares regression models for the log10-transformed 100-year discharge with independent variables describing the basin characteristics (log10-transformed and untransformed) for 267 streamflow-gaging stations were evaluated, and the regression residuals were plotted as areal distributions that defined three regions of the State, designated East, North, and South. Exploratory data analysis procedures identified 31 gaging stations at which discharges are different than would be expected for West Virginia. Regional equations for the 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year peak discharges were determined by generalized least-squares regression using data from 236 gaging stations. Log10-transformed drainage area was the most significant independent variable for all regions. Equations developed in this study are applicable only to rural, unregulated streams within the boundaries of West Virginia. The accuracy of estimating equations is quantified by measuring the average prediction error (from 27.7 to 44.7 percent) and equivalent years of record (from 1.6 to 20.0 years).
A Linear Regression and Markov Chain Model for the Arabian Horse Registry
1993-04-01
as a tax deduction? Yes No ... 26. Regardless of previous equine tax deductions, do you consider your current horse activities to be... (Mark one... T-4367 A Linear Regression and Markov Chain Model for the Arabian Horse Registry ... the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses. A linear regression model was utilized to
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
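For readers unfamiliar with ordinary least products (OLP) regression, also known as reduced major axis or geometric mean regression, the point estimates can be sketched as follows: the slope is the ratio of the standard deviations of y and x, signed by their covariance. The function name and example data are hypothetical. In line with point 4 of the abstract, this sketch gives no confidence intervals; those need a method such as the bootstrapping the abstract attributes to smatr.

```python
import math
import statistics

def ordinary_least_products(x, y):
    """OLP (reduced major axis) slope and intercept for paired data."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or sxy == 0:
        raise ValueError("OLP slope is undefined without variation and covariation")
    # Magnitude is sd(y)/sd(x); the sign follows the covariance.
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired measurements, both subject to error
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope_yx, intercept_yx = ordinary_least_products(x, y)
slope_xy, _ = ordinary_least_products(y, x)
print(slope_yx, intercept_yx)
```

Unlike Model I least squares, the OLP line is symmetric: the slope of x on y is the reciprocal of the slope of y on x, which is why it suits cases where neither variable is fixed by the experimenter.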
Postmolar gestational trophoblastic neoplasia: beyond the traditional risk factors.
Bakhtiyari, Mahmood; Mirzamoradi, Masoumeh; Kimyaiee, Parichehr; Aghaie, Abbas; Mansournia, Mohammd Ali; Ashrafi-Vand, Sepideh; Sarfjoo, Fatemeh Sadat
2015-09-01
To investigate the slope of linear regression of postevacuation serum hCG as an independent risk factor for postmolar gestational trophoblastic neoplasia (GTN). Multicenter retrospective cohort study. Academic referral health care centers. All subjects with confirmed hydatidiform mole and at least four measurements of β-hCG titer. None. Type and magnitude of the relationship between the slope of linear regression of β-hCG as a new risk factor and GTN using Bayesian logistic regression with penalized log-likelihood estimation. Among the high-risk and low-risk molar pregnancy cases, 11 (18.6%) and 19 cases (13.3%) had GTN, respectively. No significant relationship was found between the components of a high-risk pregnancy and GTN. The β-hCG return slope was higher in the spontaneous cure group. However, the initial level of this hormone in the first measurement was higher in the GTN group than in the spontaneous recovery group. The average time for diagnosing GTN in the high-risk molar pregnancy group was 2 weeks less than that of the low-risk molar pregnancy group. In addition to slope of linear regression of β-hCG (odds ratio [OR], 12.74; confidence interval [CI], 5.42-29.2), abortion history (OR, 2.53; 95% CI, 1.27-5.04) and large uterine height for gestational age (OR, 1.26; CI, 1.04-1.54) had the maximum effects on GTN outcome, respectively. The slope of linear regression of β-hCG was introduced as an independent risk factor, which could be used for clinical decision making based on records of β-hCG titer and subsequent prevention program. Copyright © 2015 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
The non-linear association between low-level lead exposure and maternal stress among pregnant women.
Li, Shufang; Xu, Jian; Liu, Zhiwei; Yan, Chong-Huai
2017-03-01
Neuro-developmental impairments in the developing fetus due to exposure to low-level lead have been well documented. However, few studies have investigated the relation between maternal stress levels and low-level lead exposure among pregnant women. To investigate the relation between maternal blood lead and stress levels during index pregnancy. 1931 pregnant women (gestational week 28-36) were investigated using stratified-cluster-sampling in Shanghai in 2010. Maternal life event stress and emotional stress were assessed using "Life-Event-Stress-Scale-for-Pregnant-Women" (LESPW) and "Symptom-Checklist-90-Revised" (SCL-90-R), respectively. Maternal whole blood lead levels were determined, and other data on covariates were obtained from maternal interviews and medical records. Two-piecewise linear regression models were applied to assess the relations between blood lead and stress levels using a data-driven approach according to spline smoothing fitting of the data. Maternal blood lead levels ranged from 0.80 to 14.84 μg/dL, and the geometric mean was 3.97 μg/dL. The P-values for the two-piecewise linear models against the single linear regression models were 0.010, 0.003 and 0.017 for models predicting GSI, depression and anxiety symptom scores, respectively. When blood lead levels were below 2.57 μg/dL, each unit increase in log10-transformed blood lead levels (μg/dL) was associated with about an 18% increase in maternal GSI, depression and anxiety symptom scores (P = 0.013 for GSI, P = 0.002 for depression, P = 0.019 for anxiety). However, no significant relation was found when blood lead levels were above 2.57 μg/dL (all P-values > 0.05). Our findings suggested a nonlinear relationship between blood lead and emotional stress levels among pregnant women. Emotional stress increased along with blood lead levels, and appeared to plateau when blood lead levels reached 2.57 μg/dL. Copyright © 2016 Elsevier B.V. All rights reserved.
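A two-piecewise linear model of the kind this abstract describes can be sketched with a hinge term, so that the fitted line stays continuous at the breakpoint. The version below is a minimal illustration under stated assumptions: the names `fit_two_piece` and `search_breakpoint` are hypothetical, the breakpoint is chosen by a plain grid search on the residual sum of squares, and no covariate adjustment is included, so it should not be read as the authors' procedure.

```python
import numpy as np

def fit_two_piece(x, y, breakpoint):
    """Fit y = b0 + b1*x + b2*max(x - breakpoint, 0) by least squares.

    Returns (slope_below, slope_above, sse).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hinge = np.maximum(x - breakpoint, 0.0)  # zero below the breakpoint
    X = np.column_stack([np.ones_like(x), x, hinge])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = float(resid @ resid)
    # Below the breakpoint the slope is b1; above it the hinge adds b2.
    return float(coef[1]), float(coef[1] + coef[2]), sse

def search_breakpoint(x, y, candidates):
    """Pick the candidate breakpoint with the smallest residual sum of squares."""
    return min(candidates, key=lambda c: fit_two_piece(x, y, c)[2])
```

A plateau such as the one reported above 2.57 μg/dL would show up as a slope above the breakpoint that is close to zero.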
Speech Data Analysis for Semantic Indexing of Video of Simulated Medical Crises
2015-05-01
scheduled approximately twice per week and are recorded as video data. During each session, the physician/instructor must manually review and annotate...spectrum, y, using regression line: y = ln(1 + Jx), (2.3) where x is the auditory power spectral amplitude, J is a signal-dependent positive constant...The amplitude-warping transform is linear-like for J ≪ 1 and logarithmic-like for J ≫ 1. 3. RASTA filtering: reintegrate the log critical-band
Oki, Ryo; Ito, Kazuto; Suzuki, Rie; Fujizuka, Yuji; Arai, Seiji; Miyazawa, Yoshiyuki; Sekine, Yoshitaka; Koike, Hidekazu; Matsui, Hiroshi; Shibata, Yasuhiro; Suzuki, Kazuhiro
2018-04-26
Japan has experienced a drastic increase in the incidence of prostate cancer (PC). To assess changes in the risk for PC, we investigated baseline prostate specific antigen (PSA) levels in first-time screened men, across a 25-year period. In total, 72,654 men, aged 50-79, underwent first-time PSA screening in Gunma prefecture between 1992 and 2016. Changes in the distribution of PSA levels were investigated, including the percentage of men with a PSA above cut-off values and linear regression analyses comparing log 10 PSA with age. The 'ultimate incidence' of PC and clinically significant PC (CSPC) were estimated using the PC risk calculator. Changes in the age-standardized incidence rate (AIR) during this period were analyzed. The calculated coefficients of linear regression for age versus log 10 PSA fluctuated during the 25-year period, but no trend was observed. In addition, the percentage of men with a PSA above cut-off values varied in each 5-year period, with no specific trend. The 'risk calculator (RC)-based AIR' of PC and CSPC were stable between 1992 and 2016. Therefore, the baseline risk for developing PC has remained unchanged in the past 25 years, in Japan. The drastic increase in the incidence of PC, beginning around 2000, may be primarily due to increased PSA screening in the country. © 2018 UICC.
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini
1999-01-01
Data collected from Douglas-fir stands infected by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah, were used to develop models to estimate amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...
Reynolds, Andy M; Reynolds, Don R
2008-01-01
Seminal field studies led by C. G. Johnson in the 1940s and 1950s showed that aphid aerial density diminishes with height above the ground such that the linear regression coefficient, b, of log density on log height provides a single-parameter characterization of the vertical density profile. This coefficient decreases with increasing atmospheric stability, ranging from −0.27 for a fully convective boundary layer to −2.01 for a stable boundary layer. We combined a well-established Lagrangian stochastic model of atmospheric dispersal with simple models of aphid behaviour in order to account for the range of aerial density profiles. We show that these density distributions are consistent with the aphids producing just enough lift to become neutrally buoyant when they are in updraughts and ceasing to produce lift when they are in downdraughts. This active flight behaviour in a weak flier is thus distinctly different from the aerial dispersal of seeds and wingless arthropods, which is passive once these organisms have launched into the air. The novel findings from the model indicate that the epithet ‘passive’ often applied to the windborne migration of small winged insects is misleading and should be abandoned. The implications for the distances traversed by migrating aphids under various boundary-layer conditions are outlined. PMID:18782743
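The single-parameter profile mentioned in this abstract amounts to a straight-line fit of log density against log height, with b as the slope. The snippet below is a small sketch of that fit; the function name `profile_exponent` and the example heights are illustrative assumptions, not data from the study.

```python
import numpy as np

def profile_exponent(height, density):
    """Regress log10(density) on log10(height); returns (b, log10_a).

    Assumes a power-law profile density = a * height**b with positive inputs.
    """
    log_h = np.log10(np.asarray(height, dtype=float))
    log_d = np.log10(np.asarray(density, dtype=float))
    b, log_a = np.polyfit(log_h, log_d, 1)  # slope first, then intercept
    return float(b), float(log_a)

# Synthetic profile using the convective-boundary-layer value quoted in the abstract.
heights = np.array([1.0, 10.0, 100.0, 1000.0])
densities = 5.0 * heights ** (-0.27)
b, log_a = profile_exponent(heights, densities)
```

A more negative b means density falls off faster with height, which matches the stable-boundary-layer end of the range reported in the abstract.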
NASA Astrophysics Data System (ADS)
Iturrarán-Viveros, Ursula; Parra, Jorge O.
2014-08-01
Permeability and porosity are two fundamental reservoir properties which relate to the amount of fluid contained in a reservoir and its ability to flow. The intrinsic attenuation is another important parameter since it is related to porosity, permeability, oil and gas saturation and these parameters significantly affect the seismic signature of a reservoir. We apply Artificial Neural Network (ANN) models to predict permeability (k) and porosity (ϕ) for a carbonate aquifer in southeastern Florida and to predict intrinsic attenuation (1/Q) for a sand-shale oil reservoir in northeast Texas. In this study, the Gamma test (a revolutionary estimator of the noise in a data set) has been used as a mathematically non-parametric nonlinear smooth modeling tool to choose the best input combination of seismic attributes to estimate k and ϕ, and the best combination of well-logs to estimate 1/Q. This saves time during the construction and training of ANN models and also sets a lower bound for the mean squared error to prevent over-training. The Neural Network method successfully delineates a highly permeable zone that corresponds to a high water production in the aquifer. The Gamma test found nonlinear relations that were not visible to linear regression allowing us to generalize the ANN estimations of k, ϕ and 1/Q for their respective sets of patterns that were not used during the learning phase.
NASA Astrophysics Data System (ADS)
Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.
2009-08-01
In this study two techniques, for modeling electricity consumption of the Jordanian industrial sector, are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, comparison that is based on the square root average squared error of data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.
Permeability-porosity relationships in sedimentary rocks
Nelson, Philip H.
1994-01-01
In many consolidated sandstone and carbonate formations, plots of core data show that the logarithm of permeability (k) is often linearly proportional to porosity (ϕ). The slope, intercept, and degree of scatter of these log(k)-ϕ trends vary from formation to formation, and these variations are attributed to differences in initial grain size and sorting, diagenetic history, and compaction history. In unconsolidated sands, better sorting systematically increases both permeability and porosity. In sands and sandstones, an increase in gravel and coarse grain size content causes k to increase even while decreasing ϕ. Diagenetic minerals in the pore space of sandstones, such as cement and some clay types, tend to decrease log(k) proportionately as ϕ decreases. Models to predict permeability from porosity and other measurable rock parameters fall into three classes based on either grain, surface area, or pore dimension considerations. (Models that directly incorporate well log measurements but have no particular theoretical underpinnings form a fourth class.) Grain-based models show permeability proportional to the square of grain size times porosity raised to (roughly) the fifth power, with grain sorting as an additional parameter. Surface-area models show permeability proportional to the inverse square of pore surface area times porosity raised to (roughly) the fourth power; measures of surface area include irreducible water saturation and nuclear magnetic resonance. Pore-dimension models show permeability proportional to the square of a pore dimension times porosity raised to a power of (roughly) two and produce curves of constant pore size that transgress the linear data trends on a log(k)-ϕ plot. The pore dimension is obtained from mercury injection measurements and is interpreted as the pore opening size of some interconnected fraction of the pore system. The linear log(k)-ϕ data trends cut the curves of constant pore size from the pore-dimension models, which shows that porosity reduction is always accompanied by a reduction in characteristic pore size. The high powers of porosity of the grain-based and surface-area models are required to compensate for the inclusion of the small end of the pore size spectrum.
Body mass index in relation to serum prostate-specific antigen levels and prostate cancer risk.
Bonn, Stephanie E; Sjölander, Arvid; Tillander, Annika; Wiklund, Fredrik; Grönberg, Henrik; Bälter, Katarina
2016-07-01
High body mass index (BMI) has been directly associated with risk of aggressive or fatal prostate cancer. One possible explanation may be an effect of BMI on serum levels of prostate-specific antigen (PSA). To study the association between BMI and serum PSA as well as prostate cancer risk, a large cohort of men without prostate cancer at baseline was followed prospectively for prostate cancer diagnoses until 2015. Serum PSA and BMI were assessed among 15,827 men at baseline in 2010-2012. During follow-up, 735 men were diagnosed with prostate cancer with 282 (38.4%) classified as high-grade cancers. Multivariable linear regression models and natural cubic linear regression splines were fitted for analyses of BMI and log-PSA. For risk analysis, Cox proportional hazards regression models were used to estimate hazard ratios (HR) and 95% confidence intervals (CI) and natural cubic Cox regression splines producing standardized cancer-free probabilities were fitted. Results showed that baseline serum PSA decreased by 1.6% (95% CI: -2.1 to -1.1) with every one unit increase in BMI. Statistically significant decreases of 3.7, 11.7 and 32.3% were seen for increasing BMI categories of 25 to <30, 30 to <35 and ≥35 kg/m², respectively, compared to the reference (18.5 to <25 kg/m²). No statistically significant associations were seen between BMI and prostate cancer risk although results were indicative of a positive association with incidence rates of high-grade disease and an inverse association with incidence of low-grade disease. However, findings regarding risk are limited by the short follow-up time. In conclusion, BMI was inversely associated with PSA levels. BMI should be taken into consideration when referring men to a prostate biopsy based on serum PSA levels. © 2016 UICC.
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Hemmila, April; McGill, Jim; Ritter, David
2008-03-01
To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.
von Eye, Alexander; Mun, Eun Young; Bogat, G Anne
2008-03-01
This article reviews the premises of configural frequency analysis (CFA), including methods of choosing significance tests and base models, as well as protecting alpha, and discusses why CFA is a useful approach when conducting longitudinal person-oriented research. CFA operates at the manifest variable level. Longitudinal CFA seeks to identify those temporal patterns that stand out as more frequent (CFA types) or less frequent (CFA antitypes) than expected with reference to a base model. A base model that has been used frequently in CFA applications, prediction CFA, and a new base model, auto-association CFA, are discussed for analysis of cross-classifications of longitudinal data. The former base model takes the associations among predictors and among criteria into account. The latter takes the auto-associations among repeatedly observed variables into account. Application examples of each are given using data from a longitudinal study of domestic violence. It is demonstrated that CFA results are not redundant with results from log-linear modeling or multinomial regression and that, of these approaches, CFA shows particular utility when conducting person-oriented research.
Wu, Chih Cheng; Lee, Grace W M; Yang, Shinhao; Yu, Kuo-Pin; Lou, Chia Ling
2006-10-15
Although negative air ionizers are commonly used for indoor air cleaning, few studies examine the concentration gradient of negative air ions (NAI) in indoor environments. This study investigated the concentration gradient of NAI at various relative humidities and distances from the source in indoor air. The NAI was generated by single-electrode negative electric discharge; the discharge was kept at dark discharge and 30.0 kV. The NAI concentrations were measured at various distances (10-900 cm) from the discharge electrode in order to identify the distribution of NAI in an indoor environment. The profile of NAI concentration was monitored at different relative humidities (38.1-73.6% RH) and room temperatures (25.2 ± 1.4 °C). Experimental results indicate that the influence of relative humidity on the concentration gradient of NAI was complicated. There were four trends for the relationship between NAI concentration and relative humidity at different distances from the discharge electrode. The changes of NAI concentration with an increase in relative humidity at different distances were quite steady (10-30 cm), strongly declining (70-360 cm), approaching stability (420-450 cm) and moderately increasing (560-900 cm). Additionally, the regression analysis of NAI concentrations and distances from the discharge electrode indicated a logarithmic linear (log-linear) relationship; the distance of log-linear tendency (lambda) decreased with an increase in relative humidity such that the log-linear distance at 38.1% RH was 2.9 times that at 73.6% RH. Moreover, an empirical curve fit based on this study for the concentration gradient of NAI generated by negative electric discharge in indoor air was developed for estimating the NAI concentration at different relative humidities and distances from the source of electric discharge.
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
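For readers who want to see the major axis computed, one common route is to take the leading eigenvector of the sample covariance matrix, which is the direction that minimizes the sum of squared perpendicular distances. The sketch below follows that route; the function name `major_axis_fit` is hypothetical, and the code assumes the fitted line is not vertical.

```python
import numpy as np

def major_axis_fit(x, y):
    """Orthogonal (major axis) regression: returns (slope, intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y)                # 2x2 sample covariance matrix
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    vx, vy = evecs[:, -1]             # direction of largest variance
    slope = vy / vx                   # assumes vx != 0 (non-vertical line)
    intercept = y.mean() - slope * x.mean()
    return float(slope), float(intercept)
```

Unlike ordinary least squares, this fit is symmetric in the two variables: swapping x and y gives the reciprocal slope. The result does depend on the units of each axis, which is one reason the reduced major axis, based on the ratio of standard deviations, is often discussed alongside it.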
ERIC Educational Resources Information Center
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
Hyun, Seunghun; Kim, Minhee; Baek, Kitae; Lee, Linda S
2010-01-01
The sorption of phenanthrene and 2,2',5,5'-polychlorinated biphenyl (PCB52) by five differently weathered soils was measured in water and at low methanol volume fraction (f(c)0.5) as a function of the apparent solution pH (pH(app)). Two weathered oxisols (A2 and DRC), one moderately weathered alfisol (Toronto) and two young soils (K5 and Webster) were used. The K(m) (linear sorption coefficient) values, which decrease log-linearly with f(c), were interpreted using a cosolvency sorption model. For phenanthrene sorption at the natural pH, the empirical constant (alpha) ranged between 0.95 and 1.14, and was in the order of oxisols (A2 and DRC)
Tsuji, Leonard J S; Wainman, Bruce C; Martin, Ian D; Weber, Jean-Philippe; Sutherland, Celine; Elliott, J Richard; Nieboer, Evert
2005-09-01
Abandoned radar line stations in the North American arctic and sub-arctic regions are point sources of contamination, especially for PCBs. Few data exist with respect to human body burden of organochlorines (OCs) in residents of communities located in close proximity to these radar line sites. We compared plasma OC concentration (unadjusted for total lipids) frequency distribution data using log-linear contingency modelling for Fort Albany First Nation, the site of an abandoned Mid-Canada Radar Line station, and two comparison populations (the neighbouring community of Kashechewan First Nation without such a radar installation, and Hamilton, a city in southern Ontario, Canada). This type of analysis is important as it allows for an initial investigation of contaminant data without imputing any values. The two-state log-linear model (employing both non-detectable and detectable concentration frequencies and applicable to PCB congeners 28 and 105 and cis-nonachlor) and the four-state log-linear model (using quartile concentration frequencies for Aroclor 1260, PCB congeners [99,118,138,153,156,170,180,183,187], beta-HCH, p,p'-DDT +p,p'-DDE, HCB, mirex, oxychlordane, and trans-nonachlor) revealed that the effects of subject gender were inconsequential. Significant differences (p < 0.05) between the groups examined were attributable to the effect of location on the frequency of detection of OCs or on their differential distribution among the concentration quartiles. In general, people from Hamilton had higher frequencies of non-detections and of concentrations in the first quartile (p < 0.05) for most OCs compared to people from Fort Albany and Kashechewan (who consume a traditional diet of wild meats that does not include marine mammals). 
An unexpected finding was that, for Kashechewan males, the frequency of many OCs was significantly higher (p < 0.05) in the 4th concentration quartile than that predicted by the four-state log-linear model, but significantly lower than expected in the 1st quartile for beta-HCH. The levels of PCBs found for women in Fort Albany and Kashechewan were greater than those reported for Dene (First Nation people) and Métis (mixed heritage) of the western Northwest Territories (NWT) who did not consume marine mammals, and for Inuit living in the central NWT (occasional consumers of marine mammals). Moreover, the levels of total p,p'-DDT were greater for Fort Albany and Kashechewan women compared to these same aboriginal groups.
Log-Linear Models for Gene Association
Hu, Jianhua; Joshi, Adarsh; Johnson, Valen E.
2009-01-01
We describe a class of log-linear models for the detection of interactions in high-dimensional genomic data. This class of models leads to a Bayesian model selection algorithm that can be applied to data that have been reduced to contingency tables using ranks of observations within subjects, and discretization of these ranks within gene/network components. Many normalization issues associated with the analysis of genomic data are thereby avoided. A prior density based on Ewens’ sampling distribution is used to restrict the number of interacting components assigned high posterior probability, and the calculation of posterior model probabilities is expedited by approximations based on the likelihood ratio statistic. Simulation studies are used to evaluate the efficiency of the resulting algorithm for known interaction structures. Finally, the algorithm is validated in a microarray study for which it was possible to obtain biological confirmation of detected interactions. PMID:19655032
Development of a pharmacogenetic-guided warfarin dosing algorithm for Puerto Rican patients
Ramos, Alga S; Seip, Richard L; Rivera-Miranda, Giselle; Felici-Giovanini, Marcos E; Garcia-Berdecia, Rafael; Alejandro-Cowan, Yirelia; Kocherla, Mohan; Cruz, Iadelisse; Feliu, Juan F; Cadilla, Carmen L; Renta, Jessica Y; Gorowski, Krystyna; Vergara, Cunegundo; Ruaño, Gualberto; Duconge, Jorge
2012-01-01
Aim This study was aimed at developing a pharmacogenetic-driven warfarin-dosing algorithm in 163 admixed Puerto Rican patients on stable warfarin therapy. Patients & methods A multiple linear-regression analysis was performed using log-transformed effective warfarin dose as the dependent variable, and combining CYP2C9 and VKORC1 genotyping with other relevant nongenetic clinical and demographic factors as independent predictors. Results The model explained more than two-thirds of the observed variance in the warfarin dose among Puerto Ricans, and also produced significantly better ‘ideal dose’ estimates than two pharmacogenetic models and clinical algorithms published previously, with the greatest benefit seen in patients ultimately requiring <7 mg/day. We also assessed the clinical validity of the model using an independent validation cohort of 55 Puerto Rican patients from Hartford, CT, USA (R2 = 51%). Conclusion Our findings provide the basis for planning prospective pharmacogenetic studies to demonstrate the clinical utility of genotyping warfarin-treated Puerto Rican patients. PMID:23215886
Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei
2014-01-01
The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of the different models was compared. The frequency of traffic conflicts follows a negative binomial distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has the potential to expand the uses of surrogate safety measures in safety estimation and evaluation.
Baysal, Ayse Handan; Molva, Celenk; Unluturk, Sevcan
2013-09-16
In the present study, the effect of short wave ultraviolet light (UV-C) on the inactivation of Alicyclobacillus acidoterrestris DSM 3922 spores in commercial pasteurized white grape and apple juices was investigated. The inactivation of A. acidoterrestris spores in juices was examined by evaluating the effects of UV light intensity (1.31, 0.71 and 0.38 mW/cm²) and exposure time (0, 3, 5, 7, 10, 12 and 15 min) at constant depth (0.15 cm). The best reduction (5.5-log) was achieved in grape juice when the UV intensity was 1.31 mW/cm². The maximum inactivation was approximately 2-log CFU/mL in apple juice under the same conditions. The results showed that first-order kinetics were not suitable for the estimation of spore inactivation in grape juice treated with UV-light. Since tailing was observed in the survival curves, the log-linear plus tail and Weibull models were compared. The results showed that the log-linear plus tail model was satisfactorily fitted to estimate the reductions. As a non-thermal technology, UV-C treatment could be an alternative to thermal treatment for grape juices or combined with other preservation methods for the pasteurization of apple juice. © 2013 Elsevier B.V. All rights reserved.
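The abstract does not give the model equations, so the sketch below uses the forms in which these two survival models are commonly written in predictive microbiology; treat the parameterization as an assumption. The log-linear plus tail curve is taken as N(t) = (N0 - Nres) exp(-kmax t) + Nres, and the Weibull curve as log10 N(t) = log10 N0 - (t/delta)^p. The function and parameter names are hypothetical.

```python
import numpy as np

def log_linear_tail(t, log_n0, log_nres, kmax):
    """log10 survivors under first-order decay toward a residual (tail) population."""
    t = np.asarray(t, dtype=float)
    n0 = 10.0 ** log_n0      # initial count
    nres = 10.0 ** log_nres  # residual count that forms the tail
    return np.log10((n0 - nres) * np.exp(-kmax * t) + nres)

def weibull(t, log_n0, delta, p):
    """log10 survivors under a Weibull-type curve; delta is the time to the first 1-log drop."""
    t = np.asarray(t, dtype=float)
    return log_n0 - (t / delta) ** p

# Example exposure times in minutes, matching the range quoted in the abstract.
times = np.array([0.0, 3.0, 5.0, 7.0, 10.0, 12.0, 15.0])
curve_tail = log_linear_tail(times, log_n0=6.0, log_nres=1.0, kmax=1.5)
curve_weibull = weibull(times, log_n0=6.0, delta=2.0, p=0.7)
```

The tail model levels off at log_nres, which is why it can describe survival curves that stop declining, whereas the Weibull curve with p < 1 only slows down. The parameter values above are made up for illustration and are not fitted to the study's data.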
Wavelet regression model in forecasting crude oil price
NASA Astrophysics Data System (ADS)
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil price forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.
Mahara, Gehendra; Wang, Chao; Yang, Kun; Chen, Sipeng; Guo, Jin; Gao, Qi; Wang, Wei; Wang, Quanyi; Guo, Xiuhua
2016-01-01
(1) Background: Evidence on the relationship between scarlet fever and meteorological factors, including air pollution, is limited. This study aimed to examine the relationship of ambient air pollutants and meteorological factors with scarlet fever occurrence in Beijing, China. (2) Methods: A retrospective ecological study was carried out to characterize the epidemic features of scarlet fever incidence in Beijing districts from 2013 to 2014. Daily incidence and corresponding air pollutant and meteorological data were used to develop the model. Global Moran's I statistic and Anselin's local Moran's I (LISA) were applied to detect spatial autocorrelation (spatial dependency) and clusters of scarlet fever incidence. The spatial lag model (SLM) and spatial error model (SEM), together with an ordinary least squares (OLS) model, were then applied to probe the association between scarlet fever incidence and meteorological factors, including air pollution. (3) Results: Among the 5491 cases, more than half (62%) were male, and more than one-third (37.8%) were female, with an annual average incidence rate of 14.64 per 100,000 population. Spatial autocorrelation analysis showed spatial dependence; therefore, we applied spatial regression models. Comparing R², log-likelihood and the Akaike information criterion (AIC) among the three models, namely OLS (R² = 0.0741, log likelihood = −1819.69, AIC = 3665.38), SLM (R² = 0.0786, log likelihood = −1819.04, AIC = 3665.08) and SEM (R² = 0.0743, log likelihood = −1819.67, AIC = 3665.36), identified the SLM as the best-fitting regression model. In the SLM, nitrogen oxide (p = 0.027), rainfall (p = 0.036) and sunshine hours (p = 0.048) were significantly and positively associated with scarlet fever incidence, while relative humidity (p = 0.034) had a negative association. 
(4) Conclusions: Our findings indicated that meteorological as well as air pollutant factors may increase the incidence of scarlet fever; these findings may help to guide scarlet fever control programs and the targeting of interventions. PMID:27827946
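As a rough illustration of the global Moran's I statistic named in the abstract above, the sketch below computes it in plain Python. The district values and the binary adjacency weights are invented for illustration; they are not the Beijing data, and the abstract does not state which weighting scheme the study used.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical layout: four districts in a row, each bordering its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

trend = morans_i([1.0, 2.0, 3.0, 4.0], chain)      # neighbours are alike
checker = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # neighbours are unlike
```

Positive values (as for `trend`) indicate that neighbouring districts tend to have similar incidence, and negative values (as for `checker`) indicate the opposite.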
Duan, Zhi; Hansen, Terese Holst; Hansen, Tina Beck; Dalgaard, Paw; Knøchel, Susanne
2016-08-02
With low temperature long time (LTLT) cooking it can take hours for meat to reach a final core temperature above 53°C, and germination followed by growth of Clostridium perfringens is a concern. Available and new growth data in meats, including 154 lag times (tlag), 224 maximum specific growth rates (μmax) and 25 maximum population densities (Nmax), were used to develop a model to predict growth of C. perfringens during the coming-up time of LTLT cooking. New data were generated in 26 challenge tests with chicken (pH 6.8) and pork (pH 5.6) at two different slowly increasing temperature (SIT) profiles (10°C to 53°C) followed by 53°C for up to 30 h in total. Three inoculum types were studied: vegetative cells, non-heated spores and heat-activated (75°C, 20 min) spores of C. perfringens strain 790-94. Concentrations of vegetative cells in chicken increased 2 to 3 log CFU/g during the SIT profiles. Similar results were found for non-heated and heated spores in chicken, whereas in pork C. perfringens 790-94 increased less than 1 log CFU/g. At 53°C C. perfringens 790-94 was log-linearly inactivated. Observed and predicted concentrations of C. perfringens at the time when 53°C was reached (log(N53)) were used to evaluate the new growth model and three available predictive models previously published for C. perfringens growth during cooling rather than during SIT profiles. Model performance was evaluated by using mean deviation (MD), mean absolute deviation (MAD) and the acceptable simulation zone (ASZ) approach with a zone of ±0.5 log CFU/g. The new model showed the best performance with MD = 0.27 log CFU/g, MAD = 0.66 log CFU/g and ASZ = 67%. The two growth models that performed best were used together with a log-linear inactivation model and D53-values from the present study to simulate the behaviour of C. perfringens under the fast and slow SIT profiles investigated in the present study. 
Observed and predicted concentrations were compared using a new fail-safe acceptable zone (FSAZ) method. FSAZ was defined as the predicted concentration of C. perfringens plus 0.5 log CFU/g. If at least 85% of the observed log-counts were below the FSAZ, the model was considered fail-safe. The two models showed similar performance, but neither performed satisfactorily for all conditions. It is recommended to use the models without a lag phase until more precise lag time models become available. Copyright © 2016 Elsevier B.V. All rights reserved.
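The evaluation criteria described in this abstract (MD, MAD, ASZ and the fail-safe acceptable zone) can be sketched as follows. This is a minimal illustration under stated assumptions: deviations are taken as predicted minus observed (the abstract does not give the sign convention), and the log-count values are invented, not the study's data.

```python
def evaluate(observed, predicted, zone=0.5, fail_safe_share=0.85):
    """Summarise agreement between observed and predicted log CFU/g values."""
    n = len(observed)
    # Assumed convention: deviation = predicted - observed.
    diffs = [p - o for o, p in zip(observed, predicted)]
    md = sum(diffs) / n                              # mean deviation
    mad = sum(abs(d) for d in diffs) / n             # mean absolute deviation
    asz = sum(abs(d) <= zone for d in diffs) / n     # share within +/- zone
    # FSAZ: observed log-count lies below the prediction plus the zone width.
    below = sum(o < p + zone for o, p in zip(observed, predicted)) / n
    return {"MD": md, "MAD": mad, "ASZ": asz,
            "FSAZ_share": below, "fail_safe": below >= fail_safe_share}


# Hypothetical log CFU/g values for four challenge tests.
result = evaluate(observed=[2.0, 3.0, 4.0, 5.0],
                  predicted=[2.2, 3.9, 3.8, 5.1])
```

In this toy case one prediction overshoots by 0.9 log CFU/g, so it falls outside the ±0.5 zone, yet every observation is still below the fail-safe boundary.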
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Linear regression metamodeling as a tool to summarize and present simulation model results.
Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M
2013-10-01
Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
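The core idea of this abstract, regressing simulated outcomes on standardized input parameters, can be sketched in a few lines. Everything concrete here is an assumption made for illustration: `toy_model`, its two parameters, their sampling ranges and the sample size are hypothetical and unrelated to the authors' cancer cure model.

```python
import random
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def standardize(col):
    mu, sd = statistics.mean(col), statistics.pstdev(col)
    return [(v - mu) / sd for v in col]


def toy_model(p_cure, cost):
    """Hypothetical decision model: net benefit of one treatment."""
    return 50000 * p_cure - cost


# Probabilistic sensitivity analysis: sample inputs, run the model.
rng = random.Random(0)
p_cure = [rng.uniform(0.3, 0.7) for _ in range(2000)]
cost = [rng.uniform(5000, 15000) for _ in range(2000)]
outcome = [toy_model(p, c) for p, c in zip(p_cure, cost)]

# Metamodel: outcome regressed on standardized inputs (leading 1 = intercept).
design = [[1.0, zp, zc] for zp, zc in zip(standardize(p_cure), standardize(cost))]
intercept, beta_p_cure, beta_cost = ols(design, outcome)
```

Because the inputs are standardized, the intercept estimates the outcome with all parameters at their means, and each slope gives the change in outcome per one standard deviation of that parameter, which makes the slopes directly comparable across parameters.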
Scoring and staging systems using cox linear regression modeling and recursive partitioning.
Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H
2006-01-01
Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
Ling, Ru; Liu, Jiawang
2011-12-01
To construct a prediction model for health workforce and hospital beds in county hospitals of Hunan using multiple linear regression. We surveyed 16 counties in Hunan by stratified random sampling with uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected by literature review. Independent variables in the multiple linear regression model for medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model for county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in the county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory power and fit, and may be used for short- and mid-term forecasting.
Porcaro, Antonio B; Ghimenton, Claudio; Petrozziello, Aldo; Sava, Teodoro; Migliorini, Filippo; Romano, Mario; Caruso, Beatrice; Cocco, Claudio; Antoniolli, Stefano Zecchinini; Lacola, Vincenzo; Rubilotta, Emanuele; Monaco, Carmelo
2012-10-01
To evaluate estradiol (E(2)) physiopathology along the pituitary-testicular-prostate axis at the time of initial diagnosis of prostate cancer (PC) and subsequent cluster selection of the patient population. Records of the diagnosed (n=105) and operated (n=91) patients were retrospectively reviewed. Age, percentage of positive cores at-biopsy (P+), biopsy Gleason score (bGS), E(2), prolactin (PRL), luteinizing hormone (LH), follicle-stimulating hormone (FSH), total testosterone (TT), free-testosterone (FT), prostate-specific antigen (PSA), pathology Gleason score (pGS), estimated tumor volume in relation to percentage of prostate volume (V+), overall prostate weight (Wi), clinical stage (cT), biopsy Gleason pattern (bGP) and pathology stage (pT), were the investigated variables. None of the patients had previously undergone hormonal manipulations. E(2) correlation and prediction by multiple linear regression analysis (MLRA) was performed. At diagnosis, the log E(2)/log bGS ratio clustered the population into groups A (log E(2)/log bGS ≤ 2.25), B (2.25
DOE Office of Scientific and Technical Information (OSTI.GOV)
Otake, M.; Schull, W.J.
This paper investigates the quantitative relationship of ionizing radiation to the occurrence of posterior lenticular opacities among the survivors of the atomic bombings of Hiroshima and Nagasaki suggested by the DS86 dosimetry system. DS86 doses are available for 1983 (93.4%) of the 2124 atomic bomb survivors analyzed in 1982. The DS86 kerma neutron component for Hiroshima survivors is much smaller than its comparable T65DR component, but still 4.2-fold higher (0.38 Gy at 6 Gy) than that in Nagasaki (0.09 Gy at 6 Gy). Thus, if the eye is especially sensitive to neutrons, there may yet be some useful information on their effects, particularly in Hiroshima. The dose-response relationship has been evaluated as a function of the separately estimated gamma-ray and neutron doses. Among several different dose-response models without and with two thresholds, we have selected as the best model the one with the smallest χ² or the largest log likelihood value associated with the goodness of fit. The best fit is a linear gamma-linear neutron relationship which assumes different thresholds for the two types of radiation. Both gamma and neutron regression coefficients for the best fitting model are positive and highly significant for the estimated DS86 eye organ dose.
LogCauchy, log-sech and lognormal distributions of species abundances in forest communities
Yin, Z.-Y.; Peng, S.-L.; Ren, H.; Guo, Q.; Chen, Z.-H.
2005-01-01
Species-abundance (SA) pattern is one of the most fundamental aspects of biological community structure, providing important information regarding species richness, species-area relation and succession. To better describe the SA distribution (SAD) in a community, based on the widely used lognormal (LN) distribution model with exp(-x²) roll-off on Preston's octave scale, this study proposed two additional models, logCauchy (LC) and log-sech (LS), with roll-offs of simple x⁻² and e⁻ˣ, respectively. The estimation of the theoretical total number of species in the whole community, S*, including very rare species not yet collected in the sample, was derived from the left-truncation of each distribution. We fitted these three models by Levenberg-Marquardt nonlinear regression and measured the model fit to the data using the coefficient of determination of regression, parameters' t-test and distribution's Kolmogorov-Smirnov (KS) test. Examining the SA data from six forest communities (five in lower subtropics and one in tropics), we found that: (1) on a log scale, all three models that are bell-shaped and left-truncated statistically adequately fitted the observed SADs, and the LC and LS did better than the LN; (2) from each model and for each community the S* values estimated by the integral and summation methods were almost equal, allowing us to estimate S* using a simple integral formula and to estimate its asymptotic confidence intervals by regression of a transformed model containing it; (3) following the order of LC, LS, and LN, the fitted distributions became lower in the peak, less concave in the side, and shorter in the tail, and overall the LC tended to overestimate, the LN tended to underestimate, while the LS was intermediate but slightly tended to underestimate, the observed SADs (particularly the number of common species in the right tail); (4) the six communities had some similar structural properties such as following similar distribution models, having a common 
modal octave and a similar proportion of common species. We suggested that what follows the LN distribution should follow (or better follow) the LC and LS, and that the LC, LS and LN distributions represent a "sequential distribution set" in which one can find a best fit to the observed SAD. © 2004 Elsevier B.V. All rights reserved.
Gupta, Deepak K; Claggett, Brian; Wells, Quinn; Cheng, Susan; Li, Man; Maruthur, Nisa; Selvin, Elizabeth; Coresh, Josef; Konety, Suma; Butler, Kenneth R; Mosley, Thomas; Boerwinkle, Eric; Hoogeveen, Ron; Ballantyne, Christie M; Solomon, Scott D
2015-05-21
Natriuretic peptides promote natriuresis, diuresis, and vasodilation. Experimental deficiency of natriuretic peptides leads to hypertension (HTN) and cardiac hypertrophy, conditions more common among African Americans. Hospital-based studies suggest that African Americans may have reduced circulating natriuretic peptides, as compared to Caucasians, but definitive data from community-based cohorts are lacking. We examined plasma N-terminal pro B-type natriuretic peptide (NTproBNP) levels according to race in 9137 Atherosclerosis Risk in Communities (ARIC) Study participants (22% African American) without prevalent cardiovascular disease at visit 4 (1996-1998). Multivariable linear and logistic regression analyses were performed adjusting for clinical covariates. Among African Americans, percent European ancestry was determined from genetic ancestry informative markers and then examined in relation to NTproBNP levels in multivariable linear regression analysis. NTproBNP levels were significantly lower in African Americans (median, 43 pg/mL; interquartile range [IQR], 18, 88) than Caucasians (median, 68 pg/mL; IQR, 36, 124; P<0.0001). In multivariable models, adjusted log NTproBNP levels were 40% lower (95% confidence interval [CI], -43, -36) in African Americans, compared to Caucasians, which was consistent across subgroups of age, gender, HTN, diabetes, insulin resistance, and obesity. African-American race was also significantly associated with having nondetectable NTproBNP (adjusted OR, 5.74; 95% CI, 4.22, 7.80). In multivariable analyses in African Americans, a 10% increase in genetic European ancestry was associated with a 7% (95% CI, 1, 13) increase in adjusted log NTproBNP. African Americans have lower levels of plasma NTproBNP than Caucasians, which may be partially owing to genetic variation. Low natriuretic peptide levels in African Americans may contribute to the greater risk for HTN and its sequelae in this population. © 2015 The Authors. 
Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.
Kumar, K Vasanth; Sivanesan, S
2006-08-25
Pseudo-second-order kinetic expressions of Ho, Sobkowski and Czerwinski, Blanchard et al. and Ritchie were fitted to the experimental kinetic data for malachite green adsorption onto activated carbon by non-linear and linear methods. The non-linear method was found to be a better way of obtaining the parameters involved in the second-order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowski and Czerwinski and the Ritchie pseudo-second-order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second-order model but with different assumptions. The best fit of experimental data to Ho's pseudo-second-order expression by the linear and non-linear regression methods showed that Ho's pseudo-second-order model was a better kinetic expression than the other pseudo-second-order kinetic expressions. The amounts of dye adsorbed at equilibrium, q(e), were predicted from Ho's pseudo-second-order expression and fitted to the Langmuir, Freundlich and Redlich-Peterson expressions by both linear and non-linear methods to obtain the pseudo isotherms. The best-fitting pseudo isotherms were found to be the Langmuir and Redlich-Peterson isotherms. The Redlich-Peterson isotherm reduces to the Langmuir isotherm when the constant g equals unity.
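For readers unfamiliar with the linear method mentioned in this abstract, the sketch below fits Ho's pseudo-second-order expression, q_t = k·q_e²·t / (1 + k·q_e·t), through its common linearized form t/q_t = 1/(k·q_e²) + t/q_e. The contact times and the parameter values (q_e = 50, k = 0.002) are invented and noise-free, so the linear fit recovers them exactly; with real, noisy data the transformation changes the error structure, which is one commonly cited reason for preferring non-linear regression.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def ho_uptake(t, qe, k):
    """Ho's pseudo-second-order uptake at time t."""
    return k * qe ** 2 * t / (1 + k * qe * t)


# Hypothetical contact times and synthetic, noise-free uptake values.
times = [5, 10, 20, 40, 80, 160]
uptake = [ho_uptake(t, qe=50.0, k=0.002) for t in times]

# Linearized form: regress t/q_t on t.
slope, intercept = fit_line(times, [t / q for t, q in zip(times, uptake)])
qe_hat = 1 / slope               # slope = 1 / q_e
k_hat = slope ** 2 / intercept   # intercept = 1 / (k * q_e^2)
```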
Generating log-normal mock catalog of galaxies in redshift space
NASA Astrophysics Data System (ADS)
Agrawal, Aniket; Makiya, Ryu; Chiang, Chi-Ting; Jeong, Donghui; Saito, Shun; Komatsu, Eiichiro
2017-10-01
We present a public code to generate a mock galaxy catalog in redshift space assuming a log-normal probability density function (PDF) of galaxy and matter density fields. We draw galaxies by Poisson-sampling the log-normal field, and calculate the velocity field from the linearised continuity equation of matter fields, assuming zero vorticity. This procedure yields a PDF of the pairwise velocity fields that is qualitatively similar to that of N-body simulations. We check fidelity of the catalog, showing that the measured two-point correlation function and power spectrum in real space agree with the input precisely. We find that a linear bias relation in the power spectrum does not guarantee a linear bias relation in the density contrasts, leading to a cross-correlation coefficient of matter and galaxies deviating from unity on small scales. We also find that linearising the Jacobian of the real-to-redshift space mapping provides a poor model for the two-point statistics in redshift space. That is, non-linear redshift-space distortion is dominated by non-linearity in the Jacobian. The power spectrum in redshift space shows a damping on small scales that is qualitatively similar to that of the well-known Fingers-of-God (FoG) effect due to random velocities, except that the log-normal mock does not include random velocities. This damping is a consequence of non-linearity in the Jacobian, and thus attributing the damping of the power spectrum solely to FoG, as commonly done in the literature, is misleading.
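The Poisson-sampling step described in this abstract can be illustrated with a deliberately simplified sketch. It draws an uncorrelated Gaussian value per cell, exponentiates it into a log-normal density with unit mean, and Poisson-samples galaxy counts. The spatial correlations, velocities and redshift-space mapping of the actual public code are not modelled here, and the cell count, mean density and `sigma` are arbitrary illustrative choices.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def lognormal_mock(n_cells, mean_count, sigma, seed=0):
    """Return (1 + delta) per cell and the Poisson-sampled galaxy counts."""
    rng = random.Random(seed)
    densities, counts = [], []
    for _ in range(n_cells):
        g = rng.gauss(0.0, sigma)
        # Subtracting sigma^2 / 2 makes the log-normal density average to 1.
        one_plus_delta = math.exp(g - 0.5 * sigma ** 2)
        densities.append(one_plus_delta)
        counts.append(poisson(rng, mean_count * one_plus_delta))
    return densities, counts


densities, counts = lognormal_mock(n_cells=20000, mean_count=5.0, sigma=0.5)
```

By construction the density is always positive, which is the main practical appeal of the log-normal assumption over a Gaussian density field.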
Analysing the Costs of Integrated Care: A Case on Model Selection for Chronic Care Purposes
Sánchez-Pérez, Inma; Ibern, Pere; Coderch, Jordi; Inoriza, José María
2016-01-01
Background: The objective of this study is to investigate whether the algorithm proposed by Manning and Mullahy, a consolidated health economics procedure, can also be used to estimate individual costs for different groups of healthcare services in the context of integrated care. Methods: A cross-sectional study focused on the population of the Baix Empordà (Catalonia-Spain) for the year 2012 (N = 92,498 individuals). A set of individual cost models as a function of sex, age and morbidity burden were adjusted and individual healthcare costs were calculated using a retrospective full-costing system. The individual morbidity burden was inferred using the Clinical Risk Groups (CRG) patient classification system. Results: Depending on the characteristics of the data, and according to the algorithm criteria, the choice of model was a linear model on the log of costs or a generalized linear model with a log link. We checked for goodness of fit, accuracy, linear structure and heteroscedasticity for the models obtained. Conclusion: The proposed algorithm identified a set of suitable cost models for the distinct groups of services integrated care entails. The individual morbidity burden was found to be indispensable when allocating appropriate resources to targeted individuals. PMID:28316542
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Singh, Minerva; Evans, Damian; Coomes, David A.; Friess, Daniel A.; Suy Tan, Boun; Samean Nin, Chan
2016-01-01
This research examines the role of canopy cover in influencing above ground biomass (AGB) dynamics of an open canopied forest and evaluates the efficacy of individual-based and plot-scale height metrics in predicting AGB variation in the tropical forests of Angkor Thom, Cambodia. The AGB was modeled by including canopy cover from aerial imagery alongside the two different canopy vertical height metrics derived from LiDAR: the plot average of maximum tree height (Max_CH) of individual trees, and the top of the canopy height (TCH). Two different statistical approaches, log-log ordinary least squares (OLS) and support vector regression (SVR), were used to model AGB variation in the study area. Ten different AGB models were developed using different combinations of airborne predictor variables. It was discovered that the inclusion of canopy cover estimates considerably improved the performance of AGB models for our study area. The most robust model was the log-log OLS model comprising canopy cover only (r = 0.87; RMSE = 42.8 Mg/ha). Other models that approximated field AGB closely included both Max_CH and canopy cover (r = 0.86, RMSE = 44.2 Mg/ha for SVR; and r = 0.84, RMSE = 47.7 Mg/ha for log-log OLS). Hence, canopy cover should be included when modeling the AGB of open-canopied tropical forests. PMID:27176218
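A log-log OLS model of the kind named in this abstract fits ln(AGB) = ln(a) + b·ln(x), which corresponds to the power law AGB = a·x^b. The sketch below shows the mechanics with a single predictor. The canopy cover values and the coefficients a = 2.5 and b = 1.3 are invented, and the synthetic data contain no noise, so they say nothing about the study's fitted model.

```python
import math


def fit_loglog(x, y):
    """Fit ln(y) = ln_a + b * ln(x) by ordinary least squares."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    return b, my - b * mx


# Hypothetical canopy cover (%) and synthetic biomass (Mg/ha).
cover = [10, 25, 40, 60, 85]
agb = [2.5 * c ** 1.3 for c in cover]

b, ln_a = fit_loglog(cover, agb)


def predict_agb(c):
    """Back-transform the fitted line to the original biomass scale."""
    return math.exp(ln_a) * c ** b
```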
Regional variability among nonlinear chlorophyll-phosphorus relationships in lakes
Filstrup, Christopher T.; Wagner, Tyler; Soranno, Patricia A.; Stanley, Emily H.; Stow, Craig A.; Webster, Katherine E.; Downing, John A.
2014-01-01
The relationship between chlorophyll a (Chl a) and total phosphorus (TP) is a fundamental relationship in lakes that reflects multiple aspects of ecosystem function and is also used in the regulation and management of inland waters. The exact form of this relationship has substantial implications on its meaning and its use. We assembled a spatially extensive data set to examine whether nonlinear models are a better fit for Chl a–TP relationships than traditional log-linear models, whether there were regional differences in the form of the relationships, and, if so, which regional factors were related to these differences. We analyzed a data set from 2105 temperate lakes across 35 ecoregions by fitting and comparing two different nonlinear models and one log-linear model. The two nonlinear models fit the data better than the log-linear model. In addition, the parameters for the best-fitting model varied among regions: the maximum and lower Chl a asymptotes were positively and negatively related to percent regional pasture land use, respectively, and the rate at which chlorophyll increased with TP was negatively related to percent regional wetland cover. Lakes in regions with more pasture fields had higher maximum chlorophyll concentrations at high TP concentrations but lower minimum chlorophyll concentrations at low TP concentrations. Lakes in regions with less wetland cover showed a steeper Chl a–TP relationship than wetland-rich regions. Interpretation of Chl a–TP relationships depends on regional differences, and theory and management based on a monolithic relationship may be inaccurate.
Statistical method to compare massive parallel sequencing pipelines.
Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P
2017-03-01
Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS), which drastically cuts sequencing time and expense. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with a single odds ratio for all patients, and a model with a single intercept. Then a log-linear mixed model was fitted considering the biological variability as a random effect. Among the 390,339 base-pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Same-trend results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method thus allows pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.
Classical Testing in Functional Linear Models.
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.
Classical Testing in Functional Linear Models
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155
Response Strength in Extreme Multiple Schedules
McLean, Anthony P; Grace, Randolph C; Nevin, John A
2012-01-01
Four pigeons were trained in a series of two-component multiple schedules. Reinforcers were scheduled with random-interval schedules. The ratio of arranged reinforcer rates in the two components was varied over 4 log units, a much wider range than previously studied. When performance appeared stable, prefeeding tests were conducted to assess resistance to change. Contrary to the generalized matching law, logarithms of response ratios in the two components were not a linear function of log reinforcer ratios, implying a failure of parameter invariance. Over a 2 log unit range, the function appeared linear and indicated undermatching, but in conditions with more extreme reinforcer ratios, approximate matching was observed. A model suggested by McLean (1991), originally for local contrast, predicts these changes in sensitivity to reinforcer ratios somewhat better than models by Herrnstein (1970) and by Williams and Wixted (1986). Prefeeding tests of resistance to change were conducted at each reinforcer ratio, and relative resistance to change was also a nonlinear function of log reinforcer ratios, again contrary to conclusions from previous work. Instead, the function suggests that resistance to change in a component may be determined partly by the rate of reinforcement and partly by the ratio of reinforcers to responses. PMID:22287804
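For reference, the generalized matching law that these results are tested against is conventionally written as below. The symbols are the customary ones and are not taken from the abstract: $B_1, B_2$ are response rates and $R_1, R_2$ reinforcer rates in the two components, $a$ is sensitivity to the reinforcer ratio, and $b$ is bias.

```latex
\log\frac{B_1}{B_2} = a\,\log\frac{R_1}{R_2} + \log b
```

Under this form, undermatching corresponds to $a < 1$ and matching to $a = 1$. A single constant $a$ over the full 4 log unit range is what the reported nonlinearity argues against.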
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Bishai, David; Opuni, Marjorie
2009-01-01
Background Time trends in infant mortality for the 20th century show a curvilinear pattern that most demographers have assumed to be approximately exponential. Virtually all cross-country comparisons and time series analyses of infant mortality have studied the logarithm of infant mortality to account for the curvilinear time trend. However, there is no evidence that the log transform is the best fit for infant mortality time trends. Methods We use maximum likelihood methods to determine the best transformation to fit time trends in infant mortality reduction in the 20th century and to assess the importance of the proper transformation in identifying the relationship between infant mortality and gross domestic product (GDP) per capita. We apply the Box Cox transform to infant mortality rate (IMR) time series from 18 countries to identify the best fitting value of lambda for each country and for the pooled sample. For each country, we test the value of λ against the null that λ = 0 (logarithmic model) and against the null that λ = 1 (linear model). We then demonstrate the importance of selecting the proper transformation by comparing regressions of ln(IMR) on same year GDP per capita against Box Cox transformed models. Results Based on chi-squared test statistics, infant mortality decline is best described as an exponential decline only for the United States. For the remaining 17 countries we study, IMR decline is neither best modelled as logarithmic nor as a linear process. Imposing a logarithmic transform on IMR can lead to bias in fitting the relationship between IMR and GDP per capita. Conclusion The assumption that IMR declines are exponential is enshrined in the Preston curve and in nearly all cross-country as well as time series analyses of IMR data since Preston's 1975 paper, but this assumption is seldom correct. Statistical analyses of IMR trends should assess the robustness of findings to transformations other than the log transform. PMID:19698144
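The Box Cox family referred to above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function name `box_cox` and the toy IMR series are assumptions, and no likelihood-based estimation of λ is attempted here.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y; lam = 0 gives the natural log."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

# Hypothetical infant mortality rates, for illustration only.
imr = [120.0, 95.0, 70.0, 48.0, 30.0]

log_scale = [box_cox(v, 0.0) for v in imr]     # lambda = 0: logarithmic model
linear_scale = [box_cox(v, 1.0) for v in imr]  # lambda = 1: linear model (y - 1)
```

The two null hypotheses tested in the abstract (λ = 0 and λ = 1) are the two ends of this family; intermediate values of λ give transforms between the log and the identity.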
Stochastic and deterministic model of microbial heat inactivation.
Corradini, Maria G; Normand, Mark D; Peleg, Micha
2010-03-01
Microbial inactivation is described by a model based on the changing survival probabilities of individual cells or spores. It is presented in a stochastic and discrete form for small groups, and as a continuous deterministic model for larger populations. If the underlying mortality probability function remains constant throughout the treatment, the model generates first-order ("log-linear") inactivation kinetics. Otherwise, it produces survival patterns that include Weibullian ("power-law") with upward or downward concavity, tailing with a residual survival level, complete elimination, flat "shoulder" with linear or curvilinear continuation, and sigmoid curves. In both forms, the same algorithm or model equation applies to isothermal and dynamic heat treatments alike. Constructing the model does not require assuming a kinetic order or knowledge of the inactivation mechanism. The general features of its underlying mortality probability function can be deduced from the experimental survival curve's shape. Once identified, the function's coefficients, the survival parameters, can be estimated directly from the experimental survival ratios by regression. The model is testable in principle but matching the estimated mortality or inactivation probabilities with those of the actual cells or spores can be a technical challenge. The model is not intended to replace current models to calculate sterility. Its main value, apart from connecting the various inactivation patterns to underlying probabilities at the cellular level, might be in simulating the irregular survival patterns of small groups of cells and spores. In principle, it can also be used for nonthermal methods of microbial inactivation and their combination with heat.
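The constant-probability case described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published model: time is discretized into equal steps, `p_death` is a hypothetical constant per-step mortality probability, and all function names are invented for the example.

```python
import math
import random

def deterministic_survival(p_death, steps):
    """Survival ratio after each step for a constant per-step mortality probability."""
    return [(1.0 - p_death) ** n for n in range(steps + 1)]

def stochastic_survivors(n_cells, p_death, steps, seed=0):
    """Discrete version: each surviving cell dies independently with probability p_death per step."""
    rng = random.Random(seed)
    alive = n_cells
    counts = [alive]
    for _step in range(steps):
        alive = sum(1 for _cell in range(alive) if rng.random() >= p_death)
        counts.append(alive)
    return counts

ratios = deterministic_survival(0.2, 5)
log_ratios = [math.log10(s) for s in ratios]
# Successive differences of log10(S) are constant, i.e. first-order ("log-linear") kinetics.
slopes = [b - a for a, b in zip(log_ratios, log_ratios[1:])]

# A small group of 20 cells shows the irregular, stepwise decline of the stochastic form.
small_group = stochastic_survivors(20, 0.2, 5)
```

Letting `p_death` change from step to step instead of holding it constant is what would produce the non-log-linear survival patterns listed in the abstract.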
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LETd) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LETd based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LETd based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
Two innovative pore pressure calculation methods for shallow deep-water formations
NASA Astrophysics Data System (ADS)
Deng, Song; Fan, Honghai; Liu, Yuhan; He, Yanfeng; Zhang, Shifeng; Yang, Jing; Fu, Lipei
2017-11-01
There are many geological hazards in shallow formations associated with oil and gas exploration and development in deep-water settings. Abnormal pore pressure can lead to water flow and gas and gas hydrate accumulations, which may affect drilling safety. Therefore, it is of great importance to accurately predict pore pressure in shallow deep-water formations. Experience over previous decades has shown, however, that there are not appropriate pressure calculation methods for these shallow formations. Pore pressure change is reflected closely in log data, particularly for mudstone formations. In this paper, pore pressure calculations for shallow formations are highlighted, and two concrete methods using log data are presented. The first method is modified from an E. Philips test in which a linear-exponential overburden pressure model is used. The second method is a new pore pressure method based on P-wave velocity that accounts for the effect of shallow gas and shallow water flow. Afterwards, the two methods are validated using case studies from two wells in the Yingqiong basin. Calculated results are compared with those obtained by the Eaton method, which demonstrates that the multi-regression method is more suitable for quick prediction of geological hazards in shallow layers.
Kapke, G E; Watson, G; Sheffler, S; Hunt, D; Frederick, C
1997-01-01
Several assays for quantification of DNA have been developed and are currently used in research and clinical laboratories. However, comparison of assay results has been difficult owing to the use of different standards and units of measurements as well as differences between assays in dynamic range and quantification limits. Although a few studies have compared results generated by different assays, there has been no consensus on conversion factors and thorough analysis has been precluded by small sample size and limited dynamic range studied. In this study, we have compared the Chiron branched DNA (bDNA) and Abbott liquid hybridization assays for quantification of hepatitis B virus (HBV) DNA in clinical specimens and have derived conversion factors to facilitate comparison of assay results. Additivity and variance stabilizing (AVAS) regression, a form of non-linear regression analysis, was performed on assay results for specimens from HBV clinical trials. Our results show that there is a strong linear relationship (R2 = 0.96) between log Chiron and log Abbott assay results. Conversion factors derived from regression analyses were found to be non-constant and ranged from 6-40. Analysis of paired assay results below and above each assay's limit of quantification (LOQ) indicated that a significantly (P < 0.01) larger proportion of observations were below the Abbott assay LOQ but above the Chiron assay LOQ, indicating that the Chiron assay is significantly more sensitive than the Abbott assay. Testing of replicate specimens showed that the Chiron assay consistently yielded lower per cent coefficients of variance (% CVs) than the Abbott assay, indicating that the Chiron assay provides superior precision.
A decline in the prevalence of injecting drug users in Estonia, 2005–2009
Uusküla, A; Rajaleid, K; Talu, A; Abel-Ollo, K; Des Jarlais, DC
2013-01-01
Aims and setting: Descriptions of behavioural epidemics have received little attention compared with infectious disease epidemics in Eastern Europe. Here we report a study aimed at estimating trends in the prevalence of injection drug use between 2005 and 2009 in Estonia. Design and methods: The number of injection drug users (IDUs) aged 15–44 each year between 2005 and 2009 was estimated using capture-recapture methodology based on 4 data sources: 2 treatment databases (drug abuse and non-fatal overdose treatment), criminal justice data (drug-related offences) and mortality data (injection drug use related deaths). Poisson log-linear regression models were applied to the matched data, with interactions between data sources fitted to replicate the dependencies between the data sources. Linear regression was used to estimate average change over time. Findings: There were 24305, 12292, 238 and 545 records and 8100, 1655, 155 and 545 individual IDUs identified in the four capture sources (police, drug treatment, overdose and death registry, respectively) over the period 2005–2009. The estimated prevalence of IDUs among the population aged 15–44 declined from 2.7% (1.8–7.9%) in 2005 to 2.0% (1.4–5.0%) in 2008, and 0.9% (0.7–1.7%) in 2009. Regression analysis indicated an average reduction of over 1700 injectors per year. Conclusion: While the capture-recapture method has known limitations, the results are consistent with other data from Estonia. Identifying the drivers of change in the prevalence of injection drug use warrants further research. PMID:23290632
Circulating fibrinogen but not D-dimer level is associated with vital exhaustion in school teachers.
Kudielka, Brigitte M; Bellingrath, Silja; von Känel, Roland
2008-07-01
Meta-analyses have established elevated fibrinogen and D-dimer levels in the circulation as biological risk factors for the development and progression of coronary artery disease (CAD). Here, we investigated whether vital exhaustion (VE), a known psychosocial risk factor for CAD, is associated with fibrinogen and D-dimer levels in a sample of apparently healthy school teachers. The teaching profession has been proposed as a potentially high stressful occupation due to enhanced psychosocial stress at the workplace. Plasma fibrinogen and D-dimer levels were measured in 150 middle-aged male and female teachers derived from the first year of the Trier-Teacher-Stress-Study. Log-transformed levels were analyzed using linear regression. Results yielded a significant association between VE and fibrinogen (p = 0.02) but not D-dimer controlling for relevant covariates. Further investigation of possible interaction effects resulted in a significant association between fibrinogen and the interaction term "VE x gender" (p = 0.05). In a secondary analysis, we reran linear regression models for males and females separately. Gender-specific results revealed that the association between fibrinogen and VE remained significant in males but not females. In sum, the present data support the notion that fibrinogen levels are positively related to VE. Elevated fibrinogen might be one biological pathway by which chronic work stress may impact on teachers' cardiovascular health in the long run.
Improving linear accelerator service response with a real-time electronic event reporting system.
Hoisak, Jeremy D P; Pawlicki, Todd; Kim, Gwe-Ya; Fletcher, Richard; Moore, Kevin L
2014-09-08
To track linear accelerator performance issues, an online event recording system was developed in-house for use by therapists and physicists to log the details of technical problems arising on our institution's four linear accelerators. In use since October 2010, the system was designed so that all clinical physicists would receive email notification when an event was logged. Starting in October 2012, we initiated a pilot project in collaboration with our linear accelerator vendor to explore a new model of service and support, in which event notifications were also sent electronically directly to dedicated engineers at the vendor's technical help desk, who then initiated a response to technical issues. Previously, technical issues were reported by telephone to the vendor's call center, which then disseminated information and coordinated a response with the Technical Support help desk and local service engineers. The purpose of this work was to investigate the improvements to clinical operations resulting from this new service model. The new and old service models were quantitatively compared by reviewing event logs and the oncology information system database in the nine months prior to and after initiation of the project. Here, we focus on events that resulted in an inoperative linear accelerator ("down" machine). Machine downtime, vendor response time, treatment cancellations, and event resolution were evaluated and compared over two equivalent time periods. In 389 clinical days, there were 119 machine-down events: 59 events before and 60 after introduction of the new model. In the new model, median time to service response decreased from 45 to 8 min, service engineer dispatch time decreased 44%, downtime per event decreased from 45 to 20 min, and treatment cancellations decreased 68%. The decreased vendor response time and reduced number of on-site visits by a service engineer resulted in decreased downtime and decreased patient treatment cancellations.
1994-09-01
Institute of Technology, Wright-Patterson AFB OH, January 1994. 4. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 5...Technology, Wright-Patterson AFB OH 5 April 1994. 29. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 30. Office of
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
Nam, R K; Klotz, L H; Jewett, M A; Danjoux, C; Trachtenberg, J
1998-01-01
To study the rate of change in prostate specific antigen (PSA velocity) in patients with prostate cancer initially managed by 'watchful waiting'. Serial PSA levels were determined in 141 patients with prostate cancer confirmed by biopsy, who were initially managed expectantly and enrolled between May 1990 and December 1995. Sixty-seven patients eventually underwent surgery (mean age 59 years) because they chose it (the decision for surgery was not based on PSA velocity). A cohort of 74 patients remained on 'watchful waiting' (mean age 69 years). Linear regression and logarithmic transformations were used to segregate those patients who showed a rapid rise, defined as a > 50% rise in PSA per year (or a doubling time of < 2 years) and designated 'rapid risers'. An initial analysis based on a minimum of two PSA values showed that 31% were rapid risers. Only 15% of patients with more than three serial PSA determinations over ≥ 6 months showed a rapid rise in PSA level. There was no advantage of log-linear analysis over linear regression models. Three serial PSA determinations over ≥ 6 months in patients with clinically localized prostate cancer identifies a subset (15%) of patients with a rapidly rising PSA level. Shorter PSA surveillance with fewer PSA values may falsely identify patients with rapid rises in PSA level. However, further follow-up is required to determine if a rapid rise in PSA level identifies a subset of patients with an aggressive biological phenotype who are either still curable or who have already progressed to incurability through metastatic disease.
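The log-linear analysis mentioned above amounts to regressing ln(PSA) on time and converting the slope to a doubling time. The sketch below is illustrative only: the helper names, the least-squares implementation and the PSA series are assumptions, not the study's procedure or data.

```python
import math

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def psa_doubling_time(years, psa):
    """Doubling time in years from a log-linear fit; infinite if PSA is not rising."""
    slope = ols_slope(years, [math.log(v) for v in psa])
    if slope <= 0:
        return math.inf
    return math.log(2) / slope

# Hypothetical series constructed to double every 1.5 years.
years = [0.0, 0.5, 1.0, 1.5]
psa = [4.0 * 2 ** (t / 1.5) for t in years]
doubling = psa_doubling_time(years, psa)
is_rapid_riser = doubling < 2.0  # threshold quoted in the abstract
```

With only two PSA values the slope is fixed by two noisy points, which is consistent with the abstract's caution that short surveillance may falsely flag rapid risers.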
Pedraza-Flechas, Ana María; Lope, Virginia; Moreo, Pilar; Ascunce, Nieves; Miranda-García, Josefa; Vidal, Carmen; Sánchez-Contador, Carmen; Santamariña, Carmen; Pedraz-Pingarrón, Carmen; Llobet, Rafael; Aragonés, Nuria; Salas-Trejo, Dolores; Pollán, Marina; Pérez-Gómez, Beatriz
2017-05-01
We explored the relationship between sleep patterns and sleep disorders and mammographic density (MD), a marker of breast cancer risk. Participants in the DDM-Spain/var-DDM study, which included 2878 middle-aged Spanish women, were interviewed via telephone and asked questions on sleep characteristics. Two radiologists assessed MD in their left cranio-caudal mammogram, assisted by a validated semiautomatic-computer tool (DM-scan). We used log-transformed percentage MD as the dependent variable and fitted mixed linear regression models, including known confounding variables. Our results showed that neither sleeping patterns nor sleep disorders were associated with MD. However, women with frequent changes in their bedtime due to anxiety or depression had higher MD (e^β: 1.53; 95% CI: 1.04-2.26). Copyright © 2017 Elsevier B.V. All rights reserved.
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Hossein-Zadeh, Navid Ghavi
2016-08-01
The aim of this study was to compare seven non-linear mathematical models (Brody, Wood, Dhanoa, Sikka, Nelder, Rook and Dijkstra) to examine their efficiency in describing the lactation curves for milk fat to protein ratio (FPR) in Iranian buffaloes. Data were 43 818 test-day records for FPR from the first three lactations of Iranian buffaloes which were collected on 523 dairy herds in the period from 1996 to 2012 by the Animal Breeding Center of Iran. Each model was fitted to monthly FPR records of buffaloes using the non-linear mixed model procedure (PROC NLMIXED) in SAS and the parameters were estimated. The models were tested for goodness of fit using Akaike's information criterion (AIC), Bayesian information criterion (BIC) and log maximum likelihood (-2 Log L). The Nelder and Sikka mixed models provided the best fit of lactation curve for FPR in the first and second lactations of Iranian buffaloes, respectively. However, Wood, Dhanoa and Sikka mixed models provided the best fit of lactation curve for FPR in the third parity buffaloes. Evaluation of first, second and third lactation features showed that all models, except for Dijkstra model in the third lactation, under-predicted test time at which daily FPR was minimum. On the other hand, minimum FPR was over-predicted by all equations. Evaluation of the different models used in this study indicated that non-linear mixed models were sufficient for fitting test-day FPR records of Iranian buffaloes.
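The goodness-of-fit criteria named above are simple functions of -2 Log L, and the Wood curve is one of the compared model forms. The sketch below uses the standard textbook definitions; the function names and the parameter values are hypothetical and are not estimates from the study.

```python
import math

def aic(neg2_log_lik, n_params):
    """Akaike's information criterion from -2 log L and the number of parameters."""
    return neg2_log_lik + 2 * n_params

def bic(neg2_log_lik, n_params, n_obs):
    """Bayesian information criterion; the penalty grows with log(sample size)."""
    return neg2_log_lik + n_params * math.log(n_obs)

def wood(t, a, b, c):
    """Wood curve y = a * t**b * exp(-c * t)."""
    return a * t ** b * math.exp(-c * t)

def wood_turning_point(b, c):
    """Time at which the Wood curve has its single turning point (t = b / c)."""
    return b / c

# Hypothetical parameters giving a curve that dips to a minimum, as FPR does.
a, b, c = 1.2, -0.1, -0.02
t_min = wood_turning_point(b, c)
fpr_min = wood(t_min, a, b, c)
```

Lower AIC or BIC indicates the preferred model; because BIC penalizes parameters more heavily for large samples, the two criteria can rank competing curves differently.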
Mazanderani, Ahmad Haeri; Moyo, Faith; Kufa, Tendesayi; Sherman, Gayle G
2018-02-01
To describe baseline HIV-1 RNA viral load (VL) trends within South Africa's Early Infant Diagnosis program 2010-2016, with reference to prevention of mother-to-child transmission guidelines. HIV-1 total nucleic acid polymerase chain reaction (TNA PCR) and RNA VL data from 2010 to 2016 were extracted from the South African National Health Laboratory Service's central data repository. Infants with a positive TNA PCR and subsequent baseline RNA VL taken at age <7 months were included. Descriptive statistics were performed for quantified and lower-than-quantification limit (LQL) results per annum and age in months. Trend analyses were performed using log likelihood ratio tests. Multivariable linear regression was used to model the relationship between RNA VL and predictor variables, whereas logistic regression was used to identify predictors associated with LQL RNA VL results. Among 13,606 infants with a positive HIV-1 TNA PCR linked to a baseline RNA VL, median age of first PCR was 57 days and VL was 98 days. Thirteen thousand one hundred ninety-five (97.0%) infants had a quantified VL and 411 (3.0%) had an LQL result. A significant decline in median VL was observed between 2010 and 2016, from 6.3 log10 (interquartile range: 5.6-6.8) to 5.6 log10 (interquartile range: 4.2-6.5) RNA copies per milliliter, after controlling for age (P < 0.001), with younger age associated with lower VL (P < 0.001). The proportion of infants with a baseline VL <4 Log10 RNA copies per milliliter increased from 5.4% to 21.8%. Subsequent to prevention of mother-to-child transmission Option B implementation in 2013, the proportion of infants with an LQL baseline VL increased from 1.5% to 6.1% (P < 0.001). Between 2010 and 2016, a significant decline in baseline viremia within South Africa's Early Infant Diagnosis program was observed, with loss of detectability among some HIV-infected infants.
Chen, Bo-Ching; Lai, Hung-Yu; Juang, Kai-Wei
2012-06-01
To better understand the ability of switchgrass (Panicum virgatum L.), a perennial grass often relegated to marginal agricultural areas with minimal inputs, to remove cadmium, chromium, and zinc by phytoextraction from contaminated sites, the relationship between plant metal content and biomass yield is expressed in different models to predict the amount of metals switchgrass can extract. These models are reliable in assessing the use of switchgrass for phytoremediation of heavy-metal-contaminated sites. In the present study, linear and exponential decay models are more suitable for presenting the relationship between plant cadmium and dry weight. The maximum extractions of cadmium using switchgrass, as predicted by the linear and exponential decay models, approached 40 and 34 μg pot⁻¹, respectively. The log normal model was superior in predicting the relationship between plant chromium and dry weight. The predicted maximum extraction of chromium by switchgrass was about 56 μg pot⁻¹. In addition, the exponential decay and log normal models were better than the linear model in predicting the relationship between plant zinc and dry weight. The maximum extractions of zinc by switchgrass, as predicted by the exponential decay and log normal models, were about 358 and 254 μg pot⁻¹, respectively. To meet the maximum removal of Cd, Cr, and Zn, one can adopt the optimal timing of harvest as plant Cd, Cr, and Zn approach 450 and 526 mg kg⁻¹, 266 mg kg⁻¹, and 3022 and 5000 mg kg⁻¹, respectively. Due to the well-known agronomic characteristics of cultivation and the high biomass production of switchgrass, it is practicable to use switchgrass for the phytoextraction of heavy metals in situ. Copyright © 2012 Elsevier Inc. All rights reserved.
A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data
Gallopin, Mélina; Rau, Andrea; Jaffrézic, Florence
2013-01-01
Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data. PMID:24147011
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
Major controlling factors and prediction models for arsenic uptake from soil to wheat plants.
Dai, Yunchao; Lv, Jialong; Liu, Ke; Zhao, Xiaoyan; Cao, Yingfei
2016-08-01
The application of current Chinese agriculture soil quality standards fails to evaluate the land utilization functions appropriately due to the diversity of soil properties and plant species. Therefore, the standards should be amended. A greenhouse experiment was conducted to investigate arsenic (As) enrichment in various soils from 18 Chinese provinces in parallel with As transfer to 8 wheat varieties. The goal of the study was to build and calibrate soil-wheat threshold models to forecast the As threshold of wheat soils. In Shaanxi soils, Wanmai and Jimai were the most sensitive and insensitive wheat varieties, respectively; and in Jiangxi soils, Zhengmai and Xumai were the most sensitive and insensitive wheat varieties, respectively. Relationships between soil properties and the bioconcentration factor (BCF) were built based on stepwise multiple linear regressions. Soil pH was the best predictor of BCF, and after normalizing the regression equation (log BCF = 0.2054 pH - 3.2055, R² = 0.8474, n = 14, p < 0.001), we obtained a calibrated model. Using the calibrated model, a continuous soil-wheat threshold equation (HC5 = 10^(-0.2054 pH + 2.9935) + 9.2) was obtained for the species-sensitive distribution curve, which was built on Chinese food safety standards. The threshold equation is a helpful tool that can be applied to estimate As uptake from soil to wheat. Copyright © 2016 Elsevier Inc. All rights reserved.
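The two equations quoted in the abstract can be evaluated directly. The sketch below reads the threshold expression as a power of ten whose exponent is the linear term in pH, which is an assumption about the original typesetting; the function names are hypothetical, and units are not stated here because the abstract does not give them.

```python
def log_bcf(ph):
    """log10 of the bioconcentration factor from the calibrated regression on soil pH."""
    return 0.2054 * ph - 3.2055

def hc5(ph):
    """Soil As threshold, assuming the form HC5 = 10**(-0.2054*pH + 2.9935) + 9.2."""
    return 10 ** (-0.2054 * ph + 2.9935) + 9.2

# Evaluate both equations over a few illustrative pH values.
for ph in (5.0, 6.0, 7.0, 8.0):
    print(f"pH {ph:.1f}: log BCF = {log_bcf(ph):.4f}, HC5 = {hc5(ph):.1f}")
```

Under this reading, the coefficient on pH has opposite signs in the two equations: BCF increases with pH, so the soil threshold decreases as pH rises.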
Mujasi, Paschal N; Puig-Junoy, Jaume
2015-08-20
There is a need for the Uganda Ministry of Health to understand predictors of primary health care pharmaceutical expenditure among districts in order to guide budget setting and to improve efficiency in allocation of the set budget among districts. Cross sectional, retrospective observational study using secondary data. The value of pharmaceuticals procured by primary health care facilities in 87 randomly selected districts for the Financial Year 2011/2012 was collected. Various specifications of the dependent variable (pharmaceutical expenditure) were used: total pharmaceutical expenditure, per capita district pharmaceutical expenditure, pharmaceutical expenditure per district health facility and pharmaceutical expenditure per outpatient department visit. Andersen's behaviour model of health services utilisation was used as conceptual framework to identify independent variables likely to influence health care utilisation and hence pharmaceutical expenditure. Econometric analysis was conducted to estimate parameters of various regression models. All models were significant overall (P < 0.01), with explanatory power ranging from 51 to 82%. The log linear model for total pharmaceutical expenditure explained about 80% of the observed variation in total pharmaceutical expenditure (Adjusted R² = 0.797) and contained the following variables: immunisation coverage, total outpatient department attendance, urbanisation, total number of government health facilities and total number of Health Centre IIs. The model based on per capita pharmaceutical expenditure explained about 50% of the observed variation in per capita pharmaceutical expenditure (Adjusted R² = 0.513) and was more balanced, with the following variables: outpatient per capita attendance, percentage of rural population below poverty line 2005, male literacy rate, whether a district is characterised by MOH as difficult to reach or not, and the Human Poverty Index. The log-linear model based on total pharmaceutical expenditure works acceptably well and can be considered useful for predicting future total pharmaceutical expenditure following observed trends. It can be used as a simple tool for rough estimation of the potential overall national primary health pharmaceutical expenditure to guide budget setting. The model based on pharmaceutical expenditure per capita is a more balanced model containing both need and enabling factor variables. These variables would be useful in allocating any set budget to districts.
Mutimura, Eugene; Hoover, Donald R; Shi, Qiuhu; Dusingize, Jean Claude; Sinayobye, Jean D'Amour; Cohen, Mardge; Anastos, Kathryn
2015-01-01
We longitudinally assessed predictors of insulin resistance (IR) change among HIV-uninfected and HIV-infected (ART-initiators and ART-non-initiators) Rwandan women. HIV-infected (HIV+) and uninfected (HIV-) women provided demographic and clinical measures: age, body mass index (BMI) in kg/(height in meters)², fat-mass index (FMI) and fat-free-mass index (FFMI), fasting serum glucose and insulin. Homeostasis Model Assessment (HOMA) was calculated to estimate IR change over time in log10 transformed HOMA measured at study enrollment or prior to ART initiation in 3 groups: HIV- (n = 194), HIV+ ART-non-initiators (n = 95) and HIV+ ART-initiators (n = 371). ANCOVA linear regression models of change in log10-HOMA were fit, with all models including the first log10 HOMA as a predictor. Mean±SD log10-HOMA was -0.18±0.39 at the 1st and -0.21±0.41 at the 2nd measure, with mean change of 0.03±0.44. In the final model (all women) BMI at 1st HOMA measure (0.014; 95% CI=0.006-0.021 per kg/m²; p<0.001) and change in BMI from 1st to 2nd measure (0.024; 95% CI=0.013-0.035 per kg/m²; p<0.001) predicted HOMA change. When restricted to subjects with FMI measures, FMI at 1st HOMA measure (0.020; 95% CI=0.010-0.030 per kg/m²; p<0.001) and change in FMI from 1st to 2nd measure (0.032; 95% CI=0.020-0.043 per kg/m²; p<0.0001) predicted change in HOMA. While ART use did not predict change in log10-HOMA, untreated HIV+ women had a significant decline in IR over time. Use or duration of AZT, d4T and EFV was not associated with HOMA change in HIV+ women. Baseline BMI and change in BMI, and in particular fat mass and change in fat mass, predicted insulin resistance change over ~3 years in HIV-infected and uninfected Rwandan women. Exposure to specific ART (d4T, AZT, EFV) did not predict insulin resistance change in ART-treated HIV-infected Rwandan women.
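The abstract above does not say which HOMA formula was used. The sketch below uses the common HOMA-IR approximation, fasting glucose (mg/dL) times fasting insulin (µU/mL) divided by 405, purely as an assumption, and shows how a log10-HOMA value and its change between two visits might be computed. The sign convention (second minus first) and the sample values are also assumptions.

```python
import math


def homa_ir(glucose_mg_dl: float, insulin_uU_ml: float) -> float:
    """Common HOMA-IR approximation (assumed; not confirmed by the abstract)."""
    return glucose_mg_dl * insulin_uU_ml / 405.0


def log10_homa(glucose_mg_dl: float, insulin_uU_ml: float) -> float:
    """log10-transformed HOMA, the scale used for the regression outcome."""
    return math.log10(homa_ir(glucose_mg_dl, insulin_uU_ml))


def log10_homa_change(first_visit, second_visit) -> float:
    """Change in log10-HOMA, taken here as second minus first (assumed convention)."""
    return log10_homa(*second_visit) - log10_homa(*first_visit)


# Hypothetical (glucose mg/dL, insulin µU/mL) pairs for one participant.
visit_1 = (85.0, 3.2)
visit_2 = (88.0, 3.9)
print(f"log10-HOMA visit 1: {log10_homa(*visit_1):.3f}")
print(f"change in log10-HOMA: {log10_homa_change(visit_1, visit_2):.3f}")
```

In an ANCOVA-style model of the kind described, this change would be the outcome and the first log10-HOMA value one of the predictors.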
Concentration-response of short-term ozone exposure and hospital admissions for asthma in Texas.
Zu, Ke; Liu, Xiaobin; Shi, Liuhua; Tao, Ge; Loftus, Christine T; Lange, Sabine; Goodman, Julie E
2017-07-01
Short-term exposure to ozone has been associated with asthma hospital admissions (HA) and emergency department (ED) visits, but the shape of the concentration-response (C-R) curve is unclear. We conducted a time series analysis of asthma HAs and ambient ozone concentrations in six metropolitan areas in Texas from 2001 to 2013. Using generalized linear regression models, we estimated the effect of daily 8-hour maximum ozone concentrations on asthma HAs for all ages combined, and for those aged 5-14, 15-64, and 65+years. We fit penalized regression splines to evaluate the shape of the C-R curves. Using a log-linear model, estimated risk per 10ppb increase in average daily 8-hour maximum ozone concentrations was highest for children (relative risk [RR]=1.047, 95% confidence interval [CI]: 1.025-1.069), lower for younger adults (RR=1.018, 95% CI: 1.005-1.032), and null for older adults (RR=1.002, 95% CI: 0.981-1.023). However, penalized spline models demonstrated significant nonlinear C-R relationships for all ages combined, children, and younger adults, indicating the existence of thresholds. We did not observe an increased risk of asthma HAs until average daily 8-hour maximum ozone concentrations exceeded approximately 40ppb. Ozone and asthma HAs are significantly associated with each other; susceptibility to ozone is age-dependent, with children at highest risk. C-R relationships between average daily 8-hour maximum ozone concentrations and asthma HAs are significantly curvilinear for all ages combined, children, and younger adults. These nonlinear relationships, as well as the lack of relationship between average daily 8-hour maximum and peak ozone concentrations, have important implications for assessing risks to human health in regulatory settings. Copyright © 2017. Published by Elsevier Ltd.
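To make the log-linear risk estimates in the abstract above more concrete, the following sketch converts a relative risk reported per 10 ppb into the implied per-ppb coefficient and rescales it to other increments. It assumes the usual log-linear form, log(expected admissions) = a + b × ozone, so that RR per 10 ppb equals exp(10b). The function names are hypothetical. Given the threshold near 40 ppb suggested by the spline models, extrapolating this way is only a simplification.

```python
import math


def coefficient_per_ppb(rr_per_10ppb: float) -> float:
    """Implied log-linear slope b, from RR = exp(10 * b)."""
    return math.log(rr_per_10ppb) / 10.0


def relative_risk(rr_per_10ppb: float, increment_ppb: float) -> float:
    """Relative risk for an arbitrary ozone increment under the log-linear model."""
    return math.exp(coefficient_per_ppb(rr_per_10ppb) * increment_ppb)


# Point estimates quoted in the abstract (RR per 10 ppb).
reported = {"children": 1.047, "younger adults": 1.018, "older adults": 1.002}

for group, rr10 in reported.items():
    percent = (rr10 - 1.0) * 100.0
    rr20 = relative_risk(rr10, 20.0)
    print(f"{group}: {percent:.1f}% per 10 ppb, RR for 20 ppb = {rr20:.3f}")
```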
Wiley, Jeffrey B.; Atkins, John T.; Newell, Dawn A.
2002-01-01
Multiple and simple least-squares regression models for the log10-transformed 1.5- and 2-year recurrence intervals of peak discharges with independent variables describing the basin characteristics (log10-transformed and untransformed) for 236 streamflow-gaging stations were evaluated, and the regression residuals were plotted as areal distributions that defined three regions in West Virginia designated as East, North, and South. Regional equations for the 1.1-, 1.2-, 1.3-, 1.4-, 1.5-, 1.6-, 1.7-, 1.8-, 1.9-, 2.0-, 2.5-, and 3-year recurrence intervals of peak discharges were determined by generalized least-squares regression. Log10-transformed drainage area was the most significant independent variable for all regions. Equations developed in this study are applicable only to rural, unregulated streams within the boundaries of West Virginia. The accuracies of estimating equations are quantified by measuring the average prediction error (from 27.4 to 52.4 percent) and equivalent years of record (from 1.1 to 3.4 years).
Predictive and mechanistic multivariate linear regression models for reaction development
Santiago, Celine B.; Guo, Jing-Yao
2018-01-01
Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
Adding a Parameter Increases the Variance of an Estimated Regression Function
ERIC Educational Resources Information Center
Withers, Christopher S.; Nadarajah, Saralees
2011-01-01
The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed to assess the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and to find a simple model describing the functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by echocardiography and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). To identify the factors causing pulmonary hypertension, arithmetic means were compared. The functional relation between the two random variables (PAPs and each of the factors determining it in our study) can be expressed by a linear or a non-linear function. Applying linear regression with a first-degree equation gave the regression line (linear model); applying non-linear regression with a second-degree equation gave a parabola-type regression curve (non-linear or polynomial model). We compared and validated the two models by calculating the coefficient of determination (criterion 1), comparing residuals (criterion 2), applying the AIC (criterion 3), and using the F-test (criterion 4). In the H-group, 47% had pulmonary hypertension that was completely reversible on reaching euthyroidism. The factors causing pulmonary hypertension were identified: previously known factors (level of free thyroxine, pulmonary vascular resistance, cardiac output) and new factors identified in this study (pretreatment period, age, systolic blood pressure). According to the four criteria and clinical judgment, we consider the polynomial model (graphically a parabola) better than the linear one. 
The better model of the functional relation between pulmonary hypertension in hyperthyroidism and the factors identified in this study is a second-degree polynomial equation, whose graph is a parabola.
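The comparison of a first-degree and a second-degree regression by R², AIC, and an F-test, as described in the abstract above, can be sketched as follows. This is not the authors' code. The data are made up, the AIC is the Gaussian least-squares form n·ln(RSS/n) + 2k (constants dropped), and only the F statistic is computed, not its p-value.

```python
import math


def fit_polynomial(x, y, degree):
    """Least-squares fit of y ~ b0 + b1*x + ... via the normal equations."""
    p = degree + 1
    # Augmented matrix [X'X | X'y] for the polynomial design.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(p)]
         + [sum((xi ** r) * yi for xi, yi in zip(x, y))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, p))
        coef[r] = (a[r][p] - tail) / a[r][r]
    return coef


def residual_sum_of_squares(x, y, coef):
    return sum((yi - sum(b * xi ** k for k, b in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))


def compare_models(x, y):
    """Return R^2 and AIC for both fits plus the nested-model F statistic."""
    n = len(x)
    mean_y = sum(y) / n
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    out = {}
    for name, degree in (("linear", 1), ("quadratic", 2)):
        coef = fit_polynomial(x, y, degree)
        rss = residual_sum_of_squares(x, y, coef)
        out[name] = {"coef": coef, "rss": rss, "r2": 1.0 - rss / total_ss,
                     "aic": n * math.log(rss / n) + 2 * (degree + 1)}
    # One extra parameter in the quadratic model; n - 3 residual degrees of freedom.
    out["f_stat"] = ((out["linear"]["rss"] - out["quadratic"]["rss"])
                     / (out["quadratic"]["rss"] / (n - 3)))
    return out


# Made-up data with mild curvature plus fixed noise (illustration only).
x = [float(i) for i in range(1, 11)]
noise = [0.4, -0.3, 0.2, -0.5, 0.1, 0.3, -0.2, 0.5, -0.4, 0.1]
y = [2.0 + 0.5 * xi + 0.3 * xi ** 2 + e for xi, e in zip(x, noise)]

result = compare_models(x, y)
for name in ("linear", "quadratic"):
    print(name, f"R2={result[name]['r2']:.4f}", f"AIC={result[name]['aic']:.2f}")
print(f"F statistic for the added quadratic term: {result['f_stat']:.1f}")
```

A higher R², lower AIC, and large F statistic would all favor the second-degree model, mirroring three of the four criteria listed; the residual comparison is usually done graphically.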
Slopen, Natalie; Loucks, Eric B; Appleton, Allison A; Kawachi, Ichiro; Kubzansky, Laura D; Non, Amy L; Buka, Stephen; Gilman, Stephen E
2015-01-01
Children exposed to social adversity carry a greater risk of poor physical and mental health into adulthood. This increased risk is thought to be due, in part, to inflammatory processes associated with early adversity that contribute to the etiology of many adult illnesses. The current study asks whether aspects of the prenatal social environment are associated with levels of inflammation in adulthood, and whether prenatal and childhood adversity both contribute to adult inflammation. We examined associations of prenatal and childhood adversity assessed through direct interviews of participants in the Collaborative Perinatal Project between 1959 and 1974 with blood levels of C-reactive protein in 355 offspring interviewed in adulthood (mean age=42.2 years). Linear and quantile regression models were used to estimate the effects of prenatal adversity and childhood adversity on adult inflammation, adjusting for age, sex, and race and other potential confounders. In separate linear regression models, high levels of prenatal and childhood adversity were associated with higher CRP in adulthood. When prenatal and childhood adversity were analyzed together, our results support the presence of an effect of prenatal adversity on (log) CRP level in adulthood (β=0.73, 95% CI: 0.26, 1.20) that is independent of childhood adversity and potential confounding factors including maternal health conditions reported during pregnancy. Supplemental analyses revealed similar findings using quantile regression models and logistic regression models that used a clinically-relevant CRP threshold (>3mg/L). In a fully-adjusted model that included childhood adversity, high prenatal adversity was associated with a 3-fold elevated odds (95% CI: 1.15, 8.02) of having a CRP level in adulthood that indicates high risk of cardiovascular disease. Social adversity during the prenatal period is a risk factor for elevated inflammation in adulthood independent of adversities during childhood. 
This evidence is consistent with studies demonstrating that adverse exposures in the maternal environment during gestation have lasting effects on development of the immune system. If these results reflect causal associations, they suggest that interventions to improve the social and environmental conditions of pregnancy would promote health over the life course. It remains necessary to identify the mechanisms that link maternal conditions during pregnancy to the development of fetal immune and other systems involved in adaptation to environmental stressors. Copyright © 2014 Elsevier Ltd. All rights reserved.
Characterizing Sleep Structure Using the Hypnogram
Swihart, Bruce J.; Caffo, Brian; Bandeen-Roche, Karen; Punjabi, Naresh M.
2008-01-01
Objectives: Research on the effects of sleep-disordered breathing (SDB) on sleep structure has traditionally been based on composite sleep-stage summaries. The primary objective of this investigation was to demonstrate the utility of log-linear and multistate analysis of the sleep hypnogram in evaluating differences in nocturnal sleep structure in subjects with and without SDB. Methods: A community-based sample of middle-aged and older adults with and without SDB matched on age, sex, race, and body mass index was identified from the Sleep Heart Health Study. Sleep was assessed with home polysomnography and categorized into rapid eye movement (REM) and non-REM (NREM) sleep. Log-linear and multistate survival analysis models were used to quantify the frequency and hazard rates of transitioning, respectively, between wakefulness, NREM sleep, and REM sleep. Results: Whereas composite sleep-stage summaries were similar between the two groups, subjects with SDB had higher frequencies and hazard rates for transitioning between the three states. Specifically, log-linear models showed that subjects with SDB had more wake-to-NREM sleep and NREM sleep-to-wake transitions, compared with subjects without SDB. Multistate survival models revealed that subjects with SDB transitioned more quickly from wake-to-NREM sleep and NREM sleep-to-wake than did subjects without SDB. Conclusions: The description of sleep continuity with log-linear and multistate analysis of the sleep hypnogram suggests that such methods can identify differences in sleep structure that are not evident with conventional sleep-stage summaries. Detailed characterization of nocturnal sleep evolution with event history methods provides additional means for testing hypotheses on how specific conditions impact sleep continuity and whether sleep disruption is associated with adverse health outcomes. Citation: Swihart BJ; Caffo B; Bandeen-Roche K; Punjabi NM. Characterizing sleep structure using the hypnogram. 
J Clin Sleep Med 2008;4(4):349–355. PMID:18763427
Braun, Dominique L; Kouyos, Roger; Oberle, Corinna; Grube, Christina; Joos, Beda; Fellay, Jacques; McLaren, Paul J; Kuster, Herbert; Günthard, Huldrych F
2014-01-01
Best long-term practice in primary HIV-1 infection (PHI) remains unknown for the individual. A risk-based scoring system associated with surrogate markers of HIV-1 disease progression could be helpful to stratify patients with PHI at highest risk for HIV-1 disease progression. We prospectively enrolled 290 individuals with well-documented PHI in the Zurich Primary HIV-1 Infection Study, an open-label, non-randomized, observational, single-center study. Patients could choose to undergo early antiretroviral treatment (eART) and stop it after one year of undetectable viremia, to go on with treatment indefinitely, or to defer treatment. For each patient we calculated an a priori defined "Acute Retroviral Syndrome Severity Score" (ARSSS), consisting of clinical and basic laboratory variables, ranging from zero to ten points. We used linear regression models to assess the association between ARSSS and log baseline viral load (VL), baseline CD4+ cell count, and log viral setpoint (sVL) (i.e. VL measured ≥90 days after infection or treatment interruption). Mean ARSSS was 2.89. CD4+ cell count at baseline was negatively correlated with ARSSS (p = 0.03, n = 289), whereas HIV-RNA levels at baseline showed a strong positive correlation with ARSSS (p<0.001, n = 290). In the regression models, a 1-point increase in the score corresponded to a 0.10 log increase in baseline VL and a CD4+ cell count decline of 12/µl, respectively. In patients with PHI and not undergoing eART, higher ARSSS were significantly associated with higher sVL (p = 0.029, n = 64). In contrast, in patients undergoing eART with subsequent structured treatment interruption, no correlation was found between sVL and ARSSS (p = 0.28, n = 40). The ARSSS is a simple clinical score that correlates with the best-validated surrogate markers of HIV-1 disease progression. 
In regions where ART is not universally available and eART is not standard this score may help identifying patients who will profit the most from early antiretroviral therapy.
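The regression slopes quoted in the abstract above can be turned into a quick back-of-the-envelope calculation. The sketch below assumes that "log" means log10 (the usual convention for viral load, but not stated) and that the relationships are linear across the whole 0 to 10 score range. The function name is hypothetical.

```python
def expected_shift(score_difference: float) -> tuple[float, float]:
    """Expected differences implied by the reported slopes.

    Returns (fold change in baseline viral load, change in CD4+ count per µl),
    using 0.10 log10 VL and -12 cells/µl per ARSSS point from the abstract.
    """
    vl_fold_change = 10 ** (0.10 * score_difference)
    cd4_change = -12.0 * score_difference
    return vl_fold_change, cd4_change


for points in (1, 3, 5):
    fold, cd4 = expected_shift(points)
    print(f"+{points} ARSSS points: VL x{fold:.2f}, CD4+ {cd4:+.0f}/µl")
```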
An Application to the Prediction of LOD Change Based on General Regression Neural Network
NASA Astrophysics Data System (ADS)
Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.
2011-07-01
Traditional prediction of the LOD (length of day) change was based on linear models, such as the least square model and the autoregressive technique, etc. Due to the complex non-linear features of the LOD variation, the performances of the linear model predictors are not fully satisfactory. This paper applies a non-linear neural network - general regression neural network (GRNN) model to forecast the LOD change, and the results are analyzed and compared with those obtained with the back propagation neural network and other models. The comparison shows that the performance of the GRNN model in the prediction of the LOD change is efficient and feasible.
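For readers unfamiliar with the GRNN named in the abstract above, a minimal sketch follows. A GRNN prediction is a Gaussian-kernel weighted average of the training targets, with one smoothing parameter sigma. The lagged-input setup, the synthetic sine series standing in for LOD data, and all names are assumptions for illustration and do not reproduce the paper's configuration.

```python
import math


def grnn_predict(train_x, train_y, query, sigma):
    """Kernel-weighted average of training targets (GRNN output for one query)."""
    weights = []
    for xi in train_x:
        squared_distance = sum((a - b) ** 2 for a, b in zip(xi, query))
        weights.append(math.exp(-squared_distance / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed; fall back to the plain mean of the targets.
        return sum(train_y) / len(train_y)
    return sum(w * yi for w, yi in zip(weights, train_y)) / total


def lagged_samples(series, lag):
    """Build (previous `lag` values -> next value) training pairs."""
    xs = [series[i:i + lag] for i in range(len(series) - lag)]
    ys = [series[i + lag] for i in range(len(series) - lag)]
    return xs, ys


# Synthetic periodic series used only as a stand-in for an LOD time series.
series = [math.sin(0.3 * t) for t in range(60)]
lag = 3
train_x, train_y = lagged_samples(series, lag)

# One-step-ahead forecast from the last `lag` observations.
forecast = grnn_predict(train_x, train_y, series[-lag:], sigma=0.1)
print(f"one-step forecast: {forecast:.3f}")
```

The only tuning parameter is sigma: small values reproduce the nearest training patterns, while large values smooth toward the overall mean.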
Hawkins, Marquis S; Sevick, Mary Ann; Richardson, Caroline R; Fried, Linda F; Arena, Vincent C; Kriska, Andrea M
2011-08-01
Chronic kidney disease is a condition characterized by the deterioration of the kidney's ability to remove waste products from the body. Although treatments to slow the progression of the disease are available, chronic kidney disease may eventually lead to a complete loss of kidney function. Previous studies have shown that physical activities of moderate intensity may have renal benefits. Few studies have examined the effects of total movement on kidney function. The purpose of this study was to determine the association between time spent at all levels of physical activity intensity and sedentary behavior and kidney function. Data were obtained from the 2003-2004 and 2005-2006 National Health and Nutrition Examination Survey, a cross-sectional study of a complex, multistage probability sample of the US population. Physical activity was assessed using an accelerometer and questionnaire. Glomerular filtration rate (eGFR) was estimated using the Modification of Diet in Renal Disease study formula. To assess linear associations between levels of physical activity and sedentary behavior with log-transformed estimated GFR (eGFR), linear regression was used. In general, physical activity (light and total) was related to log eGFR in females and males. For females, the association between light and total physical activity with log eGFR was consistent regardless of diabetes status. For males, the association between light and total physical activity and log eGFR was only significant in males without diabetes. When examining the association between physical activity, measured objectively with an accelerometer, and kidney function, total and light physical activities were found to be positively associated with kidney function.
Acquah, Gifty E.; Via, Brian K.; Billor, Nedret; Fasina, Oladiran O.; Eckhardt, Lori G.
2016-01-01
As new markets, technologies and economies evolve in the low carbon bioeconomy, forest logging residue, a largely untapped renewable resource will play a vital role. The feedstock can however be variable depending on plant species and plant part component. This heterogeneity can influence the physical, chemical and thermochemical properties of the material, and thus the final yield and quality of products. Although it is challenging to control compositional variability of a batch of feedstock, it is feasible to monitor this heterogeneity and make the necessary changes in process parameters. Such a system will be a first step towards optimization, quality assurance and cost-effectiveness of processes in the emerging biofuel/chemical industry. The objective of this study was therefore to qualitatively classify forest logging residue made up of different plant parts using both near infrared spectroscopy (NIRS) and Fourier transform infrared spectroscopy (FTIRS) together with linear discriminant analysis (LDA). Forest logging residue harvested from several Pinus taeda (loblolly pine) plantations in Alabama, USA, were classified into three plant part components: clean wood, wood and bark and slash (i.e., limbs and foliage). Five-fold cross-validated linear discriminant functions had classification accuracies of over 96% for both NIRS and FTIRS based models. An extra factor/principal component (PC) was however needed to achieve this in FTIRS modeling. Analysis of factor loadings of both NIR and FTIR spectra showed that, the statistically different amount of cellulose in the three plant part components of logging residue contributed to their initial separation. This study demonstrated that NIR or FTIR spectroscopy coupled with PCA and LDA has the potential to be used as a high throughput tool in classifying the plant part makeup of a batch of forest logging residue feedstock. 
Thus, NIR/FTIR could be employed as a tool to rapidly probe/monitor the variability of forest biomass so that the appropriate online adjustments to parameters can be made in time to ensure process optimization and product quality. PMID:27618901
1990-03-01
and M.H. Kutner. Applied Linear Regression Models. Homewood IL: Richard D. Irwin Inc., 1983. Pritsker, A. Alan B. Introduction to Simulation and SLAM...Control Variates in Simulation," European Journal of Operational Research, 42: (1989). Neter, J., W. Wasserman, and M.H. Kutner. Applied Linear Regression Models
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies
ERIC Educational Resources Information Center
Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre
2018-01-01
Purpose: Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate to analyze such data, because they usually consist of nonnegative and skew-distributed variables. Therefore, we recommend use of a statistical methodology specific to duration data. Method: We propose a…
Live Speech Driven Head-and-Eye Motion Generators.
Le, Binh H; Ma, Xiaohan; Deng, Zhigang
2012-11-01
This paper describes a fully automated framework to generate realistic head motion, eye gaze, and eyelid motion simultaneously based on live (or recorded) speech input. Its central idea is to learn separate yet interrelated statistical models for each component (head motion, gaze, or eyelid motion) from a prerecorded facial motion data set: 1) Gaussian Mixture Models and gradient descent optimization algorithm are employed to generate head motion from speech features; 2) Nonlinear Dynamic Canonical Correlation Analysis model is used to synthesize eye gaze from head motion and speech features, and 3) nonnegative linear regression is used to model voluntary eye lid motion and log-normal distribution is used to describe involuntary eye blinks. Several user studies are conducted to evaluate the effectiveness of the proposed speech-driven head and eye motion generator using the well-established paired comparison methodology. Our evaluation results clearly show that this approach can significantly outperform the state-of-the-art head and eye motion generation algorithms. In addition, a novel mocap+video hybrid data acquisition technique is introduced to record high-fidelity head movement, eye gaze, and eyelid motion simultaneously.
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
ERIC Educational Resources Information Center
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
ERIC Educational Resources Information Center
Ker, H. W.
2014-01-01
Multilevel data are very common in educational research. Hierarchical linear models/linear mixed-effects models (HLMs/LMEs) are often utilized to analyze multilevel data nowadays. This paper discusses the problems of utilizing ordinary regressions for modeling multilevel educational data, compares the data analytic results from three regression…
A break-even analysis for dementia care collaboration: Partners in Dementia Care.
Morgan, Robert O; Bass, David M; Judge, Katherine S; Liu, C F; Wilson, Nancy; Snow, A Lynn; Pirraglia, Paul; Garcia-Maldonado, Maurilio; Raia, Paul; Fouladi, N N; Kunik, Mark E
2015-06-01
Dementia is a costly disease. People with dementia, their families, and their friends are affected on personal, emotional, and financial levels. Prior work has shown that the "Partners in Dementia Care" (PDC) intervention addresses unmet needs and improves psychosocial outcomes and satisfaction with care. We examined whether PDC reduced direct Veterans Health Administration (VHA) health care costs compared with usual care. This study was a cost analysis of the PDC intervention in a 30-month trial involving five VHA medical centers. Study subjects were veterans (N = 434) 50 years of age and older with dementia and their caregivers at two intervention (N = 269) and three comparison sites (N = 165). PDC is a telephone-based care coordination and support service for veterans with dementia and their caregivers, delivered through partnerships between VHA medical centers and local Alzheimer's Association chapters. We tested for differences in total VHA health care costs, including hospital, emergency department, nursing home, outpatient, and pharmacy costs, as well as program costs for intervention participants. Covariates included caregiver reports of veterans' cognitive impairment, behavior problems, and personal care dependencies. We used linear mixed model regression to model change in log total cost post-baseline over a 1-year follow-up period. Intervention participants showed higher VHA costs than usual-care participants both before and after the intervention but did not differ significantly regarding change in log costs from pre- to post-baseline periods. Pre-baseline log cost (p ≤ 0.001), baseline cognitive impairment (p ≤ 0.05), number of personal care dependencies (p ≤ 0.01), and VA service priority (p ≤ 0.01) all predicted change in log total cost. These analyses show that PDC meets veterans' needs without significantly increasing VHA health care costs. 
PDC addresses the priority area of care coordination in the National Plan to Address Alzheimer's Disease, offering a low-cost, structured, protocol-driven, evidence-based method for effectively delivering care coordination.
Generating log-normal mock catalog of galaxies in redshift space
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agrawal, Aniket; Makiya, Ryu; Saito, Shun
We present a public code to generate a mock galaxy catalog in redshift space assuming a log-normal probability density function (PDF) of galaxy and matter density fields. We draw galaxies by Poisson-sampling the log-normal field, and calculate the velocity field from the linearised continuity equation of matter fields, assuming zero vorticity. This procedure yields a PDF of the pairwise velocity fields that is qualitatively similar to that of N-body simulations. We check fidelity of the catalog, showing that the measured two-point correlation function and power spectrum in real space agree with the input precisely. We find that a linear bias relation in the power spectrum does not guarantee a linear bias relation in the density contrasts, leading to a cross-correlation coefficient of matter and galaxies deviating from unity on small scales. We also find that linearising the Jacobian of the real-to-redshift space mapping provides a poor model for the two-point statistics in redshift space. That is, non-linear redshift-space distortion is dominated by non-linearity in the Jacobian. The power spectrum in redshift space shows a damping on small scales that is qualitatively similar to that of the well-known Fingers-of-God (FoG) effect due to random velocities, except that the log-normal mock does not include random velocities. This damping is a consequence of non-linearity in the Jacobian, and thus attributing the damping of the power spectrum solely to FoG, as commonly done in the literature, is misleading.
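The first two steps described in the abstract above, building a log-normal density field and Poisson-sampling galaxies from it, can be sketched in a few lines. This is a toy version and not the public code: the Gaussian field here is uncorrelated between cells (a real mock would impose an input power spectrum), velocities are omitted, and every name and parameter value is an assumption.

```python
import math
import random


def poisson_draw(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def lognormal_mock(n_cells: int, sigma: float, mean_count: float, seed: int = 0):
    """Return (density contrasts, galaxy counts) for independent cells."""
    rng = random.Random(seed)
    deltas, counts = [], []
    for _ in range(n_cells):
        gaussian = rng.gauss(0.0, sigma)
        # Subtracting sigma^2 / 2 gives 1 + delta unit mean, so <delta> = 0.
        delta = math.exp(gaussian - 0.5 * sigma ** 2) - 1.0
        deltas.append(delta)
        counts.append(poisson_draw(mean_count * (1.0 + delta), rng))
    return deltas, counts


deltas, counts = lognormal_mock(n_cells=4096, sigma=0.5, mean_count=5.0)
print(f"mean density contrast: {sum(deltas) / len(deltas):+.4f}")
print(f"mean galaxies per cell: {sum(counts) / len(counts):.3f}")
```

The exponential transform keeps 1 + delta strictly positive, which is what makes the log-normal field a valid Poisson intensity in every cell.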
Van Gemert-Pijnen, Julia Ewc; Kelders, Saskia M; Bohlmeijer, Ernst T
2014-01-31
Web-based interventions for the early treatment of depressive symptoms can be considered effective in reducing mental complaints. However, there is a limited understanding of which elements in an intervention contribute to effectiveness. For efficiency and effectiveness of interventions, insight is needed into the use of content and persuasive features. The aims of this study were (1) to illustrate how log data can be used to understand the uptake of the content of a Web-based intervention that is based on the acceptance and commitment therapy (ACT) and (2) to discover how log data can be of value for improving the incorporation of content in Web-based interventions. Data from 206 participants (out of the 239) who started the first nine lessons of the Web-based intervention, Living to the Full, were used for a secondary analysis of a subset of the log data of the parent study about adherence to the intervention. The log files used in this study were per lesson: login, start mindfulness, download mindfulness, view success story, view feedback message, start multimedia, turn on text-message coach, turn off text-message coach, and view text message. Differences in usage between lessons were explored with repeated measures ANOVAs (analysis of variance). Differences between groups were explored with one-way ANOVAs. To explore the possible predictive value of the login per lesson quartiles on the outcome measures, four linear regressions were used with login quartiles as predictor and with the outcome measures (Center for Epidemiologic Studies-Depression [CES-D] and the Hospital Anxiety and Depression Scale-Anxiety [HADS-A] on post-intervention and follow-up) as dependent variables. A significant decrease in logins and in the use of content and persuasive features over time was observed. The usage of features varied significantly during the treatment process. 
The usage of persuasive features increased during the third part of the ACT (commitment to value-based living), which might indicate that at that stage motivational support was relevant. Higher logins over time (9 weeks) corresponded with a higher usage of features (in most cases significant); when predicting depressive symptoms at post-intervention, the linear regression yielded a significant model with login quartile as a significant predictor (explained variance is 2.7%). A better integration of content and persuasive features in the design of the intervention and a better intra-usability of features within the system are needed to identify which combination of features works best for whom. Pattern recognition can be used to tailor the intervention based on usage patterns from the earlier lessons and to support the uptake of content essential for therapy. An adaptable interface for a modular composition of therapy features supposes a dynamic approach for Web-based treatment; not a predefined path for all, but a flexible way to go through all features that have to be used.
2014-01-01
Background Web-based interventions for the early treatment of depressive symptoms can be considered effective in reducing mental complaints. However, there is a limited understanding of which elements in an intervention contribute to effectiveness. For efficiency and effectiveness of interventions, insight is needed into the use of content and persuasive features. Objective The aims of this study were (1) to illustrate how log data can be used to understand the uptake of the content of a Web-based intervention that is based on the acceptance and commitment therapy (ACT) and (2) to discover how log data can be of value for improving the incorporation of content in Web-based interventions. Methods Data from 206 participants (out of the 239) who started the first nine lessons of the Web-based intervention, Living to the Full, were used for a secondary analysis of a subset of the log data of the parent study about adherence to the intervention. The log files used in this study were per lesson: login, start mindfulness, download mindfulness, view success story, view feedback message, start multimedia, turn on text-message coach, turn off text-message coach, and view text message. Differences in usage between lessons were explored with repeated measures ANOVAs (analysis of variance). Differences between groups were explored with one-way ANOVAs. To explore the possible predictive value of the login per lesson quartiles on the outcome measures, four linear regressions were used with login quartiles as predictor and with the outcome measures (Center for Epidemiologic Studies—Depression [CES-D] and the Hospital Anxiety and Depression Scale—Anxiety [HADS-A] on post-intervention and follow-up) as dependent variables. Results A significant decrease in logins and in the use of content and persuasive features over time was observed. The usage of features varied significantly during the treatment process. 
The usage of persuasive features increased during the third part of the ACT (commitment to value-based living), which might indicate that at that stage motivational support was relevant. Higher logins over time (9 weeks) corresponded with a higher usage of features (in most cases significant); when predicting depressive symptoms at post-intervention, the linear regression yielded a significant model with login quartile as a significant predictor (explained variance is 2.7%). Conclusions A better integration of content and persuasive features in the design of the intervention and a better intra-usability of features within the system are needed to identify which combination of features works best for whom. Pattern recognition can be used to tailor the intervention based on usage patterns from the earlier lessons and to support the uptake of content essential for therapy. An adaptable interface for a modular composition of therapy features supposes a dynamic approach for Web-based treatment; not a predefined path for all, but a flexible way to go through all features that have to be used. PMID:24486914
Application of General Regression Neural Network to the Prediction of LOD Change
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Hong; Wang, Qi-Jie; Zhu, Jian-Jun; Zhang, Hao
2012-01-01
Traditional methods for predicting the change in length of day (LOD change) are mainly based on linear models, such as the least squares model and the autoregression model. However, the LOD change comprises complicated non-linear factors, and the predictions of linear models are often unsatisfactory. Thus, a non-linear neural network, the general regression neural network (GRNN) model, is used to predict the LOD change, and the result is compared with the predictions obtained from the BP (back propagation) neural network model and other models. The comparison shows that applying the GRNN to the prediction of the LOD change is highly effective and feasible.
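In its standard formulation, a GRNN predicts with a Gaussian-kernel weighted average of the training targets. The sketch below illustrates that idea only; the abstract does not describe the inputs or the smoothing parameter used for the LOD series, so the function name, the toy data, and the value of `sigma` are all assumptions made for illustration.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma=1.0):
    """Kernel-weighted average of training targets (standard GRNN form).

    x_train: (n, d) inputs, y_train: (n,) targets, x_query: (m, d) inputs.
    sigma is the smoothing parameter; its value here is an assumption.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Squared Euclidean distance between every query and every training point.
    diff = x_query[:, None, :] - x_train[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Gaussian weights, then a normalized weighted sum of the targets.
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return weights @ y_train / weights.sum(axis=1)

# Toy data (hypothetical, not LOD observations).
x_train = [[0.0], [2.0]]
y_train = [1.0, 3.0]
prediction = grnn_predict(x_train, y_train, [[1.0]], sigma=0.5)
```

For a time series such as LOD change, the input rows would typically be built from lagged values, which is again an assumption rather than something stated in the abstract.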
Avdeef, Alex
2018-02-02
To predict the aqueous solubility product ($K_{sp}$) and the solubility enhancement of cocrystals (CCs), using an approach based on measured drug and coformer intrinsic solubility ($S_0^{API}$, $S_0^{cof}$), combined with in silico H-bond descriptors. A regression model was constructed, assuming that the concentration of the uncharged drug (API) can be nearly equated to drug intrinsic solubility ($S_0^{API}$) and that the concentration of the uncharged coformer can be estimated from a linear combination of the log of the coformer intrinsic solubility, $S_0^{cof}$, plus in silico H-bond descriptors (Abraham acidities, $\alpha$, and basicities, $\beta$). The optimal model found for n:1 CCs ($-\log_{10}$ form) is $pK_{sp} = 1.12\,n\,pS_0^{API} + 1.07\,pS_0^{cof} + 1.01 + 0.74\,\alpha_{API}\,\beta_{cof} - 0.61\,\beta_{API}$; $r^2 = 0.95$, SD = 0.62, N = 38. In illustrative CC systems with unknown $K_{sp}$, predicted $K_{sp}$ was used in simulation of speciation-pH profiles. The extent and pH dependence of solubility enhancement due to CC formation were examined. Suggestions to improve assay design were made. The predicted CC $K_{sp}$ can be used to simulate pH-dependent solution characteristics of saturated systems containing CCs, with the aim of ranking the selection of coformers, and of optimizing the design of experiments.
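The regression quoted in this abstract can be evaluated directly. The sketch below simply codes the published coefficients; the function name and the input values are hypothetical, and it assumes that pS0 denotes the negative base-10 logarithm of the intrinsic solubility, as the "-log 10 form" wording suggests.

```python
def predict_pksp(n, ps0_api, ps0_cof, alpha_api, beta_cof, beta_api):
    """pKsp of an n:1 cocrystal from the regression quoted in the abstract.

    ps0_api, ps0_cof: -log10 intrinsic solubility of drug and coformer (assumed).
    alpha_api, beta_cof, beta_api: Abraham H-bond acidity/basicity descriptors.
    """
    return (1.12 * n * ps0_api
            + 1.07 * ps0_cof
            + 1.01
            + 0.74 * alpha_api * beta_cof
            - 0.61 * beta_api)

# Hypothetical inputs for illustration only (not taken from the paper).
pksp = predict_pksp(n=1, ps0_api=4.0, ps0_cof=1.0,
                    alpha_api=0.5, beta_cof=1.0, beta_api=1.0)
ksp = 10.0 ** (-pksp)  # convert back from the -log10 form
```

The reported SD of 0.62 log units gives a rough sense of the uncertainty that would accompany any such prediction.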
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
The third molar development (TMD) has been widely utilized as one of the radiographic methods for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated into the TMD regression model. This study aims to evaluate the performance of dental age estimation in individual method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data was divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² increased in the linear regressions of the combined model as compared to the individual models. An overall decrease in RMSE was detected in the combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, the combined model exhibited a low adjusted R² and a high RMSE, except in males. Dental age is better predicted using the combined model in multiple linear regression. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Linear Equations with the Euler Totient Function
2007-02-13
FLORIAN LUCA, PANTELIMON STĂNICĂ...of positive integers n such that φ(n) = φ(n + 1), and that the set of Phibonacci numbers is A(1,1,−1) + 2. Theorem 2.1. Let C(t, a) = t3 logH(a). Then...the estimate #Aa(x) C(t, a) x log log log x√ log log x holds uniformly in a and 1 ≤ t < y. Note
High-Dimensional Intrinsic Interpolation Using Gaussian Process Regression and Diffusion Maps
Thimmisetty, Charanraj A.; Ghanem, Roger G.; White, Joshua A.; ...
2017-10-10
This article considers the challenging task of estimating geologic properties of interest using a suite of proxy measurements. The current work recasts this task as a manifold learning problem. In this process, this article introduces a novel regression procedure for intrinsic variables constrained onto a manifold embedded in an ambient space. The procedure is meant to sharpen high-dimensional interpolation by inferring non-linear correlations from the data being interpolated. The proposed approach augments manifold learning procedures with a Gaussian process regression. It first identifies, using diffusion maps, a low-dimensional manifold embedded in an ambient high-dimensional space associated with the data. It relies on the diffusion distance associated with this construction to define a distance function with which the data model is equipped. This distance metric function is then used to compute the correlation structure of a Gaussian process that describes the statistical dependence of quantities of interest in the high-dimensional ambient space. The proposed method is applicable to arbitrarily high-dimensional data sets. Here, it is applied to subsurface characterization using a suite of well log measurements. The predictions obtained in original, principal component, and diffusion space are compared using both qualitative and quantitative metrics. Considerable improvement in the prediction of the geological structural properties is observed with the proposed method.
High-Dimensional Intrinsic Interpolation Using Gaussian Process Regression and Diffusion Maps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thimmisetty, Charanraj A.; Ghanem, Roger G.; White, Joshua A.
This article considers the challenging task of estimating geologic properties of interest using a suite of proxy measurements. The current work recasts this task as a manifold learning problem. In this process, this article introduces a novel regression procedure for intrinsic variables constrained onto a manifold embedded in an ambient space. The procedure is meant to sharpen high-dimensional interpolation by inferring non-linear correlations from the data being interpolated. The proposed approach augments manifold learning procedures with a Gaussian process regression. It first identifies, using diffusion maps, a low-dimensional manifold embedded in an ambient high-dimensional space associated with the data. It relies on the diffusion distance associated with this construction to define a distance function with which the data model is equipped. This distance metric function is then used to compute the correlation structure of a Gaussian process that describes the statistical dependence of quantities of interest in the high-dimensional ambient space. The proposed method is applicable to arbitrarily high-dimensional data sets. Here, it is applied to subsurface characterization using a suite of well log measurements. The predictions obtained in original, principal component, and diffusion space are compared using both qualitative and quantitative metrics. Considerable improvement in the prediction of the geological structural properties is observed with the proposed method.
Association of blood polychlorinated biphenyls and cholesterol levels among Canadian Inuit.
Singh, Kavita; Chan, Hing Man
2018-01-01
It has generally been thought that Inuit populations have low risk of cardiovascular disease due to high consumption of omega-3 fatty acids found in traditional marine-based diets. However, results of recent surveys showed that Inuit populations are experiencing increasing rates of cardiovascular disease and related risk factors. The purpose of this study was to investigate if blood polychlorinated biphenyls (PCBs) are associated with high cholesterol and related parameters in Canadian Inuit, known risk factors for cardiovascular disease. The Adult Inuit Health Survey (IHS, 2007-2008) included 2595 Inuit participants from three regions of the Canadian Arctic, of which 2191 could be classified as with or without high cholesterol. The high cholesterol outcome was defined by LDL-C > 3.36mmol/L or taking medication(s) that reduce cholesterol, and was examined in adjusted logistic regression models with individual blood levels of PCB congeners, sum of dioxin-like PCBs (∑DL-PCBs), or sum of non-dioxin-like PCBs (∑NDL-PCBs). Statistically significant covariates for high cholesterol were ranked in importance according to the proportion of the model log likelihood explained. Continuous clinical parameters of total cholesterol, triglycerides, LDL-C, and HDL-C were examined in multiple linear regression models with ∑DL-PCBs or ∑NDL-PCBs. A total of 719 participants had high cholesterol (32.8%). PCBs were associated with increased risk of high cholesterol, and higher levels of serum triglycerides, total cholesterol, and LDL-C. No association was observed between PCBs and serum HDL-C. With respect to other statistically significant covariates for high cholesterol, the log likelihood ranking of PCBs generally fell between body mass index (BMI) and age. Further work is needed to corroborate the associations observed with PCBs and lipids in Canadian Inuit and to examine if they are causal in the direction anticipated. Copyright © 2017 Elsevier Inc. All rights reserved.
Bayesian Model Comparison for the Order Restricted RC Association Model
ERIC Educational Resources Information Center
Iliopoulos, G.; Kateri, M.; Ntzoufras, I.
2009-01-01
Association models constitute an attractive alternative to the usual log-linear models for modeling the dependence between classification variables. They impose special structure on the underlying association by assigning scores on the levels of each classification variable, which can be fixed or parametric. Under the general row-column (RC)…
Detection of changes in leaf water content using near- and middle-infrared reflectances
NASA Technical Reports Server (NTRS)
Hunt, E. Raymond, Jr.; Rock, Barrett N.
1989-01-01
A method to detect plant water stress by remote sensing is proposed using indices of near-IR and mid-IR wavelengths. The ability of the Leaf Water Content Index (LWCI) to determine leaf relative water content (RWC) is tested on species with different leaf morphologies. The way in which the Moisture Stress Index (MSI) varies with RWC is studied. In tests with several species, it is found that LWCI is equal to RWC, although the reflectances at 1.6 microns for two different RWC must be known to accurately predict unknown RWC. A linear correlation is found between MSI and RWC, with each species having a different regression equation. Also, MSI is correlated with log10 Equivalent Water Thickness (EWT), with data for all species falling on the same regression line. It is found that the minimum significant change of RWC that could be detected by applying the linear regression equation of MSI to EWT is 52 percent. Because the natural RWC variation from water stress is about 20 percent for most species, it is concluded that the near-IR and mid-IR reflectances cannot be used to remotely sense water stress.
Beydoun, Hind A.; Khanal, Suraj; Zonderman, Alan B.; Beydoun, May A.
2013-01-01
Purpose Emerging evidence suggests that exposure to endocrine disruptors may initiate or exacerbate adiposity and associated health problems. This study examined sex differences in the association of urinary level of bisphenol-A (BPA) with selected indices of glucose homeostasis among U.S. adults. Methods Data analyses were performed using a sample of 1,586 participants from the 2005–2008 National Health and Nutrition Examination Surveys. BPA level and the ratio of BPA-to-creatinine level were defined as log-transformed variables and in quartiles. Selected indices of glucose homeostasis were defined using fasting glucose and insulin data. Multivariate linear and logistic regression models for the hypothesized relationships were constructed after controlling for age, sex, race, education, marital status, smoking status, physical activity, total dietary intake and urinary creatinine concentration. Results Taking 1st quartile as a referent, 3rd quartile of BPA level was positively associated with log-transformed level of insulin and β-cell function (HOMA-β) as well as insulin resistance (log-transformed HOMA-IR; HOMA-IR≥2.5), with significant BPA-by-sex interaction; these associations were stronger among males than among females. Irrespective of sex, the ratio of BPA-to-creatinine level was not predictive of indices of glucose homeostasis. Conclusions A complex association may exist between BPA and hyperinsulinemia among adult U.S. men. Prospective cohort studies are needed to further elucidate endocrine disruptors as determinants of adiposity-related disturbances. PMID:23954568
NASA Astrophysics Data System (ADS)
Dhakal, Y. P.; Kunugi, T.; Suzuki, W.; Aoi, S.
2013-12-01
The Mw 9.1 Tohoku-oki earthquake caused strong shakings of super high rise and high rise buildings constructed on deep sedimentary basins in Japan. Many people felt difficulty in moving inside the high rise buildings even on the Osaka basin located at distances as far as 800 km from the epicentral area. Several empirical equations are proposed to estimate the peak ground motions and absolute acceleration response spectra applicable mainly within 300 to 500km from the source area. On the other hand, Japan Meteorological Agency has recently proposed four classes of absolute velocity response spectra as suitable indices to qualitatively describe the intensity of long-period ground motions based on the observed earthquake records, human experiences, and actual damages that occurred in the high rise and super high rise buildings. The empirical prediction equations have been used in disaster mitigation planning as well as earthquake early warning. In this study, we discuss the results of our preliminary analysis on attenuation relation of absolute velocity response spectra calculated from the observed strong motion records including those from the Mw 9.1 Tohoku-oki earthquake using simple regression models with various model parameters. We used earthquakes, having Mw 6.5 or greater, and focal depths shallower than 50km, which occurred in and around Japanese archipelago. We selected those earthquakes for which the good quality records are available over 50 observation sites combined from K-NET and KiK-net. After a visual inspection on approximately 21,000 three component records from 36 earthquakes, we used about 15,000 good quality records in the period range of 1 to 10s within the hypocentral distance (R) of 800km. We performed regression analyses assuming the following five regression models. 
(1) $\log_{10} Y(T) = c + aM_w - \log_{10} R - bR$
(2) $\log_{10} Y(T) = c + aM_w - \log_{10} R - bR + gS$
(3) $\log_{10} Y(T) = c + aM_w - \log_{10} R - bR + hD$
(4) $\log_{10} Y(T) = c + aM_w - \log_{10} R - bR + gS + hD$
(5) $\log_{10} Y(T) = c + aM_w - \log_{10} R - bR + \sum gS + hD$
where Y(T) is the 5% damped peak vector response in cm/s derived from two horizontal component records for a natural period T in seconds. In (2), S is a dummy variable that is one if a site is located inside a sedimentary basin and zero otherwise. In (3), D is the depth to the top of the layer having a particular S-wave velocity. We used the deep underground S-wave velocity model available from the Japan Seismic Hazard Information Station (J-SHIS). In (5), sites are classified into various sedimentary basins. Analyses show that the standard deviations decrease in the order of the models listed and that all the coefficients are significant. Interestingly, the coefficients g are found to differ from basin to basin at most periods, and the depth to the top of the layer having an S-wave velocity of 1.7 km/s gives the smallest standard deviation of 0.31 at T = 4.4 s in (5). This study shows the possibility of describing the observed peak absolute velocity response values by using simple model parameters like site location and sedimentary depth soon after the location and magnitude of an earthquake are known.
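Model (1) above is linear in c, a, and b once the geometric-spreading term, whose coefficient is fixed at -1, is moved to the left-hand side. The sketch below fits it by ordinary least squares on synthetic data; the function name, the "true" coefficients, and the noise level are hypothetical and are not the values estimated in the study.

```python
import numpy as np

def fit_model1(mw, r, y):
    """Least-squares fit of log10(Y) = c + a*Mw - log10(R) - b*R.

    mw: moment magnitudes, r: hypocentral distances (km), y: response (cm/s).
    Returns (c, a, b, sigma), where sigma is the residual standard deviation.
    """
    mw, r, y = (np.asarray(v, dtype=float) for v in (mw, r, y))
    # The -log10(R) term has a fixed coefficient, so add it to the target.
    target = np.log10(y) + np.log10(r)
    design = np.column_stack([np.ones_like(mw), mw, -r])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    sigma = residuals.std(ddof=design.shape[1])
    c, a, b = coef
    return c, a, b, sigma

# Synthetic records with hypothetical coefficients (illustration only).
rng = np.random.default_rng(0)
mw = rng.uniform(6.5, 9.0, 200)
r = rng.uniform(50.0, 800.0, 200)
log_y = -2.0 + 0.5 * mw - np.log10(r) - 0.002 * r + rng.normal(0.0, 0.05, 200)
c, a, b, sigma = fit_model1(mw, r, 10.0 ** log_y)
```

Models (2) to (5) would add further columns to the design matrix (a basin dummy S, a depth D, or one dummy per basin), and comparing the returned sigma across models mirrors the comparison of standard deviations described in the abstract.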
The Umov effect in application to an optically thin two-component cloud of cosmic dust
NASA Astrophysics Data System (ADS)
Zubko, Evgenij; Videen, Gorden; Zubko, Nataliya; Shkuratov, Yuriy
2018-04-01
The Umov effect is an inverse correlation between linear polarization of the sunlight scattered by an object and its geometric albedo. The Umov effect has been observed in particulate surfaces, such as planetary regoliths, and recently it also was found in single-scattering small dust particles. Using numerical modeling, we study the Umov effect in a two-component mixture of small irregularly shaped particles. Such a complex chemical composition is suggested in cometary comae and other types of optically thin clouds of cosmic dust. We find that the two-component mixtures of small particles also reveal the Umov effect regardless of the chemical composition of their end-member components. The interrelation between log(Pmax) and log(A) in a two-component mixture of small irregularly shaped particles appears either in a straight linear form or in a slightly curved form. This curvature tends to decrease while the index n in a power-law size distribution r^(-n) grows; at n > 2.5, the log(Pmax)-log(A) diagrams are almost straight linear in appearance. The curvature also noticeably decreases with the packing density of constituent material in irregularly shaped particles forming the mixture. That such a relation exists suggests the Umov effect may also be observed in more complex mixtures.
The Umov effect in application to an optically thin two-component cloud of cosmic dust
NASA Astrophysics Data System (ADS)
Zubko, Evgenij; Videen, Gorden; Zubko, Nataliya; Shkuratov, Yuriy
2018-07-01
The Umov effect is an inverse correlation between linear polarization of the sunlight scattered by an object and its geometric albedo. The Umov effect has been observed in particulate surfaces, such as planetary regoliths, and recently it also was found in single-scattering small dust particles. Using numerical modelling, we study the Umov effect in a two-component mixture of small irregularly shaped particles. Such a complex chemical composition is suggested in cometary comae and other types of optically thin clouds of cosmic dust. We find that the two-component mixtures of small particles also reveal the Umov effect regardless of the chemical composition of their end-member components. The interrelation between log(Pmax) and log(A) in a two-component mixture of small irregularly shaped particles appears either in a straight linear form or in a slightly curved form. This curvature tends to decrease while the index n in a power-law size distribution r^(-n) grows; at n > 2.5, the log(Pmax)-log(A) diagrams are almost straight linear in appearance. The curvature also noticeably decreases with the packing density of constituent material in irregularly shaped particles forming the mixture. That such a relation exists suggests the Umov effect may also be observed in more complex mixtures.
Assessing the relationship between groundwater nitrate and animal feeding operations in Iowa (USA)
Zirkle, Keith W.; Nolan, Bernard T.; Jones, Rena R.; Weyer, Peter J.; Ward, Mary H.; Wheeler, David C.
2016-01-01
Nitrate-nitrogen is a common contaminant of drinking water in many agricultural areas of the United States of America (USA). Ingested nitrate from contaminated drinking water has been linked to an increased risk of several cancers, specific birth defects, and other diseases. In this research, we assessed the relationship between animal feeding operations (AFOs) and groundwater nitrate in private wells in Iowa. We characterized AFOs by swine and total animal units and type (open, confined, or mixed), and we evaluated the number and spatial intensities of AFOs in proximity to private wells. The types of AFO indicate the extent to which a facility is enclosed by a roof. Using linear regression models, we found significant positive associations between the total number of AFOs within 2 km of a well (p trend < 0.001), number of open AFOs within 5 km of a well (p trend < 0.001), and number of mixed AFOs within 30 km of a well (p trend < 0.001) and the log nitrate concentration. Additionally, we found significant increases in log nitrate in the top quartiles for AFO spatial intensity, open AFO spatial intensity, and mixed AFO spatial intensity compared to the bottom quartile (0.171 log(mg/L), 0.319 log(mg/L), and 0.541 log(mg/L), respectively; all p < 0.001). We also explored the spatial distribution of nitrate-nitrogen in drinking wells and found significant spatial clustering of high-nitrate wells (> 5 mg/L) compared with low-nitrate (≤ 5 mg/L) wells (p = 0.001). A generalized additive model for high-nitrate status identified statistically significant areas of risk for high levels of nitrate. Adjustment for some AFO predictor variables explained a portion of the elevated nitrate risk. These results support a relationship between animal feeding operations and groundwater nitrate concentrations and differences in nitrate loss from confined AFOs vs. open or mixed types.
Robinson, Jeffrey D.; Hoover, Donald R.; Venetis, Maria K.; Kearney, Thomas J.; Street, Richard L.
2013-01-01
Purpose Patient-centered communication (PCC) affects psychosocial health outcomes of patients. However, these effects are rarely direct, and our understanding of such effects is largely based on self-report (vs observational) data. More information is needed on the pathways by which concrete PCC behaviors affect specific psychosocial outcomes in cancer care. We hypothesized that PCC behaviors increase the satisfaction of patients with surgeons, which, in turn, reduces the postconsultation hopelessness of patients. Patients and Methods In Portland, OR, we videotaped consultations between 147 women newly diagnosed with breast cancer and nine surgeons and administered surveys to participants immediately preconsultation and postconsultation. Consultations were coded for PCC behaviors. Multivariate regression models analyzed the association between PCC and the satisfaction of patients and between satisfaction and hopelessness. Results Levels of hopelessness of patients significantly decreased from preconsultation to postconsultation (P < .001). Two PCC behaviors (ie, patient asserting treatment preference [odds ratio {OR}, 1.50/log unit; 95% CI, 1.01 to 2.23/log unit; P = .042] and surgeon providing good/hopeful news [OR, 1.62/log unit; 95% CI, 1.01 to 2.60/log unit; P = .047]) were independently significantly associated with the satisfaction of patients with surgeons, which, in turn, independently predicted reduced levels of postconsultation hopelessness (linear change, −0.78; 95% CI, −1.44 to −0.12; P = .02). Conclusion Although additional research is needed with larger and more-diverse data sets, these findings suggest the possibility that concrete and trainable PCC behaviors can lower the hopelessness of patients with breast cancer indirectly through their effects on patient satisfaction with care. PMID:23233706
Measurement error corrected sodium and potassium intake estimation using 24-hour urinary excretion.
Huang, Ying; Van Horn, Linda; Tinker, Lesley F; Neuhouser, Marian L; Carbone, Laura; Mossavar-Rahmani, Yasmin; Thomas, Fridtjof; Prentice, Ross L
2014-02-01
Epidemiological studies of the association of sodium and potassium intake with cardiovascular disease risk have almost exclusively relied on self-reported dietary data. Here, 24-hour urinary excretion assessments are used to correct the dietary self-report data for measurement error under the assumption that 24-hour urine recovery provides a biomarker that differs from usual intake according to a classical measurement model. Under this assumption, dietary self-reports underestimate sodium by 0% to 15%, overestimate potassium by 8% to 15%, and underestimate sodium/potassium ratio by ≈20% using food frequency questionnaires, 4-day food records, or three 24-hour dietary recalls in Women's Health Initiative studies. Calibration equations are developed by linear regression of log-transformed 24-hour urine assessments on corresponding log-transformed self-report assessments and several study subject characteristics. For each self-report method, the calibration equations turned out to depend on race and age and strongly on body mass index. After adjustment for temporal variation, calibration equations using food records or recalls explained 45% to 50% of the variation in (log-transformed) 24-hour urine assessments for sodium, 60% to 70% of the variation for potassium, and 55% to 60% of the variation for sodium/potassium ratio. These equations may be suitable for use in epidemiological disease association studies among postmenopausal women. The corresponding signals from food frequency questionnaire data were weak, but calibration equations for the ratios of sodium and potassium/total energy explained ≈35%, 50%, and 45% of log-biomarker variation for sodium, potassium, and their ratio, respectively, after the adjustment for temporal biomarker variation and may be suitable for cautious use in epidemiological studies. Clinical Trial Registration- URL: www.clinicaltrials.gov. Unique identifier: NCT00000611.
Vilar, Santiago; Chakrabarti, Mayukh; Costanzi, Stefano
2010-01-01
The distribution of compounds between blood and brain is a very important consideration for new candidate drug molecules. In this paper, we describe the derivation of two linear discriminant analysis (LDA) models for the prediction of passive blood-brain partitioning, expressed in terms of log BB values. The models are based on computationally derived physicochemical descriptors, namely the octanol/water partition coefficient (log P), the topological polar surface area (TPSA) and the total number of acidic and basic atoms, and were obtained using a homogeneous training set of 307 compounds, for all of which the published experimental log BB data had been determined in vivo. In particular, since molecules with log BB > 0.3 cross the blood-brain barrier (BBB) readily while molecules with log BB < −1 are poorly distributed to the brain, on the basis of these thresholds we derived two distinct models, both of which show a percentage of good classification of about 80%. Notably, the predictive power of our models was confirmed by the analysis of a large external dataset of compounds with reported activity on the central nervous system (CNS) or lack thereof. The calculation of straightforward physicochemical descriptors is the only requirement for the prediction of the log BB of novel compounds through our models, which can be conveniently applied in conjunction with drug design and virtual screenings. PMID:20427217
Vilar, Santiago; Chakrabarti, Mayukh; Costanzi, Stefano
2010-06-01
The distribution of compounds between blood and brain is a very important consideration for new candidate drug molecules. In this paper, we describe the derivation of two linear discriminant analysis (LDA) models for the prediction of passive blood-brain partitioning, expressed in terms of logBB values. The models are based on computationally derived physicochemical descriptors, namely the octanol/water partition coefficient (logP), the topological polar surface area (TPSA) and the total number of acidic and basic atoms, and were obtained using a homogeneous training set of 307 compounds, for all of which the published experimental logBB data had been determined in vivo. In particular, since molecules with logBB>0.3 cross the blood-brain barrier (BBB) readily while molecules with logBB<-1 are poorly distributed to the brain, on the basis of these thresholds we derived two distinct models, both of which show a percentage of good classification of about 80%. Notably, the predictive power of our models was confirmed by the analysis of a large external dataset of compounds with reported activity on the central nervous system (CNS) or lack thereof. The calculation of straightforward physicochemical descriptors is the only requirement for the prediction of the logBB of novel compounds through our models, which can be conveniently applied in conjunction with drug design and virtual screenings. Published by Elsevier Inc.
Kim, Jee Young; Magari, Shannon R; Herrick, Robert F; Smith, Thomas J; Christiani, David C
2004-11-01
Particulate air pollution, specifically the fine particle fraction (PM2.5), has been associated with increased cardiopulmonary morbidity and mortality in general population studies. Occupational exposure to fine particulate matter can exceed ambient levels by a large factor. Due to increased interest in the health effects of particulate matter, many particle sampling methods have been developed. In this study, two such measurement methods were used simultaneously and compared. PM2.5 was sampled using a filter-based gravimetric sampling method and a direct-reading instrument, the TSI Inc. model 8520 DUSTTRAK aerosol monitor. Both sampling methods were used to determine the PM2.5 exposure in a group of boilermakers exposed to welding fumes and residual fuel oil ash. The geometric mean PM2.5 concentration was 0.30 mg/m3 (GSD 3.25) and 0.31 mg/m3 (GSD 2.90) from the DUSTTRAK and gravimetric method, respectively. The Spearman rank correlation coefficient for the gravimetric and DUSTTRAK PM2.5 concentrations was 0.68. Linear regression models indicated that log(e) DUSTTRAK PM2.5 concentrations significantly predicted log(e) gravimetric PM2.5 concentrations (p < 0.01). The association between log(e) DUSTTRAK and log(e) gravimetric PM2.5 concentrations was found to be modified by surrogate measures for seasonal variation and type of aerosol. PM2.5 measurements from the DUSTTRAK are well correlated with and highly predictive of measurements from the gravimetric sampling method for the aerosols in these work environments. However, results from this study suggest that aerosol particle characteristics may affect the relationship between the gravimetric and DUSTTRAK PM2.5 measurements. Recalibration of the DUSTTRAK for the specific aerosol, as recommended by the manufacturer, may be necessary to produce valid measures of airborne particulate matter.
Simple linear and multivariate regression models.
Rodríguez del Águila, M M; Benítez-Parejo, N
2011-01-01
In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.
Zhang, Peng; Luo, Dandan; Li, Pengfei; Sharpsten, Lucie; Medeiros, Felipe A.
2015-01-01
Glaucoma is a progressive disease due to damage in the optic nerve with associated functional losses. Although the relationship between structural and functional progression in glaucoma is well established, there is disagreement on how this association evolves over time. In addressing this issue, we propose a new class of non-Gaussian linear-mixed models to estimate the correlations among subject-specific effects in multivariate longitudinal studies with a skewed distribution of random effects, to be used in a study of glaucoma. This class provides an efficient estimation of subject-specific effects by modeling the skewed random effects through the log-gamma distribution. It also provides more reliable estimates of the correlations between the random effects. To validate the log-gamma assumption against the usual normality assumption of the random effects, we propose a lack-of-fit test using the profile likelihood function of the shape parameter. We apply this method to data from a prospective observation study, the Diagnostic Innovations in Glaucoma Study, to present a statistically significant association between structural and functional change rates that leads to a better understanding of the progression of glaucoma over time. PMID:26075565
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial basis function (RBF) kernels, non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The ε-insensitive loss function was used for SVR modeling. Selection of the design parameters, which include capacity (C), insensitivity region (ε) and the RBF kernel parameter (sigma), was made based on a grid search approach, and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
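The probability-weighting idea in the abstract above can be illustrated with a minimal sketch. It assumes a single scalar EMG feature per DOF, equal class priors, and one shared variance for the three Gaussian class models; the function names, class labels, and numbers are hypothetical and are not taken from the study.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def class_posteriors(x, class_means, shared_var):
    """Posterior probability of each movement class, assuming equal priors."""
    likelihoods = {name: gaussian_pdf(x, m, shared_var) for name, m in class_means.items()}
    total = sum(likelihoods.values())
    return {name: value / total for name, value in likelihoods.items()}

def weighted_output(x, slope, intercept, class_means, shared_var):
    """Scale the linear regression output by the probability that the user
    intended movement in the direction the regression predicts (assumed rule)."""
    raw = slope * x + intercept
    posteriors = class_posteriors(x, class_means, shared_var)
    direction = "positive" if raw >= 0 else "negative"
    return raw * posteriors[direction]

# Hypothetical class means for one DOF: rest, and movement in either direction.
means = {"none": 0.0, "positive": 1.0, "negative": -1.0}
small = weighted_output(0.1, 1.0, 0.0, means, 0.1)   # near rest: output is suppressed
large = weighted_output(1.0, 1.0, 0.0, means, 0.1)   # clear intent: output is kept
```

With these made-up values, a feature close to the rest class yields an output far smaller than the raw regression value, while a feature near the movement class passes almost unchanged, which is the mechanism by which extraneous movement at other DOFs would be reduced.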
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model, which is considered to be caused by violations of the underlying model assumptions.
Especially the effects of turbulence and pressure disturbances by the chamber deployment are suspected to have caused unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
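The underestimation described above can be reproduced with a small synthetic sketch. It assumes a simple saturating exponential for the headspace concentration, c(t) = c_eq + (c0 - c_eq) exp(-k t), which is only a stand-in for the authors' model; all parameter values are invented for illustration.

```python
import math

def chamber_concentration(t, c0, c_eq, k):
    """Assumed saturating exponential for headspace CO2 (ppm) at time t (s)."""
    return c_eq + (c0 - c_eq) * math.exp(-k * t)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical two-minute closure sampled every 10 s (noise-free).
c0, c_eq, k = 380.0, 600.0, 0.01
times = list(range(0, 121, 10))
conc = [chamber_concentration(t, c0, c_eq, k) for t in times]

initial_slope = k * (c_eq - c0)        # dc/dt at t = 0, the quantity of interest
linear_slope = ols_slope(times, conc)  # what a straight-line fit reports
ratio = linear_slope / initial_slope   # below 1: linear fit underestimates the flux
```

Even without noise, the straight-line slope over the closure period is smaller than the initial slope whenever the curve flattens, and the gap grows with k times the closure time.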
In Search of Optimal Cognitive Diagnostic Model(s) for ESL Grammar Test Data
ERIC Educational Resources Information Center
Yi, Yeon-Sook
2017-01-01
This study compares five cognitive diagnostic models in search of optimal one(s) for English as a Second Language grammar test data. Using a unified modeling framework that can represent specific models with proper constraints, the article first fit the full model (the log-linear cognitive diagnostic model, LCDM) and investigated which model…
Fong, Youyi; Yu, Xuesong
2016-01-01
Many modern serial dilution assays are based on fluorescence intensity (FI) readouts. We study optimal transformation model choice for fitting five parameter logistic curves (5PL) to FI-based serial dilution assay data. We first develop a generalized least squares-pseudolikelihood type algorithm for fitting heteroscedastic logistic models. Next we show that the 5PL and log 5PL functions can approximate each other well. We then compare four 5PL models with different choices of log transformation and variance modeling through a Monte Carlo study and real data. Our findings are that the optimal choice depends on the intended use of the fitted curves. PMID:27642502
Adair, T; Hoy, D; Dettrick, Z; Lopez, A D
2012-12-01
Global studies of the long-term association between tobacco consumption and chronic obstructive pulmonary disease (COPD) have relied upon descriptions of trends. To statistically analyse the relationship of tobacco consumption with data on mortality due to COPD over the past 100 years in Australia. Tobacco consumption was reconstructed back to 1887. Log-linear Poisson regression models were used to analyse cumulative cohort and lagged time-specific smoking data and its relationship with COPD mortality. Age-standardised COPD mortality, although likely misclassified with other diseases, decreased for males and females from 1907 until the start of the Second World War in contrast to steadily rising tobacco consumption. Thereafter, COPD mortality rose sharply in line with trends in smoking, peaking in the early 1970s for males and over 20 years later for females, before falling again. Regression models revealed both cumulative and time-specific tobacco consumption to be strongly predictive of COPD mortality, with a time lag of 15 years for males and 20 years for females. Sharp falls in COPD mortality before the Second World War were unrelated to tobacco consumption. Smoking was the primary driver of post-War trends, and the success of anti-smoking campaigns has sharply reduced COPD mortality levels.
An Evaluation of the Automated Cost Estimating Integrated Tools (ACEIT) System
1989-09-01
residual and it is described as the residual divided by its standard deviation (13:App A,17). Neter, Wasserman, and Kutner, in Applied Linear Regression Models...others. Applied Linear Regression Models. Homewood IL: Irwin, 1983. 19. Raduchel, William J. "A Professional’s Perspective on User-Friendliness," Byte
Conjoint Analysis: A Study of the Effects of Using Person Variables.
ERIC Educational Resources Information Center
Fraas, John W.; Newman, Isadore
Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…
Questionable Validity of Poisson Assumptions in a Combined Loglinear/MDS Mapping Model.
ERIC Educational Resources Information Center
Gleason, John M.
1993-01-01
This response to an earlier article on a combined log-linear/MDS model for mapping journals by citation analysis discusses the underlying assumptions of the Poisson model with respect to characteristics of the citation process. The importance of empirical data analysis is also addressed. (nine references) (LRW)
Real-time soil sensing based on fiber optics and spectroscopy
NASA Astrophysics Data System (ADS)
Li, Minzan
2005-08-01
Using NIR spectroscopic techniques, correlation analysis and regression analysis for soil parameter estimation was conducted with raw soil samples collected in a cornfield and a forage field. Soil parameters analyzed were soil moisture, soil organic matter, nitrate nitrogen, soil electrical conductivity and pH. Results showed that all soil parameters could be evaluated by NIR spectral reflectance. For soil moisture, a linear regression model was available at low moisture contents below 30 % db, while an exponential model can be used in a wide range of moisture content up to 100 % db. Nitrate nitrogen estimation required a multi-spectral exponential model and electrical conductivity could be evaluated by a single spectral regression. According to the result above mentioned, a real time soil sensor system based on fiber optics and spectroscopy was developed. The sensor system was composed of a soil subsoiler with four optical fiber probes, a spectrometer, and a control unit. Two optical fiber probes were used for illumination and the other two optical fiber probes for collecting soil reflectance from visible to NIR wavebands at depths around 30 cm. The spectrometer was used to obtain the spectra of reflected lights. The control unit consisted of a data logging device, a personal computer, and a pulse generator. The experiment showed that clear photo-spectral reflectance was obtained from the underground soil. The soil reflectance was equal to that obtained by the desktop spectrophotometer in laboratory tests. Using the spectral reflectance, the soil parameters, such as soil moisture, pH, EC and SOM, were evaluated.
Factors Influencing M.S.W. Students' Interest in Clinical Practice
ERIC Educational Resources Information Center
Perry, Robin
2009-01-01
This study utilizes linear and log-linear stochastic models to examine the impact that a variety of variables (including graduate education) have on M.S.W. students' desires to work in clinical practice. Data was collected biannually (between 1992 and 1998) from a complete population sample of all students entering and exiting accredited graduate…
ERIC Educational Resources Information Center
Madison, Matthew J.; Bradshaw, Laine P.
2015-01-01
Diagnostic classification models are psychometric models that aim to classify examinees according to their mastery or non-mastery of specified latent characteristics. These models are well-suited for providing diagnostic feedback on educational assessments because of their practical efficiency and increased reliability when compared with other…
NASA Astrophysics Data System (ADS)
Mitra, Anindita; Li, Y.-F.; Shimizu, T.; Klämpfl, Tobias; Zimmermann, J. L.; Morfill, G. E.
2012-10-01
Cold Atmospheric Plasma (CAP) is a fast, low cost, simple, easy to handle technology for biological application. Our group has developed a number of different CAP devices using the microwave technology and the surface micro discharge (SMD) technology. In this study, FlatPlaSter2.0 at different time intervals (0.5 to 5 min) is used for microbial inactivation. There is a continuous demand for deactivation of microorganisms associated with raw foods/seeds without losing their properties. This research focuses on the kinetics of CAP induced microbial inactivation of naturally growing surface microorganisms on seeds. The data were assessed for log-linear and non-log-linear models for survivor curves as a function of time. The Weibull model showed the best fitting performance of the data. No shoulder or tail was observed. The models are focused in terms of the number of log cycles reduction rather than on classical D-values with statistical measurements. The viability of seeds was not affected for CAP treatment times up to 3 min with our device. The optimum result was observed at 1 min with increased percentage of germination from 60.83% to 89.16% compared to the control. This result suggests the advantage and promising role of CAP in food industry.
Calculating the Solubilities of Drugs and Drug-Like Compounds in Octanol.
Alantary, Doaa; Yalkowsky, Samuel
2016-09-01
A modification of the Van't Hoff equation is used to predict the solubility of organic compounds in dry octanol. The new equation describes a linear relationship between the logarithm of the solubility of a solute in octanol to its melting temperature. More than 620 experimentally measured octanol solubilities, collected from the literature, are used to validate the equation without using any regression or fitting. The average absolute error of the prediction is 0.66 log units. Copyright © 2016 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
Phobic Anxiety and Plasma Levels of Global Oxidative Stress in Women.
Hagan, Kaitlin A; Wu, Tianying; Rimm, Eric B; Eliassen, A Heather; Okereke, Olivia I
2015-01-01
Psychological distress has been hypothesized to be associated with adverse biologic states such as higher oxidative stress and inflammation. Yet, little is known about associations between a common form of distress - phobic anxiety - and global oxidative stress. Thus, we related phobic anxiety to plasma fluorescent oxidation products (FlOPs), a global oxidative stress marker. We conducted a cross-sectional analysis among 1,325 women (aged 43-70 years) from the Nurses' Health Study. Phobic anxiety was measured using the Crown-Crisp Index (CCI). Adjusted least-squares mean log-transformed FlOPs were calculated across phobic categories. Logistic regression models were used to calculate odds ratios (OR) comparing the highest CCI category (≥6 points) vs. lower scores, across FlOPs quartiles. No association was found between phobic anxiety categories and mean FlOP levels in multivariable adjusted linear models. Similarly, in multivariable logistic regression models there were no associations between FlOPs quartiles and likelihood of being in the highest phobic category. Comparing women in the highest vs. lowest FlOPs quartiles: FlOP_360: OR=0.68 (95% CI: 0.40-1.15); FlOP_320: OR=0.99 (95% CI: 0.61-1.61); FlOP_400: OR=0.92 (95% CI: 0.52, 1.63). No cross-sectional association was found between phobic anxiety and a plasma measure of global oxidative stress in this sample of middle-aged and older women.
Saucedo-Reyes, Daniela; Carrillo-Salazar, José A; Román-Padilla, Lizbeth; Saucedo-Veloz, Crescenciano; Reyes-Santamaría, María I; Ramírez-Gilly, Mariana; Tecante, Alberto
2018-03-01
High hydrostatic pressure inactivation kinetics of Escherichia coli ATCC 25922 and Salmonella enterica subsp. enterica serovar Typhimurium ATCC 14028 (S. typhimurium) in a low acid mamey pulp at four pressure levels (300, 350, 400, and 450 MPa), different exposure times (0-8 min), and temperature of 25 ± 2 °C were obtained. Survival curves showed deviations from linearity in the form of a tail (upward concavity). The primary models tested were the Weibull model, the modified Gompertz equation, and the biphasic model. The Weibull model gave the best goodness of fit (adjusted R² > 0.956, root mean square error < 0.290) in the modeling and the lowest Akaike information criterion value. Exponential-logistic and exponential decay models, and Bigelow-type and empirical models for the b'(P) and n(P) parameters, respectively, were tested as alternative secondary models. The process validation considered the two- and one-step nonlinear regressions for making predictions of the survival fraction; both regression types provided an adequate goodness of fit and the one-step nonlinear regression clearly reduced fitting errors. The best candidate model according to the Akaike information criterion, with better accuracy and more reliable predictions, was the Weibull model integrated by the exponential-logistic and exponential decay secondary models as a function of time and pressure (two-step procedure) or incorporated as one equation (one-step procedure). Both mathematical expressions were used to determine the t_d parameter, where the desired reductions (5D) (considering d = 5 (t_5) as the criterion of 5 log10 reduction) in both microorganisms are attainable at 400 MPa for 5.487 ± 0.488 or 5.950 ± 0.329 min, respectively, for the one- or two-step nonlinear procedure.
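As a rough illustration of how a t_d value follows from a Weibull-type primary model, the sketch below assumes the common form log10(N/N0) = -b t^n at a fixed pressure. The parameter values are hypothetical and do not come from the study, and the secondary models for b and n as functions of pressure are not reproduced.

```python
def log10_survival(t, b, n):
    """Weibull-type survival curve: log10(N(t)/N0) = -b * t**n."""
    return -b * t ** n

def time_for_log_reduction(d, b, n):
    """Solve -b * t**n = -d for t, giving t_d = (d / b) ** (1 / n)."""
    return (d / b) ** (1.0 / n)

# Hypothetical parameters at one pressure level; n < 1 gives the upward
# concavity (tailing) that the abstract reports for the survival curves.
b_example, n_example = 1.2, 0.8
t5 = time_for_log_reduction(5.0, b_example, n_example)  # time for a 5-log reduction
```

When n equals 1 the curve is log-linear and t_d reduces to d divided by b, which is the classical D-value reasoning scaled to d log cycles.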
Oscar, Thomas P
2014-05-01
A study was undertaken to investigate and model behavior of Salmonella on chicken meat during cold storage at constant temperatures. Chicken meat (white, dark, or skin) portions (0.75 cm³) were inoculated with a single strain of Salmonella Typhimurium DT104 (2.8 log) followed by storage for 0 to 8 d at -8, 0, 8, 12, 14, or 16 °C for model development and at -4, 4, 10, or 14 °C for model validation. A general regression neural network model was developed with commercial software. Performance of the model was considered acceptable when the proportion of residuals (observed − predicted) in an acceptable prediction zone (pAPZ) from -1 log (fail-safe) to 0.5 logs (fail-dangerous) was ≥ 0.7. Growth of Salmonella Typhimurium DT104 on chicken meat was observed at 12, 14, and 16 °C and was highest on dark meat, intermediate on skin, and lowest on white meat. At lower temperatures (-8 to 10 °C) Salmonella Typhimurium DT104 remained at initial levels throughout 8 d of storage except at 4 °C where there was a small (0.4 log) but significant decline. The model had acceptable performance (pAPZ = 0.929) for dependent data (n = 482) and acceptable performance (pAPZ = 0.923) for independent data (n = 235). Results indicated that it is important to include type of meat as an independent variable in the model and that the model provided valid predictions of the behavior of Salmonella Typhimurium DT104 on chicken skin, white, and dark meat during storage for 0 to 8 d at constant temperatures from -8 to 16 °C. A model for predicting behavior of Salmonella on chicken meat during cold storage was developed and validated. The model will help the chicken industry to better predict and manage this risk to public health. Journal of Food Science © 2014 Institute of Food Technologists® No claim to original US government works.
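The acceptance criterion described above is simple enough to sketch directly. The zone limits (-1 to 0.5 log) and the 0.7 threshold are taken from the abstract; the function name and the sample numbers are hypothetical.

```python
def papz(observed, predicted, lower=-1.0, upper=0.5):
    """Proportion of residuals (observed - predicted, in log units) that fall
    inside the acceptable prediction zone [lower, upper]."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    inside = sum(1 for r in residuals if lower <= r <= upper)
    return inside / len(residuals)

# Made-up log counts, for illustration only.
observed_logs = [2.8, 3.0, 4.0, 5.0]
predicted_logs = [3.0, 3.0, 3.0, 6.5]
score = papz(observed_logs, predicted_logs)
acceptable = score >= 0.7  # threshold stated in the abstract
```

The asymmetric zone tolerates overprediction (fail-safe) more than underprediction (fail-dangerous), so the sign convention of the residual matters when reusing the function.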
Montoye, Alexander H K; Begum, Munni; Henning, Zachary; Pfeiffer, Karin A
2017-02-01
This study had three purposes, all related to evaluating energy expenditure (EE) prediction accuracy from body-worn accelerometers: (1) compare linear regression to linear mixed models, (2) compare linear models to artificial neural network models, and (3) compare accuracy of accelerometers placed on the hip, thigh, and wrists. Forty individuals performed 13 activities in a 90 min semi-structured, laboratory-based protocol. Participants wore accelerometers on the right hip, right thigh, and both wrists and a portable metabolic analyzer (EE criterion). Four EE prediction models were developed for each accelerometer: linear regression, linear mixed, and two ANN models. EE prediction accuracy was assessed using correlations, root mean square error (RMSE), and bias and was compared across models and accelerometers using repeated-measures analysis of variance. For all accelerometer placements, there were no significant differences for correlations or RMSE between linear regression and linear mixed models (correlations: r = 0.71-0.88, RMSE: 1.11-1.61 METs; p > 0.05). For the thigh-worn accelerometer, there were no differences in correlations or RMSE between linear and ANN models (ANN-correlations: r = 0.89, RMSE: 1.07-1.08 METs. Linear models-correlations: r = 0.88, RMSE: 1.10-1.11 METs; p > 0.05). Conversely, one ANN had higher correlations and lower RMSE than both linear models for the hip (ANN-correlation: r = 0.88, RMSE: 1.12 METs. Linear models-correlations: r = 0.86, RMSE: 1.18-1.19 METs; p < 0.05), and both ANNs had higher correlations and lower RMSE than both linear models for the wrist-worn accelerometers (ANN-correlations: r = 0.82-0.84, RMSE: 1.26-1.32 METs. Linear models-correlations: r = 0.71-0.73, RMSE: 1.55-1.61 METs; p < 0.01). For studies using wrist-worn accelerometers, machine learning models offer a significant improvement in EE prediction accuracy over linear models. 
Conversely, linear models showed similar EE prediction accuracy to machine learning models for hip- and thigh-worn accelerometers and may be viable alternative modeling techniques for EE prediction for hip- or thigh-worn accelerometers.
Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
Spatial Assessment of Model Errors from Four Regression Techniques
Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove
2005-01-01
Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Meyer, N; McMenamin, J; Robertson, C; Donaghy, M; Allardice, G; Cooper, D
2008-07-01
In 18 weeks, Health Protection Scotland (HPS) deployed a syndromic surveillance system to early-detect natural or intentional disease outbreaks during the G8 Summit 2005 at Gleneagles, Scotland. The system integrated clinical and non-clinical datasets. Clinical datasets included Accident & Emergency (A&E) syndromes, and General Practice (GPs) codes grouped into syndromes. Non-clinical data included telephone calls to a nurse helpline, laboratory test orders, and hotel staff absenteeism. A cumulative sum-based detection algorithm and a log-linear regression model identified signals in the data. The system had a fax-based track for real-time identification of unusual presentations. Ninety-five signals were triggered by the detection algorithms and four forms were faxed to HPS. Thirteen signals were investigated. The system successfully complemented a traditional surveillance system in identifying a small cluster of gastroenteritis among the police force and triggered interventions to prevent further cases.
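The abstract does not give the parameters of its cumulative sum-based detector, so the following is only a generic textbook one-sided CUSUM on standardized daily counts. The baseline mean, standard deviation, allowance k, and threshold h are all assumed values chosen for illustration.

```python
def cusum_signals(counts, baseline_mean, baseline_sd, k=0.5, h=4.0):
    """Return indices where the upper one-sided CUSUM exceeds h.

    S_t = max(0, S_{t-1} + z_t - k), with z_t the standardized count.
    The statistic is not reset after a signal in this sketch.
    """
    signals = []
    s = 0.0
    for i, count in enumerate(counts):
        z = (count - baseline_mean) / baseline_sd
        s = max(0.0, s + z - k)
        if s > h:
            signals.append(i)
    return signals

# Hypothetical daily syndrome counts with a rise over the last three days.
daily_counts = [10, 11, 9, 10, 18, 19, 20]
flagged = cusum_signals(daily_counts, baseline_mean=10.0, baseline_sd=2.0)
```

Because the statistic accumulates small excesses over successive days, it can flag a sustained rise that no single day would trigger on its own, which is why this family of detectors is common in syndromic surveillance.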
Estimating Driving Performance Based on EEG Spectrum Analysis
NASA Astrophysics Data System (ADS)
Lin, Chin-Teng; Wu, Ruei-Cheng; Jung, Tzyy-Ping; Liang, Sheng-Fu; Huang, Teng-Yi
2005-12-01
The growing number of traffic accidents in recent years has become a serious concern to society. Accidents caused by driver's drowsiness behind the steering wheel have a high fatality rate because of the marked decline in the driver's abilities of perception, recognition, and vehicle control while sleepy. Preventing such accidents caused by drowsiness is highly desirable but requires techniques for continuously detecting, estimating, and predicting the level of alertness of drivers and delivering effective feedback to maintain their maximum performance. This paper proposes an EEG-based drowsiness estimation system that combines electroencephalogram (EEG) log subband power spectrum, correlation analysis, principal component analysis, and linear regression models to indirectly estimate driver's drowsiness level in a virtual-reality-based driving simulator. Our results demonstrated that it is feasible to accurately and quantitatively estimate driving performance, expressed as deviation between the center of the vehicle and the center of the cruising lane, in a realistic driving simulator.
Large biases in regression-based constituent flux estimates: causes and diagnostic tools
Hirsch, Robert M.
2014-01-01
It has been documented in the literature that, in some cases, widely used regression-based models can produce severely biased estimates of long-term mean river fluxes of various constituents. These models, estimated using sample values of concentration, discharge, and date, are used to compute estimated fluxes for a multiyear period at a daily time step. This study compares results of the LOADEST seven-parameter model, LOADEST five-parameter model, and the Weighted Regressions on Time, Discharge, and Season (WRTDS) model using subsampling of six very large datasets to better understand this bias problem. This analysis considers sample datasets for dissolved nitrate and total phosphorus. The results show that LOADEST-7 and LOADEST-5, although they often produce very nearly unbiased results, can produce highly biased results. This study identifies three conditions that can give rise to these severe biases: (1) lack of fit of the log of concentration vs. log discharge relationship, (2) substantial differences in the shape of this relationship across seasons, and (3) severely heteroscedastic residuals. The WRTDS model is more resistant to the bias problem than the LOADEST models but is not immune to them. Understanding the causes of the bias problem is crucial to selecting an appropriate method for flux computations. Diagnostic tools for identifying the potential for bias problems are introduced, and strategies for resolving bias problems are described.
Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M
2017-04-01
A carboxymethyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C14-C18) amine (DMDA)) composite was prepared by the solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-ray diffraction spectroscopy (XRD) and scanning electron microscopy (SEM). The composite was evaluated for its sorption efficiency for the pesticides atrazine, imidacloprid and thiamethoxam. The sorption data were fitted to Langmuir and Freundlich isotherms using linear and non-linear methods. The linear regression method suggested that the sorption data were best fitted by the Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by the non-linear regression method. The non-linear error analysis suggested that the sorption data fitted the Langmuir model better than the Freundlich model. The maximum sorption capacity, Q0 (μg/g), was highest for imidacloprid (2000), followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the coefficient of determination of linear regression alone cannot be used to compare the fit of the Langmuir and Freundlich models, and that non-linear error analysis is needed to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
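The contrast between linearized and non-linear isotherm fitting described above can be illustrated with a minimal sketch. It assumes the commonly used "Type II" (Lineweaver-Burk style) linearization of the Langmuir isotherm, 1/qe = (1/(b·Q0))·(1/Ce) + 1/Q0; the abstract does not spell out which form the authors used, so this form, the helper names, and the synthetic data (Q0 = 2000, b = 0.1) are assumptions for illustration only. On noise-free data the linearized fit recovers the parameters exactly; with measurement noise the reciprocal transform reweights the errors, which is why an error measure on the original scale (here a plain sum of squared errors) is worth checking as well.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def langmuir(ce, q0, b):
    """Langmuir isotherm: sorbed amount qe at equilibrium concentration ce."""
    return q0 * b * ce / (1.0 + b * ce)


def fit_langmuir_type2(ce, qe):
    """Assumed Type II linearization: regress 1/qe on 1/ce."""
    slope, intercept = fit_line([1.0 / c for c in ce], [1.0 / q for q in qe])
    q0 = 1.0 / intercept      # intercept = 1 / Q0
    b = intercept / slope     # slope = 1 / (b * Q0)
    return q0, b


def sum_squared_error(ce, qe, q0, b):
    """Error on the original (untransformed) scale."""
    return sum((q - langmuir(c, q0, b)) ** 2 for c, q in zip(ce, qe))


# Synthetic, noise-free data (hypothetical values, not from the study).
ce_values = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
qe_values = [langmuir(c, 2000.0, 0.1) for c in ce_values]

q0_hat, b_hat = fit_langmuir_type2(ce_values, qe_values)
sse = sum_squared_error(ce_values, qe_values, q0_hat, b_hat)
print(q0_hat, b_hat, sse)
```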
Schalasta, Gunnar; Börner, Anna; Speicher, Andrea; Enders, Martin
2018-03-28
Proper management of patients with chronic hepatitis B virus (HBV) infection requires monitoring of plasma or serum HBV DNA levels using a highly sensitive nucleic acid amplification test. Because commercially available assays differ in performance, we compared herein the performance of the Hologic Aptima HBV Quant assay (Aptima) to that of the Roche Cobas TaqMan HBV test for use with the high pure system (HPS/CTM). Assay performance was assessed using HBV reference panels as well as plasma and serum samples from chronically HBV-infected patients. Method correlation, analytical sensitivity, precision/reproducibility, linearity, bias and influence of genotype were evaluated. Data analysis was performed using linear regression, Deming correlation analysis and Bland-Altman analysis. Agreement between the assays for the two reference panels was good, with a difference in assay values vs. target <0.5 log. Qualitative assay results for 159 clinical samples showed good concordance (88.1%; κ=0.75; 95% confidence interval: 0.651-0.845). For the 106 samples quantitated by both assays, viral load results were highly correlated (R=0.92) and differed on average by 0.09 log, with 95.3% of the samples being within the 95% limit of agreement of the assays. Linearity for viral loads 1-7 log was excellent for both assays (R2>0.98). The two assays had similar bias and precision across the different genotypes tested at low viral loads (25-1000 IU/mL). Aptima has a performance comparable with that of HPS/CTM, making it suitable for use for HBV infection monitoring. Aptima runs on a fully automated platform (the Panther system) and therefore offers a significantly improved workflow compared with HPS/CTM.
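The Bland-Altman analysis mentioned above summarizes agreement between two assays by the mean difference (bias) and the 95% limits of agreement, conventionally the bias plus or minus 1.96 standard deviations of the paired differences. The following is a minimal sketch of that calculation; the paired log viral load values are invented for illustration and are not data from the study.

```python
import math


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    bias = sum(diffs) / n
    # Sample standard deviation of the paired differences.
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired results in log IU/mL.
assay_a = [2.1, 3.4, 4.0, 5.2, 6.3]
assay_b = [2.0, 3.5, 3.8, 5.1, 6.3]

bias, lower, upper = bland_altman(assay_a, assay_b)
print(bias, lower, upper)
```

A sample is "within the limits of agreement" when its individual difference falls between the lower and upper values returned here.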
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Yangho; Lee, Byung-Kook, E-mail: bklee@sch.ac.kr
Introduction: The objective of this study was to evaluate associations of blood lead, cadmium, and mercury levels with estimated glomerular filtration rate in a general population of South Korean adults. Methods: This was a cross-sectional study based on data obtained in the Korean National Health and Nutrition Examination Survey (KNHANES) (2008-2010). The final analytical sample consisted of 5924 participants. Estimated glomerular filtration rate (eGFR) was calculated using the MDRD Study equation as an indicator of glomerular function. Results: In multiple linear regression analysis of log2-transformed blood lead as a continuous variable on eGFR, after adjusting for covariates including cadmium and mercury, the difference in eGFR levels associated with doubling of blood lead was -2.624 mL/min per 1.73 m² (95% CI: -3.803 to -1.445). In multiple linear regression analysis using quartiles of blood lead as the independent variable, the difference in eGFR levels comparing participants in the highest versus the lowest quartiles of blood lead was -3.835 mL/min per 1.73 m² (95% CI: -5.730 to -1.939). In a multiple linear regression analysis using blood cadmium and mercury, as continuous or categorical variables, as independent variables, neither metal was a significant predictor of eGFR. Odds ratios (ORs) and 95% CI values for reduced eGFR calculated for log2-transformed blood metals and quartiles of the three metals showed similar trends after adjustment for covariates. Discussion: In this large, representative sample of South Korean adults, elevated blood lead level was consistently associated with lower eGFR levels and with the prevalence of reduced eGFR even in blood lead levels below 10 μg/dL. In conclusion, elevated blood lead level was associated with lower eGFR in a Korean general population, supporting the role of lead as a risk factor for chronic kidney disease.
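The reason a log2-transformed predictor yields a "per doubling" effect can be shown with a small sketch: if eGFR = a + b·log2(lead), then moving from any lead level x to 2x changes the prediction by exactly b, because log2(2x) - log2(x) = 1. The data below are constructed so that eGFR falls by 2.6 units per doubling; they are hypothetical and the single-predictor fit ignores the covariate adjustment used in the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical blood lead (ug/dL) and eGFR values, built for illustration.
lead = [1.0, 2.0, 4.0, 8.0]
egfr = [95.0, 92.4, 89.8, 87.2]

slope, intercept = fit_line([math.log2(x) for x in lead], egfr)


def predict(lead_level):
    """Predicted eGFR at a given blood lead level."""
    return intercept + slope * math.log2(lead_level)


# The change for any doubling (here 3 -> 6) equals the fitted slope.
doubling_effect = predict(6.0) - predict(3.0)
print(slope, doubling_effect)
```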
Twenty-year trends in cardiovascular risk factors in India and influence of educational status.
Gupta, Rajeev; Guptha, Soneil; Gupta, V P; Agrawal, Aachu; Gaur, Kiran; Deedwania, Prakash C
2012-12-01
Urban middle-socioeconomic status (SES) subjects have high burden of cardiovascular risk factors in low-income countries. To determine secular trends in risk factors among this population and to correlate risks with educational status we performed epidemiological studies in India. Five cross-sectional studies were performed in middle-SES urban locations in Jaipur, India from years 1992 to 2010. Cluster sampling was performed. Subjects (men, women) aged 20-59 years evaluated were 712 (459, 253) in 1992-94, 558 (286, 272) in 1999-2001, 374 (179, 195) in 2002-03, 887 (414, 473) in 2004-05, and 530 (324, 206) in 2009-10. Data were obtained by history, anthropometry, and fasting blood glucose and lipids estimation. Response rates varied from 55 to 75%. Mean values and risk factor prevalence were determined. Secular trends were identified using quadratic and log-linear regression and chi-squared for trend. Across the studies, there was high prevalence of overweight, hypertension, and lipid abnormalities. Age- and sex-adjusted trends showed significant increases in mean body mass index (BMI), fasting glucose, total cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides (quadratic and log-linear regression, p < 0.001). Systolic blood pressure (BP) decreased while insignificant changes were observed for waist-hip ratio and low-density lipoprotein (LDL) cholesterol. Categorical trends showed increase in overweight and decrease in smoking (p < 0.05); insignificant changes were observed in truncal obesity, hypertension, hypercholesterolaemia, and diabetes. Adjustment for educational status attenuated linear trends in BMI and total and LDL cholesterol and accentuated trends in systolic BP, glucose, and HDL cholesterol. There was significant association of an increase in education with decline in smoking and an increase in overweight (two-line regression p < 0.05). In Indian urban middle-SES subjects there is high prevalence of cardiovascular risk factors. 
Over a 20-year period BMI and overweight increased, smoking and systolic BP decreased, and truncal obesity, hypercholesterolaemia, and diabetes remained stable. Increasing educational status attenuated trends for systolic BP, glucose and HDL cholesterol, and BMI.
Xu, Feng; Liang, Xinmiao; Lin, Bingcheng
2002-01-01
Research efforts dealing with chemical transportation in soils are needed to prevent damage to ground water. Methanol-containing solvents can increase the translocation of nonionic organic chemicals (NOCs). In this study, a general log-linear retention equation, log k' = log k'w - S·phi (Eq. [1]), was developed to describe the mobilities of NOCs in soil column chromatography (SCC). The term phi denotes the volume fraction of methanol in the eluent, k' is the capacity factor of a solute at a certain phi value, and log k'w and -S are the intercept and slope of the log k' vs. phi plot. Two reference soils (GSE 17204 and GSE 17205) were used as packing materials, and were eluted by isocratic methanol-water mixtures. A model of linear solvation energy relationships (LSER) was applied to analyze the k' from molecular interactions. The most important factor determining the transportation was found to be the solute hydrophobic partition in soils, and the second-most important factor was the solute hydrogen-bond basicity (hydrogen-bond accepting ability), while a less important factor was the solute dipolarity-polarizability. The solute hydrogen-bond acidity (hydrogen-bond donating ability) was statistically unimportant and could be deleted. From the LSER model, one could also obtain Eq. [1]. The experimental k' data of 121 NOCs can be accurately explained by Eq. [1]. The equation is promising for estimating solute mobility in pure water by extrapolating from lower capacity factors obtained in methanol-water mixed eluents.
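The extrapolation described above can be sketched as a straight-line fit of log k' against phi, with the intercept read off as log k'w (the value at phi = 0, i.e. pure water) and the negated slope as S. The sketch assumes base-10 logarithms, which is the usual chromatographic convention but is not stated in the abstract, and the measurements are made-up values that lie exactly on a line with log k'w = 2.0 and S = 3.0.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical isocratic runs: methanol volume fraction and log10 k'.
phi = [0.3, 0.4, 0.5, 0.6]
log_k = [1.1, 0.8, 0.5, 0.2]

slope, log_kw = fit_line(phi, log_k)
S = -slope                 # Eq. [1]: log k' = log k'w - S * phi
k_w = 10.0 ** log_kw       # extrapolated capacity factor in pure water
print(log_kw, S, k_w)
```

Because phi = 0 lies outside the measured range, the estimate of k'w is only as good as the linearity assumption over the unmeasured interval.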
Chronic Kidney Disease Is Associated With White Matter Hyperintensity Volume
Khatri, Minesh; Wright, Clinton B.; Nickolas, Thomas L.; Yoshita, Mitsuhiro; Paik, Myunghee C.; Kranwinkel, Grace; Sacco, Ralph L.; DeCarli, Charles
2010-01-01
Background and Purpose White matter hyperintensities have been associated with increased risk of stroke, cognitive decline, and dementia. Chronic kidney disease is a risk factor for vascular disease and has been associated with inflammation and endothelial dysfunction, which have been implicated in the pathogenesis of white matter hyperintensities. Few studies have explored the relationship between chronic kidney disease and white matter hyperintensities. Methods The Northern Manhattan Study is a prospective, community-based cohort of which a subset of stroke-free participants underwent MRIs. MRIs were analyzed quantitatively for white matter hyperintensities volume, which was log-transformed to yield a normal distribution (log-white matter hyperintensity volume). Kidney function was modeled using serum creatinine, the Cockcroft-Gault formula for creatinine clearance, and the Modification of Diet in Renal Disease formula for estimated glomerular filtration rate. Creatinine clearance and estimated glomerular filtration rate were trichotomized to 15 to 60 mL/min, 60 to 90 mL/min, and >90 mL/min (reference). Linear regression was used to measure the association between kidney function and log-white matter hyperintensity volume adjusting for age, gender, race–ethnicity, education, cardiac disease, diabetes, homocysteine, and hypertension. Results Baseline data were available on 615 subjects (mean age 70 years, 60% women, 18% whites, 21% blacks, 62% Hispanics). In multivariate analysis, creatinine clearance 15 to 60 mL/min was associated with increased log-white matter hyperintensity volume (β 0.322; 95% CI, 0.095 to 0.550) as was estimated glomerular filtration rate 15 to 60 mL/min (β 0.322; 95% CI, 0.080 to 0.564). Serum creatinine, per 1-mg/dL increase, was also positively associated with log-white matter hyperintensity volume (β 1.479; 95% CI, 1.067 to 2.050). 
Conclusions The association between moderate–severe chronic kidney disease and white matter hyperintensity volume highlights the growing importance of kidney disease as a possible determinant of cerebrovascular disease and/or as a marker of microangiopathy. PMID:17962588
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
ERIC Educational Resources Information Center
Liou, Pey-Yan
2009-01-01
The current study examines three regression models: OLS (ordinary least square) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
Modeling the Geographic Consequence and Pattern of Dengue Fever Transmission in Thailand.
Bekoe, Collins; Pansombut, Tatdow; Riyapan, Pakwan; Kakchapati, Sampurna; Phon-On, Aniruth
2017-05-04
Dengue fever is an infectious disease that remains a public health problem in Thailand. This study considers in detail the geographic consequence, seasonality, and pattern of dengue fever transmission among the 76 provinces of Thailand from 2003 to 2015. A cross-sectional study. The data for the study were from the Department of Disease Control under the Bureau of Epidemiology, Thailand. The effects of quarter and location on the transmission of dengue were modeled using an alternative additive log-linear model. The model fitted well as illustrated by the residual plots and the Again, the model showed that dengue fever is high in the second quarter of every year from May to August. There was evidence of an increasing annual trend in dengue from 2003 to 2015. There was a difference in the distribution of dengue fever within and between provinces. The areas of high risk were the central and southern regions of Thailand. The log-linear model provided a simple means of modeling dengue fever transmission. The results are important for understanding the geographic distribution of dengue fever patterns.
Biostatistics Series Module 10: Brief Overview of Multivariate Methods.
Hazra, Avijit; Gogtay, Nithya
2017-01-01
Multivariate analysis refers to statistical techniques that simultaneously look at three or more variables in relation to the subjects under investigation with the aim of identifying or clarifying the relationships between them. These techniques have been broadly classified as dependence techniques, which explore the relationship between one or more dependent variables and their independent predictors, and interdependence techniques, that make no such distinction but treat all variables equally in a search for underlying relationships. Multiple linear regression models a situation where a single numerical dependent variable is to be predicted from multiple numerical independent variables. Logistic regression is used when the outcome variable is dichotomous in nature. The log-linear technique models count type of data and can be used to analyze cross-tabulations where more than two variables are included. Analysis of covariance is an extension of analysis of variance (ANOVA), in which an additional independent variable of interest, the covariate, is brought into the analysis. It tries to examine whether a difference persists after "controlling" for the effect of the covariate that can impact the numerical dependent variable of interest. Multivariate analysis of variance (MANOVA) is a multivariate extension of ANOVA used when multiple numerical dependent variables have to be incorporated in the analysis. Interdependence techniques are more commonly applied to psychometrics, social sciences and market research. Exploratory factor analysis and principal component analysis are related techniques that seek to extract from a larger number of metric variables, a smaller number of composite factors or components, which are linearly related to the original variables. Cluster analysis aims to identify, in a large number of cases, relatively homogeneous groups called clusters, without prior information about the groups. 
The calculation intensive nature of multivariate analysis has so far precluded most researchers from using these techniques routinely. The situation is now changing with wider availability, and increasing sophistication of statistical software and researchers should no longer shy away from exploring the applications of multivariate methods to real-life data sets.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.
Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification
Feng, Yang; Jiang, Jiancheng; Tong, Xin
2015-01-01
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970
Unit Cohesion and the Surface Navy: Does Cohesion Affect Performance
1989-12-01
v. 68, 1968. Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. Rand Corporation R-2607...Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. SAS User’s Guide: Basics, Version 5 ed
ERIC Educational Resources Information Center
Richter, Tobias
2006-01-01
Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
voom: precision weights unlock linear model analysis tools for RNA-seq read counts
2014-01-01
New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods. PMID:24485249
voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.
Law, Charity W; Chen, Yunshun; Shi, Wei; Smyth, Gordon K
2014-02-03
New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric models. The regression model comprises several models, each with two components, parametric and nonparametric. The model used here has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. The model can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model to modeling the effect of regional socio-economic conditions on the use of information technology. More specifically, the response variables are the percentage of households with access to the internet and the percentage of households with a personal computer. The predictor variables are the percentage of literate people, the percentage of electrification, and the percentage of economic growth. Based on identification of the relationship between the response and predictor variables, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The results show that multiresponse semiparametric regression can be applied well, as indicated by the high coefficient of determination, 90 percent.
Gimelfarb, A.; Willis, J. H.
1994-01-01
An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
Predicting Error Bars for QSAR Models
NASA Astrophysics Data System (ADS)
Schroeter, Timon; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert
2007-09-01
Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D7 models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from the last months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and ensemble and distance based techniques for the other modelling approaches.
Violanti, John M; Fekedulegn, Desta; Andrew, Michael E; Hartley, Tara A; Charles, Luenda E; Miller, Diane B; Burchfiel, Cecil M
2017-01-01
Police officers encounter unpredictable, evolving, and escalating stressful demands in their work. Utilizing the Spielberger Police Stress Survey (60-item instrument for assessing specific conditions or events considered to be stressors in police work), the present study examined the association of the top five highly rated and bottom five least rated work stressors among police officers with their awakening cortisol pattern. Participants were police officers enrolled in the Buffalo Cardio-Metabolic Occupational Police Stress (BCOPS) study (n=338). For each group, the total stress index (product of rating and frequency of the stressor) was calculated. Participants collected saliva by means of Salivettes at four time points: on awakening, 15, 30 and 45min after waking to examine the cortisol awakening response (CAR). Saliva samples were analyzed for free cortisol concentrations. A slope reflecting the awakening pattern of cortisol over time was estimated by fitting a linear regression model relating cortisol in log-scale to time of collection. The slope served as the outcome variable. Analysis of covariance, regression, and repeated measures models were used to determine if there was an association of the stress index with the waking cortisol pattern. There was a significant negative linear association between total stress index of the five highest stressful events and slope of the awakening cortisol regression line (trend p-value=0.0024). As the stress index increased, the pattern of the awakening cortisol regression line tended to flatten. Officers with a zero stress index showed a steep and steady increase in cortisol from baseline (which is often observed) while officers with a moderate or high stress index showed a dampened or flatter response over time. Conversely, the total stress index of the five least rated events was not significantly associated with the awakening cortisol pattern. 
The study suggests that police events or conditions considered highly stressful by the officers may be associated with disturbances of the typical awakening cortisol pattern. The results are consistent with previous research where chronic exposure to stressors is associated with a diminished awakening cortisol response pattern. Copyright © 2016 Elsevier Ltd. All rights reserved.
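The outcome variable described above, a per-officer slope of log cortisol against collection time, can be sketched as a simple least-squares fit over the four sampling times (0, 15, 30 and 45 min after waking). The sketch assumes natural logarithms, which the abstract does not specify, and the two cortisol profiles are invented to contrast a steep awakening rise with a flattened one.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def awakening_slope(times_min, cortisol):
    """Slope of log cortisol per minute after waking."""
    slope, _ = fit_line(times_min, [math.log(c) for c in cortisol])
    return slope


times = [0.0, 15.0, 30.0, 45.0]

# Hypothetical salivary cortisol concentrations (arbitrary units).
steep_profile = [10.0, 14.0, 17.0, 16.0]
flat_profile = [12.0, 12.5, 12.2, 12.1]

steep_slope = awakening_slope(times, steep_profile)
flat_slope = awakening_slope(times, flat_profile)
print(steep_slope, flat_slope)
```

Each officer then contributes one slope, which can be related to the stress index in a second-stage regression or analysis of covariance.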
Violanti, John M.; Fekedulegn, Desta; Andrew, Michael E.; Hartley, Tara A.; Charles, Luenda E.; Miller, Diane B.; Burchfiel, Cecil M.
2016-01-01
Police officers encounter unpredictable, evolving, and escalating stressful demands in their work. Utilizing the Spielberger Police Stress Survey (60-item instrument for assessing specific conditions or events considered to be stressors in police work), the present study examined the association of the top five highly rated and bottom five least rated work stressors among police officers with their awakening cortisol pattern. Participants were police officers enrolled in the Buffalo Cardio-Metabolic Occupational Police Stress (BCOPS) study (n = 338). For each group, the total stress index (product of rating and frequency of the stressor) was calculated. Participants collected saliva by means of Salivettes at four time points: on awakening, 15, 30 and 45 min after waking to examine the cortisol awakening response (CAR). Saliva samples were analyzed for free cortisol concentrations. A slope reflecting the awakening pattern of cortisol over time was estimated by fitting a linear regression model relating cortisol in log-scale to time of collection. The slope served as the outcome variable. Analysis of covariance, regression, and repeated measures models were used to determine if there was an association of the stress index with the waking cortisol pattern. There was a significant negative linear association between total stress index of the five highest stressful events and slope of the awakening cortisol regression line (trend p-value = 0.0024). As the stress index increased, the pattern of the awakening cortisol regression line tended to flatten. Officers with a zero stress index showed a steep and steady increase in cortisol from baseline (which is often observed) while officers with a moderate or high stress index showed a dampened or flatter response over time. Conversely, the total stress index of the five least rated events was not significantly associated with the awakening cortisol pattern. 
The study suggests that police events or conditions considered highly stressful by the officers may be associated with disturbances of the typical awakening cortisol pattern. The results are consistent with previous research where chronic exposure to stressors is associated with a diminished awakening cortisol response pattern. PMID:27816820
Basa, Ranor C B; Davies, Vince; Li, Xiaoxiao; Murali, Bhavya; Shah, Jinel; Yang, Bing; Li, Shi; Khan, Mohammad W; Tian, Mengxi; Tejada, Ruth; Hassan, Avan; Washington, Allen; Mukherjee, Bhramar; Carethers, John M; McGuire, Kathleen L
2016-01-01
Colorectal cancer is a leading cause of cancer-related deaths in the U.S., with African-Americans having higher incidence and mortality rates than Caucasian-Americans. Recent studies have demonstrated that anti-tumor cytotoxic T lymphocytes provide protection to patients with colon cancer, while patients deficient in these responses have a significantly worse prognosis. To determine if differences in cytotoxic immunity might play a role in racial disparities in colorectal cancer, 258 microsatellite-stable colon tumors were examined for infiltrating immune biomarkers via immunohistochemistry. Descriptive summary statistics were calculated using two-sample Wilcoxon rank sum tests, linear regression models with log-transformed data were used to assess differences by race, and Pearson and Spearman correlations were used to correlate different biomarkers. The association between different biomarkers was also assessed using linear regression after adjusting for covariates. No significant differences were observed in CD8+ (p = 0.83), CD57+ (p = 0.55), and IL-17-expressing (p = 0.63) cell numbers within the tumor samples tested. When infiltration of granzyme B+ cells was analyzed, however, a significant difference was observed, with African Americans having lower infiltration of cells expressing this cytotoxic marker than Caucasians (p<0.01). Analysis of infiltrating granzyme B+ cells at the invasive borders of the tumor revealed an even greater difference by race (p<0.001). Taken together, the data presented suggest differences in anti-tumor immune cytotoxicity may be a contributing factor in the racial disparities observed in colorectal cancer.
Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi
2007-10-01
Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when nonlinear phenomena are significant, multiple linear regression fails to yield an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.
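The two error indices named in the abstract above, and the relative improvement of one model over another, can be computed as in the following minimal sketch. The function names and the flow values are illustrative assumptions; they are not data from the Citarum River study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mape(observed, predicted):
    """Mean absolute percentage error in percent (observed values must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n

def percent_improvement(baseline_error, new_error):
    """Relative reduction of an error index versus a baseline, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Hypothetical flows (m^3/s); NOT taken from the study.
observed = [100.0, 200.0, 400.0]
linear_pred = [120.0, 180.0, 360.0]   # stand-in for the linear regression forecast
fuzzy_pred = [110.0, 190.0, 380.0]    # stand-in for the neuro-fuzzy forecast

rmse_gain = percent_improvement(rmse(observed, linear_pred), rmse(observed, fuzzy_pred))
mape_gain = percent_improvement(mape(observed, linear_pred), mape(observed, fuzzy_pred))
print(f"RMSE improved by {rmse_gain:.2f}%, MAPE improved by {mape_gain:.2f}%")
```

The reported 13.52% and 10.73% figures are relative reductions of this kind, assuming the abstract uses the usual definition of improvement over a baseline.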
Li, Feiming; Gimpel, John R; Arenson, Ethan; Song, Hao; Bates, Bruce P; Ludwin, Fredric
2014-04-01
Few studies have investigated how well scores from the Comprehensive Osteopathic Medical Licensing Examination-USA (COMLEX-USA) series predict resident outcomes, such as performance on board certification examinations. The objective of this study was to determine how well COMLEX-USA predicts performance on the American Osteopathic Board of Emergency Medicine (AOBEM) Part I certification examination. The target study population was first-time examinees who took AOBEM Part I in 2011 and 2012 with matched performances on COMLEX-USA Level 1, Level 2-Cognitive Evaluation (CE), and Level 3. Pearson correlations were computed between AOBEM Part I first-attempt scores and COMLEX-USA performances to measure the association between these examinations. Stepwise linear regression analysis was conducted to predict AOBEM Part I scores by the 3 COMLEX-USA scores. An independent t test was conducted to compare mean COMLEX-USA performances between candidates who passed and who failed AOBEM Part I, and a stepwise logistic regression analysis was used to predict the log-odds of passing AOBEM Part I on the basis of COMLEX-USA scores. Scores from AOBEM Part I had the highest correlation with COMLEX-USA Level 3 scores (.57) and slightly lower correlation with COMLEX-USA Level 2-CE scores (.53). The lowest correlation was between AOBEM Part I and COMLEX-USA Level 1 scores (.47). According to the stepwise regression model, COMLEX-USA Level 1 and Level 2-CE scores, which residency programs often use as selection criteria, together explained 30% of variance in AOBEM Part I scores. Adding Level 3 scores explained 37% of variance. The independent t test indicated that the 397 examinees passing AOBEM Part I performed significantly better than the 54 examinees failing AOBEM Part I in all 3 COMLEX-USA levels (P<.001 for all 3 levels). The logistic regression model showed that COMLEX-USA Level 1 and Level 3 scores predicted the log-odds of passing AOBEM Part I (P=.03 and P<.001, respectively).
The present study empirically supported the predictive and discriminant validities of the COMLEX-USA series in relation to the AOBEM Part I certification examination. Although residency programs may use COMLEX-USA Level 1 and Level 2-CE scores as partial criteria in selecting residents, Level 3 scores, though typically not available at the time of application, are actually the most statistically related to performances on AOBEM Part I.
Ahn, Jaeil; Mukherjee, Bhramar; Banerjee, Mousumi; Cooney, Kathleen A.
2011-01-01
Summary: The stereotype regression model for categorical outcomes, proposed by Anderson (1984), is nested between the baseline category logits and adjacent category logits model with proportional odds structure. The stereotype model is more parsimonious than the ordinary baseline-category (or multinomial logistic) model due to a product representation of the log odds-ratios in terms of a common parameter corresponding to each predictor and category specific scores. The model could be used for both ordered and unordered outcomes. For ordered outcomes, the stereotype model allows more flexibility than the popular proportional odds model in capturing highly subjective ordinal scaling which does not result from categorization of a single latent variable, but is inherently multidimensional in nature. As pointed out by Greenland (1994), an additional advantage of the stereotype model is that it provides unbiased and valid inference under outcome-stratified sampling as in case-control studies. In addition, for matched case-control studies, the stereotype model is amenable to the classical conditional likelihood principle, whereas there is no reduction due to sufficiency under the proportional odds model. In spite of these attractive features, the model has been applied less, as there are issues with maximum likelihood estimation and likelihood based testing approaches due to non-linearity and lack of identifiability of the parameters. We present a comprehensive Bayesian inference and model comparison procedure for this class of models as an alternative to the classical frequentist approach. We illustrate our methodology by analyzing data from The Flint Men’s Health Study, a case-control study of prostate cancer in African-American men aged 40 to 79 years. We use clinical staging of prostate cancer in terms of Tumors, Nodes and Metastasis (TNM) as the categorical response of interest. PMID:19731262
Analytical methods in multivariate highway safety exposure data estimation
DOT National Transportation Integrated Search
1984-01-01
Three general analytical techniques which may be of use in extending, enhancing, and combining highway accident exposure data are discussed. The techniques are log-linear modelling, iterative proportional fitting and the expectation maximizati...
NASA Astrophysics Data System (ADS)
Schaperow, J.; Cooper, M. G.; Cooley, S. W.; Alam, S.; Smith, L. C.; Lettenmaier, D. P.
2017-12-01
As climate regimes shift, streamflows and our ability to predict them will change as well. Elasticity of summer minimum streamflow is estimated for 138 unimpaired headwater river basins across the maritime western US mountains to better understand how climatologic variables and geologic characteristics interact to determine the response of summer low flows to winter precipitation (PPT), spring snow water equivalent (SWE), and summertime potential evapotranspiration (PET). Elasticities are calculated using log-log linear regression, and linear reservoir storage coefficients are used to represent basin geology. Storage coefficients are estimated using baseflow recession analysis. On average, SWE, PET, and PPT explain about 1/3 of the summertime low flow variance. Snow-dominated basins with long timescales of baseflow recession are least sensitive to changes in SWE, PPT, and PET, while rainfall-dominated, faster draining basins are most sensitive. There are also implications for the predictability of summer low flows. The R2 between streamflow and SWE drops from 0.62 to 0.47 from snow-dominated to rain-dominated basins, while there is no corresponding increase in R2 between streamflow and PPT.
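In a log-log regression the fitted slope is the elasticity: the percent change in the response per one percent change in the predictor. The sketch below illustrates this with a single predictor and synthetic values. The study itself considers three predictors (PPT, SWE, PET), so the single-predictor form, the function name and the data are simplifying assumptions.

```python
import math

def loglog_elasticity(x, y):
    """OLS slope of log(y) on log(x), i.e. the elasticity of y with respect to x."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Synthetic example: low flow proportional to SWE**0.5, so the elasticity is 0.5.
swe = [100.0, 200.0, 400.0, 800.0]            # hypothetical SWE values (mm)
low_flow = [2.0 * s ** 0.5 for s in swe]      # hypothetical summer low flows

elasticity = loglog_elasticity(swe, low_flow)
print(f"elasticity of low flow to SWE: {elasticity:.3f}")
```

An elasticity of 0.5 would mean that a 10% change in SWE goes with roughly a 5% change in summer low flow.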
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Prediction equation for calculating fat mass in young Indian adults.
Sandhu, Jaspal Singh; Gupta, Giniya; Shenoy, Shweta
2010-06-01
Accurate measurement or prediction of fat mass is useful in physiology, nutrition and clinical medicine. Most predictive equations currently used to assess percentage of body fat or fat mass from simple anthropometric measurements were derived from people in western societies, and they may not be appropriate for individuals with other genotypic and phenotypic characteristics. We developed equations to predict fat mass from anthropometric measurements in young Indian adults. Fat mass was measured in 60 females and 58 males, aged 20 to 29 yrs, by using hydrostatic weighing with simultaneous measurement of residual lung volume. Anthropometric measures included weight (kg), height (m) and 4 skinfold thicknesses [STs (mm)]. Sex-specific linear regression models were developed with fat mass as the dependent variable and all anthropometric measures as independent variables. The prediction equation obtained for fat mass (kg) for males was 8.46 + 0.32 (weight) - 15.16 (height) + 9.54 (log of sum of 4 STs) (R2 = 0.53, SEE = 3.42 kg), and for females it was -20.22 + 0.33 (weight) + 3.44 (height) + 7.66 (log of sum of 4 STs) (R2 = 0.72, SEE = 3.01 kg). A new prediction equation for the measurement of fat mass was derived and internally validated in young Indian adults using simple anthropometric measurements.
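The two prediction equations reported above translate directly into a small function. The coefficients are taken from the abstract; the base of the logarithm is not stated there, so base 10 is an assumption made here, as are the function name and the example inputs.

```python
import math

def fat_mass_kg(sex, weight_kg, height_m, sum_skinfolds_mm):
    """Predicted fat mass (kg) from the sex-specific equations in the abstract.

    ASSUMPTION: 'log' is taken as log10; the abstract does not state the base.
    """
    log_st = math.log10(sum_skinfolds_mm)
    if sex == "male":
        return 8.46 + 0.32 * weight_kg - 15.16 * height_m + 9.54 * log_st
    if sex == "female":
        return -20.22 + 0.33 * weight_kg + 3.44 * height_m + 7.66 * log_st
    raise ValueError("sex must be 'male' or 'female'")

# Hypothetical subjects, for illustration only.
male_fm = fat_mass_kg("male", weight_kg=70.0, height_m=1.75, sum_skinfolds_mm=100.0)
female_fm = fat_mass_kg("female", weight_kg=55.0, height_m=1.60, sum_skinfolds_mm=100.0)
print(f"male: {male_fm:.2f} kg, female: {female_fm:.2f} kg")
```

The standard errors of estimate quoted in the abstract (3.42 kg and 3.01 kg) indicate how far an individual prediction may typically deviate from the measured value.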
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative structure-activity relationship (QSAR) was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R2(CV) = 0.8160; S(PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
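The final testing step of the procedure above can be sketched for the adaptive Neyman case. The sketch assumes the transformed-domain coefficients have already been standardized to unit variance under the null hypothesis, and it uses a commonly cited unnormalized form of the statistic, the maximum over m of (2m)^(-1/2) times the sum of (z_j^2 - 1) for j up to m. The transform, the multivariate model fit and the calibration of the statistic against its null distribution are omitted, and the function name is illustrative.

```python
import math

def adaptive_neyman(z):
    """Unnormalized adaptive Neyman statistic for standardized coefficients z.

    Returns (statistic, m_hat), where m_hat is the number of leading
    coefficients at which the maximum is attained.
    ASSUMPTION: each z[j] has unit variance under the null hypothesis.
    """
    best_stat, best_m = -math.inf, 0
    running = 0.0
    for m, zj in enumerate(z, start=1):
        running += zj * zj - 1.0          # cumulative sum of (z_j^2 - 1)
        stat = running / math.sqrt(2.0 * m)
        if stat > best_stat:
            best_stat, best_m = stat, m
    return best_stat, best_m

# Hypothetical standardized low-frequency coefficients: signal in the first only.
stat, m_hat = adaptive_neyman([3.0, 0.0, 0.0, 0.0])
print(f"statistic = {stat:.3f} at m = {m_hat}")
```

Because the maximum is taken over the number of leading coefficients, the statistic adapts to how many low-frequency terms carry the group difference, which is the motivation for applying it after a Fourier transform.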
Anumol, Tarun; Sgroi, Massimiliano; Park, Minkyu; Roccaro, Paolo; Snyder, Shane A
2015-06-01
This study investigated the applicability of bulk organic parameters like dissolved organic carbon (DOC), UV absorbance at 254 nm (UV254), and total fluorescence (TF) to act as surrogates in predicting trace organic compound (TOrC) removal by granular activated carbon (GAC) in water reuse applications. Using rapid small-scale column testing, empirical linear correlations for thirteen TOrCs were determined with DOC, UV254, and TF in four wastewater effluents. Linear correlations (R2 > 0.7) were obtained for eight TOrCs in each water quality in the UV254 model, while ten TOrCs had R2 > 0.7 in the TF model. Conversely, DOC was shown to be a poor surrogate for TOrC breakthrough prediction. When the data from all four water qualities were combined, good linear correlations were still obtained, with TF having higher R2 than UV254, especially for TOrCs with log Dow > 1. An excellent linear relationship (R2 > 0.9) between log Dow and the removal of TOrC at 0% surrogate removal (y-intercept) was obtained for the five neutral TOrCs tested in this study. Positively charged TOrCs had enhanced removals due to electrostatic interactions with negatively charged GAC that caused them to deviate from removals that would be expected with their log Dow. Application of the empirical linear correlation models to full-scale samples provided good results for six of seven TOrCs (except meprobamate) tested when comparing predicted TOrC removal by UV254 and TF with actual removals for GAC in all the five samples tested. Surrogate predictions using UV254 and TF provide valuable tools for rapid or on-line monitoring of GAC performance and can result in cost savings by extended GAC run times as compared to using DOC breakthrough to trigger regeneration or replacement. Copyright © 2015 Elsevier Ltd. All rights reserved.
Lamm, Ryan; Mathews, Steven N; Yang, Jie; Park, Jihye; Talamini, Mark; Pryor, Aurora D; Telem, Dana
2017-05-01
This study sought to characterize in-hospital post-colectomy mortality in New York State. One hundred sixty thousand seven hundred ninety-two patients who underwent colectomy from 1995 to 2014 were analyzed from the all-payer New York Statewide Planning and Research Cooperative System (SPARCS) database. Linear trends of in-hospital mortality rate over 20 years were calculated using log-linear regression models. Chi-square tests were used to compare categorical variables between patients. Multivariable regression models were further used to calculate risk of in-hospital mortality associated with specific demographics, co-morbidities, and perioperative complications. From 1995 to 2014, 7308 (4.5%) in-hospital mortalities occurred within 30 days of surgery. Over this time period, the rate of overall in-hospital post-colectomy mortality decreased by 3.3% (6.3 to 3%, p < 0.0001). The risk of in-hospital mortality for patients receiving emergent and elective surgery decreased by 1% (RR 0.99 [0.98-1.00], p = 0.0005) and 5% (RR 0.95 [0.94-0.96], p < 0.0001) each year, respectively. Patients who underwent open surgeries were more likely to experience in-hospital mortality (adjusted OR 3.65 [3.16-4.21], p < 0.0001), with an increased risk of in-hospital mortality each year (RR 1.01 [1.00-1.03], p = 0.0387). Numerous other risk factors were identified. In-hospital post-colectomy mortality decreased at a slower rate in emergent versus elective surgeries. The risk of in-hospital mortality has increased in open colectomies.
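In a log-linear trend model, log(rate) = a + b * year, the per-year risk ratio is exp(b), and it compounds multiplicatively over time. The sketch below shows that arithmetic using the figures quoted in the abstract. Treating 1995 to 2014 as 19 annual steps and the function names are assumptions, and the derived averages are illustrative rather than results reported by the study.

```python
def cumulative_ratio(annual_rr, years):
    """Overall rate ratio implied by a constant per-year ratio over `years` steps."""
    return annual_rr ** years

def average_annual_ratio(start_rate, end_rate, years):
    """Constant per-year ratio that would take start_rate to end_rate in `years` steps."""
    return (end_rate / start_rate) ** (1.0 / years)

YEARS = 2014 - 1995   # ASSUMPTION: 19 annual steps between first and last year

elective_total = cumulative_ratio(0.95, YEARS)    # RR 0.95 per year (abstract)
emergent_total = cumulative_ratio(0.99, YEARS)    # RR 0.99 per year (abstract)
overall_annual = average_annual_ratio(6.3, 3.0, YEARS)  # overall rate 6.3% -> 3%

print(f"elective: x{elective_total:.2f}, emergent: x{emergent_total:.2f}")
print(f"average annual ratio for the overall rate: {overall_annual:.3f}")
```

Under these assumptions, a 5% yearly reduction compounds to a much larger cumulative decline than a 1% yearly reduction, which is consistent with the abstract's conclusion that mortality fell more slowly after emergent surgery.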
NASA Astrophysics Data System (ADS)
Reyer, D.; Philipp, S. L.
2014-09-01
Information about geomechanical and physical rock properties, particularly uniaxial compressive strength (UCS), is needed for geomechanical model development and updating with logging-while-drilling methods to minimise costs and risks of the drilling process. The following parameters with importance at different stages of geothermal exploitation and drilling are presented for typical sedimentary and volcanic rocks of the Northwest German Basin (NWGB): physical (P wave velocities, porosity, and bulk and grain density) and geomechanical parameters (UCS, static Young's modulus, destruction work and indirect tensile strength both perpendicular and parallel to bedding) for 35 rock samples from quarries and 14 core samples of sandstones and carbonate rocks. With regression analyses (linear and non-linear), empirical relations are developed to predict UCS values from all other parameters. Analyses focus on sedimentary rocks and were repeated separately for clastic rock samples or carbonate rock samples as well as for outcrop samples or core samples. Empirical relations have high statistical significance for Young's modulus, tensile strength and destruction work; for physical properties, there is a wider scatter of data and prediction of UCS is less precise. For most relations, properties of core samples plot within the scatter of outcrop samples and lie within the 90% prediction bands of developed regression functions. The results indicate the applicability of empirical relations that are based on outcrop data to questions related to drilling operations when the database contains a sufficient number of samples with varying rock properties. The presented equations may help to predict UCS values for sedimentary rocks at depth, and thus develop suitable geomechanical models for the adaptation of the drilling strategy to rock mechanical conditions in the NWGB.
He, Wensi; Yan, Fangyou; Jia, Qingzhu; Xia, Shuqian; Wang, Qiang
2018-03-01
The hazardous potential of ionic liquids (ILs) is becoming an issue of great concern due to their important role in many industrial fields as green agents. The mathematical model for the toxicological effects of ILs is useful for the risk assessment and design of environmentally benign ILs. The objective of this work is to develop QSAR models to describe the minimal inhibitory concentration (MIC) and minimal bactericidal concentration (MBC) of ILs against Staphylococcus aureus (S. aureus). A total of 169 and 101 ILs with MICs and MBCs, respectively, are used to obtain multiple linear regression models based on matrix norm indexes. The norm indexes used in this work are proposed by our research group and they are first applied to estimate the antibacterial toxicity of these ILs against S. aureus. These two models precisely and reliably calculated the IL toxicities with a square of correlation coefficient (R2) of 0.919 and a standard error of estimate (SE) of 0.341 (in log unit of mM) for pMIC, and an R2 of 0.913 and SE of 0.282 for pMBC. Copyright © 2017 Elsevier Ltd. All rights reserved.
Workie, Demeke Lakew; Zike, Dereje Tesfaye; Fenta, Haile Mekonnen; Mekonnen, Mulusew Admasu
2018-05-10
Ethiopia is among the countries with a low contraceptive prevalence rate, which results in a high total fertility rate and unwanted pregnancies, which in turn affect maternal and child health status. This study aimed to investigate the major factors that affect the number of modern contraceptive users at service delivery points in Ethiopia. The Performance Monitoring and Accountability 2020/Ethiopia data collected between March and April 2016 at round 4 from 461 eligible service delivery points were used in this study. A weighted log-linear negative binomial model was applied to analyze the service delivery point data. Fifty percent of service delivery points in Ethiopia provided services to 61 modern contraceptive users, with an interquartile range of 0.62. The expected log number of modern contraceptive users at rural service delivery points was 1.05 (95% Wald CI: -1.42 to -0.68) lower than at urban ones. In addition, the expected log count of modern contraceptive users at other facility types was 0.58 lower than at health centers. The number of nurses/midwives affected the number of modern contraceptive users: the incidence rate of modern contraceptive users increased by one with each additional nurse at the delivery point. Among the factors considered in this study, residence, region, facility type, the number of days per week family planning was offered, the number of nurses/midwives and the number of medical assistants were associated with the number of modern contraceptive users. Thus, the Government of Ethiopia should take immediate steps to address the factors affecting the number of modern contraceptive users in Ethiopia.
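Coefficients of a log-linear count model are differences in expected log counts, so exponentiating them gives incidence rate ratios. The sketch below applies this to the two differences quoted in the abstract. The function name is illustrative, and the resulting ratios are simple transformations of the reported numbers, not additional results from the study.

```python
import math

def rate_ratio(log_difference):
    """Incidence rate ratio implied by a difference in expected log counts."""
    return math.exp(log_difference)

# Rural versus urban: expected log count lower by 1.05 (95% Wald CI -1.42 to -0.68).
rural_irr = rate_ratio(-1.05)
rural_ci = (rate_ratio(-1.42), rate_ratio(-0.68))

# Other facility types versus health centers: expected log count lower by 0.58.
other_facility_irr = rate_ratio(-0.58)

print(f"rural vs urban IRR: {rural_irr:.2f} "
      f"(95% CI {rural_ci[0]:.2f} to {rural_ci[1]:.2f})")
print(f"other facility vs health center IRR: {other_facility_irr:.2f}")
```

Read this way, rural service delivery points are expected to serve roughly a third as many modern contraceptive users as urban ones, other covariates held fixed.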
Werner, Jan; Griebeler, Eva Maria
2014-01-01
We tested if growth rates of recent taxa are unequivocally separated between endotherms and ectotherms, and compared these to dinosaurian growth rates. We therefore performed linear regression analyses on the log-transformed maximum growth rate against log-transformed body mass at maximum growth for extant altricial birds, precocial birds, eutherians, marsupials, reptiles, fishes and dinosaurs. Regression models of precocial birds (and fishes) strongly differed from Case’s study (1978), which is often used to compare dinosaurian growth rates to those of extant vertebrates. For all taxonomic groups, the slope of 0.75 expected from the Metabolic Theory of Ecology was statistically supported. To compare growth rates between taxonomic groups we therefore used regressions with this fixed slope and group-specific intercepts. On average, maximum growth rates of ectotherms were about 10 (reptiles) to 20 (fishes) times (in comparison to mammals) or even 45 (reptiles) to 100 (fishes) times (in comparison to birds) lower than in endotherms. While on average all taxa were clearly separated from each other, individual growth rates overlapped between several taxa and even between endotherms and ectotherms. Dinosaurs had growth rates intermediate between similar sized/scaled-up reptiles and mammals, but a much lower rate than scaled-up birds. All dinosaurian growth rates were within the range of extant reptiles and mammals, and were lower than those of birds. Under the assumption that growth rate and metabolic rate are indeed linked, our results suggest two alternative interpretations. Compared to other sauropsids, the growth rates of studied dinosaurs clearly indicate that they had an ectothermic rather than an endothermic metabolic rate. 
Compared to other vertebrate growth rates, the overall high variability in growth rates of extant groups and the high overlap between individual growth rates of endothermic and ectothermic extant species make it impossible to rule out either of the two thermoregulation strategies for studied dinosaurs. PMID:24586409
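When the slope of a log-log regression is fixed in advance, the least-squares intercept for each group is simply the mean of log(y) minus slope times log(x), and the difference between two intercepts gives the factor by which the groups differ at equal body mass. The sketch below illustrates this with the fixed slope of 0.75 named in the abstract. The use of base-10 logarithms, the function names and the data are assumptions for illustration.

```python
import math

SLOPE = 0.75  # fixed allometric exponent used in the abstract

def fixed_slope_intercept(masses, growth_rates, slope=SLOPE):
    """Least-squares intercept of log10(rate) = a + slope * log10(mass), slope fixed."""
    residuals = [math.log10(g) - slope * math.log10(m)
                 for m, g in zip(masses, growth_rates)]
    return sum(residuals) / len(residuals)

def fold_difference(intercept_a, intercept_b):
    """Factor by which group A's rate exceeds group B's at the same body mass."""
    return 10.0 ** (intercept_a - intercept_b)

# Hypothetical groups lying exactly on 0.75-slope lines, ten-fold apart.
masses = [1.0, 10.0, 100.0, 1000.0]
endotherm_rates = [1.0 * m ** SLOPE for m in masses]
ectotherm_rates = [0.1 * m ** SLOPE for m in masses]

ratio = fold_difference(fixed_slope_intercept(masses, endotherm_rates),
                        fixed_slope_intercept(masses, ectotherm_rates))
print(f"endotherm rates are about {ratio:.1f} times the ectotherm rates")
```

Statements such as "about 10 to 20 times lower" in the abstract are comparisons of this kind between group-specific intercepts.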
Partitioning of polar and non-polar neutral organic chemicals into human and cow milk.
Geisler, Anett; Endo, Satoshi; Goss, Kai-Uwe
2011-10-01
The aim of this work was to develop a predictive model for milk/water partition coefficients of neutral organic compounds. Batch experiments were performed for 119 diverse organic chemicals in human milk and raw and processed cow milk at 37°C. No differences (<0.3 log units) in the partition coefficients of these types of milk were observed. The polyparameter linear free energy relationship model fit the calibration data well (SD=0.22 log units). An experimental validation data set including hormones and hormone active compounds was predicted satisfactorily by the model. An alternative modelling approach based on log Kow revealed a poorer performance. The model presented here provides a significant improvement in predicting enrichment of potentially hazardous chemicals in milk. In combination with physiologically based pharmacokinetic modelling this improvement in the estimation of milk/water partitioning coefficients may allow a better risk assessment for a wide range of neutral organic chemicals. Copyright © 2011 Elsevier Ltd. All rights reserved.
Statistical power for detecting trends with applications to seabird monitoring
Hatch, Shyla A.
2003-01-01
Power analysis is helpful in defining goals for ecological monitoring and evaluating the performance of ongoing efforts. I examined detection standards proposed for population monitoring of seabirds using two programs (MONITOR and TRENDS) specially designed for power analysis of trend data. Neither program models within- and among-years components of variance explicitly and independently, thus an error term that incorporates both components is an essential input. Residual variation in seabird counts consisted of day-to-day variation within years and unexplained variation among years in approximately equal parts. The appropriate measure of error for power analysis is the standard error of estimation (S.E.est) from a regression of annual means against year. Replicate counts within years are helpful in minimizing S.E.est but should not be treated as independent samples for estimating power to detect trends. Other issues include a choice of assumptions about variance structure and selection of an exponential or linear model of population change. Seabird count data are characterized by strong correlations between S.D. and mean, thus a constant CV model is appropriate for power calculations. Time series were fit about equally well with exponential or linear models, but log transformation ensures equal variances over time, a basic assumption of regression analysis. Using sample data from seabird monitoring in Alaska, I computed the number of years required (with annual censusing) to detect trends of -1.4% per year (50% decline in 50 years) and -2.7% per year (50% decline in 25 years). At α=0.05 and a desired power of 0.9, estimated study intervals ranged from 11 to 69 years depending on species, trend, software, and study design. Power to detect a negative trend of 6.7% per year (50% decline in 10 years) is suggested as an alternative standard for seabird monitoring that achieves a reasonable match between statistical and biological significance.
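The three trend standards quoted above follow from the exponential model of population change: a constant annual rate r gives a remaining fraction of (1 - r) raised to the number of years. A minimal sketch of that conversion, with an illustrative function name:

```python
def annual_rate_for_decline(total_decline, years):
    """Constant annual rate of decline (as a fraction) that produces
    `total_decline` (as a fraction) after `years` under exponential change."""
    return 1.0 - (1.0 - total_decline) ** (1.0 / years)

for years in (50, 25, 10):
    rate = annual_rate_for_decline(0.5, years)
    print(f"50% decline in {years} years: {100 * rate:.1f}% per year")
```

Under a linear model of change the same total decline would correspond to a different yearly figure, which is one reason the choice between exponential and linear models matters for power calculations.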
An evaluation of bias in propensity score-adjusted non-linear regression models.
Wan, Fei; Mitra, Nandita
2018-03-01
Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
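Non-collapsibility can be shown with plain arithmetic: even when a covariate is perfectly balanced between treatment arms, so that there is no confounding, omitting it changes the odds ratio but not a ratio on the log scale. The sketch below uses a hypothetical binary covariate and made-up risks, and it uses the risk ratio as a stand-in for the rate ratio discussed in the abstract; all names and numbers are assumptions.

```python
def odds(p):
    return p / (1.0 - p)

def risk_from_odds(o):
    return o / (1.0 + o)

# Binary covariate X, present in half of each treatment arm (no confounding).
# --- Logistic-type model: conditional odds ratio of 4 in both strata of X ---
untreated = {0: 0.10, 1: 0.50}                       # hypothetical baseline risks
treated = {x: risk_from_odds(4.0 * odds(p)) for x, p in untreated.items()}

marg_untreated = sum(untreated.values()) / 2.0       # average over X
marg_treated = sum(treated.values()) / 2.0
marginal_or = odds(marg_treated) / odds(marg_untreated)

# --- Log-link model: conditional risk ratio of 2 in both strata of X ---
untreated_rr = {0: 0.10, 1: 0.30}
treated_rr = {x: 2.0 * p for x, p in untreated_rr.items()}
marginal_rr = (sum(treated_rr.values()) / 2.0) / (sum(untreated_rr.values()) / 2.0)

print(f"conditional OR = 4.00, marginal OR = {marginal_or:.2f}")
print(f"conditional RR = 2.00, marginal RR = {marginal_rr:.2f}")
```

The marginal odds ratio is pulled toward 1 although nothing confounds the comparison, whereas the ratio on the log scale is unchanged. This is the property that, according to the abstract, makes omission of the remainder term harmful for odds and hazard ratios but not for rate ratios.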
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-07-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. 
CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and is dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
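The underestimation described above can be illustrated with a minimal sketch. It assumes a generic saturating exponential c(t) = c_inf + (c0 - c_inf) * exp(-k * t), which is not necessarily the exact parameterisation derived in the paper, and all parameter values (c0, c_inf, k, the 120 s closure time) are hypothetical. The code compares the slope of an ordinary least-squares line with the true initial slope dc/dt at t = 0.

```python
import math

def ols_slope(t, c):
    """Slope of the ordinary least-squares line through (t, c)."""
    n = len(t)
    t_mean = sum(t) / n
    c_mean = sum(c) / n
    sxy = sum((ti - t_mean) * (ci - c_mean) for ti, ci in zip(t, c))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return sxy / sxx

def exp_model(t, c0, c_inf, k):
    """Assumed saturating exponential for headspace concentration."""
    return c_inf + (c0 - c_inf) * math.exp(-k * t)

# Hypothetical values: start at 380 ppm, approach 600 ppm, rate 0.01 per second.
c0, c_inf, k = 380.0, 600.0, 0.01
times = [float(s) for s in range(0, 121, 5)]   # two-minute closure, 5 s sampling
conc = [exp_model(t, c0, c_inf, k) for t in times]

initial_slope = k * (c_inf - c0)       # analytic dc/dt at closure time t = 0
linear_slope = ols_slope(times, conc)  # what a linear fit would report
ratio = linear_slope / initial_slope   # below 1 means the flux is underestimated
```

Because the assumed curve flattens over time, the fitted line averages over progressively smaller slopes, so `ratio` falls below one even for this short closure time.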
Senn, Stephen; Graf, Erika; Caputo, Angelika
2007-12-30
Stratifying and matching by the propensity score are increasingly popular approaches to deal with confounding in medical studies investigating effects of a treatment or exposure. A more traditional alternative technique is the direct adjustment for confounding in regression models. This paper discusses fundamental differences between the two approaches, with a focus on linear regression and propensity score stratification, and identifies points to be considered for an adequate comparison. The treatment estimators are examined for unbiasedness and efficiency. This is illustrated in an application to real data and supplemented by an investigation on properties of the estimators for a range of underlying linear models. We demonstrate that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model. As a consequence, the coarsening property of the propensity score (adjustment for a one-dimensional confounder instead of a high-dimensional covariate) may be viewed as a way to implement a pre-specified, richly parametrized linear model. We conclude that the propensity score estimator inherits the potential for overfitting and that care should be taken to restrict covariates to those relevant for outcome. Copyright (c) 2007 John Wiley & Sons, Ltd.
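Propensity score stratification can be sketched in a few lines for the simplest case of a single discrete confounder. This is only an illustration under stated assumptions, not the estimator studied in the paper: the data are synthetic and noise-free, the propensity score is estimated as the treated fraction at each covariate level, and the function name `stratified_effect` is hypothetical.

```python
from collections import defaultdict

def mean(values):
    return sum(values) / len(values)

def stratified_effect(rows):
    """rows: iterable of (x, treated, y) with treated in {0, 1}.

    Estimates the propensity score as the treated fraction at each level of x,
    stratifies on that score, and averages within-stratum differences in means,
    weighting each stratum by its share of the sample.
    """
    rows = list(rows)
    counts = defaultdict(lambda: [0, 0])          # x -> [n, n_treated]
    for x, t, _ in rows:
        counts[x][0] += 1
        counts[x][1] += t
    score = {x: n_treated / n for x, (n, n_treated) in counts.items()}

    strata = defaultdict(lambda: {0: [], 1: []})  # score -> outcomes by arm
    for x, t, y in rows:
        strata[score[x]][t].append(y)

    effect = 0.0
    for arms in strata.values():
        weight = (len(arms[0]) + len(arms[1])) / len(rows)
        effect += weight * (mean(arms[1]) - mean(arms[0]))
    return effect

# Synthetic data: y = 2 * treated + 5 * x, with treatment more likely when x = 1.
data = ([(0, 0, 0.0)] * 8 + [(0, 1, 2.0)] * 2 +
        [(1, 0, 5.0)] * 2 + [(1, 1, 7.0)] * 8)

naive = mean([y for _, t, y in data if t == 1]) - mean([y for _, t, y in data if t == 0])
adjusted = stratified_effect(data)
```

In this constructed example the unadjusted difference in means is inflated by the confounder, while the stratified estimate recovers the treatment coefficient used to generate the data.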
Toxicity prediction of ionic liquids based on Daphnia magna by using density functional theory
NASA Astrophysics Data System (ADS)
Nu’aim, M. N.; Bustam, M. A.
2018-04-01
Density functional theory (DFT) can be used to predict the toxicity of ionic liquids. The theory gives the researcher a substantial tool for computing the quantum state of atoms, molecules and solids, and for molecular dynamics, which is also known as a computer simulation method. The prediction is made using quantum chemical reactivity descriptors based on structural features. The ionic liquids and their Log[EC50] data were taken from the literature, namely the thesis by Ismail Hossain entitled “Synthesis, Characterization and Quantitative Structure Toxicity Relationship of Imidazolium, Pyridinium and Ammonium Based Ionic Liquids”. Each cation and anion of the ionic liquids was optimized and calculated. The geometry optimization and calculation in the software produce the values of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). From the HOMO and LUMO values, the values of the other toxicity descriptors were obtained according to their formulas. The toxicity descriptors involved are electrophilicity index, HOMO, LUMO, energy gap, chemical potential, hardness and electronegativity. The interrelation between the descriptors was determined using multiple linear regression (MLR). With this MLR, all descriptors were analyzed and the significant ones were chosen. The selected significant descriptors were then used to develop the best model equation for toxicity prediction of ionic liquids. The model equation was validated against the Log[EC50] data from the literature, and the final model equation was developed. A wider range of ionic liquids, nearly 108, can be predicted from this model equation.
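The abstract does not state which formulas were used to derive the descriptors. A minimal sketch, assuming the commonly used conceptual DFT definitions based on frontier orbital energies (chemical potential as the HOMO/LUMO average, hardness as half the gap, electrophilicity index as the squared chemical potential over twice the hardness), could look as follows. The function name and the example energies are hypothetical, and some authors define hardness without the factor of one half.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Conceptual-DFT descriptors from frontier orbital energies (same unit, e.g. eV).

    Assumed definitions (not confirmed by the abstract):
      energy gap         = E_LUMO - E_HOMO
      chemical potential = (E_HOMO + E_LUMO) / 2
      electronegativity  = -chemical potential
      hardness           = (E_LUMO - E_HOMO) / 2
      electrophilicity   = chemical_potential**2 / (2 * hardness)
    """
    gap = e_lumo - e_homo
    mu = (e_homo + e_lumo) / 2.0
    eta = gap / 2.0
    return {
        "homo": e_homo,
        "lumo": e_lumo,
        "energy_gap": gap,
        "chemical_potential": mu,
        "electronegativity": -mu,
        "hardness": eta,
        "electrophilicity_index": mu ** 2 / (2.0 * eta),
    }

# Hypothetical orbital energies in eV, chosen only to make the arithmetic easy to follow.
example = reactivity_descriptors(e_homo=-6.0, e_lumo=-2.0)
```

A table of such descriptor dictionaries, one per ion, would then form the predictor matrix for the multiple linear regression against Log[EC50].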
ERIC Educational Resources Information Center
von Davier, Matthias
2014-01-01
Diagnostic models combine multiple binary latent variables in an attempt to produce a latent structure that provides more information about test takers' performance than do unidimensional latent variable models. Recent developments in diagnostic modeling emphasize the possibility that multiple skills may interact in a conjunctive way within the…
Lundblad, Runar; Abdelnoor, Michel; Svennevig, Jan Ludvig
2004-09-01
Simple linear resection and endoventricular patch plasty are alternative techniques to repair postinfarction left ventricular aneurysm. The aim of the study was to compare these 2 methods with regard to early mortality and long-term survival. We retrospectively reviewed 159 patients undergoing operations between 1989 and 2003. The epidemiologic design was of an exposed (simple linear repair, n = 74) versus nonexposed (endoventricular patch plasty, n = 85) cohort with 2 endpoints: early mortality and long-term survival. The crude effect of aneurysm repair technique versus endpoint was estimated by odds ratio, rate ratio, or relative risk and their 95% confidence intervals. Stratification analysis by using the Mantel-Haenszel method was done to quantify confounders and pinpoint effect modifiers. Adjustment for multiconfounders was performed by using logistic regression and Cox regression analysis. Survival curves were analyzed with the Breslow test and the log-rank test. Early mortality was 8.2% for all patients, 13.5% after linear repair and 3.5% after endoventricular patch plasty. When adjusted for multiconfounders, the risk of early mortality was significantly higher after simple linear repair than after endoventricular patch plasty (odds ratio, 4.4; 95% confidence interval, 1.1-17.8). Mean follow-up was 5.8 +/- 3.8 years (range, 0-14.0 years). Overall 5-year cumulative survival was 78%, 70.1% after linear repair and 91.4% after endoventricular patch plasty. The risk of total mortality was significantly higher after linear repair than after endoventricular patch plasty when controlled for multiconfounders (relative risk, 4.5; 95% confidence interval, 2.0-9.7). Linear repair dominated early in the series and patch plasty dominated later, giving a possible learning-curve bias in favor of patch plasty that could not be adjusted for in the regression analysis. Postinfarction left ventricular aneurysm can be repaired with satisfactory early and late results. 
Surgical risk was lower and long-term survival was higher after endoventricular patch plasty than simple linear repair. Differences in outcome should be interpreted with care because of the retrospective study design and the chronology of the 2 repair methods.
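The crude odds ratio for early mortality can be sketched from the figures in the abstract. The death counts below are an assumption: they are reconstructed by rounding the reported percentages (13.5% of 74 and 3.5% of 85), and the confidence interval uses the standard Woolf log-based approximation, which the abstract does not name. The result is the unadjusted estimate, not the adjusted odds ratio obtained from logistic regression.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    a: exposed with event, b: exposed without event,
    c: unexposed with event, d: unexposed without event.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts reconstructed from the reported percentages (an assumption, see above).
n_linear, n_patch = 74, 85
deaths_linear = round(0.135 * n_linear)
deaths_patch = round(0.035 * n_patch)

crude_or, ci_low, ci_high = odds_ratio_ci(
    deaths_linear, n_linear - deaths_linear,
    deaths_patch, n_patch - deaths_patch,
)
```

Comparing a crude estimate of this kind with the adjusted one is the usual way to judge how much the measured confounders shift the effect.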
Cho, In-Jeong; Chang, Hyuk-Jae; Heo, Ran; Kim, In-Cheol; Sung, Ji Min; Chang, Byung-Chul; Shim, Chi Young; Hong, Geu-Ru; Chung, Namsik
2017-01-01
Substantial aortic calcification is known to be associated with aortic stiffening and subsequent left ventricular (LV) hypertrophy. This study examined whether the thoracic aorta calcium score (TACS) is related to LV hypertrophy and whether it leads to an adverse prognosis in patients with severe aortic stenosis (AS) after aortic valve replacement (AVR). We retrospectively reviewed 47 patients (mean age, 64 ± 11 years) with isolated severe AS who underwent noncontrast computed tomography of the entire thoracic aorta and who received AVR. TACS was quantified using the volume method, and the values were log transformed (log[TACS+1]). Transthoracic echocardiography was performed before and 1 year after the operation. Preoperative LV mass index (LVMI) displayed significant positive correlations with male gender (r = 0.430, p = 0.010) and log(TACS+1) (r = 0.556, p = 0.003). In multivariate linear regression analysis, only log(TACS+1) was independently associated with LVMI, even after adjusting for age, gender, transaortic mean pressure gradient, and coronary or valve calcium score. Independent determinants of postoperative LVMI at the 1-year follow-up echocardiography included log(TACS+1) and preoperative LVMI, after adjusting for age, gender, indexed effective orifice area, and coronary or valve calcium score. During a median follow-up period of 54 months after AVR, there were 10 events (21%), which included 4 deaths from all causes, 3 strokes, 2 inpatient admissions for heart failure, and 1 myocardial infarction. The event-free survival rate was significantly lower for patients with TACS of 2,257 mm³ or higher compared with those whose TACS was lower than 2,257 mm³ (log-rank p < 0.001). High TACS was associated with increased LVMI among patients with severe AS. Further, high TACS usefully predicted less regression of LVMI and poor clinical outcomes after AVR.
TACS may serve as a useful proxy for predicting LV remodeling and adverse prognosis in patients with severe AS undergoing AVR. Copyright © 2017 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.
Thelin, E P; Zibung, E; Riddez, L; Nordenvall, C
2016-10-01
Worldwide, the use of bicycles, for both recreation and commuting, is increasing. S100B, a suggested protein biomarker for cerebral injury, has been shown to correlate with extracranial injury as well. Using serum levels of S100B, we aimed to investigate how S100B could be used when assessing injuries in patients suffering from bicycle trauma. As a secondary aim, we investigated how hospital length of stay and injury severity score (ISS) were correlated with S100B levels. We performed a retrospective database study including all patients admitted for bicycle trauma to a level 1 trauma center over a four-year period with admission samples of S100B (n = 127). Computerized tomography (CT) scans were reviewed and remaining data were collected from case records. Univariate and multivariate regression analyses, linear regressions and comparative statistics (Mann-Whitney) were used where appropriate. Both intra- and extracranial injuries were correlated with S100B levels. Stockholm CT score presented the best correlation of an intracranial parameter with S100B levels (p < 0.0001), while the presence of extremity injury, thoracic injury, and non-cervical spinal injury were also significantly correlated (all p < 0.0001). A multivariate linear regression revealed that Stockholm CT score, non-cervical spinal injury, and abdominal injury all independently correlated with levels of S100B. Patients with an ISS > 15 had higher S100B levels than patients with ISS < 16 (p < 0.0001). Patients with extracranial injuries, as well as patients with both intracranial and extracranial injuries, had significantly higher levels of S100B than patients without injuries (p < 0.05 and p < 0.01, respectively). The admission serum levels of S100B (log, µg/L) were correlated with ISS (log) (r = 0.53) and length of stay (log, days) (r = 0.45). S100B levels were independently correlated with intracranial pathology, but also with the extent of extracranial injury.
Length of stay and ISS were both correlated with the admission levels of S100B in bicycle trauma, suggesting S100B to be a good marker of aggregated injury severity. Further studies are warranted to confirm our findings.
Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara
2017-01-01
In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most common models for analyzing such complex longitudinal data are based on mean regression, which fails to provide efficient estimates in the presence of outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying the benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish Bayesian joint models that account for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.
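The contrast between mean regression and quantile regression can be illustrated with the check (pinball) loss that underlies quantile regression. The sketch below covers only that loss for an intercept-only model. It does not implement the Bayesian joint model, the censoring, or the measurement-error components described above, and the data values and function names are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss: weight tau above the fit, 1 - tau below it."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant(values, tau):
    """Intercept-only quantile fit: the observed value minimising total check loss."""
    return min(values, key=lambda q: sum(check_loss(y - q, tau) for y in values))

# Hypothetical measurements with one extreme outlier.
observations = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = best_constant(observations, tau=0.5)   # robust to the outlier
mean_fit = sum(observations) / len(observations)    # pulled towards the outlier
```

Changing `tau` moves the fit to other quantiles, which is what allows covariate effects to be examined at different levels of the response rather than only at its mean.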