regression modelling identified: Topics by Science.gov

Sample records for regression modelling identified

A simple approach to power and sample size calculations in logistic regression and Cox regression models.

PubMed

Vaeth, Michael; Skovlund, Eva

2004-06-15

For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
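The equivalent two-sample construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example values (slope 0.5, covariate SD 1.0, overall response probability 0.3) and the choice of a textbook normal-approximation formula for two proportions are all assumptions made here.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def equivalent_two_sample(beta, sd_x, p_overall):
    """Return (p1, p2) with logit(p2) - logit(p1) = 2*beta*sd_x and (p1 + p2)/2 = p_overall."""
    delta = abs(2.0 * beta * sd_x)
    # p1 is the smaller probability; p2 = 2*p_overall - p1 must stay below 1.
    lo = max(1e-12, 2.0 * p_overall - 1.0 + 1e-12)
    hi = p_overall
    for _ in range(200):  # bisection: the logit gap shrinks as p1 grows
        mid = 0.5 * (lo + hi)
        gap = logit(2.0 * p_overall - mid) - logit(mid)
        if gap > delta:
            lo = mid
        else:
            hi = mid
    p1 = 0.5 * (lo + hi)
    p2 = 2.0 * p_overall - p1
    return (p1, p2) if beta >= 0 else (p2, p1)


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Textbook normal-approximation sample size for comparing two proportions (assumed formula)."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    p_bar = 0.5 * (p1 + p2)
    num = (z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
           + z_b * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)


# Hypothetical logistic regression: slope 0.5 per unit of x, SD(x) = 1, 30% events overall.
p1, p2 = equivalent_two_sample(beta=0.5, sd_x=1.0, p_overall=0.3)
total_n = 2 * n_per_group(p1, p2)
```

The two groups keep the overall expected number of events unchanged because their probabilities average to the overall response probability; the total sample size for the regression problem is then read off as twice the per-group size.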
An application in identifying high-risk populations in alternative tobacco product use utilizing logistic regression and CART: a heuristic comparison.

PubMed

Lei, Yang; Nollen, Nikki; Ahluwahlia, Jasjit S; Yu, Qing; Mayo, Matthew S

2015-04-09

Other forms of tobacco use are increasing in prevalence, yet most tobacco control efforts are aimed at cigarettes. In light of this, it is important to identify individuals who are using both cigarettes and alternative tobacco products (ATPs). Most previous studies have used regression models. We fitted a traditional logistic regression model and a classification and regression tree (CART) model to illustrate and discuss the added advantages of using CART in the setting of identifying high-risk subgroups of ATP users among cigarette smokers. The data were collected from an online cross-sectional survey administered by Survey Sampling International between July 5, 2012 and August 15, 2012. Eligible participants self-identified as current smokers, African American, White, or Latino (of any race), were English-speaking, and were at least 25 years old. The study sample included 2,376 participants and was divided into independent training and validation samples for a hold-out validation. Logistic regression and CART models were used to examine the important predictors of cigarettes + ATP users. The logistic regression model identified nine important factors: gender, age, race, nicotine dependence, buying cigarettes or borrowing, whether the price of cigarettes influences the brand purchased, whether the participants set limits on cigarettes per day, alcohol use scores, and discrimination frequencies. The c-index of the logistic regression model was 0.74, indicating good discriminatory capability. The model also performed well in the validation cohort, with good discrimination (c-index = 0.73) and excellent calibration (R-square = 0.96 in the calibration regression). The parsimonious CART model identified gender, age, alcohol use score, race, and discrimination frequencies to be the most important factors. It also revealed interesting partial interactions. The c-index was 0.70 for the training sample and 0.69 for the validation sample. The misclassification rate was 0.342 for the training sample and 0.346 for the validation sample. The CART model was easier to interpret and discovered target populations that possess clinical significance. This study suggests that the non-parametric CART model is parsimonious, potentially easier to interpret, and provides additional information in identifying the subgroups at high risk of ATP use among cigarette smokers.
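The two performance measures quoted in this abstract, the c-index and the misclassification rate, can be computed as in the sketch below. The scores and labels are made-up illustration data, and the 0.5 cut-off is an assumption; the abstract does not state the threshold the authors used.

```python
def c_index(scores, labels):
    """Share of (event, non-event) pairs in which the event has the higher score; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def misclassification_rate(scores, labels, threshold=0.5):
    """Fraction of cases whose thresholded prediction disagrees with the label."""
    wrong = sum(1 for s, y in zip(scores, labels) if (s >= threshold) != (y == 1))
    return wrong / len(labels)


# Hypothetical predicted probabilities and observed outcomes (1 = cigarette + ATP user).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

cidx = c_index(scores, labels)
err = misclassification_rate(scores, labels)
```

Either measure can be evaluated on the training and validation samples separately, which is how the abstract reports them for both the logistic regression and the CART model.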
Evaluating the utility of companion animal tick surveillance practices for monitoring spread and occurrence of human Lyme disease in West Virginia, 2014-2016.

PubMed

Hendricks, Brian; Mark-Carew, Miguella; Conley, Jamison

2017-11-13

Domestic dogs and cats are potentially effective sentinel populations for monitoring occurrence and spread of Lyme disease. Few studies have evaluated the public health utility of sentinel programmes using geo-analytic approaches. Confirmed Lyme disease cases diagnosed by physicians and ticks submitted by veterinarians to the West Virginia State Health Department were obtained for 2014-2016. Ticks were identified to species, and only Ixodes scapularis were incorporated in the analysis. Separate ordinary least squares (OLS) and spatial lag regression models were conducted to estimate the association between average numbers of Ix. scapularis collected on pets and human Lyme disease incidence. Regression residuals were visualised using Local Moran's I as a diagnostic tool to identify spatial dependence. Statistically significant associations were identified between average numbers of Ix. scapularis collected from dogs and human Lyme disease in the OLS (β=20.7, P<0.001) and spatial lag (β=12.0, P=0.002) regression. No significant associations were identified for cats in either regression model. Statistically significant (P≤0.05) spatial dependence was identified in all regression models. Local Moran's I maps produced for spatial lag regression residuals indicated a decrease in model over- and under-estimation, but identified a higher number of statistically significant outliers than OLS regression. Results support previous conclusions that dogs are effective sentinel populations for monitoring risk of human exposure to Lyme disease. Findings reinforce the utility of spatial analysis of surveillance data, and highlight West Virginia's unique position within the eastern United States in regards to Lyme disease occurrence.
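The residual diagnostic mentioned in this abstract can be illustrated with a small sketch of Local Moran's I, using the usual definition in which each local value is the standardised residual times the weighted sum of its neighbours' residuals. The five-unit chain of neighbours and the residual values are hypothetical; the study's actual county weights are not given in the abstract.

```python
def local_morans_i(values, weights):
    """Local Moran's I for each unit: (z_i / m2) * sum_j w_ij * z_j, with m2 = mean(z^2)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n)) for i in range(n)]


def global_morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(zi * zi for zi in z)


# Hypothetical regression residuals for five areas arranged in a line;
# weights[i][j] = 1 when areas i and j share a border.
residuals = [2.0, 1.5, 0.1, -1.4, -2.2]
weights = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

local_i = local_morans_i(residuals, weights)
global_i = global_morans_i(residuals, weights)
```

Positive values indicate that neighbouring residuals have the same sign, which is the spatial dependence that a spatial lag model is meant to absorb.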
Modeling and forecasting US presidential election using learning algorithms

NASA Astrophysics Data System (ADS)

Zolghadr, Mohammad; Niaki, Seyed Armin Akhavan; Niaki, S. T. A.

2017-09-01

The primary objective of this research is to obtain an accurate forecasting model for the US presidential election. To identify a reliable model, artificial neural networks (ANN) and support vector regression (SVR) models are compared based on some specified performance measures. Moreover, six independent variables such as GDP, unemployment rate, the president's approval rate, and others are considered in a stepwise regression to identify significant variables. The president's approval rate is identified as the most significant variable, based on which eight other variables are identified and considered in the model development. Preprocessing methods are applied to prepare the data for the learning algorithms. The proposed procedure significantly increases the accuracy of the model by 50%. The learning algorithms (ANN and SVR) proved to be superior to linear regression based on each method's calculated performance measures. The SVR model is identified as the most accurate model among the other models as this model successfully predicted the outcome of the election in the last three elections (2004, 2008, and 2012). The proposed approach significantly increases the accuracy of the forecast.
Predicting U.S. Army Reserve Unit Manning Using Market Demographics

DTIC Science & Technology

2015-06-01

develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.
Poisson Mixture Regression Models for Heart Disease Prediction.

PubMed

Mufudza, Chipo; Erol, Hamza

2016-01-01

Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criterion value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart disease prediction over all models, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise given the available clusters. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using a Poisson mixture regression model.
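The idea of clustering counts with a Poisson mixture and comparing models by BIC can be sketched as below. This is a deliberately reduced version: an intercept-only two-component mixture fitted by EM, with no covariates, no concomitant variables and no zero inflation, so it is not the model class studied in the paper. The counts are invented illustration data.

```python
import math


def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def component_logs(k, pi, lam1, lam2):
    """Log of (mixing weight * Poisson pmf) for each of the two components."""
    return (math.log(pi) + poisson_logpmf(k, lam1),
            math.log(1.0 - pi) + poisson_logpmf(k, lam2))


def fit_two_poisson(counts, iters=200):
    """EM for a two-component Poisson mixture without covariates."""
    n = len(counts)
    mean = sum(counts) / n
    pi, lam1, lam2 = 0.5, 0.5 * mean, 1.5 * mean  # crude starting values
    for _ in range(iters):
        # E-step: posterior probability that each count comes from component 1.
        resp = []
        for k in counts:
            a, b = component_logs(k, pi, lam1, lam2)
            m = max(a, b)
            resp.append(math.exp(a - m) / (math.exp(a - m) + math.exp(b - m)))
        # M-step: weighted proportion and weighted means.
        r_sum = sum(resp)
        pi = r_sum / n
        lam1 = sum(r * k for r, k in zip(resp, counts)) / r_sum
        lam2 = sum((1.0 - r) * k for r, k in zip(resp, counts)) / (n - r_sum)
    return pi, lam1, lam2


def mixture_loglik(counts, pi, lam1, lam2):
    total = 0.0
    for k in counts:
        a, b = component_logs(k, pi, lam1, lam2)
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def bic(loglik, n_params, n_obs):
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical counts from a low-rate and a high-rate group.
counts = [0, 1, 2, 2, 3, 1, 2, 3, 4, 2, 12, 15, 14, 16, 13, 17, 15, 14, 18, 16]
pi_hat, lam_low, lam_high = fit_two_poisson(counts)

single_rate = sum(counts) / len(counts)
bic_single = bic(sum(poisson_logpmf(k, single_rate) for k in counts), 1, len(counts))
bic_mixture = bic(mixture_loglik(counts, pi_hat, lam_low, lam_high), 3, len(counts))
```

A lower BIC for the mixture than for the single Poisson model is the kind of evidence the abstract cites when it prefers one model over another.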
Poisson Mixture Regression Models for Heart Disease Prediction

PubMed Central

Erol, Hamza

2016-01-01

PMID:27999611
Regression Models for Identifying Noise Sources in Magnetic Resonance Images

PubMed Central

Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

2009-01-01

Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478
Regression: The Apple Does Not Fall Far From the Tree.

PubMed

Vetter, Thomas R; Schober, Patrick

2018-05-15

Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
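The link between correlation, the fitted slope and regression toward the mean can be made concrete with a short sketch. The paired measurements are invented for illustration, and the helper names are not from the tutorial.

```python
import math
from statistics import mean, pstdev


def simple_ols(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements (for example, a first and a second reading).
x = [160, 165, 170, 175, 180]
y = [166, 167, 169, 174, 174]

intercept, slope = simple_ols(x, y)
r = pearson_r(x, y)

# Regression toward the mean: an x value z_x standard deviations above its mean
# predicts a y value only r * z_x standard deviations above the mean of y.
z_x = (180 - mean(x)) / pstdev(x)
z_y_pred = (intercept + slope * 180 - mean(y)) / pstdev(y)
```

Because the slope equals r times the ratio of the standard deviations, any correlation below 1 in absolute value pulls the standardised prediction toward the mean.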
Detecting influential observations in nonlinear regression modeling of groundwater flow

USGS Publications Warehouse

Yager, Richard M.

1998-01-01

Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
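Cook's D, as described in this abstract, measures how much the fitted values move when one observation is omitted. The sketch below computes it for a simple straight-line fit, both from the leverage formula and by actually deleting each point. It covers only the linear case that the influence statistics were developed for; the data are invented and the groundwater model itself is not reproduced.

```python
def fit_line(x, y):
    """Least-squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def cooks_distance(x, y):
    """Cook's D from residuals and leverages (p = 2 parameters)."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    out = []
    for a, e in zip(x, resid):
        h = 1.0 / n + (a - mx) ** 2 / sxx  # leverage of this observation
        out.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return out


def cooks_distance_by_deletion(x, y):
    """Same statistic obtained by refitting with each observation left out."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    s2 = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - p)
    out = []
    for i in range(n):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum(((b0 + b1 * a) - (c0 + c1 * a)) ** 2 for a in x)
        out.append(shift / (p * s2))
    return out


# Hypothetical observations; the last one is far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

d_formula = cooks_distance(x, y)
d_deletion = cooks_distance_by_deletion(x, y)
most_influential = d_formula.index(max(d_formula))
```

For an approximately linear model the two routes agree, which is why the leverage-based formula can stand in for repeated refitting.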
Modelling nitrate pollution pressure using a multivariate statistical approach: the case of Kinshasa groundwater body, Democratic Republic of Congo

NASA Astrophysics Data System (ADS)

Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik

2016-03-01

A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.
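The building block of a regression tree is a single split that minimises the within-leaf sum of squared errors. The sketch below finds that split for one predictor; the predictor (share of residential land in the capture zone) and the nitrate values are hypothetical, and a full tree as used in the study would apply this search recursively over several predictors.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse) of the single split on x with the smallest within-leaf SSE."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        sse = sum_sq(left) + sum_sq(right)
        if sse < best[1]:
            best = (0.5 * (pairs[i - 1][0] + pairs[i][0]), sse)
    return best


# Hypothetical data: residential share of the capture zone (%) and nitrate (mg/L).
residential_share = [5, 10, 15, 40, 50, 60]
nitrate = [3.0, 4.0, 3.5, 20.0, 22.0, 21.0]

threshold, sse = best_split(residential_share, nitrate)
```

A step-like relation of this kind is one situation in which a tree can fit better than a linear multiple regression.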
Bias in logistic regression due to imperfect diagnostic test results and practical correction approaches.

PubMed

Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul

2015-11-04

Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
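The bias this abstract warns about can be seen with simple arithmetic: when test sensitivity and specificity are below 1, the observed odds ratio is pulled toward 1. The sketch below assumes non-differential misclassification with made-up values (sensitivity 0.80, specificity 0.95, true prevalences 30% and 10%); it is not one of the Bayesian models proposed in the paper.

```python
def observed_positive_rate(true_prev, sensitivity, specificity):
    """Probability of a positive test given the true prevalence and test accuracy."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def odds_ratio(p_exposed, p_unexposed):
    return (p_exposed / (1.0 - p_exposed)) / (p_unexposed / (1.0 - p_unexposed))


# Hypothetical true infection prevalence in exposed and unexposed groups.
true_exposed, true_unexposed = 0.30, 0.10
true_or = odds_ratio(true_exposed, true_unexposed)

# The same imperfect test applied to both groups.
obs_exposed = observed_positive_rate(true_exposed, sensitivity=0.80, specificity=0.95)
obs_unexposed = observed_positive_rate(true_unexposed, sensitivity=0.80, specificity=0.95)
observed_or = odds_ratio(obs_exposed, obs_unexposed)
```

A standard logistic regression fitted to the test results estimates the observed odds ratio, which is why a real risk factor can be missed when misclassification is ignored.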
An adaptive two-stage analog/regression model for probabilistic prediction of small-scale precipitation in France

NASA Astrophysics Data System (ADS)

Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine

2018-01-01

Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation for instance and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the value of the corresponding regression coefficients can vary from one prediction day to another. The model allows thus for a day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
Multivariate generalized hidden Markov regression models with random covariates: Physical exercise in an elderly population.

PubMed

Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello

2018-04-22

A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which are the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent of the covariate distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovery of the clusters in the data with respect to a fixed-covariates paradigm. The class of hidden Markov regression models with random covariates is defined focusing on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data. Copyright © 2018 John Wiley & Sons, Ltd.
Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

NASA Astrophysics Data System (ADS)

Zhang, Ying; Bi, Peng; Hiller, Janet

2008-01-01

This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
A Linear Regression Model Identifying the Primary Factors Contributing to Maintenance Man Hours for the C-17 Globemaster III in the Air National Guard

DTIC Science & Technology

2012-06-15

...total variability in the data. It is an indication of how much of the variation in the data can be accounted for in the regression model. In... Variance Inflation Factors for each independent variable (predictor) as regressed against all of the other independent variables in the model. The...
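The inflation factor described in this excerpt is computed by regressing each predictor on the others and taking 1 / (1 - R²). With only two predictors, that R² is the squared correlation between them, which the sketch below uses. The predictor values are invented and do not come from the thesis.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Inflation factor when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values for five observations.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

vif = vif_two_predictors(x1, x2)
```

A value near 1 means the predictor is nearly uncorrelated with the others; large values signal multicollinearity. With more than two predictors the R² has to come from a multiple regression of each predictor on all the rest.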
Detection of outliers in the response and explanatory variables of the simple circular regression model

NASA Astrophysics Data System (ADS)

Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

2016-06-01

The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points, called outliers, in the data set causes many problems in the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measurements: the proportion of detected outliers, and the masking and swamping rates.
A new multiple regression model to identify multi-family houses with a high prevalence of sick building symptoms "SBS", within the healthy sustainable house study in Stockholm (3H).

PubMed

Engvall, Karin; Hult, M; Corner, R; Lampa, E; Norbäck, D; Emenius, G

2010-01-01

The aim was to develop a new model to identify residential buildings with higher frequencies of "SBS" than expected, "risk buildings". In 2005, 481 multi-family buildings with 10,506 dwellings in Stockholm were studied by a new stratified random sampling. A standardised self-administered questionnaire was used to assess "SBS", atopy and personal factors. The response rate was 73%. Statistical analysis was performed by multiple logistic regression. Dwellers owning their building reported less "SBS" than those renting. There was a strong relationship between socio-economic factors and ownership. The regression model ended up with high explanatory values for age, gender, atopy and ownership. Applying our model, 9% of all residential buildings in Stockholm were classified as "risk buildings" with the highest proportion in houses built 1961-1975 (26%) and lowest in houses built 1985-1990 (4%). To identify "risk buildings", it is necessary to adjust for ownership and population characteristics.
Moderation analysis using a two-level regression model.

PubMed

Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

2014-10-01

Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
A novel strategy for forensic age prediction by DNA methylation and support vector regression model

PubMed Central

Xu, Cheng; Qu, Hongzhu; Wang, Guangyu; Xie, Bingbing; Shi, Yi; Yang, Yaran; Zhao, Zhao; Hu, Lan; Fang, Xiangdong; Yan, Jiangwei; Feng, Lei

2015-01-01

High deviations resulting from prediction model, gender and population differences have limited the application of DNA methylation markers to age estimation. Here we identified 2,957 novel age-associated DNA methylation sites (P < 0.01 and R² > 0.5) in blood of eight pairs of Chinese Han female monozygotic twins. Among them, nine novel sites (false discovery rate < 0.01), along with three other reported sites, were further validated in 49 unrelated female volunteers with ages of 20–80 years by Sequenom Massarray. A total of 95 CpGs were covered in the PCR products and 11 of them were used to build the age prediction models. After comparing four different models, including multivariate linear regression, multivariate nonlinear regression, back propagation neural network and support vector regression, SVR was identified as the most robust model with the least mean absolute deviation from real chronological age (2.8 years) and an average accuracy of 4.7 years predicted by only six loci from the 11 loci, as well as a smaller cross-validated error compared with the linear regression model. Our novel strategy provides an accurate measurement that is highly useful in estimating the individual age in forensic practice as well as in tracking the aging process in other related applications. PMID:26635134

Predicting recreational water quality advisories: A comparison of statistical methods

USGS Publications Warehouse

Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.

2016-01-01

Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to “nowcast” the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.
Logistic regression models of factors influencing the location of bioenergy and biofuels plants

Treesearch

T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu

2011-01-01

Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...
Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

ERIC Educational Resources Information Center

Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

2014-01-01

This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…
Regression Analysis of Physician Distribution to Identify Areas of Need: Some Preliminary Findings.

ERIC Educational Resources Information Center

Morgan, Bruce B.; And Others

A regression analysis was conducted of factors that help to explain the variance in physician distribution and which identify those factors that influence the maldistribution of physicians. Models were developed for different geographic areas to determine the most appropriate unit of analysis for the Western Missouri Area Health Education Center…
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data

NASA Technical Reports Server (NTRS)

Ulbrich, Norbert Manfred; Volden, Thomas R.

2010-01-01

The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis

ERIC Educational Resources Information Center

Camilleri, Liberato; Cefai, Carmel

2013-01-01

Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Applying quantile regression for modeling equivalent property damage only crashes to identify accident blackspots.

PubMed

Washington, Simon; Haque, Md Mazharul; Oh, Jutaek; Lee, Dongmin

2014-05-01

Hot spot identification (HSID) aims to identify potential sites (roadway segments, intersections, crosswalks, interchanges, ramps, etc.) with disproportionately high crash risk relative to similar sites. An inefficient HSID methodology might result in either identifying a safe site as high risk (false positive) or a high risk site as safe (false negative), and consequently lead to the misuse of the available public funds, to poor investment decisions, and to inefficient risk management practice. Current HSID methods suffer from issues like underreporting of minor injury and property damage only (PDO) crashes, challenges of incorporating crash severity into the methodology, and selection of a proper safety performance function to model crash data that is often heavily skewed by a preponderance of zeros. Addressing these challenges, this paper proposes a combination of a PDO equivalency calculation and quantile regression technique to identify hot spots in a transportation network. In particular, issues related to underreporting and crash severity are tackled by incorporating equivalent PDO crashes, whilst the concerns related to the non-count nature of equivalent PDO crashes and the skewness of crash data are addressed by the non-parametric quantile regression technique. The proposed method identifies covariate effects on various quantiles of a population, rather than on the population mean as most methods do, which corresponds more closely with how black spots are identified in practice. The proposed methodology is illustrated using rural road segment data from Korea and compared against the traditional EB method with negative binomial regression.
Application of a quantile regression model on equivalent PDO crashes enables identification of a set of high-risk sites that reflect the true safety costs to society, simultaneously reduces the influence of under-reported PDO and minor injury crashes, and overcomes the limitations of the traditional NB model in dealing with a preponderance of zeros or a right-skewed dataset. Copyright © 2014 Elsevier Ltd. All rights reserved.
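The idea of modelling quantiles rather than the mean can be illustrated with the check (pinball) loss that quantile regression minimises. The sketch below fits only a constant, not a regression on covariates, and the crash-cost figures are hypothetical values chosen to be right-skewed with many zeros; none of it is taken from the paper.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for one residual y - q at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant_quantile(values, tau):
    """Return the observed value that minimises the total pinball loss."""
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))


# Hypothetical equivalent-PDO costs per site: many zeros and one extreme site.
costs = [0, 0, 0, 1, 2, 3, 50]

median_fit = best_constant_quantile(costs, 0.5)   # tau = 0.5 targets the median
upper_fit = best_constant_quantile(costs, 0.9)    # tau = 0.9 targets the upper tail
mean_cost = sum(costs) / len(costs)               # the mean is pulled up by the extreme site
```

With these figures the median fit stays at a typical site while the upper-quantile fit lands on the extreme site, which is the behaviour that makes upper quantiles a natural target when screening for high-risk locations.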
Variable selection and model choice in geoadditive regression models.

PubMed

Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

2009-06-01

Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
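How a boosting algorithm can perform variable selection as a by-product may be easier to see in a stripped-down form. The sketch below is componentwise L2 boosting with plain linear base learners, a deliberate simplification of the penalized-spline and spatial base learners described in the abstract; the function name, step size and data are illustrative assumptions.

```python
def componentwise_boost(columns, y, steps=100, nu=0.1):
    """Componentwise L2 boosting with one linear base learner per covariate.

    columns: list of centred covariate vectors; y: centred response.
    Returns one coefficient per covariate; covariates never selected stay at 0.
    """
    coef = [0.0] * len(columns)
    resid = list(y)
    for _ in range(steps):
        best = None
        for j, x in enumerate(columns):
            sxx = sum(v * v for v in x)
            b = sum(v * r for v, r in zip(x, resid)) / sxx
            gain = b * b * sxx  # reduction in residual sum of squares
            if best is None or gain > best[0]:
                best = (gain, j, b)
        _, j, b = best
        # Update only the best-fitting covariate, shrunk by the step size nu.
        coef[j] += nu * b
        resid = [r - nu * b * v for r, v in zip(resid, columns[j])]
    return coef


# Hypothetical data: the response depends on x_signal only.
x_signal = [-2.0, -1.0, 0.0, 1.0, 2.0]
x_noise = [1.0, -2.0, 2.0, -2.0, 1.0]  # orthogonal to x_signal, unrelated to y
y = [2.0 * v for v in x_signal]
coef = componentwise_boost([x_signal, x_noise], y)
```

Because each iteration updates a single covariate, a covariate that never gives the largest reduction in residual sum of squares keeps a coefficient of exactly zero, so selection falls out of the fitting procedure itself.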
Two models for identification and predicting behaviour of an induction motor system

NASA Astrophysics Data System (ADS)

Kuo, Chien-Hsun

2018-01-01

System identification or modelling is the process of building mathematical models of dynamical systems based on the available input and output data from the systems. This paper introduces system identification by using ARX (Auto Regressive with eXogeneous input) and ARMAX (Auto Regressive Moving Average with eXogeneous input) models. Through the identified system model, the predicted output could be compared with the measured one to help prevent the motor faults from developing into a catastrophic machine failure and avoid unnecessary costs and delays caused by the need to carry out unscheduled repairs. The induction motor system is illustrated as an example. Numerical and experimental results are shown for the identified induction motor system.
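A first-order ARX model, y[t] = a*y[t-1] + b*u[t-1], is enough to sketch the identify-then-compare workflow described here. The model order, the parameter values and the input signal below are assumptions made for illustration; they are not taken from the induction motor study.

```python
import math


def simulate_arx(a, b, u):
    """Simulate y[t] = a*y[t-1] + b*u[t-1] with y[0] = 0 and no noise."""
    y = [0.0]
    for t in range(1, len(u)):
        y.append(a * y[t - 1] + b * u[t - 1])
    return y


def fit_arx(y, u):
    """Least-squares estimate of (a, b) from the 2x2 normal equations."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        py, pu = y[t - 1], u[t - 1]
        s_yy += py * py
        s_yu += py * pu
        s_uu += pu * pu
        r_y += py * y[t]
        r_u += pu * y[t]
    det = s_yy * s_uu - s_yu * s_yu
    a_hat = (r_y * s_uu - r_u * s_yu) / det
    b_hat = (s_yy * r_u - s_yu * r_y) / det
    return a_hat, b_hat


def one_step_residuals(y, u, a, b):
    """Measured output minus the model's one-step-ahead prediction."""
    return [y[t] - (a * y[t - 1] + b * u[t - 1]) for t in range(1, len(y))]


# Hypothetical input made of two sinusoids so that both parameters are identifiable.
u = [math.sin(0.7 * t) + 0.5 * math.cos(1.9 * t) for t in range(200)]
y = simulate_arx(0.8, 0.5, u)
a_hat, b_hat = fit_arx(y, u)
residuals = one_step_residuals(y, u, a_hat, b_hat)
```

In a monitoring setting, residuals that grow well beyond their usual level would suggest that the measured behaviour has drifted away from the identified model. An ARMAX model adds a moving-average noise term and generally needs an iterative estimator, which is not shown.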
Comparison of Linear and Non-linear Regression Analysis to Determine Pulmonary Pressure in Hyperthyroidism.

PubMed

Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan

2017-01-01

This study aimed at assessing the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension, the statistical method of comparing the values of arithmetical means was used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by a linear or non-linear function. By applying the linear regression method described by a first-degree equation, the line of regression (linear model) was determined; by applying the non-linear regression method described by a second-degree equation, a parabola-type curve of regression (non-linear or polynomial model) was determined. We compared and validated these two models by calculating the determination coefficient (criterion 1), comparing residuals (criterion 2), applying the AIC criterion (criterion 3) and using the F-test (criterion 4). In the H-group, 47% had pulmonary hypertension that was completely reversible once euthyroidism was achieved. The factors causing pulmonary hypertension were identified: previously known factors (level of free thyroxin, pulmonary vascular resistance, cardiac output) and new factors identified in this study (pretreatment period, age, systolic blood pressure). According to the four criteria and to clinical judgment, we consider that the polynomial model (graphically parabola-type) is better than the linear one.
The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree where the parabola is its graphical representation.
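Comparing a first-degree and a second-degree fit by the determination coefficient, AIC and an F statistic can be sketched as follows. The data are synthetic, the AIC is the Gaussian form up to an additive constant with k counting only the polynomial coefficients, and the helper names are illustrative assumptions rather than the authors' procedure.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_stats(xs, ys, degree):
    """Least-squares polynomial fit; returns R^2, AIC and residual sum of squares."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    coef = solve(ata, aty)  # coefficients, lowest power first
    pred = [sum(c * x ** p for p, c in enumerate(coef)) for x in xs]
    rss = sum((y - p) ** 2 for y, p in zip(ys, pred))
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    n = len(xs)
    return {"r2": 1 - rss / tss, "aic": n * math.log(rss / n) + 2 * k, "rss": rss}


# Synthetic curved data with a small alternating disturbance (not patient data).
xs = [float(i) for i in range(10)]
ys = [1.0 + 0.5 * x * x + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]

lin = fit_stats(xs, ys, 1)
quad = fit_stats(xs, ys, 2)
# F statistic for the one extra term in the nested quadratic model.
f_stat = (lin["rss"] - quad["rss"]) / (quad["rss"] / (len(xs) - 3))
```

Because the linear model is nested in the quadratic one, R² can only go up when the squared term is added; AIC and the F statistic indicate whether that gain justifies the extra parameter.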
Retargeted Least Squares Regression Algorithm.

PubMed

Zhang, Xu-Yao; Wang, Lingfeng; Xiang, Shiming; Liu, Cheng-Lin

2015-09-01

This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to learn the regression targets directly from data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large margin constraint for the requirement of correct classification for each data point. Compared with the traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single and compact model, hence there is no need to train two-class (binary) machines that are independent of each other. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure including regression and retargeting as substeps. The experimental evaluation over a range of databases confirms the validity of our method.
Multiple-Instance Regression with Structured Data

NASA Technical Reports Server (NTRS)

Wagstaff, Kiri L.; Lane, Terran; Roper, Alex

2008-01-01

We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Using exploratory regression to identify optimal driving factors for cellular automaton modeling of land use change.

PubMed

Feng, Yongjiu; Tong, Xiaohua

2017-09-22

Defining transition rules is an important issue in cellular automaton (CA)-based land use modeling because these models incorporate highly correlated driving factors. Multicollinearity among correlated driving factors may produce negative effects that must be eliminated from the modeling. Using exploratory regression under pre-defined criteria, we identified all possible combinations of factors from the candidate factors affecting land use change. Three combinations that incorporate five driving factors meeting pre-defined criteria were assessed. With the selected combinations of factors, three logistic regression-based CA models were built to simulate dynamic land use change in Shanghai, China, from 2000 to 2015. For comparative purposes, a CA model with all candidate factors was also applied to simulate the land use change. Simulations using three CA models with multicollinearity eliminated performed better (with accuracy improvements about 3.6%) than the model incorporating all candidate factors. Our results showed that not all candidate factors are necessary for accurate CA modeling and the simulations were not sensitive to changes in statistically non-significant driving factors. We conclude that exploratory regression is an effective method to search for the optimal combinations of driving factors, leading to better land use change models that are devoid of multicollinearity. We suggest identification of dominant factors and elimination of multicollinearity before building land change models, making it possible to simulate more realistic outcomes.
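The abstract does not say which pre-defined criteria were used to screen factor combinations. One common multicollinearity check is the variance inflation factor (VIF), sketched below for the two-predictor case, where VIF = 1 / (1 - r²). Treating VIF as the criterion, the threshold of 10 and the variable names are all assumptions made for illustration.

```python
import math


def pearson_r(x, z):
    """Pearson correlation between two equally long sequences."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / math.sqrt(sxx * szz)


def vif_two_predictors(x, z):
    """VIF of either predictor when the model contains exactly these two."""
    r = pearson_r(x, z)
    return 1.0 / (1.0 - r * r)


# Hypothetical driving factors: two nearly collinear, two unrelated.
distance_to_road = [1.0, 2.0, 3.0, 4.0, 5.0]
distance_to_centre = [1.1, 1.9, 3.2, 3.9, 5.1]  # tracks distance_to_road closely
slope = [-2.0, -1.0, 0.0, 1.0, 2.0]
elevation = [1.0, -2.0, 2.0, -2.0, 1.0]         # uncorrelated with slope

vif_collinear = vif_two_predictors(distance_to_road, distance_to_centre)
vif_independent = vif_two_predictors(slope, elevation)
keep_both = vif_collinear < 10  # assumed rule of thumb, not from the paper
```

With more than two predictors, each VIF would come from regressing one predictor on all the others; the pairwise form is shown only because it fits in a few lines.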
Identifying Interacting Genetic Variations by Fish-Swarm Logic Regression

PubMed Central

Yang, Aiyuan; Yan, Chunxia; Zhu, Feng; Zhao, Zhongmeng; Cao, Zhi

2013-01-01

Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is that of identifying a set of interacting genetic variants, which might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents runs in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. A swarm algorithm improves the accuracy and the efficiency by speeding up convergence and preventing the search from becoming trapped in local optima. We apply our approach to a real screening dataset and a series of simulation scenarios. Compared with three existing LR-based approaches, our approach has lower type I and type II error rates, identifies more preset causal sites, and runs faster. PMID:23984382
Population heterogeneity in the salience of multiple risk factors for adolescent delinquency.

PubMed

Lanza, Stephanie T; Cooper, Brittany R; Bray, Bethany C

2014-03-01

To present mixture regression analysis as an alternative to more standard regression analysis for predicting adolescent delinquency. We demonstrate how mixture regression analysis allows for the identification of population subgroups defined by the salience of multiple risk factors. We identified population subgroups (i.e., latent classes) of individuals based on their coefficients in a regression model predicting adolescent delinquency from eight previously established risk indices drawn from the community, school, family, peer, and individual levels. The study included N = 37,763 10th-grade adolescents who participated in the Communities That Care Youth Survey. Standard, zero-inflated, and mixture Poisson and negative binomial regression models were considered. Standard and mixture negative binomial regression models were selected as optimal. The five-class regression model was interpreted based on the class-specific regression coefficients, indicating that risk factors had varying salience across classes of adolescents. Standard regression showed that all risk factors were significantly associated with delinquency. Mixture regression provided more nuanced information, suggesting a unique set of risk factors that were salient for different subgroups of adolescents. Implications for the design of subgroup-specific interventions are discussed. Copyright © 2014 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
Testing hypotheses for differences between linear regression lines

Treesearch

Stanley J. Zarnoch

2009-01-01

Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

PubMed

Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

2011-01-01

Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Subgroup identification of early preterm birth (ePTB): informing a future prospective enrichment clinical trial design.

PubMed

Zhang, Chuanwu; Garrard, Lili; Keighley, John; Carlson, Susan; Gajewski, Byron

2017-01-10

Despite the widely recognized association between the severity of early preterm birth (ePTB) and its related severe diseases, little is known about the potential risk factors of ePTB and the sub-population with high risk of ePTB. Moreover, motivated by a future confirmatory clinical trial to identify whether supplementing pregnant women with docosahexaenoic acid (DHA) has a different effect on the risk subgroup population or not in terms of ePTB prevalence, this study aims to identify potential risk subgroups and risk factors for ePTB, defined as babies born less than 34 weeks of gestation. The analysis data (N = 3,994,872) were obtained from CDC and NCHS' 2014 Natality public data file. The sample was split into independent training and validation cohorts for model generation and model assessment, respectively. Logistic regression and CART models were used to examine potential ePTB risk predictors and their interactions, including mothers' age, nativity, race, Hispanic origin, marital status, education, pre-pregnancy smoking status, pre-pregnancy BMI, pre-pregnancy diabetes status, pre-pregnancy hypertension status, previous preterm birth status, infertility treatment usage status, fertility enhancing drug usage status, and delivery payment source. Both logistic regression models with either 14 or 10 ePTB risk factors produced the same C-index (0.646) based on the training cohort. The C-index of the logistic regression model based on 10 predictors was 0.645 for the validation cohort. Both C-indexes indicated a good discrimination and acceptable model fit. The CART model identified preterm birth history and race as the most important risk factors, and revealed that the subgroup with a preterm birth history and a race designation as Black had the highest risk for ePTB. The c-index and misclassification rate were 0.579 and 0.034 for the training cohort, and 0.578 and 0.034 for the validation cohort, respectively. 
This study revealed 14 maternal characteristic variables that reliably identified risk for ePTB through a logistic regression model and/or a CART model. Moreover, both models efficiently identify risk subgroups for further enrichment clinical trial design.
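For a binary outcome, the C-index reported above is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. A minimal sketch with made-up scores (not the natality data):

```python
def c_index(scores, outcomes):
    """Concordance index for a binary outcome (equivalent to the ROC AUC)."""
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0   # case ranked above non-case
            elif c == k:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and observed outcomes (1 = event).
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]
example_c = c_index(predicted, observed)
uninformative_c = c_index([0.5, 0.5, 0.5, 0.5], observed)
```

A value of 0.5 corresponds to a model that ranks cases and non-cases no better than chance, and 1.0 to perfect ranking.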
Partial Least Squares Regression Models for the Analysis of Kinase Signaling.

PubMed

Bourgeois, Danielle L; Kreeger, Pamela K

2017-01-01

Partial least squares regression (PLSR) is a data-driven modeling approach that can be used to analyze multivariate relationships between kinase networks and cellular decisions or patient outcomes. In PLSR, a linear model relating an X matrix of independent variables and a Y matrix of dependent variables is generated by extracting the factors with the strongest covariation. While the identified relationship is correlative, PLSR models can be used to generate quantitative predictions for new conditions or perturbations to the network, allowing for mechanisms to be identified. This chapter will provide a brief explanation of PLSR and provide an instructive example to demonstrate the use of PLSR to analyze kinase signaling.
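For a single response, the first PLSR factor can be written down directly: after centring, the weight vector is proportional to the covariance of each predictor with the response. The sketch below computes only this first component; the kinase names and values are hypothetical and do not come from the chapter.

```python
import math


def centre(values):
    """Subtract the mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def first_pls_component(columns, y):
    """First PLS component for one response: weights, scores and y-loading."""
    cols = [centre(c) for c in columns]
    yc = centre(y)
    # Weights proportional to X^T y, i.e. the direction of maximal covariance with y.
    cov = [sum(a * b for a, b in zip(c, yc)) for c in cols]
    norm = math.sqrt(sum(c * c for c in cov))
    weights = [c / norm for c in cov]
    # Scores are the projection of each sample onto the weight vector.
    scores = [sum(w * c[i] for w, c in zip(weights, cols)) for i in range(len(yc))]
    # Regress the response on the scores to get the y-loading.
    loading = sum(t * b for t, b in zip(scores, yc)) / sum(t * t for t in scores)
    return weights, scores, loading


# Hypothetical measurements for five conditions.
kinase_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
kinase_b = [1.0, -2.0, 2.0, -2.0, 1.0]   # uncorrelated with the response
response = [3.0 * v for v in kinase_a]    # driven by kinase_a only

weights, scores, loading = first_pls_component([kinase_a, kinase_b], response)
```

Further components would be extracted the same way after deflating the predictors and the response by the part already explained, which is omitted here.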
Modeling brook trout presence and absence from landscape variables using four different analytical methods

USGS Publications Warehouse

Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.

2006-01-01

As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.

Identification of extremely premature infants at high risk of rehospitalization.

PubMed

Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D

2011-11-01

Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.
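The "automatic selection of optimal cutoff points" in recursive partitioning amounts to trying each candidate threshold on a variable and keeping the one that leaves the purest groups. Below is a minimal sketch of a single split using Gini impurity; the impurity measure, the function names and the data are assumptions for illustration and are not drawn from the study.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_cutoff(values, labels):
    """Return (cutoff, weighted impurity) for the best single binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_impurity, best_cut = float("inf"), None
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            # Place the cutoff midway between the two neighbouring values.
            best_impurity, best_cut = impurity, (pairs[i][0] + pairs[i - 1][0]) / 2
    return best_cut, best_impurity


# Hypothetical lengths of stay (days) and rehospitalisation flags.
stay_days = [30, 45, 60, 90, 130, 150, 200]
rehospitalised = [0, 0, 1, 0, 1, 1, 1]
cutoff, impurity = best_cutoff(stay_days, rehospitalised)
```

A full tree repeats this search over every variable and then recurses into each resulting group until a stopping rule is met.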
Identification of Extremely Premature Infants at High Risk of Rehospitalization

PubMed Central

Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.

2011-01-01

OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. 
CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016
Using an autologistic regression model to identify spatial risk factors and spatial risk patterns of hand, foot and mouth disease (HFMD) in Mainland China

PubMed Central

2014-01-01

Background: There have been large-scale outbreaks of hand, foot and mouth disease (HFMD) in Mainland China over the last decade. These events varied greatly across the country. It is necessary to identify the spatial risk factors and spatial distribution patterns of HFMD for public health control and prevention. Climate risk factors associated with HFMD occurrence have been recognized. However, few studies have discussed the socio-economic determinants of HFMD risk at a spatial scale. Methods: HFMD records in Mainland China in May 2008 were collected. Both climate and socio-economic factors were selected as potential risk exposures of HFMD. Odds ratio (OR) was used to identify the spatial risk factors. A spatial autologistic regression model was employed to get OR values of each exposure and model the spatial distribution patterns of HFMD risk. Results: Both climate and socio-economic variables were spatial risk factors for HFMD transmission in Mainland China. The statistically significant risk factors are monthly average precipitation (OR = 1.4354), monthly average temperature (OR = 1.379), monthly average wind speed (OR = 1.186), the number of industrial enterprises above designated size (OR = 17.699), the population density (OR = 1.953), and the proportion of student population (OR = 1.286). The spatial autologistic regression model has good goodness of fit (ROC = 0.817) and prediction accuracy (Correct ratio = 78.45%) of HFMD occurrence. The autologistic regression model also reduces the contribution of the residual term in the ordinary logistic regression model significantly, from 17.25 to 1.25 for the odds ratio. Based on the prediction results of the spatial model, we obtained a map of the probability of HFMD occurrence that shows the spatial distribution pattern and local epidemic risk over Mainland China. Conclusions: The autologistic regression model was used to identify spatial risk factors and model spatial risk patterns of HFMD.
HFMD occurrences were found to be spatially heterogeneous over Mainland China and were related to both the climate and socio-economic variables. The combination of socio-economic and climate exposures can explain the HFMD occurrences more comprehensively and objectively than climate exposures alone. The modeled probability of HFMD occurrence at the county level reveals not only the spatial trends, but also the local details of epidemic risk, even in the regions where there were no HFMD case records. PMID:24731248
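An autologistic model extends ordinary logistic regression with an autocovariate that summarises the outcome in neighbouring spatial units, and each fitted coefficient converts to an odds ratio through the exponential function. The sketch below shows both pieces in isolation; the equal neighbour weights, the county labels and the adjacency lists are assumptions for illustration only.

```python
import math


def autocovariate(occurrence, neighbours):
    """Mean occurrence among each unit's neighbours (equal weights assumed)."""
    return {
        unit: (sum(occurrence[n] for n in nbrs) / len(nbrs) if nbrs else 0.0)
        for unit, nbrs in neighbours.items()
    }


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a logistic-regression covariate."""
    return math.exp(coefficient)


# Hypothetical counties: 1 = disease recorded, 0 = no record.
occurrence = {"A": 1, "B": 1, "C": 0, "D": 0}
neighbours = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
spatial_term = autocovariate(occurrence, neighbours)

# An OR of 1.953 (the population density value quoted above) corresponds to
# a logistic coefficient of ln(1.953).
density_or = odds_ratio(math.log(1.953))
```

The autocovariate would enter the logistic model as one more column alongside the climate and socio-economic covariates; fitting that model is not shown here.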
Animal models of maternal high fat diet exposure and effects on metabolism in offspring: a meta-regression analysis.

PubMed

Ribaroff, G A; Wastnedge, E; Drake, A J; Sharpe, R M; Chambers, T J G

2017-06-01

Animal models of maternal high fat diet (HFD) demonstrate perturbed offspring metabolism although the effects differ markedly between models. We assessed studies investigating metabolic parameters in the offspring of HFD fed mothers to identify factors explaining these inter-study differences. A total of 171 papers were identified, which provided data from 6047 offspring. Data were extracted regarding body weight, adiposity, glucose homeostasis and lipidaemia. Information regarding the macronutrient content of diet, species, time point of exposure and gestational weight gain was collected and utilized in meta-regression models to explore predictive factors. Publication bias was assessed using Egger's regression test. Maternal HFD exposure did not affect offspring birthweight but increased weaning weight, final bodyweight, adiposity, triglyceridaemia, cholesterolaemia and insulinaemia in both female and male offspring. Hyperglycaemia was found in female offspring only. Meta-regression analysis identified lactational HFD exposure as a key moderator. The fat content of the diet did not correlate with any outcomes. There was evidence of significant publication bias for all outcomes except birthweight. Maternal HFD exposure was associated with perturbed metabolism in offspring, but the differences between studies were not accounted for by dietary constituents, species, strain or maternal gestational weight gain. Specific weaknesses in experimental design predispose many of the results to bias. © 2017 The Authors. Obesity Reviews published by John Wiley & Sons Ltd on behalf of World Obesity Federation.
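Illustrative note (not taken from the cited paper): Egger's regression test, mentioned in the abstract above, regresses each study's standardised effect (effect divided by its standard error) on its precision (one over the standard error). An intercept far from zero suggests funnel-plot asymmetry. The sketch below computes only the intercept and slope by ordinary least squares; the significance test on the intercept is omitted, and the data are hypothetical.

```python
def egger_regression(effects, std_errors):
    """OLS of (effect / SE) on (1 / SE); returns (intercept, slope)."""
    x = [1.0 / se for se in std_errors]                 # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardised effect
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical studies that all report the same effect: no asymmetry,
# so the intercept should be zero and the slope should equal the effect.
effects = [0.5, 0.5, 0.5, 0.5]
std_errors = [0.1, 0.2, 0.25, 0.5]
intercept, slope = egger_regression(effects, std_errors)
```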
Quantitative monitoring of sucrose, reducing sugar and total sugar dynamics for phenotyping of water-deficit stress tolerance in rice through spectroscopy and chemometrics

NASA Astrophysics Data System (ADS)

Das, Bappa; Sahoo, Rabi N.; Pargal, Sourabh; Krishna, Gopal; Verma, Rakesh; Chinnusamy, Viswanathan; Sehgal, Vinay K.; Gupta, Vinod K.; Dash, Sushanta K.; Swain, Padmini

2018-03-01

In the present investigation, the changes in sucrose, reducing and total sugar content due to water-deficit stress in rice leaves were modeled using visible, near infrared (VNIR) and shortwave infrared (SWIR) spectroscopy. The objectives of the study were to identify the best vegetation indices and a suitable multivariate technique based on precise analysis of hyperspectral data (350 to 2500 nm) and sucrose, reducing sugar and total sugar content measured at different stress levels from 16 different rice genotypes. Spectral data analysis was done to identify suitable spectral indices and models for sucrose estimation. Novel spectral indices in the near infrared (NIR) range, viz. ratio spectral index (RSI) and normalised difference spectral indices (NDSI), sensitive to sucrose, reducing sugar and total sugar content were identified and subsequently calibrated and validated. The RSI and NDSI models had R² values of 0.65, 0.71 and 0.67 and RPD values of 1.68, 1.95 and 1.66 for sucrose, reducing sugar and total sugar, respectively, for the validation dataset. Different multivariate spectral models such as artificial neural network (ANN), multivariate adaptive regression splines (MARS), multiple linear regression (MLR), partial least square regression (PLSR), random forest regression (RFR) and support vector machine regression (SVMR) were also evaluated. The best performing multivariate models for sucrose, reducing sugars and total sugars were MARS, ANN and MARS, respectively, with RPD values of 2.08, 2.44, and 1.93. Results indicated that VNIR and SWIR spectroscopy combined with multivariate calibration can be used as a reliable alternative to conventional methods for measurement of sucrose, reducing sugars and total sugars of rice under water-deficit stress as this technique is fast, economic, and noninvasive.
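Illustrative note (not taken from the cited paper): the two index families named in the abstract above are usually written as a band ratio and a normalised band difference, and RPD is commonly defined as the standard deviation of the reference values divided by the prediction RMSE. The sketch below uses those common definitions; the reflectance values and the choice of bands are hypothetical, and the paper may define RPD slightly differently.

```python
import math
import statistics


def rsi(r_i, r_j):
    """Ratio spectral index of reflectances at two wavelengths."""
    return r_i / r_j


def ndsi(r_i, r_j):
    """Normalised difference spectral index of two reflectances."""
    return (r_i - r_j) / (r_i + r_j)


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD(observed) / RMSE."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return statistics.stdev(observed) / rmse


# Hypothetical reflectances at two NIR bands.
ratio_index = rsi(0.75, 0.25)
difference_index = ndsi(0.75, 0.25)

# Hypothetical reference and predicted sugar contents.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
rpd_value = rpd(observed, predicted)
```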
Discriminating between adaptive and carcinogenic liver hypertrophy in rat studies using logistic ridge regression analysis of toxicogenomic data: The mode of action and predictive models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu

Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. 
- Highlights: • Hypertrophy (H) and hypertrophic carcinogenesis (C) were studied by toxicogenomics. • Important genes for H and C were selected by logistic ridge regression analysis. • Amino acid biosynthesis and oxidative responses may be involved in C. • Predictive models for H and C provided 94.8% and 82.7% accuracy, respectively. • The identified genes could be useful for assessment of liver hypertrophy.
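Illustrative note (not taken from the cited paper): logistic ridge regression is ordinary logistic regression with an L2 penalty on the coefficients, which shrinks them and keeps the fit stable when predictors are many or the classes are separable. The sketch below fits it by plain gradient descent on a tiny hypothetical data set. The learning rate, iteration count, penalty values, and data are all assumptions; a real analysis would use an established library and choose the penalty by cross-validation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_ridge(X, y, lam, lr=0.1, n_iter=2000):
    """Minimise mean log-loss + (lam / 2) * ||w||^2 by gradient descent.

    The intercept b is not penalised. Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        for j in range(p):
            w[j] -= lr * (grad_w[j] / n + lam * w[j])  # ridge term: lam * w
        b -= lr * grad_b / n
    return w, b


def l2_norm(w):
    return math.sqrt(sum(wj * wj for wj in w))


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[-2.0, 0.1], [-1.0, -0.2], [1.0, 0.2], [2.0, -0.1]]
y = [0, 0, 1, 1]
w_weak, _ = fit_logistic_ridge(X, y, lam=0.01)
w_strong, _ = fit_logistic_ridge(X, y, lam=1.0)
```

Comparing `l2_norm(w_weak)` with `l2_norm(w_strong)` shows the shrinkage: the stronger penalty yields the smaller coefficient vector.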
Modeling Success: Using Preenrollment Data to Identify Academically At-Risk Students

ERIC Educational Resources Information Center

Gansemer-Topf, Ann M.; Compton, Jonathan; Wohlgemuth, Darin; Forbes, Greg; Ralston, Ekaterina

2015-01-01

Improving student success and degree completion is one of the core principles of strategic enrollment management. To address this principle, institutional data were used to develop a statistical model to identify academically at-risk students. The model employs multiple linear regression techniques to predict students at risk of earning below a…
Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

PubMed

Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

2015-01-01

This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. 
Furthermore, the developed CLR models tend to achieve sensitivity more than 10% better than that of the standard classification models, which translates to the correct labeling of an additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
Buying a Better Air Force

DTIC Science & Technology

2006-03-01

identify if an explanatory variable may have been omitted due to model misspecification (Ramsey, 1979). The RESET test resulted in failure to...Prob > F 0.0094 This model was also regressed using Huber-White estimators. Again, the Ramsey RESET test was done to ensure relevant...Aircraft. Annapolis, MD: Naval Institute Press, 2004. Ramsey, J. B. “Tests for Specification Errors in Classical Least-Squares Regression Analysis
Easy and low-cost identification of metabolic syndrome in patients treated with second-generation antipsychotics: artificial neural network and logistic regression models.

PubMed

Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan

2010-03-01

Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean +/- SD AUC was high for both the ANN and logistic regression models (0.934 +/- 0.033 vs 0.922 +/- 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. 
(c) 2010 Physicians Postgraduate Press, Inc.
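Illustrative note (not taken from the cited paper): the performance measures reported in the abstract above can be computed from predicted scores and true labels. The sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half), plus sensitivity and specificity at a cut-off. The scores, labels, and the 0.5 cut-off are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Classify as positive when score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs for six patients (1 = condition present).
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc_value = auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
```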
The creation and evaluation of a model to simulate the probability of conception in seasonal-calving pasture-based dairy heifers.

PubMed

Fenlon, Caroline; O'Grady, Luke; Butler, Stephen; Doherty, Michael L; Dunnion, John

2017-01-01

Herd fertility in pasture-based dairy farms is a key driver of farm economics. Models for predicting nulliparous reproductive outcomes are rare, but age, genetics, weight, and BCS have been identified as factors influencing heifer conception. The aim of this study was to create a simulation model of heifer conception to service with thorough evaluation. Artificial Insemination service records from two research herds and ten commercial herds were provided to build and evaluate the models. All were managed as spring-calving pasture-based systems. The factors studied were related to age, genetics, and time of service. The data were split into training and testing sets and bootstrapping was used to train the models. Logistic regression (with and without random effects) and generalised additive modelling were selected as the model-building techniques. Two types of evaluation were used to test the predictive ability of the models: discrimination and calibration. Discrimination, which includes sensitivity, specificity, accuracy and ROC analysis, measures a model's ability to distinguish between positive and negative outcomes. Calibration measures the accuracy of the predicted probabilities with the Hosmer-Lemeshow goodness-of-fit, calibration plot and calibration error. After data cleaning and the removal of services with missing values, 1396 services remained to train the models and 597 were left for testing. Age, breed, genetic predicted transmitting ability for calving interval, month and year were significant in the multivariate models. The regression models also included an interaction between age and month. Year within herd was a random effect in the mixed regression model. Overall prediction accuracy was between 77.1% and 78.9%. All three models had very high sensitivity, but low specificity. The two regression models were very well-calibrated. The mean absolute calibration errors were all below 4%. 
Because the models were not adept at identifying unsuccessful services, they are not suggested for use in predicting the outcome of individual heifer services. Instead, they are useful for the comparison of services with different covariate values or as sub-models in whole-farm simulations. The mixed regression model was identified as the best model for prediction, as the random effects can be ignored and the other variables can be easily obtained or simulated.
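Illustrative note (not taken from the cited paper): calibration compares predicted probabilities with observed outcome rates. The abstract above does not define its calibration error, so the sketch below uses one simple convention as an assumption: group predictions into equal-width probability bins and average, over the non-empty bins, the absolute gap between the mean prediction and the observed success rate. The data are hypothetical.

```python
def mean_absolute_calibration_error(probs, outcomes, n_bins=10):
    """Unweighted mean over non-empty bins of |mean predicted - observed rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    gaps = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            obs_rate = sum(y for _, y in members) / len(members)
            gaps.append(abs(mean_pred - obs_rate))
    return sum(gaps) / len(gaps)


# Hypothetical predicted conception probabilities for eight services.
probs = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
well_calibrated = [0, 0, 0, 1, 1, 1, 1, 0]  # observed rates 0.25 and 0.75
over_cautious = [0, 0, 0, 1, 1, 1, 1, 1]    # upper bin succeeds every time

error_good = mean_absolute_calibration_error(probs, well_calibrated, n_bins=2)
error_bad = mean_absolute_calibration_error(probs, over_cautious, n_bins=2)
```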
pLARmEB: integration of least angle regression with empirical Bayes for multilocus genome-wide association studies.

PubMed

Zhang, J; Feng, J-Y; Ni, Y-L; Wen, Y-J; Niu, Y; Tamba, C L; Yue, C; Song, Q; Zhang, Y-M

2017-06-01

Multilocus genome-wide association studies (GWAS) have become the state-of-the-art procedure to identify quantitative trait nucleotides (QTNs) associated with complex traits. However, implementation of a multilocus model in GWAS is still difficult. In this study, we integrated least angle regression with empirical Bayes to perform multilocus GWAS under polygenic background control. We used an algorithm of model transformation that whitened the covariance matrix of the polygenic matrix K and environmental noise. Markers on one chromosome were included simultaneously in a multilocus model and least angle regression was used to select the most potentially associated single-nucleotide polymorphisms (SNPs), whereas the markers on the other chromosomes were used to calculate the kinship matrix as polygenic background control. The SNPs selected in the multilocus model were further tested for association with the trait by empirical Bayes and a likelihood ratio test. We herein refer to this method as pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes). Results from simulation studies showed that pLARmEB was more powerful in QTN detection and more accurate in QTN effect estimation, had a lower false positive rate and required less computing time than the Bayesian hierarchical generalized linear model, efficient mixed model association (EMMA) and least angle regression plus empirical Bayes. pLARmEB, multilocus random-SNP-effect mixed linear model and fast multilocus random-SNP-effect EMMA methods had almost equal power of QTN detection in simulation experiments. However, only pLARmEB identified 48 previously reported genes for 7 flowering time-related traits in Arabidopsis thaliana.
Tools to Support Interpreting Multiple Regression in the Face of Multicollinearity

PubMed Central

Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.

2012-01-01

While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses. PMID:22457655
Tools to support interpreting multiple regression in the face of multicollinearity.

PubMed

Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

2012-01-01

While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.
Cumulative Risk and Impact Modeling on Environmental Chemical and Social Stressors.

PubMed

Huang, Hongtai; Wang, Aolin; Morello-Frosch, Rachel; Lam, Juleen; Sirota, Marina; Padula, Amy; Woodruff, Tracey J

2018-03-01

The goal of this review is to identify cumulative modeling methods used to evaluate combined effects of exposures to environmental chemicals and social stressors. The specific review question is: What are the existing quantitative methods used to examine the cumulative impacts of exposures to environmental chemical and social stressors on health? There has been an increase in literature that evaluates combined effects of exposures to environmental chemicals and social stressors on health using regression models; very few studies applied other data mining and machine learning techniques to this problem. The majority of studies we identified used regression models to evaluate combined effects of multiple environmental and social stressors. With proper study design and appropriate modeling assumptions, additional data mining methods may be useful to examine combined effects of environmental and social stressors.
A kinetic energy model of two-vehicle crash injury severity.

PubMed

Sobhani, Amir; Young, William; Logan, David; Bahrololoom, Sareh

2011-05-01

An important part of any model of vehicle crashes is the development of a procedure to estimate crash injury severity. After reviewing existing models of crash severity, this paper outlines the development of a modelling approach aimed at measuring the injury severity of people in two-vehicle road crashes. This model can be incorporated into a discrete event traffic simulation model, using simulation model outputs as its input. The model can then serve as an integral part of a simulation model estimating the crash potential of components of the traffic system. The model is developed using Newtonian Mechanics and Generalised Linear Regression. The factors contributing to the speed change (ΔV(s)) of a subject vehicle are identified using the law of conservation of momentum. A Log-Gamma regression model is fitted to measure speed change (ΔV(s)) of the subject vehicle based on the identified crash characteristics. The kinetic energy applied to the subject vehicle is calculated by the model, which in turn uses a Log-Gamma Regression Model to estimate the Injury Severity Score of the crash from the calculated kinetic energy, crash impact type, presence of airbag and/or seat belt and occupant age. Copyright © 2010 Elsevier Ltd. All rights reserved.
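Illustrative note (not taken from the cited paper): the abstract above derives the subject vehicle's speed change from conservation of momentum. The sketch below works the simplest textbook case, a one-dimensional perfectly inelastic collision in which both vehicles share a common velocity afterwards. That collision model, the function names, and the masses and speeds are assumptions; the paper's own formulation is not given in the abstract.

```python
def delta_v_subject(m_subject, v_subject, m_other, v_other):
    """Speed change of the subject vehicle in a 1-D perfectly inelastic crash.

    Conservation of momentum gives the common post-impact velocity:
    (m_s * v_s + m_o * v_o) / (m_s + m_o).
    """
    v_common = (m_subject * v_subject + m_other * v_other) / (m_subject + m_other)
    return v_common - v_subject


def kinetic_energy(mass, speed):
    """Kinetic energy in joules for mass in kg and speed in m/s."""
    return 0.5 * mass * speed ** 2


# Hypothetical crash: a 1000 kg car at 20 m/s hits a stationary 1000 kg car.
dv = delta_v_subject(1000.0, 20.0, 1000.0, 0.0)
energy_before = kinetic_energy(1000.0, 20.0)
energy_after = kinetic_energy(2000.0, 10.0)
energy_dissipated = energy_before - energy_after
```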
glmnetLRC f/k/a lrc package: Logistic Regression Classification

DOE Office of Scientific and Technical Information (OSTI.GOV)

2016-06-09

Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds model-fitting features to the existing glmnet and bestglm R packages. This package was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research. 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross validation with a customizable loss function. It also identifies the optimal threshold for binary classification.
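Illustrative note (this is not the glmnetLRC interface): the idea of choosing a classification threshold under a user-supplied loss can be sketched in a few lines. The functions below are hypothetical and written in Python rather than R; they evaluate each candidate cut-off on predicted probabilities and keep the one with the lowest total loss, where false negatives and false positives carry different, assumed costs.

```python
def total_loss(probs, labels, threshold, fn_cost, fp_cost):
    """Sum of misclassification costs when predicting 1 for prob >= threshold."""
    loss = 0.0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 0 and y == 1:
            loss += fn_cost  # missed positive
        elif predicted == 1 and y == 0:
            loss += fp_cost  # false alarm
    return loss


def best_threshold(probs, labels, fn_cost=5.0, fp_cost=1.0):
    """Candidate thresholds are the distinct predicted probabilities."""
    candidates = sorted(set(probs))
    return min(
        candidates,
        key=lambda t: total_loss(probs, labels, t, fn_cost, fp_cost),
    )


# Hypothetical predictions; a missed positive costs five times a false alarm.
probs = [0.1, 0.4, 0.6, 0.9]
labels = [0, 1, 0, 1]
threshold = best_threshold(probs, labels)
```

In practice the threshold would be chosen on held-out or cross-validated predictions rather than on the data used to fit the classifier.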
Prediction of dynamical systems by symbolic regression

NASA Astrophysics Data System (ADS)

Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.

2016-07-01

We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
Can Predictive Modeling Identify Head and Neck Oncology Patients at Risk for Readmission?

PubMed

Manning, Amy M; Casper, Keith A; Peter, Kay St; Wilson, Keith M; Mark, Jonathan R; Collar, Ryan M

2018-05-01

Objective Unplanned readmission within 30 days is a contributor to health care costs in the United States. The use of predictive modeling during hospitalization to identify patients at risk for readmission offers a novel approach to quality improvement and cost reduction. Study Design Two-phase study including retrospective analysis of prospectively collected data followed by prospective longitudinal study. Setting Tertiary academic medical center. Subjects and Methods Prospectively collected data for patients undergoing surgical treatment for head and neck cancer from January 2013 to January 2015 were used to build predictive models for readmission within 30 days of discharge using logistic regression, classification and regression tree (CART) analysis, and random forests. One model (logistic regression) was then placed prospectively into the discharge workflow from March 2016 to May 2016 to determine the model's ability to predict which patients would be readmitted within 30 days. Results In total, 174 admissions had descriptive data. Thirty-two were excluded due to incomplete data. Logistic regression, CART, and random forest predictive models were constructed using the remaining 142 admissions. When applied to 106 consecutive prospective head and neck oncology patients at the time of discharge, the logistic regression model predicted readmissions with a specificity of 94%, a sensitivity of 47%, a negative predictive value of 90%, and a positive predictive value of 62% (odds ratio, 14.9; 95% confidence interval, 4.02-55.45). Conclusion Prospectively collected head and neck cancer databases can be used to develop predictive models that can accurately predict which patients will be readmitted. This offers valuable support for quality improvement initiatives and readmission-related cost reduction in head and neck cancer care.
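Illustrative note (not taken from the cited paper): the abstract above reports sensitivity, specificity, predictive values and an odds ratio but not the underlying 2 x 2 table. The counts below are an assumption: one table of 106 patients that happens to reproduce the reported figures after rounding. They are used only to show how the measures relate to one another and should not be read as the study's data.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Standard measures derived from a 2 x 2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Assumed counts (not reported in the abstract): 17 readmitted, 89 not.
summary = diagnostic_summary(tp=8, fp=5, fn=9, tn=84)
```

With these assumed counts the high specificity and low sensitivity are easy to see: the model flags few patients, so most of its flags are correct but more than half of the readmissions go unflagged.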
Spatio-temporal water quality mapping from satellite images using geographically and temporally weighted regression

NASA Astrophysics Data System (ADS)

Chu, Hone-Jay; Kong, Shish-Jeng; Chang, Chih-Hua

2018-03-01

The turbidity (TB) of a water body varies with time and space. Water quality is traditionally estimated via linear regression based on satellite images. However, estimating and mapping water quality require a spatio-temporal nonstationary model, while TB mapping necessitates the use of geographically and temporally weighted regression (GTWR) and geographically weighted regression (GWR) models, both of which are more precise than linear regression. Given the temporal nonstationary models for mapping water quality, GTWR offers the best option for estimating regional water quality. Compared with GWR, GTWR provides highly reliable information for water quality mapping, boasts a relatively high goodness of fit, improves the explanation of variance from 44% to 87%, and shows a sufficient space-time explanatory power. The seasonal patterns of TB and the main spatial patterns of TB variability can be identified using the estimated TB maps from GTWR and by conducting an empirical orthogonal function (EOF) analysis.

Machine learning and linear regression models to predict catchment-level base cation weathering rates across the southern Appalachian Mountain region, USA

Treesearch

Nicholas A. Povak; Paul F. Hessburg; Todd C. McDonnell; Keith M. Reynolds; Timothy J. Sullivan; R. Brion Salter; Bernard J. Crosby

2014-01-01

Accurate estimates of soil mineral weathering are required for regional critical load (CL) modeling to identify ecosystems at risk of the deleterious effects from acidification. Within a correlative modeling framework, we used modeled catchment-level base cation weathering (BCw) as the response variable to identify key environmental correlates and predict a continuous...
Functional CAR models for large spatially correlated functional datasets.

PubMed

Zhang, Lin; Baladandayuthapani, Veerabhadran; Zhu, Hongxiao; Baggerly, Keith A; Majewski, Tadeusz; Czerniak, Bogdan A; Morris, Jeffrey S

2016-01-01

We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on functions defined on higher dimensional domains such as images. Through simulation studies, we demonstrate that accounting for the spatial correlation in our modeling leads to improved functional regression performance. Applied to a high-throughput spatially correlated copy number dataset, the model identifies genetic markers not identified by comparable methods that ignore spatial correlations.
Antibiotic Resistances in Livestock: A Comparative Approach to Identify an Appropriate Regression Model for Count Data

PubMed Central

Hüls, Anke; Frömke, Cornelia; Ickstadt, Katja; Hille, Katja; Hering, Johanna; von Münchhausen, Christiane; Hartmann, Maria; Kreienbrock, Lothar

2017-01-01

Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i) to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model) and (ii) to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate model. PMID:28620609
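Illustrative note (not taken from the cited paper): two quick checks usually precede the choice between Poisson, negative binomial, zero-inflated and hurdle models. The sketch below computes both for an intercept-only Poisson fit: the Pearson dispersion statistic (values well above 1 suggest overdispersion) and the observed number of zeros against the number a Poisson distribution with the same mean would predict. The counts are hypothetical, and a real analysis would condition on covariates.

```python
import math


def poisson_dispersion(counts):
    """Pearson chi-square divided by residual degrees of freedom (n - 1)."""
    n = len(counts)
    mu = sum(counts) / n  # maximum likelihood estimate of the Poisson mean
    pearson = sum((c - mu) ** 2 / mu for c in counts)
    return pearson / (n - 1)


def zero_check(counts):
    """Return (observed zeros, zeros expected under Poisson with the same mean)."""
    n = len(counts)
    mu = sum(counts) / n
    observed = sum(1 for c in counts if c == 0)
    expected = n * math.exp(-mu)
    return observed, expected


# Hypothetical counts of resistant samples on five farms.
counts = [0, 1, 2, 3, 4]
dispersion = poisson_dispersion(counts)
observed_zeros, expected_zeros = zero_check(counts)
```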
Comparing lagged linear correlation, lagged regression, Granger causality, and vector autoregression for uncovering associations in EHR data.

PubMed

Levine, Matthew E; Albers, David J; Hripcsak, George

2016-01-01

Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.
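The simplest of the methods named in this record's title, lagged linear correlation, can be sketched as below. This is only an assumed, minimal illustration with hypothetical series and function names; it does not include the multivariate lagged regression, autoregressive terms, or admission covariates that the abstract describes.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag] for a non-negative lag."""
    return pearson(x[:len(x) - lag], y[lag:])

def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the largest absolute correlation."""
    return max(range(max_lag + 1), key=lambda lag: abs(lagged_correlation(x, y, lag)))

# Hypothetical series: the "lab" signal repeats the "drug" signal two steps later.
drug = [0, 1, 0, 3, 2, 5, 1, 4, 0, 2, 6, 1]
lab = [0, 0] + drug[:-2]
print(best_lag(drug, lab, max_lag=4))
```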
Discriminating between adaptive and carcinogenic liver hypertrophy in rat studies using logistic ridge regression analysis of toxicogenomic data: The mode of action and predictive models.

PubMed

Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi

2017-03-01

Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. Copyright © 2017 Elsevier Inc. All rights reserved.
Identification of immune correlates of protection in Shigella infection by application of machine learning.

PubMed

Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K

2017-10-01

Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
Properties of added variable plots in Cox's regression model.

PubMed

Lindkvist, M

2000-03-01

The added variable plot is useful for examining the effect of a covariate in regression models. The plot provides information regarding the inclusion of a covariate, and is useful in identifying influential observations on the parameter estimates. Hall et al. (1996) proposed a plot for Cox's proportional hazards model derived by regarding the Cox model as a generalized linear model. This paper proves and discusses properties of this plot. These properties make the plot a valuable tool in model evaluation. Quantities considered include parameter estimates, residuals, leverage, case influence measures and correspondence to previously proposed residuals and diagnostics.
A predictive model to allocate frequent service users of community-based mental health services to different packages of care.

PubMed

Grigoletti, Laura; Amaddeo, Francesco; Grassi, Aldrigo; Boldrini, Massimo; Chiappelli, Marco; Percudani, Mauro; Catapano, Francesco; Fiorillo, Andrea; Perris, Francesco; Bacigalupi, Maurizio; Albanese, Paolo; Simonetti, Simona; De Agostini, Paola; Tansella, Michele

2010-01-01

To develop predictive models to allocate patients into frequent and low service users groups within the Italian Community-based Mental Health Services (CMHSs). To allocate frequent users to different packages of care, identifying the costs of these packages. Socio-demographic and clinical data and GAF scores at baseline were collected for 1250 users attending five CMHSs. All psychiatric contacts made by these patients during six months were recorded. A logistic regression identified frequent service users predictive variables. Multinomial logistic regression identified variables able to predict the most appropriate package of care. A cost function was utilised to estimate costs. Frequent service users were 49%, using nearly 90% of all contacts. The model classified correctly 80% of users in the frequent and low users groups. Three packages of care were identified: Basic Community Treatment (4,133 Euro per six months); Intensive Community Treatment (6,180 Euro) and Rehabilitative Community Treatment (11,984 Euro) for 83%, 6% and 11% of frequent service users respectively. The model was found to be accurate for 85% of users. It is possible to develop predictive models to identify frequent service users and to assign them to pre-defined packages of care, and to use these models to inform the funding of psychiatric care.
Spatial Statistical Network Models for Stream and River Temperature in the Chesapeake Bay Watershed, USA

EPA Science Inventory

Regional temperature models are needed for characterizing and mapping stream thermal regimes, establishing reference conditions, predicting future impacts and identifying critical thermal refugia. Spatial statistical models have been developed to improve regression modeling techn...
Finding gene clusters for a replicated time course study

PubMed Central

2014-01-01

Background Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies. Findings In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656
Detection of epistatic effects with logic regression and a classical linear regression model.

PubMed

Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

2014-02-01

To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, methods such as Cockerham's model are usually applied. Within this framework, interactions are understood as the part of the joint effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (such as disease) is caused by Boolean combinations of genotypes of several QTLs, Cockerham's approach is often not capable of identifying them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though a larger number of models has to be considered with the logic regression approach (requiring more stringent multiple testing correction), the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with a simulation study and real data analysis.
Large unbalanced credit scoring using Lasso-logistic regression ensemble.

PubMed

Wang, Hong; Xu, Qingsong; Zhou, Lifeng

2015-01-01

Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
An investigation to improve the Menhaden fishery prediction and detection model through the application of ERTS-A data

NASA Technical Reports Server (NTRS)

Maughan, P. M. (Principal Investigator)

1973-01-01

The author has identified the following significant results. Linear regression of Secchi disc visibility against number of sets yielded significant results in a number of instances. The variability seen in the slope of the regression lines is due to the nonuniformity of sample size. The longer the period sampled, the larger the total number of attempts. Further, there is no reason to expect either the influence of transparency or of other variables to remain constant throughout the season. However, the fact that the data for the entire season, variable as it is, was significant at the 5% level, suggests its potential utility for predictive modeling. Thus, this regression equation will be considered representative and will be utilized for the first numerical model. Secchi disc visibility was also regressed against number of sets for the three day period September 27-September 29, 1972 to determine if surface truth data supported the intense relationship between ERTS-1 identified turbidity and fishing effort previously discussed. A very negative correlation was found. These relationships lend additional credence to the hypothesis that ERTS imagery, when utilized as a source of visibility (turbidity) data, may be useful as a predictive tool.
Modeling the frequency of opposing left-turn conflicts at signalized intersections using generalized linear regression models.

PubMed

Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei

2014-01-01

The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binomial distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.
Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway

PubMed Central

Wang, Fengfeng; Wong, S. C. Cesar; Chan, Lawrence W. C.; Cho, William C. S.; Yip, S. P.; Yung, Benjamin Y. M.

2014-01-01

Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopted to identify the significantly associated miRNAs targeting a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue. As for the whole model, BCL2 and SMAD4 models were not significant, and MSH6 model was significant. The significant associations were different in the normal and the CRC tissues. Conclusion. Our results have laid down a solid foundation in exploration of novel CRC mechanisms, and identification of miRNA roles as oncomirs or tumor suppressor mirs in CRC. PMID:24895601
Construction of a pathological risk model of occult lymph node metastases for prognostication by semi-automated image analysis of tumor budding in early-stage oral squamous cell carcinoma

PubMed Central

Pedersen, Nicklas Juel; Jensen, David Hebbelstrup; Lelkaitis, Giedrius; Kiss, Katalin; Charabi, Birgitte; Specht, Lena; von Buchwald, Christian

2017-01-01

It is challenging to identify at diagnosis those patients with early oral squamous cell carcinoma (OSCC), who have a poor prognosis and those that have a high risk of harboring occult lymph node metastases. The aim of this study was to develop a standardized and objective digital scoring method to evaluate the predictive value of tumor budding. We developed a semi-automated image-analysis algorithm, Digital Tumor Bud Count (DTBC), to evaluate tumor budding. The algorithm was tested in 222 consecutive patients with early-stage OSCC and major endpoints were overall (OS) and progression free survival (PFS). We subsequently constructed and cross-validated a binary logistic regression model and evaluated its clinical utility by decision curve analysis. A high DTBC was an independent predictor of both poor OS and PFS in a multivariate Cox regression model. The logistic regression model was able to identify patients with occult lymph node metastases with an area under the curve (AUC) of 0.83 (95% CI: 0.78–0.89, P <0.001) and a 10-fold cross-validated AUC of 0.79. Compared to other known histopathological risk factors, the DTBC had a higher diagnostic accuracy. The proposed, novel risk model could be used as a guide to identify patients who would benefit from an up-front neck dissection. PMID:28212555
Integration of least angle regression with empirical Bayes for multi-locus genome-wide association studies

USDA-ARS's Scientific Manuscript database

Multi-locus genome-wide association studies have become the state-of-the-art procedure to identify quantitative trait loci (QTL) associated with traits simultaneously. However, implementation of multi-locus models is still difficult. In this study, we integrated least angle regression with empirical B...
Bootstrap investigation of the stability of a Cox regression model.

PubMed

Altman, D G; Andersen, P K

1989-07-01

We describe a bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis. We have considered stability to refer both to the choice of variables included in the model and, more importantly, to the predictive ability of the model. In stepwise Cox regression analyses of 100 bootstrap samples using 17 candidate variables, the most frequently selected variables were those selected in the original analysis, and no other important variable was identified. Thus there was no reason to doubt the model obtained in the original analysis. For each patient in the trial, bootstrap confidence intervals were constructed for the estimated probability of surviving two years. It is shown graphically that these intervals are markedly wider than those obtained from the original model.
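The bootstrap confidence intervals mentioned in this abstract can be illustrated with a generic percentile bootstrap. The sketch below is a simplification under stated assumptions: it resamples a plain two-year survival indicator, not a fitted Cox model, and the data, function names, and default settings are hypothetical.

```python
import random

def bootstrap_ci(data, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on n_resamples samples drawn with replacement.
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

def proportion(values):
    """Share of ones in a list of 0/1 indicators."""
    return sum(values) / len(values)

# Hypothetical indicators: 1 = patient survived two years, 0 = did not.
survived = [1] * 70 + [0] * 30
low, high = bootstrap_ci(survived, proportion)
print(f"estimate {proportion(survived):.2f}, 95% CI ({low:.2f}, {high:.2f})")
```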
A retrospective analysis to identify the factors affecting infection in patients undergoing chemotherapy.

PubMed

Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung

2015-12-01

This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
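The sensitivity, specificity, and overall classification accuracy used above to compare the two methods are computed from a confusion matrix. A minimal sketch follows, with hypothetical labels and a hypothetical function name (1 = infection, 0 = no infection); it assumes both classes are present.

```python
def classification_summary(actual, predicted):
    """Sensitivity, specificity, and accuracy for binary 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical outcomes and model predictions for ten patients.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
summary = classification_summary(actual, predicted)
print(summary)
```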
Building a computer program to support children, parents, and distraction during healthcare procedures.

PubMed

Hanrahan, Kirsten; McCarthy, Ann Marie; Kleiber, Charmaine; Ataman, Kaan; Street, W Nick; Zimmerman, M Bridget; Ersig, Anne L

2012-10-01

This secondary data analysis used data mining methods to develop predictive models of child risk for distress during a healthcare procedure. Data used came from a study that predicted factors associated with children's responses to an intravenous catheter insertion while parents provided distraction coaching. From the 255 items used in the primary study, 44 predictive items were identified through automatic feature selection and used to build support vector machine regression models. Models were validated using multiple cross-validation tests and by comparing variables identified as explanatory in the traditional versus support vector machine regression. Rule-based approaches were applied to the model outputs to identify overall risk for distress. A decision tree was then applied to evidence-based instructions for tailoring distraction to characteristics and preferences of the parent and child. The resulting decision support computer application, titled Children, Parents and Distraction, is being used in research. Future use will support practitioners in deciding the level and type of distraction intervention needed by a child undergoing a healthcare procedure.

Penalized regression procedures for variable selection in the potential outcomes framework

PubMed Central

Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.

2015-01-01

A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185
Restoration of Monotonicity Respecting in Dynamic Regression

PubMed Central

Huang, Yijian

2017-01-01

Dynamic regression models, including the quantile regression model and Aalen’s additive hazards model, are widely adopted to investigate evolving covariate effects. Yet lack of monotonicity respecting with standard estimation procedures remains an outstanding issue. Advances have recently been made, but none provides a complete resolution. In this article, we propose a novel adaptive interpolation method to restore monotonicity respecting, by successively identifying and then interpolating nearest monotonicity-respecting points of an original estimator. Under mild regularity conditions, the resulting regression coefficient estimator is shown to be asymptotically equivalent to the original. Our numerical studies have demonstrated that the proposed estimator is much more smooth and may have better finite-sample efficiency than the original as well as, when available as only in special cases, other competing monotonicity-respecting estimators. Illustration with a clinical study is provided. PMID:29430068
Detection of crossover time scales in multifractal detrended fluctuation analysis

NASA Astrophysics Data System (ADS)

Ge, Erjia; Leung, Yee

2013-04-01

Fractal analysis is employed in this paper as a scale-based method for the identification of the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is the crossover time scale(s) that separate distinct regimes having different fractal scaling behaviors. A common method is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective since it has been made without rigorous statistical procedures and has generally been determined by eyeballing or subjective observation. Crossover time scales so determined may be spurious and problematic. They may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe multi-scaling behaviors of fractals. Through the regression analysis and statistical inference, we can (1) identify the crossover time scales that cannot be detected by eyeballing observation, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish the statistically significant regression model depicting the underlying scaling behavior of a time series. To substantiate our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall of Hong Kong. Through the proposed model, we can have a deeper understanding of fractals in general and a statistical approach to identify multi-scaling behavior under MF-DFA in particular.
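The idea of locating a crossover by regression instead of by eye can be sketched with a brute-force two-segment fit on a log-log fluctuation curve. This is an assumed, simplified stand-in and not the scaling-identification regression model of the paper: it handles a single crossover, gives no confidence interval, and uses hypothetical names and synthetic data.

```python
def fit_line(xs, ys):
    """Least-squares slope and sum of squared errors for one segment."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, sse

def find_crossover(log_s, log_f, min_points=3):
    """Split index k minimising the total error of two separate line fits.

    Points [0, k) form the first scaling regime and [k, n) the second.
    """
    best = None
    for k in range(min_points, len(log_s) - min_points + 1):
        slope1, sse1 = fit_line(log_s[:k], log_f[:k])
        slope2, sse2 = fit_line(log_s[k:], log_f[k:])
        if best is None or sse1 + sse2 < best[0]:
            best = (sse1 + sse2, k, slope1, slope2)
    _, k, slope1, slope2 = best
    return k, slope1, slope2

# Synthetic log-log curve: slope 0.5 up to log scale 10, slope 1.0 afterwards.
log_s = [float(i) for i in range(20)]
log_f = [0.5 * s if s <= 10 else 5.0 + 1.0 * (s - 10) for s in log_s]
k, slope1, slope2 = find_crossover(log_s, log_f)
print(k, round(slope1, 3), round(slope2, 3))
```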
Model parameter uncertainty analysis for an annual field-scale P loss model

NASA Astrophysics Data System (ADS)

Bolster, Carl H.; Vadas, Peter A.; Boykin, Debbie

2016-08-01

Phosphorus (P) fate and transport models are important tools for developing and evaluating conservation practices aimed at reducing P losses from agricultural fields. Because all models are simplifications of complex systems, there will exist an inherent amount of uncertainty associated with their predictions. It is therefore important that efforts be directed at identifying, quantifying, and communicating the different sources of model uncertainties. In this study, we conducted an uncertainty analysis with the Annual P Loss Estimator (APLE) model. Our analysis included calculating parameter uncertainties and confidence and prediction intervals for five internal regression equations in APLE. We also estimated uncertainties of the model input variables based on values reported in the literature. We then predicted P loss for a suite of fields under different management and climatic conditions while accounting for uncertainties in the model parameters and inputs and compared the relative contributions of these two sources of uncertainty to the overall uncertainty associated with predictions of P loss. Both the overall magnitude of the prediction uncertainties and the relative contributions of the two sources of uncertainty varied depending on management practices and field characteristics. This was due to differences in the number of model input variables and the uncertainties in the regression equations associated with each P loss pathway. Inspection of the uncertainties in the five regression equations brought attention to a previously unrecognized limitation with the equation used to partition surface-applied fertilizer P between leaching and runoff losses. As a result, an alternate equation was identified that provided similar predictions with much less uncertainty. Our results demonstrate how a thorough uncertainty and model residual analysis can be used to identify limitations with a model. Such insight can then be used to guide future data collection and model development and evaluation efforts.
An optimal sample data usage strategy to minimize overfitting and underfitting effects in regression tree models based on remotely-sensed data

USGS Publications Warehouse

Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis

2016-01-01

Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
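The evaluation loop described here (random replications of a training/testing split, scored by mean absolute difference) can be sketched as follows. This is not the authors' Python procedure: a one-variable least-squares line stands in for the regression tree model, and the data, function names, and default settings are hypothetical.

```python
import random

def mean_absolute_difference(predicted, actual):
    """Mean absolute difference (MAD) between predictions and observations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for a regression tree)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def replicate_mad(xs, ys, train_fraction=0.8, replications=20, seed=0):
    """Return (MAD_training, MAD_testing) for each random split."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_train = int(train_fraction * len(xs))
    results = []
    for _ in range(replications):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mads = []
        for subset in (train, test):
            predicted = [intercept + slope * xs[i] for i in subset]
            mads.append(mean_absolute_difference(predicted, [ys[i] for i in subset]))
        results.append(tuple(mads))
    return results

# Hypothetical, exactly linear data, so every replication should give MAD near 0.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
results = replicate_mad(xs, ys)
print(max(test_mad for _, test_mad in results))
```

Comparing the mean and spread of the testing MAD across replications for different training fractions is one way to look for the overfitting and underfitting effects the abstract refers to.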
Evaluating predictive models for solar energy growth in the US states and identifying the key drivers

NASA Astrophysics Data System (ADS)

Chakraborty, Joheen; Banerji, Sugata

2018-03-01

Driven by a desire to control climate change and reduce the dependence on fossil fuels, governments around the world are increasing the adoption of renewable energy sources. However, among the US states, we observe a wide disparity in renewable penetration. In this study, we have identified and cleaned over a dozen datasets representing solar energy penetration in each US state, and the potentially relevant socioeconomic and other factors that may be driving the growth in solar. We have applied a number of predictive modeling approaches - including machine learning and regression - on these datasets over a 17-year period and evaluated the relative performance of the models. Our goals were: (1) identify the most important factors that are driving the growth in solar, (2) choose the most effective predictive modeling technique for solar growth, and (3) develop a model for predicting next year’s solar growth using this year’s data. We obtained very promising results with random forests (about 90% efficacy) and varying degrees of success with support vector machines and regression techniques (linear, polynomial, ridge). We also identified states with solar growth slower than expected and representing a potential for stronger growth in future.
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

PubMed Central

Wang, Hong; Xu, Qingsong; Zhou, Lifeng

2015-01-01

Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
Regression mixture models: Does modeling the covariance between independent variables and latent classes improve the results?

PubMed Central

Lamont, Andrea E.; Vermunt, Jeroen K.; Van Horn, M. Lee

2016-01-01

Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we test the effects of violating an implicit assumption often made in these models – i.e., independent variables in the model are not directly related to latent classes. Results indicated that the major risk of failing to model the relationship between predictor and latent class was an increase in the probability of selecting additional latent classes and biased class proportions. Additionally, this study tests whether regression mixture models can detect a piecewise relationship between a predictor and outcome. Results suggest that these models are able to detect piecewise relations, but only when the relationship between the latent class and the predictor is included in model estimation. We illustrate the implications of making this assumption through a re-analysis of applied data examining heterogeneity in the effects of family resources on academic achievement. We compare previous results (which assumed no relation between independent variables and latent class) to the model where this assumption is lifted. Implications and analytic suggestions for conducting regression mixture based on these findings are noted. PMID:26881956
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

ERIC Educational Resources Information Center

Karabatsos, George; Walker, Stephen G.

2013-01-01

The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
An appraisal of convergence failures in the application of logistic regression model in published manuscripts.

PubMed

Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A

2014-09-01

Logistic regression models are widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are sometimes not aware that the underlying principles of the technique have failed when the maximum likelihood algorithm does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem (whether quasi or complete) occurs, how to identify it, and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in an African Journal of Medicine and medical sciences between 2004 and 2013. Problems of quasi or complete separation were described and illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles were reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of a logistic regression model in the methodology, while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tended to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers, since very few described the process in their reports, and that researchers may be totally unaware of the problem of convergence or how to deal with it.
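For a single numeric predictor, the separation problem mentioned in this abstract can be checked directly from the data before fitting. The sketch below is an illustrative assumption, not taken from the paper: it assumes a binary outcome coded 0/1 and a non-constant predictor, and the function name is hypothetical. With several predictors, separation concerns linear combinations of covariates and needs a more general check.

```python
def separation_status(x, y):
    """Classify separation for one predictor x and a 0/1 outcome y.

    "complete": the two outcome groups do not overlap in x at all.
    "quasi-complete": the groups touch only at a shared boundary value.
    "none": the groups overlap, so the likelihood has a finite maximum.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"
    if max(x0) == min(x1) or max(x1) == min(x0):
        return "quasi-complete"
    return "none"
```

When the status is anything other than "none", the maximum likelihood estimate of the slope does not exist as a finite value, which is one reason the fitting algorithm may fail to converge.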
Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

PubMed

Madarang, Krish J; Kang, Joo-Hyon

2014-06-01

Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However, whether ADD is an important variable in predicting stormwater discharge characteristics has been controversial across studies. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was used to simulate 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R² and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
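As a rough illustration of the kind of fit discussed here, the coefficient of determination for a one-predictor least-squares regression (for example, a load regressed on ADD) can be computed as below. The function name and any data passed to it are hypothetical; this is a sketch of ordinary least squares, not the authors' workflow, and it does not compute p-values.

```python
import statistics


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept, r2)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # R^2 is the share of variance in y explained by the fitted line.
    return slope, intercept, 1.0 - ss_res / ss_tot
```

A low R² for ADD alone would be consistent with ADD being a weak predictor in a given dataset, although it says nothing about its effect once other storm characteristics are included.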
Expression profiling reveals distinct sets of genes altered during induction and regression of cardiac hypertrophy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Friddle, Carl J; Koga, Teiichiro; Rubin, Edward M.

2000-03-15

While cardiac hypertrophy has been the subject of intensive investigation, regression of hypertrophy has been significantly less studied, precluding large-scale analysis of the relationship between these processes. In the present study, using pharmacological models of hypertrophy in mice, expression profiling was performed with fragments of more than 3,000 genes to characterize and contrast expression changes during induction and regression of hypertrophy. Administration of angiotensin II and isoproterenol by osmotic minipump produced increases in heart weight (15% and 40% respectively) that returned to pre-induction size following drug withdrawal. From multiple expression analyses of left ventricular RNA isolated at daily time-points during cardiac hypertrophy and regression, we identified sets of genes whose expression was altered at specific stages of this process. While confirming the participation of 25 genes or pathways previously known to be altered by hypertrophy, a larger set of 30 genes was identified whose expression had not previously been associated with cardiac hypertrophy or regression. Of the 55 genes that showed reproducible changes during the time course of induction and regression, 32 genes were altered only during induction and 8 were altered only during regression. This study identified both known and novel genes whose expression is affected at different stages of cardiac hypertrophy and regression and demonstrates that cardiac remodeling during regression utilizes a set of genes that are distinct from those used during induction of hypertrophy.
4D-Fingerprint Categorical QSAR Models for Skin Sensitization Based on Classification Local Lymph Node Assay Measures

PubMed Central

Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.

2008-01-01

Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR) and partial least squares coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²_HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0% to 86.7%, while that of PLS-logistic regression models ranges from 73.3% to 80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
DEVELOPMENT OF THE VIRTUAL BEACH MODEL, PHASE 1: AN EMPIRICAL MODEL

EPA Science Inventory

With increasing attention focused on the use of multiple linear regression (MLR) modeling of beach fecal bacteria concentration, the validity of the entire statistical process should be carefully evaluated to assure satisfactory predictions. This work aims to identify pitfalls an...
Multilevel covariance regression with correlated random effects in the mean and variance structure.

PubMed

Quintero, Adrian; Lesaffre, Emmanuel

2017-09-01

Multivariate regression methods generally assume a constant covariance matrix for the observations. In case a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches can be restrictive in the literature. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewedly distributed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Gaussian Process Regression for Predictive But Interpretable Machine Learning Models: An Example of Predicting Mental Workload across Tasks

PubMed Central

Caywood, Matthew S.; Roberts, Daniel M.; Colombe, Jeffrey B.; Greenwald, Hal S.; Weiland, Monica Z.

2017-01-01

There is increasing interest in real-time brain-computer interfaces (BCIs) for the passive monitoring of human cognitive state, including cognitive workload. Too often, however, effective BCIs based on machine learning techniques may function as “black boxes” that are difficult to analyze or interpret. In an effort toward more interpretable BCIs, we studied a family of N-back working memory tasks using a machine learning model, Gaussian Process Regression (GPR), which was both powerful and amenable to analysis. Participants performed the N-back task with three stimulus variants, auditory-verbal, visual-spatial, and visual-numeric, each at three working memory loads. GPR models were trained and tested on EEG data from all three task variants combined, in an effort to identify a model that could be predictive of mental workload demand regardless of stimulus modality. To provide a comparison for GPR performance, a model was additionally trained using multiple linear regression (MLR). The GPR model was effective when trained on individual participant EEG data, resulting in an average standardized mean squared error (sMSE) between true and predicted N-back levels of 0.44. In comparison, the MLR model using the same data resulted in an average sMSE of 0.55. We additionally demonstrate how GPR can be used to identify which EEG features are relevant for prediction of cognitive workload in an individual participant. A fraction of EEG features accounted for the majority of the model’s predictive power; using only the top 25% of features performed nearly as well as using 100% of features. Subsets of features identified by linear models (ANOVA) were not as efficient as subsets identified by GPR. This raises the possibility of BCIs that require fewer model features while capturing all of the information needed to achieve high predictive accuracy. PMID:28123359
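The abstract reports a standardized mean squared error (sMSE) without giving its formula. One common definition, assumed here for illustration, divides the mean squared error by the variance of the true targets, so a model that always predicts the target mean scores 1 and a perfect model scores 0. The function name is hypothetical and the authors' exact normalization may differ.

```python
import statistics


def standardized_mse(y_true, y_pred):
    """Mean squared error divided by the (population) variance of the true values.

    Assumed definition: values well below 1 indicate the model does better
    than simply predicting the mean of y_true.
    """
    mse = statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))
    variance = statistics.pvariance(y_true)
    if variance == 0:
        raise ValueError("y_true has zero variance; sMSE is undefined")
    return mse / variance
```

Under this definition, the reported values of 0.44 (GPR) and 0.55 (MLR) would both be improvements over a constant mean prediction, with GPR leaving less unexplained variance.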
Multinomial Logistic Regression Predicted Probability Map To Visualize The Influence Of Socio-Economic Factors On Breast Cancer Occurrence in Southern Karnataka

NASA Astrophysics Data System (ADS)

Madhu, B.; Ashok, N. C.; Balasubramanian, S.

2014-11-01

Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status, were obtained for each case. The models were developed as follows: i) The urban-rural distribution of breast cancer cases obtained from the Bharat Hospital and Institute of Oncology was visualized spatially. ii) Socio-economic risk factors describing the breast cancer occurrences were compiled for each case. These data were then analysed using multinomial logistic regression in SPSS statistical software; relations between the occurrence of breast cancer across socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed. iii) The model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.
Variable Selection for Nonparametric Quantile Regression via Smoothing Spline ANOVA

PubMed Central

Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui

2014-01-01

Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792
Job stress models, depressive disorders and work performance of engineers in microelectronics industry.

PubMed

Chen, Sung-Wei; Wang, Po-Chuan; Hsin, Ping-Lung; Oates, Anthony; Sun, I-Wen; Liu, Shen-Ing

2011-01-01

Microelectronic engineers are considered valuable human capital contributing significantly toward economic development, but they may encounter stressful work conditions in the context of a globalized industry. The study aims at identifying risk factors of depressive disorders primarily based on job stress models, the Demand-Control-Support and Effort-Reward Imbalance models, and at evaluating whether depressive disorders impair work performance in microelectronics engineers in Taiwan. The case-control study was conducted among 678 microelectronics engineers, 452 controls and 226 cases with depressive disorders, which were defined by a score of 17 or more on the Beck Depression Inventory and a psychiatrist's diagnosis. The self-administered questionnaires included the Job Content Questionnaire, Effort-Reward Imbalance Questionnaire, demography, psychosocial factors, health behaviors and work performance. Hierarchical logistic regression was applied to identify risk factors of depressive disorders. Multivariate linear regressions were used to determine factors affecting work performance. By hierarchical logistic regression, risk factors of depressive disorders are high demands, low work social support, high effort/reward ratio and low frequency of physical exercise. Combining the two job stress models may have better predictive power for depressive disorders than adopting either model alone. Three multivariate linear regressions provide similar results indicating that depressive disorders are associated with impaired work performance in terms of absence, role limitation and social functioning limitation. The results may provide insight into the applicability of job stress models in a globalized high-tech industry considerably focused in non-Western countries, and the design of workplace preventive strategies for depressive disorders in the Asian electronics engineering population.
A comparison of Cox and logistic regression for use in genome-wide association studies of cohort and case-cohort design.

PubMed

Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M

2017-06-01

Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given that logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies: first, analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold; second, fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.

Sparse modeling of spatial environmental variables associated with asthma

PubMed Central

Chang, Timothy S.; Gangnon, Ronald E.; Page, C. David; Buckingham, William R.; Tandias, Aman; Cowan, Kelly J.; Tomasallo, Carrie D.; Arndt, Brian G.; Hanrahan, Lawrence P.; Guilbert, Theresa W.

2014-01-01

Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin’s Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5–50 years over a three-year period. Each patient’s home address was geocoded to one of 3,456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin’s geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. PMID:25533437
Sparse modeling of spatial environmental variables associated with asthma.

PubMed

Chang, Timothy S; Gangnon, Ronald E; David Page, C; Buckingham, William R; Tandias, Aman; Cowan, Kelly J; Tomasallo, Carrie D; Arndt, Brian G; Hanrahan, Lawrence P; Guilbert, Theresa W

2015-02-01

Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin's Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5-50 years over a three-year period. Each patient's home address was geocoded to one of 3456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin's geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. Copyright © 2014 Elsevier Inc. All rights reserved.
A novel hybrid method of beta-turn identification in protein using binary logistic regression and neural network

PubMed Central

Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz

2012-01-01

From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was used, for the first time, to select significant sequence parameters for the identification of β-turns using a re-substitution test procedure. Sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn in position i+2, respectively, in sequence. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks have been trained and tested on a non-homologous dataset of 565 protein chains. Applying a nine-fold cross-validation test to the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with results of the other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks lead to the development of more precise models for identifying β-turns in proteins. PMID:27418910
A novel hybrid method of beta-turn identification in protein using binary logistic regression and neural network.

PubMed

Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz

2012-01-01

From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was used, for the first time, to select significant sequence parameters for the identification of β-turns using a re-substitution test procedure. Sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn in position i+2, respectively, in sequence. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks have been trained and tested on a non-homologous dataset of 565 protein chains. Applying a nine-fold cross-validation test to the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with results of the other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks lead to the development of more precise models for identifying β-turns in proteins.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models

NASA Technical Reports Server (NTRS)

Ulbrich, Norbert; Bader, Jon B.

2009-01-01

Calibration data of a wind tunnel sting balance was processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm's two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation when regression models of balance calibration data can directly be derived from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
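Written out, the two regression models described in this abstract take the following form. The symbols are assumptions introduced here for illustration and do not come from the report: $R_1$ and $R_2$ denote the forward and aft gage outputs, $N$ the normal force, $M$ the pitching moment at the balance moment center, and $a_i$, $b_i$ the fitted coefficients.

```latex
\begin{align}
  R_1 - R_2 &= a_0 + a_1 N \\
  R_1 + R_2 &= b_0 + b_1 M + b_2 M^2
\end{align}
```

The first model is linear in the normal force, while the second contains a quadratic term in the pitching moment, matching the term lists given in the abstract.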
Building a Computer Program to Support Children, Parents, and Distraction during Healthcare Procedures

PubMed Central

McCarthy, Ann Marie; Kleiber, Charmaine; Ataman, Kaan; Street, W. Nick; Zimmerman, M. Bridget; Ersig, Anne L.

2012-01-01

This secondary data analysis used data mining methods to develop predictive models of child risk for distress during a healthcare procedure. Data used came from a study that predicted factors associated with children’s responses to an intravenous catheter insertion while parents provided distraction coaching. From the 255 items used in the primary study, 44 predictive items were identified through automatic feature selection and used to build support vector machine regression models. Models were validated using multiple cross-validation tests and by comparing variables identified as explanatory in the traditional versus support vector machine regression. Rule-based approaches were applied to the model outputs to identify overall risk for distress. A decision tree was then applied to evidence-based instructions for tailoring distraction to characteristics and preferences of the parent and child. The resulting decision support computer application, the Children, Parents and Distraction (CPaD), is being used in research. Future use will support practitioners in deciding the level and type of distraction intervention needed by a child undergoing a healthcare procedure. PMID:22805121
Large biases in regression-based constituent flux estimates: causes and diagnostic tools

USGS Publications Warehouse

Hirsch, Robert M.

2014-01-01

It has been documented in the literature that, in some cases, widely used regression-based models can produce severely biased estimates of long-term mean river fluxes of various constituents. These models, estimated using sample values of concentration, discharge, and date, are used to compute estimated fluxes for a multiyear period at a daily time step. This study compares results of the LOADEST seven-parameter model, LOADEST five-parameter model, and the Weighted Regressions on Time, Discharge, and Season (WRTDS) model using subsampling of six very large datasets to better understand this bias problem. This analysis considers sample datasets for dissolved nitrate and total phosphorus. The results show that LOADEST-7 and LOADEST-5, although they often produce very nearly unbiased results, can produce highly biased results. This study identifies three conditions that can give rise to these severe biases: (1) lack of fit of the log of concentration vs. log discharge relationship, (2) substantial differences in the shape of this relationship across seasons, and (3) severely heteroscedastic residuals. The WRTDS model is more resistant to the bias problem than the LOADEST models but is not immune to them. Understanding the causes of the bias problem is crucial to selecting an appropriate method for flux computations. Diagnostic tools for identifying the potential for bias problems are introduced, and strategies for resolving bias problems are described.
A Bayesian Hierarchical Modeling Approach to Predicting Flow in Ungauged Basins

EPA Science Inventory

Recent innovative approaches to identifying and applying regression-based relationships between land use patterns (such as increasing impervious surface area and decreasing vegetative cover) and rainfall-runoff model parameters represent novel and promising improvements to predic...
Evaluating Spatial Variability in Sediment and Phosphorus Concentration-Discharge Relationships Using Bayesian Inference and Self-Organizing Maps

NASA Astrophysics Data System (ADS)

Underwood, Kristen L.; Rizzo, Donna M.; Schroth, Andrew W.; Dewoolkar, Mandar M.

2017-12-01

Given the variable biogeochemical, physical, and hydrological processes driving fluvial sediment and nutrient export, the water science and management communities need data-driven methods to identify regions prone to production and transport under variable hydrometeorological conditions. We use Bayesian analysis to segment concentration-discharge linear regression models for total suspended solids (TSS) and particulate and dissolved phosphorus (PP, DP) using 22 years of monitoring data from 18 Lake Champlain watersheds. Bayesian inference was leveraged to estimate segmented regression model parameters and identify threshold position. The identified threshold positions demonstrated a considerable range below and above the median discharge—which has been used previously as the default breakpoint in segmented regression models to discern differences between pre and post-threshold export regimes. We then applied a Self-Organizing Map (SOM), which partitioned the watersheds into clusters of TSS, PP, and DP export regimes using watershed characteristics, as well as Bayesian regression intercepts and slopes. A SOM defined two clusters of high-flux basins, one where PP flux was predominantly episodic and hydrologically driven; and another in which the sediment and nutrient sourcing and mobilization were more bimodal, resulting from both hydrologic processes at post-threshold discharges and reactive processes (e.g., nutrient cycling or lateral/vertical exchanges of fine sediment) at prethreshold discharges. A separate DP SOM defined two high-flux clusters exhibiting a bimodal concentration-discharge response, but driven by differing land use. Our novel framework shows promise as a tool with broad management application that provides insights into landscape drivers of riverine solute and sediment export.
A controlled experiment in ground water flow model calibration

USGS Publications Warehouse

Hill, M.C.; Cooley, R.L.; Pollock, D.W.

1998-01-01

Nonlinear regression was introduced to ground water modeling in the 1970s, but has been used very little to calibrate numerical models of complicated ground water systems. Apparently, nonlinear regression is thought by many to be incapable of addressing such complex problems. With what we believe to be the most complicated synthetic test case used for such a study, this work investigates using nonlinear regression in ground water model calibration. Results of the study fall into two categories. First, the study demonstrates how systematic use of a well designed nonlinear regression method can indicate the importance of different types of data and can lead to successive improvement of models and their parameterizations. Our method differs from previous methods presented in the ground water literature in that (1) weighting is more closely related to expected data errors than is usually the case; (2) defined diagnostic statistics allow for more effective evaluation of the available data, the model, and their interaction; and (3) prior information is used more cautiously. Second, our results challenge some commonly held beliefs about model calibration. For the test case considered, we show that (1) field measured values of hydraulic conductivity are not as directly applicable to models as their use in some geostatistical methods imply; (2) a unique model does not necessarily need to be identified to obtain accurate predictions; and (3) in the absence of obvious model bias, model error was normally distributed. The complexity of the test case involved implies that the methods used and conclusions drawn are likely to be powerful in practice.
Influential factors of red-light running at signalized intersection and prediction using a rare events logistic regression model.

PubMed

Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan

2016-10-01

Red light running (RLR) has become a major safety concern at signalized intersections. To prevent RLR-related crashes, it is critical to identify the factors that significantly affect drivers' RLR behavior and to predict potential RLR in real time. In this research, nine months of RLR events extracted from high-resolution traffic data collected by loop detectors at three signalized intersections were used to identify the factors that significantly affect RLR behavior. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection on the adjacent lane were significant factors for RLR behavior. Furthermore, due to the rare events nature of RLR, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and shows impressive performance, but so far no previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is based purely on loop detector data collected from a single advance loop detector located 400 feet away from the stop bar. This brings great potential for future field applications of the proposed method since loops have been widely implemented at many intersections and can collect data in real time. This research is expected to contribute significantly to the improvement of intersection safety. Copyright © 2016 Elsevier Ltd. All rights reserved.
Use of Cox's Cure Model to Establish Clinical Determinants of Long-Term Disease-Free Survival in Neoadjuvant-Chemotherapy-Treated Breast Cancer Patients without Pathologic Complete Response.

PubMed

Asano, Junichi; Hirakawa, Akihiro; Hamada, Chikuma; Yonemori, Kan; Hirata, Taizo; Shimizu, Chikako; Tamura, Kenji; Fujiwara, Yasuhiro

2013-01-01

In prognostic studies for breast cancer patients treated with neoadjuvant chemotherapy (NAC), the ordinary Cox proportional-hazards (PH) model has been often used to identify prognostic factors for disease-free survival (DFS). This model assumes that all patients eventually experience relapse or death. However, a subset of NAC-treated breast cancer patients never experience these events during long-term follow-up (>10 years) and may be considered clinically "cured." Clinical factors associated with cure have not been studied adequately. Because the ordinary Cox PH model cannot be used to identify such clinical factors, we used the Cox PH cure model, a recently developed statistical method. This model includes both a logistic regression component for the cure rate and a Cox regression component for the hazard for uncured patients. The purpose of this study was to identify the clinical factors associated with cure and the variables associated with the time to recurrence or death in NAC-treated breast cancer patients without a pathologic complete response, by using the Cox PH cure model. We found that hormone receptor status, clinical response, human epidermal growth factor receptor 2 status, histological grade, and the number of lymph node metastases were associated with cure.
Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: Seeing the forest for the trees.

PubMed

Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H

2017-02-01

At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary

NASA Astrophysics Data System (ADS)

Gillis, Nicolas; Luce, Robert

2018-01-01

A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.
Impacts of land use and population density on seasonal surface water quality using a modified geographically weighted regression.

PubMed

Chen, Qiang; Mei, Kun; Dahlgren, Randy A; Wang, Ting; Gong, Jian; Zhang, Minghua

2016-12-01

As an important regulator of pollutants in overland flow and interflow, land use has become an essential research component for determining the relationships between surface water quality and pollution sources. This study investigated the use of ordinary least squares (OLS) and geographically weighted regression (GWR) models to identify the impact of land use and population density on surface water quality in the Wen-Rui Tang River watershed of eastern China. A manual variable excluding-selecting method was explored to resolve multicollinearity issues. Standard regression coefficient analysis coupled with cluster analysis was introduced to determine which variable had the greatest influence on water quality. Results showed that: (1) Impact of land use on water quality varied with spatial and seasonal scales. Both positive and negative effects for certain land-use indicators were found in different subcatchments. (2) Urban land was the dominant factor influencing N, P and chemical oxygen demand (COD) in highly urbanized regions, but the relationship was weak as the pollutants were mainly from point sources. Agricultural land was the primary factor influencing N and P in suburban and rural areas; the relationship was strong as the pollutants were mainly from agricultural surface runoff. Subcatchments located in suburban areas were identified with urban land as the primary influencing factor during the wet season while agricultural land was identified as a more prevalent influencing factor during the dry season. (3) Adjusted R² values in OLS models using the manual variable excluding-selecting method averaged 14.3% higher than using stepwise multiple linear regressions. However, the corresponding GWR models had adjusted R² ~59.2% higher than the optimal OLS models, confirming that GWR models demonstrated better prediction accuracy. Based on our findings, water resource protection policies should consider site-specific land-use conditions within each watershed to optimize mitigation strategies for contrasting land-use characteristics and seasonal variations. Copyright © 2016 Elsevier B.V. All rights reserved.
A Survey of UML Based Regression Testing

NASA Astrophysics Data System (ADS)

Fahad, Muhammad; Nadeem, Aamer

Regression testing is the process of ensuring software quality by analyzing whether changed parts behave as intended, and unchanged parts are not affected by the modifications. Since it is a costly process, many techniques have been proposed in the research literature that suggest how testers can build a regression test suite from an existing test suite at minimum cost. In this paper, we discuss the advantages and drawbacks of using UML diagrams for regression testing and find that UML models help in identifying changes for regression test selection effectively. We survey the existing UML based regression testing techniques and provide an analysis matrix to give a quick insight into prominent features of the literature work. We discuss open research issues that remain to be addressed for UML based regression testing, such as managing and reducing the size of the regression test suite and prioritizing test cases, which would be helpful under strict schedules and limited resources.
Extrapolation of a predictive model for growth of a low inoculum size of Salmonella typhimurium DT104 on chicken skin to higher inoculum sizes

USDA-ARS's Scientific Manuscript database

Validation of model predictions for independent variables not included in model development can save time and money by identifying conditions for which new models are not needed. A single strain of Salmonella Typhimurium DT104 was used to develop a general regression neural network model for growth...
On approaches to analyze the sensitivity of simulated hydrologic fluxes to model parameters in the community land model

DOE PAGES

Bao, Jie; Hou, Zhangshuan; Huang, Maoyi; ...

2015-12-04

Here, effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash-Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field observed values. Four sensitivity analysis (SA) approaches, including analysis of variance based on the generalized linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine, are investigated. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.
Least Square Regression Method for Estimating Gas Concentration in an Electronic Nose System

PubMed Central

Khalaf, Walaa; Pace, Calogero; Gaudioso, Manlio

2009-01-01

We describe an Electronic Nose (ENose) system which is able to identify the type of analyte and to estimate its concentration. The system consists of seven sensors, five of them being gas sensors (supplied with different heater voltage values), the remainder being a temperature and a humidity sensor, respectively. To identify a new analyte sample and then to estimate its concentration, we use both some machine learning techniques and the least square regression principle. In fact, we apply two different training models; the first one is based on the Support Vector Machine (SVM) approach and is aimed at teaching the system how to discriminate among different gases, while the second one uses the least squares regression approach to predict the concentration of each type of analyte. PMID:22573980
Explanatory Power of Multi-scale Physical Descriptors in Modeling Benthic Indices Across Nested Ecoregions of the Pacific Northwest

NASA Astrophysics Data System (ADS)

Holburn, E. R.; Bledsoe, B. P.; Poff, N. L.; Cuhaciyan, C. O.

2005-05-01

Using over 300 R/EMAP sites in OR and WA, we examine the relative explanatory power of watershed, valley, and reach scale descriptors in modeling variation in benthic macroinvertebrate indices. Innovative metrics describing flow regime, geomorphic processes, and hydrologic-distance weighted watershed and valley characteristics are used in multiple regression and regression tree modeling to predict EPT richness, % EPT, EPT/C, and % Plecoptera. A nested design using seven ecoregions is employed to evaluate the influence of geographic scale and environmental heterogeneity on the explanatory power of individual and combined scales. Regression tree models are constructed to explain variability while identifying threshold responses and interactions. Cross-validated models demonstrate differences in the explanatory power associated with single-scale and multi-scale models as environmental heterogeneity is varied. Models explaining the greatest variability in biological indices result from multi-scale combinations of physical descriptors. Results also indicate that substantial variation in benthic macroinvertebrate response can be explained with process-based watershed and valley scale metrics derived exclusively from common geospatial data. This study outlines a general framework for identifying key processes driving macroinvertebrate assemblages across a range of scales and establishing the geographic extent at which various levels of physical description best explain biological variability. Such information can guide process-based stratification to avoid spurious comparison of dissimilar stream types in bioassessments and ensure that key environmental gradients are adequately represented in sampling designs.

Multiple-Shrinkage Multinomial Probit Models with Applications to Simulating Geographies in Public Use Data.

PubMed

Burgette, Lane F; Reiter, Jerome P

2013-06-01

Multinomial outcomes with many levels can be challenging to model. Information typically accrues slowly with increasing sample size, yet the parameter space expands rapidly with additional covariates. Shrinking all regression parameters towards zero, as often done in models of continuous or binary response variables, is unsatisfactory, since setting parameters equal to zero in multinomial models does not necessarily imply "no effect." We propose an approach to modeling multinomial outcomes with many levels based on a Bayesian multinomial probit (MNP) model and a multiple shrinkage prior distribution for the regression parameters. The prior distribution encourages the MNP regression parameters to shrink toward a number of learned locations, thereby substantially reducing the dimension of the parameter space. Using simulated data, we compare the predictive performance of this model against two other recently-proposed methods for big multinomial models. The results suggest that the fully Bayesian, multiple shrinkage approach can outperform these other methods. We apply the multiple shrinkage MNP to simulating replacement values for areal identifiers, e.g., census tract indicators, in order to protect data confidentiality in public use datasets.
Cognitive and Behavioural Correlates of Non-Adherence to HIV Anti-Retroviral Therapy: Theoretical and Practical Insight for Clinical Psychology and Health Psychology

ERIC Educational Resources Information Center

Begley, Kim; McLaws, Mary-Louise; Ross, Michael W.; Gold, Julian

2008-01-01

This cross-sectional study identified variables associated with protease inhibitor (PI) non-adherence in 179 patients taking anti-retroviral therapy. Univariate analyses identified 11 variables associated with PI non-adherence. Multiple logistic regression modelling identified three predictors of PI non-adherence: low adherence self-efficacy and…
Prediction of siRNA potency using sparse logistic regression.

PubMed

Hu, Wei; Hu, John

2014-06-01

RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
Weight management behaviors in a sample of Iranian adolescent girls.

PubMed

Garousi, S; Garrusi, B; Baneshi, Mohammad Reza; Sharifi, Z

2016-09-01

Attempts to obtain the ideal body shape portrayed in advertising can result in behaviors that lead to an unhealthy reduction in weight. This study was designed to identify contributing factors that may be effective in changing the behavior of a sample of Iranian adolescents. Three hundred fifty adolescent girls from high schools in Kerman, Iran participated in a cross-sectional study based on a self-administered questionnaire. Multifactorial logistic regression modeling was used to identify the factors influencing each of the contributing factors for body management methods, and a decision tree model was constructed to identify individuals who were more or less likely to change their body shape. Approximately one-third of the adolescent girls had attempted dieting, and 37% of them had exercised to lose weight. The logistic regression model showed that pressure from their mother and the media; father's education level; and body mass index (BMI) were important factors in dieting. BMI and perceived pressure from the media were risk factors for attempting exercise. BMI and perceived pressure from relatives, particularly mothers, and the media were important factors in attempts by adolescent girls to lose weight.
Risk adjustment in the American College of Surgeons National Surgical Quality Improvement Program: a comparison of logistic versus hierarchical modeling.

PubMed

Cohen, Mark E; Dimick, Justin B; Bilimoria, Karl Y; Ko, Clifford Y; Richards, Karen; Hall, Bruce Lee

2009-12-01

Although logistic regression has commonly been used to adjust for risk differences in patient and case mix to permit quality comparisons across hospitals, hierarchical modeling has been advocated as the preferred methodology, because it accounts for clustering of patients within hospitals. It is unclear whether hierarchical models would yield important differences in quality assessments compared with logistic models when applied to American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) data. Our objective was to evaluate differences in logistic versus hierarchical modeling for identifying hospitals with outlying outcomes in the ACS-NSQIP. Data from ACS-NSQIP patients who underwent colorectal operations in 2008 at hospitals that reported at least 100 operations were used to generate logistic and hierarchical prediction models for 30-day morbidity and mortality. Differences in risk-adjusted performance (ratio of observed-to-expected events) and outlier detections from the two models were compared. Logistic and hierarchical models identified the same 25 hospitals as morbidity outliers (14 low and 11 high outliers), but the hierarchical model identified 2 additional high outliers. Both models identified the same eight hospitals as mortality outliers (five low and three high outliers). The values of observed-to-expected events ratios and p values from the two models were highly correlated. Results were similar when data were permitted from hospitals providing < 100 patients. When applied to ACS-NSQIP data, logistic and hierarchical models provided nearly identical results with respect to identification of hospitals' observed-to-expected events ratio outliers. As hierarchical models are prone to implementation problems, logistic regression will remain an accurate and efficient method for performing risk adjustment of hospital quality comparisons.
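The risk-adjusted performance measure named in this abstract, the ratio of observed to expected events, can be illustrated with a short sketch. It assumes that the expected count for a hospital is the sum of the model-predicted event probabilities of its patients, which is a common convention but is not spelled out in the abstract; the function name `observed_to_expected` and the patient values are hypothetical.

```python
def observed_to_expected(outcomes, predicted_probabilities):
    """Return the observed-to-expected (O/E) event ratio for one hospital.

    outcomes: 0/1 indicators of whether each patient had the event.
    predicted_probabilities: model-predicted event probability per patient
    (assumed here to come from a fitted risk-adjustment model).
    """
    if len(outcomes) != len(predicted_probabilities):
        raise ValueError("one predicted probability is needed per patient")
    observed = sum(outcomes)
    expected = sum(predicted_probabilities)
    if expected == 0:
        raise ValueError("expected event count is zero; ratio is undefined")
    return observed / expected


# Hypothetical hospital with five patients (made-up numbers).
outcomes = [1, 0, 0, 1, 0]
predicted = [0.20, 0.10, 0.30, 0.25, 0.15]

ratio = observed_to_expected(outcomes, predicted)
print(ratio)  # close to 2: twice as many events as the model expects
```

A ratio above 1 indicates more events than the case mix predicts and a ratio below 1 indicates fewer; whether a hospital counts as an outlier additionally depends on a significance test, which this sketch does not attempt.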
Fatigue design of a cellular phone folder using regression model-based multi-objective optimization

NASA Astrophysics Data System (ADS)

Kim, Young Gyun; Lee, Jongsoo

2016-08-01

In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
Predicting No-Shows in Radiology Using Regression Modeling of Data Available in the Electronic Medical Record.

PubMed

Harvey, H Benjamin; Liu, Catherine; Ai, Jing; Jaworsky, Cristina; Guerrier, Claude Emmanuel; Flores, Efren; Pianykh, Oleg

2017-10-01

To test whether data elements available in the electronic medical record (EMR) can be effectively leveraged to predict failure to attend a scheduled radiology examination. Using data from a large academic medical center, we identified all patients with a diagnostic imaging examination scheduled from January 1, 2016, to April 1, 2016, and determined whether the patient successfully attended the examination. Demographic, clinical, and health services utilization variables available in the EMR potentially relevant to examination attendance were recorded for each patient. We used descriptive statistics and logistic regression models to test whether these data elements could predict failure to attend a scheduled radiology examination. The predictive accuracy of the regression models were determined by calculating the area under the receiver operator curve. Among the 54,652 patient appointments with radiology examinations scheduled during the study period, 6.5% were no-shows. No-show rates were highest for the modalities of mammography and CT and lowest for PET and MRI. Logistic regression indicated that 16 of the 27 demographic, clinical, and health services utilization factors were significantly associated with failure to attend a scheduled radiology examination (P ≤ .05). Stepwise logistic regression analysis demonstrated that previous no-shows, days between scheduling and appointments, modality type, and insurance type were most strongly predictive of no-show. A model considering all 16 data elements had good ability to predict radiology no-shows (area under the receiver operator curve = 0.753). The predictive ability was similar or improved when these models were analyzed by modality. Patient and examination information readily available in the EMR can be successfully used to predict radiology no-shows. 
Moving forward, this information can be proactively leveraged to identify patients who might benefit from additional patient engagement through appointment reminders or other targeted interventions to avoid no-shows. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.
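The area under the ROC curve used in this record can be read as the probability that a randomly chosen no-show receives a higher predicted risk than a randomly chosen attended appointment. A minimal sketch of that calculation follows; the function name `auc` and the example scores and labels are hypothetical illustrations, not data or code from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney form).

    scores: predicted probabilities of the event (here, a no-show)
    labels: observed outcomes, 1 = event occurred, 0 = it did not
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted no-show probabilities and observed outcomes.
scores = [0.02, 0.10, 0.35, 0.08, 0.60, 0.05]
labels = [0, 0, 1, 0, 1, 1]
example_auc = auc(scores, labels)
```

The pairwise form is quadratic in the number of appointments, so for a data set of the size described here a rank-based or library implementation would likely be preferable.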
Air-water temperature relationships in the trout streams of southeastern Minnesota’s carbonate-sandstone landscape

USGS Publications Warehouse

Krider, Lori A.; Magner, Joseph A.; Perry, Jim; Vondracek, Bruce C.; Ferrington, Leonard C.

2013-01-01

Carbonate-sandstone geology in southeastern Minnesota creates a heterogeneous landscape of springs, seeps, and sinkholes that supply groundwater into streams. Air temperatures are effective predictors of water temperature in surface-water dominated streams. However, no published work investigates the relationship between air and water temperatures in groundwater-fed streams (GWFS) across watersheds. We used simple linear regressions to examine weekly air-water temperature relationships for 40 GWFS in southeastern Minnesota. A 40-stream, composite linear regression model has a slope of 0.38, an intercept of 6.63, and R² of 0.83. The regression models for GWFS have lower slopes and higher intercepts in comparison to surface-water dominated streams. Regression models for streams with high R² values offer promise for use as predictive tools for future climate conditions. Climate change is expected to alter the thermal regime of groundwater-fed systems, but will do so at a slower rate than surface-water dominated systems. A regression model of intercept vs. slope can be used to identify streams for which water temperatures are more meteorologically than groundwater controlled, and thus more vulnerable to climate change. Such relationships can be used to guide restoration vs. management strategies to protect trout streams.
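As a worked illustration of the composite model in this record, the sketch below applies the reported slope (0.38) and intercept (6.63) to a weekly air temperature. The function name is hypothetical, and the temperature unit is assumed to be degrees Celsius, which the abstract does not state.

```python
# Coefficients reported for the 40-stream composite regression.
SLOPE = 0.38
INTERCEPT = 6.63


def predict_water_temp(air_temp):
    """Predicted weekly water temperature for a given weekly air temperature.

    Units are assumed to be degrees Celsius (an assumption, not stated
    in the abstract).
    """
    return INTERCEPT + SLOPE * air_temp


# A 2-degree rise in air temperature shifts the prediction by 2 * 0.38 = 0.76.
baseline = predict_water_temp(20.0)
warmer = predict_water_temp(22.0)
change = warmer - baseline
```

The low slope is what the abstract means by a slower response: under this model, water temperature moves by less than half of any change in air temperature.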
A regression tree for identifying combinations of fall risk factors associated to recurrent falling: a cross-sectional elderly population-based study.

PubMed

Kabeshova, A; Annweiler, C; Fantino, B; Philip, T; Gromov, V A; Launay, C P; Beauchet, O

2014-06-01

Regression tree (RT) analyses are particularly well suited, compared with logistic regression models, to exploring the risk of recurrent falling according to various combinations of fall risk factors. The aims of this study were (1) to determine which combinations of fall risk factors were associated with the occurrence of recurrent falls in older community-dwellers, and (2) to compare the efficacy of RT and multiple logistic regression models for the identification of recurrent falls. A total of 1,760 community-dwelling volunteers (mean age ± standard deviation, 71.0 ± 5.1 years; 49.4 % female) were recruited prospectively in this cross-sectional study. Age, gender, polypharmacy, use of psychoactive drugs, fear of falling (FOF), cognitive disorders and sad mood were recorded. In addition, the history of falls within the past year was recorded using a standardized questionnaire. Among 1,760 participants, 19.7 % (n = 346) were recurrent fallers. The RT identified 14 node groups and 8 end nodes with FOF as the first major split. Among participants with FOF, those who had sad mood and polypharmacy formed the end node with the greatest OR for recurrent falls (OR = 6.06 with p < 0.001). Among participants without FOF, those who were male and not sad had the lowest OR for recurrent falls (OR = 0.25 with p < 0.001). The RT correctly classified 1,356 of 1,414 non-recurrent fallers (specificity = 95.6 %), and 65 of 346 recurrent fallers (sensitivity = 18.8 %). The overall classification accuracy was 81.0 %. The multiple logistic regression correctly classified 1,372 of 1,414 non-recurrent fallers (specificity = 97.0 %), and 61 of 346 recurrent fallers (sensitivity = 17.6 %). The overall classification accuracy was 81.4 %. Our results show that RT may identify specific combinations of risk factors for recurrent falls, the combination most associated with recurrent falls involving FOF, sad mood and polypharmacy. 
The FOF emerged as the risk factor strongly associated with recurrent falls. In addition, RT and multiple logistic regression were not sensitive enough to identify the majority of recurrent fallers but appeared efficient in detecting individuals not at risk of recurrent falls.
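The sensitivity, specificity, and accuracy figures in this record follow directly from the classification counts. The sketch below recomputes them for the multiple logistic regression counts quoted in the abstract; the function name and return layout are hypothetical.

```python
def classification_rates(true_neg, total_neg, true_pos, total_pos):
    """Return (sensitivity, specificity, accuracy) as percentages.

    true_neg / total_neg: correctly classified and total non-events
    true_pos / total_pos: correctly classified and total events
    """
    sensitivity = 100.0 * true_pos / total_pos
    specificity = 100.0 * true_neg / total_neg
    accuracy = 100.0 * (true_pos + true_neg) / (total_pos + total_neg)
    return sensitivity, specificity, accuracy


# Counts quoted for the multiple logistic regression model:
# 1,372 of 1,414 non-recurrent fallers and 61 of 346 recurrent fallers.
sens, spec, acc = classification_rates(
    true_neg=1372, total_neg=1414, true_pos=61, total_pos=346
)
```

With only 19.7 % of participants being recurrent fallers, a high overall accuracy can coexist with a low sensitivity, which is the pattern the abstract reports for both methods.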
Environmental factors and flow paths related to Escherichia coli concentrations at two beaches on Lake St. Clair, Michigan, 2002–2005

USGS Publications Warehouse

Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.

2008-01-01

Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. 
A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
Detecting isotopic ratio outliers

NASA Astrophysics Data System (ADS)

Bayne, C. K.; Smith, D. H.

An alternative method is proposed for improving isotopic ratio estimates. This method mathematically models pulse-count data and uses iterative reweighted Poisson regression to estimate model parameters to calculate the isotopic ratios. This computer-oriented approach provides theoretically better methods than conventional techniques to establish error limits and to identify outliers.
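For readers unfamiliar with iteratively reweighted fitting of a Poisson regression, the following is a generic sketch of the technique using NumPy with a log link. It is not the authors' pulse-count model: the function name, starting values, and example counts are assumptions made for illustration.

```python
import numpy as np


def poisson_irls(X, y, max_iter=50, tol=1e-10):
    """Fit a log-link Poisson regression by iteratively reweighted least squares.

    X: (n, p) design matrix (include a column of ones for an intercept)
    y: (n,) non-negative counts
    Returns the estimated coefficient vector of length p.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    eta = np.log(y + 0.5)            # starting linear predictor; 0.5 avoids log(0)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(eta)             # current fitted means
        z = eta + (y - mu) / mu      # working response
        w = mu                       # working weights for the log link
        XtW = X.T * w                # X^T W without forming the diagonal matrix
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ new_beta
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Hypothetical counts observed at nine successive settings.
x = np.arange(9, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = np.array([2, 3, 6, 7, 8, 9, 10, 12, 15], dtype=float)
beta_hat = poisson_irls(X, y)
```

Each pass is a weighted least-squares fit whose weights depend on the current fitted means, which is why large residuals under the final fit can be used to flag candidate outliers.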
Prediction of dimethyl disulfide levels from biosolids using statistical modeling.

PubMed

Gabriel, Steven A; Vilalai, Sirapong; Arispe, Susanna; Kim, Hyunook; McConnell, Laura L; Torrents, Alba; Peot, Christopher; Ramirez, Mark

2005-01-01

Two statistical models were used to predict the concentration of dimethyl disulfide (DMDS) released from biosolids produced by an advanced wastewater treatment plant (WWTP) located in Washington, DC, USA. The plant concentrates sludge from primary sedimentation basins in gravity thickeners (GT) and sludge from secondary sedimentation basins in dissolved air flotation (DAF) thickeners. The thickened sludge is pumped into blending tanks and then fed into centrifuges for dewatering. The dewatered sludge is then conditioned with lime before being trucked out of the plant. DMDS, along with other volatile sulfur and nitrogen-containing chemicals, is known to contribute to biosolids odors. These models identified oxidation/reduction potential (ORP) values of a GT and DAF, the amount of sludge dewatered by centrifuges, and the blend ratio between GT thickened sludge and DAF thickened sludge in blending tanks as control variables. The accuracy of the developed regression models was evaluated by checking the adjusted R² of the regression as well as the signs of coefficients associated with each variable. In general, both models explained observed DMDS levels in sludge headspace samples. The adjusted R² values of regression models 1 and 2 were 0.79 and 0.77, respectively. Coefficients for each regression model also had the correct sign. Using the developed models, plant operators can adjust the controllable variables to proactively decrease this odorant. Therefore, these models are a useful tool in biosolids management at WWTPs.
Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

PubMed Central

Dipnall, Joanna F.

2016-01-01

Background: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results: After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). 
Conclusion: The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin. PMID:26848571
Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

PubMed

Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

2016-01-01

Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). 
The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
Methods for estimating annual exceedance probability discharges for streams in Arkansas, based on data through water year 2013

USGS Publications Warehouse

Wagner, Daniel M.; Krieger, Joshua D.; Veilleux, Andrea G.

2016-08-04

In 2013, the U.S. Geological Survey initiated a study to update regional skew, annual exceedance probability discharges, and regional regression equations used to estimate annual exceedance probability discharges for ungaged locations on streams in the study area with the use of recent geospatial data, new analytical methods, and available annual peak-discharge data through the 2013 water year. An analysis of regional skew using Bayesian weighted least-squares/Bayesian generalized-least squares regression was performed for Arkansas, Louisiana, and parts of Missouri and Oklahoma. The newly developed constant regional skew of -0.17 was used in the computation of annual exceedance probability discharges for 281 streamgages used in the regional regression analysis. Based on analysis of covariance, four flood regions were identified for use in the generation of regional regression models. Thirty-nine basin characteristics were considered as potential explanatory variables, and ordinary least-squares regression techniques were used to determine the optimum combinations of basin characteristics for each of the four regions. Basin characteristics in candidate models were evaluated based on multicollinearity with other basin characteristics (variance inflation factor < 2.5) and statistical significance at the 95-percent confidence level (p ≤ 0.05). Generalized least-squares regression was used to develop the final regression models for each flood region. Average standard errors of prediction of the generalized least-squares models ranged from 32.76 to 59.53 percent, with the largest range in flood region D. Pseudo coefficients of determination of the generalized least-squares models ranged from 90.29 to 97.28 percent, with the largest range also in flood region D. The regional regression equations apply only to locations on streams in Arkansas where annual peak discharges are not substantially affected by regulation, diversion, channelization, backwater, or urbanization. 
The applicability and accuracy of the regional regression equations depend on the basin characteristics measured for an ungaged location on a stream being within range of those used to develop the equations.
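The multicollinearity screen mentioned in this record (variance inflation factor below 2.5) can be sketched as follows. The function name and the example basin characteristics are hypothetical; only the 2.5 threshold comes from the abstract.

```python
import numpy as np


def variance_inflation_factors(X):
    """Variance inflation factor for each column of X.

    Each column is regressed on all the other columns plus an intercept;
    VIF_j = 1 / (1 - R_j^2), where R_j^2 is that regression's R-squared.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Hypothetical basin characteristics: the first two columns are nearly
# collinear, the third is unrelated to them.
basin = np.array([
    [1.0, 1.1, 3.0],
    [2.0, 1.9, 1.0],
    [3.0, 3.2, 4.0],
    [4.0, 3.8, 1.0],
    [5.0, 5.1, 5.0],
    [6.0, 6.0, 9.0],
])
vifs = variance_inflation_factors(basin)
keep = vifs < 2.5    # threshold quoted in the abstract
```

A characteristic that fails the threshold is largely predictable from the others, so including it would inflate the variance of the fitted coefficients without adding much information.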
Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia

NASA Astrophysics Data System (ADS)

Pradhan, Biswajeet

2010-05-01

This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest prediction accuracy (90%), whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). 
Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.
Novel risk score of contrast-induced nephropathy after percutaneous coronary intervention.

PubMed

Ji, Ling; Su, XiaoFeng; Qin, Wei; Mi, XuHua; Liu, Fei; Tang, XiaoHong; Li, Zi; Yang, LiChuan

2015-08-01

Contrast-induced nephropathy (CIN) post-percutaneous coronary intervention (PCI) is a major cause of acute kidney injury. In this study, we established a comprehensive risk score model to assess risk of CIN after PCI procedure, which could be easily used in a clinical environment. A total of 805 PCI patients, divided into analysis cohort (70%) and validation cohort (30%), were enrolled retrospectively in this study. Risk factors for CIN were identified using univariate analysis and multivariate logistic regression in the analysis cohort. A risk score model was developed based on multiple regression coefficients. Sensitivity and specificity of the new risk score system were validated in the validation cohort. Comparisons between the new risk score model and previously reported models were made. The incidence of post-PCI CIN in the analysis cohort (n = 565) was 12%. Considerably high CIN incidence (50%) was observed in patients with chronic kidney disease (CKD). Age >75, body mass index (BMI) >25, myoglobin level, cardiac function level, hypoalbuminaemia, history of chronic kidney disease (CKD), intra-aortic balloon pump (IABP) and peripheral vascular disease (PVD) were identified as independent risk factors of post-PCI CIN. A novel risk score model was established using multivariate regression coefficients, which showed the highest sensitivity and specificity (0.917, 95%CI 0.877-0.957) compared with previous models. A new post-PCI CIN risk score model was developed based on a retrospective study of 805 patients. Application of this model might be helpful to predict CIN in patients undergoing PCI procedure. © 2015 Asian Pacific Society of Nephrology.
Identifying Aspects of Parental Involvement that Affect the Academic Achievement of High School Students

ERIC Educational Resources Information Center

Roulette-McIntyre, Ovella; Bagaka's, Joshua G.; Drake, Daniel D.

2005-01-01

This study identified parental practices that relate positively to high school students' academic performance. Parents of 643 high school students participated in the study. Data analysis, using a multiple linear regression model, shows parent-school connection, student gender, and race are significant predictors of student academic performance.…
Identifying Pedophiles "Eligible" for Community Notification under Megan's Law: A Multivariate Model for Actuarially Anchored Decisions.

ERIC Educational Resources Information Center

Pallone, Nathaniel J.; Hennessy, James J.; Voelbel, Gerald T.

1998-01-01

A scientifically sound methodology for identifying offenders about whose presence the community should be notified is demonstrated. A stepwise multiple regression was calculated among incarcerated pedophiles (N=52) including both psychological and legal data; a precision-weighted equation produced 90.4% "true positives." This methodology can be…
Modeling time-to-event (survival) data using classification tree analysis.

PubMed

Linden, Ariel; Yarnold, Paul R

2017-12-01

Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
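The Brier score underlying the model comparison in this record is the mean squared difference between predicted event probabilities and observed outcomes. The sketch below shows the basic single-time-point form only; it ignores censoring and does not integrate over follow-up time, so it is a simplification of the integrated Brier score named in the abstract. The function name and example values are hypothetical.

```python
def brier_score(predicted, observed):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecast, and 0.25 is what a constant
    forecast of 0.5 scores.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return total / len(predicted)


# Hypothetical predicted event probabilities at one time point,
# with observed event indicators (1 = event occurred by that time).
predicted = [0.9, 0.2, 0.7, 0.1]
observed = [1, 0, 1, 0]
score = brier_score(predicted, observed)
```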

Access disparities to Magnet hospitals for patients undergoing neurosurgical operations

PubMed Central

Missios, Symeon; Bekelis, Kimon

2017-01-01

Background: Centers of excellence focusing on quality improvement have demonstrated superior outcomes for a variety of surgical interventions. We investigated the presence of access disparities to hospitals recognized by the Magnet Recognition Program of the American Nurses Credentialing Center (ANCC) for patients undergoing neurosurgical operations. Methods: We performed a cohort study of all neurosurgery patients who were registered in the New York Statewide Planning and Research Cooperative System (SPARCS) database from 2009–2013. We examined the association of African-American race and lack of insurance with Magnet status hospitalization for neurosurgical procedures. A mixed effects propensity adjusted multivariable regression analysis was used to control for confounding. Results: During the study period, 190,535 neurosurgical patients met the inclusion criteria. Using a multivariable logistic regression, we demonstrate that African-Americans had lower admission rates to Magnet institutions (OR 0.62; 95% CI, 0.58–0.67). This persisted in a mixed effects logistic regression model (OR 0.77; 95% CI, 0.70–0.83) to adjust for clustering at the patient county level, and a propensity score adjusted logistic regression model (OR 0.75; 95% CI, 0.69–0.82). Additionally, lack of insurance was associated with lower admission rates to Magnet institutions (OR 0.71; 95% CI, 0.68–0.73), in a multivariable logistic regression model. This persisted in a mixed effects logistic regression model (OR 0.72; 95% CI, 0.69–0.74), and a propensity score adjusted logistic regression model (OR 0.72; 95% CI, 0.69–0.75). Conclusions: Using a comprehensive all-payer cohort of neurosurgery patients in New York State we identified an association of African-American race and lack of insurance with lower rates of admission to Magnet hospitals. PMID:28684152
Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

PubMed

Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

2016-01-01

Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models and accounts for the intrinsic complexity of the data. We start with standard cubic spline regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compare cubic regression splines with linear piecewise splines, using varying numbers and positions of knots. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect model with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. 
We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
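The first step of the stepwise approach in this record, a standard cubic regression spline fitted by ordinary least squares, can be sketched with a truncated power basis as below. This omits the random effects and autoregressive errors of the full model, and the function name, knot positions, and example heights are hypothetical choices for illustration rather than values from the paper.

```python
import numpy as np


def cubic_spline_basis(t, knots):
    """Truncated power basis for a cubic regression spline.

    Columns: 1, t, t^2, t^3, then (t - k)^3 for t > k (else 0) per knot k.
    """
    t = np.asarray(t, dtype=float)
    columns = [np.ones_like(t), t, t ** 2, t ** 3]
    columns += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(columns)


# Hypothetical ages in months and knot positions.
age = np.linspace(0.0, 48.0, 25)
knots = [12.0, 24.0, 36.0]

# Hypothetical smooth height curve in centimetres, used as example data.
height = 50.0 + 1.5 * age - 0.02 * age ** 2 + 0.0001 * age ** 3

basis = cubic_spline_basis(age, knots)
coef, *_ = np.linalg.lstsq(basis, height, rcond=None)
fitted = basis @ coef
```

Because the basis is twice continuously differentiable at the knots, the fitted curve has continuous first and second derivatives, which is what makes velocity and acceleration curves well defined.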
Plasma Cholesterol–Induced Lesion Networks Activated before Regression of Early, Mature, and Advanced Atherosclerosis

PubMed Central

Björkegren, Johan L. M.; Hägg, Sara; Jain, Rajeev K.; Cedergren, Cecilia; Shang, Ming-Mei; Rossignoli, Aránzazu; Takolander, Rabbe; Melander, Olle; Hamsten, Anders; Michoel, Tom; Skogsberg, Josefin

2014-01-01

Plasma cholesterol lowering (PCL) slows and sometimes prevents progression of atherosclerosis and may even lead to regression. Little is known about how molecular processes in the atherosclerotic arterial wall respond to PCL and modify responses to atherosclerosis regression. We studied atherosclerosis regression and global gene expression responses to PCL (≥80%) and to atherosclerosis regression itself in early, mature, and advanced lesions. In atherosclerotic aortic wall from Ldlr−/−Apob100/100Mttpflox/floxMx1-Cre mice, atherosclerosis regressed after PCL regardless of lesion stage. However, near-complete regression was observed only in mice with early lesions; mice with mature and advanced lesions were left with regression-resistant, relatively unstable plaque remnants. Atherosclerosis genes responding to PCL before regression, unlike those responding to the regression itself, were enriched in inherited risk for coronary artery disease and myocardial infarction, indicating causality. Inference of transcription factor (TF) regulatory networks of these PCL-responsive gene sets revealed largely different networks in early, mature, and advanced lesions. In early lesions, PPARG was identified as a specific master regulator of the PCL-responsive atherosclerosis TF-regulatory network, whereas in mature and advanced lesions, the specific master regulators were MLL5 and SRSF10/XRN2, respectively. In a THP-1 foam cell model of atherosclerosis regression, siRNA targeting of these master regulators activated the time-point-specific TF-regulatory networks and altered the accumulation of cholesterol esters. We conclude that PCL leads to complete atherosclerosis regression only in mice with early lesions. Identified master regulators and related PCL-responsive TF-regulatory networks will be interesting targets to enhance PCL-mediated regression of mature and advanced atherosclerotic lesions. PMID:24586211
Estimating Infiltration Rates for a Loessal Silt Loam Using Soil Properties

Treesearch

M. Dean Knighton

1978-01-01

Soil properties were related to infiltration rates as measured by single-ring, steady-head infiltrometers. The properties showing strong simple correlations were identified. Regression models were developed to estimate infiltration rate from several soil properties. The best model gave fair agreement with measured rates at another location.
Applying Recursive Sensitivity Analysis to Multi-Criteria Decision Models to Reduce Bias in Defense Cyber Engineering Analysis

DTIC Science & Technology

2015-10-28

techniques such as regression analysis, correlation, and multicollinearity assessment to identify the change and error on the input to the model...between many of the independent or predictor variables, the issue of multicollinearity may arise [18]. VII. SUMMARY Accurate decisions concerning
Detecting Outliers in Factor Analysis Using the Forward Search Algorithm

ERIC Educational Resources Information Center

Mavridis, Dimitris; Moustaki, Irini

2008-01-01

In this article we extend and implement the forward search algorithm for identifying atypical subjects/observations in factor analysis models. The forward search has been mainly developed for detecting aberrant observations in regression models (Atkinson, 1994) and in multivariate methods such as cluster and discriminant analysis (Atkinson, Riani,…
Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

PubMed

Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

2006-11-01

We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
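The uniform-DIF check described above compares the ability coefficient between the models fitted with and without the group term. A minimal sketch of that comparison follows. The function names, the example coefficients, and the 10% threshold are illustrative assumptions; the abstract does not state the criterion the authors used, and the coefficients would come from an ordinal logistic regression fitted elsewhere.

```python
def relative_change(beta_without_group: float, beta_with_group: float) -> float:
    """Proportional change in the ability coefficient when the group term is added."""
    if beta_without_group == 0:
        raise ValueError("reference coefficient must be non-zero")
    return abs(beta_with_group - beta_without_group) / abs(beta_without_group)


def uniform_dif_flag(beta_without_group: float,
                     beta_with_group: float,
                     threshold: float = 0.10) -> bool:
    """Flag an item for uniform DIF when the coefficient shifts by more than `threshold`.

    The default threshold of 0.10 is a hypothetical choice for illustration only.
    """
    return relative_change(beta_without_group, beta_with_group) > threshold
```

As the abstract's closing sentence notes, the choice of criterion matters as much as the technique, so the threshold is left as an explicit parameter rather than being fixed in the function.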
Pediatric Irritable Bowel Syndrome Patient and Parental Characteristics Differ by Care Management Type.

PubMed

Hollier, John M; Czyzewski, Danita I; Self, Mariella M; Weidler, Erica M; Smith, E O'Brian; Shulman, Robert J

2017-03-01

This study evaluates whether certain patient or parental characteristics are associated with gastroenterology (GI) referral versus primary pediatrics care for pediatric irritable bowel syndrome (IBS). A retrospective clinical trial sample of patients meeting pediatric Rome III IBS criteria was assembled from a single metropolitan health care system. Baseline socioeconomic status (SES) and clinical symptom measures were gathered. Various instruments measured participant and parental psychosocial traits. Study outcomes were stratified by GI referral versus primary pediatrics care. Two separate analyses of SES measures and GI clinical symptoms and psychosocial measures identified key factors by univariate and multiple logistic regression analyses. For each analysis, identified factors were placed in unadjusted and adjusted multivariate logistic regression models to assess their impact in predicting GI referral. Of the 239 participants, 152 were referred to pediatric GI, and 87 were managed in primary pediatrics care. Of the SES and clinical symptom factors, child self-assessment of abdominal pain duration and lower percentage of people living in poverty were the strongest predictors of GI referral. Among the psychosocial measures, parental assessment of their child's functional disability was the sole predictor of GI referral. In multivariate logistic regression models, all selected factors continued to predict GI referral in each model. Socioeconomic environment, clinical symptoms, and functional disability are associated with GI referral. Future interventions designed to ameliorate the effect of these identified factors could reduce unnecessary specialty consultations and health care overutilization for IBS.
Utility of an Abbreviated Dizziness Questionnaire to Differentiate between Causes of Vertigo and Guide Appropriate Referral: A Multicenter Prospective Blinded Study

PubMed Central

Roland, Lauren T.; Kallogjeri, Dorina; Sinks, Belinda C.; Rauch, Steven D.; Shepard, Neil T.; White, Judith A.; Goebel, Joel A.

2015-01-01

Objective: Test performance of a focused dizziness questionnaire’s ability to discriminate between peripheral and non-peripheral causes of vertigo. Study Design: Prospective multi-center. Setting: Four academic centers with experienced balance specialists. Patients: New dizzy patients. Interventions: A 32-question survey was given to participants. Balance specialists were blinded and a diagnosis was established for all participating patients within 6 months. Main outcomes: Multinomial logistic regression was used to evaluate questionnaire performance in predicting final diagnosis and differentiating between peripheral and non-peripheral vertigo. Univariate and multivariable stepwise logistic regression were used to identify questions as significant predictors of the ultimate diagnosis. C-index was used to evaluate performance and discriminative power of the multivariable models. Results: 437 patients participated in the study. Eight participants without confirmed diagnoses were excluded and 429 were included in the analysis. Multinomial regression revealed that the model had good overall predictive accuracy of 78.5% for the final diagnosis and 75.5% for differentiating between peripheral and non-peripheral vertigo. Univariate logistic regression identified significant predictors of three main categories of vertigo: peripheral, central and other. Predictors were entered into forward stepwise multivariable logistic regression. The discriminative power of the final models for peripheral, central and other causes was considered good as measured by c-indices of 0.75, 0.7 and 0.78, respectively. Conclusions: This multicenter study demonstrates a focused dizziness questionnaire can accurately predict diagnosis for patients with chronic/relapsing dizziness referred to outpatient clinics. Additionally, this survey has significant capability to differentiate peripheral from non-peripheral causes of vertigo and may, in the future, serve as a screening tool for specialty referral. Clinical utility of this questionnaire to guide specialty referral is discussed. PMID:26485598
Utility of an Abbreviated Dizziness Questionnaire to Differentiate Between Causes of Vertigo and Guide Appropriate Referral: A Multicenter Prospective Blinded Study.

PubMed

Roland, Lauren T; Kallogjeri, Dorina; Sinks, Belinda C; Rauch, Steven D; Shepard, Neil T; White, Judith A; Goebel, Joel A

2015-12-01

Test performance of a focused dizziness questionnaire's ability to discriminate between peripheral and nonperipheral causes of vertigo. Prospective multicenter. Four academic centers with experienced balance specialists. New dizzy patients. A 32-question survey was given to participants. Balance specialists were blinded and a diagnosis was established for all participating patients within 6 months. Multinomial logistic regression was used to evaluate questionnaire performance in predicting final diagnosis and differentiating between peripheral and nonperipheral vertigo. Univariate and multivariable stepwise logistic regression were used to identify questions as significant predictors of the ultimate diagnosis. C-index was used to evaluate performance and discriminative power of the multivariable models. In total, 437 patients participated in the study. Eight participants without confirmed diagnoses were excluded and 429 were included in the analysis. Multinomial regression revealed that the model had good overall predictive accuracy of 78.5% for the final diagnosis and 75.5% for differentiating between peripheral and nonperipheral vertigo. Univariate logistic regression identified significant predictors of three main categories of vertigo: peripheral, central, and other. Predictors were entered into forward stepwise multivariable logistic regression. The discriminative power of the final models for peripheral, central, and other causes was considered good as measured by c-indices of 0.75, 0.7, and 0.78, respectively. This multicenter study demonstrates a focused dizziness questionnaire can accurately predict diagnosis for patients with chronic/relapsing dizziness referred to outpatient clinics. Additionally, this survey has significant capability to differentiate peripheral from nonperipheral causes of vertigo and may, in the future, serve as a screening tool for specialty referral. Clinical utility of this questionnaire to guide specialty referral is discussed.
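The c-index used above to judge discriminative power can be computed directly from predicted scores and binary outcomes. The sketch below is a generic pairwise implementation, not the authors' code; the function name `c_index` and the toy data in the usage note are assumptions for illustration.

```python
def c_index(scores, outcomes):
    """Concordance index for a binary outcome.

    Over all (event, non-event) pairs, counts the fraction in which the event
    case received the higher predicted score; tied scores count as half.
    """
    event_scores = [s for s, y in zip(scores, outcomes) if y == 1]
    non_event_scores = [s for s, y in zip(scores, outcomes) if y == 0]
    if not event_scores or not non_event_scores:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(event_scores) * len(non_event_scores))
```

For example, `c_index([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])` ranks three of the four event/non-event pairs correctly. A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.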
Prognostic models for predicting posttraumatic seizures during acute hospitalization, and at 1 and 2 years following traumatic brain injury.

PubMed

Ritter, Anne C; Wagner, Amy K; Szaflarski, Jerzy P; Brooks, Maria M; Zafonte, Ross D; Pugh, Mary Jo V; Fabio, Anthony; Hammond, Flora M; Dreer, Laura E; Bushnik, Tamara; Walker, William C; Brown, Allen W; Johnson-Greene, Doug; Shea, Timothy; Krellman, Jason W; Rosenthal, Joseph A

2016-09-01

Posttraumatic seizures (PTS) are well-recognized acute and chronic complications of traumatic brain injury (TBI). Risk factors have been identified, but considerable variability in who develops PTS remains. Existing PTS prognostic models are not widely adopted for clinical use and do not reflect current trends in injury, diagnosis, or care. We aimed to develop and internally validate preliminary prognostic regression models to predict PTS during acute care hospitalization, and at year 1 and year 2 postinjury. Prognostic models predicting PTS during acute care hospitalization and year 1 and year 2 post-injury were developed using a recent (2011-2014) cohort from the TBI Model Systems National Database. Potential PTS predictors were selected based on previous literature and biologic plausibility. Bivariable logistic regression identified variables with a p-value < 0.20 that were used to fit initial prognostic models. Multivariable logistic regression modeling with backward-stepwise elimination was used to determine reduced prognostic models and to internally validate using 1,000 bootstrap samples. Fit statistics were calculated, correcting for overfitting (optimism). The prognostic models identified sex, craniotomy, contusion load, and pre-injury limitation in learning/remembering/concentrating as significant PTS predictors during acute hospitalization. Significant predictors of PTS at year 1 were subdural hematoma (SDH), contusion load, craniotomy, craniectomy, seizure during acute hospitalization, duration of posttraumatic amnesia, preinjury mental health treatment/psychiatric hospitalization, and preinjury incarceration. Year 2 significant predictors were similar to those of year 1: SDH, intraparenchymal fragment, craniotomy, craniectomy, seizure during acute hospitalization, and preinjury incarceration. Corrected concordance (C) statistics were 0.599, 0.747, and 0.716 for acute hospitalization, year 1, and year 2 models, respectively. 
The prognostic model for PTS during acute hospitalization did not discriminate well. Year 1 and year 2 models showed fair to good predictive validity for PTS. Cranial surgery, although medically necessary, requires ongoing research regarding potential benefits of increased monitoring for signs of epileptogenesis, PTS prophylaxis, and/or rehabilitation/social support. Future studies should externally validate models and determine clinical utility. Wiley Periodicals, Inc. © 2016 International League Against Epilepsy.
Theory Can Help Structure Regression Models for Projecting Stream Conditions Under Alternative Land Use Scenarios

NASA Astrophysics Data System (ADS)

van Sickle, J.; Baker, J.; Herlihy, A.

2005-05-01

We built multiple regression models for Ephemeroptera/Plecoptera/Trichoptera (EPT) taxon richness and other indicators of biological condition in streams of the Willamette River Basin, Oregon, USA. The models were used to project the changes in condition that would be expected in all 2nd- to 4th-order streams of the 30,000 sq km basin under alternative scenarios of future land use. In formulating the models, we invoked the theory of limiting factors to express the interactive effects of stream power and watershed land use on EPT richness. The resulting models were parsimonious, and they fit the data in our wedge-shaped scatterplots slightly better than did a naive additive-effects model. Just as theory helped formulate our regression models, the models in turn helped us identify a new research need for the Basin's streams. Our future scenarios project that conversions of agricultural to urban uses may dominate landscape dynamics in the basin over the next 50 years. But our models could not detect any difference between the effects of agricultural and urban development in watersheds on stream biota. This result points to an increased need for understanding how agricultural and urban land uses in the Basin differentially influence stream ecosystems.
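The contrast between a limiting-factor formulation and an additive-effects model can be made concrete with a small sketch. The abstract does not give the functional forms, so the linear ceilings, the parameter names, and the numbers in the usage note below are all assumptions chosen only to illustrate the idea that the response is capped by whichever factor is most limiting.

```python
def limiting_factor_prediction(stream_power, land_use_fraction, a0, a1, b0, b1):
    """Predicted richness as the lower of two ceilings (law-of-the-minimum form).

    Each factor sets its own upper bound on richness; the prediction is the
    smaller bound, floored at zero. Linear ceilings are a hypothetical choice.
    """
    ceiling_from_power = a0 + a1 * stream_power
    ceiling_from_land_use = b0 + b1 * land_use_fraction
    return max(0.0, min(ceiling_from_power, ceiling_from_land_use))


def additive_prediction(stream_power, land_use_fraction, c0, c1, c2):
    """Additive-effects alternative: each factor shifts richness independently."""
    return c0 + c1 * stream_power + c2 * land_use_fraction
```

With `a0=5, a1=10, b0=30, b1=-20`, a stream with power 2.0 is capped by land use when half the watershed is developed and by stream power when only a tenth is developed. That switching between controlling factors is what produces wedge-shaped scatterplots, which an additive model cannot reproduce.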
Genetic analyses of partial egg production in Japanese quail using multi-trait random regression models.

PubMed

Karami, K; Zerehdaran, S; Barzanooni, B; Lotfi, E

2017-12-01

1. The aim of the present study was to estimate genetic parameters for average egg weight (EW) and egg number (EN) at different ages in Japanese quail using multi-trait random regression (MTRR) models. 2. A total of 8534 records from 900 quail, hatched between 2014 and 2015, were used in the study. Average weekly egg weights and egg numbers were measured from second until sixth week of egg production. 3. Nine random regression models were compared to identify the best order of the Legendre polynomials (LP). The most optimal model was identified by the Bayesian Information Criterion. A model with second order of LP for fixed effects, second order of LP for additive genetic effects and third order of LP for permanent environmental effects (MTRR23) was found to be the best. 4. According to the MTRR23 model, direct heritability for EW increased from 0.26 in the second week to 0.53 in the sixth week of egg production, whereas the ratio of permanent environment to phenotypic variance decreased from 0.48 to 0.1. Direct heritability for EN was low, whereas the ratio of permanent environment to phenotypic variance decreased from 0.57 to 0.15 during the production period. 5. For each trait, estimated genetic correlations among weeks of egg production were high (from 0.85 to 0.98). Genetic correlations between EW and EN were low and negative for the first two weeks, but they were low and positive for the rest of the egg production period. 6. In conclusion, random regression models can be used effectively for analysing egg production traits in Japanese quail. Response to selection for increased egg weight would be higher at older ages because of its higher heritability and such a breeding program would have no negative genetic impact on egg production.
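The model comparison in point 3 rests on two ingredients: Legendre polynomial covariates evaluated on a standardized age scale, and the Bayesian Information Criterion. The sketch below shows both in generic form. It uses unnormalized Legendre polynomials (animal-breeding software often applies a normalizing constant), and the function names are assumptions rather than anything taken from the study.

```python
import math


def standardize(t, t_min, t_max):
    """Map an age or week in [t_min, t_max] onto [-1, 1], the Legendre domain."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0]
    if order >= 1:
        values.append(x)
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def bic(log_likelihood, n_params, n_records):
    """Bayesian Information Criterion; the model with the lowest value is preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_records)
```

For the production weeks in the abstract, `standardize(4, 2, 6)` maps the midpoint week to 0, and `legendre_basis(0.0, 2)` gives the covariates for a second-order fit at that week. The log-likelihoods and parameter counts passed to `bic` would come from the fitted random regression models.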
Modelling and Closed-Loop System Identification of a Quadrotor-Based Aerial Manipulator

NASA Astrophysics Data System (ADS)

Dube, Chioniso; Pedro, Jimoh O.

2018-05-01

This paper presents the modelling and system identification of a quadrotor-based aerial manipulator. The aerial manipulator model is first derived analytically using the Newton-Euler formulation for the quadrotor and Recursive Newton-Euler formulation for the manipulator. The aerial manipulator is then simulated with the quadrotor under Proportional Derivative (PD) control, with the manipulator in motion. The simulation data is then used for system identification of the aerial manipulator. Auto Regressive with eXogenous inputs (ARX) models are obtained from the system identification for linear accelerations $\ddot{X}$ and $\ddot{Y}$ and yaw angular acceleration $\ddot{\psi}$. For linear acceleration $\ddot{Z}$, and pitch and roll angular accelerations $\ddot{\theta}$ and $\ddot{\phi}$, Auto Regressive Moving Average with eXogenous inputs (ARMAX) models are identified.
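To illustrate what identifying an ARX model from simulation data involves, the sketch below fits the simplest possible case by least squares. It assumes a first-order, single-input model written as y[k] = a·y[k-1] + b·u[k-1] + e[k]; the model orders, signals, and sign convention used in the paper are not given in the abstract, so everything here is a hypothetical stand-in.

```python
def simulate_arx(a, b, u, y0=0.0):
    """Generate a noise-free output sequence from y[k] = a*y[k-1] + b*u[k-1]."""
    y = [y0]
    for k in range(1, len(u)):
        y.append(a * y[k - 1] + b * u[k - 1])
    return y


def fit_arx_first_order(y, u):
    """Least-squares estimate of (a, b) by solving the 2x2 normal equations."""
    if len(y) != len(u) or len(y) < 3:
        raise ValueError("need equal-length sequences with at least 3 samples")

    s_yy = s_yu = s_uu = s_y_target = s_u_target = 0.0
    for k in range(1, len(y)):
        y_prev, u_prev, target = y[k - 1], u[k - 1], y[k]
        s_yy += y_prev * y_prev
        s_yu += y_prev * u_prev
        s_uu += u_prev * u_prev
        s_y_target += y_prev * target
        s_u_target += u_prev * target

    det = s_yy * s_uu - s_yu * s_yu
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; the input is not exciting enough")

    a = (s_y_target * s_uu - s_u_target * s_yu) / det
    b = (s_u_target * s_yy - s_y_target * s_yu) / det
    return a, b
```

On noise-free data generated by `simulate_arx`, the fit recovers the generating coefficients to numerical precision. An ARMAX model adds a moving-average term for the noise, which generally requires an iterative estimator rather than a single linear solve.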
An anthropometric approach to characterising neonatal morbidity and body composition, using air displacement plethysmography as a criterion method

PubMed Central

Carberry, Angela E.; Turner, Robin M.; Bek, Emily J.; Raynes-Greenow, Camille H.; McEwan, Alistair L.; Jeffery, Heather E.

2018-01-01

Background With the greatest burden of infant undernutrition and morbidity in low and middle income countries (LMICs), there is a need for suitable approaches to monitor infants in a simple, low-cost and effective manner. Anthropometry continues to play a major role in characterising growth and nutritional status. Methods We developed a range of models to aid in identifying neonates at risk of malnutrition. We first adopted a logistic regression approach to screen for a composite neonatal morbidity, low and high body fat (BF%) infants. We then developed linear regression models for the estimation of neonatal fat mass as an assessment of body composition and nutritional status. Results We fitted logistic regression models combining up to four anthropometric variables to predict composite morbidity and low and high BF% neonates. The greatest area under receiver-operator characteristic curves (AUC with 95% confidence intervals (CI)) for identifying composite morbidity was 0.740 (0.63, 0.85), resulting from the combination of birthweight, length, chest and mid-thigh circumferences. The AUCs (95% CI) for identifying low and high BF% were 0.827 (0.78, 0.88) and 0.834 (0.79, 0.88), respectively. For identifying composite morbidity, BF% as measured via air displacement plethysmography showed strong predictive ability (AUC 0.786 (0.70, 0.88)), while birthweight percentiles had a lower AUC (0.695 (0.57, 0.82)). Birthweight percentiles could also identify low and high BF% neonates with AUCs of 0.792 (0.74, 0.85) and 0.834 (0.79, 0.88). We applied a sex-specific approach to anthropometric estimation of neonatal fat mass, demonstrating the influence of the testing sample size on the final model performance. 
Conclusions These models display potential for further development and evaluation in LMICs to detect infants in need of further nutritional management, especially where traditional methods of risk management such as birthweight for gestational age percentiles may be variable or non-existent, or unable to detect appropriately grown, low fat newborns. PMID:29601596
Optimal Reflectance, Transmittance, and Absorptance Wavebands and Band Ratios for the Estimation of Leaf Chlorophyll Concentration

NASA Technical Reports Server (NTRS)

Carter, Gregory A.; Spiering, Bruce A.

2000-01-01

The present study utilized regression analysis to identify: wavebands and band ratios within the 400-850 nm range that could be used to estimate total chlorophyll concentration with minimal error; and simple regression models that were most effective in estimating chlorophyll concentrations were measured for two broadleaved species, a broadleaved vine, a needle-leaved conifer, and a representative of the grass family. Overall, reflectance, transmittance, and absorptance corresponded most precisely with chlorophyll concentration at wavelengths near 700 nm, although regressions were strong as well in the 550-625 nm range.
A model of the human in a cognitive prediction task.

NASA Technical Reports Server (NTRS)

Rouse, W. B.

1973-01-01

The human decision maker's behavior when predicting future states of discrete linear dynamic systems driven by zero-mean Gaussian processes is modeled. The task is on a slow enough time scale that physiological constraints are insignificant compared with cognitive limitations. The model is basically a linear regression system identifier with a limited memory and noisy observations. Experimental data are presented and compared to the model.
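A regression identifier with limited memory can be sketched for the simplest case. The abstract gives no equations, so the code below assumes a scalar first-order system x[k+1] = a·x[k] + noise, estimates a by least squares from only the most recent transitions, and uses that estimate to predict the next state. The function name and the `memory` parameter are illustrative assumptions.

```python
def predict_next(observations, memory):
    """Predict the next state from the last `memory` observed transitions.

    Fits x[k] = a * x[k-1] by least squares over a sliding window, then
    returns a_hat * (latest observation). Older data are ignored entirely,
    which is one simple way to represent a limited memory.
    """
    if memory < 1 or len(observations) < 2:
        raise ValueError("need memory >= 1 and at least two observations")

    window = observations[-(memory + 1):]
    numerator = sum(window[i] * window[i - 1] for i in range(1, len(window)))
    denominator = sum(window[i - 1] ** 2 for i in range(1, len(window)))
    if denominator == 0:
        return 0.0
    return (numerator / denominator) * window[-1]
```

With observations `[1, 2, 4, 2, 1]`, a memory of 2 sees only the recent halving and predicts 0.5, whereas a memory of 4 also weighs the earlier doubling and predicts 0.8. A shorter memory adapts faster to change but is more sensitive to observation noise.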
Uncovering state-dependent relationships in shallow lakes using Bayesian latent variable regression.

PubMed

Vitense, Kelsey; Hanson, Mark A; Herwig, Brian R; Zimmer, Kyle D; Fieberg, John

2018-03-01

Ecosystems sometimes undergo dramatic shifts between contrasting regimes. Shallow lakes, for instance, can transition between two alternative stable states: a clear state dominated by submerged aquatic vegetation and a turbid state dominated by phytoplankton. Theoretical models suggest that critical nutrient thresholds differentiate three lake types: highly resilient clear lakes, lakes that may switch between clear and turbid states following perturbations, and highly resilient turbid lakes. For effective and efficient management of shallow lakes and other systems, managers need tools to identify critical thresholds and state-dependent relationships between driving variables and key system features. Using shallow lakes as a model system for which alternative stable states have been demonstrated, we developed an integrated framework using Bayesian latent variable regression (BLR) to classify lake states, identify critical total phosphorus (TP) thresholds, and estimate steady state relationships between TP and chlorophyll a (chl a) using cross-sectional data. We evaluated the method using data simulated from a stochastic differential equation model and compared its performance to k-means clustering with regression (KMR). We also applied the framework to data comprising 130 shallow lakes. For simulated data sets, BLR had high state classification rates (median/mean accuracy >97%) and accurately estimated TP thresholds and state-dependent TP-chl a relationships. Classification and estimation improved with increasing sample size and decreasing noise levels. Compared to KMR, BLR had higher classification rates and better approximated the TP-chl a steady state relationships and TP thresholds. We fit the BLR model to three different years of empirical shallow lake data, and managers can use the estimated bifurcation diagrams to prioritize lakes for management according to their proximity to thresholds and chance of successful rehabilitation. 
Our model improves upon previous methods for shallow lakes because it allows classification and regression to occur simultaneously and inform one another, directly estimates TP thresholds and the uncertainty associated with thresholds and state classifications, and enables meaningful constraints to be built into models. The BLR framework is broadly applicable to other ecosystems known to exhibit alternative stable states in which regression can be used to establish relationships between driving variables and state variables. © 2017 by the Ecological Society of America.
Classification of suicide attempters in schizophrenia using sociocultural and clinical features: A machine learning approach.

PubMed

Hettige, Nuwan C; Nguyen, Thai Binh; Yuan, Chen; Rajakulendran, Thanara; Baddour, Jermeen; Bhagwat, Nikhil; Bani-Fatemi, Ali; Voineskos, Aristotle N; Mallar Chakravarty, M; De Luca, Vincenzo

2017-07-01

Suicide is a major concern for those afflicted by schizophrenia. Identifying patients at the highest risk for future suicide attempts remains a complex problem for psychiatric interventions. Machine learning models allow for the integration of many risk factors in order to build an algorithm that predicts which patients are likely to attempt suicide. Currently it is unclear how to integrate previously identified risk factors into a clinically relevant predictive tool to estimate the probability that a patient with schizophrenia will attempt suicide. We conducted a cross-sectional assessment on a sample of 345 participants diagnosed with schizophrenia spectrum disorders. Suicide attempters and non-attempters were clearly identified using the Columbia Suicide Severity Rating Scale (C-SSRS) and the Beck Suicide Ideation Scale (BSS). We developed four classification algorithms using regularized logistic regression, random forest, elastic net, and support vector machine models, with sociocultural and clinical variables as features to train the models. All classification models performed similarly in identifying suicide attempters and non-attempters. Our regularized logistic regression model demonstrated an accuracy of 67% and an area under the curve (AUC) of 0.71, while the random forest model demonstrated 66% accuracy and an AUC of 0.67. The support vector classifier (SVC) model demonstrated an accuracy of 67% and an AUC of 0.70, and the elastic net model demonstrated an accuracy of 65% and an AUC of 0.71. Machine learning algorithms offer a relatively successful method for incorporating many clinical features to predict individuals at risk for future suicide attempts. Increased performance of these models using clinically relevant variables offers the potential to facilitate early treatment and intervention to prevent future suicide attempts. Copyright © 2017 Elsevier Inc. All rights reserved.
Analysis of a database to predict the result of allergy testing in vivo in patients with chronic nasal symptoms.

PubMed

Lacagnina, Valerio; Leto-Barone, Maria S; La Piana, Simona; Seidita, Aurelio; Pingitore, Giuseppe; Di Lorenzo, Gabriele

2014-01-01

This article uses the logistic regression model for diagnostic decision making in patients with chronic nasal symptoms. We studied the ability of the logistic regression model, obtained by the evaluation of a database, to detect patients with positive allergy skin-prick test (SPT) and patients with negative SPT. The model developed was validated using the data set obtained from another medical institution. The analysis was performed using a database obtained from a questionnaire administered to the patients with nasal symptoms containing personal data, clinical data, and results of allergy testing (SPT). All variables found to be significantly different between patients with positive and negative SPT (p < 0.05) were selected for the logistic regression models and were analyzed with backward stepwise logistic regression, evaluated with area under the curve of the receiver operating characteristic curve. A second set of patients from another institution was used to validate the model. The accuracy of the model in identifying, over the second set, both patients whose SPT will be positive and negative was high. The model detected 96% of patients with nasal symptoms and positive SPT and classified 94% of those with negative SPT. This study is preliminary to the creation of software that could help primary care doctors in the diagnostic decision-making process (need for allergy testing) in patients complaining of chronic nasal symptoms.
Random regression models using Legendre orthogonal polynomials to evaluate the milk production of Alpine goats.

PubMed

Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T

2013-12-11

The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test day milk yield. On the test day, we analyzed 20,710 records of milk yield of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models had combinations of distinct fitting orders for polynomials (2-5), random genetic (1-7), and permanent environmental (1-7) fixed curves and a number of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. A random regression model using the best Legendre orthogonal polynomial for genetic evaluation of milk yield on the test day of Alpine goats considered a fixed curve of order 4, curve of genetic additive effects of order 2, curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance because it was the most economical model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components in relation to the production peak and persistence. It is very important that the evaluation utilizes the best combination of fixed, genetic additive and permanent environmental regressions, and number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the estimates of parameters and prediction of genetic values.
Can Emotional and Behavioral Dysregulation in Youth Be Decoded from Functional Neuroimaging?

PubMed

Portugal, Liana C L; Rosa, Maria João; Rao, Anil; Bebko, Genna; Bertocci, Michele A; Hinze, Amanda K; Bonar, Lisa; Almeida, Jorge R C; Perlman, Susan B; Versace, Amelia; Schirda, Claudiu; Travis, Michael; Gill, Mary Kay; Demeter, Christine; Diwadkar, Vaibhav A; Ciuffetelli, Gary; Rodriguez, Eric; Forbes, Erika E; Sunshine, Jeffrey L; Holland, Scott K; Kowatch, Robert A; Birmaher, Boris; Axelson, David; Horwitz, Sarah M; Arnold, Eugene L; Fristad, Mary A; Youngstrom, Eric A; Findling, Robert L; Pereira, Mirtes; Oliveira, Leticia; Phillips, Mary L; Mourao-Miranda, Janaina

2016-01-01

High comorbidity among pediatric disorders characterized by behavioral and emotional dysregulation poses problems for diagnosis and treatment, and suggests that these disorders may be better conceptualized as dimensions of abnormal behaviors. Furthermore, identifying neuroimaging biomarkers related to dimensional measures of behavior may provide targets to guide individualized treatment. We aimed to use functional neuroimaging and pattern regression techniques to determine whether patterns of brain activity could accurately decode individual-level severity on a dimensional scale measuring behavioral and emotional dysregulation at two different time points. A sample of fifty-seven youth (mean age: 14.5 years; 32 males) was selected from a multi-site study of youth with parent-reported behavioral and emotional dysregulation. Participants performed a block-design reward paradigm during functional Magnetic Resonance Imaging (fMRI). Pattern regression analyses consisted of Relevance Vector Regression (RVR) and two cross-validation strategies implemented in the Pattern Recognition for Neuroimaging toolbox (PRoNTo). Medication was treated as a binary confounding variable. Decoded and actual clinical scores were compared using Pearson's correlation coefficient (r) and mean squared error (MSE) to evaluate the models. A permutation test was applied to estimate significance levels. Relevance Vector Regression identified patterns of neural activity associated with symptoms of behavioral and emotional dysregulation at the initial study screen and close to the fMRI scanning session. The correlation and the mean squared error between actual and decoded symptoms were significant at the initial study screen and close to the fMRI scanning session. However, after controlling for potential medication effects, results remained significant only for decoding symptoms at the initial study screen. 
Neural regions with the highest contribution to the pattern regression model included cerebellum, sensory-motor and fronto-limbic areas. The combination of pattern regression models and neuroimaging can help to determine the severity of behavioral and emotional dysregulation in youth at different time points.
Representational change and strategy use in children's number line estimation during the first years of primary school.

PubMed

White, Sonia L J; Szűcs, Dénes

2012-01-04

The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also providing a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice.
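The logarithmic versus linear comparison described in this abstract can be sketched as two least-squares fits per child, one on the target number and one on its natural logarithm, compared by R². This is only an illustration under stated assumptions: the function names `fit_line` and `compare_fits` and the data are hypothetical, not taken from the study, and the target 0 is left out because its logarithm is undefined.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

def compare_fits(targets, estimates):
    """Return (R^2 of the linear model, R^2 of the logarithmic model)."""
    _, _, r2_linear = fit_line(targets, estimates)
    _, _, r2_log = fit_line([math.log(t) for t in targets], estimates)
    return r2_linear, r2_log

# Hypothetical child who compresses large numbers on a 0-20 line.
targets = [1, 2, 4, 6, 8, 10, 13, 16, 18, 20]
estimates = [20 * math.log(t) / math.log(20) for t in targets]
r2_linear, r2_log = compare_fits(targets, estimates)
```

A child would be classed as showing a logarithmic representation when `r2_log` exceeds `r2_linear`; as the abstract cautions, such a fit says nothing about the strategies used for individual digits.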
Representational change and strategy use in children's number line estimation during the first years of primary school

PubMed Central

2012-01-01

Background The objective of this study was to scrutinize number line estimation behaviors displayed by children in mathematics classrooms during the first three years of schooling. We extend existing research by not only mapping potential logarithmic-linear shifts but also providing a new perspective by studying in detail the estimation strategies of individual target digits within a number range familiar to children. Methods Typically developing children (n = 67) from Years 1-3 completed a number-to-position numerical estimation task (0-20 number line). Estimation behaviors were first analyzed via logarithmic and linear regression modeling. Subsequently, using an analysis of variance we compared the estimation accuracy of each digit, thus identifying target digits that were estimated with the assistance of arithmetic strategy. Results Our results further confirm a developmental logarithmic-linear shift when utilizing regression modeling; however, uniquely we have identified that children employ variable strategies when completing numerical estimation, with levels of strategy advancing with development. Conclusion In terms of the existing cognitive research, this strategy factor highlights the limitations of any regression modeling approach, or alternatively, it could underpin the developmental time course of the logarithmic-linear shift. Future studies need to systematically investigate this relationship and also consider the implications for educational practice. PMID:22217191
Family and school environmental predictors of sleep bruxism in children.

PubMed

Rossi, Debora; Manfredini, Daniele

2013-01-01

To identify potential predictors of self-reported sleep bruxism (SB) within children's family and school environments. A total of 65 primary school children (55.4% males, mean age 9.3 ± 1.9 years) were administered a 10-item questionnaire investigating the prevalence of self-reported SB as well as nine family and school-related potential bruxism predictors. Regression analyses were performed to assess the correlation between the potential predictors and SB. A positive answer to the self-reported SB item was endorsed by 18.8% of subjects, with no sex differences. Multiple variable regression analysis identified a final model showing that having divorced parents and not falling asleep easily were the only two weak predictors of self-reported SB. The percentage of explained variance for SB by the final multiple regression model was 13.3% (Nagelkerke's R² = 0.133). While having a high specificity and a good negative predictive value, the model showed unacceptable sensitivity and positive predictive values. The resulting accuracy to predict the presence of self-reported SB was 73.8%. The present investigation suggested that, among family and school-related matters, having divorced parents and not falling asleep easily were two predictors, even if weak, of a child's self-report of SB.
Patient Stratification Using Electronic Health Records from a Chronic Disease Management Program.

PubMed

Chen, Robert; Sun, Jimeng; Dittus, Robert S; Fabbri, Daniel; Kirby, Jacqueline; Laffer, Cheryl L; McNaughton, Candace D; Malin, Bradley

2016-01-04

The goal of this study is to devise a machine learning framework to assist care coordination programs in prognostic stratification to design and deliver personalized care plans and to allocate financial and medical resources effectively. This study is based on a de-identified cohort of 2,521 hypertension patients from a chronic care coordination program at the Vanderbilt University Medical Center. Patients were modeled as vectors of features derived from electronic health records (EHRs) over a six-year period. We applied a stepwise regression to identify risk factors associated with a decrease in mean arterial pressure of at least 2 mmHg after program enrollment. The resulting features were subsequently validated via a logistic regression classifier. Finally, risk factors were applied to group the patients through model-based clustering. We identified a set of predictive features that consisted of a mix of demographic, medication, and diagnostic concepts. Logistic regression over these features yielded an area under the ROC curve (AUC) of 0.71 (95% CI: [0.67, 0.76]). Based on these features, four clinically meaningful groups are identified through clustering - two of which represented patients with more severe disease profiles, while the remaining represented patients with mild disease profiles. Patients with hypertension can exhibit significant variation in their blood pressure control status and responsiveness to therapy. Yet this work shows that a clustering analysis can generate more homogeneous patient groups, which may aid clinicians in designing and implementing customized care programs. The study shows that predictive modeling and clustering using EHR data can be beneficial for providing a systematic, generalized approach for care providers to tailor their management approach based upon patient-level factors.
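The abstract reports an AUC with a 95% confidence interval but does not say how the interval was obtained. As a minimal sketch, the AUC can be computed as the share of (positive, negative) pairs ranked correctly, and a percentile bootstrap is one common way to attach an interval; the bootstrap, the function names and the defaults below are assumptions for illustration only.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high
```

Here `labels` would be 1 for patients whose mean arterial pressure fell by at least 2 mmHg and 0 otherwise, and `scores` the predicted probabilities from the logistic regression classifier.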
Applying Intelligent Algorithms to Automate the Identification of Error Factors.

PubMed

Jin, Haizhe; Qu, Qingxing; Munechika, Masahiko; Sano, Masataka; Kajihara, Chisato; Duffy, Vincent G; Chen, Han

2018-05-03

Medical errors are the manifestation of the defects occurring in medical processes. Extracting and identifying defects as medical error factors from these processes is an effective approach to prevent medical errors. However, it is a difficult and time-consuming task and requires an analyst with a professional medical background. The issues of identifying a method to extract medical error factors and reduce the extraction difficulty need to be resolved. In this research, a systematic methodology to extract and identify error factors in the medical administration process was proposed. The design of the error report, extraction of the error factors, and identification of the error factors were analyzed. Based on 624 medical error cases across four medical institutes in both Japan and China, 19 error-related items and their levels were extracted. They were then closely related to 12 error factors. The relational model between the error-related items and error factors was established based on a genetic algorithm (GA)-back-propagation neural network (BPNN) model. Additionally, compared to BPNN, partial least squares regression and support vector regression, GA-BPNN exhibited a higher overall prediction accuracy and was able to promptly identify the error factors from the error-related items. The combination of "error-related items, their different levels, and the GA-BPNN model" was proposed as an error-factor identification technology, which could automatically identify medical error factors.
Identifying Autocorrelation Generated by Various Error Processes in Interrupted Time-Series Regression Designs: A Comparison of AR1 and Portmanteau Tests

ERIC Educational Resources Information Center

Huitema, Bradley E.; McKean, Joseph W.

2007-01-01

Regression models used in the analysis of interrupted time-series designs assume statistically independent errors. Four methods of evaluating this assumption are the Durbin-Watson (D-W), Huitema-McKean (H-M), Box-Pierce (B-P), and Ljung-Box (L-B) tests. These tests were compared with respect to Type I error and power under a wide variety of error…
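Three of the four statistics named in this abstract have simple closed forms on the regression residuals, sketched below using the textbook definitions. The Huitema-McKean statistic is omitted because its formula is not given here, and the function names are illustrative. Critical values and p-values are not computed.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: near 2 when there is no lag-1 autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / sum(d ** 2 for d in dev)

def box_pierce(residuals, max_lag):
    """Box-Pierce Q = n * sum of squared autocorrelations up to max_lag."""
    n = len(residuals)
    return n * sum(autocorrelation(residuals, k) ** 2
                   for k in range(1, max_lag + 1))

def ljung_box(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum of r_k^2 / (n - k) up to max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(autocorrelation(residuals, k) ** 2 / (n - k)
                             for k in range(1, max_lag + 1))
```

Both portmanteau statistics are usually referred to a chi-square distribution with `max_lag` degrees of freedom; the Ljung-Box weighting is a small-sample correction to Box-Pierce.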
Predicted effect size of lisdexamfetamine treatment of attention deficit/hyperactivity disorder (ADHD) in European adults: Estimates based on indirect analysis using a systematic review and meta-regression analysis.

PubMed

Fridman, M; Hodgkins, P S; Kahle, J S; Erder, M H

2015-06-01

There are few approved therapies for adults with attention-deficit/hyperactivity disorder (ADHD) in Europe. Lisdexamfetamine (LDX) is an effective treatment for ADHD; however, no clinical trials examining the efficacy of LDX specifically in European adults have been conducted. Therefore, to estimate the efficacy of LDX in European adults we performed a meta-regression of existing clinical data. A systematic review identified US- and Europe-based randomized efficacy trials of LDX, atomoxetine (ATX), or osmotic-release oral system methylphenidate (OROS-MPH) in children/adolescents and adults. A meta-regression model was then fitted to the published/calculated effect sizes (Cohen's d) using medication, geographical location, and age group as predictors. The LDX effect size in European adults was extrapolated from the fitted model. Sensitivity analyses performed included using adult-only studies and adding studies with placebo designs other than a standard pill-placebo design. Twenty-two of 2832 identified articles met inclusion criteria. The model-estimated effect size of LDX for European adults was 1.070 (95% confidence interval: 0.738, 1.401), larger than the 0.8 threshold for large effect sizes. The overall model fit was adequate (80%) and stable in the sensitivity analyses. This model predicts that LDX may have a large treatment effect size in European adults with ADHD. Copyright © 2015 Elsevier Masson SAS. All rights reserved.
Individual risk factors for deep infection and compromised fracture healing after intramedullary nailing of tibial shaft fractures: a single centre experience of 480 patients.

PubMed

Metsemakers, W-J; Handojo, K; Reynders, P; Sermon, A; Vanderschot, P; Nijs, S

2015-04-01

Despite modern advances in the treatment of tibial shaft fractures, complications including nonunion, malunion, and infection remain relatively frequent. A better understanding of these injuries and their complications could lead to prevention rather than treatment strategies. A retrospective study was performed to identify risk factors for deep infection and compromised fracture healing after intramedullary nailing (IMN) of tibial shaft fractures. Between January 2000 and January 2012, 480 consecutive patients with 486 tibial shaft fractures were enrolled in the study. Statistical analysis was performed to determine predictors of deep infection and compromised fracture healing. Compromised fracture healing was subdivided into delayed union and nonunion. The following independent variables were selected for analysis: age, sex, smoking, obesity, diabetes, American Society of Anaesthesiologists (ASA) classification, polytrauma, fracture type, open fractures, Gustilo type, primary external fixation (EF), time to nailing (TTN) and reaming. As primary statistical evaluation we performed a univariate analysis, followed by a multiple logistic regression model. Univariate regression analysis revealed similar risk factors for delayed union and nonunion, including fracture type, open fractures and Gustilo type. Factors affecting the occurrence of deep infection in this model were primary EF, a prolonged TTN, open fractures and Gustilo type. Multiple logistic regression analysis revealed polytrauma as the single risk factor for nonunion. With respect to delayed union, no risk factors could be identified. In the same statistical model, deep infection was correlated with primary EF. The purpose of this study was to evaluate risk factors of poor outcome after IMN of tibial shaft fractures. The univariate regression analysis showed that the nature of complications after tibial shaft nailing could be multifactorial. This was not confirmed in a multiple logistic regression model, which only revealed polytrauma and primary EF as risk factors for nonunion and deep infection, respectively. Future strategies should focus on prevention in high-risk populations such as polytrauma patients treated with EF. Copyright © 2014 Elsevier Ltd. All rights reserved.
Identifying individual changes in performance with composite quality indicators while accounting for regression to the mean.

PubMed

Gajewski, Byron J; Dunton, Nancy

2013-04-01

Almost a decade ago Morton and Torgerson indicated that perceived medical benefits could be due to "regression to the mean." Despite this caution, the regression to the mean "effects on the identification of changes in institutional performance do not seem to have been considered previously in any depth" (Jones and Spiegelhalter). As a response, Jones and Spiegelhalter provide a methodology to adjust for regression to the mean when modeling recent changes in institutional performance for one-variable quality indicators. Therefore, in our view, Jones and Spiegelhalter provide a breakthrough methodology for performance measures. At the same time, in the interests of parsimony, it is useful to aggregate individual quality indicators into a composite score. Our question is, can we develop and demonstrate a methodology that extends the "regression to the mean" literature to allow for composite quality indicators? Using a latent variable modeling approach, we extend the methodology to the composite indicator case. We demonstrate the approach on 4 indicators collected by the National Database of Nursing Quality Indicators. A simulation study further demonstrates its "proof of concept."
Effect of temperature and precipitation on salmonellosis cases in South-East Queensland, Australia: an observational study

PubMed Central

Barnett, Adrian Gerard

2016-01-01

Objective Foodborne illnesses in Australia, including salmonellosis, are estimated to cost over $A1.25 billion annually. The weather has been identified as being influential on salmonellosis incidence, as cases increase during summer; however, time series modelling of salmonellosis is challenging because outbreaks cause strong autocorrelation. This study assesses whether switching models is an improved method of estimating weather–salmonellosis associations. Design We analysed weather and salmonellosis in South-East Queensland between 2004 and 2013 using 2 common regression models and a switching model, each with 21-day lags for temperature and precipitation. Results The switching model best fit the data, as judged by its substantial improvement in deviance information criterion over the regression models, less autocorrelated residuals and control of seasonality. The switching model estimated that a 5°C increase in mean temperature and 10 mm of precipitation were associated with increases in salmonellosis cases of 45.4% (95% CrI 40.4%, 50.5%) and 24.1% (95% CrI 17.0%, 31.6%), respectively. Conclusions Switching models improve on traditional time series models in quantifying weather–salmonellosis associations. A better understanding of how temperature and precipitation influence salmonellosis may identify where interventions can be made to lower the health and economic costs of salmonellosis. PMID:26916693
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bramer, L. M.; Rounds, J.; Burleyson, C. D.

Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions are examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
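The abstract does not state which penalty was used, so the following is only a sketch that assumes an L2 (ridge) penalty fitted by plain gradient descent. The two standardized predictors (maximum temperature, precipitation), the labels and all function names are hypothetical and stand in for the weather data of one operation zone.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_penalized_logistic(X, y, penalty=1.0, lr=0.1, n_iter=2000):
    """Gradient descent on mean log-loss + (penalty / (2n)) * sum(w_j^2).

    Returns weights with the intercept first; the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        for j in range(p + 1):
            shrink = penalty * w[j] if j > 0 else 0.0
            w[j] -= lr * (grad[j] + shrink) / n
    return w

def predict_proba(w, x):
    """Predicted probability of a grid stress day for one observation."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical standardized [max temperature, precipitation]; 1 = stress day.
X = [[-1.2, 0.3], [-0.8, -0.5], [-0.3, 0.9], [0.1, -1.0],
     [0.4, 0.2], [0.9, -0.4], [1.3, 0.8], [1.6, -0.2]]
y = [0, 0, 0, 1, 1, 0, 1, 1]
w_weak = fit_penalized_logistic(X, y, penalty=0.1)
w_strong = fit_penalized_logistic(X, y, penalty=10.0)
```

A larger `penalty` shrinks the coefficients toward zero, trading some fit for stability; in practice the penalty strength would be chosen by cross-validation rather than fixed by hand.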
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bramer, Lisa M.; Rounds, J.; Burleyson, C. D.

Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions were examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and combinations of predictive variables were examined. A penalized logistic regression model which was fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at various time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. In conclusion, the methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days

DOE PAGES

Bramer, Lisa M.; Rounds, J.; Burleyson, C. D.; ...

2017-09-22

Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions were examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and combinations of predictive variables were examined. A penalized logistic regression model which was fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at various time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. In conclusion, the methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
Feed-forward and generalized regression neural networks in modeling feeding behavior of pigs in the grow-finishing phase

USDA-ARS's Scientific Manuscript database

Feeding patterns in group-housed grow-finishing pigs have been investigated for use in management decisions, identifying sick animals, and determining genetic differences within a herd. Development of models to predict swine feeding behaviour has been limited due to the large number of potential enviro...
Endoscopic third ventriculostomy in the treatment of childhood hydrocephalus.

PubMed

Kulkarni, Abhaya V; Drake, James M; Mallucci, Conor L; Sgouros, Spyros; Roth, Jonathan; Constantini, Shlomi

2009-08-01

To develop a model to predict the probability of endoscopic third ventriculostomy (ETV) success in the treatment for hydrocephalus on the basis of a child's individual characteristics. We analyzed 618 ETVs performed consecutively on children at 12 international institutions to identify predictors of ETV success at 6 months. A multivariable logistic regression model was developed on 70% of the dataset (training set) and validated on 30% of the dataset (validation set). In the training set, 305/455 ETVs (67.0%) were successful. The regression model (containing patient age, cause of hydrocephalus, and previous cerebrospinal fluid shunt) demonstrated good fit (Hosmer-Lemeshow, P = .78) and discrimination (C statistic = 0.70). In the validation set, 105/163 ETVs (64.4%) were successful and the model maintained good fit (Hosmer-Lemeshow, P = .45), discrimination (C statistic = 0.68), and calibration (calibration slope = 0.88). A simplified ETV Success Score was devised that closely approximates the predicted probability of ETV success. Children most likely to succeed with ETV can now be accurately identified and spared the long-term complications of CSF shunting.
Effort test failure: toward a predictive model.

PubMed

Webb, James W; Batchelor, Jennifer; Meares, Susanne; Taylor, Alan; Marsh, Nigel V

2012-01-01

Predictors of effort test failure were examined in an archival sample of 555 traumatically brain-injured (TBI) adults. Logistic regression models were used to examine whether compensation-seeking, injury-related, psychological, demographic, and cultural factors predicted effort test failure (ETF). ETF was significantly associated with compensation-seeking (OR = 3.51, 95% CI [1.25, 9.79]), low education (OR: 0.83 [0.74, 0.94]), self-reported mood disorder (OR: 5.53 [3.10, 9.85]), exaggerated displays of behavior (OR: 5.84 [2.15, 15.84]), psychotic illness (OR: 12.86 [3.21, 51.44]), being foreign-born (OR: 5.10 [2.35, 11.06]), having sustained a workplace accident (OR: 4.60 [2.40, 8.81]), and mild traumatic brain injury severity compared with very severe traumatic brain injury severity (OR: 0.37 [0.13, 0.995]). ETF was associated with a broader range of statistical predictors than has previously been identified and the relative importance of psychological and behavioral predictors of ETF was evident in the logistic regression model. Variables that might potentially extend the model of ETF are identified for future research efforts.
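The odds ratios above are exponentiated logistic regression coefficients, and a Wald-type 95% interval is symmetric on the log scale. As a worked sketch, the coefficient and its standard error can be recovered from a reported OR and interval, here the compensation-seeking estimate. Whether the authors used Wald intervals is an assumption, and the function names are illustrative.

```python
import math

def log_odds_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover (beta, standard error) from an OR and its 95% interval,
    assuming a Wald interval exp(beta +/- z * se)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower limit, upper limit) from a coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Compensation-seeking: OR = 3.51, 95% CI [1.25, 9.79] as reported above.
beta, se = log_odds_summary(3.51, 1.25, 9.79)
or_hat, low, high = odds_ratio_ci(beta, se)
```

The rebuilt limits agree with the reported ones up to rounding, which is what one would expect if the interval is symmetric around the coefficient on the log-odds scale.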
Urban change analysis and future growth of Istanbul.

PubMed

Akın, Anıl; Sunar, Filiz; Berberoğlu, Süha

2015-08-01

This study is aimed at analyzing urban change within Istanbul and assessing the city's future growth potential for the year 2040 using an appropriate modeling approach. Urban growth is a major driving force of land-use change, and spatial and temporal components of urbanization can be identified through accurate spatial modeling. In this context, widely used urban modeling approaches, such as the Markov chain and logistic regression based on cellular automata (CA), were used to simulate urban growth within Istanbul. The distance from each pixel to the urban and road classes, elevation, and slope, together with municipality and land use maps (as an excluded layer), were identified as factors. Calibration data were obtained from remotely sensed data recorded in 1972, 1986, and 2013. Validation was performed by overlaying the simulated and actual 2013 urban maps, and a kappa index of agreement was derived. The results indicate that urban expansion will influence mainly forest areas during the time period of 2013-2040. The urban expansion was predicted as 429 and 327 km² with the Markov chain and logistic regression models, respectively.
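The validation step compares a simulated and an actual map pixel by pixel. A minimal sketch of a kappa index of agreement, assuming the standard Cohen's kappa (the abstract does not say which kappa variant was used), with each map flattened into a list of class labels:

```python
from collections import Counter

def kappa(simulated, actual):
    """Cohen's kappa: agreement between two label lists beyond chance.

    Undefined (division by zero) when both maps contain a single,
    identical class, because chance agreement is then 1.
    """
    n = len(actual)
    observed = sum(1 for s, a in zip(simulated, actual) if s == a) / n
    sim_counts = Counter(simulated)
    act_counts = Counter(actual)
    classes = set(sim_counts) | set(act_counts)
    expected = sum(sim_counts[c] * act_counts[c] for c in classes) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical 8-pixel maps: 1 = urban, 0 = non-urban.
simulated_map = [1, 1, 0, 0, 1, 0, 0, 0]
actual_map = [1, 0, 0, 0, 1, 1, 0, 0]
agreement = kappa(simulated_map, actual_map)
```

A value of 1 means the maps agree on every pixel, and 0 means agreement no better than expected from the class proportions alone.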
High-risk regions and outbreak modelling of tularemia in humans.

PubMed

Desvars-Larrive, A; Liu, X; Hjertqvist, M; Sjöstedt, A; Johansson, A; Rydén, P

2017-02-01

Sweden reports large and variable numbers of human tularemia cases, but the high-risk regions are anecdotally defined and factors explaining annual variations are poorly understood. Here, high-risk regions were identified by spatial cluster analysis on disease surveillance data for 1984-2012. Negative binomial regression with five previously validated predictors (including predicted mosquito abundance and predictors based on local weather data) was used to model the annual number of tularemia cases within the high-risk regions. Seven high-risk regions were identified with annual incidences of 3·8-44 cases/100 000 inhabitants, accounting for 56·4% of the tularemia cases but only 9·3% of Sweden's population. For all high-risk regions, most cases occurred between July and September. The regression models explained the annual variation of tularemia cases within most high-risk regions and discriminated between years with and without outbreaks. In conclusion, tularemia in Sweden is concentrated in a few high-risk regions and shows high annual and seasonal variations. We present reproducible methods for identifying tularemia high-risk regions and modelling tularemia cases within these regions. The results may help health authorities to target populations at risk and lay the foundation for developing an early warning system for outbreaks.

Predicting multi-level drug response with gene expression profile in multiple myeloma using hierarchical ordinal regression.

PubMed

Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo

2018-05-10

Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response into two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop an ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has an important impact on cancer studies.
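To make the contrast with a binary response concrete, the sketch below shows how a cumulative-logit (proportional odds) ordinal model turns one linear predictor into probabilities for five ordered response levels. It covers only this likelihood component: the Cauchy prior and quasi-Newton fitting described in the abstract are not reproduced, the cutpoints are hypothetical, and the sign convention P(Y <= k) = sigmoid(c_k - eta) is an assumption.

```python
import math

def cumulative_logit_probs(eta, cutpoints):
    """Category probabilities under P(Y <= k) = sigmoid(c_k - eta).

    eta is the linear predictor (for example, a weighted sum of gene
    expression values); cutpoints must be increasing, and K - 1 cutpoints
    give K ordered categories.
    """
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

# Hypothetical cutpoints separating five ordered response levels.
cutpoints = [-2.0, -0.5, 0.5, 2.0]
probs_low = cumulative_logit_probs(-1.0, cutpoints)
probs_high = cumulative_logit_probs(1.5, cutpoints)
```

Raising `eta` moves probability mass toward the higher categories while the cutpoints stay fixed, which is the information lost when the five levels are collapsed into two groups.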
Using Logistic Regression to Predict the Probability of Debris Flows in Areas Burned by Wildfires, Southern California, 2003-2006

USGS Publications Warehouse

Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.

2008-01-01

Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. 
The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of southern California. This study demonstrates that logistic regression is a valuable tool for developing models that predict the probability of debris flows occurring in recently burned landscapes.
Hybrid Support Vector Regression and Autoregressive Integrated Moving Average Models Improved by Particle Swarm Optimization for Property Crime Rates Forecasting with Economic Indicators

PubMed Central

Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina

2013-01-01

Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729
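The abstract uses particle swarm optimization (PSO) to choose model parameters. The sketch below is a generic textbook PSO, not the authors' implementation: the inertia and acceleration constants are common defaults, and `toy_error` is a hypothetical stand-in for the forecast error that fitting an SVR or ARIMA model with a given parameter pair would return.

```python
import random

def particle_swarm_minimize(objective, bounds, n_particles=30, n_iter=200,
                            inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective(position) inside box bounds; returns (best_pos, best_val)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm-wide best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def toy_error(params):
    """Hypothetical error surface with its minimum at (2, -1)."""
    return (params[0] - 2.0) ** 2 + (params[1] + 1.0) ** 2

best_params, best_error = particle_swarm_minimize(toy_error, [(-5.0, 5.0), (-5.0, 5.0)])
```

In a setting like the one described, `objective` would fit the model with the candidate parameters and return a validation error, so each evaluation is far more expensive than this toy function.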
Hybrid support vector regression and autoregressive integrated moving average models improved by particle swarm optimization for property crime rates forecasting with economic indicators.

PubMed

Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Sallehuddin, Roselina

2013-01-01

Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rate forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models.
Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk

PubMed Central

Czarnota, Jenna; Gennings, Chris; Wheeler, David C

2015-01-01

In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case–control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. PMID:26005323
Assessment of weighted quantile sum regression for modeling chemical mixtures and cancer risk.

PubMed

Czarnota, Jenna; Gennings, Chris; Wheeler, David C

2015-01-01

In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case-control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome.
Linear Modeling and Evaluation of Controls on Flow Response in Western Post-Fire Watersheds

NASA Astrophysics Data System (ADS)

Saxe, S.; Hogue, T. S.; Hay, L.

2015-12-01

This research investigates the impact of wildfires on watershed flow regimes throughout the western United States, specifically focusing on evaluation of fire events within specified subregions and determination of the impact of climate and geophysical variables in post-fire flow response. Fire events were collected through federal and state-level databases and streamflow data were collected from U.S. Geological Survey stream gages. 263 watersheds were identified with at least 10 years of continuous pre-fire daily streamflow records and 5 years of continuous post-fire daily flow records. For each watershed, percent changes in runoff ratio (RO), annual seven day low-flows (7Q2) and annual seven day high-flows (7Q10) were calculated from pre- to post-fire. Numerous independent variables were identified for each watershed and fire event, including topographic, land cover, climate, burn severity, and soils data. The national watersheds were divided into five regions through K-clustering and a lasso linear regression model, applying the Leave-One-Out calibration method, was calculated for each region. Nash-Sutcliffe Efficiency (NSE) was used to determine the accuracy of the resulting models. The regions encompassing the United States along and west of the Rocky Mountains, excluding the coastal watersheds, produced the most accurate linear models. The Pacific coast region models produced poor and inconsistent results, indicating that the regions need to be further subdivided. Presently, the runoff ratio (RO) and high-flow response variables appear to be more easily modeled than the low-flow response variable. Results of linear regression modeling showed varying importance of watershed and fire event variables, with conflicting correlation between land cover types and soil types by region. The addition of further independent variables and constriction of current variables based on correlation indicators is ongoing and should allow for more accurate linear regression modeling.
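The Nash-Sutcliffe Efficiency mentioned above has a standard definition: one minus the ratio of the squared model error to the variance of the observations about their mean. A minimal sketch follows; the function name and sample values are illustrative assumptions, not data from the study.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum

# Illustrative values only.
observed = [1.0, 2.0, 3.0]
perfect = nash_sutcliffe(observed, [1.0, 2.0, 3.0])    # model matches exactly
mean_only = nash_sutcliffe(observed, [2.0, 2.0, 2.0])  # model predicts the mean
print(perfect, mean_only)
```

An NSE of 1 indicates a perfect fit, and an NSE of 0 indicates a model that does no better than always predicting the observed mean; negative values indicate a worse fit than the mean.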
The cross-validated AUC for MCP-logistic regression with high-dimensional data.

PubMed

Jiang, Dingfeng; Huang, Jian; Zhang, Ying

2013-10-01

We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
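The idea of choosing a tuning parameter by cross-validated AUC can be sketched as follows. This is a simplified illustration, assuming held-out predicted scores are already available for each candidate penalty value; the dictionary `held_out_scores` and its values are hypothetical, and the paper's exact CV-AUC definition and its MCP fitting algorithm are not reproduced here.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out (out-of-fold) scores for two candidate penalty values.
labels = [0, 0, 1, 1]
held_out_scores = {
    0.1: [0.10, 0.40, 0.35, 0.80],
    0.5: [0.20, 0.30, 0.60, 0.70],
}

# Select the penalty whose held-out scores give the largest AUC.
best_penalty = max(held_out_scores, key=lambda lam: auc(labels, held_out_scores[lam]))
print(best_penalty, auc(labels, held_out_scores[best_penalty]))
```

Selecting on AUC rather than on a likelihood-based criterion targets classification ranking directly, which matches the motivation given in the abstract.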
Peak-flow characteristics of Virginia streams

USGS Publications Warehouse

Austin, Samuel H.; Krstolic, Jennifer L.; Wiegand, Ute

2011-01-01

Peak-flow annual exceedance probabilities, also called probability-percent chance flow estimates, and regional regression equations are provided describing the peak-flow characteristics of Virginia streams. Statistical methods are used to evaluate peak-flow data. Analysis of Virginia peak-flow data collected from 1895 through 2007 is summarized. Methods are provided for estimating unregulated peak flow of gaged and ungaged streams. Station peak-flow characteristics identified by fitting the logarithms of annual peak flows to a Log Pearson Type III frequency distribution yield annual exceedance probabilities of 0.5, 0.4292, 0.2, 0.1, 0.04, 0.02, 0.01, 0.005, and 0.002 for 476 streamgaging stations. Stream basin characteristics computed using spatial data and a geographic information system are used as explanatory variables in regional regression model equations for six physiographic regions to estimate regional annual exceedance probabilities at gaged and ungaged sites. Weighted peak-flow values that combine annual exceedance probabilities computed from gaging station data and from regional regression equations provide improved peak-flow estimates. Text, figures, and lists are provided summarizing selected peak-flow sites, delineated physiographic regions, peak-flow estimates, basin characteristics, regional regression model equations, error estimates, definitions, data sources, and candidate regression model equations. This study supersedes previous studies of peak flows in Virginia.
Performances on the CogState and standard neuropsychological batteries among HIV patients without dementia.

PubMed

Overton, Edgar Turner; Kauwe, John S K; Paul, Robert; Tashima, Karen; Tate, David F; Patel, Pragna; Carpenter, Charles C J; Patty, David; Brooks, John T; Clifford, David B

2011-11-01

HIV-associated neurocognitive disorders remain prevalent but challenging to diagnose particularly among non-demented individuals. To determine whether a brief computerized battery correlates with formal neurocognitive testing, we identified 46 HIV-infected persons who had undergone both formal neurocognitive testing and a brief computerized battery. Simple detection tests correlated best with formal neuropsychological testing. By multivariable regression model, 53% of the variance in the composite Global Deficit Score was accounted for by elements from the brief computerized tool (P < 0.01). These data confirm previous correlation data with the computerized battery. Using the five significant parameters from the regression model in a Receiver Operating Characteristic curve, 90% of persons were accurately classified as being cognitively impaired or not. The test battery requires additional evaluation, specifically for identifying persons with mild impairment, a state upon which interventions may be effective.
Risk factors for highly pathogenic avian influenza in commercial layer chicken farms in Bangladesh during 2011.

PubMed

Osmani, M G; Thornton, R N; Dhand, N K; Hoque, M A; Milon, Sk M A; Kalam, M A; Hossain, M; Yamage, M

2014-12-01

A case-control study conducted during 2011 involved 90 randomly selected commercial layer farms infected with highly pathogenic avian influenza type A subtype H5N1 (HPAI) and 175 control farms randomly selected from within 5 km of infected farms. A questionnaire was designed to obtain information about potential risk factors for contracting HPAI and was administered to farm owners or managers. Logistic regression analyses were conducted to identify significant risk factors. A total of 20 of 43 risk factors for contracting HPAI were identified after univariable logistic regression analysis. A multivariable logistic regression model was derived by forward stepwise selection. Both unmatched and matched analyses were performed. The key risk factors identified were numbers of staff, frequency of veterinary visits, presence of village chickens roaming on the farm and staff trading birds. Aggregating these findings with those from other studies resulted in a list of 16 key risk factors identified in Bangladesh. Most of these related to biosecurity. It is considered feasible for Bangladesh to achieve a very low incidence of HPAI. Using the cumulative list of risk factors to enhance biosecurity pertaining to commercial farms would facilitate this objective. © 2013 Blackwell Verlag GmbH.
GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

PubMed

Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

2016-01-01

Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
Analyzing industrial energy use through ordinary least squares regression models

NASA Astrophysics Data System (ADS)

Golden, Allyson Katherine

Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. 
With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and production behavior, and identify opportunities for energy and cost savings. This thesis study also utilizes change-point and degree-day baseline energy models to disaggregate facility annual energy consumption into separate industrial end-user categories. The baseline energy model provides a suitable and economical alternative to sub-metering individual manufacturing equipment. One case study describes the conjoined use of baseline energy models and facility information gathered during a one-day onsite visit to perform an end-point energy analysis of an injection molding facility conducted by the Alabama Industrial Assessment Center. Applying baseline regression model results to the end-point energy analysis allowed the AIAC to better approximate the annual energy consumption of the facility's HVAC system.
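A change-point baseline model of the kind discussed above can be illustrated with a three-parameter cooling form, in which energy use is flat below a change-point temperature and rises linearly above it, alongside a simple cooling degree-day sum. The functional form, parameter names, and numbers below are assumptions made for the sketch; the thesis's own fitted models are not reproduced here.

```python
def change_point_energy(temperature, base_load, slope, change_point):
    """Three-parameter cooling model: flat base load, linear rise above the change point."""
    return base_load + slope * max(0.0, temperature - change_point)

def cooling_degree_days(daily_mean_temps, balance_point):
    """Sum of daily mean temperature excesses over the balance-point temperature."""
    return sum(max(0.0, t - balance_point) for t in daily_mean_temps)

# Hypothetical parameters: base load 100 units, 5 units per degree above 65.
mild_day = change_point_energy(60.0, base_load=100.0, slope=5.0, change_point=65.0)
hot_day = change_point_energy(70.0, base_load=100.0, slope=5.0, change_point=65.0)
cdd = cooling_degree_days([60.0, 70.0, 80.0], balance_point=65.0)
print(mild_day, hot_day, cdd)
```

In practice the base load, slope, and change point would be estimated by ordinary least squares from utility bills and weather data, which is feasible in basic spreadsheet software as the abstract notes.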
Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection

NASA Astrophysics Data System (ADS)

Schlechtingen, Meik; Ferreira Santos, Ilmar

2011-07-01

This paper presents the research results of a comparison of three different model based approaches for wind turbine fault detection in online SCADA data, by applying developed models to five real measured faults and anomalies. The regression based model as the simplest approach to build a normal behavior model is compared to two artificial neural network based approaches, which are a full signal reconstruction and an autoregressive normal behavior model. Based on a real time series containing two generator bearing damages the capabilities of identifying the incipient fault prior to the actual failure are investigated. The period after the first bearing damage is used to develop the three normal behavior models. The developed or trained models are used to investigate how the second damage manifests in the prediction error. Furthermore the full signal reconstruction and the autoregressive approach are applied to further real time series containing gearbox bearing damages and stator temperature anomalies. The comparison revealed all three models being capable of detecting incipient faults. However, they differ in the effort required for model development and the remaining operational time after first indication of damage. The general nonlinear neural network approaches outperform the regression model. The remaining seasonality in the regression model prediction error makes it difficult to detect abnormality and leads to increased alarm levels and thus a shorter remaining operational period. For the bearing damages and the stator anomalies under investigation the full signal reconstruction neural network gave the best fault visibility and thus led to the highest confidence level.
Spatiotemporal analysis of the relationship between socioeconomic factors and stroke in the Portuguese mainland population under 65 years old.

PubMed

Oliveira, André; Cabral, António J R; Mendes, Jorge M; Martins, Maria R O; Cabral, Pedro

2015-11-04

Stroke risk has been shown to display varying patterns of geographic distribution amongst countries but also between regions of the same country. Traditionally a disease of older persons, a global 25% increase in incidence instead was noticed between 1990 and 2010 in persons aged 20-≤64 years, particularly in low- and medium-income countries. Understanding spatial disparities in the association between socioeconomic factors and stroke is critical to target public health initiatives aiming to mitigate or prevent this disease, including in younger persons. We aimed to identify socioeconomic determinants of geographic disparities of stroke risk in people <65 years old, in municipalities of mainland Portugal, and the spatiotemporal variation of the association between these determinants and stroke risk during two study periods (1992-1996 and 2002-2006). Poisson and negative binomial global regression models were used to explore determinants of disease risk. Geographically weighted regression (GWR) represents a distinctive approach, allowing estimation of local regression coefficients. Models for both study periods were identified. Significant variables included education attainment, work hours per week and unemployment. Local Poisson GWR models achieved the best fit and evidenced spatially varying regression coefficients. Spatiotemporal inequalities were observed in significant variables, with dissimilarities between men and women. This study contributes to a better understanding of the relationship between stroke and socioeconomic factors in the population <65 years of age, one age group seldom analysed separately. It can thus help to improve the targeting of public health initiatives, even more in a context of economic crisis.
Use and interpretation of logistic regression in habitat-selection studies

USGS Publications Warehouse

Keating, Kim A.; Cherry, Steve

2004-01-01

Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. 
We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
Aqua/Aura Updated Inclination Adjust Maneuver Performance Prediction Model

NASA Technical Reports Server (NTRS)

Boone, Spencer

2017-01-01

This presentation will discuss the updated Inclination Adjust Maneuver (IAM) performance prediction model that was developed for Aqua and Aura following the 2017 IAM series. This updated model uses statistical regression methods to identify potential long-term trends in maneuver parameters, yielding improved predictions when re-planning past maneuvers. The presentation has been reviewed and approved by Eric Moyer, ESMO Deputy Project Manager.
[Gender difference in risk factors for depression in community-dwelling elders].

PubMed

Kim, Chul-Gyu; Park, Seungmi

2012-02-01

This study was conducted to compare the degree of depression between men and women and to identify factors influencing their depression. Participants in this cross-sectional descriptive study were 263 persons over 65 years old (men: 103, women: 160). Data were collected in two urban areas in 2010 through face-to-face interviews using questionnaires. Research instruments utilized in this study were SGDS, MMSE-K, SRH, FILE, sleep pattern scale, family and friend support scale, and social support scale. Multivariate regression analysis was performed to identify factors influencing depression in elders. The proportions of participants with depression were significantly different between men and women (52.4% vs. 67.5%). The regression model for depression in elderly men significantly accounted for 54% of the variance; the factors were disease stress (32%), economic stress (10%), perceived health status (4%), family support, educational level, age, and hypertension. The regression model for depression in elderly women significantly accounted for 47% of the variance; the factors were disease stress (25%), perceived social loneliness (8%), friend support (5%), family stress (4%), sleep satisfaction, and family support. Results demonstrate that depression is an important health problem for elders, and show gender differences in the factors influencing depression. These results could be used in developing depression prevention programs.
Prediction of biomechanical parameters of the proximal femur using statistical appearance models and support vector regression.

PubMed

Fritscher, Karl; Schuler, Benedikt; Link, Thomas; Eckstein, Felix; Suhm, Norbert; Hänni, Markus; Hengg, Clemens; Schubert, Rainer

2008-01-01

Fractures of the proximal femur are one of the principal causes of mortality among elderly persons. Traditional methods for the determination of femoral fracture risk use methods for measuring bone mineral density. However, BMD alone is not sufficient to predict bone failure load for an individual patient and additional parameters have to be determined for this purpose. In this work an approach that uses statistical models of appearance to identify relevant regions and parameters for the prediction of biomechanical properties of the proximal femur will be presented. By using Support Vector Regression the proposed model based approach is capable of predicting two different biomechanical parameters accurately and fully automatically in two different testing scenarios.
Evaluating risk factors for endemic human Salmonella Enteritidis infections with different phage types in Ontario, Canada using multinomial logistic regression and a case-case study approach

PubMed Central

2012-01-01

Background: Identifying risk factors for Salmonella Enteritidis (SE) infections in Ontario will assist public health authorities to design effective control and prevention programs to reduce the burden of SE infections. Our research objective was to identify risk factors for acquiring SE infections with various phage types (PT) in Ontario, Canada. We hypothesized that certain PTs (e.g., PT8 and PT13a) have specific risk factors for infection. Methods: Our study included endemic SE cases with various PTs whose isolates were submitted to the Public Health Laboratory-Toronto from January 20th to August 12th, 2011. Cases were interviewed using a standardized questionnaire that included questions pertaining to demographics, travel history, clinical symptoms, contact with animals, and food exposures. A multinomial logistic regression method using the Generalized Linear Latent and Mixed Model procedure and a case-case study design were used to identify risk factors for acquiring SE infections with various PTs in Ontario, Canada. In the multinomial logistic regression model, the outcome variable had three categories representing human infections caused by SE PT8, PT13a, and all other SE PTs (i.e., non-PT8/non-PT13a) as a referent category to which the other two categories were compared. Results: In the multivariable model, SE PT8 was positively associated with contact with dogs (OR=2.17, 95% CI 1.01-4.68) and negatively associated with pepper consumption (OR=0.35, 95% CI 0.13-0.94), after adjusting for age categories and gender, and using exposure periods and health regions as random effects to account for clustering. Conclusions: Our study findings offer interesting hypotheses about the role of phage type-specific risk factors. Multinomial logistic regression analysis and the case-case study approach are novel methodologies to evaluate associations among SE infections with different PTs and various risk factors. PMID:23057531
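Odds ratios and Wald-type 95% confidence intervals such as those reported above are conventionally obtained by exponentiating a logistic regression coefficient and its interval on the log-odds scale. The sketch below shows that conversion. The standard error used here is back-calculated from the reported interval for the dog-contact association and is an assumption, not a value stated in the abstract.

```python
import math

def odds_ratio_ci(beta, standard_error, z=1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return odds_ratio, lower, upper

# beta = ln(2.17) matches the reported OR; the standard error 0.391 is an
# assumed value back-calculated from the reported interval (1.01-4.68).
beta = math.log(2.17)
odds_ratio, lower, upper = odds_ratio_ci(beta, 0.391)
print(f"OR={odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

Because the interval is symmetric on the log scale, the reported lower bound close to 1 indicates that the association is only marginally distinguishable from no effect.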

Comparison of regression models for estimation of isometric wrist joint torques using surface electromyography

PubMed Central

2011-01-01

Background: Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG) signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods: Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data, were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on accuracy of models. Acquired SEMG signals were filtered, rectified, normalized and then fed to models for training. Results: It was shown that mean adjusted coefficient of determination (Ra2) values decreased by 20% to 35% for different models after one hour, while altering arm posture decreased mean Ra2 values by 64% to 74% for different models. Conclusions: Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, the ordinary least squares (OLS) linear regression model was shown to have high isometric torque estimation accuracy combined with very short training times. PMID:21943179
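The adjusted coefficient of determination referred to above is usually defined by penalizing the ordinary R² for the number of predictors. The sketch below uses that standard definition, assuming it is the one the study applied; the sample size and predictor count in the example are hypothetical.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical example: R^2 = 0.90 with 101 samples and 8 predictors
# (eight is used here only because eight SEMG channels were recorded).
ra2 = adjusted_r_squared(0.90, n_samples=101, n_predictors=8)
print(f"Adjusted R^2: {ra2:.4f}")
```

The adjustment matters most when the number of predictors is large relative to the number of samples; with many samples the adjusted and unadjusted values are nearly equal.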
Bayesian semi-parametric analysis of Poisson change-point regression models: application to policy making in Cali, Colombia.

PubMed

Park, Taeyoung; Krafty, Robert T; Sánchez, Alvaro I

2012-07-27

A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the nonconstant pattern of a log baseline rate is modeled with a nonparametric step function, the resulting semi-parametric model involves a model component of varying dimension and thus requires a sophisticated varying-dimensional inference to obtain correct estimates of model parameters of fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art MCMC-type algorithm based on partial collapse. The proposed model and methods are used to investigate an association between daily homicide rates in Cali, Colombia and policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependencies on alcohol sales to the health of the public.
Modeling nitrate at domestic and public-supply well depths in the Central Valley, California

USGS Publications Warehouse

Nolan, Bernard T.; Gronberg, JoAnn M.; Faunt, Claudia C.; Eberts, Sandra M.; Belitz, Ken

2014-01-01

Aquifer vulnerability models were developed to map groundwater nitrate concentration at domestic and public-supply well depths in the Central Valley, California. We compared three modeling methods for ability to predict nitrate concentration >4 mg/L: logistic regression (LR), random forest classification (RFC), and random forest regression (RFR). All three models indicated processes of nitrogen fertilizer input at the land surface, transmission through coarse-textured, well-drained soils, and transport in the aquifer to the well screen. The total percent correct predictions were similar among the three models (69–82%), but RFR had greater sensitivity (84% for shallow wells and 51% for deep wells). The results suggest that RFR can better identify areas with high nitrate concentration but that LR and RFC may better describe bulk conditions in the aquifer. A unique aspect of the modeling approach was inclusion of outputs from previous, physically based hydrologic and textural models as predictor variables, which were important to the models. Vertical water fluxes in the aquifer and percent coarse material above the well screen were ranked moderately high-to-high in the RFR models, and the average vertical water flux during the irrigation season was highly significant (p < 0.0001) in logistic regression.
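The abstract above compares models by total percent correct predictions and by sensitivity. As a minimal sketch of how those two summaries (plus specificity) are usually computed from binary outcomes, the following Python function may help; the function name and the toy data are hypothetical and are not taken from the study.

```python
def classification_summary(observed, predicted):
    """Return percent correct, sensitivity and specificity (all in %).

    observed, predicted: equal-length sequences of 0/1 values, where 1 means
    "nitrate above the threshold" (an illustrative coding, assumed here).
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    return {
        "percent_correct": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


# Hypothetical observed and predicted classes for eight wells.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
summary = classification_summary(observed, predicted)
print(summary)
```

A model can have a similar percent correct to its competitors and still differ in sensitivity, which is the distinction the abstract draws for RFR.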
Application of logistic regression to case-control association studies involving two causative loci.

PubMed

North, Bernard V; Curtis, David; Sham, Pak C

2005-01-01

Models in which two susceptibility loci jointly influence the risk of developing disease can be explored using logistic regression analysis. Comparison of likelihoods of models incorporating different sets of disease model parameters allows inferences to be drawn regarding the nature of the joint effect of the loci. We have simulated case-control samples generated assuming different two-locus models and then analysed them using logistic regression. We show that this method is practicable and that, for the models we have used, it can be expected to allow useful inferences to be drawn from sample sizes consisting of hundreds of subjects. Interactions between loci can be explored, but interactive effects do not exactly correspond with classical definitions of epistasis. We have particularly examined the issue of the extent to which it is helpful to utilise information from a previously identified locus when investigating a second, unknown locus. We show that for some models conditional analysis can have substantially greater power while for others unconditional analysis can be more powerful. Hence we conclude that in general both conditional and unconditional analyses should be performed when searching for additional loci.
Mutant mouse models and their contribution to our knowledge of corpus luteum development, function and regression.

PubMed

Henkes, Luiz E; Davis, John S; Rueda, Bo R

2003-11-10

The corpus luteum is a unique organ, which is transitory in nature. The development, maintenance and regression of the corpus luteum are regulated by endocrine, paracrine and autocrine signaling events. Defining the specific mediators of luteal development, maintenance and regression has been difficult and often perplexing due to the complexity that stems from the variety of cell types that make up the luteal tissue. Moreover, some regulators may serve dual functions as a luteotropic and luteolytic agent depending on the temporal and spatial environment in which they are expressed. As a result, some confusion is present in the interpretation of in vitro and in vivo studies. More recently investigators have utilized mutant mouse models to define the functional significance of specific gene products. The goal of this mini-review is to identify and discuss mutant mouse models that have luteal anomalies, which may provide some clues as to the significance of specific regulators of corpus luteum function.
Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression

NASA Astrophysics Data System (ADS)

Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.

2013-02-01

Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. 
The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
Non-ignorable missingness in logistic regression.

PubMed

Wang, Joanna J J; Bartlett, Mark; Ryan, Louise

2017-08-30

Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.
Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

NASA Astrophysics Data System (ADS)

Lusiana, Evellin Dewi

2017-12-01

The parameters of a binary probit regression model are commonly estimated by maximum likelihood estimation (MLE). However, MLE has a limitation when the binary data contain separation. Separation is the condition in which one or more independent variables exactly partition the categories of the binary response. It causes the MLE estimates to fail to converge, so that they cannot be used in modeling. One way to resolve separation is to use Firth's approach instead. This research has two aims: first, to compare the chance of separation occurring in binary probit regression under MLE and under Firth's approach; second, to compare the performance of the binary probit regression estimators obtained by MLE and by Firth's approach using the RMSE criterion. Both are carried out by simulation under different sample sizes. The results showed that the chance of separation under MLE for small sample sizes is higher than under Firth's approach. For larger sample sizes, the probability decreased and was nearly identical for the two methods. Meanwhile, Firth's estimators have smaller RMSE than the MLEs, especially for smaller sample sizes, whereas for larger sample sizes the RMSEs are not much different. This means that Firth's estimators outperformed the MLE estimators.
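To make the notion of separation in the abstract above concrete, here is a minimal Python sketch that checks for complete separation by a single numeric predictor. It is an illustration only, not the simulation procedure used in the study; the function name and the example data are hypothetical, and the check does not cover quasi-complete separation or separation by combinations of predictors.

```python
def has_complete_separation(x, y):
    """Check whether one numeric predictor completely separates a 0/1 response.

    Complete separation here means every x value in one response class lies
    strictly below every x value in the other class.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        # Only one response class observed; this sketch treats that as
        # "no separation to report" rather than raising an error.
        return False
    return max(x0) < min(x1) or max(x1) < min(x0)


# Hypothetical small samples.
separated = has_complete_separation([1, 2, 3, 4], [0, 0, 1, 1])
overlapping = has_complete_separation([1, 3, 2, 4], [0, 0, 1, 1])
print(separated, overlapping)
```

In the first sample any threshold between 2 and 3 classifies the response perfectly, which is the situation in which maximum likelihood estimates diverge.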
Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression.

PubMed

Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun

2016-05-01

DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). © 2016 WILEY PERIODICALS, INC.
Factors Affecting Retention Behavior: A Model To Predict At-Risk Students. AIR 1997 Annual Forum Paper.

ERIC Educational Resources Information Center

Sadler, William E.; Cohen, Frederic L.; Kockesen, Levent

This paper describes a methodology used in an on-going retention study at New York University (NYU) to identify a series of easily measured factors affecting student departure decisions. Three logistic regression models for predicting student retention were developed, each containing data available at three distinct times during the first…
Faculty Salary Equity: Issues in Regression Model Selection. AIR 1992 Annual Forum Paper.

ERIC Educational Resources Information Center

Moore, Nelle

This paper discusses the determination of college faculty salary inequity and identifies the areas in which human judgment must be used in order to conduct a statistical analysis of salary equity. In addition, it provides some informed guidelines for making those judgments. The paper provides a framework for selecting salary equity models, based…
Using Performance Data Gathered at Several Stages of Achievement in Predicting Subsequent Performance.

ERIC Educational Resources Information Center

Owen, Steven V.; Feldhusen, John F.

This study compares the effectiveness of three models of multivariate prediction for academic success in identifying the criterion variance of achievement in nursing education. The first model involves the use of an optimum set of predictors and one equation derived from a regression analysis on first semester grade average in predicting the…
Elbow joint angle and elbow movement velocity estimation using NARX-multiple layer perceptron neural network model with surface EMG time domain parameters.

PubMed

Raj, Retheep; Sivanandan, K S

2017-01-01

Estimation of elbow dynamics has been the object of numerous investigations. In this work a solution is proposed for estimating elbow movement velocity and elbow joint angle from Surface Electromyography (SEMG) signals. Here the Surface Electromyography signals are acquired from the biceps brachii muscle of human hand. Two time-domain parameters, Integrated EMG (IEMG) and Zero Crossing (ZC), are extracted from the Surface Electromyography signal. The relationship of the time-domain parameters IEMG and ZC with elbow angular displacement and elbow angular velocity during extension and flexion of the elbow is studied. A multiple input-multiple output model is derived for identifying the kinematics of the elbow. A Nonlinear Auto Regressive with eXogenous inputs (NARX) structure based multiple layer perceptron neural network (MLPNN) model is proposed for the estimation of elbow joint angle and elbow angular velocity. The proposed NARX MLPNN model is trained using a Levenberg-Marquardt based algorithm. The proposed model estimates the elbow joint angle and elbow movement angular velocity with appreciable accuracy. The model is validated using the regression coefficient value (R). The average regression coefficient value (R) obtained for elbow angular displacement prediction is 0.9641 and for elbow angular velocity prediction is 0.9347. The NARX structure based MLPNN model can be used for the estimation of angular displacement and movement angular velocity of the elbow with good accuracy.
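The two time-domain parameters named in the abstract above, IEMG and ZC, have simple common definitions. The following Python sketch uses those textbook definitions (sum of absolute sample values, and count of sign changes between consecutive samples); the abstract does not state the authors' exact windowing or thresholding, so the `threshold` parameter and the sample window are assumptions for illustration.

```python
def integrated_emg(window):
    """IEMG: sum of the absolute values of the samples in the window."""
    return sum(abs(sample) for sample in window)


def zero_crossings(window, threshold=0.0):
    """ZC: number of sign changes between consecutive samples.

    A crossing is counted only when the amplitude step between the two
    samples exceeds `threshold` (an assumed noise guard, default 0).
    """
    count = 0
    for prev, curr in zip(window, window[1:]):
        if prev * curr < 0 and abs(prev - curr) > threshold:
            count += 1
    return count


# Hypothetical short SEMG window (arbitrary units).
signal = [0.2, -0.1, -0.3, 0.4, 0.1, -0.2]
iemg = integrated_emg(signal)
zc = zero_crossings(signal)
print(iemg, zc)
```

In a pipeline like the one described, such features would be computed per window and passed as inputs to the estimator.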
Multivariate logistic regression analysis of postoperative complications and risk model establishment of gastrectomy for gastric cancer: A single-center cohort report.

PubMed

Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing

2016-01-01

Reporting of surgical complications is common, but few reports provide information about severity or estimate risk factors for complications, and those that do lack specificity. We retrospectively analyzed data on 2795 gastric cancer patients who underwent surgery at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, and established a multivariate logistic regression model to predict risk factors related to postoperative complications according to the Clavien-Dindo classification system. Twenty-four of 86 variables were statistically significant in univariate logistic regression analysis; the 11 significant variables that entered the multivariate analysis were used to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, intraoperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on the logistic regression equation $p = \exp(\sum B_i X_i) / (1 + \exp(\sum B_i X_i))$, a multivariate logistic regression predictive model that calculates the risk of postoperative morbidity was developed: $p = 1 / (1 + e^{4.810 - 1.287X_1 - 0.504X_2 - 0.500X_3 - 0.474X_4 - 0.405X_5 - 0.318X_6 - 0.316X_7 - 0.305X_8 - 0.278X_9 - 0.255X_{10} - 0.138X_{11}})$. The accuracy, sensitivity and specificity of the model in predicting postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model, based on the Clavien-Dindo complication severity grading system and logistic regression analysis, can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool, and may serve as a template for the development of risk models for other surgical groups.
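The risk equation quoted in the abstract above can be evaluated directly. The Python sketch below plugs the eleven published coefficients into the logistic formula. The abstract does not say which factor corresponds to which X index or how each variable is coded, so the inputs here are placeholders and the function name is hypothetical; this is an arithmetic illustration, not a clinical tool.

```python
import math

# Constant and coefficients as printed in the abstract's equation
# p = 1 / (1 + e^(4.810 - 1.287*X1 - ... - 0.138*X11)).
CONSTANT = 4.810
COEFFICIENTS = [1.287, 0.504, 0.500, 0.474, 0.405, 0.318,
                0.316, 0.305, 0.278, 0.255, 0.138]


def complication_risk(x):
    """Predicted probability of postoperative complications.

    x: sequence of 11 covariate values X1..X11. Their order and coding are
    not given in the abstract, so any values used here are assumptions.
    """
    if len(x) != len(COEFFICIENTS):
        raise ValueError("expected 11 covariate values")
    weighted_sum = sum(b * xi for b, xi in zip(COEFFICIENTS, x))
    return 1.0 / (1.0 + math.exp(CONSTANT - weighted_sum))


baseline = complication_risk([0] * 11)   # every covariate set to 0
all_present = complication_risk([1] * 11)  # every covariate set to 1
print(round(baseline, 4), round(all_present, 4))
```

With all covariates at zero the predicted risk is under 1%, and each positive coefficient raises the risk as its covariate increases.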
Practical application of cure mixture model for long-term censored survivor data from a withdrawal clinical trial of patients with major depressive disorder.

PubMed

Arano, Ichiro; Sugimoto, Tomoyuki; Hamasaki, Toshimitsu; Ohno, Yuko

2010-04-23

Survival analysis methods such as the Kaplan-Meier method, log-rank test, and Cox proportional hazards regression (Cox regression) are commonly used to analyze data from randomized withdrawal studies in patients with major depressive disorder. However, unfortunately, such common methods may be inappropriate when a long-term censored relapse-free time appears in data as the methods assume that if complete follow-up were possible for all individuals, each would eventually experience the event of interest. In this paper, to analyse data including such a long-term censored relapse-free time, we discuss a semi-parametric cure regression (Cox cure regression), which combines a logistic formulation for the probability of occurrence of an event with a Cox proportional hazards specification for the time of occurrence of the event. In specifying the treatment's effect on disease-free survival, we consider the fraction of long-term survivors and the risks associated with a relapse of the disease. In addition, we develop a tree-based method for the time to event data to identify groups of patients with differing prognoses (cure survival CART). Although analysis methods typically adapt the log-rank statistic for recursive partitioning procedures, the method applied here used a likelihood ratio (LR) test statistic from a fitting of cure survival regression assuming exponential and Weibull distributions for the latency time of relapse. The method is illustrated using data from a sertraline randomized withdrawal study in patients with major depressive disorder. 
We concluded that Cox cure regression reveals who may be cured and how the treatment and other factors affect the cure incidence and the relapse time of uncured patients, and that cure survival CART output provides easily understandable and interpretable information, useful both in identifying groups of patients with differing prognoses and in utilizing Cox cure regression models leading to meaningful interpretations.
Building a new predictor for multiple linear regression technique-based corrective maintenance turnaround time.

PubMed

Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa

2008-01-01

This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression

PubMed Central

Peng, Limin; Xu, Jinfeng; Kutner, Nancy

2013-01-01

Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515
Model-Based Clustering of Regression Time Series Data via APECM -- An AECM Algorithm Sung to an Even Faster Beat

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Wei-Chen; Maitra, Ranjan

2011-01-01

We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat fast tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in both fewer numbers of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.
Predictive landslide susceptibility mapping using spatial information in the Pechabun area of Thailand

NASA Astrophysics Data System (ADS)

Oh, Hyun-Joo; Lee, Saro; Chotikasathien, Wisut; Kim, Chang Hwan; Kwon, Ju Hyoung

2009-04-01

For predictive landslide susceptibility mapping, this study applied and verified a probability model (the frequency ratio) and a statistical model (logistic regression) at Pechabun, Thailand, using a geographic information system (GIS) and remote sensing. Landslide locations were identified in the study area from interpretation of aerial photographs and field surveys, and maps of the topography, geology and land cover were compiled into a spatial database. The factors that influence landslide occurrence, such as slope gradient, slope aspect, curvature of topography and distance from drainage, were calculated from the topographic database. Lithology and distance from fault were extracted and calculated from the geology database. Land cover was classified from a Landsat TM satellite image. The frequency ratio and logistic regression coefficients were overlaid as each factor's ratings for landslide susceptibility mapping. The landslide susceptibility map was then verified and compared using the existing landslide locations. In the verification, the frequency ratio model showed 76.39% and the logistic regression model 70.42% prediction accuracy. The method can be used to reduce hazards associated with landslides and to plan land cover.
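The frequency ratio mentioned in the abstract above is commonly defined, for each class of a factor, as the share of landslide cells falling in that class divided by the share of the study area in that class. The Python sketch below follows that common definition; the function name, the slope classes and the cell counts are hypothetical and are not data from the Pechabun study.

```python
def frequency_ratios(class_cells, landslide_cells):
    """Frequency ratio per factor class.

    class_cells: mapping of class name -> number of grid cells in the class.
    landslide_cells: mapping of class name -> number of landslide cells in it.
    A ratio above 1 means landslides are over-represented in that class.
    """
    total_cells = sum(class_cells.values())
    total_landslides = sum(landslide_cells.values())
    ratios = {}
    for name, n_cells in class_cells.items():
        landslide_share = landslide_cells.get(name, 0) / total_landslides
        area_share = n_cells / total_cells
        ratios[name] = landslide_share / area_share
    return ratios


# Hypothetical slope-gradient classes (degrees) and cell counts.
slope_cells = {"0-15": 600, "15-30": 300, ">30": 100}
slope_landslides = {"0-15": 10, "15-30": 20, ">30": 20}
ratios = frequency_ratios(slope_cells, slope_landslides)
print(ratios)
```

A susceptibility index for a cell is then typically obtained by summing the ratios of the classes it falls into across all factors, which matches the overlay of factor ratings the abstract describes.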
Mixed models, linear dependency, and identification in age-period-cohort models.

PubMed

O'Brien, Robert M

2017-07-20

This paper examines the identification problem in age-period-cohort models that use either linear or categorically coded ages, periods, and cohorts or combinations of these parameterizations. These models are not identified using the traditional fixed effect regression model approach because of a linear dependency between the ages, periods, and cohorts. However, these models can be identified if the researcher introduces a single just identifying constraint on the model coefficients. The problem with such constraints is that the results can differ substantially depending on the constraint chosen. Somewhat surprisingly, age-period-cohort models that specify one or more of ages and/or periods and/or cohorts as random effects are identified. This is the case without introducing an additional constraint. I label this identification as statistical model identification and show how statistical model identification comes about in mixed models and why which effects are treated as fixed and which are treated as random can substantially change the estimates of the age, period, and cohort effects. Copyright © 2017 John Wiley & Sons, Ltd.
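The linear dependency described in the abstract above (cohort = period - age) can be shown with a few lines of arithmetic. The Python sketch below, using hypothetical coefficients and function names, shows that two different sets of linear age, period and cohort coefficients give identical fitted values, which is why the fixed-effect model is not identified without a constraint.

```python
def linear_predictor(age, period, coefs):
    """Fitted value from linear age, period and cohort terms (no intercept)."""
    b_age, b_period, b_cohort = coefs
    cohort = period - age  # the exact linear dependency among the three
    return b_age * age + b_period * period + b_cohort * cohort


def shift_coefficients(coefs, d):
    """Move an arbitrary amount d between the three linear effects.

    Adding d to the age and cohort slopes and subtracting d from the period
    slope leaves every fitted value unchanged, because
    d*age - d*period + d*(period - age) = 0.
    """
    b_age, b_period, b_cohort = coefs
    return (b_age + d, b_period - d, b_cohort + d)


# Hypothetical coefficients and an arbitrary shift.
original = (2, 3, 5)
alternative = shift_coefficients(original, 7)
cells = [(30, 2000), (45, 2010), (60, 1995)]  # (age, period) combinations
same = all(
    linear_predictor(a, p, original) == linear_predictor(a, p, alternative)
    for a, p in cells
)
print(alternative, same)
```

The data cannot distinguish between `original` and `alternative`, so any single identifying constraint effectively picks one value of `d`, and different constraints can yield very different estimates.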

Construction and analysis of a modular model of caspase activation in apoptosis

PubMed Central

Harrington, Heather A; Ho, Kenneth L; Ghosh, Samik; Tung, KC

2008-01-01

Background A key physiological mechanism employed by multicellular organisms is apoptosis, or programmed cell death. Apoptosis is triggered by the activation of caspases in response to both extracellular (extrinsic) and intracellular (intrinsic) signals. The extrinsic and intrinsic pathways are characterized by the formation of the death-inducing signaling complex (DISC) and the apoptosome, respectively; both the DISC and the apoptosome are oligomers with complex formation dynamics. Additionally, the extrinsic and intrinsic pathways are coupled through the mitochondrial apoptosis-induced channel via the Bcl-2 family of proteins. Results A model of caspase activation is constructed and analyzed. The apoptosis signaling network is simplified through modularization methodologies and equilibrium abstractions for three functional modules. The mathematical model is composed of a system of ordinary differential equations which is numerically solved. Multiple linear regression analysis investigates the role of each module and reduced models are constructed to identify key contributions of the extrinsic and intrinsic pathways in triggering apoptosis for different cell lines. Conclusion Through linear regression techniques, we identified the feedbacks, dissociation of complexes, and negative regulators as the key components in apoptosis. The analysis and reduced models for our model formulation reveal that the chosen cell lines predominately exhibit strong extrinsic caspase, typical of type I cell, behavior. Furthermore, under the simplified model framework, the selected cells lines exhibit different modes by which caspase activation may occur. Finally the proposed modularized model of apoptosis may generalize behavior for additional cells and tissues, specifically identifying and predicting components responsible for the transition from type I to type II cell behavior. PMID:19077196
Epidemiology of Mild Traumatic Brain Injury with Intracranial Hemorrhage: Focusing Predictive Models for Neurosurgical Intervention.

PubMed

Orlando, Alessandro; Levy, A Stewart; Carrick, Matthew M; Tanner, Allen; Mains, Charles W; Bar-Or, David

2017-11-01

To outline differences in neurosurgical intervention (NI) rates between intracranial hemorrhage (ICH) types in mild traumatic brain injuries and help identify which ICH types are most likely to benefit from creation of predictive models for NI. A multicenter retrospective study of adult patients spanning 3 years at 4 U.S. trauma centers was performed. Patients were included if they presented with mild traumatic brain injury (Glasgow Coma Scale score 13-15) with head CT scan positive for ICH. Patients were excluded for skull fractures, "unspecified hemorrhage," or coagulopathy. Primary outcome was NI. Stepwise multivariable logistic regression models were built to analyze the independent association between ICH variables and outcome measures. The study comprised 1876 patients. NI rate was 6.7%. There was a significant difference in rate of NI by ICH type. Subdural hematomas had the highest rate of NI (15.5%) and accounted for 78% of all NIs. Isolated subarachnoid hemorrhages had the lowest, nonzero, NI rate (0.19%). Logistic regression models identified ICH type as the most influential independent variable when examining NI. A model predicting NI for isolated subarachnoid hemorrhages would require 26,928 patients, but a model predicting NI for isolated subdural hematomas would require only 328 patients. This study highlighted disparate NI rates among ICH types in patients with mild traumatic brain injury and identified mild, isolated subdural hematomas as most appropriate for construction of predictive NI models. Increased health care efficiency will be driven by accurate understanding of risk, which can come only from accurate predictive models. Copyright © 2017 Elsevier Inc. All rights reserved.
The cost of unintended pregnancies for employer-sponsored health insurance plans.

PubMed

Dieguez, Gabriela; Pyenson, Bruce S; Law, Amy W; Lynen, Richard; Trussell, James

2015-04-01

Pregnancy is associated with a significant cost for employers providing health insurance benefits to their employees. The latest study on the topic was published in 2002, estimating the unintended pregnancy rate for women covered by employer-sponsored insurance benefits to be approximately 29%. The primary objective of this study was to update the cost of unintended pregnancy to employer-sponsored health insurance plans with current data. The secondary objective was to develop a regression model to identify the factors and associated magnitude that contribute to unintended pregnancies in the employee benefits population. We developed stepwise multinomial logistic regression models using data from a national survey on maternal attitudes about pregnancy before and shortly after giving birth. The survey was conducted by the Centers for Disease Control and Prevention through mail and via telephone interviews between 2009 and 2011 of women who had had a live birth. The regression models were then applied to a large commercial health claims database from the Truven Health MarketScan to retrospectively assign the probability of pregnancy intention to each delivery. Based on the MarketScan database, we estimate that among employer-sponsored health insurance plans, 28.8% of pregnancies are unintended, which is consistent with national findings of 29% in a survey by the Centers for Disease Control and Prevention. These unintended pregnancies account for 27.4% of the annual delivery costs to employers in the United States, or approximately 1% of the typical employer's health benefits spending for 1 year. Using these findings, we present a regression model that employers could apply to their claims data to identify the risk for unintended pregnancies in their health insurance population. 
The availability of coverage for contraception without employee cost-sharing, as was required by the Affordable Care Act in 2012, combined with the ability to identify women who are at high risk for an unintended pregnancy, can help employers address the costs of unintended pregnancies in their employee benefits population. This can also help to bring contraception efforts into the mainstream of other preventive and wellness programs, such as smoking cessation, obesity management, and diabetes control programs.
The Cost of Unintended Pregnancies for Employer-Sponsored Health Insurance Plans

PubMed Central

Dieguez, Gabriela; Pyenson, Bruce S.; Law, Amy W.; Lynen, Richard; Trussell, James

2015-01-01

Background Pregnancy is associated with a significant cost for employers providing health insurance benefits to their employees. The latest study on the topic was published in 2002, estimating the unintended pregnancy rate for women covered by employer-sponsored insurance benefits to be approximately 29%. Objectives The primary objective of this study was to update the cost of unintended pregnancy to employer-sponsored health insurance plans with current data. The secondary objective was to develop a regression model to identify the factors and associated magnitude that contribute to unintended pregnancies in the employee benefits population. Methods We developed stepwise multinomial logistic regression models using data from a national survey on maternal attitudes about pregnancy before and shortly after giving birth. The survey was conducted by the Centers for Disease Control and Prevention through mail and via telephone interviews between 2009 and 2011 of women who had had a live birth. The regression models were then applied to a large commercial health claims database from the Truven Health MarketScan to retrospectively assign the probability of pregnancy intention to each delivery. Results Based on the MarketScan database, we estimate that among employer-sponsored health insurance plans, 28.8% of pregnancies are unintended, which is consistent with national findings of 29% in a survey by the Centers for Disease Control and Prevention. These unintended pregnancies account for 27.4% of the annual delivery costs to employers in the United States, or approximately 1% of the typical employer's health benefits spending for 1 year. Using these findings, we present a regression model that employers could apply to their claims data to identify the risk for unintended pregnancies in their health insurance population. 
Conclusion The availability of coverage for contraception without employee cost-sharing, as was required by the Affordable Care Act in 2012, combined with the ability to identify women who are at high risk for an unintended pregnancy, can help employers address the costs of unintended pregnancies in their employee benefits population. This can also help to bring contraception efforts into the mainstream of other preventive and wellness programs, such as smoking cessation, obesity management, and diabetes control programs. PMID:26005515
Investigating bias in squared regression structure coefficients

PubMed Central

Nimon, Kim F.; Zientek, Linda R.; Thompson, Bruce

2015-01-01

The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficients. Using data from a Monte Carlo simulation, this study found that squared regression structure coefficients corrected with Pratt's formula produced less biased estimates and might be more accurate and stable estimates of population squared regression structure coefficients than estimates with no such corrections. While our findings are in line with prior literature that identified multicollinearity as a predictor of bias in squared regression structure coefficients but not coefficients of determination, the findings from this study are unique in that the level of predictive power, number of predictors, and sample size were also observed to contribute bias in squared regression structure coefficients. PMID:26217273
A computer tool for a minimax criterion in binary response and heteroscedastic simple linear regression models.

PubMed

Casero-Alonso, V; López-Fidalgo, J; Torsney, B

2017-01-01

Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Computerized pigment design based on property hypersurfaces

NASA Astrophysics Data System (ADS)

Halova, Jaroslava; Sulcova, Petra; Kupka, Karel

2007-05-01

Competition is tough in the pigment market. Rational pigment design therefore offers a competitive advantage, saving time and money. The aim of this work is to provide methods that can assist in designing pigments with defined properties. These methods include partial least squares regression (PLSR), neural networks (NN) and a generalized regression ANOVA model. The authors show how a PLS bi-plot can be used to identify market gaps poorly covered by pigment manufacturers, thus giving an opportunity to develop pigments with potentially profitable properties.
Can shoulder dystocia be reliably predicted?

PubMed

Dodd, Jodie M; Catcheside, Britt; Scheil, Wendy

2012-06-01

To evaluate factors reported to increase the risk of shoulder dystocia, and to evaluate their predictive value at a population level. The South Australian Pregnancy Outcome Unit's population database from 2005 to 2010 was accessed to determine the occurrence of shoulder dystocia in addition to reported risk factors, including age, parity, self-reported ethnicity, presence of diabetes and infant birth weight. Odds ratios (and 95% confidence intervals) of shoulder dystocia were calculated for each risk factor, and the risk factors were then incorporated into a logistic regression model. Test characteristics for each variable in predicting shoulder dystocia were calculated. As a proportion of all births, the reported rate of shoulder dystocia increased significantly from 0.95% in 2005 to 1.38% in 2010 (P = 0.0002). Using a logistic regression model, induction of labour and infant birth weight greater than both 4000 and 4500 g were identified as significant independent predictors of shoulder dystocia. The risk factors, alone and when incorporated into the logistic regression model, were poorly predictive of the occurrence of shoulder dystocia. While there are a number of factors associated with an increased risk of shoulder dystocia, none are of sufficient sensitivity or positive predictive value to allow their use clinically to reliably and accurately identify the occurrence of shoulder dystocia. © 2012 The Authors ANZJOG © 2012 The Royal Australian and New Zealand College of Obstetricians and Gynaecologists.
Dynamic spatiotemporal analysis of indigenous dengue fever at street-level in Guangzhou city, China

PubMed Central

Xia, Yao; Zhang, Yingtao; Huang, Xiaodong; Huang, Jiawei; Nie, Enqiong; Jing, Qinlong; Wang, Guoling; Yang, Zhicong; Hu, Wenbiao

2018-01-01

Background This study aimed to investigate the spatiotemporal clustering and socio-environmental factors associated with dengue fever (DF) incidence rates at street level in Guangzhou city, China. Methods A spatiotemporal scan technique was applied to identify the high risk region of DF. A multiple regression model was used to identify the socio-environmental factors associated with DF infection. A Poisson regression model was employed to examine the spatiotemporal patterns in the spread of DF. Results Spatial clusters of DF were primarily concentrated in the southwest part of Guangzhou city. Age group (65+ years) (Odds Ratio (OR) = 1.49, 95% Confidence Interval (CI) = 1.13 to 2.03), floating population (OR = 1.09, 95% CI = 1.05 to 1.15), low-education (OR = 1.08, 95% CI = 1.01 to 1.16) and non-agriculture (OR = 1.07, 95% CI = 1.03 to 1.11) were associated with DF transmission. Poisson regression results indicated that changes in DF incidence rates were significantly associated with longitude (β = -5.08, P<0.01) and latitude (β = -1.99, P<0.01). Conclusions The study demonstrated that socio-environmental factors may play an important role in DF transmission in Guangzhou. As the geographic range of notified DF has significantly expanded over recent years, an early warning system based on a spatiotemporal model with socio-environmental factors is urgently needed to improve the effectiveness and efficiency of dengue control and prevention. PMID:29561835
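The odds ratios and 95% confidence intervals quoted above are the usual way of reporting coefficients from a logistic-type regression. As a minimal sketch, assuming a Wald interval on the log-odds scale (the abstract does not state how its intervals were computed), a coefficient and its standard error can be converted as follows. The function name and the numeric inputs are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error into an
    odds ratio with a Wald confidence interval (z = 1.96 for 95%)."""
    point = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return point, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
or_, lo, hi = odds_ratio_with_ci(beta=0.40, se=0.15)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

An interval that excludes 1 corresponds to a coefficient that differs from zero at the matching significance level.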
Dynamic spatiotemporal analysis of indigenous dengue fever at street-level in Guangzhou city, China.

PubMed

Liu, Kangkang; Zhu, Yanshan; Xia, Yao; Zhang, Yingtao; Huang, Xiaodong; Huang, Jiawei; Nie, Enqiong; Jing, Qinlong; Wang, Guoling; Yang, Zhicong; Hu, Wenbiao; Lu, Jiahai

2018-03-01

This study aimed to investigate the spatiotemporal clustering and socio-environmental factors associated with dengue fever (DF) incidence rates at street level in Guangzhou city, China. A spatiotemporal scan technique was applied to identify the high risk region of DF. A multiple regression model was used to identify the socio-environmental factors associated with DF infection. A Poisson regression model was employed to examine the spatiotemporal patterns in the spread of DF. Spatial clusters of DF were primarily concentrated in the southwest part of Guangzhou city. Age group (65+ years) (Odds Ratio (OR) = 1.49, 95% Confidence Interval (CI) = 1.13 to 2.03), floating population (OR = 1.09, 95% CI = 1.05 to 1.15), low-education (OR = 1.08, 95% CI = 1.01 to 1.16) and non-agriculture (OR = 1.07, 95% CI = 1.03 to 1.11) were associated with DF transmission. Poisson regression results indicated that changes in DF incidence rates were significantly associated with longitude (β = -5.08, P<0.01) and latitude (β = -1.99, P<0.01). The study demonstrated that socio-environmental factors may play an important role in DF transmission in Guangzhou. As the geographic range of notified DF has significantly expanded over recent years, an early warning system based on a spatiotemporal model with socio-environmental factors is urgently needed to improve the effectiveness and efficiency of dengue control and prevention.
Improving power and robustness for detecting genetic association with extreme-value sampling design.

PubMed

Chen, Hua Yun; Li, Mingyao

2011-12-01

An extreme-value sampling design, which samples subjects with extremely large or small quantitative trait values, is commonly used in genetic association studies. Samples in such designs are often treated as "cases" and "controls" and analyzed using logistic regression. Such a case-control analysis ignores the potential dose-response relationship between the quantitative trait and the underlying trait locus and thus may lead to loss of power in detecting genetic association. An alternative approach to analyzing such data is to model the dose-response relationship by a linear regression model. However, parameter estimation from this model can be biased, which may lead to inflated type I errors. We propose a robust and efficient approach that takes into consideration both the biased sampling design and the potential dose-response relationship. Extensive simulations demonstrate that the proposed method is more powerful than the traditional logistic regression analysis and is more robust than the linear regression analysis. We applied our method to the analysis of a candidate gene association study on high-density lipoprotein cholesterol (HDL-C) which includes study subjects with extremely high or low HDL-C levels. Using our method, we identified several SNPs showing stronger evidence of association with HDL-C than the traditional case-control logistic regression analysis. Our results suggest that it is important to appropriately model the quantitative traits and to adjust for the biased sampling when a dose-response relationship exists in extreme-value sampling designs. © 2011 Wiley Periodicals, Inc.
A comparative analysis of predictive models of morbidity in intensive care unit after cardiac surgery - part II: an illustrative example.

PubMed

Cevenini, Gabriele; Barbini, Emanuela; Scolletta, Sabino; Biagioli, Bonizella; Giomarelli, Pierpaolo; Barbini, Paolo

2007-11-22

Popular predictive models for estimating morbidity probability after heart surgery are compared critically in a unitary framework. The study is divided into two parts. In the first part modelling techniques and intrinsic strengths and weaknesses of different approaches were discussed from a theoretical point of view. In this second part the performances of the same models are evaluated in an illustrative example. Eight models were developed: Bayes linear and quadratic models, k-nearest neighbour model, logistic regression model, Higgins and direct scoring systems and two feed-forward artificial neural networks with one and two layers. Cardiovascular, respiratory, neurological, renal, infectious and hemorrhagic complications were defined as morbidity. Training and testing sets each of 545 cases were used. The optimal set of predictors was chosen among a collection of 78 preoperative, intraoperative and postoperative variables by a stepwise procedure. Discrimination and calibration were evaluated by the area under the receiver operating characteristic curve and Hosmer-Lemeshow goodness-of-fit test, respectively. Scoring systems and the logistic regression model required the largest set of predictors, while Bayesian and k-nearest neighbour models were much more parsimonious. In testing data, all models showed acceptable discrimination capacities, however the Bayes quadratic model, using only three predictors, provided the best performance. All models showed satisfactory generalization ability: again the Bayes quadratic model exhibited the best generalization, while artificial neural networks and scoring systems gave the worst results. Finally, poor calibration was obtained when using scoring systems, k-nearest neighbour model and artificial neural networks, while Bayes (after recalibration) and logistic regression models gave adequate results. 
Although all the predictive models showed acceptable discrimination performance in the example considered, the Bayes and logistic regression models seemed better than the others, because they also had good generalization and calibration. The Bayes quadratic model seemed to be a convincing alternative to the much more usual Bayes linear and logistic regression models. It showed its capacity to identify a minimum core of predictors generally recognized as essential to pragmatically evaluate the risk of developing morbidity after heart surgery.
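Discrimination in the record above is measured by the area under the receiver operating characteristic curve. A minimal sketch of that quantity, using its rank interpretation (the probability that a randomly chosen case with the outcome scores higher than a randomly chosen case without it), is shown here. The function name and the sample scores are hypothetical, and the pairwise loop is chosen for clarity over speed.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case has the
    higher score; tied pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observed outcomes.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(auc(scores, labels), 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation; calibration (the Hosmer-Lemeshow test in the abstract) is a separate property that this sketch does not assess.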
ITS impacts on safety and traffic management : an investigation of secondary crash causes

DOT National Transportation Integrated Search

1999-01-01

In this paper, the authors focus on identifying potential savings from lowering the likelihood of secondary crash occurrences in incidents. Logistic regression models are developed to examine which primary crash characteristics are likely to influenc...
A regression approach to the mapping of bio-physical characteristics of surface sediment using in situ and airborne hyperspectral acquisitions

NASA Astrophysics Data System (ADS)

Ibrahim, Elsy; Kim, Wonkook; Crawford, Melba; Monbaliu, Jaak

2017-02-01

Remote sensing has been successfully utilized to distinguish and quantify sediment properties in the intertidal environment. Classification approaches of imagery are popular and powerful yet can lead to site- and case-specific results. Such specificity creates challenges for temporal studies. Thus, this paper investigates the use of regression models to quantify sediment properties instead of classifying them. Two regression approaches, namely multiple regression (MR) and support vector regression (SVR), are used in this study for the retrieval of bio-physical variables of intertidal surface sediment of the IJzermonding, a Belgian nature reserve. In the regression analysis, mud content, chlorophyll a concentration, organic matter content, and soil moisture are estimated using radiometric variables of two airborne sensors, namely airborne hyperspectral sensor (AHS) and airborne prism experiment (APEX), and using field hyperspectral acquisitions by analytical spectral device (ASD). The performance of the two regression approaches is best for the estimation of moisture content. SVR attains the highest accuracy without feature reduction while MR achieves good results when feature reduction is carried out. Sediment property maps are successfully obtained using the models and hyperspectral imagery where SVR used with all bands achieves the best performance. The study also involves the extraction of weights identifying the contribution of each band of the images in the quantification of each sediment property when MR and principal component analysis are used.
Artificial neural networks predict the incidence of portosplenomesenteric venous thrombosis in patients with acute pancreatitis.

PubMed

Fei, Y; Hu, J; Li, W-Q; Wang, W; Zong, G-Q

2017-03-01

Essentials Predicting the occurrence of portosplenomesenteric vein thrombosis (PSMVT) is difficult. We studied 72 patients with acute pancreatitis. Artificial neural networks modeling was more accurate than logistic regression in predicting PSMVT. Additional predictive factors may be incorporated into artificial neural networks. Objective To construct and validate artificial neural networks (ANNs) for predicting the occurrence of portosplenomesenteric venous thrombosis (PSMVT) and compare the predictive ability of the ANNs with that of logistic regression. Methods The ANNs and logistic regression modeling were constructed using simple clinical and laboratory data of 72 acute pancreatitis (AP) patients. The ANNs and logistic modeling were first trained on 48 randomly chosen patients and validated on the remaining 24 patients. The accuracy and the performance characteristics were compared between these two approaches by SPSS 17.0 software. Results The training set and validation set did not differ on any of the 11 variables. After training, the back propagation network training error converged to 1 × 10⁻²⁰, and it retained excellent pattern recognition ability. When the ANNs model was applied to the validation set, it revealed a sensitivity of 80%, specificity of 85.7%, a positive predictive value of 77.6% and negative predictive value of 90.7%. The accuracy was 83.3%. Differences could be found between ANNs modeling and logistic regression modeling in these parameters (10.0% [95% CI, -14.3 to 34.3%], 14.3% [95% CI, -8.6 to 37.2%], 15.7% [95% CI, -9.9 to 41.3%], 11.8% [95% CI, -8.2 to 31.8%], 22.6% [95% CI, -1.9 to 47.1%], respectively). When ANNs modeling was used to identify PSMVT, the area under receiver operating characteristic curve was 0.849 (95% CI, 0.807-0.901), which demonstrated better overall properties than logistic regression modeling (AUC = 0.716) (95% CI, 0.679-0.761).
Conclusions ANNs modeling was a more accurate tool than logistic regression in predicting the occurrence of PSMVT following AP. More clinical factors or biomarkers may be incorporated into ANNs modeling to improve its predictive ability. © 2016 International Society on Thrombosis and Haemostasis.
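The performance characteristics reported above (sensitivity, specificity, positive and negative predictive value, accuracy) all derive from a 2 × 2 table of predicted against observed outcomes. The sketch below shows the standard definitions. The function name is hypothetical and the counts are invented for illustration; they are not the study's validation data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard test characteristics from a 2 x 2 confusion table.

    tp: predicted positive, truly positive
    fp: predicted positive, truly negative
    fn: predicted negative, truly positive
    tn: predicted negative, truly negative
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for a validation set of 24 patients.
for name, value in diagnostic_metrics(tp=8, fp=2, fn=2, tn=12).items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are properties of the classifier, whereas the predictive values also depend on how common the outcome is in the sample being scored.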
Evaluating the perennial stream using logistic regression in central Taiwan

NASA Astrophysics Data System (ADS)

Ruljigaljig, T.; Cheng, Y. S.; Lin, H. I.; Lee, C. H.; Yu, T. T.

2014-12-01

This study produces a perennial stream head potential map, based on a logistic regression method with a Geographic Information System (GIS). Perennial stream initiation locations, which indicate where groundwater and the surface come into contact, were identified in the study area by field survey. The perennial stream potential map in central Taiwan was constructed using the relationship between perennial streams and their causative factors, such as catchment area, slope gradient, aspect, elevation, groundwater recharge and precipitation. Field surveys of 272 streams were carried out in the study area. The area under the curve for the logistic regression method was calculated as 0.87. The results illustrate the importance of catchment area and groundwater recharge as key factors within the model. The results obtained from the model within the GIS were then used to produce a map of perennial streams and estimate the location of perennial stream heads.
Consistent model identification of varying coefficient quantile regression with BIC tuning parameter selection

PubMed Central

Zheng, Qi; Peng, Limin

2016-01-01

Quantile regression provides a flexible platform for evaluating covariate effects on different segments of the conditional distribution of response. As the effects of covariates may change with quantile level, contemporaneously examining a spectrum of quantiles is expected to have a better capacity to identify variables with either partial or full effects on the response distribution, as compared to focusing on a single quantile. Under this motivation, we study a general adaptively weighted LASSO penalization strategy in the quantile regression setting, where a continuum of quantile index is considered and coefficients are allowed to vary with quantile index. We establish the oracle properties of the resulting estimator of coefficient function. Furthermore, we formally investigate a BIC-type uniform tuning parameter selector and show that it can ensure consistent model selection. Our numerical studies confirm the theoretical findings and illustrate an application of the new variable selection procedure. PMID:28008212
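Quantile regression, the setting of the record above, fits coefficients by minimising the check (pinball) loss rather than squared error. The sketch below illustrates only that underlying loss with a constant predictor; it does not implement the adaptively weighted LASSO penalty or the BIC-type tuning described in the abstract. All function names and the sample data are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile check loss: tau * r when r >= 0, (tau - 1) * r when r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def mean_check_loss(y, prediction, tau):
    """Average check loss of a constant prediction over the sample y."""
    return sum(check_loss(v - prediction, tau) for v in y) / len(y)


def best_constant(y, tau):
    """Observed value that minimises the average check loss.
    This is an empirical tau-quantile of y."""
    return min(sorted(set(y)), key=lambda c: mean_check_loss(y, c, tau))


# Hypothetical skewed sample: the fitted constant moves with tau.
y = [1, 2, 3, 4, 100]
print(best_constant(y, 0.5), best_constant(y, 0.9))
```

Because the minimiser changes with tau, covariate effects estimated this way can differ across quantile levels, which is the variation the abstract's procedure is designed to detect.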
Prediction of Patient-Controlled Analgesic Consumption: A Multimodel Regression Tree Approach.

PubMed

Hu, Yuh-Jyh; Ku, Tien-Hsiung; Yang, Yu-Hung; Shen, Jia-Ying

2018-01-01

Several factors contribute to individual variability in postoperative pain; therefore, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only the correlation and have not subjected the statistical model to further tests in order to evaluate its predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for analgesic consumption prediction. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, a comparison between the predictions by anesthesiologists and medical specialists and those of the computational approach for an independent test data set of 60 patients further evidenced the superiority of the computational approach in predicting analgesic consumption because it produced markedly lower root mean squared errors.
Forecasting urban water demand: A meta-regression analysis.

PubMed

Sebri, Maamar

2016-12-01

Water managers and planners require accurate water demand forecasts over the short-, medium- and long-term for many purposes. These range from assessing water supply needs over spatial and temporal patterns to optimizing future investments and planning future allocations across competing sectors. This study surveys the empirical literature on the urban water demand forecasting using the meta-analytical approach. Specifically, using more than 600 estimates, a meta-regression analysis is conducted to identify explanations of cross-studies variation in accuracy of urban water demand forecasting. Our study finds that accuracy depends significantly on study characteristics, including demand periodicity, modeling method, forecasting horizon, model specification and sample size. The meta-regression results remain robust to different estimators employed as well as to a series of sensitivity checks performed. The importance of these findings lies in the conclusions and implications drawn out for regulators and policymakers and for academics alike. Copyright © 2016. Published by Elsevier Ltd.
Graphical Tools for Linear Structural Equation Modeling

DTIC Science & Technology

2014-06-01

others. Kenny and Milan (2011) write, “Identification is perhaps the most difficult concept for SEM researchers to understand. We have seen SEM...model to using typical SEM software to determine model identifiability. Kenny and Milan (2011) list the following drawbacks: (i) If poor starting...the well known recursive and null rules (Bollen, 1989) and the regression rule (Kenny and Milan, 2011). A Simple Criterion for Identifying Individual

Predictors of the number of under-five malnourished children in Bangladesh: application of the generalized Poisson regression model

PubMed Central

2013-01-01

Background Malnutrition is one of the principal causes of child mortality in developing countries including Bangladesh. To our knowledge, most of the available studies that addressed the issue of malnutrition among under-five children considered categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study the malnutrition variable (i.e. the outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative to other statistical methods and (ii) to find some predictors of this outcome variable. Methods The data are extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample which is based on a two-stage stratified sample of households. A total of 4,460 under-five children are analysed using various statistical techniques, namely the Chi-square test and the GPR model. Results The GPR model (as compared to the standard Poisson regression and negative binomial regression) is found to be justified for studying the above-mentioned outcome variable because of its under-dispersion (variance < mean) property. Our study also identifies several significant predictors of the outcome variable, namely mother’s education, father’s education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Conclusions The consistency of our findings with many other studies suggests that the GPR model is an ideal alternative to other statistical models for analysing the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh. PMID:23297699
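The abstract above justifies the generalized Poisson model by under-dispersion, meaning the variance of the count outcome is smaller than its mean, whereas a standard Poisson model assumes the two are equal. A quick way to check this on raw counts is the ratio of sample variance to sample mean, sketched below. The function name and the counts are hypothetical and are not drawn from the BDHS data; a formal analysis would assess dispersion conditional on the covariates.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Ratio of sample variance to sample mean for count data.
    About 1 is consistent with Poisson, below 1 suggests
    under-dispersion, above 1 suggests over-dispersion."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return variance(counts) / m


# Hypothetical number of malnourished under-five children per family.
counts = [0, 1, 1, 1, 2, 1, 0, 1, 1, 2]
print(round(dispersion_index(counts), 3))
```

A ratio clearly below 1, as in this invented sample, is the situation in which the negative binomial model (which handles only over-dispersion) is not suitable.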
FPGA implementation of predictive degradation model for engine oil lifetime

NASA Astrophysics Data System (ADS)

Idros, M. F. M.; Razak, A. H. A.; Junid, S. A. M. Al; Suliman, S. I.; Halim, A. K.

2018-03-01

This paper presents the implementation of a linear regression model for degradation prediction in Register Transfer Logic (RTL) using Quartus II. A stationary model was identified in the degradation trend of the engine oil in a vehicle using a time series method. For the RTL implementation, the degradation model is written in Verilog HDL and the input data are taken at a certain time. A clock divider was designed to support the timing sequence of the input data. For every five data points, a regression analysis is applied to determine the slope variation and calculate the prediction. Only negative values are taken into consideration for prediction, in order to reduce the number of logic gates. The least squares method is used to obtain the best linear model based on the mean values of the time series data. The coded algorithm has been implemented on an FPGA for validation purposes. The result shows the predicted time to change the engine oil.
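The procedure described above (fit a least squares line to each window of five readings, use the slope to project the degradation trend) can be sketched in software as follows. This is a Python illustration of the arithmetic only, not the authors' Verilog design. The function names, the threshold value and the readings are hypothetical, and returning no prediction for non-negative slopes is one assumed reading of the abstract's remark that only negative values are considered.

```python
def least_squares_line(t, y):
    """Ordinary least squares fit of y = slope * t + intercept."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("all time values are identical")
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def time_to_threshold(t, y, threshold):
    """Time at which the fitted line reaches the threshold, or None
    when the fitted slope is not negative (no degradation trend)."""
    slope, intercept = least_squares_line(t, y)
    if slope >= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical window of five oil-quality readings and end-of-life level.
times = [0, 1, 2, 3, 4]
quality = [100, 98, 96, 94, 92]
print(time_to_threshold(times, quality, threshold=60))
```

In hardware, the same sums would typically be accumulated in fixed-point registers as each sample arrives, which is why a short fixed window keeps the logic small.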
Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites.

PubMed

Betel, Doron; Koppal, Anjali; Agius, Phaedra; Sander, Chris; Leslie, Christina

2010-01-01

mirSVR is a new machine learning method for ranking microRNA target sites by a down-regulation score. The algorithm trains a regression model on sequence and contextual features extracted from miRanda-predicted target sites. In a large-scale evaluation, miRanda-mirSVR is competitive with other target prediction methods in identifying target genes and predicting the extent of their downregulation at the mRNA or protein levels. Importantly, the method identifies a significant number of experimentally determined non-canonical and non-conserved sites.
A support vector regression-firefly algorithm-based model for limiting velocity prediction in sewer pipes.

PubMed

Ebtehaj, Isa; Bonakdari, Hossein

2016-01-01

Sediment transport without deposition is an essential consideration in the optimum design of sewer pipes. In this study, a novel method based on a combination of support vector regression (SVR) and the firefly algorithm (FFA) is proposed to predict the minimum velocity required to avoid sediment settling in pipe channels, which is expressed as the densimetric Froude number (Fr). The efficiency of support vector machine (SVM) models depends on the suitable selection of SVM parameters. In this study, FFA is used to determine these SVM parameters. The parameters that affect the Fr calculation are identified by dimensional analysis. The different dimensionless variables along with the models are introduced. The best performance is attributed to the model that employs the sediment volumetric concentration (C(V)), ratio of relative median diameter of particles to hydraulic radius (d/R), dimensionless particle number (D(gr)) and overall sediment friction factor (λ(s)) parameters to estimate Fr. The performance of the SVR-FFA model is compared with genetic programming, artificial neural network and existing regression-based equations. The results indicate the superior performance of SVR-FFA (mean absolute percentage error = 2.123%; root mean square error = 0.116) compared with other methods.
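The two error measures reported above have standard definitions, sketched below. The function names and the sample values are hypothetical, and the abstract does not state exactly how the authors computed their figures.

```python
from math import sqrt


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def rmse(observed, predicted):
    """Root mean square error, in the units of the observed variable."""
    total = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(total / len(observed))


# Hypothetical observed and predicted densimetric Froude numbers.
fr_observed = [2.0, 4.0]
fr_predicted = [1.0, 5.0]
print(mape(fr_observed, fr_predicted), rmse(fr_observed, fr_predicted))
```

MAPE is scale-free, while RMSE is expressed in the units of Fr, so the two measures are not directly comparable with each other.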
Dynamic regression modeling of daily nitrate-nitrogen concentrations in a large agricultural watershed.

PubMed

Feng, Zhujing; Schilling, Keith E; Chan, Kung-Sik

2013-06-01

Nitrate-nitrogen concentrations in rivers represent challenges for water supplies that use surface water sources. Nitrate concentrations are often modeled using time-series approaches, but previous efforts have typically relied on monthly time steps. In this study, we developed a dynamic regression model of daily nitrate concentrations in the Raccoon River, Iowa, that incorporated contemporaneous and lagged values of precipitation and discharge occurring at several locations around the basin. Results suggested that 95% of the variation in daily nitrate concentrations measured at the outlet of a large agricultural watershed can be explained by time-series patterns of precipitation and discharge occurring in the basin. Discharge was found to be a more important regression variable than precipitation in our model, but both regression parameters were strongly correlated with nitrate concentrations. The time-series model was consistent with known patterns of nitrate behavior in the watershed, successfully identifying contemporaneous dilution mechanisms from higher relief and urban areas of the basin while incorporating the delayed contribution of nitrate from tile-drained regions in a lagged response. The first difference of the model errors was modeled as an AR(16) process and suggests that daily nitrate concentration changes remain temporally correlated for more than 2 weeks, although temporal correlation was stronger in the first few days before tapering off. Consequently, daily nitrate concentrations are non-stationary, i.e. of strong memory. Using time-series models to reliably forecast daily nitrate concentrations in a river based on patterns of precipitation and discharge occurring in its basin may be of great interest to water suppliers.
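Two data-preparation steps implied above are pairing each day's response with contemporaneous and lagged predictor values, and first-differencing a series. The sketch below illustrates both under simplifying assumptions (one predictor and a fixed maximum lag). The function names are hypothetical, and no model is fitted here.

```python
def lagged_rows(response, predictor, max_lag):
    """Pair response[t] with predictor[t], predictor[t-1], ..., predictor[t-max_lag]."""
    rows = []
    for t in range(max_lag, len(response)):
        lags = [predictor[t - k] for k in range(max_lag + 1)]
        rows.append((response[t], lags))
    return rows


def first_difference(series):
    """Return the day-to-day changes series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical daily nitrate concentrations and discharge values.
nitrate = [1, 2, 3, 4]
discharge = [10, 20, 30, 40]
print(lagged_rows(nitrate, discharge, max_lag=2))
print(first_difference(nitrate))
```

The first `max_lag` days are dropped because their lagged predictor values are not available.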
Novel Analog For Muscle Deconditioning

NASA Technical Reports Server (NTRS)

Ploutz-Snyder, Lori; Ryder, Jeff; Buxton, Roxanne; Redd, Elizabeth; Scott-Pandorf, Melissa; Hackney, Kyle; Fiedler, James; Bloomberg, Jacob

2010-01-01

Existing models of muscle deconditioning are cumbersome and expensive (e.g., bedrest). We propose a new model utilizing a weighted suit to manipulate strength, power or endurance (function) relative to body weight (BW). Methods: 20 subjects performed 7 occupational astronaut tasks while wearing a suit weighted with 0-120% of BW. Models of the full relationship between muscle function/BW and task completion time were developed using fractional polynomial regression and verified by the addition of pre- and post-flight astronaut performance data using the same tasks. Spline regression was used to identify muscle function thresholds below which task performance was impaired. Results: Thresholds of performance decline were identified for each task. Seated egress & walk (most difficult task) showed thresholds of: leg press (LP) isometric peak force/BW of 18 N/kg, LP power/BW of 18 W/kg, LP work/BW of 79 J/kg, knee extension (KE) isokinetic/BW of 6 Nm/kg and KE torque/BW of 1.9 Nm/kg. Conclusions: Laboratory manipulation of strength/BW has promise as an appropriate analog for spaceflight-induced loss of muscle function for predicting occupational task performance and establishing operationally relevant exercise targets.
The importance of regional models in assessing canine cancer incidences in Switzerland

PubMed Central

Leyk, Stefan; Brunsdon, Christopher; Graf, Ramona; Pospischil, Andreas; Fabrikant, Sara Irina

2018-01-01

Fitting canine cancer incidences through a conventional regression model assumes constant statistical relationships across the study area in estimating the model coefficients. However, it is often more realistic to consider that these relationships may vary over space. Such a condition, known as spatial non-stationarity, implies that the model coefficients need to be estimated locally. In these kinds of local models, the geographic scale, or spatial extent, employed for coefficient estimation may also have a pervasive influence. This is because important variations in the local model coefficients across geographic scales may impact the understanding of local relationships. In this study, we fitted canine cancer incidences across Swiss municipal units through multiple regional models. We computed diagnostic summaries across the different regional models, and contrasted them with the diagnostics of the conventional regression model, using value-by-alpha maps and scalograms. The results of this comparative assessment enabled us to identify variations in the goodness-of-fit and coefficient estimates. We detected spatially non-stationary relationships, in particular, for the variables related to biological risk factors. These variations in the model coefficients were more important at small geographic scales, making a case for the need to model canine cancer incidences locally in contrast to more conventional global approaches. However, we contend that prior to undertaking local modeling efforts, a deeper understanding of the effects of geographic scale is needed to better characterize and identify local model relationships. PMID:29652921
The importance of regional models in assessing canine cancer incidences in Switzerland.

PubMed

Boo, Gianluca; Leyk, Stefan; Brunsdon, Christopher; Graf, Ramona; Pospischil, Andreas; Fabrikant, Sara Irina

2018-01-01

Fitting canine cancer incidences through a conventional regression model assumes constant statistical relationships across the study area in estimating the model coefficients. However, it is often more realistic to consider that these relationships may vary over space. Such a condition, known as spatial non-stationarity, implies that the model coefficients need to be estimated locally. In these kinds of local models, the geographic scale, or spatial extent, employed for coefficient estimation may also have a pervasive influence. This is because important variations in the local model coefficients across geographic scales may impact the understanding of local relationships. In this study, we fitted canine cancer incidences across Swiss municipal units through multiple regional models. We computed diagnostic summaries across the different regional models, and contrasted them with the diagnostics of the conventional regression model, using value-by-alpha maps and scalograms. The results of this comparative assessment enabled us to identify variations in the goodness-of-fit and coefficient estimates. We detected spatially non-stationary relationships, in particular, for the variables related to biological risk factors. These variations in the model coefficients were more important at small geographic scales, making a case for the need to model canine cancer incidences locally in contrast to more conventional global approaches. However, we contend that prior to undertaking local modeling efforts, a deeper understanding of the effects of geographic scale is needed to better characterize and identify local model relationships.
The Mean Is Not Enough: Using Quantile Regression to Examine Trends in Asian-White Differences across the Entire Achievement Distribution

ERIC Educational Resources Information Center

Konstantopoulos, Spyros

2009-01-01

Background: In recent years, Asian Americans have been consistently described as a model minority. The high levels of educational achievement and educational attainment are the main determinants for identifying Asian Americans as a model minority. Nonetheless, only a few studies have examined empirically the accomplishments of Asian Americans, and…
An investigation of the speeding-related crash designation through crash narrative reviews sampled via logistic regression.

PubMed

Fitzpatrick, Cole D; Rakasi, Saritha; Knodler, Michael A

2017-01-01

Speed is one of the most important factors in traffic safety as higher speeds are linked to increased crash risk and higher injury severities. Nearly a third of fatal crashes in the United States are designated as "speeding-related", which is defined as either "the driver behavior of exceeding the posted speed limit or driving too fast for conditions." While many studies have utilized the speeding-related designation in safety analyses, no studies have examined the underlying accuracy of this designation. Herein, we investigate the speeding-related crash designation through the development of a series of logistic regression models that were derived from the established speeding-related crash typologies and validated using a blind review, by multiple researchers, of 604 crash narratives. The developed logistic regression model accurately identified crashes which were not originally designated as speeding-related but had crash narratives that suggested speeding as a causative factor. Only 53.4% of crashes designated as speeding-related contained narratives which described speeding as a causative factor. Further investigation of these crashes revealed that the driver contributing code (DCC) of "driving too fast for conditions" was being used in three separate situations. Additionally, this DCC was also incorrectly used when "exceeding the posted speed limit" would likely have been a more appropriate designation. Finally, it was determined that the responding officer only utilized one DCC in 82% of crashes not designated as speeding-related but contained a narrative indicating speed as a contributing causal factor. The use of logistic regression models based upon speeding-related crash typologies offers a promising method by which all possible speeding-related crashes could be identified. Published by Elsevier Ltd.
Developing a stroke severity index based on administrative data was feasible using data mining techniques.

PubMed

Sung, Sheng-Feng; Hsieh, Cheng-Yang; Kao Yang, Yea-Huei; Lin, Huey-Juan; Chen, Chih-Hung; Chen, Yu-Wei; Hu, Ya-Han

2015-11-01

Case-mix adjustment is difficult for stroke outcome studies using administrative data. However, relevant prescription, laboratory, procedure, and service claims might be surrogates for stroke severity. This study proposes a method for developing a stroke severity index (SSI) by using administrative data. We identified 3,577 patients with acute ischemic stroke from a hospital-based registry and analyzed claims data containing a large number of features. Stroke severity was measured using the National Institutes of Health Stroke Scale (NIHSS). We used two data mining methods and conventional multiple linear regression (MLR) to develop prediction models, comparing the model performance according to the Pearson correlation coefficient between the SSI and the NIHSS. We validated these models in four independent cohorts by using hospital-based registry data linked to a nationwide administrative database. We identified seven predictive features and developed three models. The k-nearest neighbor model (correlation coefficient, 0.743; 95% confidence interval: 0.737, 0.749) performed slightly better than the MLR model (0.742; 0.736, 0.747), followed by the regression tree model (0.737; 0.731, 0.742). In the validation cohorts, the correlation coefficients were between 0.677 and 0.725 for all three models. The claims-based SSI enables adjusting for disease severity in stroke studies using administrative data. Copyright © 2015 Elsevier Inc. All rights reserved.
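A confidence interval for a Pearson correlation coefficient is often approximated with the Fisher z-transformation, sketched below. The abstract does not say how its intervals were obtained, so this is a swapped-in textbook approximation that should not be expected to reproduce the reported values. The function name `fisher_ci` and the inputs are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation r from n paired observations."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z approximation")
    z = atanh(r)                      # transform r to the z scale
    se = 1.0 / sqrt(n - 3)            # approximate standard error on the z scale
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - q * se), tanh(z + q * se)


# Hypothetical example: r = 0.5 observed on 103 pairs.
lo, hi = fisher_ci(0.5, 103)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```

The interval is symmetric on the z scale but not on the r scale, so it narrows as |r| approaches 1.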
[Associated factors in newborns with intrauterine growth retardation].

PubMed

Thompson-Chagoyán, Oscar C; Vega-Franco, Leopoldo

2008-01-01

To identify the risk factors implicated in the intrauterine growth retardation (IUGR) of neonates born in a social security institution. A case-control study of 376 neonates: 188 with IUGR (weight < 10th percentile) and 188 without IUGR. At birth, information on 30 risk variables for IUGR was obtained from the mothers. Risk analysis and stepwise logistic regression were used. Odds ratios were significant for 12 of the variables. The model obtained by stepwise regression included: weight gain during pregnancy, prenatal care attendance, toxemia, chocolate ingestion, father's weight, and the home environment. Most of the variables included in the model are related to socioeconomic disadvantages associated with the risk of IUGR in the population.
On the interest of combining an analog model to a regression model for the adaptation of the downscaling link. Application to probabilistic prediction of precipitation over France.

NASA Astrophysics Data System (ADS)

Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine

2016-04-01

Scenarios of surface weather required for impact studies have to be unbiased and adapted to the space and time scales of the considered hydro-systems. Hence, surface weather scenarios obtained from global climate models and/or numerical weather prediction models are not directly appropriate. Outputs of these models have to be post-processed, which is often carried out using Statistical Downscaling Methods (SDMs). Among those SDMs, approaches based on regression are often applied. For a given station, a regression link can be established between a set of large scale atmospheric predictors and the surface weather variable. These links are then used for the prediction of the latter. However, physical processes generating surface weather vary in time. This is well known for precipitation, for instance. The most relevant predictors and the regression link are also likely to vary in time. A better prediction skill is thus classically obtained with a seasonal stratification of the data. Another strategy is to identify the most relevant predictor set and establish the regression link from dates that are similar - or analog - to the target date. In practice, these dates can be selected using an analog model. In this study, we explore the possibility of improving the local performance of an analog model - where the analogy is applied to the geopotential heights 1000 and 500 hPa - using additional local scale predictors for the probabilistic prediction of the Safran precipitation over France. For each prediction day, the prediction is obtained from two GLM regression models - for both the occurrence and the quantity of precipitation - for which predictors and parameters are estimated from the analog dates. Firstly, the resulting combined model noticeably increases the prediction performance by adapting the downscaling link for each prediction day. Secondly, the selected predictors for a given prediction depend on the large scale situation and on the considered region. Finally, even with such an adaptive predictor identification, the downscaling link appears to be robust: for the same prediction day, predictors selected for different locations of a given region are similar and the regression parameters are consistent within the region of interest.
Potential for Bias When Estimating Critical Windows for Air Pollution in Children's Health.

PubMed

Wilson, Ander; Chiu, Yueh-Hsiu Mathilda; Hsu, Hsiao-Hsien Leon; Wright, Robert O; Wright, Rosalind J; Coull, Brent A

2017-12-01

Evidence supports an association between maternal exposure to air pollution during pregnancy and children's health outcomes. Recent interest has focused on identifying critical windows of vulnerability. An analysis based on a distributed lag model (DLM) can yield estimates of a critical window that are different from those from an analysis that regresses the outcome on each of the 3 trimester-average exposures (TAEs). Using a simulation study, we assessed bias in estimates of critical windows obtained using 3 regression approaches: 1) 3 separate models to estimate the association with each of the 3 TAEs; 2) a single model to jointly estimate the association between the outcome and all 3 TAEs; and 3) a DLM. We used weekly fine-particulate-matter exposure data for 238 births in a birth cohort in and around Boston, Massachusetts, and a simulated outcome and time-varying exposure effect. Estimates using separate models for each TAE were biased and identified incorrect windows. This bias arose from seasonal trends in particulate matter that induced correlation between TAEs. Including all TAEs in a single model reduced bias. DLM produced unbiased estimates and added flexibility to identify windows. Analysis of body mass index z score and fat mass in the same cohort highlighted inconsistent estimates from the 3 methods. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
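The trimester-average exposures (TAEs) compared above are simple summaries of the weekly exposure series. The sketch below assumes trimesters split at the ends of gestational weeks 13 and 26, which the abstract does not specify. The function name and the exposure values are hypothetical.

```python
def trimester_averages(weekly, bounds=((0, 13), (13, 26), (26, None))):
    """Average a weekly exposure series over three trimester windows.

    `bounds` holds (start, stop) list indices; the week cut-offs are an assumption.
    """
    averages = []
    for start, stop in bounds:
        window = weekly[start:stop]
        if not window:
            raise ValueError("a trimester window has no weekly values")
        averages.append(sum(window) / len(window))
    return averages


# Hypothetical weekly fine-particulate-matter exposures for a 37-week pregnancy.
weekly_pm = [1.0] * 13 + [2.0] * 13 + [3.0] * 11
print(trimester_averages(weekly_pm))
```

Averaging discards within-trimester timing, which a distributed lag model instead retains by using the weekly values directly.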
Serum eotaxin-1 is increased in extremely-low-birth-weight infants with bronchopulmonary dysplasia or death.

PubMed

Kandasamy, Jegen; Roane, Claire; Szalai, Alexander; Ambalavanan, Namasivayam

2015-11-01

Early systemic inflammation in extremely-low-birth-weight (ELBW) infants is associated with an increased risk of bronchopulmonary dysplasia (BPD). Our objective was to identify circulating biomarkers and develop prediction models for BPD/death soon after birth. Blood samples from postnatal day 1 were analyzed for C-reactive protein (CRP) by enzyme-linked immunosorbent assay and for 39 cytokines/chemokines by a multiplex assay in 152 ELBW infants. The primary outcome was physiologic BPD or death by 36 wk. CRP, cytokines, and clinical variables available at ≤24 h were used for forward stepwise regression and Classification and Regression Tree (CART) analysis to identify predictors of BPD/death. Overall, 24% developed BPD and 35% died or developed BPD. Regression analysis identified birth weight and eotaxin (CCL11) as the two most significant variables. CART identified FiO2 at 24 h (11% BPD/death if FiO2 ≤28%, 49% if >28%) and eotaxin in infants with FiO2 > 28% (29% BPD/death if eotaxin was ≤84 pg/ml; 65% if >84) as variables most associated with outcome. Eotaxin measured on the day of birth is useful for identifying ELBW infants at risk of BPD/death. Further investigation is required to determine if eotaxin is involved in lung injury and pathogenesis of BPD.
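The CART result above amounts to a two-level decision rule. The sketch below only restates the splits and proportions reported in the abstract. It is illustrative, not a validated clinical tool, and the function name is hypothetical.

```python
def reported_bpd_death_proportion(fio2_percent, eotaxin_pg_ml):
    """Return the BPD/death proportion the abstract reports for the matching leaf."""
    if fio2_percent <= 28:
        return 0.11          # FiO2 at 24 h <= 28%
    if eotaxin_pg_ml <= 84:
        return 0.29          # FiO2 > 28% and eotaxin <= 84 pg/ml
    return 0.65              # FiO2 > 28% and eotaxin > 84 pg/ml


print(reported_bpd_death_proportion(30, 100))
```

Eotaxin enters only in the FiO2 > 28% branch, so for infants with FiO2 ≤ 28% the rule ignores it.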
Selection of higher order regression models in the analysis of multi-factorial transcription data.

PubMed

Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim

2014-01-01

Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Transport modeling and multivariate adaptive regression splines for evaluating performance of ASR systems in freshwater aquifers

NASA Astrophysics Data System (ADS)

Forghani, Ali; Peralta, Richard C.

2017-10-01

The study presents a procedure using solute transport and statistical models to evaluate the performance of aquifer storage and recovery (ASR) systems designed to earn additional water rights in freshwater aquifers. The recovery effectiveness (REN) index quantifies the performance of these ASR systems. REN is the proportion of the injected water that the same ASR well can recapture during subsequent extraction periods. To estimate REN for individual ASR wells, the presented procedure uses finely discretized groundwater flow and contaminant transport modeling. Then, the procedure uses multivariate adaptive regression splines (MARS) analysis to identify the significant variables affecting REN, and to identify the most recovery-effective wells. Achieving REN values close to 100% is the desire of the studied 14-well ASR system operator. This recovery is feasible for most of the ASR wells by extracting three times the injectate volume during the same year as injection. Most of the wells would achieve RENs below 75% if extracting merely the same volume as they injected. In other words, recovering almost all the same water molecules that are injected requires having a pre-existing water right to extract groundwater annually. MARS shows that REN most significantly correlates with groundwater flow velocity, or hydraulic conductivity and hydraulic gradient. MARS results also demonstrate that maximizing REN requires utilizing the wells located in areas with background Darcian groundwater velocities less than 0.03 m/d. The study also highlights the superiority of MARS over regular multiple linear regressions to identify the wells that can provide the maximum REN. This is the first reported application of MARS for evaluating performance of an ASR system in fresh water aquifers.
Correlates and predictors of missed nursing care in hospitals.

PubMed

Bragadóttir, Helga; Kalisch, Beatrice J; Tryggvadóttir, Gudný Bergthora

2017-06-01

To identify the contribution of hospital, unit, staff characteristics, staffing adequacy and teamwork to missed nursing care in Icelandic hospitals. A recently identified quality indicator for nursing care and patient safety is missed nursing care, defined as any standard, required nursing care omitted or significantly delayed, indicating an error of omission. Former studies point to contributing factors to missed nursing care regarding hospital, unit and staff characteristics, perceptions of staffing adequacy as well as nursing teamwork, displayed in the Missed Nursing Care Model. This was a quantitative cross-sectional survey study. The sample comprised all registered nurses and practical nurses (n = 864) working on 27 medical, surgical and intensive care inpatient units in eight hospitals throughout Iceland. Response rate was 69.3%. Data were collected in March-April 2012 using the combined MISSCARE Survey-Icelandic and the Nursing Teamwork Survey-Icelandic. Descriptive, correlational and regression statistics were used for data analysis. Missed nursing care was significantly related to hospital and unit type, participants' age and role and their perception of adequate staffing and level of teamwork. The multiple regression testing of Model 1 indicated that unit type, role, age and staffing adequacy predicted 16% of the variance in missed nursing care. Controlling for unit type, role, age and perceptions of staffing adequacy, the multiple regression testing of Model 2 showed that nursing teamwork predicted an additional 14% of the variance in missed nursing care. The results shed light on the correlates and predictors of missed nursing care in hospitals. This study gives direction as to the development of strategies for decreasing missed nursing care, including ensuring appropriate staffing levels and enhanced teamwork. By identifying contributing factors to missed nursing care, appropriate interventions can be developed and tested. © 2016 John Wiley & Sons Ltd.
Geographic dimensions of heat-related mortality in seven U.S. cities.

PubMed

Hondula, David M; Davis, Robert E; Saha, Michael V; Wegner, Carleigh R; Veazey, Lindsay M

2015-04-01

Spatially targeted interventions may help protect the public when extreme heat occurs. Health outcome data are increasingly being used to map intra-urban variability in heat-health risks, but there has been little effort to compare patterns and risk factors between cities. We sought to identify places within large metropolitan areas where the mortality rate is highest on hot summer days and determine if characteristics of high-risk areas are consistent from one city to another. A Poisson regression model was adapted to quantify temperature-mortality relationships at the postal code scale based on 2.1 million records of daily all-cause mortality counts from seven U.S. cities. Multivariate spatial regression models were then used to determine the demographic and environmental variables most closely associated with intra-city variability in risk. Significant mortality increases on extreme heat days were confined to 12-44% of postal codes comprising each city. Places with greater risk had more developed land, young, elderly, and minority residents, and lower income and educational attainment, but the key explanatory variables varied from one city to another. Regression models accounted for 14-34% of the spatial variability in heat-related mortality. The results emphasize the need for public health plans for heat to be locally tailored and not assume that pre-identified vulnerability indicators are universally applicable. As known risk factors accounted for no more than one third of the spatial variability in heat-health outcomes, consideration of health outcome data is important in efforts to identify and protect residents of the places where the heat-related health risks are the highest. Copyright © 2015 Elsevier Inc. All rights reserved.
Estimating procedure times for surgeries by determining location parameters for the lognormal model.

PubMed

Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H

2004-05-01

We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.

Length bias correction in gene ontology enrichment analysis using logistic regression.

PubMed

Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H

2012-01-01

When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
Characterization and evaluation of controls on post-fire streamflow response across western US watersheds

NASA Astrophysics Data System (ADS)

Saxe, Samuel; Hogue, Terri S.; Hay, Lauren

2018-02-01

This research investigates the impact of wildfires on watershed flow regimes, specifically focusing on evaluation of fire events within specified hydroclimatic regions in the western United States, and evaluating the impact of climate and geophysical variables on response. Eighty-two watersheds were identified with at least 10 years of continuous pre-fire daily streamflow records and 5 years of continuous post-fire daily flow records. Percent change in annual runoff ratio, low flows, high flows, peak flows, number of zero flow days, baseflow index, and Richards-Baker flashiness index were calculated for each watershed using pre- and post-fire periods. Independent variables were identified for each watershed and fire event, including topographic, vegetation, climate, burn severity, percent area burned, and soils data. Results show that low flows, high flows, and peak flows increase in the first 2 years following a wildfire and decrease over time. Relative response was used to scale response variables with the respective percent area of watershed burned in order to compare regional differences in watershed response. To account for variability in precipitation events, runoff ratio was used to compare runoff directly to PRISM precipitation estimates. To account for regional differences in climate patterns, watersheds were divided into nine regions, or clusters, through k-means clustering using climate data, and regression models were produced for watersheds grouped by total area burned. Watersheds in Cluster 9 (eastern California, western Nevada, Oregon) demonstrate a small negative response to observed flow regimes after fire. Cluster 8 watersheds (coastal California) display the greatest flow responses, typically within the first year following wildfire. Most other watersheds show a positive mean relative response. 
In addition, simple regression models show low correlation between percent watershed burned and streamflow response, implying that other watershed factors strongly influence response. Spearman correlation identified NDVI, aridity index, percent of a watershed's precipitation that falls as rain, and slope as being positively correlated with post-fire streamflow response. This metric also suggested a negative correlation between response and the soil erodibility factor, watershed area, and percent low burn severity. Regression models identified only moderate burn severity and watershed area as being consistently positively/negatively correlated, respectively, with response. The random forest model identified only slope and percent area burned as significant watershed parameters controlling response. Results will help inform post-fire runoff management decisions by helping to identify expected changes to flow regimes, as well as facilitate parameterization for model application in burned watersheds.
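The Spearman correlation mentioned in this abstract is the Pearson correlation computed on ranks rather than raw values. The sketch below is only an illustration of that statistic in plain Python; the function names and the two short series (`slope_deg`, `flow_response_pct`) are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical values, not data from the study.
slope_deg = [5.0, 12.0, 18.0, 25.0, 31.0]
flow_response_pct = [2.0, 9.0, 7.0, 20.0, 40.0]
print(round(spearman_rho(slope_deg, flow_response_pct), 3))
```

Because only ranks enter the calculation, any monotonic relationship gives a coefficient of 1 or -1 regardless of its shape, which is why the metric suits skewed watershed attributes.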
Perceptions and Efficacy of Flight Operational Quality Assurance (FOQA) Programs Among Small-scale Operators

DTIC Science & Technology

2012-01-01

regressive Integrated Moving Average (ARIMA) model for the data, eliminating the need to identify an appropriate model through trial and error alone... In general, ARIMA models address three... performance standards and measurement processes and a prevailing climate of organizational trust were important factors. Unfortunately, uneven
Source Region Identification Using Kernel Smoothing

EPA Science Inventory

As described in this paper, Nonparametric Wind Regression is a source-to-receptor source apportionment model that can be used to identify and quantify the impact of possible source regions of pollutants as defined by wind direction sectors. It is described in detail with an exam...
Atmospheric concentrations, sources and gas-particle partitioning of PAHs in Beijing after the 29th Olympic Games.

PubMed

Ma, Wan-Li; Sun, De-Zhi; Shen, Wei-Guo; Yang, Meng; Qi, Hong; Liu, Li-Yan; Shen, Ji-Min; Li, Yi-Fan

2011-07-01

A comprehensive sampling campaign was carried out to study atmospheric concentrations of polycyclic aromatic hydrocarbons (PAHs) in Beijing and to evaluate the effectiveness of source control strategies in reducing PAH pollution after the 29th Olympic Games. The sub-cooled liquid vapor pressure (logP(L)(o))-based model and the octanol-air partition coefficient (K(oa))-based model were applied to each seasonal dataset. Regression analysis among log K(P), logP(L)(o) and log K(oa) showed highly significant correlations for all four seasons. Source factors were identified by principal component analysis, and their contributions were further estimated by multiple linear regression. Pyrogenic sources and coke oven emissions were identified as major sources for both the non-heating and heating seasons. Compared with the literature, the mean PAH concentrations before and after the 29th Olympic Games were reduced by more than 60%, indicating that the source control measures were effective in reducing PAH pollution in Beijing. Copyright © 2011 Elsevier Ltd. All rights reserved.
Intimate partner violence and anxiety disorders in pregnancy: the importance of vocational training of the nursing staff in facing them

PubMed Central

Fonseca-Machado, Mariana de Oliveira; Monteiro, Juliana Cristina dos Santos; Haas, Vanderlei José; Abrão, Ana Cristina Freitas de Vilhena; Gomes-Sponholz, Flávia

2015-01-01

Objective: to identify the relationship between posttraumatic stress disorder, trait and state anxiety, and intimate partner violence during pregnancy. Method: observational, cross-sectional study conducted with 358 pregnant women. The Posttraumatic Stress Disorder Checklist - Civilian Version was used, as well as the State-Trait Anxiety Inventory and an adapted version of the instrument used in the World Health Organization Multi-country Study on Women's Health and Domestic Violence. Results: after adjustment in the multiple logistic regression model, intimate partner violence occurring during pregnancy was associated with the indication of posttraumatic stress disorder. The adjusted multiple linear regression models showed that victims of violence in the current pregnancy had higher symptom scores of trait and state anxiety than non-victims. Conclusion: recognizing intimate partner violence as a clinically relevant and identifiable risk factor for the occurrence of anxiety disorders during pregnancy can be a first step in their prevention. PMID:26487135
ANCA-Associated Glomerulonephritis: Risk Factors for Renal Relapse.

PubMed

Göçeroğlu, Arda; Berden, Annelies E; Fiocco, Marta; Floßmann, Oliver; Westman, Kerstin W; Ferrario, Franco; Gaskin, Gill; Pusey, Charles D; Hagen, E Christiaan; Noël, Laure-Hélène; Rasmussen, Niels; Waldherr, Rüdiger; Walsh, Michael; Bruijn, Jan A; Jayne, David R W; Bajema, Ingeborg M

2016-01-01

Relapse in ANCA-associated vasculitis (AAV) has been studied previously, but there are few studies on renal relapse in particular. Identifying patients at high risk of renal relapse may aid in optimizing clinical management. We investigated which clinical and histological parameters are risk factors for renal relapse in ANCA-associated glomerulonephritis (AAGN). Patients (n = 174) were newly diagnosed and had mild-moderate or severe renal involvement. Data were derived from two trials of the European Vasculitis Society: MEPEX and CYCAZAREM. The Cox regression model was used to identify parameters increasing the instantaneous risk (= rate) of renal relapse (useful for instant clinical decisions). For identifying predictors of renal relapse during follow-up, we used Fine & Gray's regression model. Competing events were end-stage renal failure and death. The cumulative incidence of renal relapse at 5 years was 9.5% (95% CI: 4.8-14.3%). In the Cox model, sclerotic class AAGN increased the instantaneous risk of renal relapse. In Fine & Gray's model, the absence of interstitial infiltrates at diagnosis was predictive for renal relapse. In this study we used two different models to identify possible relationships between clinical and histopathological parameters at time of diagnosis of AAV with the risk of experiencing renal relapse. Sclerotic class AAGN increased the instantaneous risk of renal relapse. This association is most likely due to the high proportion of sclerosed glomeruli reducing the compensatory capacity. The absence of interstitial infiltrates increased the risk of renal relapse which is a warning sign that patients with a relatively benign onset of disease may also be prone to renal relapse. Renal relapses occurring in patients with sclerotic class AAGN and renal relapses occurring in patients without interstitial infiltrates were mutually exclusive, which may indicate that they are essentially different.
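The 5-year cumulative incidence quoted in this abstract is a competing-risks quantity: a patient who reaches end-stage renal failure or dies can no longer have a renal relapse. Fine & Gray's regression works on the subdistribution hazard that underlies this cumulative incidence. The sketch below is not that regression; it only shows, under stated assumptions, how a nonparametric cumulative incidence can be estimated. The function name, the event coding, and the four-patient data set are hypothetical.

```python
def cumulative_incidence(times, events, horizon):
    """Nonparametric cumulative incidence of event type 1 up to `horizon`.

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    """
    n_at_risk = len(times)
    event_free = 1.0  # probability of no event of any type so far
    cif = 0.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        at_t = [e for ti, e in zip(times, events) if ti == t]
        d_interest = at_t.count(1)
        d_competing = at_t.count(2)
        # Chance of being event-free just before t, then having the event at t.
        cif += event_free * d_interest / n_at_risk
        event_free *= 1 - (d_interest + d_competing) / n_at_risk
        n_at_risk -= len(at_t)
    return cif


# Hypothetical follow-up in years: relapse, competing event, relapse, censored.
years = [1, 2, 3, 4]
status = [1, 2, 1, 0]
print(round(cumulative_incidence(years, status, horizon=5), 3))
```

Treating competing events as ordinary censoring and taking one minus the Kaplan-Meier estimate would overstate the relapse probability, which is one reason the two models in the abstract can point to different predictors.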
ANCA-Associated Glomerulonephritis: Risk Factors for Renal Relapse

PubMed Central

Göçeroğlu, Arda; Berden, Annelies E.; Fiocco, Marta; Floßmann, Oliver; Westman, Kerstin W.; Ferrario, Franco; Gaskin, Gill; Pusey, Charles D.; Hagen, E. Christiaan; Noël, Laure-Hélène; Rasmussen, Niels; Waldherr, Rüdiger; Walsh, Michael; Bruijn, Jan A.; Jayne, David R. W.; Bajema, Ingeborg M.

2016-01-01

Relapse in ANCA-associated vasculitis (AAV) has been studied previously, but there are few studies on renal relapse in particular. Identifying patients at high risk of renal relapse may aid in optimizing clinical management. We investigated which clinical and histological parameters are risk factors for renal relapse in ANCA-associated glomerulonephritis (AAGN). Patients (n = 174) were newly diagnosed and had mild–moderate or severe renal involvement. Data were derived from two trials of the European Vasculitis Society: MEPEX and CYCAZAREM. The Cox regression model was used to identify parameters increasing the instantaneous risk (= rate) of renal relapse (useful for instant clinical decisions). For identifying predictors of renal relapse during follow-up, we used Fine & Gray’s regression model. Competing events were end-stage renal failure and death. The cumulative incidence of renal relapse at 5 years was 9.5% (95% CI: 4.8–14.3%). In the Cox model, sclerotic class AAGN increased the instantaneous risk of renal relapse. In Fine & Gray’s model, the absence of interstitial infiltrates at diagnosis was predictive for renal relapse. In this study we used two different models to identify possible relationships between clinical and histopathological parameters at time of diagnosis of AAV with the risk of experiencing renal relapse. Sclerotic class AAGN increased the instantaneous risk of renal relapse. This association is most likely due to the high proportion of sclerosed glomeruli reducing the compensatory capacity. The absence of interstitial infiltrates increased the risk of renal relapse which is a warning sign that patients with a relatively benign onset of disease may also be prone to renal relapse. Renal relapses occurring in patients with sclerotic class AAGN and renal relapses occurring in patients without interstitial infiltrates were mutually exclusive, which may indicate that they are essentially different. PMID:27973575
Modeling the probability of giving birth at health institutions among pregnant women attending antenatal care in West Shewa Zone, Oromia, Ethiopia: a cross sectional study.

PubMed

Dida, Nagasa; Birhanu, Zewdie; Gerbaba, Mulusew; Tilahun, Dejen; Morankar, Sudhakar

2014-06-01

Although antenatal care and institutional delivery are effective means of reducing maternal morbidity and mortality, the probability of giving birth at health institutions among antenatal care attendants has not been modeled in Ethiopia. Therefore, the objective of this study was to model predictors of giving birth at health institutions among expectant mothers following antenatal care. A facility-based cross-sectional study was conducted among 322 consecutively selected mothers who were following antenatal care in two districts of West Shewa Zone, Oromia Regional State, Ethiopia. Participants were proportionally recruited from six health institutions. The data were analyzed using SPSS version 17.0. Multivariable logistic regression was employed to develop the prediction model. The final regression model had good discrimination power (89.2%), optimum sensitivity (89.0%) and specificity (80.0%) to predict the probability of giving birth at health institutions. Accordingly, self-efficacy (beta=0.41), perceived barrier (beta=-0.31) and perceived susceptibility (beta=0.29) significantly predicted the probability of giving birth at health institutions. The present study showed that the logistic regression model predicted the probability of giving birth at health institutions and identified significant predictors that health care providers should take into account in promoting institutional delivery.
The Colorectal Cancer Mortality-to-Incidence Ratio as an Indicator of Global Cancer Screening and Care

PubMed Central

Sunkara, Vasu; Hébert, James R.

2015-01-01

BACKGROUND: Disparities in cancer screening, incidence, treatment, and survival are worsening globally. The mortality-to-incidence ratio (MIR) has been used previously to evaluate such disparities. METHODS: The MIR for colorectal cancer is calculated for all Organisation for Economic Cooperation and Development (OECD) countries using the 2012 GLOBOCAN incidence and mortality statistics. Health system rankings were obtained from the World Health Organization. Two linear regression models were fit with the MIR as the dependent variable and health system ranking as the independent variable; one included all countries and one model had the “divergents” removed. RESULTS: The regression model for all countries explained 24% of the total variance in the MIR. Nine countries were found to have regression-calculated MIRs that differed from the actual MIR by >20%. Countries with lower-than-expected MIRs were found to have strong national health systems characterized by formal colorectal cancer screening programs. Conversely, countries with higher-than-expected MIRs lack screening programs. When these divergent points were removed from the data set, the recalculated regression model explained 60% of the total variance in the MIR. CONCLUSIONS: The MIR proved useful for identifying disparities in cancer screening and treatment internationally. It has potential as an indicator of the long-term success of cancer surveillance programs and may be extended to other cancer types for these purposes. PMID:25572676
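The workflow in this abstract (compute the MIR, regress it on health system ranking, flag countries whose fitted MIR is more than 20% away from the observed value, then refit) can be sketched in a few lines. Everything below is a hedged illustration: the helper names are hypothetical, the seven data points are invented rather than GLOBOCAN or WHO figures, and the 20% difference is assumed to be measured relative to the observed MIR, which the abstract does not spell out.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def divergent_indices(x, y, intercept, slope, threshold=0.20):
    """Indices whose fitted value differs from the observed one by more than threshold.

    Assumption: the difference is taken relative to the observed value.
    """
    return [
        i for i, (a, b) in enumerate(zip(x, y))
        if abs(intercept + slope * a - b) / b > threshold
    ]


# Hypothetical numbers, not GLOBOCAN or WHO data.
ranking = [1, 5, 10, 15, 20, 25, 30]
incidence = [40.0, 38.0, 35.0, 36.0, 30.0, 28.0, 25.0]
mortality = [12.0, 12.5, 12.6, 8.0, 12.3, 12.0, 11.5]
mir = [m / i for m, i in zip(mortality, incidence)]

b0, b1 = fit_line(ranking, mir)
flagged = divergent_indices(ranking, mir, b0, b1)
kept = [i for i in range(len(mir)) if i not in flagged]
x2, y2 = [ranking[i] for i in kept], [mir[i] for i in kept]
c0, c1 = fit_line(x2, y2)
print(flagged, round(r_squared(ranking, mir, b0, b1), 2), round(r_squared(x2, y2, c0, c1), 2))
```

Removing the points that fit worst will raise the explained variance almost by construction, so the jump between the two models says more about which countries diverge than about the strength of the underlying relationship.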
Raman spectroscopy based investigation of molecular changes associated with an early stage of dengue virus infection

NASA Astrophysics Data System (ADS)

Bilal, Maria; Bilal, Muhammad; Saleem, Muhammad; Khurram, Muhammad; Khan, Saranjam; Ullah, Rahat; Ali, Hina; Ahmed, Mushtaq; Shahzada, Shaista; Ullah Khan, Ehsan

2017-04-01

A Raman spectroscopy based investigation of the molecular changes associated with an early stage of dengue virus (DENV) infection using a partial least squares (PLS) regression model is presented. This study is based on non-structural protein 1 (NS1), which appears after three days of DENV infection. In total, 39 blood sera samples were collected and divided into two groups. The control group contained samples that were negative for NS1 and antibodies, and the positive group contained samples in which NS1 was positive and antibodies were negative. Out of 39 samples, 29 Raman spectra were used for model development while the remaining 10 were kept hidden for blind testing of the model. PLS regression yielded a vector of regression coefficients as a function of Raman shift, which was analyzed. Cytokines in the region 775-875 cm⁻¹, lectins at 1003, 1238, 1340, 1449 and 1672 cm⁻¹, DNA in the region 1040-1140 cm⁻¹ and alpha and beta structures of proteins in the region 933-967 cm⁻¹ have been identified in the regression vector for their role in an early stage of DENV infection. Validity of the model was established by its R-square value of 0.891. Sensitivity, specificity and accuracy were 100% each and the area under the receiver operator characteristic curve was found to be 1.
The colorectal cancer mortality-to-incidence ratio as an indicator of global cancer screening and care.

PubMed

Sunkara, Vasu; Hébert, James R

2015-05-15

Disparities in cancer screening, incidence, treatment, and survival are worsening globally. The mortality-to-incidence ratio (MIR) has been used previously to evaluate such disparities. The MIR for colorectal cancer is calculated for all Organisation for Economic Cooperation and Development (OECD) countries using the 2012 GLOBOCAN incidence and mortality statistics. Health system rankings were obtained from the World Health Organization. Two linear regression models were fit with the MIR as the dependent variable and health system ranking as the independent variable; one included all countries and one model had the "divergents" removed. The regression model for all countries explained 24% of the total variance in the MIR. Nine countries were found to have regression-calculated MIRs that differed from the actual MIR by >20%. Countries with lower-than-expected MIRs were found to have strong national health systems characterized by formal colorectal cancer screening programs. Conversely, countries with higher-than-expected MIRs lack screening programs. When these divergent points were removed from the data set, the recalculated regression model explained 60% of the total variance in the MIR. The MIR proved useful for identifying disparities in cancer screening and treatment internationally. It has potential as an indicator of the long-term success of cancer surveillance programs and may be extended to other cancer types for these purposes. © 2015 American Cancer Society.
Occlusal factors are not related to self-reported bruxism.

PubMed

Manfredini, Daniele; Visscher, Corine M; Guarda-Nardini, Luca; Lobbezoo, Frank

2012-01-01

To estimate the contribution of various occlusal features of the natural dentition that may identify self-reported bruxers compared to nonbruxers. Two age- and sex-matched groups of self-reported bruxers (n = 67) and self-reported nonbruxers (n = 75) took part in the study. For each patient, the following occlusal features were clinically assessed: retruded contact position (RCP) to intercuspal contact position (ICP) slide length (< 2 mm was considered normal), vertical overlap (< 0 mm was considered an anterior open bite; > 4 mm, a deep bite), horizontal overlap (> 4 mm was considered a large horizontal overlap), incisor dental midline discrepancy (< 2 mm was considered normal), and the presence of a unilateral posterior crossbite, mediotrusive interferences, and laterotrusive interferences. A multiple logistic regression model was used to identify the significant associations between the assessed occlusal features (independent variables) and self-reported bruxism (dependent variable). Accuracy values to predict self-reported bruxism were unacceptable for all occlusal variables. The only variable remaining in the final regression model was laterotrusive interferences (P = .030). The percentage of explained variance for bruxism by the final multiple regression model was 4.6%. This model including only one occlusal factor showed low positive (58.1%) and negative predictive values (59.7%), thus showing a poor accuracy to predict the presence of self-reported bruxism (59.2%). This investigation suggested that the contribution of occlusion to the differentiation between bruxers and nonbruxers is negligible. This finding supports theories that advocate a much diminished role for peripheral anatomical-structural factors in the pathogenesis of bruxism.
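The predictive values and accuracy reported in this abstract all derive from a 2×2 table of predicted against observed status. As a minimal sketch, the function below computes them from the four cell counts; the function name and the example counts are hypothetical and do not reproduce the study's table.

```python
def classification_summary(tp, fp, fn, tn):
    """Summary measures from a 2x2 table of predicted vs. observed status.

    tp/fn: observed positives predicted positive/negative.
    fp/tn: observed negatives predicted positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, not the study's data.
summary = classification_summary(tp=30, fp=20, fn=10, tn=40)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

With two groups of similar size, as in a matched design, values near 50% to 60% are close to what chance alone would give, which is consistent with the abstract's conclusion of poor predictive accuracy.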
Predicting Depression among Patients with Diabetes Using Longitudinal Data. A Multilevel Regression Model.

PubMed

Jin, H; Wu, S; Vidyanti, I; Di Capua, P; Wu, B

2015-01-01

This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Depression is a common and often undiagnosed condition for patients with diabetes. It is also a condition that significantly impacts healthcare outcomes, use, and cost as well as elevating suicide risk. Therefore, a model to predict depression among diabetes patients is a promising and valuable tool for providers to proactively assess depressive symptoms and identify those with depression. This study seeks to develop a generalized multilevel regression model, using a longitudinal data set from a recent large-scale clinical trial, to predict depression severity and presence of major depression among patients with diabetes. Severity of depression was measured by the Patient Health Questionnaire PHQ-9 score. Predictors were selected from 29 candidate factors to develop a 2-level Poisson regression model that can make population-average predictions for all patients and subject-specific predictions for individual patients with historical records. Newly obtained patient records can be incorporated with historical records to update the prediction model. Root-mean-square errors (RMSE) were used to evaluate predictive accuracy of PHQ-9 scores. The study also evaluated the classification ability of using the predicted PHQ-9 scores to classify patients as having major depression. Two time-invariant and 10 time-varying predictors were selected for the model. Incorporating historical records and using them to update the model may improve both predictive accuracy of PHQ-9 scores and classification ability of the predicted scores. Subject-specific predictions (for individual patients with historical records) achieved RMSE about 4 and areas under the receiver operating characteristic (ROC) curve about 0.9 and are better than population-average predictions. 
The study developed a generalized multilevel regression model to predict depression and demonstrated that using generalized multilevel regression based on longitudinal patient records can achieve high predictive ability.
Multicollinearity in spatial genetics: separating the wheat from the chaff using commonality analyses.

PubMed

Prunier, J G; Colyn, M; Legendre, X; Nimon, K F; Flamand, M C

2015-01-01

Direct gradient analyses in spatial genetics provide unique opportunities to describe the inherent complexity of genetic variation in wildlife species and are the object of many methodological developments. However, multicollinearity among explanatory variables is a systemic issue in multivariate regression analyses and is likely to cause serious difficulties in properly interpreting results of direct gradient analyses, with the risk of erroneous conclusions, misdirected research and inefficient or counterproductive conservation measures. Using simulated data sets along with linear and logistic regressions on distance matrices, we illustrate how commonality analysis (CA), a detailed variance-partitioning procedure that was recently introduced in the field of ecology, can be used to deal with nonindependence among spatial predictors. By decomposing model fit indices into unique and common (or shared) variance components, CA allows identifying the location and magnitude of multicollinearity, revealing spurious correlations and thus thoroughly improving the interpretation of multivariate regressions. Despite a few inherent limitations, especially in the case of resistance model optimization, this review highlights the great potential of CA to account for complex multicollinearity patterns in spatial genetics and identifies future applications and lines of research. We strongly urge spatial geneticists to systematically investigate commonalities when performing direct gradient analyses. © 2014 John Wiley & Sons Ltd.
Fish habitat regression under water scarcity scenarios in the Douro River basin

NASA Astrophysics Data System (ADS)

Segurado, Pedro; Jauch, Eduardo; Neves, Ramiro; Ferreira, Teresa

2015-04-01

Climate change will predictably alter hydrological patterns and processes at the catchment scale, with impacts on habitat conditions for fish. The main goals of this study are to identify the stream reaches that will undergo more pronounced flow reduction under different climate change scenarios and to assess which fish species will be more affected by the consequent regression of suitable habitats. The interplay between changes in flow and temperature and the presence of transversal artificial obstacles (dams and weirs) is analysed. The results will contribute to river management and impact mitigation actions under climate change. This study was carried out in the Tâmega catchment of the Douro basin. A set of 29 hydrological, climatic, and hydrogeomorphological variables were modelled using a water modelling system (MOHID), based on meteorological data recorded monthly between 2008 and 2014. The same variables were modelled considering future climate change scenarios. The resulting variables were used in empirical habitat models of a set of key species (brown trout Salmo trutta fario, barbel Barbus bocagei, and nase Pseudochondrostoma duriense) using boosted regression trees. The stream segments between tributaries were used as spatial sampling units. Models were developed for the whole Douro basin using 401 fish sampling sites, although the modelled probabilities of species occurrence for each stream segment were predicted only for the Tâmega catchment. These probabilities of occurrence were used to classify stream segments into suitable and unsuitable habitat for each fish species, considering the future climate change scenario. The stream reaches that were predicted to undergo longer flow interruptions were identified and crossed with the resulting predictive maps of habitat suitability to compute the total area of habitat loss per species. 
Among the target species, the brown trout was predicted to be the most sensitive to habitat regression due to the interplay of flow reduction, increase of temperature and transversal barriers. This species is therefore a good indicator of climate change impacts in rivers, and we recommend using it as a target of monitoring programs to be implemented in the context of climate change adaptation strategies.
Impact of volunteer-related and methodology-related factors on the reproducibility of brachial artery flow-mediated vasodilation: analysis of 672 individual repeated measurements.

PubMed

van Mil, Anke C C M; Greyling, Arno; Zock, Peter L; Geleijnse, Johanna M; Hopman, Maria T; Mensink, Ronald P; Reesink, Koen D; Green, Daniel J; Ghiadoni, Lorenzo; Thijssen, Dick H

2016-09-01

Brachial artery flow-mediated dilation (FMD) is a popular technique to examine endothelial function in humans. Identifying volunteer and methodological factors related to variation in FMD is important to improve measurement accuracy and applicability. Volunteer-related and methodology-related parameters were collected in 672 volunteers from eight affiliated centres worldwide who underwent repeated measures of FMD. All centres adopted contemporary expert-consensus guidelines for FMD assessment. After calculating the coefficient of variation (%) of the FMD for each individual, we constructed quartiles (n = 168 per quartile). Based on two regression models (volunteer-related factors and methodology-related factors), statistically significant components of these two models were added to a final regression model (calculated as β-coefficient and R). This allowed us to identify factors that independently contributed to the variation in FMD%. Median coefficient of variation was 17.5%, with healthy volunteers demonstrating a coefficient of variation of 9.3%. Regression models revealed age (β = 0.248, P < 0.001), hypertension (β = 0.104, P < 0.001), dyslipidemia (β = 0.331, P < 0.001), time between measurements (β = 0.318, P < 0.001), lab experience (β = -0.133, P < 0.001) and baseline FMD% (β = 0.082, P < 0.05) as contributors to the coefficient of variation. After including all significant factors in the final model, we found that time between measurements, hypertension, baseline FMD% and lab experience with FMD independently predicted brachial artery variability (total R = 0.202). Although FMD% showed good reproducibility, larger variation was observed in conditions with longer time between measurements, hypertension, less experience and lower baseline FMD%. Accounting for these factors may improve FMD% variability.
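The first analysis step in this abstract, a per-person coefficient of variation followed by a split into quartiles, can be sketched as follows. This is an illustration under assumptions: the abstract does not state which coefficient of variation formula was used for the repeated measurements, so the common definition (sample standard deviation divided by the mean) is assumed, and the function names and FMD% values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(measurements):
    """Within-person CV in percent: sample SD divided by the mean, times 100."""
    return stdev(measurements) / mean(measurements) * 100


def quartile_labels(values):
    """Assign each value to quartile 1..4 by rank.

    Groups have equal size when the number of values is divisible by 4.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for position, index in enumerate(order):
        labels[index] = position * 4 // len(values) + 1
    return labels


# Hypothetical repeated FMD% measurements for eight volunteers.
repeated_fmd = [
    [6.1, 5.9], [4.0, 6.0], [7.2, 7.0], [3.5, 5.5],
    [5.0, 5.2], [8.0, 6.0], [6.5, 6.4], [4.8, 3.6],
]
cv_percent = [coefficient_of_variation(pair) for pair in repeated_fmd]
print([round(cv, 1) for cv in cv_percent])
print(quartile_labels(cv_percent))
```

Ranking before splitting guarantees equal group sizes, matching the n = 168 per quartile reported for 672 volunteers.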
[Bibliometrics and visualization analysis of land use regression models in ambient air pollution research].

PubMed

Zhang, Y J; Zhou, D H; Bai, Z P; Xue, F X

2018-02-10

Objective: To quantitatively analyze the current status and development trends of land use regression (LUR) models in ambient air pollution studies. Methods: Relevant literature from the PubMed database before June 30, 2017 was analyzed using the Bibliographic Items Co-occurrence Matrix Builder (BICOMB 2.0). Keyword co-occurrence networks, cluster mapping and timeline mapping were generated using the CiteSpace 5.1.R5 software. Relevant literature identified in three Chinese databases was also reviewed. Results: Four hundred sixty-four relevant papers were retrieved from the PubMed database. The number of papers published showed an annual increase, in line with the growing trend of the index. Most papers were published in the journal Environmental Health Perspectives. Results from the co-word cluster analysis identified five clusters: cluster #0 consisted of birth cohort studies related to the health effects of prenatal exposure to air pollution; cluster #1 referred to land use regression modeling and exposure assessment; cluster #2 was related to the epidemiology of traffic exposure; cluster #3 dealt with the exposure to ultrafine particles and related health effects; cluster #4 described the exposure to black carbon and related health effects. Data from timeline mapping indicated that clusters #0 and #1 were the main research areas while clusters #3 and #4 were the emerging hot areas of research. Ninety-four relevant papers were retrieved from the Chinese databases, with most of them related to studies on modeling. Conclusion: In order to better assess the health-related risks of ambient air pollution, and to best inform preventative public health intervention policies, application of LUR models to environmental epidemiology studies in China should be encouraged.
Collaborative Chronic Care Models for Mental Health Conditions: Cumulative Meta-Analysis and Meta-Regression to Guide Future Research and Implementation

PubMed Central

Grogan-Kaylor, Andrew; Perron, Brian E.; Kilbourne, Amy M.; Woltmann, Emily; Bauer, Mark S.

2013-01-01

Objective: Prior meta-analysis indicates that collaborative chronic care models (CCMs) improve mental and physical health outcomes for individuals with mental disorders. This study aimed to investigate the stability of evidence over time and identify patient and intervention factors associated with CCM effects in order to facilitate implementation and sustainability of CCMs in clinical practice. Method: We reviewed 53 CCM trials that analyzed depression, mental quality of life (QOL), or physical QOL outcomes. Cumulative meta-analysis and meta-regression were supplemented by descriptive investigations across and within trials. Results: Most trials targeted depression in the primary care setting, and cumulative meta-analysis indicated that effect sizes favoring CCM quickly achieved significance for depression outcomes, and more recently achieved significance for mental and physical QOL. Four of six CCM elements (patient self-management support, clinical information systems, system redesign, and provider decision support) were common among reviewed trials, while two elements (healthcare organization support and linkages to community resources) were rare. No single CCM element was statistically associated with the success of the model. Similarly, meta-regression did not identify specific factors associated with CCM effectiveness. Nonetheless, results within individual trials suggest that increased illness severity predicts CCM outcomes. Conclusions: Significant CCM trials have been derived primarily from four original CCM elements. Nonetheless, implementing and sustaining this established model will require healthcare organization support. While CCMs have typically been tested as population-based interventions, evidence supports stepped care application to more severely ill individuals. 
Future priorities include developing implementation strategies to support adoption and sustainability of the model in clinical settings while maximizing fit of this multi-component framework to local contextual factors. PMID:23938600
Computing group cardinality constraint solutions for logistic regression problems.

PubMed

Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M

2017-01-01

We derive an algorithm to directly solve logistic regression under a group cardinality (group sparsity) constraint and use it to distinguish intra-subject MRI sequences (e.g. cine MRIs) of healthy subjects from those of diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding weighting of features, we propose to directly solve the group cardinality constrained logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining the series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum of the smooth and convex logistic regression problem is determined via gradient descent, while we derive a closed-form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients who received reconstructive surgery for Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significantly higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
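The closed-form sparse approximation step described in this abstract can be illustrated with a group hard-thresholding rule: among all weight vectors with at most k non-zero groups, the one closest in Euclidean distance to a given vector keeps the k groups with the largest norm and zeroes the rest. The sketch below shows only that projection, not the authors' full Penalty Decomposition algorithm; the function name, grouping, and numbers are hypothetical.

```python
from math import sqrt


def project_group_cardinality(weights, groups, k):
    """Keep the k groups with the largest Euclidean norm and zero out the rest.

    weights: list of floats.
    groups: list of index lists that partition the positions of `weights`.
    """
    norms = [sqrt(sum(weights[i] ** 2 for i in g)) for g in groups]
    keep = sorted(range(len(groups)), key=lambda j: norms[j], reverse=True)[:k]
    sparse = [0.0] * len(weights)
    for j in keep:
        for i in groups[j]:
            sparse[i] = weights[i]
    return sparse


# Hypothetical example: three features, each measured at two time points,
# so each group collects one feature's weights across time.
dense_weights = [0.1, -0.2, 3.0, 4.0, 0.5, 0.5]
feature_groups = [[0, 1], [2, 3], [4, 5]]
print(project_group_cardinality(dense_weights, feature_groups, k=1))
```

In a decoupled scheme of the kind the abstract describes, a step like this would alternate with gradient descent on the smooth logistic loss, so that selection acts on whole features across time rather than on individual weights.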

Automatic Classification of Users’ Health Information Need Context: Logistic Regression Analysis of Mouse-Click and Eye-Tracker Data

PubMed Central

Pian, Wenjing; Khoo, Christopher SG

2017-01-01

Background Users searching for health information on the Internet may be searching for their own health issue, searching for someone else’s health issue, or browsing with no particular health issue in mind. Previous research has found that these three categories of users focus on different types of health information. However, most health information websites provide static content for all users. If the three types of user health information need contexts can be identified by the Web application, the search results or information offered to the user can be customized to increase its relevance or usefulness to the user. Objective The aim of this study was to investigate the possibility of identifying the three user health information contexts (searching for self, searching for others, or browsing with no particular health issue in mind) using just hyperlink clicking behavior; using eye-tracking information; and using a combination of eye-tracking, demographic, and urgency information. Predictive models are developed using multinomial logistic regression. Methods A total of 74 participants (39 females and 35 males) who were mainly staff and students of a university were asked to browse a health discussion forum, Healthboards.com. An eye tracker recorded their examining (eye fixation) and skimming (quick eye movement) behaviors on 2 types of screens: summary result screen displaying a list of post headers, and detailed post screen. The following three types of predictive models were developed using logistic regression analysis: model 1 used only the time spent in scanning the summary result screen and reading the detailed post screen, which can be determined from the user’s mouse clicks; model 2 used the examining and skimming durations on each screen, recorded by an eye tracker; and model 3 added user demographic and urgency information to model 2. 
Results An analysis of variance (ANOVA) found that users’ browsing durations were significantly different for the three health information contexts (P<.001). The logistic regression model 3 was able to predict the user’s type of health information context with a 10-fold cross validation mean accuracy of 84% (62/74), followed by model 2 at 73% (54/74) and model 1 at 71% (52/78). In addition, correlation analysis found that particular browsing durations were highly correlated with users’ age, education level, and the urgency of their information need. Conclusions A user’s type of health information need context (ie, searching for self, for others, or with no health issue in mind) can be identified with reasonable accuracy using just user mouse clicks that can easily be detected by Web applications. Higher accuracy can be obtained using Google Glass or future computing devices with an eye-tracking function. PMID:29269342
Aircraft Anomaly Detection Using Performance Models Trained on Fleet Data

NASA Technical Reports Server (NTRS)

Gorinevsky, Dimitry; Matthews, Bryan L.; Martin, Rodney

2012-01-01

This paper describes an application of data mining technology called Distributed Fleet Monitoring (DFM) to Flight Operational Quality Assurance (FOQA) data collected from a fleet of commercial aircraft. DFM transforms the data into aircraft performance models, flight-to-flight trends, and individual flight anomalies by fitting a multi-level regression model to the data. The model represents aircraft flight performance and takes into account fixed effects: flight-to-flight and vehicle-to-vehicle variability. The regression parameters include aerodynamic coefficients and other aircraft performance parameters that are usually identified by aircraft manufacturers in flight tests. Using DFM, the multi-terabyte FOQA data set with half a million flights was processed in a few hours. The anomalies found include wrong values of computed variables (e.g., aircraft weight), sensor failures and biases, and failures, biases, and trends in flight actuators. These anomalies were missed by the existing airline monitoring of FOQA data exceedances.
[Multivariate Adaptive Regression Splines (MARS), an alternative for the analysis of time series].

PubMed

Vanegas, Jairo; Vásquez, Fabián

Multivariate Adaptive Regression Splines (MARS) is a non-parametric modelling method that extends the linear model, incorporating nonlinearities and interactions between variables. It is a flexible tool that automates the construction of predictive models: selecting relevant variables, transforming the predictor variables, processing missing values and preventing overfitting using a self-test. It is also able to predict, taking into account structural factors that might influence the outcome variable, thereby generating hypothetical models. The end result could identify relevant cut-off points in data series. It is rarely used in health, so it is proposed as a tool for the evaluation of relevant public health indicators. For demonstrative purposes, data series regarding the mortality of children under 5 years of age in Costa Rica were used, comprising the period 1978-2008. Copyright © 2016 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.
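The cut-off points mentioned in this abstract correspond to the knots of the hinge functions that MARS models are built from. The sketch below only evaluates a fitted piecewise-linear model of that form; it does not perform the MARS search itself. The coefficients, knots, and function names are hypothetical and are not taken from the Costa Rican series.

```python
def hinge(x, knot, direction=1):
    """MARS basis function: max(0, x - knot) or, with direction=-1, max(0, knot - x)."""
    return max(0.0, direction * (x - knot))


def mars_like_prediction(x, intercept, terms):
    """Evaluate intercept + sum of coefficient * hinge(x, knot, direction).

    terms: list of (coefficient, knot, direction) tuples (hypothetical values)
    """
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical model: the slope changes at the knots 1985 and 1995.
terms = [(-1.0, 1985, 1), (0.8, 1995, 1)]
fitted = [mars_like_prediction(year, 20.0, terms) for year in (1980, 1990, 2000)]
```

Each knot marks a year at which the slope of the fitted series changes, which is how such a model can flag candidate cut-off points.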
Which Measurement of Blood Pressure Is More Associated With Albuminuria in Patients With Type 2 Diabetes: Central Blood Pressure or Peripheral Blood Pressure?

PubMed

Kitagawa, Noriyuki; Okada, Hiroshi; Tanaka, Muhei; Hashimoto, Yoshitaka; Kimura, Toshihiro; Nakano, Koji; Yamazaki, Masahiro; Hasegawa, Goji; Nakamura, Naoto; Fukui, Michiaki

2016-08-01

The aim of this study was to investigate whether central systolic blood pressure (SBP) was associated with albuminuria, defined as urinary albumin excretion (UAE) ≥30 mg/g creatinine, and, if so, whether the relationship of central SBP with albuminuria was stronger than that of peripheral SBP in patients with type 2 diabetes. The authors performed a cross-sectional study in 294 outpatients with type 2 diabetes. The relationship between peripheral SBP or central SBP and UAE was evaluated using regression analysis, and the odds ratios of peripheral SBP or central SBP for albuminuria were calculated using a logistic regression model. Moreover, the area under the receiver operating characteristic curve (AUC) of central SBP was compared with that of peripheral SBP to identify albuminuria. Multiple regression analysis demonstrated that peripheral SBP (β=0.255, P<.0001) or central SBP (r=0.227, P<.0001) was associated with UAE. Multiple logistic regression analysis demonstrated that peripheral SBP (odds ratio, 1.029; 95% confidence interval, 1.016-1.043) or central SBP (odds ratio, 1.022; 95% confidence interval, 1.011-1.034) was associated with an increased odds of albuminuria. In addition, the AUC of peripheral SBP was significantly greater than that of central SBP to identify albuminuria (P=0.035). Peripheral SBP is superior to central SBP in identifying albuminuria, although both peripheral and central SBP are associated with UAE in patients with type 2 diabetes. © 2016 Wiley Periodicals, Inc.
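Odds ratios close to 1, such as those reported here, are easier to interpret once rescaled to a larger increment. The sketch below assumes the reported odds ratios are per 1 mmHg of SBP, which the abstract does not state, and that the logistic model is linear in SBP on the log-odds scale. The function name is hypothetical.

```python
import math


def rescale_odds_ratio(or_per_unit, units):
    """Convert an odds ratio per 1 unit into an odds ratio per `units` units.

    In a logistic model the log-odds are linear in the predictor, so the
    log odds ratio scales with the size of the increment.
    """
    return math.exp(units * math.log(or_per_unit))


# Assumed per-mmHg odds ratios from the abstract, rescaled to 10 mmHg.
peripheral_per_10 = rescale_odds_ratio(1.029, 10)
central_per_10 = rescale_odds_ratio(1.022, 10)
```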
A Comparison of Rule-based Analysis with Regression Methods in Understanding the Risk Factors for Study Withdrawal in a Pediatric Study.

PubMed

Haghighi, Mona; Johnson, Suzanne Bennett; Qian, Xiaoning; Lynch, Kristian F; Vehik, Kendra; Huang, Shuai

2016-08-26

Regression models are extensively used in many epidemiological studies to understand the linkage between specific outcomes of interest and their risk factors. However, regression models in general examine the average effects of the risk factors and ignore subgroups with different risk profiles. As a result, interventions are often geared towards the average member of the population, without consideration of the special health needs of different subgroups within the population. This paper demonstrates the value of using rule-based analysis methods that can identify subgroups with heterogeneous risk profiles in a population without imposing assumptions on the subgroups or method. The rules define the risk pattern of subsets of individuals by not only considering the interactions between the risk factors but also their ranges. We compared the rule-based analysis results with the results from a logistic regression model in The Environmental Determinants of Diabetes in the Young (TEDDY) study. Both methods detected a similar suite of risk factors, but the rule-based analysis was superior at detecting multiple interactions between the risk factors that characterize the subgroups. A further investigation of the particular characteristics of each subgroup may detect the special health needs of the subgroup and lead to tailored interventions.
Seasonal forecasting of high wind speeds over Western Europe

NASA Astrophysics Data System (ADS)

Palutikof, J. P.; Holt, T.

2003-04-01

As financial losses associated with extreme weather events escalate, there is interest from end users in the forestry and insurance industries, for example, in the development of seasonal forecasting models with a long lead time. This study uses exceedences of the 90th, 95th, and 99th percentiles of daily maximum wind speed over the period 1958 to present to derive predictands of winter wind extremes. The source data are the 6-hourly NCEP Reanalysis gridded surface wind fields. Predictor variables include principal components of Atlantic sea surface temperature and several indices of climate variability, including the NAO and SOI. Lead times of up to a year are considered, in monthly increments. Three regression techniques are evaluated: multiple linear regression (MLR), principal component regression (PCR), and partial least squares regression (PLS). PCR and PLS proved considerably superior to MLR, with much lower standard errors. PLS was chosen to formulate the predictive model since it offers more flexibility in experimental design and gave slightly better results than PCR. The results indicate that winter windiness can be predicted with considerable skill one year ahead for much of coastal Europe, but that this skill deteriorates rapidly in the hinterland. The experiment succeeded in highlighting PLS as a very useful method for developing more precise forecasting models, and in identifying areas of high predictability.
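The predictands described here are counts of days on which wind speed exceeds a climatological percentile. A minimal sketch of that counting step follows, assuming a nearest-rank percentile definition (the abstract does not say which definition was used). The function names and toy values are hypothetical.

```python
import math


def percentile_threshold(values, pct):
    """Nearest-rank percentile of `values` for an integer percentage `pct`."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def count_exceedances(season_values, threshold):
    """Number of days in a season with a value strictly above the threshold."""
    return sum(1 for v in season_values if v > threshold)


# Toy climatology of daily maximum wind speeds and one toy winter season.
climatology = list(range(1, 101))
threshold_90 = percentile_threshold(climatology, 90)
winter = [85, 91, 95, 60, 99]
n_extreme_days = count_exceedances(winter, threshold_90)
```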
Practical Guidance for Conducting Mediation Analysis With Multiple Mediators Using Inverse Odds Ratio Weighting

PubMed Central

Nguyen, Quynh C.; Osypuk, Theresa L.; Schmidt, Nicole M.; Glymour, M. Maria; Tchetgen Tchetgen, Eric J.

2015-01-01

Despite the recent flourishing of mediation analysis techniques, many modern approaches are difficult to implement or applicable to only a restricted range of regression models. This report provides practical guidance for implementing a new technique utilizing inverse odds ratio weighting (IORW) to estimate natural direct and indirect effects for mediation analyses. IORW takes advantage of the odds ratio's invariance property and condenses information on the odds ratio for the relationship between the exposure (treatment) and multiple mediators, conditional on covariates, by regressing exposure on mediators and covariates. The inverse of the covariate-adjusted exposure-mediator odds ratio association is used to weight the primary analytical regression of the outcome on treatment. The treatment coefficient in such a weighted regression estimates the natural direct effect of treatment on the outcome, and indirect effects are identified by subtracting direct effects from total effects. Weighting renders treatment and mediators independent, thereby deactivating indirect pathways of the mediators. This new mediation technique accommodates multiple discrete or continuous mediators. IORW is easily implemented and is appropriate for any standard regression model, including quantile regression and survival analysis. An empirical example is given using data from the Moving to Opportunity (1994–2002) experiment, testing whether neighborhood context mediated the effects of a housing voucher program on obesity. Relevant Stata code (StataCorp LP, College Station, Texas) is provided. PMID:25693776
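The weighting step can be sketched as follows. This is an illustration under stated assumptions, not the authors' Stata code: it assumes the mediator coefficients come from an already fitted logistic regression of exposure on mediators and covariates, that exposed observations receive the inverse of the mediator odds ratio as weight, and that unexposed observations receive weight 1. All names and numbers are hypothetical.

```python
import math


def iorw_weights(exposure, mediators, mediator_log_odds_coefs):
    """Inverse odds ratio weights.

    exposure:                list of 0/1 treatment indicators
    mediators:               list of mediator value lists, one per observation
    mediator_log_odds_coefs: mediator coefficients (log odds ratios) from a
                             logistic regression of exposure on mediators
                             and covariates, assumed already estimated
    """
    weights = []
    for a, m in zip(exposure, mediators):
        if a == 1:
            log_or = sum(b * x for b, x in zip(mediator_log_odds_coefs, m))
            weights.append(math.exp(-log_or))  # inverse of the odds ratio
        else:
            weights.append(1.0)  # assumption: unexposed keep weight 1
    return weights


def natural_indirect_effect(total_effect, direct_effect):
    """Indirect effect obtained by subtracting the direct from the total effect."""
    return total_effect - direct_effect


w = iorw_weights([1, 0], [[2.0, 0.0], [3.0, 4.0]], [0.5, -0.25])
```

The weights would then be passed to a weighted regression of the outcome on treatment; the treatment coefficient from that fit is the direct effect that enters the subtraction.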
Forecasting the probability of future groundwater levels declining below specified low thresholds in the conterminous U.S.

USGS Publications Warehouse

Dudley, Robert W.; Hodgkins, Glenn A.; Dickinson, Jesse

2017-01-01

We present a logistic regression approach for forecasting the probability of future groundwater levels declining or maintaining below specific groundwater-level thresholds. We tested our approach on 102 groundwater wells in different climatic regions and aquifers of the United States that are part of the U.S. Geological Survey Groundwater Climate Response Network. We evaluated the importance of current groundwater levels, precipitation, streamflow, seasonal variability, Palmer Drought Severity Index, and atmosphere/ocean indices for developing the logistic regression equations. Several diagnostics of model fit were used to evaluate the regression equations, including testing of autocorrelation of residuals, goodness-of-fit metrics, and bootstrap validation testing. The probabilistic predictions were most successful at wells with high persistence (low month-to-month variability) in their groundwater records and at wells where the groundwater level remained below the defined low threshold for sustained periods (generally three months or longer). The model fit was weakest at wells with strong seasonal variability in levels and with shorter duration low-threshold events. We identified challenges in deriving probabilistic-forecasting models and possible approaches for addressing those challenges.
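A fitted logistic regression equation of this kind turns predictor values into a probability through the logistic function. The sketch below shows only that evaluation step. The coefficients, predictor choice, and function name are hypothetical and do not come from the study.

```python
import math


def probability_below_threshold(intercept, coefs, predictors):
    """Probability that the groundwater level is below the low threshold.

    intercept, coefs: fitted logistic-regression parameters (hypothetical)
    predictors:       predictor values, e.g. current level or precipitation
    """
    z = intercept + sum(b * x for b, x in zip(coefs, predictors))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical equation with two standardized predictors.
p_dry = probability_below_threshold(-1.0, [2.0, -0.5], [1.2, 0.4])
p_neutral = probability_below_threshold(0.0, [2.0, -0.5], [0.0, 0.0])
```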
Unipedal stance testing in the assessment of peripheral neuropathy.

PubMed

Hurvitz, E A; Richardson, J K; Werner, R A

2001-02-01

To define further the relation between unipedal stance testing and peripheral neuropathy. Prospective cohort. Electroneuromyography laboratory of a Veterans Affairs medical center and a university hospital. Ninety-two patients referred for lower extremity electrodiagnostic studies. A standardized history and physical examination designed to detect peripheral neuropathy, 3 trials of unipedal stance, and electrodiagnostic studies. Peripheral neuropathy was identified by electrodiagnostic testing in 32%. These subjects had a significantly shorter (p < .001) unipedal stance time (15.7 s, longest of 3 trials) than the patients without peripheral neuropathy (37.1 s). Abnormal unipedal stance time (<45 s) identified peripheral neuropathy with a sensitivity of 83% and a specificity of 71%, whereas a normal unipedal stance time had a negative predictive value of 90%. Abnormal unipedal stance time was associated with an increased risk of having peripheral neuropathy on univariate analysis (odds ratio = 8.8, 95% confidence interval = 2.5-31), and was the only significant predictor of peripheral neuropathy in the regression model. Aspects of the neurologic examination did not add to the regression model compared with abnormal unipedal stance time. Unipedal stance testing is useful in the clinical setting both to identify and to exclude the presence of peripheral neuropathy.
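The screening statistics quoted in this abstract all derive from a 2x2 table of test result against diagnosis. The sketch below shows the definitions. The counts are hypothetical and are not the study's data, which the abstract does not report.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Screening statistics from a 2x2 table.

    tp: abnormal test, disease present    fn: normal test, disease present
    fp: abnormal test, disease absent     tn: normal test, disease absent
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),  # negative predictive value
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts chosen only to illustrate the formulas.
summary = diagnostic_summary(tp=40, fn=10, fp=30, tn=70)
```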
Effect of temperature and precipitation on salmonellosis cases in South-East Queensland, Australia: an observational study.

PubMed

Stephen, Dimity Maree; Barnett, Adrian Gerard

2016-02-25

Foodborne illnesses in Australia, including salmonellosis, are estimated to cost over $A1.25 billion annually. The weather has been identified as being influential on salmonellosis incidence, as cases increase during summer; however, time series modelling of salmonellosis is challenging because outbreaks cause strong autocorrelation. This study assesses whether switching models is an improved method of estimating weather-salmonellosis associations. We analysed weather and salmonellosis in South-East Queensland between 2004 and 2013 using 2 common regression models and a switching model, each with 21-day lags for temperature and precipitation. The switching model best fit the data, as judged by its substantial improvement in deviance information criterion over the regression models, less autocorrelated residuals and control of seasonality. The switching model estimated a 5 °C increase in mean temperature and 10 mm precipitation were associated with increases in salmonellosis cases of 45.4% (95% CrI 40.4%, 50.5%) and 24.1% (95% CrI 17.0%, 31.6%), respectively. Switching models improve on traditional time series models in quantifying weather-salmonellosis associations. A better understanding of how temperature and precipitation influence salmonellosis may identify where interventions can be made to lower the health and economic costs of salmonellosis. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Estimation of aboveground biomass in Mediterranean forests by statistical modelling of ASTER fraction images

NASA Astrophysics Data System (ADS)

Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.

2014-09-01

Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (adjusted R² = 0.632; cross-validated root-mean-squared error of estimated AGB 13.3 Mg ha⁻¹, or 37.7%), outperforming other combinations of the independent variables cited above. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.
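The two fit statistics quoted for the best model follow standard definitions, sketched below. The adjusted R² formula assumes an ordinary linear model with an intercept; whether the authors computed it in exactly this way is not stated. The function names and toy numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def adjusted_r2(r2, n_obs, n_predictors):
    """R-squared penalized for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Toy numbers, not the study's plot data.
toy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
toy_adj = adjusted_r2(0.5, n_obs=11, n_predictors=2)
```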
A spatial regression procedure for evaluating the relationship between AVHRR-NDVI and climate in the northern Great Plains

USGS Publications Warehouse

Ji, Lei; Peters, Albert J.

2004-01-01

The relationship between vegetation and climate in the grassland and cropland of the northern US Great Plains was investigated with Normalized Difference Vegetation Index (NDVI) (1989–1993) images derived from the Advanced Very High Resolution Radiometer (AVHRR), and climate data from automated weather stations. The relationship was quantified using a spatial regression technique that adjusts for spatial autocorrelation inherent in these data. Conventional regression techniques used frequently in previous studies are not adequate, because they are based on the assumption of independent observations. Six climate variables during the growing season (precipitation, potential evapotranspiration, daily maximum and minimum air temperature, soil temperature, and solar irradiation) were regressed on NDVI derived from a 10-km weather station buffer. The regression model identified precipitation and potential evapotranspiration as the most significant climatic variables, indicating that the water balance is the most important factor controlling vegetation condition at an annual timescale. The model indicates that 46% and 24% of variation in NDVI is accounted for by climate in grassland and cropland, respectively, indicating that grassland vegetation has a more pronounced response to climate variation than cropland. Other factors contributing to NDVI variation include environmental factors (soil, groundwater and terrain), human manipulation of crops, and sensor variation.
Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA

NASA Astrophysics Data System (ADS)

Mair, Alan; El-Kadi, Aly I.

2013-10-01

Capture zone analysis combined with a subjective susceptibility index is currently used in Hawaii to assess vulnerability to contamination of drinking water sources derived from groundwater. In this study, we developed an alternative objective approach that combines well capture zones with multiple-variable logistic regression (LR) modeling and applied it to the highly-utilized Pearl Harbor and Honolulu aquifers on the island of Oahu, Hawaii. Input for the LR models utilized explanatory variables based on hydrogeology, land use, and well geometry/location. A suite of 11 target contaminants detected in the region, including elevated nitrate (> 1 mg/L), four chlorinated solvents, four agricultural fumigants, and two pesticides, was used to develop the models. We then tested the ability of the new approach to accurately separate groups of wells with low and high vulnerability, and the suitability of nitrate as an indicator of other types of contamination. Our results produced contaminant-specific LR models that accurately identified groups of wells with the lowest/highest reported detections and the lowest/highest nitrate concentrations. Current and former agricultural land uses were identified as significant explanatory variables for eight of the 11 target contaminants, while elevated nitrate was a significant variable for five contaminants. The utility of the combined approach is contingent on the availability of hydrologic and chemical monitoring data for calibrating groundwater and LR models. Application of the approach using a reference site with sufficient data could help identify key variables in areas with similar hydrogeology and land use but limited data. In addition, elevated nitrate may also be a suitable indicator of groundwater contamination in areas with limited data. The objective LR modeling approach developed in this study is flexible enough to address a wide range of contaminants and represents a suitable addition to the current subjective approach.
Identifying pollution sources and predicting urban air quality using ensemble learning methods

NASA Astrophysics Data System (ADS)

Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali

2013-12-01

In this study, principal components analysis (PCA) was performed to identify air pollution sources and tree based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using the air quality and meteorological databases pertaining to a period of five years. PCA identified vehicular emissions and fuel combustion as major air pollution sources. The air quality indices revealed that the air quality was unhealthy during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities, to identify the factors responsible for discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) models were constructed and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with a conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality, yielding misclassification rates (MR) of 8.32% (SDT); 4.12% (DTF); 5.62% (DTB), and 6.18% (SVM), respectively, in complete data. The AQI and CAQI regression models yielded a correlation between measured and predicted values and root mean squared error of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); 0.890, 7.00 and 0.836, 9.16 (SVR) in complete data. The DTF and DTB models outperformed the SVM both in classification and regression, which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.

PubMed

Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger

2018-04-19

Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. 
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
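The per-position probabilities in a multinomial logistic regression come from a softmax over the linear predictors of the outcome categories. The sketch below assumes a reference category (for example, no mutation) with linear predictor 0. The category layout and the numbers are hypothetical and are not estimates from the paper.

```python
import math


def multinomial_probabilities(linear_predictors):
    """Return category probabilities for one genomic position.

    linear_predictors: one linear predictor per non-reference category
                       (e.g. per mutation type), built from site-specific
                       explanatory variables
    The reference category has linear predictor 0 and is returned first.
    """
    scores = [0.0] + list(linear_predictors)
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical position where every mutation type is very unlikely.
probs = multinomial_probabilities([-14.0, -15.0, -15.5])
uniform = multinomial_probabilities([0.0, 0.0, 0.0])
```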
The relationship between biomechanical variables and driving performance during the golf swing.

PubMed

Chu, Yungchien; Sell, Timothy C; Lephart, Scott M

2010-09-01

Swing kinematic and ground reaction force data from 308 golfers were analysed to identify the variables important to driving ball velocity. Regression models were applied at four selected events in the swing. The models accounted for 44-74% of variance in ball velocity. Based on the regression analyses, upper torso-pelvis separation (the X-Factor), delayed release (i.e. the initiation of movement) of the arms and wrists, trunk forward and lateral tilting, and weight-shifting during the swing were significantly related to ball velocity. Our results also verify several general coaching ideas that were considered important to increased ball velocity. The results of this study may serve as both skill and strength training guidelines for golfers.
Systematic analysis of factors associated with progression and regression of ulcerative colitis in 918 patients.

PubMed

Safroneeva, E; Vavricka, S; Fournier, N; Seibold, F; Mottet, C; Nydegger, A; Ezri, J; Straumann, A; Rogler, G; Schoepfer, A M

2015-09-01

Studies that systematically assess change in ulcerative colitis (UC) extent over time in adult patients are scarce. To assess changes in disease extent over time and to evaluate clinical parameters associated with this change. Data from the Swiss IBD cohort study were analysed. We used logistic regression modelling to identify factors associated with a change in disease extent. A total of 918 UC patients (45.3% females) were included. At diagnosis, UC patients presented with the following disease extent: proctitis [199 patients (21.7%)], left-sided colitis [338 patients (36.8%)] and extensive colitis/pancolitis [381 (41.5%)]. During a median disease duration of 9 [4-16] years, progression and regression was documented in 145 patients (15.8%) and 149 patients (16.2%) respectively. In addition, 624 patients (68.0%) had a stable disease extent. The following factors were identified to be associated with disease progression: treatment with systemic glucocorticoids [odds ratio (OR) 1.704, P = 0.025] and calcineurin inhibitors (OR: 2.716, P = 0.005). No specific factors were found to be associated with disease regression. Over a median disease duration of 9 [4-16] years, about two-thirds of UC patients maintained the initial disease extent; the remaining one-third had experienced either progression or regression of the disease extent. © 2015 John Wiley & Sons Ltd.
Multiple Linear Regression Analysis of Factors Affecting Real Property Price Index From Case Study Research In Istanbul/Turkey

NASA Astrophysics Data System (ADS)

Denli, H. H.; Koc, Z.

2015-12-01

Estimating real property values according to standards is difficult to apply consistently across time and location. Regression analysis constructs mathematical models that describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. When regression analysis is applied to real estate valuation using properties offered on the actual market with their current characteristics and quantifiers, the method helps to find the factors or variables that are effective in forming the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with housing characteristics to find a price index, based on information received from a real estate web page. The variables used for the analysis are age, size in m2, number of floors in the building, floor number of the estate and number of rooms. The price of the estate represents the dependent variable, whereas the rest are independent variables. Prices of 60 properties were used for the analysis. Locations with equal price values were found and plotted on the map, and equivalence curves were drawn identifying zones of equal value as lines.
Inner and outer coronary vessel wall segmentation from CCTA using an active contour model with machine learning-based 3D voxel context-aware image force

NASA Astrophysics Data System (ADS)

Sivalingam, Udhayaraj; Wels, Michael; Rempfler, Markus; Grosskopf, Stefan; Suehling, Michael; Menze, Bjoern H.

2016-03-01

In this paper, we present a fully automated approach to coronary vessel segmentation, which involves calcification or soft plaque delineation in addition to accurate lumen delineation, from 3D Cardiac Computed Tomography Angiography data. Adequately virtualizing the coronary lumen plays a crucial role for simulating blood flow by means of fluid dynamics, while additionally identifying the outer vessel wall in the case of arteriosclerosis is a prerequisite for further plaque compartment analysis. Our method is a hybrid approach complementing Active Contour Model-based segmentation with an external image force that relies on a Random Forest Regression model generated off-line. The regression model provides a strong estimate of the distance to the true vessel surface for every surface candidate point taking into account 3D wavelet-encoded contextual image features, which are aligned with the current surface hypothesis. The associated external image force is integrated in the objective function of the active contour model, such that the overall segmentation approach benefits from the advantages associated with snakes and from the ones associated with machine learning-based regression alike. This yields an integrated approach achieving competitive results on a publicly available benchmark data collection (Rotterdam segmentation challenge).
Model selection for semiparametric marginal mean regression accounting for within-cluster subsampling variability and informative cluster size.

PubMed

Shen, Chung-Wei; Chen, Yi-Hau

2018-03-13

We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.

Development and validation of a mortality risk model for pediatric sepsis.

PubMed

Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan

2017-05-01

Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial. We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities. According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively. The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients.
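The discrimination figures quoted in this record (areas under the ROC curve of 0.854 and 0.844) can be illustrated with a minimal sketch. It assumes only that each patient has a linear predictor from a fitted logistic model and a 0/1 outcome. The logits and outcomes below are invented for illustration and are not taken from the study, and the function names are hypothetical.

```python
import math

def logistic(z):
    """Map a linear predictor (logit) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical logits and outcomes for six patients (not study data).
logits = [-2.0, -1.0, -0.5, 0.3, 1.2, 2.5]
died = [0, 0, 1, 0, 1, 1]
probs = [logistic(z) for z in logits]
model_auc = auc(probs, died)
```

Because the logistic function is monotone, the AUC is the same whether it is computed on logits or on probabilities; computing it separately on the training and validation groups gives the kind of paired figures reported in the abstract.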
Stratification for the propensity score compared with linear regression techniques to assess the effect of treatment or exposure.

PubMed

Senn, Stephen; Graf, Erika; Caputo, Angelika

2007-12-30

Stratifying and matching by the propensity score are increasingly popular approaches to deal with confounding in medical studies investigating effects of a treatment or exposure. A more traditional alternative technique is the direct adjustment for confounding in regression models. This paper discusses fundamental differences between the two approaches, with a focus on linear regression and propensity score stratification, and identifies points to be considered for an adequate comparison. The treatment estimators are examined for unbiasedness and efficiency. This is illustrated in an application to real data and supplemented by an investigation on properties of the estimators for a range of underlying linear models. We demonstrate that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model. As a consequence the coarsening property of the propensity score (adjustment for a one-dimensional confounder instead of a high-dimensional covariate) may be viewed as a way to implement a pre-specified, richly parametrized linear model. We conclude that the propensity score estimator inherits the potential for overfitting and that care should be taken to restrict covariates to those relevant for outcome. Copyright (c) 2007 John Wiley & Sons, Ltd.
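A minimal sketch of a stratified treatment estimator may help make the comparison in this record concrete. It assumes that units have already been assigned to propensity score strata (for example, quantiles of an estimated score), takes the treated-minus-untreated difference in mean outcome within each stratum, and averages those differences with weights proportional to stratum size. The data and the function name are hypothetical, and the paper's own estimator may differ in detail.

```python
from collections import defaultdict

def stratified_effect(records):
    """records: iterable of (stratum, treated, outcome) with treated in {0, 1}.
    Returns the stratum-size-weighted average of within-stratum
    differences in mean outcome (treated minus untreated)."""
    by_stratum = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, outcome in records:
        by_stratum[stratum][treated].append(outcome)

    total = 0
    weighted_sum = 0.0
    for groups in by_stratum.values():
        if not groups[0] or not groups[1]:
            continue  # no within-stratum comparison possible; stratum is skipped
        n = len(groups[0]) + len(groups[1])
        diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        weighted_sum += n * diff
        total += n
    if total == 0:
        raise ValueError("no stratum contains both treated and untreated units")
    return weighted_sum / total

# Hypothetical data: (propensity stratum, treated flag, outcome).
data = [
    ("low", 0, 10.0), ("low", 0, 12.0), ("low", 1, 13.0),
    ("high", 0, 20.0), ("high", 1, 22.0), ("high", 1, 24.0),
]
effect = stratified_effect(data)
```

In this toy data the unstratified difference in means (about 5.67) is larger than the stratified estimate (2.5), because treated units are concentrated in the stratum with higher outcomes.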
Development and validation of a mortality risk model for pediatric sepsis

PubMed Central

Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan

2017-01-01

Abstract Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial. We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities. According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively. The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients. PMID:28514310
Development of a real-time crash risk prediction model incorporating the various crash mechanisms across different traffic states.

PubMed

Xu, Chengcheng; Wang, Wei; Liu, Pan; Zhang, Fangwei

2015-01-01

This study aimed to identify the traffic flow variables contributing to crash risks under different traffic states and to develop a real-time crash risk model incorporating the varying crash mechanisms across different traffic states. The crash, traffic, and geometric data were collected on the I-880N freeway in California in 2008 and 2009. This study considered 4 different traffic states in Wu's 4-phase traffic theory. They are free fluid traffic, bunched fluid traffic, bunched congested traffic, and standing congested traffic. Several different statistical methods were used to accomplish the research objective. The preliminary analysis showed that traffic states significantly affected crash likelihood, collision type, and injury severity. Nonlinear canonical correlation analysis (NLCCA) was conducted to identify the underlying phenomena that made certain traffic states more hazardous than others. The results suggested that different traffic states were associated with various collision types and injury severities. The matching of traffic flow characteristics and crash characteristics in NLCCA revealed how traffic states affected traffic safety. The logistic regression analyses showed that the factors contributing to crash risks were quite different across various traffic states. To incorporate the varying crash mechanisms across different traffic states, random parameters logistic regression was used to develop a real-time crash risk model. Bayesian inference based on Markov chain Monte Carlo simulations was used for model estimation. The parameters of traffic flow variables in the model were allowed to vary across different traffic states. Compared with the standard logistic regression model, the proposed model significantly improved the goodness-of-fit and predictive performance. 
These results can promote a better understanding of the relationship between traffic flow characteristics and crash risks, which is valuable knowledge in the pursuit of improving traffic safety on freeways through the use of dynamic safety management systems.
A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA

USGS Publications Warehouse

Nolan, Bernard T.; Fienen, Michael N.; Lorenz, David L.

2015-01-01

We used a statistical learning framework to evaluate the ability of three machine-learning methods to predict nitrate concentration in shallow groundwater of the Central Valley, California: boosted regression trees (BRT), artificial neural networks (ANN), and Bayesian networks (BN). Machine learning methods can learn complex patterns in the data but because of overfitting may not generalize well to new data. The statistical learning framework involves cross-validation (CV) training and testing data and a separate hold-out data set for model evaluation, with the goal of optimizing predictive performance by controlling for model overfit. The order of prediction performance according to both CV testing R2 and that for the hold-out data set was BRT > BN > ANN. For each method we identified two models based on CV testing results: that with maximum testing R2 and a version with R2 within one standard error of the maximum (the 1SE model). The former yielded CV training R2 values of 0.94–1.0. Cross-validation testing R2 values indicate predictive performance, and these were 0.22–0.39 for the maximum R2 models and 0.19–0.36 for the 1SE models. Evaluation with hold-out data suggested that the 1SE BRT and ANN models predicted better for an independent data set compared with the maximum R2 versions, which is relevant to extrapolation by mapping. Scatterplots of predicted vs. observed hold-out data obtained for final models helped identify prediction bias, which was fairly pronounced for ANN and BN. Lastly, the models were compared with multiple linear regression (MLR) and a previous random forest regression (RFR) model. Whereas BRT results were comparable to RFR, MLR had low hold-out R2 (0.07) and explained less than half the variation in the training data. Spatial patterns of predictions by the final, 1SE BRT model agreed reasonably well with previously observed patterns of nitrate occurrence in groundwater of the Central Valley.
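The "1SE model" described in this record can be sketched as a simple selection rule: find the candidate with the highest mean cross-validation testing R2, then choose the least complex candidate whose mean R2 is within one standard error of that maximum. The sketch below assumes a hypothetical complexity measure (number of trees) and invented R2 values; the abstract does not state how complexity was ordered.

```python
def one_se_choice(candidates):
    """candidates: list of (name, complexity, mean_cv_r2, se_cv_r2).
    Returns (best_name, one_se_name): the candidate with the highest mean
    CV R2, and the least complex candidate whose mean CV R2 is within one
    standard error of that maximum."""
    best = max(candidates, key=lambda c: c[2])
    threshold = best[2] - best[3]
    eligible = [c for c in candidates if c[2] >= threshold]
    simplest = min(eligible, key=lambda c: c[1])
    return best[0], simplest[0]

# Hypothetical candidates (not values from the study).
models = [
    ("trees=100", 100, 0.30, 0.03),
    ("trees=500", 500, 0.37, 0.03),
    ("trees=2000", 2000, 0.39, 0.03),
]
best_name, one_se_name = one_se_choice(models)
```

The rule trades a small loss in cross-validated fit for a simpler model, which matches the abstract's observation that some 1SE models predicted better on independent hold-out data.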
Studying Individual Differences in Predictability with Gamma Regression and Nonlinear Multilevel Models

ERIC Educational Resources Information Center

Culpepper, Steven Andrew

2010-01-01

Statistical prediction remains an important tool for decisions in a variety of disciplines. An equally important issue is identifying factors that contribute to more or less accurate predictions. The time series literature includes well developed methods for studying predictability and volatility over time. This article develops…
Modeling enterococcus densities measured by quantitative polymerase chain reaction and membrane filtration using environmental conditions at four Great Lakes beaches

EPA Science Inventory

Data collected by the US Environmental Protection Agency (EPA) during the summer months of 2003 and 2004 at four US Great Lakes beaches were analyzed using regression analysis to identify relationships between meteorological, physical water characteristics, and beach characterist...
Predictors of Adolescent Breakfast Consumption: Longitudinal Findings from Project EAT

ERIC Educational Resources Information Center

Bruening, Meg; Larson, Nicole; Story, Mary; Neumark-Sztainer, Dianne; Hannan, Peter

2011-01-01

Objective: To identify predictors of breakfast consumption among adolescents. Methods: Five-year longitudinal study Project EAT (Eating Among Teens). Baseline surveys were completed in Minneapolis-St. Paul schools and by mail at follow-up by youth (n = 800) transitioning from middle to high school. Linear regression models examined associations…
Meta-Analysis: An Introduction Using Regression Models

ERIC Educational Resources Information Center

Rhodes, William

2012-01-01

Research synthesis of evaluation findings is a multistep process. An investigator identifies a research question, acquires the relevant literature, codes findings from that literature, and analyzes the coded data to estimate the average treatment effect and its distribution in a population of interest. The process of estimating the average…
Determination of riverbank erosion probability using Locally Weighted Logistic Regression

NASA Astrophysics Data System (ADS)

Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos

2015-04-01

Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. 
The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure for goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the Logistic Regression model efficiency and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two different types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
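The abstract names two spatial dependence functions, exponential and tricubic, without giving their formulas. The sketch below uses common textbook forms as an assumption, with a hypothetical bandwidth `h`, and shows how such weights would enter a locally weighted Bernoulli log-likelihood for a one-predictor logistic model. All names and numbers are illustrative only.

```python
import math

def exponential_weight(d, h):
    """Exponential distance-decay weight: exp(-d / h)."""
    return math.exp(-d / h)

def tricube_weight(d, h):
    """Tricube weight: (1 - (d/h)^3)^3 inside the bandwidth, 0 outside."""
    u = abs(d) / h
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def weighted_log_likelihood(beta0, beta1, xs, ys, weights):
    """Locally weighted Bernoulli log-likelihood for a one-predictor
    logistic model; maximising it at each target location would give
    location-specific coefficients."""
    total = 0.0
    for x, y, w in zip(xs, ys, weights):
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        total += w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total

distances = [0.0, 0.5, 1.0, 2.0]  # hypothetical distances to a target site
h = 1.5                           # hypothetical bandwidth, same units
w_exp = [exponential_weight(d, h) for d in distances]
w_tri = [tricube_weight(d, h) for d in distances]
```

The practical difference is that the tricube weight drops to exactly zero beyond the bandwidth, so distant observations are excluded, whereas the exponential weight only shrinks them.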
Spatial quantile regression using INLA with applications to childhood overweight in Malawi.

PubMed

Mtambo, Owen P L; Masangwi, Salule J; Kazembe, Lawrence N M

2015-04-01

Analyses of childhood overweight have mainly used mean regression. However, using quantile regression is more appropriate as it provides flexibility to analyse the determinants of overweight corresponding to quantiles of interest. The main objective of this study was to fit a Bayesian additive quantile regression model with structured spatial effects for childhood overweight in Malawi using the 2010 Malawi DHS data. Inference was fully Bayesian using the R-INLA package. The significant determinants of childhood overweight ranged from socio-demographic factors such as type of residence to child and maternal factors such as child age and maternal BMI. We observed significant positive structured spatial effects on childhood overweight in some districts of Malawi. We recommend that childhood malnutrition policy makers consider timely interventions based on the risk factors identified in this paper, including spatial targets of interventions. Copyright © 2015 Elsevier Ltd. All rights reserved.
Probabilistic Forecasting of Surface Ozone with a Novel Statistical Approach

NASA Technical Reports Server (NTRS)

Balashov, Nikolay V.; Thompson, Anne M.; Young, George S.

2017-01-01

The recent change in the Environmental Protection Agency's surface ozone regulation, lowering the surface ozone daily maximum 8-h average (MDA8) exceedance threshold from 75 to 70 ppbv, poses significant challenges to U.S. air quality (AQ) forecasters responsible for ozone MDA8 forecasts. The forecasters, supplied by only a few AQ model products, end up relying heavily on self-developed tools. To help U.S. AQ forecasters, this study explores a surface ozone MDA8 forecasting tool that is based solely on statistical methods and standard meteorological variables from the numerical weather prediction (NWP) models. The model combines the self-organizing map (SOM), which is a clustering technique, with a stepwise weighted quadratic regression using meteorological variables as predictors for ozone MDA8. The SOM method identifies different weather regimes, to distinguish between various modes of ozone variability, and groups them according to similarity. In this way, when a regression is developed for a specific regime, data from the other regimes are also used, with weights that are based on their similarity to this specific regime. This approach, regression in SOM (REGiS), yields a distinct model for each regime taking into account both the training cases for that regime and other similar training cases. To produce probabilistic MDA8 ozone forecasts, REGiS weighs and combines all of the developed regression models on the basis of the weather patterns predicted by an NWP model. REGiS is evaluated over the San Joaquin Valley in California and the northeastern plains of Colorado. The results suggest that the model performs best when trained and adjusted separately for an individual AQ station and its corresponding meteorological site.
Heterogeneity in the Relationship of Substance Use to Risky Sexual Behavior Among Justice-Involved Youth: A Regression Mixture Modeling Approach.

PubMed

Schmiege, Sarah J; Bryan, Angela D

2016-04-01

Justice-involved adolescents engage in high levels of risky sexual behavior and substance use, and understanding potential relationships among these constructs is important for effective HIV/STI prevention. A regression mixture modeling approach was used to determine whether subgroups could be identified based on the regression of two indicators of sexual risk (condom use and frequency of intercourse) on three measures of substance use (alcohol, marijuana and hard drugs). Three classes were observed among n = 596 adolescents on probation: none of the substances predicted outcomes for approximately 18 % of the sample; alcohol and marijuana use were predictive for approximately 59 % of the sample, and marijuana use and hard drug use were predictive in approximately 23 % of the sample. Demographic, individual difference, and additional sexual and substance use risk variables were examined in relation to class membership. Findings are discussed in terms of understanding profiles of risk behavior among at-risk youth.
Hydrological modeling of geophysical parameters of arboviral and protozoan disease vectors in Internally Displaced People camps in Gulu, Uganda.

PubMed

Jacob, Benjamin G; Muturi, Ephantus J; Caamano, Erick X; Gunter, James T; Mpanga, Enoch; Ayine, Robert; Okelloonen, Joseph; Nyeko, Jack Pen-Mogi; Shililu, Josephat I; Githure, John I; Regens, James L; Novak, Robert J; Kakoma, Ibulaimu

2008-03-14

The aim of this study was to determine if remotely sensed data and Digital Elevation Model (DEM) can test relationships between Culex quinquefasciatus and Anopheles gambiae s.l. larval habitats and environmental parameters within Internally Displaced People (IDP) campgrounds in Gulu, Uganda. A total of 65 georeferenced aquatic habitats in various IDP camps were studied to compare the larval abundance of Cx. quinquefasciatus and An. gambiae s.l. The aquatic habitat dataset was overlaid onto Land Use Land Cover (LULC) maps retrieved from Landsat imagery with 150 m x 150 m grid cells stratified by levels of drainage. The LULC change was estimated over a period of 14 years. Poisson regression analyses and Moran's I statistics were used to model relationships between larval abundance and environmental predictors. Individual larval habitat data were further evaluated in terms of their covariations with spatial autocorrelation by regressing them on candidate spatial filter eigenvectors. Multispectral QuickBird imagery classification and DEM-based GIS methods were generated to evaluate stream flow direction and accumulation for identification of immature Cx. quinquefasciatus and An. gambiae s.l. and abundance. The main LULC change in urban Gulu IDP camps was non-urban to urban, which included about 71.5% of the land cover. The regression models indicate that counts of An. gambiae s.l. larvae were associated with shade while Cx. quinquefasciatus were associated with floating vegetation. Moran's I and the General G statistics for mosquito density by species and instars identified significant clusters of high densities of Anopheles larvae; however, Culex are not consistently clustered. A stepwise negative binomial regression decomposed the immature An. gambiae s.l. data into empirical orthogonal bases. The data suggest the presence of roughly 11% to 28% redundant information in the larval count samples. The DEM suggests a positive correlation for Culex (0.24), while for Anopheles there was a negative correlation (-0.23), for a local model of distance to stream. These data demonstrate that optical remote sensing, geostatistics, and DEMs can be used to identify parameters associated with Culex and Anopheles aquatic habitats.
Prediction of Compressional Wave Velocity Using Regression and Neural Network Modeling and Estimation of Stress Orientation in Bokaro Coalfield, India

NASA Astrophysics Data System (ADS)

Paul, Suman; Ali, Muhammad; Chatterjee, Rima

2018-01-01

Compressional wave velocity (Vp) of coal and non-coal lithology is predicted from five wells in the Bokaro coalfield (CF), India. Shear sonic travel time logs are not recorded for all wells in the study area. Shear wave velocity (Vs) is available only for two wells: one from east and the other from west Bokaro CF. The major lithologies of this CF are dominated by coal and shaly coal of the Barakar formation. This paper focuses on (a) the relationship between Vp and Vs, (b) prediction of Vp using regression and neural network modeling and (c) estimation of maximum horizontal stress from image log. Coal is characterized by low acoustic impedance (AI) compared to the overlying and underlying strata. The cross-plot between AI and Vp/Vs is able to identify coal, shaly coal, shale and sandstone from wells in Bokaro CF. The relationship between Vp and Vs is obtained with excellent goodness of fit (R2) ranging from 0.90 to 0.93. Linear multiple regression and multi-layered feed-forward neural network (MLFN) models are developed for predicting Vp from two wells using four input log parameters: gamma ray, resistivity, bulk density and neutron porosity. Vp predicted by the regression model shows a poor (R2 = 0.28) to good (R2 = 0.79) fit with the observed velocity. Vp predicted by the MLFN model shows satisfactory to good R2 values, varying from 0.62 to 0.92, with the observed velocity. Maximum horizontal stress orientation from a well at west Bokaro CF is studied from the Formation Micro-Imager (FMI) log. Breakouts and drilling-induced fractures (DIFs) are identified from the FMI log. A breakout length of 4.5 m is oriented towards N60°W, whereas the orientation of DIFs over a cumulative length of 26.5 m varies from N15°E to N35°E. The mean maximum horizontal stress in this CF is towards N28°E.
Hydrological modeling of geophysical parameters of arboviral and protozoan disease vectors in Internally Displaced People camps in Gulu, Uganda

PubMed Central

Jacob, Benjamin G; Muturi, Ephantus J; Caamano, Erick X; Gunter, James T; Mpanga, Enoch; Ayine, Robert; Okelloonen, Joseph; Nyeko, Jack Pen-Mogi; Shililu, Josephat I; Githure, John I; Regens, James L; Novak, Robert J; Kakoma, Ibulaimu

2008-01-01

Background: The aim of this study was to determine if remotely sensed data and Digital Elevation Model (DEM) can test relationships between Culex quinquefasciatus and Anopheles gambiae s.l. larval habitats and environmental parameters within Internally Displaced People (IDP) campgrounds in Gulu, Uganda. A total of 65 georeferenced aquatic habitats in various IDP camps were studied to compare the larval abundance of Cx. quinquefasciatus and An. gambiae s.l. The aquatic habitat dataset was overlaid onto Land Use Land Cover (LULC) maps retrieved from Landsat imagery with 150 m × 150 m grid cells stratified by levels of drainage. The LULC change was estimated over a period of 14 years. Poisson regression analyses and Moran's I statistics were used to model relationships between larval abundance and environmental predictors. Individual larval habitat data were further evaluated in terms of their covariations with spatial autocorrelation by regressing them on candidate spatial filter eigenvectors. Multispectral QuickBird imagery classification and DEM-based GIS methods were generated to evaluate stream flow direction and accumulation for identification of immature Cx. quinquefasciatus and An. gambiae s.l. and abundance. Results: The main LULC change in urban Gulu IDP camps was non-urban to urban, which included about 71.5% of the land cover. The regression models indicate that counts of An. gambiae s.l. larvae were associated with shade while Cx. quinquefasciatus were associated with floating vegetation. Moran's I and the General G statistics for mosquito density by species and instars identified significant clusters of high densities of Anopheles larvae; however, Culex are not consistently clustered. A stepwise negative binomial regression decomposed the immature An. gambiae s.l. data into empirical orthogonal bases. The data suggest the presence of roughly 11% to 28% redundant information in the larval count samples. The DEM suggests a positive correlation for Culex (0.24), while for Anopheles there was a negative correlation (-0.23), for a local model of distance to stream. Conclusion: These data demonstrate that optical remote sensing, geostatistics, and DEMs can be used to identify parameters associated with Culex and Anopheles aquatic habitats. PMID:18341699
Risk of hemorrhagic transformation after ischemic stroke in patients with antiphospholipid antibody syndrome.

PubMed

Mehta, Tapan; Hussain, Mohammed; Sheth, Khushboo; Ding, Yuchuan; McCullough, Louise D

2017-06-01

Several rheumatologic conditions including systemic lupus erythematosus, antiphospholipid antibody (APS) syndrome, rheumatoid arthritis, and scleroderma are known risk factors for stroke. The risk of hemorrhagic transformation after an acute ischemic stroke (AIS) in these patients is not known. We queried the Nationwide Inpatient Sample (NIS) data between 2010 and 2012 with ICD 9 diagnostic codes for AIS. The primary outcome was the development of hemorrhagic transformation. Multivariate predictors for hemorrhagic transformation were identified with a logistic regression model. Using SAS 9.2, Survey procedures were used to accommodate for hierarchical two stage cluster design of NIS. APS (OR 2.57, 95% CI 1.14-5.81, p = 0.0228) independently predicted risk of hemorrhagic transformation in multivariate regression analysis. Similarly, in multivariate regression models for the outcome variables of total charges of the hospitalization and length of stay (LOS), patients with APS had the highest charges ($56,286, p = 0.0228) and LOS (3.87 days, p = 0.0164) compared to other co-variates. Univariate analysis showed increased mortality in the APS compared to the non-APS group (11.68% vs. 7.16%, p = 0.0024). APS is an independent risk factor for hemorrhagic transformation in both thrombolytic and non-thrombolytic treated patients. APS is also associated with longer length and cost of hospital stay. Further research is warranted to identify the unique risk factors in these patients to identify strategies to reduce the risk of hemorrhagic transformation in this subgroup of the population.
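As a rough consistency check on the odds ratio reported in this record (OR 2.57, 95% CI 1.14-5.81), the sketch below back-calculates the standard error of the log odds ratio, the z statistic, and a two-sided p-value. It assumes a symmetric Wald interval on the log scale and a normal reference distribution; the survey procedures used in the study may compute variances and degrees of freedom differently, so this is an approximation and not the authors' calculation. The function name is hypothetical.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-odds standard error, z statistic and two-sided
    p-value implied by an odds ratio and its 95% Wald interval
    (normal approximation assumed)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return se, z, p

# Values as reported in the abstract.
se, z, p = wald_from_ci(2.57, 1.14, 5.81)
```

Under these assumptions the implied p-value is roughly 0.02, in line with the reported p = 0.0228; small gaps are expected from rounding of the published interval.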
Toward Automating HIV Identification: Machine Learning for Rapid Identification of HIV-Related Social Media Data.

PubMed

Young, Sean D; Yu, Wenchao; Wang, Wei

2017-02-01

"Social big data" from technologies such as social media, wearable devices, and online searches continue to grow and can be used as tools for HIV research. Although researchers can uncover patterns and insights associated with HIV trends and transmission, the review process is time consuming and resource intensive. Machine learning methods derived from computer science might be used to assist HIV domain experts by learning how to rapidly and accurately identify patterns associated with HIV from a large set of social data. Using an existing social media data set that was associated with HIV and coded by an HIV domain expert, we tested whether 4 commonly used machine learning methods could learn the patterns associated with HIV risk behavior. We used the 10-fold cross-validation method to examine the speed and accuracy of these models in applying that knowledge to detect HIV content in social media data. Logistic regression and random forest resulted in the highest accuracy in detecting HIV-related social data (85.3%), whereas the Ridge Regression Classifier resulted in the lowest accuracy. Logistic regression yielded the fastest processing time (16.98 seconds). Machine learning can enable social big data to become a new and important tool in HIV research, helping to create a new field of "digital HIV epidemiology." If a domain expert can identify patterns in social data associated with HIV risk or HIV transmission, machine learning models could quickly and accurately learn those associations and identify potential HIV patterns in large social data sets.
Prediction of Short-Distance Aerial Movement of Phakopsora pachyrhizi Urediniospores Using Machine Learning.

PubMed

Wen, L; Bowen, C R; Hartman, G L

2017-10-01

Dispersal of urediniospores by wind is the primary means of spread for Phakopsora pachyrhizi, the cause of soybean rust. Our research focused on the short-distance movement of urediniospores from within the soybean canopy and up to 61 m from field-grown rust-infected soybean plants. Environmental variables were used to develop and compare models including the least absolute shrinkage and selection operator regression, zero-inflated Poisson/regular Poisson regression, random forest, and neural network to describe deposition of urediniospores collected in passive and active traps. All four models identified distance of trap from source, humidity, temperature, wind direction, and wind speed as the five most important variables influencing short-distance movement of urediniospores. The random forest model provided the best predictions, explaining 76.1 and 86.8% of the total variation in the passive- and active-trap datasets, respectively. The prediction accuracy, based on the correlation coefficient (r) between predicted values and the true values, was 0.83 (P < 0.0001) and 0.94 (P < 0.0001) for the passive- and active-trap datasets, respectively. Overall, multiple machine learning techniques identified the most important variables for accurately predicting the short-distance movement of P. pachyrhizi urediniospores.
Relationship of physiography and snow area to stream discharge. [Kings River Watershed, California]

NASA Technical Reports Server (NTRS)

Mccuen, R. H. (Principal Investigator)

1979-01-01

The author has identified the following significant results. A comparison of snowmelt runoff models shows that the accuracy of the Tangborn model and regression models is greater if the test data fall within the range of the calibration data than if they lie outside it. The regression models are significantly more accurate for forecasts of 60 days or more than for shorter prediction periods. The Tangborn model is more accurate for forecasts of 90 days or more than for shorter prediction periods. The Martinec model is more accurate for forecasts of one or two days than for periods of 3, 5, 10, or 15 days. Accuracy of the long-term models seems to be independent of forecast data. The sufficiency of the calibration data base is a function not only of the number of years of record but also of the accuracy with which the calibration years represent the total population of data years. Twelve years appears to be a sufficient length of record for each of the models considered, as long as the twelve years are representative of the population.

Assessing NARCCAP climate model effects using spatial confidence regions.

PubMed

French, Joshua P; McGinnis, Seth; Schwartzman, Armin

2017-01-01

We assess similarities and differences between model effects for the North American Regional Climate Change Assessment Program (NARCCAP) climate models using varying classes of linear regression models. Specifically, we consider how the average temperature effect differs for the various global and regional climate model combinations, including assessment of possible interaction between the effects of global and regional climate models. We use both pointwise and simultaneous inference procedures to identify regions where global and regional climate model effects differ. We also show conclusively that results from pointwise inference are misleading, and that accounting for multiple comparisons is important for making proper inference.
Risk factors for pedicled flap necrosis in hand soft tissue reconstruction: a multivariate logistic regression analysis.

PubMed

Gong, Xu; Cui, Jianli; Jiang, Ziping; Lu, Laijin; Li, Xiucun

2018-03-01

Few clinical retrospective studies have reported the risk factors of pedicled flap necrosis in hand soft tissue reconstruction. The aim of this study was to identify non-technical risk factors associated with pedicled flap perioperative necrosis in hand soft tissue reconstruction via a multivariate logistic regression analysis. For patients with hand soft tissue reconstruction, we carefully reviewed hospital records and identified 163 patients who met the inclusion criteria. The characteristics of these patients, flap transfer procedures and postoperative complications were recorded. Eleven predictors were identified. The correlations between pedicled flap necrosis and risk factors were analysed using a logistic regression model. Of 163 skin flaps, 125 flaps survived completely without any complications. The pedicled flap necrosis rate in hands was 11.04%, which included partial flap necrosis (7.36%) and total flap necrosis (3.68%). Soft tissue defects in fingers were noted in 68.10% of all cases. The logistic regression analysis indicated that the soft tissue defect site (P = 0.046, odds ratio (OR) = 0.079, confidence interval (CI) (0.006, 0.959)), flap size (P = 0.020, OR = 1.024, CI (1.004, 1.045)) and postoperative wound infection (P < 0.001, OR = 17.407, CI (3.821, 79.303)) were statistically significant risk factors for pedicled flap necrosis of the hand. Soft tissue defect site, flap size and postoperative wound infection were risk factors associated with pedicled flap necrosis in hand soft tissue defect reconstruction. © 2017 Royal Australasian College of Surgeons.
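The odds ratios and intervals quoted above are related to the underlying logistic regression coefficient by OR = exp(β), with a Wald-type interval exp(β ± z·SE). The following is a minimal sketch, assuming the reported intervals are 95% Wald intervals (the abstract does not say how they were computed); the function names are hypothetical. It works backwards from the reported flap-size figures (OR = 1.024, CI 1.004 to 1.045) to β and SE, then rebuilds the interval as a consistency check.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_or(or_point, ci_low, ci_high):
    """Recover (beta, se) from a reported odds ratio and its 95% Wald CI."""
    beta = math.log(or_point)
    # On the log-odds scale the interval is symmetric with half-width Z_95 * se.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def or_with_ci(beta, se):
    """Map a coefficient and its standard error back to (OR, lower, upper)."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))


# Figures reported for flap size in the abstract above.
beta, se = wald_from_or(1.024, 1.004, 1.045)
or_point, lower, upper = or_with_ci(beta, se)
print(f"beta={beta:.4f} se={se:.4f} OR={or_point:.3f} CI=({lower:.3f}, {upper:.3f})")
```

The rebuilt bounds differ from the published ones only in the fourth decimal place, which is what rounding of the published values would produce if the Wald assumption holds.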
Development of hybrid genetic-algorithm-based neural networks using regression trees for modeling air quality inside a public transportation bus.

PubMed

Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok

2013-02-01

The present study developed a novel approach to modeling indoor air quality (IAQ) of a public transportation bus by the development of hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized using regression trees, referred to as the GART approach. This study validated the applicability of the GART modeling approach in solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 μm sized particle numbers, 0.4-0.5 μm sized particle numbers, particulate matter (PM) concentrations less than 1.0 μm (PM1.0), and PM concentrations less than 2.5 μm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, the analysis of variance was used as a complementary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models. The models developed were regression tree-based back-propagation network (BPN-RT), regression tree-based radial basis function network (RBFN-RT), and GART models. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with the results obtained from using a theoretical approach and a generalized practicable approach to modeling IAQ that included the consideration of additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture the majority of the variance in the monitored in-bus contaminants. The genetic-algorithm-based neural network IAQ models outperformed the traditional ANN methods of the back-propagation and the radial basis function networks. The novelty of this research is the development of a novel approach to modeling vehicular indoor air quality by integration of the advanced methods of genetic algorithms, regression trees, and the analysis of variance for the monitored in-vehicle gaseous and particulate matter contaminants, and comparing the results obtained from using the developed approach with conventional artificial intelligence techniques of back propagation networks and radial basis function networks. This study validated the newly developed approach using holdout and threefold cross-validation methods. These results are of great interest to scientists, researchers, and the public in understanding the various aspects of modeling an indoor microenvironment. This methodology can easily be extended to other fields of study also.
Developmental trajectories of paediatric headache - sex-specific analyses and predictors.

PubMed

Isensee, Corinna; Fernandez Castelao, Carolin; Kröner-Herwig, Birgit

2016-01-01

Headache is the most common pain disorder in children and adolescents and is associated with diverse dysfunctions and psychological symptoms. Several studies have shown sex-specific differences in headache frequency. To date, no study has examined sex-specific patterns of change in paediatric headache across time while including pain-related somatic and (socio-)psychological predictors. Latent Class Growth Analysis (LCGA) was used in order to identify different trajectory classes of headache across four annual time points in a population-based sample (n = 3,227; mean age 11.34 years; 51.2% girls). In multinomial logistic regression analyses the influence of several predictors on the class membership was examined. For girls, a four-class model was identified as the best fitting model. While the majority of girls reported no (30.5%) or moderate headache frequencies (32.5%) across time, one class with a high level of headache days (20.8%) and a class with an increasing headache frequency across time (16.2%) were identified. For boys a two-class model with a 'no headache class' (48.6%) and 'moderate headache class' (51.4%) showed the best model fit. Regarding logistic regression analyses, migraine and parental headache proved to be stable predictors across sexes. Depression/anxiety was a significant predictor for all pain classes in girls. Life events, dysfunctional stress coping and school burden were also able to differentiate at least between some classes in both sexes. The identified trajectories reflect sex-specific differences in paediatric headache, as seen in the number and type of classes extracted. The documented risk factors can deliver ideas for preventive actions and considerations for treatment programmes.
Deciphering factors controlling groundwater arsenic spatial variability in Bangladesh

NASA Astrophysics Data System (ADS)

Tan, Z.; Yang, Q.; Zheng, C.; Zheng, Y.

2017-12-01

Elevated concentrations of geogenic arsenic in groundwater have been found in many countries to exceed 10 μg/L, the WHO's guideline value for drinking water. A common yet unexplained characteristic of groundwater arsenic spatial distribution is the extensive variability at various spatial scales. This study investigates factors influencing the spatial variability of groundwater arsenic in Bangladesh to improve the accuracy of models predicting arsenic exceedance rate spatially. A novel boosted regression tree method is used to establish a weak-learning ensemble model, which is compared to a linear model using a conventional stepwise logistic regression method. The boosted regression tree models offer the advantage of parametric interaction when big datasets are analyzed in comparison to the logistic regression. The point data set (n=3,538) of groundwater hydrochemistry with 19 parameters was obtained by the British Geological Survey in 2001. The spatial data sets of geological parameters (n=13) were from the Consortium for Spatial Information, Technical University of Denmark, University of East Anglia and the FAO, while the soil parameters (n=42) were from the Harmonized World Soil Database. The aforementioned parameters were regressed to categorical groundwater arsenic concentrations below or above three thresholds (5 μg/L, 10 μg/L and 50 μg/L) to identify the respective controlling factors. The boosted regression tree method outperformed the logistic regression method at all three threshold levels in terms of accuracy, specificity and sensitivity, resulting in an improved spatial distribution map of the probability of groundwater arsenic exceeding all three thresholds when compared to a disjunctive-kriging interpolated spatial arsenic map using the same groundwater arsenic dataset. Boosted regression tree models also show that the most important controlling factors of groundwater arsenic distribution include groundwater iron content and well depth for all three thresholds. The probability of a well with iron content higher than 5 mg/L containing more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be more than 91%, 85% and 51%, respectively, while the probability of a well deeper than 160 m containing more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be less than 38%, 25% and 14%, respectively.
Chemokine receptors CXCR2 and CX3CR1 differentially regulate functional responses of bone-marrow endothelial progenitors during atherosclerotic plaque regression.

PubMed

Herlea-Pana, Oana; Yao, Longbiao; Heuser-Baker, Janet; Wang, Qiongxin; Wang, Qilong; Georgescu, Constantin; Zou, Ming-Hui; Barlic-Dicen, Jana

2015-05-01

Atherosclerosis manifests itself as arterial plaques, which lead to heart attacks or stroke. Treatments supporting plaque regression are therefore aggressively pursued. Studies conducted in models in which hypercholesterolaemia is reversible, such as the Reversa mouse model we have employed in the current studies, will be instrumental for the development of such interventions. Using this model, we have shown that advanced atherosclerosis regression occurs when lipid lowering is used in combination with bone-marrow endothelial progenitor cell (EPC) treatment. However, it remains unclear how EPCs home to regressing plaques and how they augment atherosclerosis reversal. Here we identify molecules that support functional responses of EPCs during plaque resolution. Chemokines CXCL1 and CX3CL1 were detected in the vascular wall of atheroregressing Reversa mice, and their cognate receptors CXCR2 and CX3CR1 were observed on adoptively transferred EPCs in circulation. We tested whether CXCL1-CXCR2 and CX3CL1-CX3CR1 axes regulate functional responses of EPCs during plaque reversal. We show that pharmacological inhibition of CXCR2 or CX3CR1, or genetic inactivation of these two chemokine receptors interfered with EPC-mediated advanced atherosclerosis regression. We also demonstrate that CXCR2 directs EPCs to regressing plaques while CX3CR1 controls a paracrine function(s) of these cells. CXCR2 and CX3CR1 differentially regulate EPC functional responses during atheroregression. Our study improves understanding of how chemokines and chemokine receptors regulate plaque resolution, which could determine the effectiveness of interventions reducing complications of atherosclerosis. Published on behalf of the European Society of Cardiology. All rights reserved. © The Author 2015.
Chemokine receptors CXCR2 and CX3CR1 differentially regulate functional responses of bone-marrow endothelial progenitors during atherosclerotic plaque regression

PubMed Central

Herlea-Pana, Oana; Yao, Longbiao; Heuser-Baker, Janet; Wang, Qiongxin; Wang, Qilong; Georgescu, Constantin; Zou, Ming-Hui; Barlic-Dicen, Jana

2015-01-01

Aims: Atherosclerosis manifests itself as arterial plaques, which lead to heart attacks or stroke. Treatments supporting plaque regression are therefore aggressively pursued. Studies conducted in models in which hypercholesterolaemia is reversible, such as the Reversa mouse model we have employed in the current studies, will be instrumental for the development of such interventions. Using this model, we have shown that advanced atherosclerosis regression occurs when lipid lowering is used in combination with bone-marrow endothelial progenitor cell (EPC) treatment. However, it remains unclear how EPCs home to regressing plaques and how they augment atherosclerosis reversal. Here we identify molecules that support functional responses of EPCs during plaque resolution. Methods and results: Chemokines CXCL1 and CX3CL1 were detected in the vascular wall of atheroregressing Reversa mice, and their cognate receptors CXCR2 and CX3CR1 were observed on adoptively transferred EPCs in circulation. We tested whether CXCL1–CXCR2 and CX3CL1–CX3CR1 axes regulate functional responses of EPCs during plaque reversal. We show that pharmacological inhibition of CXCR2 or CX3CR1, or genetic inactivation of these two chemokine receptors interfered with EPC-mediated advanced atherosclerosis regression. We also demonstrate that CXCR2 directs EPCs to regressing plaques while CX3CR1 controls a paracrine function(s) of these cells. Conclusion: CXCR2 and CX3CR1 differentially regulate EPC functional responses during atheroregression. Our study improves understanding of how chemokines and chemokine receptors regulate plaque resolution, which could determine the effectiveness of interventions reducing complications of atherosclerosis. PMID:25765938
Regression analysis for LED color detection of visual-MIMO system

NASA Astrophysics Data System (ADS)

Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo

2018-04-01

Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region, and detect the LED using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. In this paper, we show our results for three types of environmental light conditions: room environmental light, low environmental light (560 lux), and strong environmental light (2450 lux). We compare the results of our proposed algorithm in terms of training and test R-Square (%) values and the percentage of closeness between transmitted and predicted colors, and we also report the number of distorted test data points based on a distortion bar graph in the CIE1931 color space.
Loss of MeCP2 in the rat models regression, impaired sociability and transcriptional deficits of Rett syndrome

PubMed Central

Veeraragavan, Surabi; Wan, Ying-Wooi; Connolly, Daniel R.; Hamilton, Shannon M.; Ward, Christopher S.; Soriano, Sirena; Pitcher, Meagan R.; McGraw, Christopher M.; Huang, Sharon G.; Green, Jennie R.; Yuva, Lisa A.; Liang, Agnes J.; Neul, Jeffrey L.; Yasui, Dag H.; LaSalle, Janine M.; Liu, Zhandong; Paylor, Richard; Samaco, Rodney C.

2016-01-01

Mouse models of the transcriptional modulator Methyl-CpG-Binding Protein 2 (MeCP2) have advanced our understanding of Rett syndrome (RTT). RTT is a ‘prototypical’ neurodevelopmental disorder with many clinical features overlapping with other intellectual and developmental disabilities (IDD). Therapeutic interventions for RTT may therefore have broader applications. However, the reliance on the laboratory mouse to identify viable therapies for the human condition may present challenges in translating findings from the bench to the clinic. In addition, the need to identify outcome measures in well-chosen animal models is critical for preclinical trials. Here, we report that a novel Mecp2 rat model displays high face validity for modelling psychomotor regression of a learned skill, a deficit that has not been shown in Mecp2 mice. Juvenile play, a behavioural feature that is uniquely present in rats and not mice, is also impaired in female Mecp2 rats. Finally, we demonstrate that evaluating the molecular consequences of the loss of MeCP2 in both mouse and rat may result in higher predictive validity with respect to transcriptional changes in the human RTT brain. These data underscore the similarities and differences caused by the loss of MeCP2 among divergent rodent species which may have important implications for the treatment of individuals with disease-causing MECP2 mutations. Taken together, these findings demonstrate that the Mecp2 rat model is a complementary tool with unique features for the study of RTT and highlight the potential benefit of cross-species analyses in identifying potential disease-relevant preclinical outcome measures. PMID:27365498
Predicting 6- and 12-Month Risk of Mortality in Patients With Platinum-Resistant Advanced-Stage Ovarian Cancer: Prognostic Model to Guide Palliative Care Referrals.

PubMed

Foote, Jonathan; Lopez-Acevedo, Micael; Samsa, Gregory; Lee, Paula S; Kamal, Arif H; Alvarez Secord, Angeles; Havrilesky, Laura J

2018-02-01

Predictive models are increasingly being used in clinical practice. The aim of the study was to develop a predictive model to identify patients with platinum-resistant ovarian cancer with a prognosis of less than 6 to 12 months who may benefit from immediate referral to hospice care. A retrospective chart review identified patients with platinum-resistant epithelial ovarian cancer who were treated at our institution between 2000 and 2011. A predictive model for survival was constructed based on the time from development of platinum resistance to death. Multivariate logistic regression modeling was used to identify significant survival predictors and to develop a predictive model. The following variables were included: time from diagnosis to platinum resistance, initial stage, debulking status, number of relapses, comorbidity score, albumin, hemoglobin, CA-125 levels, liver/lung metastasis, and the presence of a significant clinical event (SCE). An SCE was defined as a malignant bowel obstruction, pleural effusion, or ascites occurring on or before the diagnosis of platinum resistance. One hundred sixty-four patients met inclusion criteria. In the regression analysis, only an SCE and the presence of liver or lung metastasis were associated with poorer short-term survival (P < 0.001). Nine percent of patients with an SCE or liver or lung metastasis survived 6 months or greater and 0% survived 12 months or greater, compared with 85% and 67% of patients without an SCE or liver or lung metastasis, respectively. Patients with platinum-resistant ovarian cancer who have experienced an SCE or liver or lung metastasis have a high risk of death within 6 months and should be considered for immediate referral to hospice care.
Random sample consensus combined with partial least squares regression (RANSAC-PLS) for microbial metabolomics data mining and phenotype improvement.

PubMed

Teoh, Shao Thing; Kitamura, Miki; Nakayama, Yasumune; Putri, Sastia; Mukai, Yukio; Fukusaki, Eiichiro

2016-08-01

In recent years, the advent of high-throughput omics technology has made possible a new class of strain engineering approaches, based on identification of possible gene targets for phenotype improvement from omic-level comparison of different strains or growth conditions. Metabolomics, with its focus on the omic level closest to the phenotype, lends itself naturally to this semi-rational methodology. When a quantitative phenotype such as growth rate under stress is considered, regression modeling using multivariate techniques such as partial least squares (PLS) is often used to identify metabolites correlated with the target phenotype. However, linear modeling techniques such as PLS require a consistent metabolite-phenotype trend across the samples, which may not be the case when outliers or multiple conflicting trends are present in the data. To address this, we proposed a data-mining strategy that utilizes random sample consensus (RANSAC) to select subsets of samples with consistent trends for construction of better regression models. By applying a combination of RANSAC and PLS (RANSAC-PLS) to a dataset from a previous study (gas chromatography/mass spectrometry metabolomics data and 1-butanol tolerance of 19 yeast mutant strains), new metabolites were indicated to be correlated with tolerance within certain subsets of the samples. The relevance of these metabolites to 1-butanol tolerance was then validated using single-deletion strains of the corresponding metabolic genes. The results showed that RANSAC-PLS is a promising strategy to identify unique metabolites that provide additional hints for phenotype improvement, which could not be detected by traditional PLS modeling using the entire dataset. Copyright © 2016 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
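The core idea in the abstract above, using random sample consensus to pick a subset of samples with a consistent trend before fitting a regression, can be illustrated in a few lines. The sketch below is not the authors' RANSAC-PLS: it swaps PLS for ordinary one-variable least squares to stay dependency-free, and the data, inlier threshold, iteration count, seed, and function names are all assumptions chosen for illustration.

```python
import random


def fit_line(points):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_iter=200, threshold=0.5, seed=0):
    """Return the line refitted on the largest consensus set, and that set."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iter):
        sample = rng.sample(points, 2)          # minimal sample for a line
        if sample[0][0] == sample[1][0]:
            continue                            # vertical line, skip
        slope, intercept = fit_line(sample)
        consensus = [(x, y) for x, y in points
                     if abs(y - (slope * x + intercept)) <= threshold]
        if len(consensus) > len(best):
            best = consensus
    return fit_line(best), best


# Hypothetical data: ten samples on y = 2x + 1 plus three samples off the trend.
inliers = [(float(x), 2.0 * x + 1.0) for x in range(10)]
outliers = [(2.5, 12.0), (6.5, 3.0), (8.5, 30.0)]
data = inliers + outliers

(naive_slope, naive_intercept) = fit_line(data)
(slope, intercept), consensus = ransac_line(data)
print("all samples:", naive_slope, naive_intercept)
print("consensus  :", slope, intercept, len(consensus))
```

Fitting on all thirteen samples is pulled away from the underlying trend by the three off-trend points, while the consensus fit recovers it; the same subset-selection step could precede a PLS fit in place of `fit_line`.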
Using a Guided Machine Learning Ensemble Model to Predict Discharge Disposition following Meningioma Resection.

PubMed

Muhlestein, Whitney E; Akagi, Dallin S; Kallos, Justiss A; Morone, Peter J; Weaver, Kyle D; Thompson, Reid C; Chambless, Lola B

2018-04-01

Objective: Machine learning (ML) algorithms are powerful tools for predicting patient outcomes. This study pilots a novel approach to algorithm selection and model creation using prediction of discharge disposition following meningioma resection as a proof of concept. Materials and Methods: A diversity of ML algorithms were trained on a single-institution database of meningioma patients to predict discharge disposition. Algorithms were ranked by predictive power and top performers were combined to create an ensemble model. The final ensemble was internally validated on never-before-seen data to demonstrate generalizability. The predictive power of the ensemble was compared with a logistic regression. Further analyses were performed to identify how important variables impact the ensemble. Results: Our ensemble model predicted disposition significantly better than a logistic regression (area under the curve of 0.78 and 0.71, respectively, p = 0.01). Tumor size, presentation at the emergency department, body mass index, convexity location, and preoperative motor deficit most strongly influence the model, though the independent impact of individual variables is nuanced. Conclusion: Using a novel ML technique, we built a guided ML ensemble model that predicts discharge destination following meningioma resection with greater predictive power than a logistic regression, and that provides greater clinical insight than a univariate analysis. These techniques can be extended to predict many other patient outcomes of interest.
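The comparison above rests on the area under the ROC curve (AUC), which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that rank-based calculation follows; the labels and both score lists are made-up toy values, not data from the study, and `auc` is a hypothetical helper name.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = outcome occurred, 0 = it did not (hypothetical values).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
model_b = [0.9, 0.6, 0.5, 0.3, 0.7, 0.4, 0.2, 0.1]

print(auc(model_a, labels))  # 15 of 16 pairs ordered correctly
print(auc(model_b, labels))  # 12 of 16 pairs ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank formula gives the same value for large datasets. Testing whether two AUCs differ significantly, as the abstract reports, requires a separate procedure that this sketch does not cover.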
The moderation of resilience on the negative effect of pain on depression and post-traumatic growth in individuals with spinal cord injury.

PubMed

Min, Jung-Ah; Lee, Chang-Uk; Hwang, Sung-Il; Shin, Jung-In; Lee, Bum-Suk; Han, Sang-Hoon; Ju, Hye-In; Lee, Cha-Yeon; Lee, Chul; Chae, Jeong-Ho

2014-01-01

To determine the moderating effect of resilience on the negative effects of chronic pain on depression and post-traumatic growth. Community-dwelling individuals with SCI (n = 37) were recruited at short-term admission for yearly regular health examination. Participants completed self-rating standardized questionnaires measuring pain, resilience, depression and post-traumatic growth. Hierarchical linear regression analysis was performed to identify the moderating effect of resilience on the relationships of pain with depression and post-traumatic growth after controlling for relevant covariates. In the regression model of depression, the effect of pain severity on depression was decreased (β was changed from 0.47 to 0.33) after entering resilience into the model. In the final model, both pain and resilience were significant independent predictors for depression (β = 0.33, p = 0.038 and β = -0.47, p = 0.012, respectively). In the regression model of post-traumatic growth, the effect of pain severity became insignificant after entering resilience into the model. In the final model, resilience was a significant predictor (β = 0.51, p = 0.016). Resilience potentially mitigated the negative effects of pain. Moreover, it independently contributed to reduced depression and greater post-traumatic growth. Our findings suggest that resilience might provide a potential target for intervention in SCI individuals.
UCODE, a computer code for universal inverse modeling

USGS Publications Warehouse

Poeter, E.P.; Hill, M.C.

1999-01-01

This article presents the US Geological Survey computer program UCODE, which was developed in collaboration with the US Army Corps of Engineers Waterways Experiment Station and the International Ground Water Modeling Center of the Colorado School of Mines. UCODE performs inverse modeling, posed as a parameter-estimation problem, using nonlinear regression. Any application model or set of models can be used; the only requirement is that they have numerical (ASCII or text only) input and output files and that the numbers in these files have sufficient significant digits. Application models can include preprocessors and postprocessors as well as models related to the processes of interest (physical, chemical and so on), making UCODE extremely powerful for model calibration. Estimated parameters can be defined flexibly with user-specified functions. Observations to be matched in the regression can be any quantity for which a simulated equivalent value can be produced; thus, simulated equivalent values are calculated using values that appear in the application model output files and can be manipulated with additive and multiplicative functions, if necessary. Prior, or direct, information on estimated parameters also can be included in the regression. The nonlinear regression problem is solved by minimizing a weighted least-squares objective function with respect to the parameter values using a modified Gauss-Newton method. Sensitivities needed for the method are calculated approximately by forward or central differences, and problems and solutions related to this approximation are discussed. Statistics are calculated and printed for use in (1) diagnosing inadequate data or identifying parameters that probably cannot be estimated with the available data, (2) evaluating estimated parameter values, (3) evaluating the model representation of the actual processes and (4) quantifying the uncertainty of model simulated values. UCODE is intended for use on any computer operating system: it consists of algorithms programmed in Perl, a freeware language designed for text manipulation, and Fortran 90, which efficiently performs numerical calculations.
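The regression step described above, minimizing a weighted least-squares objective with a Gauss-Newton iteration driven by forward-difference sensitivities, can be sketched compactly. The Python below is not UCODE's code: the two-parameter exponential model, the synthetic observations, the weights, the step-halving safeguard (a simple stand-in for the unspecified "modified" part of the method), and all function names are assumptions made for illustration.

```python
import math


def model(params, t):
    """Hypothetical application model: exponential decay a * exp(-b * t)."""
    a, b = params
    return a * math.exp(-b * t)


def objective(params, ts, ys, ws):
    """Weighted least-squares objective: sum of w * (observed - simulated)^2."""
    return sum(w * (y - model(params, t)) ** 2 for t, y, w in zip(ts, ys, ws))


def sensitivities(params, ts, rel_step=1e-6):
    """Forward-difference sensitivities; rows = observations, cols = parameters."""
    base = [model(params, t) for t in ts]
    jac = [[0.0] * len(params) for _ in ts]
    for j, p in enumerate(params):
        h = rel_step * max(abs(p), 1e-8)
        bumped = list(params)
        bumped[j] = p + h
        for i, t in enumerate(ts):
            jac[i][j] = (model(bumped, t) - base[i]) / h
    return jac


def gauss_newton(params, ts, ys, ws, max_iter=50, tol=1e-10):
    """Gauss-Newton for two parameters with step halving as a safeguard."""
    params = list(params)
    for _ in range(max_iter):
        jac = sensitivities(params, ts)
        res = [y - model(params, t) for t, y in zip(ts, ys)]
        # Normal equations (J^T W J) d = J^T W r, written out for a 2x2 system.
        a11 = sum(w * row[0] * row[0] for w, row in zip(ws, jac))
        a12 = sum(w * row[0] * row[1] for w, row in zip(ws, jac))
        a22 = sum(w * row[1] * row[1] for w, row in zip(ws, jac))
        g1 = sum(w * row[0] * r for w, row, r in zip(ws, jac, res))
        g2 = sum(w * row[1] * r for w, row, r in zip(ws, jac, res))
        det = a11 * a22 - a12 * a12
        d = [(a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det]
        # Halve the step until the objective no longer increases.
        step = 1.0
        current = objective(params, ts, ys, ws)
        trial = [p + step * di for p, di in zip(params, d)]
        while objective(trial, ts, ys, ws) > current and step > 1e-6:
            step /= 2
            trial = [p + step * di for p, di in zip(params, d)]
        params = trial
        if max(abs(step * di) for di in d) < tol:
            break
    return params


# Synthetic, noise-free observations generated with a = 5.0, b = 0.7.
ts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0 * math.exp(-0.7 * t) for t in ts]
ws = [1.0 / 0.1 ** 2] * len(ts)   # weights as 1 / variance, assumed sd = 0.1
estimate = gauss_newton([4.0, 0.5], ts, ys, ws)
print(estimate)
```

Because the sensitivities come from finite differences, `model` could be replaced by any function that runs an external program and reads numbers back from its output files, which is the coupling style the abstract describes.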
Risk factor assessment to anticipate performance in the National Developmental Screening Test in children from a disadvantaged area.

PubMed

Montes, Alejandro; Pazos, Gustavo

2016-02-01

Identifying children at risk of failing the National Developmental Screening Test by combining prevalences of children suspected of having inapparent developmental disorders (IDDs) and associated risk factors (RFs) would save resources. The objectives were (1) to estimate the prevalence of children suspected of having IDDs; (2) to identify associated RFs; and (3) to assess three methods developed based on observed RFs and propose a pre-screening procedure. The National Developmental Screening Test was administered to 60 randomly selected children aged between 2 and 4 years old from a socioeconomically disadvantaged area of Puerto Madryn. Twenty-four biological and socioenvironmental outcome measures were assessed in order to identify potential RFs using bivariate and multivariate analyses. The likelihood of failing the screening test was estimated as follows: (1) a multivariate logistic regression model was developed; (2) a relationship was established between the number of RFs present in each child and the percentage of children who failed the test; (3) these two methods were combined. The prevalence of children suspected of having IDDs was 55.0% (95% confidence interval: 42.4%-67.6%). Six RFs were initially identified using the bivariate approach. Three of them (maternal education, number of health checkups and Z scores for height-for-age), along with maternal age, were included in the logistic regression model, which has a greater explanatory power. The third method included in the assessment showed greater sensitivity and specificity (85% and 79%, respectively). The estimated prevalence of children suspected of having IDDs was four times higher than the national standards. Seven RFs were identified. Combining the analysis of risk factor accumulation and a multivariate model provides a firm basis for developing a sensitive, specific and practical pre-screening procedure for socioeconomically disadvantaged areas. Sociedad Argentina de Pediatría.
Predicting biological condition in southern California streams

USGS Publications Warehouse

Brown, Larry R.; May, Jason T.; Rehn, Andrew C.; Ode, Peter R.; Waite, Ian R.; Kennen, Jonathan G.

2012-01-01

As understanding of the complex relations among environmental stressors and biological responses improves, a logical next step is predictive modeling of biological condition at unsampled sites. We developed a boosted regression tree (BRT) model of biological condition, as measured by a benthic macroinvertebrate index of biotic integrity (B-IBI), for streams in urbanized Southern Coastal California. We also developed a multiple linear regression (MLR) model as a benchmark for comparison with the BRT model. The BRT model explained 66% of the variance in B-IBI, identifying watershed population density and combined percentage agricultural and urban land cover in the riparian buffer as the most important predictors of B-IBI, but with watershed mean precipitation and watershed density of manmade channels also important. The MLR model explained 48% of the variance in B-IBI and included watershed population density and combined percentage agricultural and urban land cover in the riparian buffer. For a verification data set, the BRT model correctly classified 75% of impaired sites (B-IBI < 40) and 78% of unimpaired sites (B-IBI ≥ 40). For the same verification data set, the MLR model correctly classified 69% of impaired sites and 87% of unimpaired sites. The BRT model should not be used to predict B-IBI for specific sites; however, the model can be useful for general applications such as identifying and prioritizing regions for monitoring, remediation or preservation, stratifying new bioassessments according to anticipated biological condition, or assessing the potential for change in stream biological condition based on anticipated changes in population density and development in stream buffers.
A multilateral modelling of Youth Soccer Performance Index (YSPI)

NASA Astrophysics Data System (ADS)

Bisyri Husin Musawi Maliki, Ahmad; Razali Abdullah, Mohamad; Juahir, Hafizan; Abdullah, Farhana; Ain Shahirah Abdullah, Nurul; Muazu Musa, Rabiu; Musliha Mat-Rasid, Siti; Adnan, Aleesha; Azura Kosni, Norlaila; Muhamad, Wan Siti Amalina Wan; Afiqah Mohamad Nasir, Nur

2018-04-01

This study aims to identify the most dominant factors influencing the performance of soccer players and to predict group performance for soccer players. A total of 184 youth soccer players from Malaysia sport school and six soccer academies participated as respondents. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were used to identify the most dominant factors while reducing the initial 26 parameters, with a recommended factor loading of >0.5. Soccer performance was then predicted with a regression model. CFA revealed that sit and reach, vertical jump, VO2max, age, weight, height, sitting height, calf circumference (cc), medial upper arm circumference (muac), maturation, bicep, triceps, subscapular, suprailiac, 5M, 10M, and 20M speed were the most dominant factors. Further index analysis formed the Youth Soccer Performance Index (YSPI), which categorizes players into three groups: high, moderate, and low. The regression model was significant at p < 0.001 with R2 = 0.8222, indicating that the model explained about 82% of the variance in the outcome. The significant parameters contributing to the prediction of YSPI are discussed. In conclusion, the prediction models integrate multilateral factors to help identify potential soccer players, which may in turn support more competitive soccer.
Correlates of Incident Cognitive Impairment in the REasons for Geographic and Racial Differences in Stroke (REGARDS) Study

PubMed Central

Gillett, Sarah R.; Thacker, Evan L.; Letter, Abraham J.; McClure, Leslie A.; Wadley, Virginia G.; Unverzagt, Frederick W.; Kissela, Brett M.; Kennedy, Richard E.; Glasser, Stephen P.; Levine, Deborah A.; Cushman, Mary

2015-01-01

Objective To identify approximately 500 cases of incident cognitive impairment (ICI) in a large, national sample adapting an existing cognitive test-based case definition and to examine relationships of vascular risk factors with ICI. Method Participants were from the REGARDS study, a national sample of 30,239 African-American and white Americans. Participants included in this analysis had normal cognitive screening and no history of stroke at baseline, and at least one follow-up cognitive assessment with a three test battery (TTB). Regression-based norms were applied to TTB scores to identify cases of ICI. Logistic regression was used to model associations with baseline vascular risk factors. Results We identified 495 participants with ICI out of 17,630 eligible participants. In multivariable modeling, income (OR 1.83 CI 1.27,2.62), stroke belt residence (OR 1.45 CI 1.18,1.78), history of transient ischemic attack (OR 1.90 CI 1.29,2.81), coronary artery disease (OR 1.32 CI 1.02,1.70), diabetes (OR 1.48 CI 1.17,1.87), obesity (OR 1.40 CI 1.05,1.86), and incident stroke (OR 2.73 CI 1.52,4.90) were associated with ICI. Conclusions We adapted a previously validated cognitive test-based case definition to identify cases of ICI. Many previously identified risk factors were associated with ICI, supporting the criterion-related validity of our definition. PMID:25978342
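As a rough illustration of how the odds ratios above relate to logistic regression coefficients, the sketch below recovers a log-odds coefficient and its standard error from a reported OR and interval. It assumes the intervals are 95% Wald intervals, which the abstract does not state; the function name is hypothetical.

```python
import math

def log_odds_from_or(or_point, ci_low, ci_high, z=1.96):
    """Recover a coefficient and standard error from an OR and its CI.

    Assumes a Wald interval, i.e. exp(beta - z*se) to exp(beta + z*se).
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

# Values for income taken from the abstract: OR 1.83, CI 1.27 to 2.62.
beta, se = log_odds_from_or(1.83, 1.27, 2.62)
low = math.exp(beta - 1.96 * se)
high = math.exp(beta + 1.96 * se)
print(f"beta={beta:.3f}, se={se:.3f}, rebuilt CI=({low:.2f}, {high:.2f})")
```

The rebuilt interval should land close to the published one; small differences are expected because the published figures are rounded to two decimals.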
Tularosa Basin Play Fairway: Weights of Evidence Models

DOE Data Explorer

Adam Brandt

2015-12-01

These models are related to weights of evidence play fairway analysis of the Tularosa Basin, New Mexico and Texas. They were created through Spatial Data Modeler: ArcMAP 9.3 geoprocessing tools for spatial data modeling using weights of evidence, logistic regression, fuzzy logic and neural networks. They were used to identify high and low values for potential geothermal plays. The results are relative not only within the Tularosa Basin, but also throughout New Mexico, Utah, Nevada, and other places where high to moderate enthalpy geothermal systems are present (training sites).
Bayesian structured additive regression modeling of epidemic data: application to cholera

PubMed Central

2012-01-01

Background A significant interest in spatial epidemiology lies in identifying associated risk factors that enhance the risk of infection. Most studies, however, make no or only limited use of the spatial structure of the data, as well as possible nonlinear effects of the risk factors. Methods We develop a Bayesian Structured Additive Regression model for cholera epidemic data. Model estimation and inference are based on a fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulations. The model is applied to cholera epidemic data in the Kumasi Metropolis, Ghana. Proximity to refuse dumps, density of refuse dumps, and proximity to potential cholera reservoirs were modeled as continuous functions; presence of slum settlers and population density were modeled as fixed effects, whereas spatial references to the communities were modeled as structured and unstructured spatial effects. Results We observe that the risk of cholera is associated with slum settlements and high population density. The risk of cholera is equal and lower for communities with fewer refuse dumps, but variable and higher for communities with more refuse dumps. The risk is also lower for communities distant from refuse dumps and potential cholera reservoirs. The results also indicate distinct spatial variation in the risk of cholera infection. Conclusion The study highlights the usefulness of Bayesian semi-parametric regression models in analyzing public health data. These findings could serve as novel information to help health planners and policy makers in making effective decisions to control or prevent cholera epidemics. PMID:22866662

Dirichlet Component Regression and its Applications to Psychiatric Data.

PubMed

Gueorguieva, Ralitza; Rosenheck, Robert; Zelterman, Daniel

2008-08-15

We describe a Dirichlet multivariable regression method useful for modeling data representing components as a percentage of a total. This model is motivated by the unmet need in psychiatry and other areas to simultaneously assess the effects of covariates on the relative contributions of different components of a measure. The model is illustrated using the Positive and Negative Syndrome Scale (PANSS) for assessment of schizophrenia symptoms which, like many other metrics in psychiatry, is composed of a sum of scores on several components, each in turn, made up of sums of evaluations on several questions. We simultaneously examine the effects of baseline socio-demographic and co-morbid correlates on all of the components of the total PANSS score of patients from a schizophrenia clinical trial and identify variables associated with increasing or decreasing relative contributions of each component. Several definitions of residuals are provided. Diagnostics include measures of overdispersion, Cook's distance, and a local jackknife influence metric.
Spatiotemporal Bayesian analysis of Lyme disease in New York state, 1990-2000.

PubMed

Chen, Haiyan; Stratton, Howard H; Caraco, Thomas B; White, Dennis J

2006-07-01

Mapping ordinarily increases our understanding of nontrivial spatial and temporal heterogeneities in disease rates. However, the large number of parameters required by the corresponding statistical models often complicates detailed analysis. This study investigates the feasibility of a fully Bayesian hierarchical regression approach to the problem and identifies how it outperforms two more popular methods: crude rate estimates (CRE) and empirical Bayes standardization (EBS). In particular, we apply a fully Bayesian approach to the spatiotemporal analysis of Lyme disease incidence in New York state for the period 1990-2000. These results are compared with those obtained by CRE and EBS in Chen et al. (2005). We show that the fully Bayesian regression model not only gives more reliable estimates of disease rates than the other two approaches but also allows for tractable models that can accommodate more numerous sources of variation and unknown parameters.
Prediction of performance on the RCMP physical ability requirement evaluation.

PubMed

Stanish, H I; Wood, T M; Campagna, P

1999-08-01

The Royal Canadian Mounted Police use the Physical Ability Requirement Evaluation (PARE) for screening applicants. The purposes of this investigation were to identify those field tests of physical fitness that were associated with PARE performance and determine which most accurately classified successful and unsuccessful PARE performers. The participants were 27 female and 21 male volunteers. Testing included measures of aerobic power, anaerobic power, agility, muscular strength, muscular endurance, and body composition. Multiple regression analysis revealed a three-variable model for males (70-lb bench press, standing long jump, and agility) explaining 79% of the variability in PARE time, whereas a one-variable model (agility) explained 43% of the variability for females. Analysis of the classification accuracy of the males' data was prohibited because 91% of the males passed the PARE. Classification accuracy of the females' data, using logistic regression, produced a two-variable model (agility, 1.5-mile endurance run) with 93% overall classification accuracy.
Growth and inactivation of Salmonella at low refrigerated storage temperatures and thermal inactivation on raw chicken meat and laboratory media: mixed effect meta-analysis.

PubMed

Smadi, Hanan; Sargeant, Jan M; Shannon, Harry S; Raina, Parminder

2012-12-01

Growth and inactivation regression equations were developed to describe the effects of temperature on Salmonella concentration on chicken meat for refrigerated temperatures (⩽10°C) and for thermal treatment temperatures (55-70°C). The main objectives were: (i) to compare Salmonella growth/inactivation in chicken meat versus laboratory media; (ii) to create regression equations to estimate Salmonella growth in chicken meat that can be used in quantitative risk assessment (QRA) modeling; and (iii) to create regression equations to estimate D-values needed to inactivate Salmonella in chicken meat. A systematic approach was used to identify the articles, critically appraise them, and pool outcomes across studies. Growth represented in density (Log10CFU/g) and D-values (min) as a function of temperature were modeled using hierarchical mixed effects regression models. The current meta-analysis found a significant difference (P⩽0.05) between the two matrices - chicken meat and laboratory media - for both growth at refrigerated temperatures and inactivation by thermal treatment. Growth and inactivation were significantly influenced by temperature after controlling for other variables; however, no consistent pattern in growth was found. Validation of growth and inactivation equations against data not used in their development is needed. Copyright © 2012 Ministry of Health, Saudi Arabia. Published by Elsevier Ltd. All rights reserved.
Identifying Nanoscale Structure-Function Relationships Using Multimodal Atomic Force Microscopy, Dimensionality Reduction, and Regression Techniques.

PubMed

Kong, Jessica; Giridharagopal, Rajiv; Harrison, Jeffrey S; Ginger, David S

2018-05-31

Correlating nanoscale chemical specificity with operational physics is a long-standing goal of functional scanning probe microscopy (SPM). We employ a data analytic approach combining multiple microscopy modes, using compositional information in infrared vibrational excitation maps acquired via photoinduced force microscopy (PiFM) with electrical information from conductive atomic force microscopy. We study a model polymer blend comprising insulating poly(methyl methacrylate) (PMMA) and semiconducting poly(3-hexylthiophene) (P3HT). We show that PiFM spectra are different from FTIR spectra, but can still be used to identify local composition. We use principal component analysis to extract statistically significant principal components and principal component regression to predict local current and identify local polymer composition. In doing so, we observe evidence of semiconducting P3HT within PMMA aggregates. These methods are generalizable to correlated SPM data and provide a meaningful technique for extracting complex compositional information that is impossible to measure from any one technique.
Parametric regression model for survival data: Weibull regression model as an example

PubMed Central

2016-01-01

The Weibull regression model is one of the most popular parametric regression models because it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature compared with the semi-parametric proportional hazards model. To familiarize clinical investigators with the Weibull regression model, this article introduces some basic knowledge of the model and then illustrates how to fit it with R software. The SurvRegCensCov package is useful for converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by a categorical variable. The eha package provides an alternative method for fitting the Weibull regression model. The check.dist() function helps to assess the goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function. Alternatively, backward elimination starting from a full model is an efficient way to develop the model. Visualizing the Weibull regression model after model development is worthwhile because it provides another way to report findings. PMID:28149846
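The conversion from fitted coefficients to HR and ETR can be sketched in a few lines. The sketch assumes the log-linear accelerated failure time form, log T = intercept + coef * x + scale * W, in which Weibull fits are commonly reported; the function name and the numeric inputs are hypothetical and are not taken from the article or from any R package.

```python
import math

def convert_weibull_aft(intercept, coef, scale):
    """Convert AFT-form Weibull estimates to proportional-hazards quantities.

    Assumed model: log T = intercept + coef * x + scale * W, where W follows
    the standard minimum extreme value distribution.
    """
    shape = 1.0 / scale                   # Weibull shape parameter
    rate = math.exp(-intercept / scale)   # baseline hazard is rate * shape * t**(shape - 1)
    beta = -coef / scale                  # log hazard ratio per unit of x
    return {
        "shape": shape,
        "rate": rate,
        "hazard_ratio": math.exp(beta),
        "event_time_ratio": math.exp(coef),
    }

# Hypothetical estimates, for illustration only.
result = convert_weibull_aft(intercept=2.0, coef=0.5, scale=0.8)
print(result)
```

A covariate that lengthens event times (ETR above 1) lowers the hazard (HR below 1), so under this parameterization the two statistics always point in opposite directions.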
Network structure and travel time perception.

PubMed

Parthasarathi, Pavithra; Levinson, David; Hochmair, Hartwig

2013-01-01

The purpose of this research is to test the systematic variation in the perception of travel time among travelers and relate the variation to the underlying street network structure. Travel survey data from the Twin Cities metropolitan area (which includes the cities of Minneapolis and St. Paul) is used for the analysis. Travelers are classified into two groups based on the ratio of perceived and estimated commute travel time. The measures of network structure are estimated using the street network along the identified commute route. T-test comparisons are conducted to identify statistically significant differences in estimated network measures between the two traveler groups. The combined effect of these estimated network measures on travel time is then analyzed using regression models. The results from the t-test and regression analyses confirm the influence of the underlying network structure on the perception of travel time.
Machine learning approaches to the social determinants of health in the health and retirement study.

PubMed

Seligman, Benjamin; Tuljapurkar, Shripad; Rehkopf, David

2018-04-01

Social and economic factors are important predictors of health and of recognized importance for health systems. However, machine learning, used elsewhere in the biomedical literature, has not been extensively applied to study relationships between society and health. We investigate how machine learning may add to our understanding of social determinants of health using data from the Health and Retirement Study. A linear regression of age and gender, and a parsimonious theory-based regression additionally incorporating income, wealth, and education, were used to predict systolic blood pressure, body mass index, waist circumference, and telomere length. Prediction, fit, and interpretability were compared across four machine learning methods: linear regression, penalized regressions, random forests, and neural networks. All models had poor out-of-sample prediction. Most machine learning models performed similarly to the simpler models. However, neural networks greatly outperformed the three other methods. Neural networks also had good fit to the data (R2 between 0.4 and 0.6, versus <0.3 for all others). Across machine learning models, nine variables were frequently selected or highly weighted as predictors: dental visits, current smoking, self-rated health, serial-seven subtractions, probability of receiving an inheritance, probability of leaving an inheritance of at least $10,000, number of children ever born, African-American race, and gender. Some of the machine learning methods do not improve prediction or fit beyond simpler models; however, neural networks performed well. The predictors identified across models suggest underlying social factors that are important predictors of biological indicators of chronic disease, and that the non-linear and interactive relationships between variables fundamental to the neural network approach may be important to consider.
Development of a statistical model for the determination of the probability of riverbank erosion in a Mediterranean river basin

NASA Astrophysics Data System (ADS)

Varouchakis, Emmanouil; Kourgialas, Nektarios; Karatzas, George; Giannakis, Georgios; Lilli, Maria; Nikolaidis, Nikolaos

2014-05-01

Riverbank erosion affects the river morphology and the local habitat and results in riparian land loss and damage to property and infrastructure, ultimately weakening flood defences. An important issue concerning riverbank erosion is the identification of the areas vulnerable to erosion, as it allows for predicting changes and assists with stream management and restoration. One way to predict the areas vulnerable to erosion is to determine the erosion probability by identifying the underlying relations between riverbank erosion and the geomorphological and/or hydrological variables that prevent or stimulate erosion. A statistical model for evaluating the probability of erosion based on a series of independent local variables and by using logistic regression is developed in this work. The main variables affecting erosion are vegetation index (stability), the presence or absence of meanders, bank material (classification), stream power, bank height, river bank slope, riverbed slope, cross section width and water velocities (Luppi et al. 2009). In statistics, logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable, e.g. binary response, based on one or more predictor variables (continuous or categorical). The probabilities of the possible outcomes are modelled as a function of independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. 1 = "presence of erosion" and 0 = "no erosion") for any value of the independent variables. The regression coefficients are estimated by using maximum likelihood estimation. 
The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested (Atkinson et al. 2003). The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. The aim is to determine the probability of erosion along the Koiliaris' riverbanks considering a series of independent geomorphological and/or hydrological variables. Data for the river bank slope and for the river cross section width are available at ten locations along the river. The riverbank has indications of erosion at six of the ten locations while four have remained stable. Based on a recent work, measurements for the two independent variables and data regarding bank stability are available at eight different locations along the river. These locations were used as validation points for the proposed statistical model. The results show a very close agreement between the observed erosion indications and the statistical model as the probability of erosion was accurately predicted at seven out of the eight locations. The next step is to apply the model at more locations along the riverbanks. In November 2013, stakes were inserted at selected locations in order to be able to identify the presence or absence of erosion after the winter period. In April 2014 the presence or absence of erosion will be identified and the model results will be compared to the field data. Our intent is to extend the model by increasing the number of independent variables in order to identify the key factors favouring erosion along the Koiliaris River. We aim at developing an easy to use statistical tool that will provide a quantified measure of the erosion probability along the riverbanks, which could consequently be used to prevent erosion and flooding events. Atkinson, P. M., German, S. E., Sear, D. A. and Clark, M. J. 2003. 
Exploring the relations between riverbank erosion and geomorphological controls using geographically weighted logistic regression. Geographical Analysis, 35 (1), 58-82. Luppi, L., Rinaldi, M., Teruggi, L. B., Darby, S. E. and Nardi, L. 2009. Monitoring and numerical modelling of riverbank erosion processes: A case study along the Cecina River (central Italy). Earth Surface Processes and Landforms, 34 (4), 530-546. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
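The logistic regression procedure described in the riverbank erosion abstract above (a logistic function of the predictors, with coefficients estimated by maximum likelihood) can be sketched for a single predictor. The data are invented for illustration: ten hypothetical bank-slope values with six eroded and four stable locations, mirroring only the counts given in the abstract. Plain gradient ascent is used here for brevity and is not necessarily the optimizer the authors used.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=10000):
    """Estimate (intercept, slope) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

# Hypothetical data: bank slope (arbitrary units) and erosion indicator
# (1 = "presence of erosion", 0 = "no erosion").
slopes = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.1, 2.5, 2.9, 3.3]
eroded = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(slopes, eroded)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print(f"P(erosion | slope=2.0) = {sigmoid(b0 + b1 * 2.0):.2f}")
```

With real data, a location would be flagged as vulnerable when its predicted probability exceeds a chosen threshold, and the prediction would be compared with field observations as in the validation step the abstract describes.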
Identification of the prediction model for dengue incidence in Can Tho city, a Mekong Delta area in Vietnam.

PubMed

Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Manh, Cuong Do

2015-01-01

The Mekong Delta is highly vulnerable to climate change and is a dengue endemic area in Vietnam. This study aims to examine the association between climate factors and dengue incidence and to identify the best climate prediction model for dengue incidence in Can Tho city, in the Mekong Delta area of Vietnam. We used three different regression models, namely a standard multiple regression model (SMR), a seasonal autoregressive integrated moving average model (SARIMA), and a Poisson distributed lag model (PDLM), to examine the association between climate factors and dengue incidence over the period 2003-2010. We validated the models by forecasting dengue cases for the period of January-December, 2011 using the mean absolute percentage error (MAPE). Receiver operating characteristics curves were used to analyze the sensitivity of the forecast of a dengue outbreak. The results indicate that temperature and relative humidity, but not cumulative rainfall, are significantly associated with changes in dengue incidence consistently across the modeling methods used. The Poisson distributed lag model (PDLM) gives the best prediction of dengue incidence for 6-, 9-, and 12-month periods and the best diagnosis of an outbreak; however, the SARIMA model gives a better prediction of dengue incidence for a 3-month period. The standard multiple regression gave highly imprecise predictions of dengue incidence. We recommend a follow-up study to validate the model on a larger scale in the Mekong Delta region and to analyze the possibility of incorporating a climate-based dengue early warning method into the national dengue surveillance system. Copyright © 2014 Elsevier B.V. All rights reserved.
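The validation metric named above, the mean absolute percentage error, is simple to compute. The sketch below uses invented monthly case counts; it is not the study's data, and the function name is hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Hypothetical observed and forecast monthly dengue counts.
observed = [120, 95, 80, 150, 210, 300]
predicted = [110, 100, 90, 140, 230, 270]
print(f"MAPE = {mape(observed, predicted):.1f}%")
```

A lower MAPE indicates a closer forecast, which is the sense in which the competing models are ranked over the 3-, 6-, 9-, and 12-month horizons.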
Introduction to the use of regression models in epidemiology.

PubMed

Bender, Ralf

2009-01-01

Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Multiple balance tests improve the assessment of postural stability in subjects with Parkinson's disease

PubMed Central

Jacobs, J V; Horak, F B; Tran, V K; Nutt, J G

2006-01-01

Objectives Clinicians often base the implementation of therapies on the presence of postural instability in subjects with Parkinson's disease (PD). These decisions are frequently based on the pull test from the Unified Parkinson's Disease Rating Scale (UPDRS). We sought to determine whether combining the pull test, the one‐leg stance test, the functional reach test, and UPDRS items 27–29 (arise from chair, posture, and gait) predicts balance confidence and falling better than any test alone. Methods The study included 67 subjects with PD. Subjects performed the one‐leg stance test, the functional reach test, and the UPDRS motor exam. Subjects also responded to the Activities‐specific Balance Confidence (ABC) scale and reported how many times they fell during the previous year. Regression models determined the combination of tests that optimally predicted mean ABC scores or categorised fall frequency. Results When all tests were included in a stepwise linear regression, only gait (UPDRS item 29), the pull test (UPDRS item 30), and the one‐leg stance test, in combination, represented significant predictor variables for mean ABC scores (r2 = 0.51). A multinomial logistic regression model including the one‐leg stance test and gait represented the model with the fewest significant predictor variables that correctly identified the most subjects as fallers or non‐fallers (85% of subjects were correctly identified). Conclusions Multiple balance tests (including the one‐leg stance test, and the gait and pull test items of the UPDRS) that assess different types of postural stress provide an optimal assessment of postural stability in subjects with PD. PMID:16484639
A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

NASA Astrophysics Data System (ADS)

Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

2015-05-01

The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. 
These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
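The predicted residual sum of squares (PRESS) used above to compare models can be illustrated for the simplest case, a one-predictor least-squares line. The sketch computes PRESS by refitting with each sample left out and checks it against the leverage shortcut; the data and function names are hypothetical, and the study's models are far higher-dimensional than this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b * xbar, b

def press_loo(xs, ys):
    """PRESS by explicit leave-one-out refitting."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total

def press_leverage(xs, ys):
    """PRESS from a single fit, using residual_i / (1 - h_ii)."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    a, b = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - xbar) ** 2 / sxx
        total += ((y - (a + b * x)) / (1.0 - leverage)) ** 2
    return total

# Hypothetical calibration data: signal intensity versus known abundance.
intensity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
abundance = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
print(press_loo(intensity, abundance), press_leverage(intensity, abundance))
```

Both routes give the same value for linear least squares; the model with the smaller PRESS is the one expected to predict unseen samples better.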
Automatic Classification of Users' Health Information Need Context: Logistic Regression Analysis of Mouse-Click and Eye-Tracker Data.

PubMed

Pian, Wenjing; Khoo, Christopher Sg; Chi, Jianxing

2017-12-21

Users searching for health information on the Internet may be searching for their own health issue, searching for someone else's health issue, or browsing with no particular health issue in mind. Previous research has found that these three categories of users focus on different types of health information. However, most health information websites provide static content for all users. If the three types of user health information need contexts can be identified by the Web application, the search results or information offered to the user can be customized to increase its relevance or usefulness to the user. The aim of this study was to investigate the possibility of identifying the three user health information contexts (searching for self, searching for others, or browsing with no particular health issue in mind) using just hyperlink clicking behavior; using eye-tracking information; and using a combination of eye-tracking, demographic, and urgency information. Predictive models are developed using multinomial logistic regression. A total of 74 participants (39 females and 35 males) who were mainly staff and students of a university were asked to browse a health discussion forum, Healthboards.com. An eye tracker recorded their examining (eye fixation) and skimming (quick eye movement) behaviors on 2 types of screens: summary result screen displaying a list of post headers, and detailed post screen. The following three types of predictive models were developed using logistic regression analysis: model 1 used only the time spent in scanning the summary result screen and reading the detailed post screen, which can be determined from the user's mouse clicks; model 2 used the examining and skimming durations on each screen, recorded by an eye tracker; and model 3 added user demographic and urgency information to model 2. 
An analysis of variance (ANOVA) found that users' browsing durations were significantly different for the three health information contexts (P<.001). The logistic regression model 3 was able to predict the user's type of health information context with a 10-fold cross validation mean accuracy of 84% (62/74), followed by model 2 at 73% (54/74) and model 1 at 71% (52/78). In addition, correlation analysis found that particular browsing durations were highly correlated with users' age, education level, and the urgency of their information need. A user's type of health information need context (ie, searching for self, for others, or with no health issue in mind) can be identified with reasonable accuracy using just user mouse clicks that can easily be detected by Web applications. Higher accuracy can be obtained using Google Glass or future computing devices with an eye-tracking function. ©Wenjing Pian, Christopher SG Khoo, Jianxing Chi. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 21.12.2017.
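The k-fold cross-validation procedure behind the reported accuracies can be sketched as follows. This is an illustration only: a nearest-centroid rule on a single duration feature stands in for the multinomial logistic regression used in the study, the folds are contiguous rather than shuffled, the accuracy is pooled over all held-out cases, and the data and function names are hypothetical.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most 1."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_centroid_fit(xs, labels):
    """Stand-in classifier: mean feature value per class."""
    sums, counts = {}, {}
    for x, lab in zip(xs, labels):
        sums[lab] = sums.get(lab, 0.0) + x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda lab: abs(centroids[lab] - x))


def cross_val_accuracy(xs, labels, k):
    """Pooled accuracy over all held-out cases across the k folds."""
    correct = 0
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = nearest_centroid_fit(train_x, train_y)
        correct += sum(nearest_centroid_predict(model, xs[i]) == labels[i] for i in fold)
    return correct / len(xs)


# Hypothetical browsing durations (seconds) and need contexts.
durations = [5, 20, 40, 6, 22, 41, 4, 19, 39, 7, 21, 42]
contexts = ["browse", "others", "self"] * 4
accuracy = cross_val_accuracy(durations, contexts, k=4)
```

With real data the rows would normally be shuffled (or stratified by class) before the folds are formed.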
Fearful and Distracted in School: Predicting Bullying among Youths

ERIC Educational Resources Information Center

Brewer, Steven Lawrence, Jr.; Meckley-Brewer, Hannah; Stinson, Philip M.

2017-01-01

Bullying and aggression in schools can have a traumatic and lasting effect on the well-being of children and youths. Using data from the 2013 National Crime Victimization Survey's School Crime Supplement, this study uses a chi-square automatic interaction detection (CHAID) decision tree and logistic regression models to identify factors that…
Predicting and Managing Turnover in Human Service Agencies: A Case Study of an Organization in Crisis.

ERIC Educational Resources Information Center

Balfour, Danny L.; Neff, Donna M.

1993-01-01

A logistic regression model applied to data from 171 child service caseworkers identified variables determining job turnover during times of intense external criticism of the agency (length of service, professional commitment, level of education). A special training program did not significantly reduce the probability of turnover. (SK)
Impact of Preadmission Variables on USMLE Step 1 and Step 2 Performance

ERIC Educational Resources Information Center

Kleshinski, James; Khuder, Sadik A.; Shapiro, Joseph I.; Gold, Jeffrey P.

2009-01-01

Purpose: To examine the predictive ability of preadmission variables on United States Medical Licensing Examinations (USMLE) step 1 and step 2 performance, incorporating the use of a neural network model. Method: Preadmission data were collected on matriculants from 1998 to 2004. Linear regression analysis was first used to identify predictors of…
The Role of Family, Religiosity, and Behavior in Adolescent Gambling

ERIC Educational Resources Information Center

Casey, David M.; Williams, Robert J.; Mossiere, Annik M.; Schopflocher, Donald P.; el-Guebaly, Nady; Hodgins, David C.; Smith, Garry J.; Wood, Robert T.

2011-01-01

Predictors of adolescent gambling behavior were examined in a sample of 436 males and females (ages 13-16). A biopsychosocial model was used to identify key variables that differentiate between non-gambling and gambling adolescents. Logistic regression found that, as compared to adolescent male non-gamblers, adolescent male gamblers were older,…
Stages of Change for Fruit and Vegetable Consumption in Deprived Neighborhoods

ERIC Educational Resources Information Center

Kloek, Gitte C.; van Lenthe, Frank J.; van Nierop, Peter W. M.; Mackenbach, Johan P.

2004-01-01

This article describes the association of external and psychosocial factors on the stages of change for fruit and vegetable consumption, among 2,781 inhabitants, aged 18 to 65 years, in deprived neighborhoods (response rate 60%). To identify correlates of forward stage transition, an ordinal logistic regression model, the Threshold of Change Model…
The Chinese High School Student's Stress in the School and Academic Achievement

ERIC Educational Resources Information Center

Liu, Yangyang; Lu, Zuhong

2011-01-01

In a sample of 466 Chinese high school students, we examined the relationships between Chinese high school students' stress in the school and their academic achievements. Regression mixture modelling identified two different classes of the effects of Chinese high school students' stress on their academic achievements. One class contained 87% of…

Predicting Performance on a Firefighter's Ability Test from Fitness Parameters

ERIC Educational Resources Information Center

Michaelides, Marcos A.; Parpa, Koulla M.; Thompson, Jerald; Brown, Barry

2008-01-01

The purpose of this project was to identify the relationships between various fitness parameters such as upper body muscular endurance, upper and lower body strength, flexibility, body composition and performance on an ability test (AT) that included simulated firefighting tasks. A second intent was to create a regression model that would predict…
EMI-Sensor Data to Identify Areas of Manure Accumulation on a Feedlot Surface

USDA-ARS's Scientific Manuscript database

A study was initiated to test the validity of using electromagnetic induction (EMI) survey data, a prediction-based sampling strategy and ordinary linear regression modeling to predict spatially variable feedlot surface manure accumulation. A 30 m × 60 m feedlot pen with a central mound was selecte...
What Is the Relationship between Teacher Quality and Student Achievement? An Exploratory Study

ERIC Educational Resources Information Center

Stronge, James H.; Ward, Thomas J.; Tucker, Pamela D.; Hindman, Jennifer L.

2007-01-01

The major purpose of the study was to examine what constitutes effective teaching as defined by measured increases in student learning with a focus on the instructional behaviors and practices. Ordinary least squares (OLS) regression analyses and hierarchical linear modeling (HLM) were used to identify teacher effectiveness levels while…
Differences in Health Determinants between International and Domestic Students at a German.

ERIC Educational Resources Information Center

Kramer, Alexander; Prufer-Kramer, Luise; Stock, Christiane; Tshiananga, Jacques Tshiang

2004-01-01

The authors used a standardized questionnaire to survey 201 international and 193 German students at the University of Bielefeld, Germany, to determine differences in health practices between the 2 groups and to identify targets for health-promoting interventions. Multivariate logistic regression models revealed that long-term female international…
Predictors of Service Utilization among Youth Diagnosed with Mood Disorders

ERIC Educational Resources Information Center

Mendenhall, Amy N.

2012-01-01

In this study, I investigated patterns and predictors of service utilization for children with mood disorders. The Behavioral Model for Health Care Utilization was used as an organizing framework for identifying predictors of the number and quality of services utilized. Hierarchical regression was used in secondary data analyses of the…
Current suicidal ideation in treatment-seeking individuals in the United Kingdom with gambling problems.

PubMed

Ronzitti, Silvia; Soldini, Emiliano; Smith, Neil; Potenza, Marc N; Clerici, Massimo; Bowden-Jones, Henrietta

2017-11-01

Studies show higher lifetime prevalence of suicidality in individuals with pathological gambling. However, less is known about the relationship between pathological gambling and current suicidal ideation. We investigated socio-demographic, clinical and gambling-related variables associated with suicidality in treatment-seeking individuals. Bivariate analyses and logistic regression models were generated on data from 903 individuals to identify measures associated with aspects of suicidality. Forty-six percent of patients reported current suicidal ideation. People with current suicidal thoughts were more likely to report greater problem-gambling severity (p<0.001), depression (p<0.001) and anxiety (p<0.001) compared to those without suicidality. Logistic regression models suggested that past suicidal ideation (p<0.001) and higher anxiety (p<0.05) may be predictive factors of current suicidality. Our findings suggest that the severity of anxiety disorder, along with a lifetime history of suicidal ideation, may help to identify treatment-seeking individuals with pathological gambling with a higher risk of suicidality, highlighting the importance of assessing suicidal ideation in clinical settings. Copyright © 2017 Elsevier Ltd. All rights reserved.
The perception of the relationship between environment and health according to data from Italian Behavioural Risk Factor Surveillance System (PASSI).

PubMed

Sampaolo, Letizia; Tommaso, Giulia; Gherardi, Bianca; Carrozzi, Giuliano; Freni Sterrantino, Anna; Ottone, Marta; Goldoni, Carlo Alberto; Bertozzi, Nicoletta; Scaringi, Meri; Bolognesi, Lara; Masocco, Maria; Salmaso, Stefania; Lauriola, Paolo

2017-01-01

"OBJECTIVES: to identify groups of people in relation to the perception of environmental risk and to assess the main characteristics using data collected in the environmental module of the surveillance network Italian Behavioral Risk Factor Surveillance System (PASSI). perceptive profiles were identified using a latent class analysis; later they were included as outcome in multinomial logistic regression models to assess the association between environmental risk perception and demographic, health, socio-economic and behavioural variables. the latent class analysis allowed to split the sample in "worried", "indifferent", and "positive" people. The multinomial logistic regression model showed that the "worried" profile typically includes people of Italian nationality, living in highly urbanized areas, with a high level of education, and with economic difficulties; they pay special attention to their own health and fitness, but they have a negative perception of their own psychophysical state. the application of advanced statistical analysis enable to appraise PASSI data in order to characterize the perception of environmental risk, making the planning of interventions related to risk communication possible. ".
Addictive internet use among Korean adolescents: a national survey.

PubMed

Heo, Jongho; Oh, Juhwan; Subramanian, S V; Kim, Yoon; Kawachi, Ichiro

2014-01-01

A psychological disorder called 'Internet addiction' has newly emerged along with a dramatic increase in worldwide Internet use. However, few studies have used population-level samples or taken into account contextual factors in Internet addiction. We identified 57,857 middle and high school students (13-18 year olds) from a Korean nationally representative survey conducted in 2009. To identify factors associated with addictive Internet use, two-level multilevel regression models were fitted with individual-level responses (1st level) nested within schools (2nd level) to estimate associations of individual and school characteristics simultaneously. Gender differences in addictive Internet use were estimated with the regression model stratified by gender. Significant associations were found between addictive Internet use and school grade, parental education, alcohol use, tobacco use, and substance use. Female students in girls' schools were more likely to use the Internet addictively than those in coeducational schools. Our results also revealed significant gender differences in addictive Internet use in its associated individual- and school-level factors. Our results suggest that multilevel risk factors along with gender differences should be considered to protect adolescents from addictive Internet use.
Optimizing complex phenotypes through model-guided multiplex genome engineering

DOE PAGES

Kuznetsov, Gleb; Goodman, Daniel B.; Filsinger, Gabriel T.; ...

2017-05-25

Here, we present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.ΔA. By introducing targeted combinations of changes in multiplex we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.
Using decision tree analysis to identify risk factors for relapse to smoking

PubMed Central

Piper, Megan E.; Loh, Wei-Yin; Smith, Stevens S.; Japuntich, Sandra J.; Baker, Timothy B.

2010-01-01

This research used classification tree analysis and logistic regression models to identify risk factors related to short- and long-term abstinence. Baseline and cessation outcome data from two smoking cessation trials, conducted from 2001 to 2002, in two Midwestern urban areas, were analyzed. There were 928 participants (53.1% women, 81.8% white) with complete data. Both analyses suggest that relapse risk is produced by interactions of risk factors and that early and late cessation outcomes reflect different vulnerability factors. The results illustrate the dynamic nature of relapse risk and suggest the importance of efficient modeling of interactions in relapse prediction. PMID:20397871
Surrogate Model Application to the Identification of Optimal Groundwater Exploitation Scheme Based on Regression Kriging Method—A Case Study of Western Jilin Province

PubMed Central

An, Yongkai; Lu, Wenxi; Cheng, Weiguo

2015-01-01

This paper introduces a surrogate model to identify an optimal exploitation scheme, with western Jilin province selected as the study area. A numerical simulation model of groundwater flow was established first, and four exploitation wells were set in the Tongyu county and Qian Gorlos county respectively so as to supply water to Daan county. Second, the Latin Hypercube Sampling (LHS) method was used to collect data in the feasible region for input variables. A surrogate model of the numerical simulation model of groundwater flow was developed using the regression kriging method. An optimization model was established to search for an optimal groundwater exploitation scheme using the minimum average drawdown of groundwater table and the minimum cost of groundwater exploitation as multi-objective functions. Finally, the surrogate model was invoked by the optimization model in the process of solving the optimization problem. Results show that the relative error and root mean square error of the groundwater table drawdown between the simulation model and the surrogate model for 10 validation samples are both lower than 5%, which is a high approximation accuracy. A comparison between the surrogate-based simulation optimization model and the conventional simulation optimization model for solving the same optimization problem shows that the former needs only 5.5 hours, whereas the latter needs 25 days. The above results indicate that the surrogate model developed in this study could not only considerably reduce the computational burden of the simulation optimization process, but also maintain high computational accuracy. This can thus provide an effective method for identifying an optimal groundwater exploitation scheme quickly and accurately. PMID:26264008
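The Latin Hypercube Sampling step mentioned above can be sketched in a few lines. This is a generic illustration, not the authors' implementation: each input range is divided into equal-width strata, one point is drawn per stratum, and the strata are shuffled independently per dimension. The pumping-rate bounds and variable names are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples points; each dimension has exactly one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of n_samples equal-width strata.
        points = [low + (high - low) * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical bounds for two well pumping rates (units assumed, e.g. m^3/day).
pumping_bounds = [(0.0, 1000.0), (0.0, 500.0)]
samples = latin_hypercube(5, pumping_bounds)
```

Each sampled input vector would then be run through the simulation model, and the resulting input-output pairs used to train the surrogate.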
[Use of multiple regression models in observational studies (1970-2013) and requirements of the STROBE guidelines in Spanish scientific journals].

PubMed

Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M

In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines of Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic, linear, Cox and Poisson models. A review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals' author guidelines explicitly recommend following the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models such as logistic, linear, Cox and Poisson regression are increasingly used in published observational studies, both at the international level and in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.
Iterative integral parameter identification of a respiratory mechanics model.

PubMed

Schranz, Christoph; Docherty, Paul D; Chiew, Yeong Shiong; Möller, Knut; Chase, J Geoffrey

2012-07-18

Patient-specific respiratory mechanics models can support the evaluation of optimal lung protective ventilator settings during ventilation therapy. Clinical application requires that the individual's model parameter values must be identified with information available at the bedside. Multiple linear regression or gradient-based parameter identification methods are highly sensitive to noise and initial parameter estimates. Thus, they are difficult to apply at the bedside to support therapeutic decisions. An iterative integral parameter identification method is applied to a second order respiratory mechanics model. The method is compared to the commonly used regression methods and error-mapping approaches using simulated and clinical data. The clinical potential of the method was evaluated on data from 13 Acute Respiratory Distress Syndrome (ARDS) patients. The iterative integral method converged to error minima 350 times faster than the Simplex Search Method using simulation data sets and 50 times faster using clinical data sets. Established regression methods reported erroneous results due to sensitivity to noise. In contrast, the iterative integral method was effective independent of initial parameter estimations, and converged successfully in each case tested. These investigations reveal that the iterative integral method is beneficial with respect to computing time, operator independence and robustness, and thus applicable at the bedside for this clinical application.
Breast arterial calcification is associated with reproductive factors in asymptomatic postmenopausal women.

PubMed

Bielak, Lawrence F; Whaley, Dana H; Sheedy, Patrick F; Peyser, Patricia A

2010-09-01

The etiology of breast arterial calcification (BAC) is not well understood. We examined reproductive history and cardiovascular disease (CVD) risk factor associations with the presence of detectable BAC in asymptomatic postmenopausal women. Reproductive history and CVD risk factors were obtained in 240 asymptomatic postmenopausal women from a community-based research study who had a screening mammogram within 2 years of their participation in the study. The mammograms were reviewed for the presence of detectable BAC. Age-adjusted logistic regression models were fit to assess the association between each risk factor and the presence of BAC. Multiple variable logistic regression models were used to identify the most parsimonious model for the presence of BAC. The prevalence of BAC increased with increased age (p < 0.0001). The most parsimonious logistic regression model for BAC presence included age at time of examination, increased parity (p = 0.01), earlier age at first birth (p = 0.002), weight, and an age-by-weight interaction term (p = 0.004). Older women with a smaller body size had a higher probability of having BAC than women of the same age with a larger body size. The presence or absence of BAC at mammography may provide an assessment of a postmenopausal woman's lifetime estrogen exposure and indicate women who could be at risk for hormonally related conditions.
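To show how an age-by-weight interaction term changes the interpretation of a logistic model, here is a minimal sketch. All coefficient values are hypothetical and chosen only so that, at older ages, lower weight gives a higher predicted probability, as the abstract describes; parity and age at first birth are omitted for brevity.

```python
import math

# Hypothetical coefficients on the log-odds scale (not estimates from the study).
coef = {"intercept": -8.0, "age": 0.15, "weight": 0.06, "age_x_weight": -0.001}


def bac_probability(age, weight, coef):
    """Predicted probability from a logistic model with an age-by-weight term."""
    eta = (coef["intercept"] + coef["age"] * age + coef["weight"] * weight
           + coef["age_x_weight"] * age * weight)
    return 1.0 / (1.0 + math.exp(-eta))


def weight_log_odds_slope(age, coef):
    """Change in log-odds per unit of weight, which depends on age."""
    return coef["weight"] + coef["age_x_weight"] * age


p_light = bac_probability(75, 55, coef)   # older, smaller body size
p_heavy = bac_probability(75, 85, coef)   # older, larger body size
slope_at_75 = weight_log_odds_slope(75, coef)
```

Because of the interaction, there is no single odds ratio for weight; the weight effect has to be reported at specific ages.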
Pneumococcal vaccine targeting strategy for older adults: customized risk profiling.

PubMed

Balicer, Ran D; Cohen, Chandra J; Leibowitz, Morton; Feldman, Becca S; Brufman, Ilan; Roberts, Craig; Hoshen, Moshe

2014-02-12

Current pneumococcal vaccine campaigns take a broad, primarily age-based approach to immunization targeting, overlooking many clinical and administrative considerations necessary in disease prevention and resource planning for specific patient populations. We aim to demonstrate the utility of a population-specific predictive model for hospital-treated pneumonia to direct effective vaccine targeting. Data was extracted for 1,053,435 members of an Israeli HMO, age 50 and older, during the study period 2008-2010. We developed and validated a logistic regression model to predict hospital-treated pneumonia using training and test samples, including a set of standard and population-specific risk factors. The model's predictive value was tested for prospectively identifying cases of pneumonia and invasive pneumococcal disease (IPD), and was compared to the existing international paradigm for patient immunization targeting. In a multivariate regression, age, co-morbidity burden and previous pneumonia events were most strongly positively associated with hospital-treated pneumonia. The model predicting hospital-treated pneumonia yielded a c-statistic of 0.80. Utilizing the predictive model, the top 17% highest-risk within the study validation population were targeted to detect 54% of those members who were subsequently treated for hospitalized pneumonia in the follow up period. The high-risk population identified through this model included 46% of the follow-up year's IPD cases, and 27% of community-treated pneumonia cases. These outcomes were compared with international guidelines for risk for pneumococcal diseases that accurately identified only 35% of hospitalized pneumonia, 41% of IPD cases and 21% of community-treated pneumonia. 
We demonstrate that a customized model for vaccine targeting performs better than international guidelines, and therefore, risk modeling may allow for more precise vaccine targeting and resource allocation than current national and international guidelines. Health care managers and policy-makers may consider the strategic potential of utilizing clinical and administrative databases for creating population-specific risk prediction models to inform vaccination campaigns. Copyright © 2013 Elsevier Ltd. All rights reserved.
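The two performance measures quoted above, the c-statistic and the share of cases captured by targeting a top-risk fraction, can be computed as in the following sketch. The risk scores and outcomes are hypothetical, and the pairwise c-statistic shown here is quadratic in sample size, so it is meant for illustration rather than for a million-member cohort.

```python
def c_statistic(scores, outcomes):
    """Probability that a random case scores higher than a random non-case."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


def capture_rate(scores, outcomes, top_fraction):
    """Share of all cases found among the highest-scoring top_fraction of members."""
    n_target = max(1, round(top_fraction * len(scores)))
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    captured = sum(y for _, y in ranked[:n_target])
    return captured / sum(outcomes)


# Hypothetical predicted risks and observed pneumonia outcomes (1 = case).
risk = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
pneumonia = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
auc = c_statistic(risk, pneumonia)
captured_top20 = capture_rate(risk, pneumonia, 0.2)
```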
User-Friendly Predictive Modeling of Greenhouse Gas (GHG) Fluxes and Carbon Storage in Tidal Wetlands

NASA Astrophysics Data System (ADS)

Ishtiaq, K. S.; Abdul-Aziz, O. I.

2015-12-01

We developed user-friendly empirical models to predict instantaneous fluxes of CO2 and CH4 from coastal wetlands based on a small set of dominant hydro-climatic and environmental drivers (e.g., photosynthetically active radiation, soil temperature, water depth, and soil salinity). The dominant predictor variables were systematically identified by applying a robust data-analytics framework on a wide range of possible environmental variables driving wetland greenhouse gas (GHG) fluxes. The method comprised a multi-layered data-analytics framework, including Pearson correlation analysis, explanatory principal component and factor analyses, and partial least squares regression modeling. The identified dominant predictors were finally utilized to develop power-law based non-linear regression models to predict CO2 and CH4 fluxes under different climatic, land use (nitrogen gradient), tidal hydrology and salinity conditions. Four different tidal wetlands of Waquoit Bay, MA were considered as the case study sites to identify the dominant drivers and evaluate model performance. The study sites were dominated by native Spartina alterniflora and characterized by frequent flooding and high saline conditions. The model estimated the potential net ecosystem carbon balance (NECB) both in gC/m2 and metric tonC/hectare by up-scaling the instantaneous predicted fluxes to the growing season and accounting for the lateral C flux exchanges between the wetlands and estuary. The entire model was presented in a single Excel spreadsheet as a user-friendly ecological engineering tool. The model can aid the development of appropriate GHG offset protocols for setting monitoring plans for tidal wetland restoration and maintenance projects. 
The model can also be used to estimate wetland GHG fluxes and potential carbon storage under various IPCC climate change and sea level rise scenarios, facilitating appropriate management of carbon stocks in tidal wetlands and their incorporation into a potential carbon market.
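A power-law regression of the kind mentioned above can be fitted by ordinary least squares after a log transform. The sketch below assumes a single positive predictor and a positive response, y = a * x**b; the abstract does not state how the authors estimated their models, and the data and names here are hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on ln(y) = ln(a) + b * ln(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mx = sum(log_x) / n
    my = sum(log_y) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(log_x, log_y))
         / sum((u - mx) ** 2 for u in log_x))
    a = math.exp(my - b * mx)
    return a, b


# Hypothetical driver values and flux magnitudes generated from y = 2 * x**1.5.
driver = [1.0, 2.0, 4.0, 8.0]
flux = [2.0 * x ** 1.5 for x in driver]
a_hat, b_hat = fit_power_law(driver, flux)
```

The log transform requires strictly positive values, so fluxes that change sign (uptake versus emission) would need a different treatment, such as fitting magnitudes separately or using nonlinear least squares.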
Systematic review of statistical approaches to quantify, or correct for, measurement error in a continuous exposure in nutritional epidemiology.

PubMed

Bennett, Derrick A; Landry, Denise; Little, Julian; Minelli, Cosetta

2017-09-19

Several statistical approaches have been proposed to assess and correct for exposure measurement error. We aimed to provide a critical overview of the most common approaches used in nutritional epidemiology. MEDLINE, EMBASE, BIOSIS and CINAHL were searched for reports published in English up to May 2016 in order to ascertain studies that described methods aimed to quantify and/or correct for measurement error for a continuous exposure in nutritional epidemiology using a calibration study. We identified 126 studies, 43 of which described statistical methods and 83 that applied any of these methods to a real dataset. The statistical approaches in the eligible studies were grouped into: a) approaches to quantify the relationship between different dietary assessment instruments and "true intake", which were mostly based on correlation analysis and the method of triads; b) approaches to adjust point and interval estimates of diet-disease associations for measurement error, mostly based on regression calibration analysis and its extensions. Two approaches (multiple imputation and moment reconstruction) were identified that can deal with differential measurement error. For regression calibration, the most common approach to correct for measurement error used in nutritional epidemiology, it is crucial to ensure that its assumptions and requirements are fully met. Analyses that investigate the impact of departures from the classical measurement error model on regression calibration estimates can be helpful to researchers in interpreting their findings. With regard to the possible use of alternative methods when regression calibration is not appropriate, the choice of method should depend on the measurement error model assumed, the availability of suitable calibration study data and the potential for bias due to violation of the classical measurement error model assumptions. 
On the basis of this review, we provide some practical advice for the use of methods to assess and adjust for measurement error in nutritional epidemiology.
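Regression calibration, the most common correction discussed above, can be illustrated with a small simulation. The sketch assumes the classical measurement error model (observed intake = true intake + independent noise), a linear outcome model, and a calibration subsample in which a reference measure of true intake is available. All variable names and parameter values are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(42)
TRUE_SLOPE = 2.0                 # assumed diet-outcome association
n_main, n_calib = 20000, 5000    # assumed main-study and calibration sizes

true_x = [rng.gauss(0.0, 1.0) for _ in range(n_main)]        # true intake
observed_w = [x + rng.gauss(0.0, 1.0) for x in true_x]       # error-prone measure
outcome_y = [TRUE_SLOPE * x + rng.gauss(0.0, 0.5) for x in true_x]

# The naive analysis regresses the outcome on the error-prone measure.
naive_slope = ols_slope(observed_w, outcome_y)

# Calibration model: regress the reference measure on the error-prone one.
calibration_slope = ols_slope(observed_w[:n_calib], true_x[:n_calib])

# In this linear setting, regression calibration divides by the calibration slope.
corrected_slope = naive_slope / calibration_slope
```

With equal variances for true intake and error, the naive slope is attenuated to roughly half the true value, and the corrected slope recovers it approximately. The correction is only valid when the classical error assumptions hold, which is the caution the review emphasizes.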
Interpretation of commonly used statistical regression models.

PubMed

Kasza, Jessica; Wolfe, Rory

2014-01-01

A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
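As a small illustration of coefficient interpretation, a logistic regression coefficient is a log odds ratio, so exponentiating it (and the ends of its confidence interval) gives the odds ratio per one-unit increase in the predictor. The coefficient and standard error below are hypothetical, and the interval uses the usual normal approximation.

```python
import math


def odds_ratio(beta, se, z=1.96):
    """Odds ratio and approximate 95% confidence interval from a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for a binary exposure and its standard error.
beta_exposure = math.log(1.5)
se_exposure = 0.1
or_point, or_lower, or_upper = odds_ratio(beta_exposure, se_exposure)
```

Here the odds of the outcome are 1.5 times higher in the exposed group, holding other covariates fixed; in a linear regression the coefficient would instead be read directly as a difference in the mean outcome.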
Spatial analysis of relative humidity during ungauged periods in a mountainous region

NASA Astrophysics Data System (ADS)

Um, Myoung-Jin; Kim, Yeonjoo

2017-08-01

Although atmospheric humidity influences environmental and agricultural conditions, thereby influencing plant growth, human health, and air pollution, efforts to develop spatial maps of atmospheric humidity using statistical approaches have thus far been limited. This study therefore aims to develop statistical approaches for inferring the spatial distribution of relative humidity (RH) for a mountainous island, for which data are not uniformly available across the region. A multiple regression analysis based on various mathematical models was used to identify the optimal model for estimating monthly RH by incorporating not only temperature but also location and elevation. Based on the regression analysis, we extended the monthly RH data from weather stations to cover the ungauged periods when no RH observations were available. Then, two different types of station-based data, the observational data and the data extended via the regression model, were used to form grid-based data with a resolution of 100 m. The grid-based data that used the extended station-based data captured the increasing RH trend along an elevation gradient. Furthermore, annual RH values averaged over the regions were examined. Decreasing temporal trends were found in most cases, with magnitudes varying based on the season and region.

Incorporating Measurement Error from Modeled Air Pollution Exposures into Epidemiological Analyses.

PubMed

Samoli, Evangelia; Butland, Barbara K

2017-12-01

Outdoor air pollution exposures used in epidemiological studies are commonly predicted from spatiotemporal models incorporating limited measurements, temporal factors, geographic information system variables, and/or satellite data. Measurement error in these exposure estimates leads to imprecise estimation of health effects and their standard errors. We reviewed methods for measurement error correction that have been applied in epidemiological studies that use model-derived air pollution data. We identified seven cohort studies and one panel study that have employed measurement error correction methods. These methods included regression calibration, risk set regression calibration, regression calibration with instrumental variables, the simulation extrapolation approach (SIMEX), and methods under the non-parametric or parametric bootstrap. Corrections resulted in small increases in the absolute magnitude of the health effect estimate and its standard error under most scenarios. Limited application of measurement error correction methods in air pollution studies may be attributed to the absence of exposure validation data and the methodological complexity of the proposed methods. Future epidemiological studies should consider in their design phase the requirements for the measurement error correction method to be applied later, while methodological advances are needed for the multi-pollutant setting.
Technology diffusion in hospitals: a log odds random effects regression model.

PubMed

Blank, Jos L T; Valdmanis, Vivian G

2015-01-01

This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describe the diffusion of technologies. We distinguish a number of determinants, such as service, physician, and environmental, financial and organizational characteristics of the 60 Dutch hospitals in our sample. On the basis of this data set on Dutch general hospitals over the period 1995-2002, we conclude that there is a relation between a number of determinants and the diffusion of innovations underlining conclusions from earlier research. Positive effects were found on the basis of the size of the hospitals, competition and a hospital's commitment to innovation. It appears that if a policy is developed to further diffuse innovations, the external effects of demand and market competition need to be examined, which would de facto lead to an efficient use of technology. For the individual hospital, instituting an innovations office appears to be the most prudent course of action. © 2013 The Authors. International Journal of Health Planning and Management published by John Wiley & Sons, Ltd.
A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models

PubMed Central

Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.

2016-01-01

Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. 
The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale. PMID:26964095
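The error measures reported in the abstract above (ME, MAE, RMSE and R2) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state the sign convention for ME or the exact definition of R2, so the sketch assumes residuals are predicted minus observed and that R2 is the usual coefficient of determination. The function name `error_metrics` is hypothetical.

```python
import math


def error_metrics(observed, predicted):
    """Return ME, MAE, RMSE and R2 for paired observed/predicted values.

    Assumption: residual = predicted - observed, and R2 = 1 - SS_res / SS_tot.
    """
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    me = sum(residuals) / n                       # mean error (signed bias)
    mae = sum(abs(r) for r in residuals) / n      # mean absolute error
    ss_res = sum(r * r for r in residuals)        # residual sum of squares
    rmse = math.sqrt(ss_res / n)                  # root mean squared error
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination
    return {"ME": me, "MAE": mae, "RMSE": rmse, "R2": r2}
```

Under these assumptions, an ME close to zero together with a non-trivial MAE (as reported for the RF model) indicates little systematic bias but remaining scatter around the observed values.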
Polynomial order selection in random regression models via penalizing adaptively the likelihood.

PubMed

Corrales, J D; Munilla, S; Cantet, R J C

2015-08-01

Orthogonal Legendre polynomials (LP) are used to model the shape of additive genetic and permanent environmental effects in random regression models (RRM). Frequently, the Akaike (AIC) and the Bayesian (BIC) information criteria are employed to select LP order. However, it has been theoretically shown that neither AIC nor BIC is simultaneously optimal in terms of consistency and efficiency. Thus, the goal was to introduce a method, 'penalizing adaptively the likelihood' (PAL), as a criterion to select LP order in RRM. Four simulated data sets and real data (60,513 records, 6675 Colombian Holstein cows) were employed. Nested models were fitted to the data, and AIC, BIC and PAL were calculated for all of them. Results showed that PAL and BIC identified the true LP order for the additive genetic and permanent environmental effects with probability one, but AIC tended to favour overparameterized models. Conversely, when the true model was unknown, PAL selected the best model with higher probability than AIC. In the latter case, BIC never favoured the best model. To summarize, PAL selected a correct model order regardless of whether the 'true' model was within the set of candidates. © 2015 Blackwell Verlag GmbH.
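As a rough illustration of order selection by information criteria, the sketch below computes AIC and BIC from a model's maximized log-likelihood and picks the candidate with the smallest value. It uses the standard textbook definitions; the PAL criterion is not reproduced because the abstract does not define it. The function names and the candidate log-likelihoods in the usage note are hypothetical, not taken from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_order(candidates, n_obs, criterion="bic"):
    """Pick the polynomial order with the smallest criterion value.

    candidates maps order -> (maximized log-likelihood, number of parameters).
    """
    def score(item):
        loglik, k = item[1]
        if criterion == "aic":
            return aic(loglik, k)
        return bic(loglik, k, n_obs)

    return min(candidates.items(), key=score)[0]
```

With made-up candidates such as `{1: (-120.0, 3), 2: (-110.0, 4), 3: (-108.5, 5)}` and 200 observations, AIC picks order 3 while BIC picks order 2, which mirrors the tendency of AIC to favour larger models because its penalty per parameter does not grow with sample size.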
Correlates of strength training in older rural African American and Caucasian women.

PubMed

Bopp, Melissa; Wilcox, Sara; Oberrecht, Larissa; Kammermann, Sandra; McElmurray, Charles T

2004-01-01

This study examined factors influencing strength training (ST) in two convenience samples of older rural women. Focus group (FG) participants were 23 Caucasian and 16 African American women aged 67.5 ± 9.2 years. Survey participants were 60 Caucasian and 42 African American women, aged 70.59 ± 9.21 years. FG participants answered questions about the risks, benefits, and barriers to ST. Survey participants completed measures of demographics, physical activity (including ST), depression and stress, decisional balance for exercise (DBE), barriers to PA, and social support (SS). Regression modeling examined correlates of ST. FG participants identified physical health gains and improved appearance as ST benefits. African American women also included mental health benefits and "feeling good". Both Caucasian and African American groups named physical health problems as risks of ST. Caucasian women identified time constraints, lack of ST knowledge, physical health problems, lack of exercise facilities, and the cost of ST as barriers. African American women cited being "too tired", physical health problems, lack of support, and other family and work responsibilities. The linear regression model explained 23.2% of the variance in hours per week of ST; DBE and family SS were independent positive correlates. This study identified correlates to participation in ST in older rural women and provides a basis for developing ST interventions in this population.
A Novel Degradation Identification Method for Wind Turbine Pitch System

NASA Astrophysics Data System (ADS)

Guo, Hui-Dong

2018-04-01

It is difficult for the traditional threshold-value method to identify degradation of operating equipment accurately. A novel degradation evaluation method suitable for implementing a wind turbine condition maintenance strategy is proposed in this paper. Based on an analysis of the typical variable-speed pitch-to-feather control principle and the monitoring parameters of the pitch system, a multi-input multi-output (MIMO) regression model was applied to the pitch system, with wind speed and power generation as input parameters and wheel rotation speed, pitch angle, and motor driving current for the three blades as output parameters. The difference between the on-line measurement and the value calculated by the MIMO regression model, which was built with the least squares support vector machine (LSSVM) method, was then defined as the Observed Vector of the system. A Gaussian mixture model (GMM) was fitted to the distribution of the multi-dimensional Observed Vectors. Using the established model, the Degradation Index was calculated from the SCADA data of a wind turbine with a damaged pitch bearing retainer and rolling body, which illustrated the feasibility of the proposed method.
Genotype-phenotype association study via new multi-task learning model

PubMed Central

Huo, Zhouyuan; Shen, Dinggang

2018-01-01

Research on the associations between genetic variations and imaging phenotypes is developing with the advance in high-throughput genotyping and brain imaging techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. ℓ2,1-norm, leading to better predictive results and insights into SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks, and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than the compared methods and presents new insights into SNPs. PMID:29218896
Modeling the prediction of business intelligence system effectiveness.

PubMed

Weng, Sung-Shun; Yang, Ming-Hsien; Koo, Tian-Lih; Hsiao, Pei-I

2016-01-01

Although business intelligence (BI) technologies are continually evolving, the capability to apply BI technologies has become an indispensable resource for enterprises running in today's complex, uncertain and dynamic business environment. This study performed pioneering work by constructing models and rules for the prediction of business intelligence system effectiveness (BISE) in relation to the implementation of BI solutions. For enterprises, effectively managing critical attributes that determine BISE to develop prediction models with a set of rules for self-evaluation of the effectiveness of BI solutions is necessary to improve BI implementation and ensure its success. The main study findings identified the critical prediction indicators of BISE that are important to forecasting BI performance and highlighted five classification and prediction rules of BISE derived from decision tree structures, as well as a refined regression prediction model with four critical prediction indicators constructed by logistic regression analysis that can enable enterprises to improve BISE while effectively managing BI solution implementation and catering to academics to whom theory is important.
A CONCISE PANEL OF BIOMARKERS IDENTIFIES NEUROCOGNITIVE FUNCTIONING CHANGES IN HIV-INFECTED INDIVIDUALS

PubMed Central

Marcotte, Thomas D.; Deutsch, Reena; Michael, Benedict Daniel; Franklin, Donald; Cookson, Debra Rosario; Bharti, Ajay R.; Grant, Igor; Letendre, Scott L.

2013-01-01

Background Neurocognitive (NC) impairment (NCI) occurs commonly in people living with HIV. Despite substantial effort, no biomarkers have been sufficiently validated for diagnosis and prognosis of NCI in the clinic. The goal of this project was to identify diagnostic or prognostic biomarkers for NCI in a comprehensively characterized HIV cohort. Methods Multidisciplinary case review selected 98 HIV-infected individuals and categorized them into four NC groups using normative data: stably normal (SN), stably impaired (SI), worsening (Wo), or improving (Im). All subjects underwent comprehensive NC testing, phlebotomy, and lumbar puncture at two timepoints separated by a median of 6.2 months. Eight biomarkers were measured in CSF and blood by immunoassay. Results were analyzed using mixed model linear regression and staged recursive partitioning. Results At the first visit, subjects were mostly middle-aged (median 45) white (58%) men (84%) who had AIDS (70%). Of the 73% who took antiretroviral therapy (ART), 54% had HIV RNA levels below 50 c/mL in plasma. Mixed model linear regression identified that only MCP-1 in CSF was associated with neurocognitive change group. Recursive partitioning models aimed at diagnosis (i.e., correctly classifying neurocognitive status at the first visit) were complex and required most biomarkers to achieve misclassification limits. In contrast, prognostic models were more efficient. A combination of three biomarkers (sCD14, MCP-1, SDF-1α) correctly classified 82% of Wo and SN subjects, including 88% of SN subjects. A combination of two biomarkers (MCP-1, TNF-α) correctly classified 81% of Im and SI subjects, including 100% of SI subjects. Conclusions This analysis of well-characterized individuals identified concise panels of biomarkers associated with NC change. Across all analyses, the two most frequently identified biomarkers were sCD14 and MCP-1, indicators of monocyte/macrophage activation. 
While the panels differed depending on the outcome and on the degree of misclassification, nearly all stable patients were correctly classified. PMID:24101401
Risk factors and mediating pathways of loneliness and social support in community-dwelling older adults.

PubMed

Schnittger, Rebecca I B; Wherton, Joseph; Prendergast, David; Lawlor, Brian A

2012-01-01

To develop biopsychosocial models of loneliness and social support thereby identifying their key risk factors in an Irish sample of community-dwelling older adults. Additionally, to investigate indirect effects of social support on loneliness through mediating risk factors. A total of 579 participants (400 females; 179 males) were given a battery of biopsychosocial assessments with the primary measures being the De Jong Gierveld Loneliness Scale and the Lubben Social Network Scale along with a broad range of secondary measures. Bivariate correlation analyses identified items to be included in separate psychosocial, cognitive, biological and demographic multiple regression analyses. The resulting model items were then entered into further multiple regression analyses to obtain overall models. Following this, bootstrapping mediation analyses were conducted to examine indirect effects of social support on the subtypes (emotional and social) of loneliness. The overall model for (1) emotional loneliness included depression, neuroticism, perceived stress, living alone and accommodation type, (2) social loneliness included neuroticism, perceived stress, animal naming and number of grandchildren and (3) social support included extraversion, executive functioning (Trail Making Test B-time), history of falls, age and whether the participant drives or not. Social support influenced emotional loneliness predominantly through indirect means, while its effect on social loneliness was more direct. These results characterise the biopsychosocial risk factors of emotional loneliness, social loneliness and social support and identify key pathways by which social support influences emotional and social loneliness. These findings highlight issues with the potential for consideration in the development of targeted interventions.
Individualized Risk Model for Venous Thromboembolism After Total Joint Arthroplasty.

PubMed

Parvizi, Javad; Huang, Ronald; Rezapoor, Maryam; Bagheri, Behrad; Maltenfort, Mitchell G

2016-09-01

Venous thromboembolism (VTE) after total joint arthroplasty (TJA) is a potentially fatal complication. Currently, a standard protocol for postoperative VTE prophylaxis is used that makes little distinction between patients at varying risks of VTE. We sought to develop a simple scoring system identifying patients at higher risk for VTE in whom more potent anticoagulation may need to be administered. Utilizing the National Inpatient Sample data, 1,721,806 patients undergoing TJA were identified, among whom 15,775 (0.9%) developed VTE after index arthroplasty. Among the cohort, all known potential risk factors for VTE were assessed. An initial logistic regression model using potential predictors for VTE was performed. Predictors with little contribution or poor predictive power were pruned from the data, and the model was refit. After pruning of variables that had little to no contribution to VTE risk, using the logistic regression, all independent predictors of VTE after TJA were identified in the data. Relative weights for each factor were determined. Hypercoagulability, metastatic cancer, stroke, sepsis, and chronic obstructive pulmonary disease had some of the highest points. Patients with any of these conditions had risk for postoperative VTE that exceeded the 3% rate. Based on the model, an iOS (iPhone operating system) application was developed (VTEstimator) that could be used to assign patients into low or high risk for VTE after TJA. We believe individualization of VTE prophylaxis after TJA can improve the efficacy of preventing VTE while minimizing untoward risks associated with the administration of anticoagulation. Copyright © 2016 Elsevier Inc. All rights reserved.
Semiparametric Identification of Human Arm Dynamics for Flexible Control of a Functional Electrical Stimulation Neuroprosthesis

PubMed Central

Schearer, Eric M.; Liao, Yu-Wei; Perreault, Eric J.; Tresch, Matthew C.; Memberg, William D.; Kirsch, Robert F.; Lynch, Kevin M.

2016-01-01

We present a method to identify the dynamics of a human arm controlled by an implanted functional electrical stimulation neuroprosthesis. The method uses Gaussian process regression to predict shoulder and elbow torques given the shoulder and elbow joint positions and velocities and the electrical stimulation inputs to muscles. We compare the accuracy of torque predictions of nonparametric, semiparametric, and parametric model types. The most accurate of the three model types is a semiparametric Gaussian process model that combines the flexibility of a black box function approximator with the generalization power of a parameterized model. The semiparametric model predicted torques during stimulation of multiple muscles with errors less than 20% of the total muscle torque and passive torque needed to drive the arm. The identified model allows us to define an arbitrary reaching trajectory and approximately determine the muscle stimulations required to drive the arm along that trajectory. PMID:26955041
PREOPERATIVE MRI IMPROVES PREDICTION OF EXTENSIVE OCCULT AXILLARY LYMPH NODE METASTASES IN BREAST CANCER PATIENTS WITH A POSITIVE SENTINEL LYMPH NODE BIOPSY

PubMed Central

Loiselle, Christopher; Eby, Peter R.; Kim, Janice N.; Calhoun, Kristine E.; Allison, Kimberly H.; Gadi, Vijayakrishna K.; Peacock, Sue; Storer, Barry; Mankoff, David A.; Partridge, Savannah C.; Lehman, Constance D.

2014-01-01

Rationale and Objectives To test the ability of quantitative measures from preoperative Dynamic Contrast Enhanced MRI (DCE-MRI) to predict, independently and/or with the Katz pathologic nomogram, which breast cancer patients with a positive sentinel lymph node biopsy will have ≥ 4 positive axillary lymph nodes upon completion axillary dissection. Methods and Materials A retrospective review was conducted to identify clinically node-negative invasive breast cancer patients who underwent preoperative DCE-MRI, followed by sentinel node biopsy with positive findings and complete axillary dissection (6/2005 – 1/2010). Clinical/pathologic factors, primary lesion size and quantitative DCE-MRI kinetics were collected from clinical records and prospective databases. DCE-MRI parameters with univariate significance (p < 0.05) to predict ≥ 4 positive axillary nodes were modeled with stepwise regression and compared to the Katz nomogram alone and to a combined MRI-Katz nomogram model. Results Ninety-eight patients with 99 positive sentinel biopsies met study criteria. Stepwise regression identified DCE-MRI total persistent enhancement and volume adjusted peak enhancement as significant predictors of ≥ 4 metastatic nodes. Receiver operating characteristic (ROC) curves demonstrated an area under the curve (AUC) of 0.78 for the Katz nomogram, 0.79 for the DCE-MRI multivariate model, and 0.87 for the combined MRI-Katz model. The combined model was significantly more predictive than the Katz nomogram alone (p = 0.003). Conclusion Integration of DCE-MRI primary lesion kinetics significantly improved the Katz pathologic nomogram accuracy to predict presence of metastases in ≥ 4 nodes. DCE-MRI may help identify sentinel node positive patients requiring further locoregional therapy. PMID:24331270
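The AUC values compared above can be understood through the rank interpretation of the ROC curve: the AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is a generic illustration, not the authors' analysis; the function name `roc_auc` is hypothetical and labels are assumed to be coded 1 (positive) and 0 (negative).

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Assumes labels are 1 for positive cases and 0 for negative cases,
    and that both classes are present.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # positive case ranked above negative case
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to a score that ranks cases no better than chance, and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of roughly a hundred patients.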
Billing code algorithms to identify cases of peripheral artery disease from administrative data

PubMed Central

Fan, Jin; Arruda-Olson, Adelaide M; Leibson, Cynthia L; Smith, Carin; Liu, Guanghui; Bailey, Kent R; Kullo, Iftikhar J

2013-01-01

Objective To construct and validate billing code algorithms for identifying patients with peripheral arterial disease (PAD). Methods We extracted all encounters and line item details including PAD-related billing codes at Mayo Clinic Rochester, Minnesota, between July 1, 1997 and June 30, 2008; 22 712 patients evaluated in the vascular laboratory were divided into training and validation sets. Multiple logistic regression analysis was used to create an integer code score from the training dataset, and this was tested in the validation set. We applied a model-based code algorithm to patients evaluated in the vascular laboratory and compared this with a simpler algorithm (presence of at least one of the ICD-9 PAD codes 440.20–440.29). We also applied both algorithms to a community-based sample (n=4420), followed by a manual review. Results The logistic regression model performed well in both training and validation datasets (c statistic=0.91). In patients evaluated in the vascular laboratory, the model-based code algorithm provided better negative predictive value. The simpler algorithm was reasonably accurate for identification of PAD status, with lesser sensitivity and greater specificity. In the community-based sample, the sensitivity (38.7% vs 68.0%) of the simpler algorithm was much lower, whereas the specificity (92.0% vs 87.6%) was higher than the model-based algorithm. Conclusions A model-based billing code algorithm had reasonable accuracy in identifying PAD cases from the community, and in patients referred to the non-invasive vascular laboratory. The simpler algorithm had reasonable accuracy for identification of PAD in patients referred to the vascular laboratory but was significantly less sensitive in a community-based sample. PMID:24166724
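The accuracy measures used to compare the two algorithms above (sensitivity, specificity, and predictive values) all derive from a 2x2 table of algorithm result against true PAD status. A minimal sketch follows, using the standard definitions. The function name `diagnostic_accuracy` is hypothetical, and any counts passed to it are the caller's own; none are taken from the study.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 confusion table.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases that are not flagged
        "ppv": tp / (tp + fp),          # share of flagged patients who are cases
        "npv": tn / (tn + fn),          # share of unflagged patients who are non-cases
    }
```

Sensitivity and specificity describe the algorithm itself, whereas the predictive values also depend on how common PAD is in the population being screened, which is one reason results can differ between a vascular laboratory sample and a community-based sample.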
Coping Styles in Heart Failure Patients with Depressive Symptoms

PubMed Central

Trivedi, Ranak B.; Blumenthal, James A.; O'Connor, Christopher; Adams, Kirkwood; Hinderliter, Alan; Sueta-Dupree, Carla; Johnson, Kristy; Sherwood, Andrew

2009-01-01

Objective Elevated depressive symptoms have been linked to poorer prognosis in heart failure (HF) patients. Our objective was to identify coping styles associated with depressive symptoms in HF patients. Methods 222 stable HF patients (32.75% female, 45.4% non-Hispanic Black) completed multiple questionnaires. Beck Depression Inventory (BDI) assessed depressive symptoms, Life Orientation Test (LOT-R) assessed optimism, ENRICHD Social Support Inventory (ESSI) and Perceived Social Support Scale (PSSS) assessed social support, and COPE assessed coping styles. Linear regression analyses were employed to assess the association of coping styles with continuous BDI scores. Logistic regression analyses were performed using BDI scores dichotomized into BDI<10 versus BDI≥10, to identify coping styles accompanying clinically significant depressive symptoms. Results In linear regression models, higher BDI scores were associated with lower scores on the acceptance (β=-.14), humor (β=-.15), planning (β=-.15), and emotional support (β=-.14) subscales of the COPE, and higher scores on the behavioral disengagement (β=.41), denial (β=.33), venting (β=.25), and mental disengagement (β=.22) subscales. Higher PSSS and ESSI scores were associated with lower BDI scores (β=-.32 and -.25, respectively). Higher LOT-R scores were associated with higher BDI scores (β=.39, p<.001). In logistic regression models, BDI≥10 was associated with greater likelihood of behavioral disengagement (OR=1.3), denial (OR=1.2), mental disengagement (OR=1.3), venting (OR=1.2), and pessimism (OR=1.2), and lower perceived social support measured by PSSS (OR=.92) and ESSI (OR=.92). Conclusion Depressive symptoms in HF patients are associated with avoidant coping, lower perceived social support, and pessimism. Results raise the possibility that interventions designed to improve coping may reduce depressive symptoms. PMID:19773027
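The odds ratios quoted above come from logistic regression, where an odds ratio is the exponential of the fitted coefficient for a one-unit change in the predictor. The sketch below shows that relationship together with the dichotomization at BDI≥10 described in the abstract. It is illustrative only: the function names are hypothetical, and the abstract does not report the underlying coefficients or the scale units of each predictor.

```python
import math


def odds_ratio(coefficient, unit_change=1.0):
    """Odds ratio implied by a logistic regression coefficient.

    For a change of `unit_change` in the predictor, the odds of the
    outcome are multiplied by exp(coefficient * unit_change).
    """
    return math.exp(coefficient * unit_change)


def dichotomize_bdi(score, cutoff=10):
    """Code a BDI score as 1 (at or above the cutoff) or 0 (below it)."""
    return 1 if score >= cutoff else 0
```

Because the effect is multiplicative, an odds ratio slightly below 1 per scale point (such as the .92 reported for the social support scales) compounds over larger differences in the score; an odds ratio above 1 compounds in the other direction.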
Coping styles in heart failure patients with depressive symptoms.

PubMed

Trivedi, Ranak B; Blumenthal, James A; O'Connor, Christopher; Adams, Kirkwood; Hinderliter, Alan; Dupree, Carla; Johnson, Kristy; Sherwood, Andrew

2009-10-01

Elevated depressive symptoms have been linked to poorer prognosis in heart failure (HF) patients. Our objective was to identify coping styles associated with depressive symptoms in HF patients. A total of 222 stable HF patients (32.75% female, 45.4% non-Hispanic black) completed multiple questionnaires. Beck Depression Inventory (BDI) assessed depressive symptoms, Life Orientation Test (LOT-R) assessed optimism, ENRICHD Social Support Inventory (ESSI) and Perceived Social Support Scale (PSSS) assessed social support, and COPE assessed coping styles. Linear regression analyses were employed to assess the association of coping styles with continuous BDI scores. Logistic regression analyses were performed using BDI scores dichotomized into BDI<10 vs. BDI≥10, to identify coping styles accompanying clinically significant depressive symptoms. In linear regression models, higher BDI scores were associated with lower scores on the acceptance (β=-.14), humor (β=-.15), planning (β=-.15), and emotional support (β=-.14) subscales of the COPE, and higher scores on the behavioral disengagement (β=.41), denial (β=.33), venting (β=.25), and mental disengagement (β=.22) subscales. Higher PSSS and ESSI scores were associated with lower BDI scores (β=-.32 and -.25, respectively). Higher LOT-R scores were associated with higher BDI scores (β=.39, P<.001). In logistic regression models, BDI≥10 was associated with greater likelihood of behavioral disengagement (OR=1.3), denial (OR=1.2), mental disengagement (OR=1.3), venting (OR=1.2), and pessimism (OR=1.2), and lower perceived social support measured by PSSS (OR=.92) and ESSI (OR=.92). Depressive symptoms in HF patients are associated with avoidant coping, lower perceived social support, and pessimism. Results raise the possibility that interventions designed to improve coping may reduce depressive symptoms.
Assessing NARCCAP climate model effects using spatial confidence regions

PubMed Central

French, Joshua P.; McGinnis, Seth; Schwartzman, Armin

2017-01-01

We assess similarities and differences between model effects for the North American Regional Climate Change Assessment Program (NARCCAP) climate models using varying classes of linear regression models. Specifically, we consider how the average temperature effect differs for the various global and regional climate model combinations, including assessment of possible interaction between the effects of global and regional climate models. We use both pointwise and simultaneous inference procedures to identify regions where global and regional climate model effects differ. We also show conclusively that results from pointwise inference are misleading, and that accounting for multiple comparisons is important for making proper inference. PMID:28936474
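One way to see why pointwise inference can mislead when many locations are tested at once is to compute the chance of at least one false positive across all tests. The sketch below does this under the simplifying assumption that the tests are independent, which will not hold exactly for spatially correlated climate fields, and shows the Bonferroni per-test level as one generic remedy. The abstract does not say which simultaneous procedure the authors used, so this is not their method; the function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests,
    each carried out at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(alpha, n_tests):
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests
```

With pointwise tests at the 5% level on 100 locations, the familywise error rate under independence exceeds 99%, so some locations would almost certainly appear to differ even if no real difference existed anywhere.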
A new approach to correct the QT interval for changes in heart rate using a nonparametric regression model in beagle dogs.

PubMed

Watanabe, Hiroyuki; Miyazaki, Hiroyasu

2006-01-01

Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. A total of 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correction models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is more flexible than the parametric methods. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
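For orientation, the two classical corrections named in this abstract are commonly written as QTc = QT / RR^(1/2) (Bazett) and QTc = QT / RR^(1/3) (Fridericia), with RR in seconds. The sketch below only illustrates those textbook formulas; the function names and the sample values are assumptions for illustration, and the abstract's Loess-based model is not reproduced here.

```python
def qtc_bazett(qt_ms, rr_s):
    """Bazett correction: QT divided by the square root of RR (RR in seconds)."""
    return qt_ms / rr_s ** 0.5


def qtc_fridericia(qt_ms, rr_s):
    """Fridericia correction: QT divided by the cube root of RR (RR in seconds)."""
    return qt_ms / rr_s ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical observation, not taken from the study.
    qt_ms, rr_s = 220.0, 0.5
    print(qtc_bazett(qt_ms, rr_s), qtc_fridericia(qt_ms, rr_s))
```

Both formulas leave QT unchanged at RR = 1 s and diverge from each other as heart rate moves away from 60 beats per minute, which is where over- or under-correction becomes a concern.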
A Model-Based Joint Identification of Differentially Expressed Genes and Phenotype-Associated Genes

PubMed Central

Seo, Minseok; Shin, Su-kyung; Kwon, Eun-Young; Kim, Sung-Eun; Bae, Yun-Jung; Lee, Seungyeoun; Sung, Mi-Kyung; Choi, Myung-Sook; Park, Taesung

2016-01-01

Over the last decade, many analytical methods and tools have been developed for microarray data. The detection of differentially expressed genes (DEGs) among different treatment groups is often a primary purpose of microarray data analysis. In addition, association studies investigating the relationship between genes and a phenotype of interest such as survival time are also popular in microarray data analysis. Phenotype association analysis provides a list of phenotype-associated genes (PAGs). However, it is sometimes necessary to identify genes that are both DEGs and PAGs. We consider the joint identification of DEGs and PAGs in microarray data analyses. The first approach we used was a naïve approach that detects DEGs and PAGs separately and then identifies the genes in the intersection of the lists of PAGs and DEGs. The second approach we considered was a hierarchical approach that detects DEGs first and then chooses PAGs from among the DEGs or vice versa. In this study, we propose a new model-based approach for the joint identification of DEGs and PAGs. Unlike the previous two-step approaches, the proposed method simultaneously identifies genes that are both DEGs and PAGs. This method uses standard regression models but adopts a different null hypothesis from ordinary regression models, which allows us to perform joint identification in one step. The proposed model-based methods were evaluated using experimental data and simulation studies. The proposed methods were used to analyze a microarray experiment in which the main interest lies in detecting genes that are both DEGs and PAGs, where DEGs are identified between two diet groups and PAGs are associated with four phenotypes reflecting the expression of leptin, adiponectin, insulin-like growth factor 1, and insulin. Model-based approaches provided a larger number of genes that are both DEGs and PAGs than other methods. Simulation studies showed that they have more power than other methods. 
Through analysis of data from experimental microarrays and simulation studies, the proposed model-based approach was shown to provide a more powerful result than the naïve approach and the hierarchical approach. Since our approach is model-based, it is very flexible and can easily handle different types of covariates. PMID:26964035

Inter-model comparison of the landscape determinants of vector-borne disease: implications for epidemiological and entomological risk modeling.

PubMed

Lorenz, Alyson; Dhingra, Radhika; Chang, Howard H; Bisanzio, Donal; Liu, Yang; Remais, Justin V

2014-01-01

Extrapolating landscape regression models for use in assessing vector-borne disease risk and other applications requires thoughtful evaluation of fundamental model choice issues. To examine implications of such choices, an analysis was conducted to explore the extent to which disparate landscape models agree in their epidemiological and entomological risk predictions when extrapolated to new regions. Agreement between six literature-drawn landscape models was examined by comparing predicted county-level distributions of either Lyme disease or Ixodes scapularis vector using Spearman ranked correlation. AUC analyses and multinomial logistic regression were used to assess the ability of these extrapolated landscape models to predict observed national data. Three models based on measures of vegetation, habitat patch characteristics, and herbaceous landcover emerged as effective predictors of observed disease and vector distribution. An ensemble model containing these three models improved precision and predictive ability over individual models. A priori assessment of qualitative model characteristics effectively identified models that subsequently emerged as better predictors in quantitative analysis. Both a methodology for quantitative model comparison and a checklist for qualitative assessment of candidate models for extrapolation are provided; both tools aim to improve collaboration between those producing models and those interested in applying them to new areas and research questions.
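As a reminder of what the agreement measure in this abstract computes, Spearman's rank correlation is the Pearson correlation of the ranks of two sets of predictions. The following is a minimal sketch in plain Python; the function names are assumptions, ties receive average ranks, and none of the study's county-level data are used.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Calling `spearman(model_a, model_b)` on two models' predicted county-level values (hypothetical inputs here) would return a value near 1 when the models rank counties similarly, regardless of the scale of their predictions.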
Role of the Egami score to predict immunoglobulin resistance in Kawasaki disease among a Western Mediterranean population.

PubMed

Sánchez-Manubens, Judith; Antón, Jordi; Bou, Rosa; Iglesias, Estíbaliz; Calzada-Hernandez, Joan; Borlan, Sergi; Gimenez-Roca, Clara; Rivera, Josefa

2016-07-01

Kawasaki disease is an acute self-limited systemic vasculitis common in childhood. Intravenous immunoglobulin (IVIG) is an effective treatment, and it reduces the incidence of cardiac complications. The Egami score has been validated to identify IVIG non-responders in the Japanese population, and it has shown high sensitivity and specificity in identifying these patients. Despite its effectiveness in Japan, the Egami score has been shown to be ineffective in non-Japanese populations. The aim of this study was to apply the Egami score in a Western Mediterranean population in Catalonia (Spain). This was an observational population-based study that included patients from all Pediatric Units in 33 Catalan hospitals, under both public and private management, between January 2004 and March 2014. Sensitivity and specificity for the Egami score were calculated, and a logistic regression analysis of predictors of overall response to IVIG was also developed. Predicting IVIG resistance with a cutoff for Egami score ≥3 obtained 26 % sensitivity and 82 % specificity. Negative predictive value was 85 % and positive predictive value 22 %. This low sensitivity implies that three out of four non-responders will not be identified by the Egami score. In addition, logistic regression models did not find significance for the use of the Egami score to predict IVIG resistance in the Catalan population, despite an area under the ROC curve of 0.618 (95 % CI 0.538-0.698, p < 0.001). Although regression models found an area under the ROC curve >0.5 to predict IVIG resistance, the low sensitivity excludes the Egami score as a useful tool to predict IVIG resistance in the Catalan population.
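The four accuracy measures quoted in this abstract all derive from a 2x2 table of test result against true status. The sketch below shows the standard definitions; the function name and the counts are made up for illustration and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures.

    tp: test-positive non-responders, fn: test-negative non-responders,
    fp: test-positive responders,     tn: test-negative responders.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases cleared
        "ppv": tp / (tp + fp),          # share of flagged who are true cases
        "npv": tn / (tn + fn),          # share of cleared who are non-cases
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    print(diagnostic_metrics(tp=20, fp=40, fn=60, tn=280))
```

The missed fraction of true cases is 1 minus sensitivity, so a sensitivity of 26 % corresponds to 74 % of non-responders being missed, which is the "three out of four" statement in the abstract.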
Development of the Sydney Falls Risk Screening Tool in brain injury rehabilitation: A multisite prospective cohort study.

PubMed

McKechnie, Duncan; Fisher, Murray J; Pryor, Julie; Bonser, Melissa; Jesus, Jhoven De

2018-03-01

To develop a falls risk screening tool (FRST) sensitive to the traumatic brain injury rehabilitation population. Falls are the most frequently recorded patient safety incident within the hospital context. The inpatient traumatic brain injury rehabilitation population is one particular population that has been identified as at high risk of falls. However, no FRST has been developed for this patient population. Consequently in the traumatic brain injury rehabilitation population, there is the real possibility that nurses are using falls risk screening tools that have a poor clinical utility. Multisite prospective cohort study. Univariate and multiple logistic regression modelling techniques (backward elimination, elastic net and hierarchical) were used to examine each variable's association with patients who fell. The resulting FRST's clinical validity was examined. Of the 140 patients in the study, 41 (29%) fell. Through multiple logistic regression modelling, 11 variables were identified as predictors for falls. Using hierarchical logistic regression, five of these were identified for inclusion in the resulting falls risk screening tool: prescribed mobility aid (such as, wheelchair or frame), a fall since admission to hospital, impulsive behaviour, impaired orientation and bladder and/or bowel incontinence. The resulting FRST has good clinical validity (sensitivity = 0.9; specificity = 0.62; area under the curve = 0.87; Youden index = 0.54). The tool was significantly more accurate (p = .037 on DeLong test) in discriminating fallers from nonfallers than the Ontario Modified STRATIFY FRST. A FRST has been developed using a comprehensive statistical framework, and evidence has been provided of this tool's clinical validity. The developed tool, the Sydney Falls Risk Screening Tool, should be considered for use in brain injury rehabilitation populations. © 2017 John Wiley & Sons Ltd.
Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

PubMed

Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

2017-06-01

Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
An integrated simulation and optimization approach for managing human health risks of atmospheric pollutants by coal-fired power plants.

PubMed

Dai, C; Cai, X H; Cai, Y P; Guo, H C; Sun, W; Tan, Q; Huang, G H

2014-06-01

This research developed a simulation-aided nonlinear programming model (SNPM). This model incorporated the consideration of pollutant dispersion modeling, and the management of coal blending and the related human health risks within a general modeling framework. In SNPM, the simulation effort (i.e., California puff [CALPUFF]) was used to forecast the fate of air pollutants for quantifying the health risk under various conditions, while the optimization studies were to identify the optimal coal blending strategies from a number of alternatives. To solve the model, a surrogate-based indirect search approach was proposed, where the support vector regression (SVR) was used to create a set of easy-to-use and rapid-response surrogates for identifying the functional relationships between coal-blending operating conditions and health risks. Through replacing the CALPUFF and the corresponding hazard quotient equation with the surrogates, the computation efficiency could be improved. The developed SNPM was applied to minimize the human health risk associated with air pollutants discharged from Gaojing and Shijingshan power plants in the west of Beijing. Solution results indicated that it could be used for reducing the health risk of the public in the vicinity of the two power plants, identifying desired coal blending strategies for decision makers, and considering a proper balance between coal purchase cost and human health risk. A simulation-aided nonlinear programming model (SNPM) is developed. It integrates the advantages of CALPUFF and nonlinear programming model. To solve the model, a surrogate-based indirect search approach based on the combination of support vector regression and genetic algorithm is proposed. SNPM is applied to reduce the health risk caused by air pollutants discharged from Gaojing and Shijingshan power plants in the west of Beijing. 
Solution results indicate that it is useful for generating coal blending schemes, reducing the health risk of the public, and reflecting the trade-off between coal purchase cost and health risk.
Modified Regression Correlation Coefficient for Poisson Regression Model

NASA Astrophysics Data System (ADS)

Kaengthong, Nattacha; Domthong, Uthumporn

2017-09-01

This study considers measures of the predictive power of the Generalized Linear Model (GLM), which are widely used but often subject to restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
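To make the definition concrete, the traditional regression correlation coefficient described here can be read as the correlation between Y and the fitted mean E(Y|X). The sketch below assumes the usual log link for Poisson regression, E(Y|X) = exp(b0 + b1*x1 + ...), and takes already-estimated coefficients as input; the function names are assumptions, and the modified coefficient proposed in the abstract is not specified there, so it is not implemented.

```python
import math


def poisson_mean(beta, x_row):
    """Fitted mean under a log link: exp(beta[0] + sum(beta[j] * x[j-1]))."""
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], x_row))
    return math.exp(eta)


def regression_correlation(y, fitted):
    """Pearson correlation between observed counts and fitted means."""
    n = len(y)
    my, mf = sum(y) / n, sum(fitted) / n
    syf = sum((a - my) * (b - mf) for a, b in zip(y, fitted))
    syy = sum((a - my) ** 2 for a in y)
    sff = sum((b - mf) ** 2 for b in fitted)
    return syf / (syy * sff) ** 0.5


if __name__ == "__main__":
    # Hypothetical coefficients and data, for illustration only.
    beta = [0.2, 0.5, -0.3]
    X = [[0.0, 1.0], [1.0, 0.5], [2.0, 0.0], [3.0, 1.5]]
    y = [1, 2, 4, 3]
    fitted = [poisson_mean(beta, row) for row in X]
    print(regression_correlation(y, fitted))
```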
Do classic blood biomarkers of JSLE identify active lupus nephritis? Evidence from the UK JSLE Cohort Study.

PubMed

Smith, E M D; Jorgensen, A L; Beresford, M W

2017-10-01

Background Lupus nephritis (LN) affects up to 80% of juvenile-onset systemic lupus erythematosus (JSLE) patients. The value of commonly available biomarkers, such as anti-dsDNA antibodies, complement (C3/C4), ESR and full blood count parameters in the identification of active LN remains uncertain. Methods Participants from the UK JSLE Cohort Study, aged <16 years at diagnosis, were categorized as having active or inactive LN according to the renal domain of the British Isles Lupus Assessment Group score. Classic biomarkers: anti-dsDNA, C3, C4, ESR, CRP, haemoglobin, total white cells, neutrophils, lymphocytes, platelets and immunoglobulins were assessed for their ability to identify active LN using binary logistic regression modeling, with stepAIC function applied to select a final model. Receiver-operating curve analysis was used to assess diagnostic accuracy. Results A total of 370 patients were recruited; 191 (52%) had active LN and 179 (48%) had inactive LN. Binary logistic regression modeling demonstrated a combination of ESR, C3, white cell count, neutrophils, lymphocytes and IgG to be best for the identification of active LN (area under the curve 0.724). Conclusions At best, combining common classic blood biomarkers of lupus activity using multivariate analysis provides a 'fair' ability to identify active LN. Urine biomarkers were not included in these analyses. These results add to the concern that classic blood biomarkers are limited in monitoring discrete JSLE manifestations such as LN.
Factors associated with HIV/AIDS treatment dropouts in a special care unit in the City of Rio de Janeiro, RJ, Brazil.

PubMed

Schilkowsky, Louise Bastos; Portela, Margareth Crisóstomo; Sá, Marilene de Castilho

2011-06-01

This study aimed to identify factors associated with the health care of patients with HIV/AIDS who drop out. The study was developed in a specialized health care unit of a University hospital in Rio de Janeiro, Brazil, considering a stratified sample of adult patients including all dropout cases (155) and 44.0% of 790 cases under regular follow-up. Bivariate analyses were used to identify associations between health care dropout and demographic, socioeconomic and clinical variables. Logistic and Cox regression models were used to identify the independent effects of the explanatory variables on risk for dropout, in the latter by incorporating information on the outcome over time. Patients were, on average, 35 years old, predominantly males (66.4%) and of a low socioeconomic level (45.0%). In both models, health care dropout was consistently associated with being unemployed or having an unstable job, using illicit drugs and having psychiatric background--positive association; and with age, having AIDS, and having used multiple antiretroviral regimens--negative association. In the logistic regression, dropping out was also positively associated with time between diagnosis and the first outpatient visit, while in the Cox model, the hazard for dropping out was positively associated with being single, and negatively associated with a higher educational level. The results of this work allow for the identification of HIV/AIDS patients more likely to drop out from health care.
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

NASA Astrophysics Data System (ADS)

Yadav, B.; Hatfield, K.

2017-12-01

We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elastic net, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation models, soil, land use, and climate data, and other publicly available ancillary and geospatial data. 80% of the catchments were used to train the ML algorithms, and the remaining 20% were used as an independent test set to measure the generalization performance of the fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables, selected after careful evaluation of the bias-variance tradeoff, include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. 
The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
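The data-splitting scheme described in this abstract (an 80/20 train/test split, then k-fold cross-validation inside the training portion) can be sketched with index bookkeeping alone. The function names, the count of 800 catchments, the seed, and k = 5 are assumptions for illustration; the abstract states only "more than 800" catchments and does not give k.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out a fraction as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_indices(train_idx, k=5):
    """Yield (fit, validation) index lists for each of k folds."""
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, validation


if __name__ == "__main__":
    train, test = train_test_split_indices(800)
    for fit, validation in k_fold_indices(train, k=5):
        # A grid search would fit each candidate hyperparameter setting on
        # `fit`, score it on `validation`, and average scores across folds.
        pass
```

Keeping the 20% test set out of the cross-validation loop is what lets it serve as an independent check on generalization after the hyperparameters are chosen.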
Estimated Perennial Streams of Idaho and Related Geospatial Datasets

USGS Publications Warehouse

Rea, Alan; Skinner, Kenneth D.

2009-01-01

The perennial or intermittent status of a stream has bearing on many regulatory requirements. Because of changing technologies over time, cartographic representation of perennial/intermittent status of streams on U.S. Geological Survey (USGS) topographic maps is not always accurate and (or) consistent from one map sheet to another. Idaho Administrative Code defines an intermittent stream as one having a 7-day, 2-year low flow (7Q2) less than 0.1 cubic feet per second. To establish consistency with the Idaho Administrative Code, the USGS developed regional regression equations for Idaho streams for several low-flow statistics, including 7Q2. Using these regression equations, the 7Q2 streamflow may be estimated for naturally flowing streams anywhere in Idaho to help determine perennial/intermittent status of streams. Using these equations in conjunction with a Geographic Information System (GIS) technique known as weighted flow accumulation allows for an automated and continuous estimation of 7Q2 streamflow at all points along a stream, which in turn can be used to determine if a stream is intermittent or perennial according to the Idaho Administrative Code operational definition. The selected regression equations were applied to create continuous grids of 7Q2 estimates for the eight low-flow regression regions of Idaho. By applying the 0.1 ft3/s criterion, the perennial streams have been estimated in each low-flow region. Uncertainty in the estimates is shown by identifying a 'transitional' zone, corresponding to flow estimates of 0.1 ft3/s plus and minus one standard error. Considerable additional uncertainty exists in the model of perennial streams presented in this report. The regression models provide overall estimates based on general trends within each regression region. These models do not include local factors such as a large spring or a losing reach that may greatly affect flows at any given point. 
Site-specific flow data, assuming a sufficient period of record, generally would be considered to represent flow conditions better at a given site than flow estimates based on regionalized regression models. The geospatial datasets of modeled perennial streams are considered a first-cut estimate, and should not be construed to override site-specific flow data.
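The classification rule in this abstract can be expressed as a simple threshold test. The sketch below assumes the standard error is supplied in the same units as the flow estimate (cubic feet per second) and that the transitional zone is the closed interval 0.1 ± one standard error; how the report actually derives and applies the standard error is not stated in the abstract, and the function name is hypothetical.

```python
THRESHOLD_CFS = 0.1  # 7Q2 criterion cited from the Idaho Administrative Code


def classify_stream(q7q2_cfs, standard_error_cfs):
    """Label a 7Q2 estimate as perennial, intermittent, or transitional.

    Assumption: standard_error_cfs is in ft3/s and defines a symmetric
    band around the 0.1 ft3/s threshold.
    """
    lower = THRESHOLD_CFS - standard_error_cfs
    upper = THRESHOLD_CFS + standard_error_cfs
    if q7q2_cfs < lower:
        return "intermittent"
    if q7q2_cfs > upper:
        return "perennial"
    return "transitional"
```

Applied cell by cell to a grid of 7Q2 estimates, a rule of this kind yields the three-class map the abstract describes, with the transitional class marking where the regression estimate cannot separate the two statuses.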
Identifying the Safety Factors over Traffic Signs in State Roads using a Panel Quantile Regression Approach.

PubMed

Šarić, Željko; Xu, Xuecai; Duan, Li; Babić, Darko

2018-06-20

This study investigated the interactions between accident rate and traffic signs on state roads in Croatia while accommodating the heterogeneity attributed to unobserved factors. Data from 130 state roads between 2012 and 2016 were collected from the Traffic Accident Database System maintained by the Republic of Croatia Ministry of the Interior. To address the heterogeneity, a panel quantile regression model was proposed, in which the quantile regression model offers a more complete view and a highly comprehensive analysis of the relationship between accident rate and traffic signs, while the panel data model accommodates the heterogeneity attributed to unobserved factors. Results revealed that (1) low visibility of material damage (MD) and death or injured (DI) increased the accident rate; (2) the number of mandatory signs and the number of warning signs were more likely to reduce the accident rate; and (3) average speed limit and the number of invalid traffic signs per km were associated with a high accident rate. To our knowledge, this is the first attempt to analyze the interactions between accident consequences and traffic signs by employing a panel quantile regression model. By involving visibility, the present study demonstrates that low visibility causes a relatively higher risk of MD and DI. It is noteworthy that average speed limit corresponds positively with accident rate. The number of mandatory signs and the number of warning signs are more likely to reduce the accident rate. The number of invalid traffic signs per km is significant for accident rate, so regular maintenance should be kept up for a safer roadway environment.
Assessing the Multidimensional Relationship Between Medication Beliefs and Adherence in Older Adults With Hypertension Using Polynomial Regression.

PubMed

Dillon, Paul; Phillips, L Alison; Gallagher, Paul; Smith, Susan M; Stewart, Derek; Cousins, Gráinne

2018-02-05

The Necessity-Concerns Framework (NCF) is a multidimensional theory describing the relationship between patients' positive and negative evaluations of their medication which interplay to influence adherence. Most studies evaluating the NCF have failed to account for the multidimensional nature of the theory, placing the separate dimensions of medication "necessity beliefs" and "concerns" onto a single dimension (e.g., the Beliefs about Medicines Questionnaire-difference score model). To assess the multidimensional effect of patient medication beliefs (concerns and necessity beliefs) on medication adherence using polynomial regression with response surface analysis. Community-dwelling older adults >65 years (n = 1,211) presenting their own prescription for antihypertensive medication to 106 community pharmacies in the Republic of Ireland rated their concerns and necessity beliefs to antihypertensive medications at baseline and their adherence to antihypertensive medication at 12 months via structured telephone interview. Confirmatory polynomial regression found the difference-score model to be inaccurate; subsequent exploratory analysis identified a quadratic model to be the best-fitting polynomial model. Adherence was lowest among those with strong medication concerns and weak necessity beliefs, and adherence was greatest for those with weak concerns and strong necessity beliefs (slope β = -0.77, p<.001; curvature β = -0.26, p = .004). However, novel nonreciprocal effects were also observed; patients with simultaneously high concerns and necessity beliefs had lower adherence than those with simultaneously low concerns and necessity beliefs (slope β = -0.36, p = .004; curvature β = -0.25, p = .003). The difference-score model fails to account for the potential nonreciprocal effects. Results extend evidence supporting the use of polynomial regression to assess the multidimensional effect of medication beliefs on adherence.
Establishing endangered species recovery criteria using predictive simulation modeling

USGS Publications Warehouse

McGowan, Conor P.; Catlin, Daniel H.; Shaffer, Terry L.; Gratto-Trevor, Cheri L.; Aron, Carol

2014-01-01

Listing a species under the Endangered Species Act (ESA) and developing a recovery plan requires U.S. Fish and Wildlife Service to establish specific and measurable criteria for delisting. Generally, species are listed because they face (or are perceived to face) elevated risk of extinction due to issues such as habitat loss, invasive species, or other factors. Recovery plans identify recovery criteria that reduce extinction risk to an acceptable level. It logically follows that the recovery criteria, the defined conditions for removing a species from ESA protections, need to be closely related to extinction risk. Extinction probability is a population parameter estimated with a model that uses current demographic information to project the population into the future over a number of replicates, calculating the proportion of replicated populations that go extinct. We simulated extinction probabilities of piping plovers in the Great Plains and estimated the relationship between extinction probability and various demographic parameters. We tested the fit of regression models linking initial abundance, productivity, or population growth rate to extinction risk, and then, using the regression parameter estimates, determined the conditions required to reduce extinction probability to some pre-defined acceptable threshold. Binomial regression models with mean population growth rate and the natural log of initial abundance were the best predictors of extinction probability 50 years into the future. For example, based on our regression models, an initial abundance of approximately 2400 females with an expected mean population growth rate of 1.0 will limit extinction risk for piping plovers in the Great Plains to less than 0.048. Our method provides a straightforward way of developing specific and measurable recovery criteria linked directly to the core issue of extinction risk. Published by Elsevier Ltd.
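The definition of extinction probability given in this abstract (project the population forward over many replicates and take the proportion that go extinct) can be illustrated with a deliberately simplified simulation. The sketch below assumes a single abundance value multiplied each year by a normally distributed growth rate and a quasi-extinction threshold of one individual; the function name, the parameters, and this growth model are assumptions and are far simpler than the demographic model the authors would have used.

```python
import random


def extinction_probability(initial_abundance, mean_growth, sd_growth,
                           years=50, replicates=1000,
                           quasi_extinction=1.0, seed=0):
    """Proportion of simulated populations falling below a threshold."""
    rng = random.Random(seed)
    extinct = 0
    for _ in range(replicates):
        n = float(initial_abundance)
        for _ in range(years):
            # Draw this year's growth rate; negative draws are truncated to 0.
            growth = max(0.0, rng.gauss(mean_growth, sd_growth))
            n *= growth
            if n < quasi_extinction:
                extinct += 1
                break
    return extinct / replicates


if __name__ == "__main__":
    # Hypothetical inputs, not estimates for piping plovers.
    print(extinction_probability(2400, 1.0, 0.15))
```

Running such a simulation over a grid of initial abundances and growth rates produces the kind of (inputs, extinction probability) data to which the abstract's binomial regression models could then be fitted.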
Multivariate logistic regression for predicting total culturable virus presence at the intake of a potable-water treatment plant: novel application of the atypical coliform/total coliform ratio.

PubMed

Black, L E; Brion, G M; Freitas, S J

2007-06-01

Predicting the presence of enteric viruses in surface waters is a complex modeling problem. Multiple water quality parameters that indicate the presence of human fecal material, the load of fecal material, and the amount of time fecal material has been in the environment are needed. This paper presents the results of a multiyear study of raw-water quality at the inlet of a potable-water plant that related 17 physical, chemical, and biological indices to the presence of enteric viruses as indicated by cytopathic changes in cell cultures. It was found that several simple, multivariate logistic regression models that could reliably identify observations of the presence or absence of total culturable virus could be fitted. The best models developed combined a fecal age indicator (the atypical coliform [AC]/total coliform [TC] ratio), the detectable presence of a human-associated sterol (epicoprostanol) to indicate the fecal source, and one of several fecal load indicators (the levels of Giardia species cysts, coliform bacteria, and coprostanol). The best fit to the data was found when the AC/TC ratio, the presence of epicoprostanol, and the density of fecal coliform bacteria were input into a simple, multivariate logistic regression equation, resulting in 84.5% and 78.6% accuracies for the identification of the presence and absence of total culturable virus, respectively. The AC/TC ratio was the most influential input variable in all of the models generated, but producing the best prediction required additional input related to the fecal source and the fecal load. The potential for replacing microbial indicators of fecal load with levels of coprostanol was proposed and evaluated by multivariate logistic regression modeling for the presence and absence of virus.
Pattern Recognition Analysis of Age-Related Retinal Ganglion Cell Signatures in the Human Eye

PubMed Central

Yoshioka, Nayuta; Zangerl, Barbara; Nivison-Smith, Lisa; Khuu, Sieu K.; Jones, Bryan W.; Pfeiffer, Rebecca L.; Marc, Robert E.; Kalloniatis, Michael

2017-01-01

Purpose To characterize macular ganglion cell layer (GCL) changes with age and provide a framework to assess changes in ocular disease. This study used data clustering to analyze macular GCL patterns from optical coherence tomography (OCT) in a large cohort of subjects without ocular disease. Methods Single eyes of 201 patients evaluated at the Centre for Eye Health (Sydney, Australia) were retrospectively enrolled (age range, 20–85); 8 × 8 grid locations obtained from Spectralis OCT macular scans were analyzed with unsupervised classification into statistically separable classes sharing common GCL thickness and change with age. The resulting classes and gridwise data were fitted with linear and segmented linear regression curves. Additionally, normalized data were analyzed to determine regression as a percentage. Accuracy of each model was examined through comparison of predicted 50-year-old equivalent macular GCL thickness for the entire cohort to a true 50-year-old reference cohort. Results Pattern recognition clustered GCL thickness across the macula into five to eight spatially concentric classes. F-test demonstrated segmented linear regression to be the most appropriate model for macular GCL change. The pattern recognition–derived and normalized model revealed less difference between the predicted macular GCL thickness and the reference cohort (average ± SD 0.19 ± 0.92 and −0.30 ± 0.61 μm) than a gridwise model (average ± SD 0.62 ± 1.43 μm). Conclusions Pattern recognition successfully identified statistically separable macular areas that undergo a segmented linear reduction with age. This regression model better predicted macular GCL thickness. The various unique spatial patterns revealed by pattern recognition combined with core GCL thickness data provide a framework to analyze GCL loss in ocular disease. PMID:28632847
qFeature

DOE Office of Scientific and Technical Information (OSTI.GOV)

2015-09-14

This package contains statistical routines for extracting features from multivariate time-series data, which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic, but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
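The moving-window idea described above can be sketched in a few lines. The code below is a minimal illustration and not the qFeature API: the function names, the window width, and the toy series are all hypothetical, only the linear (not quadratic) fit is shown, and the summary statistic is assumed to be a simple mean.

```python
from statistics import mean

def window_slope(y):
    """Least-squares slope of y against the time indices 0..len(y)-1."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = mean(y)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    return sxy / sxx

def moving_slopes(series, width):
    """Slope of a local linear fit in every full window of `width` samples."""
    return [window_slope(series[i:i + width])
            for i in range(len(series) - width + 1)]

def summarize(values, interval):
    """Mean fitted coefficient over consecutive blocks of `interval` windows."""
    return [mean(values[i:i + interval])
            for i in range(0, len(values), interval)]

series = [0, 1, 2, 3, 4, 4, 4, 4, 4, 4]   # hypothetical ramp followed by a plateau
slopes = moving_slopes(series, width=3)
features = summarize(slopes, interval=4)
print(features)
```

The first summary value is dominated by the ramp and the second by the plateau, which is the kind of contrast a downstream multivariate analysis could use.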
Investigation using data from ERTS to develop and implement utilization of living marine resources

NASA Technical Reports Server (NTRS)

Stevenson, W. H. (Principal Investigator); Pastula, E. J., Jr.

1973-01-01

The author has identified the following significant results. The feasibility of utilizing ERTS-1 data in conjunction with aerial remote sensing and sea truth information to predict the distribution of menhaden in the Mississippi Sound during a specific time frame has been demonstrated by employing a number of uniquely designed empirical regression models. The construction of these models was made possible through innovative statistical routines specifically developed to meet the stated objectives.
Machine Learning Algorithms Outperform Conventional Regression Models in Predicting Development of Hepatocellular Carcinoma

PubMed Central

Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K

2015-01-01

Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95%CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. PMID:24169273
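For readers unfamiliar with the c-statistic reported above, the sketch below computes it for a binary outcome as the share of case/non-case pairs in which the case received the higher predicted risk, with ties counted as one half. The scores and outcomes are hypothetical, and the study itself may have used a time-to-event variant, so this is an illustration of the concept rather than of the authors' calculation.

```python
def c_statistic(scores, outcomes):
    """Concordance for a binary outcome (equals the area under the ROC curve)."""
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    concordant = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case in cases
        for control in controls
    )
    return concordant / (len(cases) * len(controls))

# Hypothetical predicted risks and observed outcomes (1 = developed disease).
scores = [0.90, 0.40, 0.35, 0.80, 0.10]
outcomes = [1, 0, 1, 0, 0]
print(round(c_statistic(scores, outcomes), 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of cases above non-cases.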
Spatial analysis of land use and shallow groundwater vulnerability in the watershed adjacent to Assateague Island National Seashore, Maryland and Virginia, USA

USGS Publications Warehouse

LaMotte, A.E.; Greene, E.A.

2007-01-01

Spatial relations between land use and groundwater quality in the watershed adjacent to Assateague Island National Seashore, Maryland and Virginia, USA were analyzed by the use of two spatial models. One model used a logit analysis and the other was based on geostatistics. The models were developed and compared on the basis of existing concentrations of nitrate as nitrogen in samples from 529 domestic wells. The models were applied to produce spatial probability maps that show areas in the watershed where concentrations of nitrate in groundwater are likely to exceed a predetermined management threshold value. Maps of the watershed generated by logistic regression and probability kriging analysis, showing where the probability that nitrate concentrations would exceed 3 mg/L was greater than 0.50, compared favorably. Logistic regression was less dependent on the spatial distribution of sampled wells, and identified an additional high probability area within the watershed that was missed by probability kriging. The spatial probability maps could be used to determine the natural or anthropogenic factors that best explain the occurrence and distribution of elevated concentrations of nitrate (or other constituents) in shallow groundwater. This information can be used by local land-use planners, ecologists, and managers to protect water supplies and identify land-use planning solutions and monitoring programs in vulnerable areas. © 2006 Springer-Verlag.
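How a fitted logistic model turns covariates into an exceedance probability, and how the 0.50 cut-off then flags a location, can be sketched as follows. The intercept, coefficients, covariates, and well names are all hypothetical and are not taken from the study. Only the 0.50 probability cut-off comes from the abstract.

```python
import math

def exceedance_probability(intercept, coefficients, covariates):
    """Logistic model: probability that nitrate exceeds the threshold."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fitted values and well covariates, for illustration only.
intercept = -2.0
coefficients = [3.0, 1.0]   # e.g. cropland fraction, soil drainage score
wells = {"well_a": [0.9, 0.5], "well_b": [0.1, 0.2]}

# Flag wells whose modelled exceedance probability is above 0.50.
flagged = {name: exceedance_probability(intercept, coefficients, x) > 0.50
           for name, x in wells.items()}
print(flagged)
```

Evaluating the same function on a grid of locations instead of individual wells would give the kind of probability surface the abstract describes.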
External characteristic determination of eggs and cracked eggs identification using spectral signature

PubMed Central

Xie, Chuanqi; He, Yong

2016-01-01

This study used a hyperspectral imaging technique to determine color (L*, a* and b*) and eggshell strength and to identify cracked chicken eggs. Partial least squares (PLS) models based on full and selected wavelengths suggested by the regression coefficient (RC) method were established to predict the four parameters, respectively. Partial least squares-discriminant analysis (PLS-DA) and RC-partial least squares-discriminant analysis (RC-PLS-DA) models were applied to identify cracked eggs. PLS models performed well, with a correlation coefficient (rp) of 0.788 for L*, 0.810 for a*, 0.766 for b* and 0.835 for eggshell strength. RC-PLS models obtained an rp of 0.771 for L*, 0.806 for a*, 0.767 for b* and 0.841 for eggshell strength. The classification results were 97.06% for the PLS-DA model and 88.24% for the RC-PLS-DA model. The results demonstrated that the hyperspectral imaging technique has the potential to be used to detect color and eggshell strength values and identify cracked chicken eggs. PMID:26882990

Deep learning for predicting the monsoon over the homogeneous regions of India

NASA Astrophysics Data System (ADS)

Saha, Moumita; Mitra, Pabitra; Nanjundiah, Ravi S.

2017-06-01

The Indian monsoon varies in nature across geographical regions. Predicting rainfall not just at the national level but also at the regional level is an important task. In this article, we used a deep neural network, namely, the stacked autoencoder, to automatically identify climatic factors that are capable of predicting the rainfall over the homogeneous regions of India. An ensemble regression tree model is used for monsoon prediction using the identified climatic predictors. The proposed model provides a forecast of the monsoon at a long lead time, which supports the government in implementing appropriate policies for the economic growth of the country. The monsoon of the central, north-east, north-west, and south-peninsular India regions is predicted with errors of 4.1%, 5.1%, 5.5%, and 6.4%, respectively. The identified predictors show high skill in predicting the regional monsoon, which has high variability. The proposed model is observed to be competitive with the state-of-the-art prediction models.
Sex Differences in Contraception Non-Use among Urban Adolescents: Risk Factors for Unintended Pregnancy

ERIC Educational Resources Information Center

Casola, Allison R.; Nelson, Deborah B.; Patterson, Freda

2017-01-01

Background: Contraception non-use among sexually active adolescents is a major cause of unintended pregnancy (UP). Methods: In this cross-sectional study we sought to identify overall and sex-specific correlates of contraception non-use using the 2015 Philadelphia Youth Risk Behavior Survey (YRBS) (N = 9540). Multivariate regression models were…
Mental Health Status, Drug Treatment Use, and Needle Sharing among Injection Drug Users

ERIC Educational Resources Information Center

Lundgren, Lena M.; Amodeo, Maryann; Chassler, Deborah

2005-01-01

This study examined the relationship among mental health symptoms, drug treatment use, and needle sharing in a sample of 507 injection drug users (IDUs). Mental health symptoms were measured through the ASI psychiatric scale. A logistic regression model identified that some of the ASI items were associated with needle sharing in an opposing…
Using multi-trait and random regression models to identify genetic variation in tolerance of pigs to Porcine Reproductive and Respiratory Syndrome virus

USDA-ARS's Scientific Manuscript database

Background A host can adopt two response strategies to infection: resistance (reduce pathogen load) and tolerance (minimize impact of infection on performance). Both strategies may be under genetic control and could thus be targeted for genetic improvement. Although there is evidence in support of a...
Community-Based Addiction Treatment Staff Attitudes about the Usefulness of Evidence-Based Addiction Treatment and CBO Organizational Linkages to Research Institutions

ERIC Educational Resources Information Center

Lundgren, Lena; Krull, Ivy; Zerden, Lisa de Saxe; McCarty, Dennis

2011-01-01

This national study of community-based addiction-treatment organizations' (CBOs) implementation of evidence-based practices explored CBO Program Directors' (n = 296) and clinical staff (n = 518) attitudes about the usefulness of science-based addiction treatment. Through multivariable regression modeling, the study identified that identical…
The Influence of Finance and Accountability Policies on Location of New York State Charter Schools

ERIC Educational Resources Information Center

Bifulco, Robert; Buerger, Christian

2015-01-01

This article identifies a set of location incentives created by New York's charter school financing and accountability provisions. We then use regression models to examine the location of charter schools across and within districts. We find that charter schools (1) are significantly more likely to locate in districts with high operating expenses…
A Regression Model with a New Tool: IDB Analyzer for Identifying Factors Predicting Mathematics Performance Using PISA 2012 Indices

ERIC Educational Resources Information Center

Arikan, Serkan

2014-01-01

There are many studies that focus on factors affecting achievement. However, there is limited research that used student characteristics indices reported by the Programme for International Student Assessment (PISA). Therefore, this study investigated the predictive effects of student characteristics on mathematics performance of Turkish students.…
Escaping Poverty: Rural Low-Income Mothers' Opportunity to Pursue Post-Secondary Education

ERIC Educational Resources Information Center

Woodford, Michelle; Mammen, Sheila

2010-01-01

Using human capital theory, this paper identifies the factors that may affect the opportunity for rural low-income mothers to pursue post-secondary education or training in order to escape poverty. Dependent variables used in the logistic regression model included micro-level household variables as well as the effects of state-wide welfare…
Network Structure and Travel Time Perception

PubMed Central

Parthasarathi, Pavithra; Levinson, David; Hochmair, Hartwig

2013-01-01

The purpose of this research is to test the systematic variation in the perception of travel time among travelers and relate the variation to the underlying street network structure. Travel survey data from the Twin Cities metropolitan area (which includes the cities of Minneapolis and St. Paul) is used for the analysis. Travelers are classified into two groups based on the ratio of perceived and estimated commute travel time. The measures of network structure are estimated using the street network along the identified commute route. T-test comparisons are conducted to identify statistically significant differences in estimated network measures between the two traveler groups. The combined effect of these estimated network measures on travel time is then analyzed using regression models. The results from the t-test and regression analyses confirm the influence of the underlying network structure on the perception of travel time. PMID:24204932
Red-cockaded Woodpecker Picoides borealis Microhabitat Characteristics and Reproductive Success in a Loblolly-Shortleaf Pine Forest

USGS Publications Warehouse

Wood, Douglas R.; Burger, L. Wesley; Vilella, Francisco

2014-01-01

We investigated the relationship between red-cockaded woodpecker (Picoides borealis) reproductive success and microhabitat characteristics in a southeastern loblolly (Pinus taeda) and shortleaf (P. echinata) pine forest. From 1997 to 1999, we recorded reproductive success parameters of 41 red-cockaded woodpecker groups at the Bienville National Forest, Mississippi. Microhabitat characteristics were measured for each group during the nesting season. Logistic regression identified understory vegetation height and small nesting season home range size as predictors of red-cockaded woodpecker nest attempts. Linear regression models identified several variables as predictors of red-cockaded woodpecker reproductive success including group density, reduced hardwood component, small nesting season home range size, and shorter foraging distances. Red-cockaded woodpecker reproductive success was correlated with habitat and behavioral characteristics that emphasize high quality habitat. By providing high quality foraging habitat during the nesting season, red-cockaded woodpeckers can successfully reproduce within small home ranges.
Collinearity and Causal Diagrams: A Lesson on the Importance of Model Specification.

PubMed

Schisterman, Enrique F; Perkins, Neil J; Mumford, Sunni L; Ahrens, Katherine A; Mitchell, Emily M

2017-01-01

Correlated data are ubiquitous in epidemiologic research, particularly in nutritional and environmental epidemiology where mixtures of factors are often studied. Our objectives are to demonstrate how highly correlated data arise in epidemiologic research and provide guidance, using a directed acyclic graph approach, on how to proceed analytically when faced with highly correlated data. We identified three fundamental structural scenarios in which high correlation between a given variable and the exposure can arise: intermediates, confounders, and colliders. For each of these scenarios, we evaluated the consequences of increasing correlation between the given variable and the exposure on the bias and variance for the total effect of the exposure on the outcome using unadjusted and adjusted models. We derived closed-form solutions for continuous outcomes using linear regression and empirically present our findings for binary outcomes using logistic regression. For models properly specified, total effect estimates remained unbiased even when there was almost perfect correlation between the exposure and a given intermediate, confounder, or collider. In general, as the correlation increased, the variance of the parameter estimate for the exposure in the adjusted models increased, while in the unadjusted models, the variance increased to a lesser extent or decreased. Our findings highlight the importance of considering the causal framework under study when specifying regression models. Strategies that do not take into consideration the causal structure may lead to biased effect estimation for the original question of interest, even under high correlation.
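The statement that the variance of the adjusted exposure estimate grows with correlation can be made concrete for the simplest case. In a linear regression with an exposure and one other covariate, the variance of the exposure coefficient is multiplied by 1 / (1 - r^2) relative to the uncorrelated case, where r is the correlation between the two predictors. The sketch below only tabulates that factor; it is a standard textbook result offered as an illustration, not the closed-form solutions derived in the paper.

```python
def variance_inflation(r):
    """Variance inflation factor for one of two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)

# Correlations between the exposure and the adjustment variable.
for r in (0.0, 0.5, 0.9, 0.99):
    print(f"r = {r:4.2f}  variance multiplied by {variance_inflation(r):6.2f}")
```

The factor stays modest until the correlation is very high, which fits the abstract's point that a properly specified model can remain unbiased while its precision degrades.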
An investigation on fatality of drivers in vehicle-fixed object accidents on expressways in China: Using multinomial logistic regression model.

PubMed

Peng, Yong; Peng, Shuangling; Wang, Xinghua; Tan, Shiyang

2018-06-01

This study aims to identify the effects of characteristics of vehicle, roadway, driver, and environment on fatality of drivers in vehicle-fixed object accidents on expressways in Changsha-Zhuzhou-Xiangtan district of Hunan province in China by developing multinomial logistic regression models. For this purpose, 121 vehicle-fixed object accidents from 2011-2017 are included in the modeling process. First, descriptive statistical analysis is made to understand the main characteristics of the vehicle-fixed object crashes. Then, 19 explanatory variables are selected, and pairwise correlation analysis of the variables is conducted to choose the variables to be included. Finally, five multinomial logistic regression models including different independent variables are compared, and the model with the best fit and prediction capability is chosen as the final model. The results showed that the turning direction in avoiding fixed objects raised the possibility that drivers would die. About 64% of the drivers who died in the accidents were found to have been ejected from the car, of whom 50% did not use a seatbelt before the fatal accidents. Drivers are likely to die when they encounter bad weather on the expressway. Drivers with less than 10 years of driving experience are more likely to die in these accidents. Fatigue or distracted driving is also a significant factor in fatality of drivers. Findings from this research provide an insight into reducing fatality of drivers in vehicle-fixed object accidents.
Empirical models based on the universal soil loss equation fail to predict sediment discharges from Chesapeake Bay catchments.

PubMed

Boomer, Kathleen B; Weller, Donald E; Jordan, Thomas E

2008-01-01

The Universal Soil Loss Equation (USLE) and its derivatives are widely used for identifying watersheds with a high potential for degrading stream water quality. We compared sediment yields estimated from regional application of the USLE, the automated revised RUSLE2, and five sediment delivery ratio algorithms to measured annual average sediment delivery in 78 catchments of the Chesapeake Bay watershed. We did the same comparisons for another 23 catchments monitored by the USGS. Predictions exceeded observed sediment yields by more than 100% and were highly correlated with USLE erosion predictions (Pearson r range, 0.73-0.92; p < 0.001). RUSLE2-erosion estimates were highly correlated with USLE estimates (r = 0.87; p < 0.001), so the method of implementing the USLE model did not change the results. In ranked comparisons between observed and predicted sediment yields, the models failed to identify catchments with higher yields (r range, -0.28-0.00; p > 0.14). In a multiple regression analysis, soil erodibility, log (stream flow), basin shape (topographic relief ratio), the square-root transformed proportion of forest, and occurrence in the Appalachian Plateau province explained 55% of the observed variance in measured suspended sediment loads, but the model performed poorly (r² = 0.06) at predicting loads in the 23 USGS watersheds not used in fitting the model. The use of USLE or multiple regression models to predict sediment yields is not advisable despite their present widespread application. Integrated watershed models based on the USLE may also be unsuitable for making management decisions.
Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data.

PubMed

Won, Sungho; Choi, Hosik; Park, Suyeon; Lee, Juyoung; Park, Changyi; Kwon, Sunghoon

2015-01-01

Owing to recent improvements in genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci, and these successful findings have substantially improved our understanding of complex diseases. However, in spite of these successes, most of the genetic effects for many complex diseases were found to be very small, which has been a big hurdle to building disease prediction models. Recently, many statistical methods based on penalized regressions have been proposed to tackle the so-called "large P and small N" problem. Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, limit the space of parameters, and this constraint enables the estimation of effects for a very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration.
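How the penalty limits the parameter space is easiest to see with a single centred predictor and no intercept, where both estimators have closed forms. The sketch below uses that one-dimensional case only; the data and penalty values are hypothetical, and real SNP analyses solve the multi-predictor problem numerically.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly zero inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ridge_1d(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2."""
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / (sxx + lam)

def lasso_1d(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + lam * |b|."""
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return soft_threshold(sxy, lam) / sxx

x = [-1.0, 0.0, 1.0]   # hypothetical centred predictor
y = [-2.0, 0.0, 2.0]   # hypothetical response; the unpenalized slope is 2
print(ridge_1d(x, y, 0.0), ridge_1d(x, y, 2.0), lasso_1d(x, y, 1.0), lasso_1d(x, y, 5.0))
```

Ridge shrinks the slope smoothly toward zero, while the LASSO sets it to exactly zero once the penalty is large enough, which is what makes it usable for selecting among many candidate SNPs.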
Predictors of effects of lifestyle intervention on diabetes mellitus type 2 patients.

PubMed

Jacobsen, Ramune; Vadstrup, Eva; Røder, Michael; Frølich, Anne

2012-01-01

The main aim of the study was to identify predictors of the effects of lifestyle intervention on diabetes mellitus type 2 patients by means of multivariate analysis. Data from a previously published randomised clinical trial, which compared the effects of a rehabilitation programme including standardised education and physical training sessions in the municipality's health care centre with the same duration of individual counseling in the diabetes outpatient clinic, were used. Data from 143 diabetes patients were analysed. The merged lifestyle intervention resulted in statistically significant improvements in patients' systolic blood pressure, waist circumference, exercise capacity, glycaemic control, and some aspects of general health-related quality of life. The linear multivariate regression models explained 45% to 80% of the variance in these improvements. Consistent with regression to the mean, the baseline values of the outcomes were the only statistically significant and robust predictors in all regression models. These results are important from a clinical point of view as they highlight the more urgent need for and better outcomes following lifestyle intervention for those patients who have worse general and disease-specific health.
Regression modeling of ground-water flow

USGS Publications Warehouse

Cooley, R.L.; Naff, R.L.

1985-01-01

Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Practical guidance for conducting mediation analysis with multiple mediators using inverse odds ratio weighting.

PubMed

Nguyen, Quynh C; Osypuk, Theresa L; Schmidt, Nicole M; Glymour, M Maria; Tchetgen Tchetgen, Eric J

2015-03-01

Despite the recent flourishing of mediation analysis techniques, many modern approaches are difficult to implement or applicable to only a restricted range of regression models. This report provides practical guidance for implementing a new technique utilizing inverse odds ratio weighting (IORW) to estimate natural direct and indirect effects for mediation analyses. IORW takes advantage of the odds ratio's invariance property and condenses information on the odds ratio for the relationship between the exposure (treatment) and multiple mediators, conditional on covariates, by regressing exposure on mediators and covariates. The inverse of the covariate-adjusted exposure-mediator odds ratio association is used to weight the primary analytical regression of the outcome on treatment. The treatment coefficient in such a weighted regression estimates the natural direct effect of treatment on the outcome, and indirect effects are identified by subtracting direct effects from total effects. Weighting renders treatment and mediators independent, thereby deactivating indirect pathways of the mediators. This new mediation technique accommodates multiple discrete or continuous mediators. IORW is easily implemented and is appropriate for any standard regression model, including quantile regression and survival analysis. An empirical example is given using data from the Moving to Opportunity (1994-2002) experiment, testing whether neighborhood context mediated the effects of a housing voucher program on obesity. Relevant Stata code (StataCorp LP, College Station, Texas) is provided. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
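The IORW steps described above can be traced on a toy data set. The sketch below is an illustration under stated assumptions, not the paper's Stata code: the records are hypothetical, there is one binary mediator and no covariates (so the exposure-on-mediator logistic coefficient is just the log odds ratio of the 2x2 table), untreated observations are assumed to receive weight 1, and the outcome regression on a binary treatment is computed as a difference in weighted means.

```python
import math

# Hypothetical records (treatment a, mediator m, outcome y) with y = 1*a + 2*m,
# so the direct effect of treatment built into the data is 1.
records = ([(1, 1, 3.0)] * 6 + [(1, 0, 1.0)] * 2 +
           [(0, 1, 2.0)] * 2 + [(0, 0, 0.0)] * 6)

def count(a, m):
    return sum(1 for ra, rm, _ in records if ra == a and rm == m)

# Step 1: exposure-mediator association (log odds ratio from the 2x2 table).
log_or = math.log(count(1, 1) * count(0, 0) / (count(1, 0) * count(0, 1)))

# Step 2: inverse odds ratio weight for the treated; weight 1 for the untreated.
def weight(a, m):
    return math.exp(-log_or * m) if a == 1 else 1.0

def mean_outcome(a, weighted):
    rows = [(weight(ra, rm) if weighted else 1.0, y)
            for ra, rm, y in records if ra == a]
    return sum(w * y for w, y in rows) / sum(w for w, _ in rows)

# Step 3: total effect (unweighted), direct effect (weighted), and their difference.
total = mean_outcome(1, False) - mean_outcome(0, False)
direct = mean_outcome(1, True) - mean_outcome(0, True)
indirect = total - direct
print(round(total, 3), round(direct, 3), round(indirect, 3))
```

After weighting, the mediator has the same distribution in both arms of this toy example, which is the sense in which the weights deactivate the indirect pathway.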
Comparison of methods for the prediction of human clearance from hepatocyte intrinsic clearance for a set of reference compounds and an external evaluation set.

PubMed

Yamagata, Tetsuo; Zanelli, Ugo; Gallemann, Dieter; Perrin, Dominique; Dolgos, Hugues; Petersson, Carl

2017-09-01

1. We compared direct scaling, regression model equation and the so-called "Poulin et al." methods to scale clearance (CL) from in vitro intrinsic clearance (CLint) measured in human hepatocytes using two sets of compounds: one reference set comprising 20 compounds with known elimination pathways and one external evaluation set based on 17 compounds developed in Merck (MS). 2. A 90% prospective confidence interval was calculated using the reference set. This interval was found relevant for the regression equation method. The three outliers identified were justified on the basis of their elimination mechanism. 3. The direct scaling method showed a systematic underestimation of clearance in both the reference and evaluation sets. The "Poulin et al." and the regression equation methods showed no obvious bias in either the reference or evaluation sets. 4. The regression model equation was slightly superior to the "Poulin et al." method in the reference set and showed a better absolute average fold error (AAFE) of 1.3 compared to 1.6. A larger difference was observed in the evaluation set, where the regression method and "Poulin et al." resulted in an AAFE of 1.7 and 2.6, respectively (removing the three compounds with known issues mentioned above). A similar pattern was observed for the correlation coefficient. Based on these data we suggest the regression equation method combined with a prospective confidence interval as the first choice for the extrapolation of human in vivo hepatic metabolic clearance from in vitro systems.
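The absolute average fold error used to compare the methods is usually defined as 10 raised to the mean absolute log10 ratio of predicted to observed values, so that over- and under-predictions by the same factor count equally. The sketch below assumes that common definition, which the abstract does not spell out, and the clearance values are hypothetical.

```python
import math

def aafe(predicted, observed):
    """Absolute average fold error: 10 ** mean(|log10(predicted / observed)|)."""
    log_errors = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(log_errors) / len(log_errors))

# Hypothetical clearance values: one 2-fold over-prediction,
# one 2-fold under-prediction, and one exact prediction.
predicted = [2.0, 5.0, 40.0]
observed = [1.0, 10.0, 40.0]
print(round(aafe(predicted, observed), 3))
```

An AAFE of 1 means perfect prediction, so the reported 1.3 versus 1.6 corresponds to typical errors of about 1.3-fold versus 1.6-fold.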
Imaging genetics approach to predict progression of Parkinson's diseases.

PubMed

Mansu Kim; Seong-Jin Son; Hyunjin Park

2017-07-01

Imaging genetics is a tool to extract genetic variants associated with both clinical phenotypes and imaging information. The approach can extract additional genetic variants compared to conventional approaches to better investigate various diseased conditions. Here, we applied imaging genetics to study Parkinson's disease (PD). We aimed to extract significant features derived from imaging genetics and neuroimaging. We built a regression model based on extracted significant features combining genetics and neuroimaging to better predict clinical scores of PD progression (i.e. MDS-UPDRS). Our model yielded high correlation (r = 0.697, p < 0.001) and low root mean squared error (8.36) between predicted and actual MDS-UPDRS scores. Neuroimaging (from 123I-Ioflupane SPECT) predictors of regression model were computed from independent component analysis approach. Genetic features were computed using image genetics approach based on identified neuroimaging features as intermediate phenotypes. Joint modeling of neuroimaging and genetics could provide complementary information and thus have the potential to provide further insight into the pathophysiology of PD. Our model included newly found neuroimaging features and genetic variants which need further investigation.
Depression, stress, and intimate partner violence among Latino migrant and seasonal farmworkers in rural Southeastern North Carolina.

PubMed

Kim-Godwin, Yeoun Soo; Maume, Michael O; Fox, Jane A

2014-12-01

The purpose of the study is to identify the predictors of depression and intimate partner violence (IPV) among Latinos in rural Southeastern North Carolina. A sample of 291 migrant and seasonal farmworkers was interviewed to complete the demographic questionnaire, HITS (intimate violence tendency), Migrant Farmworker Stress Inventory, Center for Epidemiologic Studies Depression Scale (depression), and CAGE/4M (alcohol abuse). OLS regression and structural equation modeling were used to test the hypothesized relations between predictors of IPV and depression. The findings indicated that respondents reporting higher levels of stress also reported higher levels of IPV and depression. The goodness-of-fit statistics for the overall model again indicated a moderate fit of the model to the data (χ2 = 5,612, p < .001; root mean square error for approximation = 0.09; adjusted goodness-of-fit index = 0.44; comparative fit index = 0.52). Although the findings were not robust to estimation in the structural equation models, the OLS regression models indicated direct associations between IPV and depression.

Spatial Bayesian Latent Factor Regression Modeling of Coordinate-based Meta-analysis Data

PubMed Central

Montagna, Silvia; Wager, Tor; Barrett, Lisa Feldman; Johnson, Timothy D.; Nichols, Thomas E.

2017-01-01

Summary Now over 20 years old, functional MRI (fMRI) has a large and growing literature that is best synthesised with meta-analytic tools. As most authors do not share image data, only the peak activation coordinates (foci) reported in the paper are available for Coordinate-Based Meta-Analysis (CBMA). Neuroimaging meta-analysis is used to 1) identify areas of consistent activation; and 2) build a predictive model of task type or cognitive process for new studies (reverse inference). To simultaneously address these aims, we propose a Bayesian point process hierarchical model for CBMA. We model the foci from each study as a doubly stochastic Poisson process, where the study-specific log intensity function is characterised as a linear combination of a high-dimensional basis set. A sparse representation of the intensities is guaranteed through latent factor modeling of the basis coefficients. Within our framework, it is also possible to account for the effect of study-level covariates (meta-regression), significantly expanding the capabilities of the current neuroimaging meta-analysis methods available. We apply our methodology to synthetic data and neuroimaging meta-analysis datasets. PMID:28498564
The use of machine learning for the identification of peripheral artery disease and future mortality risk.

PubMed

Ross, Elsie Gyang; Shah, Nigam H; Dalman, Ronald L; Nead, Kevin T; Cooke, John P; Leeper, Nicholas J

2016-11-01

A key aspect of the precision medicine effort is the development of informatics tools that can analyze and interpret "big data" sets in an automated and adaptive fashion while providing accurate and actionable clinical information. The aims of this study were to develop machine learning algorithms for the identification of disease and the prognostication of mortality risk and to determine whether such models perform better than classical statistical analyses. Focusing on peripheral artery disease (PAD), patient data were derived from a prospective, observational study of 1755 patients who presented for elective coronary angiography. We employed multiple supervised machine learning algorithms and used diverse clinical, demographic, imaging, and genomic information in a hypothesis-free manner to build models that could identify patients with PAD and predict future mortality. Comparison was made to standard stepwise linear regression models. Our machine-learned models outperformed stepwise logistic regression models both for the identification of patients with PAD (area under the curve, 0.87 vs 0.76, respectively; P = .03) and for the prediction of future mortality (area under the curve, 0.76 vs 0.65, respectively; P = .10). Both machine-learned models were markedly better calibrated than the stepwise logistic regression models, thus providing more accurate disease and mortality risk estimates. Machine learning approaches can produce more accurate disease classification and prediction models. These tools may prove clinically useful for the automated identification of patients with highly morbid diseases for which aggressive risk factor management can improve outcomes. Copyright © 2016 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

PubMed

Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

2009-12-01

To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of a SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures while minimizing the error rate. After cross-validation, logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
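The three figures reported above can be related through a confusion matrix. The sketch below assumes definitions that the abstract does not spell out: NPV is the share of predicted-negative patients who are truly node negative, SNB reduction is the share of all patients predicted negative, and error rate is the share of all patients who are false negatives. The function name and the patient counts are hypothetical.

```python
def snb_selection_metrics(tn, fn, tp, fp):
    """Return (npv, snb_reduction, error_rate) for a binary SN-status classifier.

    tn, fn: patients predicted node negative who are truly negative / truly positive
    tp, fp: patients predicted node positive who are truly positive / truly negative
    The definitions below are assumptions, not taken from the paper.
    """
    total = tn + fn + tp + fp
    predicted_negative = tn + fn
    npv = tn / predicted_negative               # predicted negatives that are truly negative
    snb_reduction = predicted_negative / total  # patients who would be spared SNB
    error_rate = fn / total                     # patients wrongly spared SNB
    return npv, snb_reduction, error_rate


# Hypothetical counts for 1000 patients (not data from the study).
npv, reduction, error = snb_selection_metrics(tn=257, fn=18, tp=182, fp=543)
print(f"NPV={npv:.1%}  SNB reduction={reduction:.1%}  error rate={error:.1%}")
```

Under these assumed definitions, NPV equals 1 minus the error rate divided by the SNB reduction. For the reported logistic regression figures (27.5% and 1.8%) that gives about 93.5%, close to the reported 93.6%.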
Identification method of laser gyro error model under changing physical field

NASA Astrophysics Data System (ADS)

Wang, Qingqing; Niu, Zhenzhong

2018-04-01

In this paper, the influence mechanism of temperature, rate of temperature change and temperature gradient on inertial devices is studied. The second-order model of the zero bias and the third-order model of the calibration factor of the laser gyro under temperature variation are derived. A calibration scheme for the temperature error is designed, and the experiment is carried out. Two methods, stepwise regression analysis and a BP neural network, are used to identify the parameters of the temperature error model, and the effectiveness of both methods is demonstrated by temperature error compensation.
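As a rough illustration of identifying a second-order temperature model, the sketch below fits bias = b0 + b1*T + b2*T^2 by ordinary least squares and then subtracts the fitted curve. It is a simplification: it uses temperature only (no rate or gradient terms), plain least squares rather than stepwise regression or a BP neural network, and synthetic data. All names and values are hypothetical.

```python
def fit_quadratic(temps, biases):
    """Least-squares fit of bias = b0 + b1*T + b2*T**2 via the normal equations."""
    rows = [[1.0, t, t * t] for t in temps]
    # Normal equations: (X^T X) b = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    y = [sum(r[i] * b for r, b in zip(rows, biases)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        y[col], y[pivot] = y[pivot], y[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= factor * a[col][c]
            y[r] -= factor * y[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        coef[i] = (y[i] - sum(a[i][j] * coef[j] for j in range(i + 1, 3))) / a[i][i]
    return coef


# Synthetic, noise-free zero-bias readings over a temperature sweep (degrees C).
temps = [-20.0, -10.0, 0.0, 10.0, 20.0, 30.0, 40.0]
true_coef = (0.05, 0.002, -0.00003)
biases = [true_coef[0] + true_coef[1] * t + true_coef[2] * t * t for t in temps]

b0, b1, b2 = fit_quadratic(temps, biases)
# Temperature error compensation: subtract the modelled bias from each reading.
compensated = [b - (b0 + b1 * t + b2 * t * t) for t, b in zip(temps, biases)]
```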
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

ERIC Educational Resources Information Center

Haberman, Shelby J.; Sinharay, Sandip

2010-01-01

Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
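A cumulative logit model assigns a probability to each ordered score category rather than predicting a single number. The sketch below shows that calculation under one common sign convention, P(score <= k) = logistic(cutpoint_k - eta), where eta is a linear combination of essay features. The cutpoints and eta values are hypothetical, and the article may parameterize the model differently.

```python
import math


def cumulative_logit_probs(eta, cutpoints):
    """Category probabilities for an ordinal outcome under a cumulative logit model.

    eta: linear predictor (for example, a weighted sum of essay features)
    cutpoints: strictly increasing thresholds, one fewer than the number of categories
    """
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Cumulative probabilities P(score <= k); the last category closes at 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    # Differences of consecutive cumulative probabilities give P(score == k).
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


# Hypothetical 4-point score scale with three cutpoints.
cutpoints = [-1.0, 0.5, 2.0]
weak_essay = cumulative_logit_probs(eta=-1.5, cutpoints=cutpoints)
strong_essay = cumulative_logit_probs(eta=2.5, cutpoints=cutpoints)
```

Unlike a linear regression prediction, these outputs always form a valid probability distribution over the allowed scores.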
Depression among older Mexican American caregivers.

PubMed

Hernandez, Ann Marie; Bigatti, Silvia M

2010-01-01

The authors compared depression levels between older Mexican American caregivers and noncaregivers while controlling for confounds identified but not controlled in past research. Mexican American caregivers and noncaregivers (N = 114) ages 65 and older were matched on age, gender, socioeconomic status, self-reported health, and acculturation. Caregivers reported higher scores on the Center for Epidemiologic Studies Depression scale (CES-D) and were more likely to score in the depressed range than noncaregivers. In a regression model with all participants, group classification (caregiver vs. noncaregiver) and health significantly predicted CES-D scores. A model with only caregivers that included caregiver burden, self-rated health, and gender significantly predicted CES-D scores, with only caregiver burden entering the regression equation. These results suggest that older Mexican American caregivers are more depressed than noncaregivers, as has been found in younger populations. (c) 2009 APA, all rights reserved.
Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

PubMed

Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

2012-01-01

The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in PROC AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
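The Durbin-Watson statistic mentioned above has a simple closed form: the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below computes it directly; the residual series are made up for illustration and are not from the study, which used SAS for this step.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of ordered regression residuals.

    Values near 2 suggest no first-order autocorrelation, values toward 0 suggest
    positive autocorrelation, and values toward 4 suggest negative autocorrelation.
    """
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Illustrative residual series (hypothetical).
alternating = [1.0, -1.0, 1.0, -1.0]             # sign flips every step
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]   # long runs of the same sign

dw_alternating = durbin_watson(alternating)
dw_persistent = durbin_watson(persistent)
```

The statistic only measures first-order serial dependence in the order the residuals are supplied, so the ordering of observations matters.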
Household water treatment in developing countries: comparing different intervention types using meta-regression.

PubMed

Hunter, Paul R

2009-12-01

Household water treatment (HWT) is being widely promoted as an appropriate intervention for reducing the burden of waterborne disease in poor communities in developing countries. A recent study has raised concerns about the effectiveness of HWT, in part because of concerns over the lack of blinding and in part because of considerable heterogeneity in the reported effectiveness of randomized controlled trials. This study set out to investigate the causes of this heterogeneity and so identify factors associated with good health gains. Studies identified in an earlier systematic review and meta-analysis were supplemented with more recently published randomized controlled trials. A total of 28 separate studies of randomized controlled trials of HWT with 39 intervention arms were included in the analysis. Heterogeneity was studied using the "metareg" command in Stata. Initial analyses with single candidate predictors were undertaken and all variables significant at the P < 0.2 level were included in a final regression model. Further analyses were done to estimate the effect of the interventions over time by Monte Carlo modeling using @Risk and the parameter estimates from the final regression model. The overall effect size of all unblinded studies was relative risk = 0.56 (95% confidence interval 0.51-0.63), but after adjusting for bias due to lack of blinding the effect size was much lower (RR = 0.85, 95% CI = 0.76-0.97). Four main variables were significant predictors of intervention effectiveness in a multipredictor meta-regression model. Log duration of study follow-up (regression coefficient of log effect size = 0.186, standard error (SE) = 0.072), whether or not the study was blinded (coefficient 0.251, SE 0.066), and being conducted in an emergency setting (coefficient -0.351, SE 0.076) were all significant predictors of effect size in the final model. Compared to the ceramic filter, all other interventions were much less effective (Biosand 0.247, 0.073; chlorine and safe waste storage 0.295, 0.061; combined coagulant-chlorine 0.2349, 0.067; SODIS 0.302, 0.068). A Monte Carlo model predicted that over 12 months ceramic filters were likely to be still effective at reducing disease, whereas SODIS, chlorination, and coagulation-chlorination had little if any benefit. Indeed, these three interventions are predicted to have the same or less effect than what may be expected due purely to reporting bias in unblinded studies. With the currently available evidence, ceramic filters are the most effective form of HWT in the long term; disinfection-only interventions, including SODIS, appear to have poor if any long-term public health benefit.
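Coefficients on a log effect-size scale are easier to read after back-transformation. The sketch below exponentiates the coefficients quoted above and attaches a normal-approximation 95% interval. It assumes the effect size was modelled as the natural log of the relative risk, which the abstract does not state explicitly, so the resulting ratios are illustrative rather than a reproduction of the paper's results. The function name is hypothetical.

```python
import math


def ratio_with_ci(coef, se, z=1.96):
    """Back-transform a log-scale coefficient to a ratio with an approximate 95% CI."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Coefficients and standard errors as quoted in the abstract.
reported = {
    "log duration of follow-up": (0.186, 0.072),
    "blinded study": (0.251, 0.066),
    "emergency setting": (-0.351, 0.076),
}

for name, (coef, se) in reported.items():
    ratio, low, high = ratio_with_ci(coef, se)
    print(f"{name}: ratio {ratio:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Because a relative risk below 1 indicates benefit, a positive coefficient under this assumption corresponds to a relative risk closer to 1, that is, a smaller apparent benefit.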
Overloading among crash-involved vehicles in China: identification of factors associated with overloading and crash severity.

PubMed

Zhang, Guangnan; Li, Yanyan; King, Mark J; Zhong, Qiaoting

2018-03-21

Motor vehicle overloading is correlated with the possibility of road crash occurrence and severity. Although overloading of motor vehicles is pervasive in developing nations, few empirical analyses have been performed on factors that might influence the occurrence of overloading. This study aims to address this shortcoming by seeking evidence from several years of crash data from Guangdong province, China. Data on overloading and other factors are extracted for crash-involved vehicles from traffic crash records for 2006-2010 provided by the Traffic Management Bureau in Guangdong province. Logistic regression is applied to identify risk factors for overloading in crash-involved vehicles and within these crashes to identify factors contributing to greater crash severity. Driver, vehicle, road and environmental characteristics and violation types are considered in the regression models. In addition to the basic logistic models, association analysis is employed to identify the potential interactions among different risk factors when fitting the logistic models of overloading and severity. Crash-involved vehicles driven by males from rural households and in an unsafe condition are more likely to be overloaded and to be involved in higher severity overloaded vehicle crashes. If overloaded vehicles speed, the risk of severe traffic crash casualties increases. Young drivers (aged under 25 years) in mountainous areas are more likely to be involved in higher severity overloaded vehicle crashes. This study identifies several factors associated with overloading in crash-involved vehicles and with higher severity overloading crashes and provides an important reference for future research on those specific risk factors. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
A Multiomics Approach to Identify Genes Associated with Childhood Asthma Risk and Morbidity.

PubMed

Forno, Erick; Wang, Ting; Yan, Qi; Brehm, John; Acosta-Perez, Edna; Colon-Semidey, Angel; Alvarez, Maria; Boutaoui, Nadia; Cloutier, Michelle M; Alcorn, John F; Canino, Glorisa; Chen, Wei; Celedón, Juan C

2017-10-01

Childhood asthma is a complex disease. In this study, we aim to identify genes associated with childhood asthma through a multiomics "vertical" approach that integrates multiple analytical steps using linear and logistic regression models. In a case-control study of childhood asthma in Puerto Ricans (n = 1,127), we used adjusted linear or logistic regression models to evaluate associations between several analytical steps of omics data, including genome-wide (GW) genotype data, GW methylation, GW expression profiling, cytokine levels, asthma-intermediate phenotypes, and asthma status. At each point, only the top genes/single-nucleotide polymorphisms/probes/cytokines were carried forward for subsequent analysis. In step 1, asthma modified the gene expression-protein level association for 1,645 genes; pathway analysis showed an enrichment of these genes in the cytokine signaling system (n = 269 genes). In steps 2-3, expression levels of 40 genes were associated with intermediate phenotypes (asthma onset age, forced expiratory volume in 1 second, exacerbations, eosinophil counts, and skin test reactivity); of those, methylation of seven genes was also associated with asthma. Of these seven candidate genes, IL5RA was also significant in analytical steps 4-8. We then measured plasma IL-5 receptor α levels, which were associated with asthma age of onset and moderate-severe exacerbations. In addition, in silico database analysis showed that several of our identified IL5RA single-nucleotide polymorphisms are associated with transcription factors related to asthma and atopy. This approach integrates several analytical steps and is able to identify biologically relevant asthma-related genes, such as IL5RA. It differs from other methods that rely on complex statistical models with various assumptions.
Identifying the bleeding trauma patient: predictive factors for massive transfusion in an Australasian trauma population.

PubMed

Hsu, Jeremy Ming; Hitos, Kerry; Fletcher, John P

2013-09-01

Military and civilian data suggest that hemostatic resuscitation results in improved outcomes for exsanguinating patients. However, identification of those patients who are at risk of significant hemorrhage is not clearly defined. We attempted to identify factors that would predict the need for massive transfusion (MT) in an Australasian trauma population, by comparing those trauma patients who did receive massive transfusion with those who did not. Between 1985 and 2010, 1,686 trauma patients receiving at least 1 U of packed red blood cells were identified from our prospectively maintained trauma registry. Demographic, physiologic, laboratory, injury, and outcome variables were reviewed. Univariate analysis determined significant factors between those who received MT and those who did not. A predictive multivariate logistic regression model with backward conditional stepwise elimination was used for MT risk. Statistical analysis was performed using SPSS PASW. MT patients had a higher pulse rate, lower Glasgow Coma Scale (GCS) score, lower systolic blood pressure, lower hemoglobin level, higher Injury Severity Score (ISS), higher international normalized ratio (INR), and longer stay. Initial logistic regression identified base deficit (BD), INR, and hemoperitoneum at laparotomy as independent predictive variables. After assigning cutoff points of a BD greater than 5 and an INR of 1.5 or greater, a further model was created. A BD greater than 5 and either an INR of 1.5 or greater or hemoperitoneum was associated with a 51-fold increase in MT risk (odds ratio, 51.6; 95% confidence interval, 24.9-95.8). The area under the receiver operating characteristic curve for the model was 0.859. From this study, a combination of BD, INR, and hemoperitoneum has demonstrated good predictability for MT. This tool may assist in the determination of those patients who might benefit from hemostatic resuscitation. Prognostic study, level III.
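The high-risk combination described above can be written as a simple Boolean rule. The sketch below encodes only the cutoffs quoted in the abstract (BD greater than 5, INR of 1.5 or greater, hemoperitoneum present); the function and argument names are hypothetical, the units of base deficit are not given in the abstract, and this is an illustration of the rule's logic, not a clinical tool.

```python
def high_mt_risk(base_deficit, inr, hemoperitoneum):
    """Flag the combination reported as associated with increased MT risk.

    Rule as quoted: base deficit > 5 AND (INR >= 1.5 OR hemoperitoneum at laparotomy).
    """
    return bool(base_deficit > 5 and (inr >= 1.5 or hemoperitoneum))


# Hypothetical patients.
print(high_mt_risk(base_deficit=7.2, inr=1.7, hemoperitoneum=False))  # meets BD and INR cutoffs
print(high_mt_risk(base_deficit=3.0, inr=1.9, hemoperitoneum=True))   # BD cutoff not met
```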
Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA.

PubMed

Mair, Alan; El-Kadi, Aly I

2013-10-01

Capture zone analysis combined with a subjective susceptibility index is currently used in Hawaii to assess vulnerability to contamination of drinking water sources derived from groundwater. In this study, we developed an alternative objective approach that combines well capture zones with multiple-variable logistic regression (LR) modeling and applied it to the highly-utilized Pearl Harbor and Honolulu aquifers on the island of Oahu, Hawaii. Input for the LR models utilized explanatory variables based on hydrogeology, land use, and well geometry/location. A suite of 11 target contaminants detected in the region, including elevated nitrate (>1 mg/L), four chlorinated solvents, four agricultural fumigants, and two pesticides, was used to develop the models. We then tested the ability of the new approach to accurately separate groups of wells with low and high vulnerability, and the suitability of nitrate as an indicator of other types of contamination. Our results produced contaminant-specific LR models that accurately identified groups of wells with the lowest/highest reported detections and the lowest/highest nitrate concentrations. Current and former agricultural land uses were identified as significant explanatory variables for eight of the 11 target contaminants, while elevated nitrate was a significant variable for five contaminants. The utility of the combined approach is contingent on the availability of hydrologic and chemical monitoring data for calibrating groundwater and LR models. Application of the approach using a reference site with sufficient data could help identify key variables in areas with similar hydrogeology and land use but limited data. In addition, elevated nitrate may also be a suitable indicator of groundwater contamination in areas with limited data. The objective LR modeling approach developed in this study is flexible enough to address a wide range of contaminants and represents a suitable addition to the current subjective approach. 
© 2013 Elsevier B.V. All rights reserved.
Indicators of Dysphagia in Aged Care Facilities.

PubMed

Pu, Dai; Murry, Thomas; Wong, May C M; Yiu, Edwin M L; Chan, Karen M K

2017-09-18

The current cross-sectional study aimed to investigate risk factors for dysphagia in elderly individuals in aged care facilities. A total of 878 individuals from 42 aged care facilities were recruited for this study. The dependent outcome was speech therapist-determined swallowing function. Independent factors were Eating Assessment Tool score, oral motor assessment score, Mini-Mental State Examination, medical history, and various functional status ratings. Binomial logistic regression was used to identify independent variables associated with dysphagia in this cohort. Two statistical models were constructed. Model 1 used variables from case files without the need for hands-on assessment, and Model 2 used variables that could be obtained from hands-on assessment. Variables positively associated with dysphagia identified in Model 1 were male gender, total dependence for activities of daily living, need for feeding assistance, mobility, requiring assistance walking or using a wheelchair, and history of pneumonia. Variables positively associated with dysphagia identified in Model 2 were Mini-Mental State Examination score, edentulousness, and oral motor assessment score. Cognitive function, dentition, and oral motor function are significant indicators associated with the presence of dysphagia in the elderly. When assessing the frail elderly, case file information can help clinicians identify individuals who may be suffering from dysphagia.
Dental health services utilization and associated factors in children 6 to 12 years old in a low-income country.

PubMed

Medina-Solis, Carlo Eduardo; Maupomé, Gerardo; del Socorro, Herrera Miriam; Pérez-Núñez, Ricardo; Avila-Burgos, Leticia; Lamadrid-Figueroa, Hector

2008-01-01

To determine the factors associated with the dental health services utilization among children ages 6 to 12 in León, Nicaragua. A cross-sectional study was carried out in 1,400 schoolchildren. Using a questionnaire, we determined information related to utilization and independent variables in the previous year. Oral health needs were established by means of a dental examination. To identify the independent variables associated with dental health services utilization, two types of multivariate regression models were used, according to the measurement scale of the outcome variable: a) frequency of utilization as (0) none, (1) one, and (2) two or more, analyzed with the ordered logistic regression and b) the type of service utilized as (0) none, (1) preventive services, (2) curative services, and (3) both services, analyzed with the multinomial logistic regression. The proportion of children who received at least one dental service in the 12 months prior to the study was 27.7 percent. The variables associated with utilization in the two models were older age, female sex, more frequent toothbrushing, positive attitude of the mother toward the child's oral health, higher socioeconomic level, and higher oral health needs. Various predisposing, enabling, and oral health needs variables were associated with higher dental health services utilization. As in prior reports elsewhere, these results from Nicaragua confirmed that utilization inequalities exist between socioeconomic groups. The multinomial logistic regression model evidenced the association of different variables depending on the type of service used.
Meta-regression analysis of the effect of trans fatty acids on low-density lipoprotein cholesterol.

PubMed

Allen, Bruce C; Vincent, Melissa J; Liska, DeAnn; Haber, Lynne T

2016-12-01

We conducted a meta-regression of controlled clinical trial data to investigate quantitatively the relationship between dietary intake of industrial trans fatty acids (iTFA) and increased low-density lipoprotein cholesterol (LDL-C). Previous regression analyses included insufficient data to determine the nature of the dose response in the low-dose region and have nonetheless assumed a linear relationship between iTFA intake and LDL-C levels. This work contributes to the previous work by 1) including additional studies examining low-dose intake (identified using an evidence mapping procedure); 2) investigating a range of curve shapes, including both linear and nonlinear models; and 3) using Bayesian meta-regression to combine results across trials. We found that, contrary to previous assumptions, the linear model does not acceptably fit the data, while the nonlinear, S-shaped Hill model fits the data well. Based on a conservative estimate of the degree of intra-individual variability in LDL-C (0.1 mmol/L), as an estimate of a change in LDL-C that is not adverse, a change in iTFA intake of 2.2% of energy intake (%en) (corresponding to a total iTFA intake of 2.2-2.9%en) does not cause adverse effects on LDL-C. The iTFA intake associated with this change in LDL-C is substantially higher than the average iTFA intake (0.5%en). Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
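To show why an S-shaped Hill curve and a straight line can disagree most at low doses, the sketch below evaluates a generic Hill function, response = top * dose^n / (half_dose^n + dose^n), next to a line through the same half-maximal point. The functional form is the textbook one and may differ from the paper's parameterization; every parameter value here is hypothetical and not an estimate from the meta-regression.

```python
def hill(dose, top, half_dose, n):
    """Generic Hill dose-response curve; S-shaped when n > 1."""
    return top * dose ** n / (half_dose ** n + dose ** n)


def linear_through_half(dose, top, half_dose):
    """Straight line through the origin and the Hill curve's half-maximal point."""
    return (top / 2.0) * dose / half_dose


# Hypothetical parameters: maximum LDL-C change 0.4 mmol/L, half-maximal intake 2 %en.
top, half_dose, n = 0.4, 2.0, 3.0
for dose in (0.5, 1.0, 2.0, 4.0):
    print(dose, round(hill(dose, top, half_dose, n), 4),
          round(linear_through_half(dose, top, half_dose), 4))
```

With n greater than 1, the Hill response at low doses sits well below the linear interpolation, which is the qualitative behaviour that separates the two model families in the low-dose region.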
A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging

PubMed Central

Logsdon, Benjamin A.; Carty, Cara L.; Reiner, Alexander P.; Dai, James Y.; Kooperberg, Charles

2012-01-01

Motivation: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provides model over-fitting diagnostics through a novel normally distributed statistic defined for every marker within the GWAS, based on results from a variational Bayes spike regression algorithm. Results: We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate by reaching a stringent cutoff of marginal association in a larger cohort. Availability: An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html. Contact: blogsdon@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22563072
First-year growth, recruitment, and maturity of walleyes in western Lake Erie

USGS Publications Warehouse

Madenjian, Charles P.; Tyson, Jeffrey T.; Knight, Roger L.; Kershner, Mark W.; Hansen, Michael J.

1996-01-01

In some lakes, first-year growth of walleyes Stizostedion vitreum has been identified as an important factor governing recruitment of juveniles to the adult population. We developed a regression model for walleye recruitment in western Lake Erie by considering factors such as first-year growth, size of the spawning stock, the rate at which the lake warmed during the spring, and abundance of gizzard shad Dorosoma cepedianum. Gizzard shad abundance during the fall prior to spring walleye spawning explained over 40% of the variation in walleye recruitment. Gizzard shad are relatively high in lipids and are preferred prey for walleyes in Lake Erie. Therefore, the high degree of correlation between shad abundance and subsequent walleye recruitment supported the contention that mature females needed adequate lipid reserves during the winter to spawn the following spring. According to the regression analysis, spring warming rate and size of the parental stock also influenced walleye recruitment. Our regression model explained 92% of the variation in recruitment of age-2 fish into the Lake Erie walleye population from 1981 to 1993. The regression model is potentially valuable as a management tool because it could be used to forecast walleye recruitment to the fishery 2 years in advance. First-year growth was poorly correlated with recruitment, which may reflect the unusually low incidence of walleye cannibalism in western Lake Erie. In contrast, first-year growth was strongly linked to age at maturity.
Neurophysiological correlates of depressive symptoms in young adults: A quantitative EEG study.

PubMed

Lee, Poh Foong; Kan, Donica Pei Xin; Croarkin, Paul; Phang, Cheng Kar; Doruk, Deniz

2018-01-01

There is an unmet need for practical and reliable biomarkers for mood disorders in young adults. Identifying the brain activity associated with the early signs of depressive disorders could have important diagnostic and therapeutic implications. In this study we sought to investigate the EEG characteristics in young adults with newly identified depressive symptoms. Based on the initial screening, a total of 100 participants (n = 50 euthymic, n = 50 depressive) underwent 32-channel EEG acquisition. Simple logistic regression and C-statistic were used to explore if EEG power could be used to discriminate between the groups. The strongest EEG predictors of mood were examined using multivariate logistic regression models. Simple logistic regression analysis with subsequent C-statistics revealed that only high-alpha and beta power originating from the left central cortex (C3) have a reliable discriminative value (area under the ROC curve > 0.7) for differentiating the depressive group from the euthymic group. Multivariate regression analysis showed that the single most significant predictor of group (depressive vs. euthymic) is the high-alpha power over C3 (p = 0.03). The present findings suggest that EEG is a useful tool in the identification of neurophysiological correlates of depressive symptoms in young adults with no previous psychiatric history. Our results could guide future studies investigating the early neurophysiological changes and surrogate outcomes in depression. Copyright © 2017 Elsevier Ltd. All rights reserved.
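The C-statistic used above equals the area under the ROC curve, and it can be computed as the fraction of (case, control) pairs in which the case receives the higher score, with ties counted as one half. The sketch below does this by direct pair counting; the scores are invented for illustration and do not come from the study.

```python
def c_statistic(scores_cases, scores_controls):
    """Concordance (area under the ROC curve) by direct pairwise comparison."""
    concordant = 0.0
    for case in scores_cases:
        for control in scores_controls:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5   # ties count as half-concordant
    return concordant / (len(scores_cases) * len(scores_controls))


# Hypothetical predicted probabilities from a simple logistic model.
depressive = [0.9, 0.8, 0.4]
euthymic = [0.7, 0.3, 0.2]
auc = c_statistic(depressive, euthymic)   # 8 of 9 pairs are concordant
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is why a cutoff such as 0.7 is often used as a working threshold.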
Predicting story goodness performance from cognitive measures following traumatic brain injury.

PubMed

Lê, Karen; Coelho, Carl; Mozeiko, Jennifer; Krueger, Frank; Grafman, Jordan

2012-05-01

This study examined the prediction of performance on measures of the Story Goodness Index (SGI; Lê, Coelho, Mozeiko, & Grafman, 2011) from executive function (EF) and memory measures following traumatic brain injury (TBI). It was hypothesized that EF and memory measures would significantly predict SGI outcomes. One hundred sixty-seven individuals with TBI participated in the study. Story retellings were analyzed using the SGI protocol. Three cognitive measures--Delis-Kaplan Executive Function System (D-KEFS; Delis, Kaplan, & Kramer, 2001) Sorting Test, Wechsler Memory Scale--Third Edition (WMS-III; Wechsler, 1997) Working Memory Primary Index (WMI), and WMS-III Immediate Memory Primary Index (IMI)--were entered into a multiple linear regression model for each discourse measure. Two sets of regression analyses were performed, the first with the Sorting Test as the first predictor and the second with it as the last. The first set of regression analyses identified the Sorting Test and IMI as the only significant predictors of performance on measures of the SGI. The second set identified all measures as significant predictors when evaluating each step of the regression function. The cognitive variables predicted performance on the SGI measures, although there were differences in the amount of explained variance. The results (a) suggest that storytelling ability draws on a number of underlying skills and (b) underscore the importance of using discrete cognitive tasks rather than broad cognitive indices to investigate the cognitive substrates of discourse.
Dynamic linear models using the Kalman filter for early detection and early warning of malaria outbreaks

NASA Astrophysics Data System (ADS)

Merkord, C. L.; Liu, Y.; DeVos, M.; Wimberly, M. C.

2015-12-01

Malaria early detection and early warning systems are important tools for public health decision makers in regions where malaria transmission is seasonal and varies from year to year with fluctuations in rainfall and temperature. Here we present a new data-driven dynamic linear model based on the Kalman filter with time-varying coefficients that are used to identify malaria outbreaks as they occur (early detection) and predict the location and timing of future outbreaks (early warning). We fit linear models of malaria incidence with trend and Fourier form seasonal components using three years of weekly malaria case data from 30 districts in the Amhara Region of Ethiopia. We identified past outbreaks by comparing the modeled prediction envelopes with observed case data. Preliminary results demonstrated the potential for improved accuracy and timeliness over commonly-used methods in which thresholds are based on simpler summary statistics of historical data. Other benefits of the dynamic linear modeling approach include robustness to missing data and the ability to fit models with relatively few years of training data. To predict future outbreaks, we started with the early detection model for each district and added a regression component based on satellite-derived environmental predictor variables including precipitation data from the Tropical Rainfall Measuring Mission (TRMM) and land surface temperature (LST) and spectral indices from the Moderate Resolution Imaging Spectroradiometer (MODIS). We included lagged environmental predictors in the regression component of the model, with lags chosen based on cross-correlation of the one-step-ahead forecast errors from the first model. Our results suggest that predictions of future malaria outbreaks can be improved by incorporating lagged environmental predictors.
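The early-detection idea described above (comparing observed cases with a model prediction envelope) can be sketched with the simplest dynamic linear model, a local-level model filtered by the Kalman recursions. This is a deliberately reduced illustration: the model in the abstract also has trend, Fourier seasonal, and regression components, and the variances, starting values, and case counts below are assumptions chosen for the example.

```python
import math


def local_level_filter(y, obs_var, state_var, m0, c0, z=1.96):
    """Local-level Kalman filter.

    Returns one (forecast, lower, upper, flagged) tuple per time step, where
    flagged is True when the observation exceeds the upper envelope.
    """
    m, c = m0, c0
    out = []
    for obs in y:
        r = c + state_var          # prior variance of the level
        f = m                      # one-step-ahead forecast
        q = r + obs_var            # forecast variance
        half = z * math.sqrt(q)
        flagged = obs is not None and obs > f + half
        out.append((f, f - half, f + half, flagged))
        if obs is None:
            # Missing week: skip the update and carry the prediction forward.
            m, c = f, r
            continue
        k = r / q                  # Kalman gain
        m = f + k * (obs - f)
        c = (1 - k) * r
    return out


# Hypothetical weekly case counts; None marks a missing report.
cases = [10, 11, None, 10, 30]
results = local_level_filter(cases, obs_var=1.0, state_var=0.5, m0=10.0, c0=1.0)
flags = [step[3] for step in results]
```

The missing week is handled by propagating the prediction without an update, which is one reason the approach is described as robust to missing data.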

Evaluation of scoring models for identifying the need for therapeutic intervention of upper gastrointestinal bleeding: A new prediction score model for Japanese patients.

PubMed

Iino, Chikara; Mikami, Tatsuya; Igarashi, Takasato; Aihara, Tomoyuki; Ishii, Kentaro; Sakamoto, Jyuichi; Tono, Hiroshi; Fukuda, Shinsaku

2016-11-01

Multiple scoring systems have been developed to predict outcomes in patients with upper gastrointestinal bleeding. We determined how well these and a newly established scoring model predict the need for therapeutic intervention, excluding transfusion, in Japanese patients with upper gastrointestinal bleeding. We reviewed data from 212 consecutive patients with upper gastrointestinal bleeding. Patients requiring endoscopic intervention, operation, or interventional radiology were allocated to the therapeutic intervention group. Firstly, we compared areas under the curve for the Glasgow-Blatchford, Clinical Rockall, and AIMS65 scores. Secondly, the scores and factors likely associated with upper gastrointestinal bleeding were analyzed with a logistic regression analysis to form a new scoring model. Thirdly, the new model and the existing model were investigated to evaluate their usefulness. Therapeutic intervention was required in 109 patients (51.4%). The Glasgow-Blatchford score was superior to both the Clinical Rockall and AIMS65 scores for predicting therapeutic intervention need (area under the curve, 0.75 [95% confidence interval, 0.69-0.81] vs 0.53 [0.46-0.61] and 0.52 [0.44-0.60], respectively). Multivariate logistic regression analysis retained seven significant predictors in the model: systolic blood pressure <100 mmHg, syncope, hematemesis, hemoglobin <10 g/dL, blood urea nitrogen ≥22.4 mg/dL, estimated glomerular filtration rate ≤60 mL/min per 1.73 m², and antiplatelet medication. Based on these variables, we established a new scoring model with superior discrimination to those of existing scoring systems (area under the curve, 0.85 [0.80-0.90]). We developed a superior scoring model for identifying therapeutic intervention need in Japanese patients with upper gastrointestinal bleeding. © 2016 Japan Gastroenterological Endoscopy Society.
The microcomputer scientific software series 2: general linear model--regression.

Treesearch

Harold M. Rauscher

1983-01-01

The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Finding structure in data using multivariate tree boosting

PubMed Central

Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.

2016-01-01

Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
The use of the temporal scan statistic to detect methicillin-resistant Staphylococcus aureus clusters in a community hospital.

PubMed

Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

2014-07-08

In healthcare facilities, conventional surveillance techniques using rule-based guidelines may result in under- or over-reporting of methicillin-resistant Staphylococcus aureus (MRSA) outbreaks, as these guidelines are generally unvalidated. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting MRSA clusters, validate clusters using molecular techniques and hospital records, and determine significant differences in the rate of MRSA cases using regression models. Patients admitted to a community hospital between August 2006 and February 2011, and identified with MRSA>48 hours following hospital admission, were included in this study. Between March 2010 and February 2011, MRSA specimens were obtained for spa typing. MRSA clusters were investigated using a retrospective temporal scan statistic. Tests were conducted on a monthly scale and significant clusters were compared to MRSA outbreaks identified by hospital personnel. Associations between the rate of MRSA cases and the variables year, month, and season were investigated using a negative binomial regression model. During the study period, 735 MRSA cases were identified and 167 MRSA isolates were spa typed. Nine different spa types were identified with spa type 2/t002 (88.6%) the most prevalent. The temporal scan statistic identified significant MRSA clusters at the hospital (n=2), service (n=16), and ward (n=10) levels (P ≤ 0.05). Seven clusters were concordant with nine MRSA outbreaks identified by hospital staff. For the remaining clusters, seven events may have been equivalent to true outbreaks and six clusters demonstrated possible transmission events. The regression analysis indicated years 2009-2011, compared to 2006, and months March and April, compared to January, were associated with an increase in the rate of MRSA cases (P ≤ 0.05). 
The application of the temporal scan statistic identified several MRSA clusters that were not detected by hospital personnel. The identification of specific years and months with increased MRSA rates may be attributable to several hospital level factors including the presence of other pathogens. Within hospitals, the incorporation of the temporal scan statistic to standard surveillance techniques is a valuable tool for healthcare workers to evaluate surveillance strategies and aid in the identification of MRSA clusters.
Predictors of Medicare costs in elderly beneficiaries with breast, colorectal, lung, or prostate cancer.

PubMed

Penberthy, L; Retchin, S M; McDonald, M K; McClish, D K; Desch, C E; Riley, G F; Smith, T J; Hillner, B E; Newschaffer, C J

1999-07-01

Determining the apportionment of costs of cancer care and identifying factors that predict costs are important for planning ethical resource allocation for cancer care, especially in markets where managed care has grown. This study linked tumor registry data with Medicare administrative claims to determine the costs of care for breast, colorectal, lung and prostate cancers during the initial year subsequent to diagnosis, and to develop models to identify factors predicting costs. The study population comprised patients with breast (n = 1,952), colorectal (n = 2,563), lung (n = 3,331) or prostate cancer (n = 3,179) diagnosed from 1985 through 1988. The average costs during the initial treatment period were $12,141 (s.d. = $10,434) for breast cancer, $24,910 (s.d. = $14,870) for colorectal cancer, $21,351 (s.d. = $14,813) for lung cancer, and $14,361 (s.d. = $11,216) for prostate cancer. Using least squares regression analysis, factors significantly associated with cost included comorbidity, hospital length of stay, type of therapy, and ZIP level income for all four cancer sites. Access to health care resources was variably associated with costs of care. Total R² ranged from 38% (prostate) to 49% (breast). The prediction error for the regression models ranged from < 1% to 4%, by cancer site. Linking administrative claims with state tumor registry data can accurately predict costs of cancer care during the first year subsequent to diagnosis for cancer patients. Regression models using both data sources may be useful to health plans and providers and in determining appropriate prospective reimbursement for cancer, particularly with increasing HMO penetration and decreased ability to capture complete and accurate utilization and cost data on this population.
Risk of Recurrence in Operated Parasagittal Meningiomas: A Logistic Binary Regression Model.

PubMed

Escribano Mesa, José Alberto; Alonso Morillejo, Enrique; Parrón Carreño, Tesifón; Huete Allut, Antonio; Narro Donate, José María; Méndez Román, Paddy; Contreras Jiménez, Ascensión; Pedrero García, Francisco; Masegosa González, José

2018-02-01

Parasagittal meningiomas arise from the arachnoid cells of the angle formed between the superior sagittal sinus (SSS) and the brain convexity. In this retrospective study, we focused on factors that predict early recurrence and recurrence times. We reviewed 125 patients with parasagittal meningiomas operated on from 1985 to 2014. We studied the following variables: age, sex, location, laterality, histology, surgeons, invasion of the SSS, Simpson removal grade, follow-up time, angiography, embolization, radiotherapy, recurrence and recurrence time, reoperation, neurologic deficit, degree of dependency, and patient status at the end of follow-up. Patients ranged in age from 26 to 81 years (mean 57.86 years; median 60 years). There were 44 men (35.2%) and 81 women (64.8%). There were 57 patients with neurologic deficits (45.2%). The most common presenting symptom was motor deficit. World Health Organization grade I tumors were identified in 104 patients (84.6%), and the majority were the meningothelial type. Recurrence was detected in 34 cases. Time of recurrence was 9 to 336 months (mean: 84.4 months; median: 79.5 months). Male sex was identified as an independent risk for recurrence with relative risk 2.7 (95% confidence interval 1.21-6.15), P = 0.014. Kaplan-Meier curves for recurrence had statistically significant differences depending on sex, age, histologic type, and World Health Organization histologic grade. A binary logistic regression was made with the Hosmer-Lemeshow test with P > 0.05; sex, tumor size, and histologic type were used in this model. Male sex is an independent risk factor for recurrence that, associated with other factors such as tumor size and histologic type, explains 74.5% of all cases in a binary regression model. Copyright © 2017 Elsevier Inc. All rights reserved.
Integrative eQTL analysis of tumor and host omics data in individuals with bladder cancer.

PubMed

Pineda, Silvia; Van Steen, Kristel; Malats, Núria

2017-09-01

Integrative analyses of several omics data are emerging. The data are usually generated from the same source material (i.e., tumor sample) representing one level of regulation. However, integrating different regulatory levels (i.e., blood) with those from tumor may also reveal important knowledge about the human genetic architecture. To model this multilevel structure, an integrative-expression quantitative trait loci (eQTL) analysis applying two-stage regression (2SR) was proposed. This approach first regressed tumor gene expression levels on tumor markers, and the adjusted residuals from that model were then regressed on the germline genotypes measured in blood. Previously, we demonstrated that penalized regression methods in combination with a permutation-based MaxT method (Global-LASSO) are a promising tool to address some of the challenges that high-throughput omics data analysis imposes. Here, we assessed whether Global-LASSO can also be applied when tumor and blood omics data are integrated. We further compared our strategy with two 2SR approaches, one using multiple linear regression (2SR-MLR) and the other using LASSO (2SR-LASSO). We applied the three models to integrate genomic, epigenomic, and transcriptomic data from tumor tissue with blood germline genotypes from 181 individuals with bladder cancer included in the TCGA Consortium. Global-LASSO provided a larger list of eQTLs than the 2SR methods, identified a previously reported eQTL in prostate stem cell antigen (PSCA), and provided further clues on the complexity of the APBEC3B loci, with a minimal false-positive rate not achieved by 2SR-MLR. It also represents an important contribution to omics integrative analysis because it is easy to apply and adaptable to any type of data. © 2017 WILEY PERIODICALS, INC.
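The two-stage regression (2SR) idea described above can be sketched with one predictor per stage: expression is first regressed on a tumor marker, and the residuals are then regressed on a germline genotype. This is a simplified sketch that assumes plain (not adjusted) residuals and ordinary least squares. The helper names and toy data are hypothetical and do not reproduce the 2SR-MLR or 2SR-LASSO implementations.

```python
def ols_simple(x, y):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage(expression, tumor_marker, genotype):
    """Stage 1: expression ~ tumor marker. Stage 2: stage-1 residuals ~ genotype."""
    a, b = ols_simple(tumor_marker, expression)
    residuals = [e - (a + b * t) for e, t in zip(expression, tumor_marker)]
    return ols_simple(genotype, residuals)


# Toy data: expression = 1 + 2 * marker + 0.5 * genotype, with the genotype
# chosen to be uncorrelated with the marker so stage 2 recovers 0.5 exactly.
marker = [0.0, 1.0, 2.0, 3.0]
geno = [1.0, 0.0, 0.0, 1.0]
expr = [1 + 2 * t + 0.5 * g for t, g in zip(marker, geno)]
intercept2, genotype_effect = two_stage(expr, marker, geno)
```

When the genotype is correlated with the tumor marker, stage 1 absorbs part of the genotype signal and the stage-2 slope no longer equals the joint-model coefficient, which is one limitation of two-stage approaches in general.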
Application of logistic regression for landslide susceptibility zoning of Cekmece Area, Istanbul, Turkey

NASA Astrophysics Data System (ADS)

Duman, T. Y.; Can, T.; Gokceoglu, C.; Nefeslioglu, H. A.; Sonmez, H.

2006-11-01

As a result of industrialization, throughout the world, cities have been growing rapidly for the last century. One typical example of these growing cities is Istanbul, the population of which is over 10 million. Due to rapid urbanization, new areas suitable for settlement and engineering structures are necessary. The Cekmece area located west of the Istanbul metropolitan area is studied, because the landslide activity is extensive in this area. The purpose of this study is to develop a model that can be used to characterize landslide susceptibility in map form using logistic regression analysis of an extensive landslide database. A database of landslide activity was constructed using both aerial-photography and field studies. About 19.2% of the selected study area is covered by deep-seated landslides. The landslides that occur in the area are primarily located in sandstones with interbedded permeable and impermeable layers such as claystone, siltstone and mudstone. About 31.95% of the total landslide area is located at this unit. To apply logistic regression analyses, a data matrix including 37 variables was constructed. The variables used in the forward stepwise analyses are different measures of slope, aspect, elevation, stream power index (SPI), plan curvature, profile curvature, geology, geomorphology and relative permeability of lithological units. A total of 25 variables were identified as exerting strong influence on landslide occurrence, and included in the logistic regression equation. Wald statistics values indicate that lithology, SPI and slope are more important than the other parameters in the equation. Beta coefficients of the 25 variables included in the logistic regression equation provide a model for landslide susceptibility in the Cekmece area. This model is used to generate a landslide susceptibility map that correctly classified 83.8% of the landslide-prone areas.
Hybrid rocket engine, theoretical model and experiment

NASA Astrophysics Data System (ADS)

Chelaru, Teodor-Viorel; Mingireanu, Florin

2011-06-01

The purpose of this paper is to build a theoretical model for the hybrid rocket engine/motor and to validate it using experimental results. The work approaches the main problems of the hybrid motor: the scalability, the stability/controllability of the operating parameters and the increasing of the solid fuel regression rate. At first, we focus on theoretical models for hybrid rocket motor and compare the results with already available experimental data from various research groups. A primary computation model is presented together with results from a numerical algorithm based on a computational model. We present theoretical predictions for several commercial hybrid rocket motors, having different scales and compare them with experimental measurements of those hybrid rocket motors. Next the paper focuses on tribrid rocket motor concept, which by supplementary liquid fuel injection can improve the thrust controllability. A complementary computation model is also presented to estimate regression rate increase of solid fuel doped with oxidizer. Finally, the stability of the hybrid rocket motor is investigated using Liapunov theory. Stability coefficients obtained are dependent on burning parameters while the stability and command matrixes are identified. The paper presents thoroughly the input data of the model, which ensures the reproducibility of the numerical results by independent researchers.
Active Learning to Understand Infectious Disease Models and Improve Policy Making

PubMed Central

Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel

2014-01-01

Modeling plays a major role in policy making, especially for infectious disease interventions but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. Provided insight is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as emulator to improve rapid policy making in various settings. PMID:24743387
Active learning to understand infectious disease models and improve policy making.

PubMed

Willem, Lander; Stijven, Sean; Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel

2014-04-01

Modeling plays a major role in policy making, especially for infectious disease interventions but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. Provided insight is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as emulator to improve rapid policy making in various settings.
Multivariate random-parameters zero-inflated negative binomial regression model: an application to estimate crash frequencies at intersections.

PubMed

Dong, Chunjiao; Clarke, David B; Yan, Xuedong; Khattak, Asad; Huang, Baoshan

2014-09-01

Crash data are collected through police reports and integrated with road inventory data for further analysis. Integrated police reports and inventory data yield correlated multivariate data for roadway entities (e.g., segments or intersections). Analysis of such data reveals important relationships that can help focus on high-risk situations and come up with safety countermeasures. To understand relationships between crash frequencies and associated variables, while taking full advantage of the available data, multivariate random-parameters models are appropriate since they can simultaneously consider the correlation among the specific crash types and account for unobserved heterogeneity. However, a key issue that arises with correlated multivariate data is that the number of crash-free samples increases, as crash counts have many categories. In this paper, we describe a multivariate random-parameters zero-inflated negative binomial (MRZINB) regression model for jointly modeling crash counts. The full Bayesian method is employed to estimate the model parameters. Crash frequencies at urban signalized intersections in Tennessee are analyzed. The paper investigates the performance of MZINB and MRZINB regression models in establishing the relationship between crash frequencies, pavement conditions, traffic factors, and geometric design features of roadway intersections. Compared to the MZINB model, the MRZINB model identifies additional statistically significant factors and provides better goodness of fit in developing the relationships. The empirical results show that the MRZINB model possesses most of the desirable statistical properties in terms of its ability to accommodate unobserved heterogeneity and excess zero counts in correlated data. Notably, in the random-parameters MZINB model, the estimated parameters vary significantly across intersections for different crash types. Copyright © 2014 Elsevier Ltd. All rights reserved.
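To make the "excess zero counts" idea concrete, the sketch below evaluates a univariate zero-inflated negative binomial probability mass function: with probability `pi_zero` a site is a structural zero, and otherwise the count follows a negative binomial distribution with mean `mu` and dispersion `alpha`. This is only the basic building block under an assumed mean-dispersion parameterization. The multivariate, random-parameters, fully Bayesian model in the abstract is considerably richer, and all parameter values here are hypothetical.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(k, mu, alpha, pi_zero):
    """Zero-inflated NB: extra point mass pi_zero at zero, NB otherwise."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, alpha)
    return pi_zero + base if k == 0 else base


# Hypothetical parameters for one crash type at one intersection.
p_zero_nb = nb_pmf(0, mu=2.0, alpha=0.5)                   # zero probability without inflation
p_zero_zinb = zinb_pmf(0, mu=2.0, alpha=0.5, pi_zero=0.3)  # zero probability with inflation
total = sum(zinb_pmf(k, 2.0, 0.5, 0.3) for k in range(201))
```

With these values the inflation component raises the probability of a crash-free observation from 0.25 to 0.475, while the probabilities still sum to one.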
Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures.

PubMed

Bobb, Jennifer F; Valeri, Linda; Claus Henn, Birgit; Christiani, David C; Wright, Robert O; Mazumdar, Maitreyi; Godleski, John J; Coull, Brent A

2015-07-01

Because humans are invariably exposed to complex chemical mixtures, estimating the health effects of multi-pollutant exposures is of critical concern in environmental epidemiology, and to regulatory agencies such as the U.S. Environmental Protection Agency. However, most health effects studies focus on single agents or consider simple two-way interaction models, in part because we lack the statistical methodology to more realistically capture the complexity of mixed exposures. We introduce Bayesian kernel machine regression (BKMR) as a new approach to study mixtures, in which the health outcome is regressed on a flexible function of the mixture (e.g. air pollution or toxic waste) components that is specified using a kernel function. In high-dimensional settings, a novel hierarchical variable selection approach is incorporated to identify important mixture components and account for the correlated structure of the mixture. Simulation studies demonstrate the success of BKMR in estimating the exposure-response function and in identifying the individual components of the mixture responsible for health effects. We demonstrate the features of the method through epidemiology and toxicology applications. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
[Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].

PubMed

Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

2017-03-10

To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence in relation to caregivers' recognition of risk signs of diarrhea in their infants using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of their infant's risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. We also compared the point and interval estimates of this PR, and the convergence of three models (model 1: not adjusted for covariates; model 2: adjusted for duration of caregivers' education; model 3: model 2 further adjusted for distance between village and township and child age in months), between the Bayesian log-binomial regression model and the conventional log-binomial regression model. All three Bayesian log-binomial regression models converged, and the estimated PRs were 1.130 (95% CI: 1.005-1.265), 1.128 (95% CI: 1.001-1.264) and 1.132 (95% CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, with PRs of 1.130 (95% CI: 1.055-1.206) and 1.126 (95% CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate the PR, which was 1.125 (95% CI: 1.051-1.200). The point and interval estimates of the PRs from the three Bayesian log-binomial regression models differed slightly from those from the conventional log-binomial regression models, but the two approaches were consistent in estimating the PR. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with fewer convergence failures and has advantages in application over the conventional log-binomial regression model.
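For orientation, the unadjusted case (analogous to model 1 above) has a closed form: with a single binary exposure, the log-binomial maximum-likelihood PR is simply the ratio of the two observed prevalences, and a Wald interval can be built on the log scale. The sketch below shows that calculation. It is a frequentist illustration with hypothetical counts, not the Bayesian OpenBUGS model or the data from the study.

```python
import math


def prevalence_ratio(a, n1, b, n0, z=1.96):
    """PR for exposed (a of n1) versus unexposed (b of n0) with a log-scale Wald CI."""
    pr = (a / n1) / (b / n0)
    # Standard error of log(PR) from the delta method.
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper


# Hypothetical counts: 60 of 100 caregivers who recognized risk signs sought
# care, versus 50 of 100 who did not recognize them.
pr, lower, upper = prevalence_ratio(60, 100, 50, 100)
```

Adjusted models have no such closed form, which is where the convergence problems of the conventional log-binomial fit described in the abstract arise.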
Evaluation of weighted regression and sample size in developing a taper model for loblolly pine

Treesearch

Kenneth L. Cormier; Robin M. Reich; Raymond L. Czaplewski; William A. Bechtold

1992-01-01

A stem profile model, fit using pseudo-likelihood weighted regression, was used to estimate merchantable volume of loblolly pine (Pinus taeda L.) in the southeast. The weighted regression increased model fit marginally, but did not substantially increase model performance. In all cases, the unweighted regression models performed as well as the...
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

NASA Astrophysics Data System (ADS)

Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

2017-06-01

A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate the odds. When the categories of the dependent variable are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model that is influenced by the geographical location of the observation site. Parameter estimation is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.
A comparison of regression methods for model selection in individual-based landscape genetic analysis.

PubMed

Shirk, Andrew J; Landguth, Erin L; Cushman, Samuel A

2018-01-01

Anthropogenic migration barriers fragment many populations and limit the ability of species to respond to climate-induced biome shifts. Conservation actions designed to conserve habitat connectivity and mitigate barriers are needed to unite fragmented populations into larger, more viable metapopulations, and to allow species to track their climate envelope over time. Landscape genetic analysis provides an empirical means to infer landscape factors influencing gene flow and thereby inform such conservation actions. However, there are currently many methods available for model selection in landscape genetics, and considerable uncertainty as to which provide the greatest accuracy in identifying the true landscape model influencing gene flow among competing alternative hypotheses. In this study, we used population genetic simulations to evaluate the performance of seven regression-based model selection methods on a broad array of landscapes that varied by the number and type of variables contributing to resistance, the magnitude and cohesion of resistance, as well as the functional relationship between variables and resistance. We also assessed the effect of transformations designed to linearize the relationship between genetic and landscape distances. We found that linear mixed effects models had the highest accuracy in every way we evaluated model performance; however, other methods also performed well in many circumstances, particularly when landscape resistance was high and the correlation among competing hypotheses was limited. Our results provide guidance for which regression-based model selection methods provide the most accurate inferences in landscape genetic analysis and thereby best inform connectivity conservation actions. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
Applying quantitative adiposity feature analysis models to predict benefit of bevacizumab-based chemotherapy in ovarian cancer patients

NASA Astrophysics Data System (ADS)

Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin

2016-03-01

How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatments. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting potential benefit of EOC patients with or without receiving bevacizumab-based chemotherapy treatment using multivariate statistical models built based on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients was included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (SFA) and visceral fat areas (VFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were each applied to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that using all 3 statistical models, a statistically significant association was detected between the model-generated results and both clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there was no significant association for either PFS or OS in the group of patients not receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using statistical prediction models based on quantitative adiposity-related CT image features to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.
Prognostic model for survival in patients with early stage cervical cancer.

PubMed

Biewenga, Petra; van der Velden, Jacobus; Mol, Ben Willem J; Stalpers, Lukas J A; Schilthuis, Marten S; van der Steeg, Jan Willem; Burger, Matthé P M; Buist, Marrije R

2011-02-15

In the management of early stage cervical cancer, knowledge about the prognosis is critical. Although many factors have an impact on survival, their relative importance remains controversial. This study aims to develop a prognostic model for survival in early stage cervical cancer patients and to reconsider grounds for adjuvant treatment. A multivariate Cox regression model was used to identify the prognostic weight of clinical and histological factors for disease-specific survival (DSS) in 710 consecutive patients who had surgery for early stage cervical cancer (FIGO [International Federation of Gynecology and Obstetrics] stage IA2-IIA). Prognostic scores were derived by converting the regression coefficients for each prognostic marker and used in a score chart. The discriminative capacity was expressed as the area under the curve (AUC) of the receiver operating characteristic. The 5-year DSS was 92%. Tumor diameter, histological type, lymph node metastasis, depth of stromal invasion, lymph vascular space invasion, and parametrial extension were independently associated with DSS and were included in a Cox regression model. This prognostic model, corrected for the 9% overfit shown by internal validation, showed a fair discriminative capacity (AUC, 0.73). The derived score chart predicting 5-year DSS showed a good discriminative capacity (AUC, 0.85). In patients with early stage cervical cancer, DSS can be predicted with a statistical model. Models, such as that presented here, should be used in clinical trials on the effects of adjuvant treatments in high-risk early cervical cancer patients, both to stratify and to include patients. Copyright © 2010 American Cancer Society.
The effect of service satisfaction and spiritual well-being on the quality of life of patients with schizophrenia.

PubMed

Lanfredi, Mariangela; Candini, Valentina; Buizza, Chiara; Ferrari, Clarissa; Boero, Maria E; Giobbio, Gian M; Goldschmidt, Nicoletta; Greppo, Stefania; Iozzino, Laura; Maggi, Paolo; Melegari, Anna; Pasqualetti, Patrizio; Rossi, Giuseppe; de Girolamo, Giovanni

2014-05-15

Quality of life (QOL) has been considered an important outcome measure in psychiatric research and determinants of QOL have been widely investigated. We aimed at detecting predictors of QOL at baseline and at testing the longitudinal interrelations of the baseline predictors with QOL scores at a 1-year follow-up in a sample of patients living in Residential Facilities (RFs). Logistic regression models were adopted to evaluate the association between WHOQoL-Bref scores and potential determinants of QOL. In addition, all variables significantly associated with QOL domains in the final logistic regression model were included by using the Structural Equation Modeling (SEM). We included 139 patients with a diagnosis of schizophrenia spectrum. In the final logistic regression model level of activity, social support, age, service satisfaction, spiritual well-being and symptoms' severity were identified as predictors of QOL scores at baseline. Longitudinal analyses carried out by SEM showed that 40% of QOL follow-up variability was explained by QOL at baseline, and significant indirect effects toward QOL at follow-up were found for satisfaction with services and for social support. Rehabilitation plans for people with schizophrenia living in RFs should also consider mediators of change in subjective QOL such as satisfaction with mental health services. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

Breast Arterial Calcification Is Associated with Reproductive Factors in Asymptomatic Postmenopausal Women

PubMed Central

Whaley, Dana H.; Sheedy, Patrick F.; Peyser, Patricia A.

2010-01-01

Objective: The etiology of breast arterial calcification (BAC) is not well understood. We examined reproductive history and cardiovascular disease (CVD) risk factor associations with the presence of detectable BAC in asymptomatic postmenopausal women. Methods: Reproductive history and CVD risk factors were obtained in 240 asymptomatic postmenopausal women from a community-based research study who had a screening mammogram within 2 years of their participation in the study. The mammograms were reviewed for the presence of detectable BAC. Age-adjusted logistic regression models were fit to assess the association between each risk factor and the presence of BAC. Multiple variable logistic regression models were used to identify the most parsimonious model for the presence of BAC. Results: The prevalence of BAC increased with increased age (p < 0.0001). The most parsimonious logistic regression model for BAC presence included age at time of examination, increased parity (p = 0.01), earlier age at first birth (p = 0.002), weight, and an age-by-weight interaction term (p = 0.004). Older women with a smaller body size had a higher probability of having BAC than women of the same age with a larger body size. Conclusions: The presence or absence of BAC at mammography may provide an assessment of a postmenopausal woman's lifetime estrogen exposure and indicate women who could be at risk for hormonally related conditions. PMID:20629578
Modified locally weighted--partial least squares regression improving clinical predictions from infrared spectra of human serum samples.

PubMed

Perez-Guaita, David; Kuligowski, Julia; Quintás, Guillermo; Garrigues, Salvador; Guardia, Miguel de la

2013-03-30

Locally weighted partial least squares regression (LW-PLSR) has been applied to the determination of four clinical parameters in human serum samples (total protein, triglyceride, glucose and urea contents) by Fourier transform infrared (FTIR) spectroscopy. Classical LW-PLSR models were constructed using different spectral regions. For the selection of parameters by LW-PLSR modeling, a multi-parametric study was carried out employing the minimum root-mean square error of cross validation (RMSCV) as objective function. In order to overcome the effect of strong matrix interferences on the predictive accuracy of LW-PLSR models, this work focuses on sample selection. Accordingly, a novel strategy for the development of local models is proposed. It was based on the use of: (i) principal component analysis (PCA) performed on an analyte specific spectral region for identifying most similar sample spectra and (ii) partial least squares regression (PLSR) constructed using the whole spectrum. Results found by using this strategy were compared to those provided by PLSR using the same spectral intervals as for LW-PLSR. Prediction errors found by both, classical and modified LW-PLSR improved those obtained by PLSR. Hence, both proposed approaches were useful for the determination of analytes present in a complex matrix as in the case of human serum samples. Copyright © 2013 Elsevier B.V. All rights reserved.
Modelling Fourier regression for time series data - a case study: modelling inflation in foods sector in Indonesia

NASA Astrophysics Data System (ADS)

Prahutama, Alan; Suparti; Wahyu Utami, Tiani

2018-03-01

Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression imposes strict assumptions, whereas a nonparametric regression model does not require an assumed model form. Time series data are observations of a variable recorded over time, so to model a time series by regression, the response and predictor variables must be determined first. The response variable is the value at time t (yt), while the predictor variables are its significant lags. In nonparametric regression modeling, one developing approach uses Fourier series. One advantage of the Fourier series approach is its ability to handle data with a trigonometric pattern. Modeling with Fourier series requires a parameter K, which can be chosen by the Generalized Cross Validation method. In inflation modeling for the transportation, communication and financial services sector, the Fourier series approach yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
Estimating the impact of mineral aerosols on crop yields in food insecure regions using statistical crop models

NASA Astrophysics Data System (ADS)

Hoffman, A.; Forest, C. E.; Kemanian, A.

2016-12-01

A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to advance our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, while dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.
A combined M5P tree and hazard-based duration model for predicting urban freeway traffic accident durations.

PubMed

Lin, Lei; Wang, Qian; Sadek, Adel W

2016-06-01

The duration of freeway traffic accidents is an important factor, which affects traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normal, the distribution for a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of "time-to-event" modeling scenario, and given this, HBDMs have previously been applied to analyze and predict traffic accident duration. Previous research, however, has not yet applied HBDMs for accident duration prediction in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of an M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the need for the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train the following three models: (1) an M5P tree; (2) an HBDM; and (3) the proposed M5P-HBDM, and the remainder of the data were used for testing. The results show that the proposed M5P-HBDM managed to identify more significant and meaningful variables than either M5P or HBDMs. Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.
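The error measure named in the abstract above, MAPE, can be sketched as follows. This is a minimal illustration only: the function name and the sample durations are hypothetical and are not taken from the paper, and observed durations are assumed to be strictly positive.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumes every observed value is non-zero (durations are positive).
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical observed and predicted accident durations, in minutes.
durations = [30.0, 45.0, 60.0]
forecasts = [27.0, 54.0, 60.0]
print(round(mape(durations, forecasts), 2))
```

Because each error is divided by the observed duration, short accidents weigh as heavily as long ones, which is one reason MAPE is a common choice for comparing duration models.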
Associations between dairy cow inter-service interval and probability of conception.

PubMed

Remnant, J G; Green, M J; Huxley, J N; Hudson, C D

2018-07-01

Recent research has indicated that the interval between inseminations in modern dairy cattle is often longer than the commonly accepted cycle length of 18-24 days. This study analysed 257,396 inseminations in 75,745 cows from 312 herds in England and Wales. The interval between subsequent inseminations in the same cow in the same lactation (inter-service interval, ISI) was calculated and inseminations categorised as successful or unsuccessful depending on whether there was a corresponding calving event. Conception risk was calculated for each individual ISI between 16 and 28 days. A random effects logistic regression model was fitted to the data with pregnancy as the outcome variable and ISI (in days) included in the model as a categorical variable. The modal ISI was 22 days and the peak conception risk was 44% for ISIs of 21 days rising from 27% at 16 days. The logistic regression model revealed significant associations of conception risk with ISI as well as 305 day milk yield, insemination number, parity and days in milk. Predicted conception risk was lower for ISIs of 16, 17 and 18 days and higher for ISIs of 20, 21 and 22 days compared to 25 day ISIs. A mixture model was specified to identify clusters in insemination frequency and conception risk for ISIs between 3 and 50 days. A "high conception risk, high insemination frequency" cluster was identified between 19 and 26 days which indicated that this time period was the true latent distribution for ISI with optimal reproductive outcome. These findings suggest that the period of increased numbers of inseminations around 22 days identified in existing work coincides with the period of increased probability of conception and therefore likely represents true return estrus events. Copyright © 2018 Elsevier Inc. All rights reserved.
Modelling seasonal variations in presentations at a paediatric emergency department.

PubMed

Takase, Miyuki; Carlin, John

2012-09-01

Overcrowding is a phenomenon commonly observed at emergency departments (EDs) in many hospitals, and negatively impacts patients, healthcare professionals and organisations. Health care organisations are expected to act proactively to cope with a high patient volume by understanding and predicting the patterns of ED presentations. The aim of this study was, therefore, to identify the patterns of patient flow at a paediatric ED in order to assist the management of EDs. Data for ED presentations were collected from the Royal Children's Hospital in Melbourne, Australia, with the time-frame of July 2003 to June 2008. A linear regression analysis with trigonometric functions was used to identify the pattern of patient flow at the ED. The results showed that a logarithm of the daily average ED presentations was increasing exponentially (as explained by 0.004t + 0.00005t², with t representing time, p<0.001). The model also indicated that there was a yearly oscillation in the frequency of ED presentations, in which lower frequencies were observed in summer and higher frequencies during winter (as explained by -0.046 sin(2πt/12) - 0.083 cos(2πt/12), p<0.001). In addition, the variation of the oscillations was increasing over time (as explained by -0.002t·sin(2πt/12) - 0.001t·cos(2πt/12), p<0.05). The identified regression model explained a total of 96% of the variance in the pattern of ED presentations. This model can be used to understand the trend of the current patient flow as well as to predict the future flow at the ED. Such an understanding will assist health care managers to prepare resources and environment more effectively to cope with overcrowding.
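The kind of model described above, a linear regression on trend terms plus a sine/cosine pair with a 12-month period, can be sketched with ordinary least squares. This is a hedged illustration, not the authors' code: the function names, the intercept of 5.0 and the 60-month synthetic series are assumptions, t is assumed to be measured in months, and the time-varying amplitude terms (t·sin, t·cos) are left out for brevity. Only the four trend and seasonal coefficients are taken from the abstract.

```python
import math


def design_row(t, period=12.0):
    """Intercept, linear and quadratic trend, and one annual sine/cosine pair."""
    w = 2.0 * math.pi * t / period
    return [1.0, t, t * t, math.sin(w), math.cos(w)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(times, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    rows = [design_row(t) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Noise-free synthetic log-presentations over 60 months; intercept is made up.
times = list(range(60))
true_beta = [5.0, 0.004, 0.00005, -0.046, -0.083]
y = [sum(b * x for b, x in zip(true_beta, design_row(t))) for t in times]
beta = fit_ols(times, y)
print([round(b, 5) for b in beta])
```

Because the sine and cosine enter as ordinary columns of the design matrix, the seasonal pattern is estimated with the same linear machinery as the trend; the fitted pair can be converted to an amplitude and phase afterwards if needed.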
Elucidation of chemosensitization effect of acridones in cancer cell lines: Combined pharmacophore modeling, 3D QSAR, and molecular dynamics studies.

PubMed

Gade, Deepak Reddy; Makkapati, Amareswararao; Yarlagadda, Rajesh Babu; Peters, Godefridus J; Sastry, B S; Rajendra Prasad, V V S

2018-06-01

Overexpression of P-glycoprotein (P-gp) leads to the emergence of multidrug resistance (MDR) in cancer treatment. Acridones have the potential to reverse MDR and sensitize cells. In the present study, we aimed to elucidate the chemosensitization potential of acridones by employing various molecular modelling techniques. Pharmacophore modeling was performed for the dataset of chemosensitizing acridones earlier proved for cytotoxic activity against MCF7 breast cancer cell line. Gaussian-based QSAR studies were also performed to predict the favored and disfavored regions of the acridone molecules. Molecular dynamics simulations were performed for compound 10 and human P-glycoprotein (obtained from homology modeling). An efficient pharmacophore containing 2 hydrogen bond acceptors and 3 aromatic rings (AARRR.14) was identified. The NCI 2012 chemical database was screened against the AARRR.14 CPH, and 25 best-fit molecules were identified. Potential regions of the compound were identified through field (Gaussian) based QSAR. Regression analysis of atom-based QSAR resulted in an r² of 0.95 and a q² of 0.72, whereas regression analysis of field-based QSAR resulted in an r² of 0.92 and a q² of 0.87, along with an r²cv of 0.71. The fate of the acridone molecule (compound 10) in the P-glycoprotein environment was analyzed through the conformational changes occurring during the molecular dynamics simulations. Combined data from the different in silico techniques provided a basis for deeper understanding of the structural and mechanistic aspects of the interaction of acridones with P-glycoprotein, and a strategic basis for designing more potent molecules for anti-cancer and multidrug resistance reversal activities. Copyright © 2018 Elsevier Ltd. All rights reserved.
C-Reactive Protein and Prediction of 1-Year Mortality in Prevalent Hemodialysis Patients

PubMed Central

Bazeley, Jonathan; Bieber, Brian; Li, Yun; Morgenstern, Hal; de Sequera, Patricia; Combe, Christian; Yamamoto, Hiroyasu; Gallagher, Martin; Port, Friedrich K.

2011-01-01

Background and objectives: Measurement of C-reactive protein (CRP) levels remains uncommon in North America, although it is now routine in many countries. Using Dialysis Outcomes and Practice Patterns Study data, our primary aim was to evaluate the value of CRP for predicting mortality when measured along with other common inflammatory biomarkers. Design, setting, participants, & measurements: We studied 5061 prevalent hemodialysis patients from 2005 to 2008 in 140 facilities routinely measuring CRP in 10 countries. The association of CRP with mortality was evaluated using Cox regression. Prediction of 1-year mortality was assessed in logistic regression models with differing adjustment variables. Results: Median baseline CRP was lower in Japan (1.0 mg/L) than other countries (6.0 mg/L). CRP was positively, monotonically associated with mortality. No threshold below which mortality rate leveled off was identified. In prediction models, CRP performance was comparable with albumin and exceeded ferritin and white blood cell (WBC) count based on measures of model discrimination (c-statistics, net reclassification improvement [NRI]) and global model fit (generalized R²). The primary analysis included age, gender, diabetes, catheter use, and the four inflammatory markers (omitting one at a time). Specifying NRI ≥5% as appropriate reclassification of predicted mortality risk, NRI for CRP was 12.8% compared with 10.3% for albumin, 0.8% for ferritin, and <0.1% for WBC. Conclusions: These findings demonstrate the value of measuring CRP in addition to standard inflammatory biomarkers to improve mortality prediction in hemodialysis patients. Future studies are indicated to identify interventions that lower CRP and to identify whether they improve clinical outcomes. PMID:21868617
Identifying and quantifying secondhand smoke in multiunit homes with tobacco smoke odor complaints

NASA Astrophysics Data System (ADS)

Dacunto, Philip J.; Cheng, Kai-Chung; Acevedo-Bolton, Viviana; Klepeis, Neil E.; Repace, James L.; Ott, Wayne R.; Hildemann, Lynn M.

2013-06-01

Accurate identification and quantification of the secondhand tobacco smoke (SHS) that drifts between multiunit homes (MUHs) is essential for assessing resident exposure and health risk. We collected 24 gaseous and particle measurements over 6-9 day monitoring periods in five nonsmoking MUHs with reported SHS intrusion problems. Nicotine tracer sampling showed evidence of SHS intrusion in all five homes during the monitoring period; logistic regression and chemical mass balance (CMB) analysis enabled identification and quantification of some of the precise periods of SHS entry. Logistic regression models identified SHS in eight periods when residents complained of SHS odor, and CMB provided estimates of SHS magnitude in six of these eight periods. Both approaches properly identified or apportioned all six cooking periods used as no-SHS controls. Finally, both approaches enabled identification and/or apportionment of suspected SHS in five additional periods when residents did not report smelling smoke. The time resolution of this methodology goes beyond sampling methods involving single tracers (such as nicotine), enabling the precise identification of the magnitude and duration of SHS intrusion, which is essential for accurate assessment of human exposure.
Development and validation of a predictive model for excessive postpartum blood loss: A retrospective, cohort study.

PubMed

Rubio-Álvarez, Ana; Molina-Alarcón, Milagros; Arias-Arias, Ángel; Hernández-Martínez, Antonio

2018-03-01

Postpartum haemorrhage is one of the leading causes of maternal morbidity and mortality worldwide. Despite the use of uterotonic agents as a preventive measure, it remains a challenge to identify those women who are at increased risk of postpartum bleeding. The aim was to develop and validate a predictive model to assess the risk of excessive bleeding in women with vaginal birth. This was a retrospective cohort study at "Mancha-Centro Hospital" (Spain). The elaboration of the predictive model was based on a derivation cohort consisting of 2336 women between 2009 and 2011. For validation purposes, a prospective cohort of 953 women between 2013 and 2014 was employed. Women with antenatal fetal demise, multiple pregnancies and gestations under 35 weeks were excluded. Methods: We used a multivariate analysis with binary logistic regression, Ridge Regression and areas under the Receiver Operating Characteristic curves to determine the predictive ability of the proposed model. There were 197 (8.43%) women with excessive bleeding in the derivation cohort and 63 (6.61%) women in the validation cohort. Predictive factors in the final model were: maternal age, primiparity, duration of the first and second stages of labour, neonatal birth weight and antepartum haemoglobin levels. Accordingly, the predictive ability of this model in the derivation cohort was 0.90 (95% CI: 0.85-0.93), while it remained 0.83 (95% CI: 0.74-0.92) in the validation cohort. This predictive model proved to have excellent predictive ability in the derivation cohort, and its validation in a later population likewise showed good predictive ability. This model can be employed to identify women with a higher risk of postpartum haemorrhage. Copyright © 2017 Elsevier Ltd. All rights reserved.
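The "predictive ability" figures quoted above are areas under the ROC curve. As a rough sketch of what that number measures, the area can be computed as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (the Mann-Whitney formulation). The function name and the toy labels and scores below are hypothetical and unrelated to the study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for lab, s in zip(labels, scores) if lab == 1]
    controls = [s for lab, s in zip(labels, scores) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above non-case
            elif c == k:
                wins += 0.5   # ties count as half
    return wins / (len(cases) * len(controls))


# Hypothetical outcomes (1 = excessive bleeding) and predicted risks.
labels = [1, 1, 0, 0, 0]
scores = [0.8, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a drop between derivation and validation cohorts is usually read as a sign of optimism in the derivation estimate.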
Serum magnesium but not calcium was associated with hemorrhagic transformation in stroke overall and stroke subtypes: a case-control study in China.

PubMed

Tan, Ge; Yuan, Ruozhen; Wei, ChenChen; Xu, Mangmang; Liu, Ming

2018-05-26

Association between serum calcium and magnesium versus hemorrhagic transformation (HT) remains to be identified. A total of 1212 non-thrombolysis patients with serum calcium and magnesium collected within 24 h from stroke onset were enrolled. Backward stepwise multivariate logistic regression analysis was conducted to investigate association between calcium and magnesium versus HT. Calcium and magnesium were entered into logistic regression analysis in two models, separately: model 1, as continuous variable (per 1-mmol/L increase), and model 2, as four-categorized variable (being collapsed into quartiles). HT occurred in 140 patients (11.6%). Serum calcium was slightly lower in patients with HT than in patients without HT, but the difference was not significant (P = 0.273). Serum magnesium, however, was significantly lower in patients with HT than in patients without HT (P = 0.007). In logistic regression analysis, calcium displayed no association with HT. Magnesium, as either continuous or four-categorized variable, was independently and inversely associated with HT in stroke overall and stroke of large-artery atherosclerosis (LAA). The results demonstrated that serum calcium had no association with HT in patients without thrombolysis after acute ischemic stroke. Low serum magnesium was independently associated with increased HT in stroke overall and particularly in stroke of LAA.
Water Quality Variable Estimation using Partial Least Squares Regression and Multi-Scale Remote Sensing.

NASA Astrophysics Data System (ADS)

Peterson, K. T.; Wulamu, A.

2017-12-01

Water, essential to all living organisms, is one of the Earth's most precious resources. Remote sensing offers an ideal approach to monitor water quality compared with traditional in-situ techniques, which are highly time and resource consuming. Using a multi-scale approach, data from handheld spectroscopy, UAS-based hyperspectral imagery, and satellite multispectral imagery were collected in coordination with in-situ water quality samples for two midwestern watersheds. The remote sensing data were modeled and correlated to the in-situ water quality variables, including chlorophyll content (Chl), turbidity, and total dissolved solids (TDS), using Normalized Difference Spectral Indices (NDSI) and Partial Least Squares Regression (PLSR). The results of the study supported the original hypothesis that correlating water quality variables with remotely sensed data benefits greatly from the use of more complex modeling and regression techniques such as PLSR. The PLSR analysis produced much higher R² values for all variables when compared to NDSI. The combination of NDSI and PLSR analysis also identified key wavelengths that aligned with the findings of previous studies. This research displays the advantages and future potential of complex modeling and machine learning techniques for improving water quality variable estimation from spectral data.
Spectroscopic Determination of Aboveground Biomass in Grasslands Using Spectral Transformations, Support Vector Machine and Partial Least Squares Regression

PubMed Central

Marabel, Miguel; Alvarez-Taboada, Flor

2013-01-01

Aboveground biomass (AGB) is one of the strategic biophysical variables of interest in vegetation studies. The main objective of this study was to evaluate the Support Vector Machine (SVM) and Partial Least Squares Regression (PLSR) for estimating the AGB of grasslands from field spectrometer data and to find out which data pre-processing approach was the most suitable. The most accurate model to predict the total AGB involved PLSR and the Maximum Band Depth index derived from the continuum removed reflectance in the absorption features between 916–1,120 nm and 1,079–1,297 nm (R² = 0.939, RMSE = 7.120 g/m²). Regarding the green fraction of the AGB, the Area Over the Minimum index derived from the continuum removed spectra provided the most accurate model overall (R² = 0.939, RMSE = 3.172 g/m²). Identifying the appropriate absorption features was proved to be crucial to improve the performance of PLSR to estimate the total and green aboveground biomass, by using the indices derived from those spectral regions. Ordinary Least Square Regression could be used as a surrogate for the PLSR approach with the Area Over the Minimum index as the independent variable, although the resulting model would not be as accurate. PMID:23925082
Safety analysis of urban signalized intersections under mixed traffic.

PubMed

S, Anjana; M V L R, Anjaneyulu

2015-02-01

This study examined the crash causative factors of signalized intersections under mixed traffic using advanced statistical models. Hierarchical Poisson regression and logistic regression models were developed to predict the crash frequency and severity of signalized intersection approaches. The prediction models helped to develop general safety countermeasures for signalized intersections. The study shows that exclusive left turn lanes and countdown timers are beneficial for improving the safety of signalized intersections. Safety is also influenced by the presence of a surveillance camera, green time, median width, traffic volume, and proportion of two wheelers in the traffic stream. The factors that influence the severity of crashes were also identified in this study. As a practical application, the safe values of the deviation of provided green time from design green time, with varying traffic volume, are presented in this study. This is a useful tool for setting the appropriate green time for a signalized intersection approach with variations in the traffic volume. Copyright © 2014 Elsevier Ltd. All rights reserved.
Dirichlet Component Regression and its Applications to Psychiatric Data

PubMed Central

Gueorguieva, Ralitza; Rosenheck, Robert; Zelterman, Daniel

2011-01-01

Summary: We describe a Dirichlet multivariable regression method useful for modeling data representing components as a percentage of a total. This model is motivated by the unmet need in psychiatry and other areas to simultaneously assess the effects of covariates on the relative contributions of different components of a measure. The model is illustrated using the Positive and Negative Syndrome Scale (PANSS) for assessment of schizophrenia symptoms which, like many other metrics in psychiatry, is composed of a sum of scores on several components, each, in turn, made up of sums of evaluations on several questions. We simultaneously examine the effects of baseline socio-demographic and co-morbid correlates on all of the components of the total PANSS score of patients from a schizophrenia clinical trial and identify variables associated with increasing or decreasing relative contributions of each component. Several definitions of residuals are provided. Diagnostics include measures of overdispersion, Cook’s distance, and a local jackknife influence metric. PMID:22058582
Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time‐to‐Event Analysis

PubMed Central

Gong, Xiajing; Hu, Meng

2018-01-01

Abstract: Additional value can potentially be created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time‐to‐event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high‐dimensional data characterized by a large number of predictor variables. Our results showed that ML‐based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high‐dimensional data. The prediction performances of ML‐based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML‐based methods provide a powerful tool for time‐to‐event analysis, with a built‐in capacity for high‐dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. PMID:29536640
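The concordance index used above to compare models can be sketched as follows. This is a minimal illustration of Harrell's C for right-censored data, not the authors' implementation; the function name and the toy data are assumptions. A pair of subjects is treated as comparable only when the subject with the shorter time had an observed event, and ties in predicted risk count as half-concordant.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by predicted risk.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Subject i must have an observed event strictly before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risks)  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the index suits comparisons between a Cox model and ML-based methods on the same simulated data.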
Methods for estimating drought streamflow probabilities for Virginia streams

USGS Publications Warehouse

Austin, Samuel H.

2014-01-01

Maximum likelihood logistic regression model equations used to estimate drought flow probabilities for Virginia streams are presented for 259 hydrologic basins in Virginia. Winter streamflows were used to estimate the likelihood of streamflows during the subsequent drought-prone summer months. The maximum likelihood logistic regression models identify probable streamflows from 5 to 8 months in advance. More than 5 million streamflow daily values collected over the period of record (January 1, 1900 through May 16, 2012) were compiled and analyzed over a minimum 10-year (maximum 112-year) period of record. The analysis yielded the 46,704 equations with statistically significant fit statistics and parameter ranges published in two tables in this report. These model equations produce summer month (July, August, and September) drought flow threshold probabilities as a function of streamflows during the previous winter months (November, December, January, and February). Example calculations are provided, demonstrating how to use the equations to estimate probable streamflows as much as 8 months in advance.
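The way such an equation turns a winter streamflow into a summer drought probability can be sketched with the standard logistic form. The function name, the use of untransformed flow as the predictor, and the coefficient values below are all hypothetical; the report's published coefficients and variable transformations are not reproduced here.

```python
import math


def drought_probability(winter_flow, intercept, slope):
    """Logistic model: P(summer flow below threshold | winter flow).

    The intercept and slope are placeholders, not values from the report.
    """
    z = intercept + slope * winter_flow
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: a negative slope means that higher winter
# flows imply a lower probability of summer drought flow.
intercept, slope = 2.0, -0.05
p_low_winter = drought_probability(20.0, intercept, slope)
p_high_winter = drought_probability(80.0, intercept, slope)
```

With a negative slope the estimated probability falls as winter flow rises, which matches the intended use of winter observations to flag drought-prone summers months in advance.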
An artificial neural network prediction model of congenital heart disease based on risk factors: A hospital-based case-control study.

PubMed

Li, Huixia; Luo, Miyang; Zheng, Jianfei; Luo, Jiayou; Zeng, Rong; Feng, Na; Du, Qiyun; Fang, Junqun

2017-02-01

An artificial neural network (ANN) model was developed to predict the risks of congenital heart disease (CHD) in pregnant women. This hospital-based case-control study involved 119 CHD cases and 239 controls all recruited from birth defect surveillance hospitals in Hunan Province between July 2013 and June 2014. All subjects were interviewed face-to-face to complete a questionnaire that covered 36 CHD-related variables. The 358 subjects were randomly divided into a training set and a testing set at the ratio of 85:15. The training set was used to identify the significant predictors of CHD by univariate logistic regression analyses and develop a standard feed-forward back-propagation neural network (BPNN) model for the prediction of CHD. The testing set was used to test and evaluate the performance of the ANN model. Univariate logistic regression analyses were performed in SPSS 18.0. The ANN models were developed in Matlab 7.1. The univariate logistic regression identified 15 predictors that were significantly associated with CHD, including education level (odds ratio = 0.55), gravidity (1.95), parity (2.01), history of abnormal reproduction (2.49), family history of CHD (5.23), maternal chronic disease (4.19), maternal upper respiratory tract infection (2.08), environmental pollution around maternal dwelling place (3.63), maternal exposure to occupational hazards (3.53), maternal mental stress (2.48), paternal chronic disease (4.87), paternal exposure to occupational hazards (2.51), intake of vegetable/fruit (0.45), intake of fish/shrimp/meat/egg (0.59), and intake of milk/soymilk (0.55). After many trials, we selected a 3-layer BPNN model with 15, 12, and 1 neurons in the input, hidden, and output layers, respectively, as the best prediction model. The prediction model has accuracies of 0.91 and 0.86 on the training and testing sets, respectively. The sensitivity, specificity, and Youden Index on the testing set (training set) are 0.78 (0.83), 0.90 (0.95), and 0.68 (0.78), respectively. The areas under the receiver operating characteristic curve on the testing and training sets are 0.87 and 0.97, respectively. This study suggests that the BPNN model could be used to predict the risk of CHD in individuals. This model should be further improved by large-sample-size research.
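The reported index values can be checked against the sensitivity and specificity figures, since the Youden Index is defined as sensitivity plus specificity minus one. The short sketch below uses only the numbers quoted in the abstract; the function name is an illustrative choice.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Figures quoted in the abstract for the testing and training sets.
j_testing = round(youden_index(0.78, 0.90), 2)
j_training = round(youden_index(0.83, 0.95), 2)
```

Both results agree with the values stated in the abstract (0.68 on the testing set and 0.78 on the training set).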
Spatio-temporal variations of nitric acid total columns from 9 years of IASI measurements - a driver study

NASA Astrophysics Data System (ADS)

Ronsmans, Gaétane; Wespes, Catherine; Hurtmans, Daniel; Clerbaux, Cathy; Coheur, Pierre-François

2018-04-01

This study aims to understand the spatial and temporal variability of HNO3 total columns in terms of explanatory variables. To achieve this, multiple linear regressions are used to fit satellite-derived time series of HNO3 daily averaged total columns. First, an analysis of the IASI 9-year time series (2008-2016) is conducted based on various equivalent latitude bands. The strong and systematic denitrification of the southern polar stratosphere is observed very clearly. It is also possible to distinguish, within the polar vortex, three regions which are differently affected by the denitrification. Three exceptional denitrification episodes in 2011, 2014 and 2016 are also observed in the Northern Hemisphere, due to unusually low Arctic temperatures. The time series are then fitted by multivariate regressions to identify which variables are responsible for HNO3 variability in global distributions and time series, and to quantify their respective influence. Out of an ensemble of proxies (annual cycle, solar flux, quasi-biennial oscillation, multivariate ENSO index, Arctic and Antarctic oscillations and volume of polar stratospheric clouds), only those defined as significant (p value < 0.05) by a selection algorithm are retained for each equivalent latitude band. Overall, the regression gives a good representation of HNO3 variability, with especially good results at high latitudes (60-80% of the observed variability explained by the model). The regressions show the dominance of annual variability in all latitudinal bands, which is related to specific chemistry and dynamics depending on the latitudes. We find that the polar stratospheric clouds (PSCs) also have a major influence in the polar regions, and that their inclusion in the model improves the correlation coefficients and the residuals. However, there is still a relatively large portion of HNO3 variability that remains unexplained by the model, especially in the intertropical regions, where factors not included in the regression model (such as vegetation fires or lightning) may be at play.
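The idea of fitting a daily time series with proxy terms can be sketched with an ordinary least-squares fit of a constant plus an annual cycle. This is a simplified illustration with only one proxy (the annual harmonic) and synthetic, noise-free data; the helper names, the 365.25-day period, and the coefficients are assumptions, and the study's proxy selection algorithm is not reproduced.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares through the normal equations X^T X beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def design_row(day):
    """Constant term plus cosine and sine of an annual cycle (assumed proxy)."""
    w = 2.0 * math.pi * day / 365.25
    return [1.0, math.cos(w), math.sin(w)]


def r_squared(y, fitted):
    """Fraction of the variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic two-year series built from known (hypothetical) coefficients.
days = list(range(0, 730, 5))
X = [design_row(d) for d in days]
y = [2.0 + 0.5 * r[1] - 0.3 * r[2] for r in X]
beta = fit_ols(X, y)
fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
explained = r_squared(y, fitted)
```

Further proxies would enter as additional columns of the design matrix, and the explained fraction corresponds to the kind of "variability explained by the model" figure quoted in the abstract.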
