Zhu, Yu; Xia, Jie-lai; Wang, Jing
2009-09-01
Application of the 'single auto regressive integrated moving average (ARIMA) model' and the 'ARIMA-generalized regression neural network (GRNN) combination model' in research on the incidence of scarlet fever. An auto regressive integrated moving average model was established from monthly scarlet fever incidence data for one city from 2000 to 2006. The fitted values of the ARIMA model were used as the input of the GRNN, and the actual values were used as its output. After the GRNN was trained, the performance of the single ARIMA model and the ARIMA-GRNN combination model was compared. The mean error rates (MER) of the single ARIMA model and the ARIMA-GRNN combination model were 31.6% and 28.7%, respectively, and their coefficients of determination (R(2)) were 0.801 and 0.872, respectively. The ARIMA-GRNN combination model fitted the data better than the single ARIMA model and has practical value in research on time-series data such as the incidence of scarlet fever.
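The combination step can be sketched in a few lines. In its usual formulation, a GRNN's prediction is a Gaussian-kernel-weighted average of the training targets. Here the ARIMA fitted values act as inputs and the observed incidences act as targets. The smoothing parameter `sigma`, the helper names, and the monthly values are hypothetical. The abstract does not define how MER was computed, so only R² is calculated.

```python
import math


def grnn_predict(train_x, train_y, x, sigma):
    """GRNN prediction for a scalar input x.

    train_x: ARIMA fitted values (GRNN inputs); train_y: observed incidence (GRNN targets).
    Each training pattern gets a Gaussian weight based on its distance to x, and the
    prediction is the weighted mean of the targets.
    """
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi in train_x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * yi for w, yi in zip(weights, train_y)) / total


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly values, not data from the study.
arima_fitted = [1.2, 1.8, 2.5, 3.1, 2.2, 1.5]
observed = [1.0, 2.0, 2.7, 3.4, 2.0, 1.3]
combined = [grnn_predict(arima_fitted, observed, f, sigma=0.3) for f in arima_fitted]
fit_quality = r_squared(observed, combined)
```

Because each prediction is a weighted mean of observed values, `sigma` controls the trade-off. A small `sigma` reproduces the nearest training target, and a large one flattens the output toward the overall mean.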
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
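Below is a minimal sketch of the method of least squares for a single predictor. It uses the closed-form slope S_xy / S_xx and the intercept ȳ - b₁x̄, and it also computes each point's leverage, 1/n + (xᵢ - x̄)² / S_xx. The function names and the small dataset are illustrative only.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared vertical residuals."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def leverages(xs):
    """Leverage of each point in simple linear regression: 1/n + (x_i - mean)^2 / Sxx."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1.0 / n + (x - mean_x) ** 2 / sxx for x in xs]


# Hypothetical small dataset.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
```

The least-squares residuals always sum to zero when an intercept is fitted. The leverages sum to 2, the number of fitted parameters, and points far from x̄ carry the most leverage.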
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
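One way to write this setting assumes an additive-error single-index form, Y = G(Xᵀβ) + σ(Xᵀβ)ε, with ε independent of X. The symbols G, σ, ε, F_ε, K_h, h, and u are notation introduced here, not taken from the paper. Under that form, the conditional quantile depends on X only through the index. A linear quantile fit (check loss ρ_τ) supplies the index estimate, and a kernel-weighted local linear quantile fit along the estimated index then estimates the quantile function at an index value u.

```latex
Q_\tau(Y \mid X) = G(X^\top\beta) + \sigma(X^\top\beta)\,F_\varepsilon^{-1}(\tau),
\qquad \rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)

(\hat a, \hat b) = \arg\min_{a,\,b} \sum_{i=1}^{n} \rho_\tau\bigl(Y_i - a - X_i^\top b\bigr)

(\hat q, \hat c) = \arg\min_{q,\,c} \sum_{i=1}^{n}
  \rho_\tau\bigl(Y_i - q - c\,(X_i^\top \hat b - u)\bigr)\,
  K_h\bigl(X_i^\top \hat b - u\bigr),
\qquad \hat Q_\tau(u) = \hat q
```

Here b̂ plays the role of the consistent index estimate, up to whatever normalization of β the paper adopts. The local linear fit is the simplest case of the local polynomial step.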
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparative applicability of the orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in investigating the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used for clinical evaluation once during the first week of admission and again six months later. All data were first analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The TCD predictors of stroke prognosis identified by linear regression analysis were confirmed by the OPLS modeling technique. Moreover, compared with linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use single TCD measures/indicators and arbitrarily dichotomized measures of TCD single-vessel involvement, as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary to, rather than alternatives to, available classical regression models such as linear regression.
A statistical method for predicting seizure onset zones from human single-neuron recordings
NASA Astrophysics Data System (ADS)
Valdez, André B.; Hickman, Erin N.; Treiman, David M.; Smith, Kris A.; Steinmetz, Peter N.
2013-02-01
Objective. Clinicians often use depth-electrode recordings to localize human epileptogenic foci. To advance the diagnostic value of these recordings, we applied logistic regression models to single-neuron recordings from depth-electrode microwires to predict seizure onset zones (SOZs). Approach. We collected data from 17 epilepsy patients at the Barrow Neurological Institute and developed logistic regression models to calculate the odds of observing SOZs in the hippocampus, amygdala and ventromedial prefrontal cortex, based on statistics such as the burst interspike interval (ISI). Main results. Analysis of these models showed that, for a single-unit increase in burst ISI ratio, the left hippocampus was approximately 12 times more likely to contain a SOZ; and the right amygdala, 14.5 times more likely. Our models were most accurate for the hippocampus bilaterally (at 85% average sensitivity), and performance was comparable with current diagnostics such as electroencephalography. Significance. Logistic regression models can be combined with single-neuron recording to predict likely SOZs in epilepsy patients being evaluated for resective surgery, providing an automated source of clinically useful information.
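In a logistic model, a coefficient β on a predictor multiplies the odds of the outcome by exp(β) for each one-unit increase. Read as odds ratios, the reported "12 times" and "14.5 times" figures therefore correspond to coefficients of ln 12 and ln 14.5. That reading is an assumption, because the abstract does not print the coefficients. Note also that an odds ratio approximates a risk ratio only when the outcome is rare. The function names are illustrative.

```python
import math


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds for a `delta`-unit increase in a predictor
    whose logistic-regression coefficient is `coefficient`."""
    return math.exp(coefficient * delta)


def coefficient_from_odds_ratio(ratio):
    """Log-odds coefficient implied by a per-unit odds ratio."""
    if ratio <= 0:
        raise ValueError("odds ratios are positive")
    return math.log(ratio)


beta_left_hippocampus = coefficient_from_odds_ratio(12.0)   # roughly 2.48
beta_right_amygdala = coefficient_from_odds_ratio(14.5)     # roughly 2.67

# Odds ratios compound multiplicatively: a 2-unit increase multiplies the odds by 12**2.
two_unit_ratio = odds_ratio(beta_left_hippocampus, delta=2.0)
```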
Models for predicting the mass of lime fruits by some engineering properties.
Miraei Ashtiani, Seyed-Hassan; Baradaran Motie, Jalal; Emadi, Bagher; Aghkhani, Mohammad-Hosein
2014-11-01
Grading fruits by mass is important in packaging: it reduces waste and increases the market value of agricultural produce. The aim of this study was mass modeling of two major cultivars of Iranian limes based on engineering attributes. Models were classified into three groups: (1) single and multiple variable regressions of lime mass on dimensional characteristics; (2) single and multiple variable regressions of lime mass on projected areas; and (3) single regressions of lime mass on its actual volume and on volumes calculated by assuming ellipsoid and prolate spheroid shapes. All properties considered in the current study were found to be statistically significant (p < 0.01). The results indicated that mass models based on the minor diameter and on the first projected area are the most appropriate in the first and second classifications, respectively. In the third classification, the best model was based on the prolate spheroid volume. It was finally concluded that a suitable grading system for lime mass is based on prolate spheroid volume.
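The third model class can be sketched as follows. The shape volumes come from measured diameters, and mass is then regressed on the calculated volume. This sketch assumes the prolate spheroid uses the major diameter and the minor diameter as its two equal minor axes, because the abstract does not state which diameters enter each formula. The measurements and function names are hypothetical.

```python
import math


def ellipsoid_volume(length, width, thickness):
    """Volume of an ellipsoid with diameters L, W, T: (pi/6) * L * W * T."""
    return math.pi / 6.0 * length * width * thickness


def prolate_spheroid_volume(length, minor):
    """Volume of a prolate spheroid with major diameter L and two equal minor
    diameters: (4/3) * pi * (L/2) * (minor/2)**2."""
    return 4.0 / 3.0 * math.pi * (length / 2.0) * (minor / 2.0) ** 2


def fit_mass_on_volume(volumes, masses):
    """Least-squares fit of mass = k1 + k2 * volume; returns (k1, k2)."""
    n = len(volumes)
    mean_v = sum(volumes) / n
    mean_m = sum(masses) / n
    svv = sum((v - mean_v) ** 2 for v in volumes)
    svm = sum((v - mean_v) * (m - mean_m) for v, m in zip(volumes, masses))
    k2 = svm / svv
    return mean_m - k2 * mean_v, k2


# Hypothetical measurements: (major diameter, minor diameter) in cm, mass in g.
dims = [(4.1, 3.6), (4.5, 3.9), (3.8, 3.4), (4.9, 4.2), (4.3, 3.7)]
masses = [30.2, 38.5, 24.9, 47.1, 33.0]
volumes = [prolate_spheroid_volume(L, d) for L, d in dims]
k1, k2 = fit_mass_on_volume(volumes, masses)
```

A prolate spheroid is an ellipsoid whose two minor diameters are equal, so the two volume functions agree when W = T.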
A Constrained Linear Estimator for Multiple Regression
ERIC Educational Resources Information Center
Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.
2010-01-01
"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, 1979), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
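Equal weighting in Dawes' sense standardizes each predictor and adds them with unit weights. In the sketch below, that composite then serves as the single predictor in an ordinary regression with one estimated slope. This is one illustrative reading of "a proper model with a single predictor", not the paper's estimator. The names and data are hypothetical.

```python
import statistics


def zscores(values):
    """Standardize a predictor to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def equal_weight_composite(predictors):
    """Unit-weighted composite: the sum of standardized predictors for each observation.
    `predictors` is a list of columns, each oriented so that larger means higher y."""
    standardized = [zscores(column) for column in predictors]
    return [sum(values) for values in zip(*standardized)]


def fit_on_composite(composite, y):
    """OLS of y on the single composite predictor; returns (intercept, slope)."""
    mean_c = statistics.fmean(composite)
    mean_y = statistics.fmean(y)
    scc = sum((c - mean_c) ** 2 for c in composite)
    scy = sum((c - mean_c) * (v - mean_y) for c, v in zip(composite, y))
    slope = scy / scc
    return mean_y - slope * mean_c, slope


# Hypothetical data with two predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [1.5, 2.0, 3.5, 3.9, 5.2]
composite = equal_weight_composite([x1, x2])
intercept, slope = fit_on_composite(composite, y)
```

Because every z-score column has mean zero, the composite does too, so the fitted intercept is simply the mean of y.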
Von Guerard, Paul; Weiss, W.B.
1995-01-01
The U.S. Environmental Protection Agency requires that municipalities that have a population of 100,000 or greater obtain National Pollutant Discharge Elimination System permits to characterize the quality of their storm runoff. In 1992, the U.S. Geological Survey, in cooperation with the Colorado Springs City Engineering Division, began a study to characterize the water quality of storm runoff and to evaluate procedures for the estimation of storm-runoff loads, volume, and event-mean concentrations for selected properties and constituents. Precipitation, streamflow, and water-quality data were collected during 1992 at five sites in Colorado Springs. Thirty-five samples were collected, seven at each of the five sites. At each site, three samples were collected for permitting purposes; two of the samples were collected during rainfall runoff, and one sample was collected during snowmelt runoff. Four additional samples were collected at each site to obtain a large enough sample size to estimate storm-runoff loads, volume, and event-mean concentrations for selected properties and constituents using linear-regression procedures developed from Nationwide Urban Runoff Program (NURP) data. Storm-water samples were analyzed for as many as 186 properties and constituents. The constituents measured include total-recoverable metals, volatile-organic compounds, acid-base/neutral organic compounds, and pesticides. Sampled storm runoff had large concentrations of chemical oxygen demand and 5-day biochemical oxygen demand. Chemical oxygen demand ranged from 100 to 830 milligrams per liter, and 5-day biochemical oxygen demand ranged from 14 to 260 milligrams per liter. Total-organic carbon concentrations ranged from 18 to 240 milligrams per liter. Of the total-recoverable metals analyzed, lead and zinc had the largest concentrations.
Concentrations of lead ranged from 23 to 350 micrograms per liter, and concentrations of zinc ranged from 110 to 1,400 micrograms per liter. The data for 30 storms representing rainfall runoff from 5 drainage basins were used to develop single-storm local-regression models. The response variables (storm-runoff loads, volume, and event-mean concentrations) were modeled using explanatory variables for climatic, physical, and land-use characteristics. The r2 for models that use ordinary least-squares regression ranged from 0.57 to 0.86 for storm-runoff loads and volume and from 0.25 to 0.63 for storm-runoff event-mean concentrations. Except for cadmium, standard errors of estimate ranged from 43 to 115 percent for storm-runoff loads and volume and from 35 to 66 percent for storm-runoff event-mean concentrations. Eleven of the 30 total-recoverable cadmium concentrations collected during rainfall runoff were censored (less-than) concentrations. Ordinary least-squares regression should not be used with censored data; however, censored data can be included with uncensored data using tobit regression. Standard errors of estimate for storm-runoff load and event-mean concentration for total-recoverable cadmium, computed using tobit regression, are 247 and 171 percent, respectively. Estimates from single-storm regional-regression models, developed from the Nationwide Urban Runoff Program data base, were compared with observed storm-runoff loads, volume, and event-mean concentrations determined from samples collected in the study area. Single-storm regional-regression models tended to overestimate storm-runoff loads, volume, and event-mean concentrations. Therefore, single-storm local- and regional-regression models were combined using model-adjustment procedures to take advantage of the strengths of both models while minimizing the deficiencies of each model.
Procedures were used to develop single-storm regression equations that were adjusted using local data and estimates from single-storm regional-regression equations. Single-storm regression models developed using model-adjustment proce
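Tobit regression handles "less than" values by letting each censored observation contribute the probability of falling below its reporting limit, rather than a density value. Below is a sketch of the standard left-censored tobit log-likelihood for one explanatory variable. The report's actual variables, transformations, and fitting software are not given in this abstract. The data are hypothetical, and the coarse grid search stands in for a proper numerical optimizer.

```python
import itertools
import math


def log_normal_cdf(z):
    """Log of the standard normal CDF; erfc keeps precision in the lower tail."""
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(p) if p > 0.0 else float("-inf")


def log_normal_pdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_loglik(intercept, slope, sigma, x, y, censored):
    """Log-likelihood of a left-censored (tobit) regression y = intercept + slope*x + e,
    e ~ N(0, sigma^2). For a censored observation, y holds the reporting limit and the
    observation contributes P(y* <= limit); otherwise it contributes the normal density."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for xi, yi, is_censored in zip(x, y, censored):
        z = (yi - (intercept + slope * xi)) / sigma
        if is_censored:
            total += log_normal_cdf(z)
        else:
            total += log_normal_pdf(z) - math.log(sigma)
    return total


# Hypothetical log-concentrations; entries flagged True are "less than" values,
# recorded at their reporting limits.
x = [0.2, 0.5, 0.9, 1.3, 1.6, 2.0]
y = [-0.4, 0.1, -0.5, 0.9, 0.0, 1.6]
censored = [False, False, True, False, True, False]

# Coarse grid search over (intercept, slope, sigma), for illustration only.
grid = itertools.product([i / 10.0 for i in range(-20, 21)],
                         [i / 10.0 for i in range(0, 31)],
                         [i / 10.0 for i in range(1, 21)])
best = max((tobit_loglik(a, b, s, x, y, censored), a, b, s) for a, b, s in grid)
```

With no censored values, this likelihood reduces to the ordinary normal-error regression likelihood. The difference from least squares lies entirely in the censored terms.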
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
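With several predictors, the least-squares coefficients solve the normal equations XᵀX b = Xᵀy. Below is a minimal sketch that uses Gaussian elimination. It also shows one face of multicollinearity: when one predictor is an exact linear combination of others, XᵀX is singular and no unique solution exists. Production code usually prefers a QR decomposition for numerical stability. The names and data are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (e.g., perfect multicollinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(rows, y):
    """OLS coefficients [b0, b1, ..., bp] from the normal equations X'X b = X'y.
    rows: one tuple of predictor values per observation; an intercept column is added."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
coefficients = multiple_regression([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)], [1, 3, 4, 6, 8])
```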
Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.
Chung, Yi-Shih
2013-12-01
Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables including geographical, time, and sociodemographic factors explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide better transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.
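The core boosting loop can be sketched with depth-1 trees (stumps). Each round fits a small tree to the current residuals and adds a shrunken copy of it to the model. This sketch makes several simplifying assumptions. It uses squared-error loss on one numeric feature, whereas the paper classifies crash outcomes. It also uses stumps, which capture no interactions, while deeper trees ("tree complexity") allow interaction effects. The round count, learning rate, and data are hypothetical.

```python
def fit_stump(x, residuals):
    """Best single split of a 1-D feature for squared error; returns (threshold, left, right)."""
    pairs = sorted(zip(x, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates tied feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("the feature has only one distinct value")
    return best[1], best[2], best[3]


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with stumps under squared-error loss."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left, right = fit_stump(x, residuals)
        stumps.append((threshold, left, right))
        pred = [pi + learning_rate * (left if xi <= threshold else right)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(left if xi <= t else right for t, left, right in stumps)


# Hypothetical 1-D data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 3.0, 3.0]
model = boost(x, y, n_rounds=100)
```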
Evaluation and application of regional turbidity-sediment regression models in Virginia
Hyer, Kenneth; Jastram, John D.; Moyer, Douglas; Webber, James S.; Chanat, Jeffrey G.
2015-01-01
Conventional thinking has long held that turbidity-sediment surrogate-regression equations are site specific and that regression equations developed at a single monitoring station should not be applied to another station; however, few studies have evaluated this issue in a rigorous manner. If robust regional turbidity-sediment models can be developed successfully, their applications could greatly expand the usage of these methods. Suspended sediment load estimation could occur as soon as flow and turbidity monitoring commence at a site, suspended sediment sampling frequencies for various projects potentially could be reduced, and special-project applications (sediment monitoring following dam removal, for example) could be significantly enhanced. The objective of this effort was to investigate the turbidity-suspended sediment concentration (SSC) relations at all available USGS monitoring sites within Virginia to determine whether meaningful turbidity-sediment regression models can be developed by combining the data from multiple monitoring stations into a single model, known as a “regional” model. Following the development of the regional model, additional objectives included a comparison of predicted SSCs between the regional model and commonly used site-specific models, as well as an evaluation of why specific monitoring stations did not fit the regional model.
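A common form for turbidity-SSC surrogate models is an ordinary least-squares fit in log space, log₁₀(SSC) = b₀ + b₁ log₁₀(turbidity). Back-transformed predictions are then often multiplied by Duan's smearing factor to offset retransformation bias. A "regional" model in the sense of the abstract pools paired observations from several stations into one fit. The log-log form, the smearing correction, and the station data below are assumptions for illustration, not details from the study.

```python
import math


def fit_log_log(turbidity, ssc):
    """OLS of log10(SSC) on log10(turbidity). Returns (b0, b1, smearing_factor)."""
    lx = [math.log10(t) for t in turbidity]
    ly = [math.log10(s) for s in ssc]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly)) / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [b - (b0 + b1 * a) for a, b in zip(lx, ly)]
    smearing = sum(10 ** r for r in residuals) / n  # Duan's nonparametric bias correction
    return b0, b1, smearing


def predict_ssc(model, turbidity):
    """Back-transformed SSC prediction, scaled by the smearing factor."""
    b0, b1, smearing = model
    return smearing * 10 ** (b0 + b1 * math.log10(turbidity))


# Hypothetical paired observations from two stations, pooled into one "regional" fit.
stations = {
    "station_a": ([5.0, 12.0, 30.0, 80.0], [9.0, 20.0, 55.0, 160.0]),
    "station_b": ([8.0, 20.0, 45.0, 150.0], [15.0, 38.0, 90.0, 310.0]),
}
pooled_turbidity = [t for turb, _ in stations.values() for t in turb]
pooled_ssc = [s for _, ssc in stations.values() for s in ssc]
regional_model = fit_log_log(pooled_turbidity, pooled_ssc)
```

Comparing a site-specific fit with the pooled fit at the same turbidity values is one simple way to see which stations depart from the regional relation.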
Various approaches and tools exist to estimate local and regional PM2.5 impacts from a single emissions source, ranging from simple screening techniques to Gaussian-based dispersion models and complex grid-based Eulerian photochemical transport models. These approache...
ERIC Educational Resources Information Center
Maggin, Daniel M.; Swaminathan, Hariharan; Rogers, Helen J.; O'Keeffe, Breda V.; Sugai, George; Horner, Robert H.
2011-01-01
A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of…
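The abstract does not give the details of its GLS model. One standard feasible-GLS approach for AR(1) errors is the Cochrane-Orcutt iteration, sketched here as a swapped-in illustration. It is applied to a two-phase (baseline/intervention) series with a 0/1 phase indicator. The procedure estimates the lag-1 autocorrelation ρ from the residuals, quasi-differences both series, and refits. The effect-size formula from the paper is not reproduced, and the names and data are hypothetical.

```python
def ols(xs, ys):
    """Intercept and slope of an ordinary least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation estimate of a residual series (0 if all residuals vanish)."""
    den = sum(r * r for r in residuals)
    if den == 0.0:
        return 0.0
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    return num / den


def cochrane_orcutt(xs, ys, n_iter=20):
    """Feasible GLS for y_t = a + b*x_t + u_t with AR(1) errors u_t = rho*u_{t-1} + e_t."""
    a, b = ols(xs, ys)
    rho = 0.0
    for _ in range(n_iter):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        rho = lag1_autocorrelation(residuals)
        # Quasi-difference both series to remove the AR(1) dependence (drops t = 0).
        xs_star = [xs[t] - rho * xs[t - 1] for t in range(1, len(xs))]
        ys_star = [ys[t] - rho * ys[t - 1] for t in range(1, len(ys))]
        a_star, b = ols(xs_star, ys_star)
        a = a_star / (1.0 - rho)  # the transformed intercept equals a * (1 - rho)
    return a, b, rho


# Hypothetical single-case series: four baseline points (phase 0), four intervention points (phase 1).
phase = [0, 0, 0, 0, 1, 1, 1, 1]
outcome = [2.0, 2.4, 2.1, 2.6, 5.1, 5.6, 5.2, 5.8]
level, phase_effect, rho = cochrane_orcutt(phase, outcome)
```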
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAPs) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting 'adjusted' regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAPs examined in this study were: single-factor regression against the regional model prediction, P (termed MAP-1F-P); regression against P (termed MAP-R-P); regression against P and additional local variables (termed MAP-R-P+nV); and a weighted combination of P and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-1F-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of the standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAPs were tested for sensitivity to the size of the calibration data set.
As expected, the predictive accuracy of all MAPs for the verification data set decreased as the calibration data-set size decreased, but predictive accuracy was not as sensitive for the MAPs as it was for the local regression models.
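The simplest adjustment in this family regresses locally observed loads on the regional model prediction P alone and then uses the fitted line to adjust P at unmonitored sites. The sketch below works in log₁₀ space and skips any retransformation bias correction. Both choices are assumptions, because the abstract states neither the transformation nor the exact form of each MAP. The function names and calibration values are hypothetical.

```python
import math


def fit_adjustment(regional_predictions, observed):
    """Fit log10(observed) = a + b * log10(P), where P is the regional-model prediction."""
    lp = [math.log10(p) for p in regional_predictions]
    lo = [math.log10(o) for o in observed]
    n = len(lp)
    mean_p, mean_o = sum(lp) / n, sum(lo) / n
    spp = sum((u - mean_p) ** 2 for u in lp)
    spo = sum((u - mean_p) * (v - mean_o) for u, v in zip(lp, lo))
    b = spo / spp
    return mean_o - b * mean_p, b


def adjusted_prediction(model, regional_prediction):
    """Adjusted load for a new storm, back-transformed without bias correction."""
    a, b = model
    return 10 ** (a + b * math.log10(regional_prediction))


# Hypothetical calibration storms: regional predictions vs. locally observed loads.
regional = [12.0, 30.0, 55.0, 90.0, 140.0]
observed = [7.5, 14.0, 30.0, 41.0, 70.0]
model = fit_adjustment(regional, observed)
estimate = adjusted_prediction(model, 75.0)
```

If the regional model systematically overestimates by a constant factor, the fitted slope is near 1 and the intercept absorbs the factor, so the adjustment rescales P.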
NASA Technical Reports Server (NTRS)
McKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables: 30-minute forward-averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and on the natural log of EDR, respectively, yield the best performance and avoid model discontinuity across day/night data boundaries.
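For one regression variable, a "single term of fixed optimal power" can be chosen by trying candidate exponents p. For each p, the sketch fits y = a + b·x^p by least squares and keeps the exponent with the smallest error. A multivariable version would fit all power-transformed terms jointly. The exponent grid, the variable names, and the data are hypothetical, and the night-style fit simply passes ln(EDR) as the response.

```python
import math


def fit_power_term(x, y, powers):
    """For each candidate exponent p, fit y = a + b * x**p by OLS and keep the fit
    with the smallest sum of squared errors. Returns (sse, p, a, b); x must be positive."""
    best = None
    for p in powers:
        z = [xi ** p for xi in x]
        n = len(z)
        mean_z, mean_y = sum(z) / n, sum(y) / n
        szz = sum((zi - mean_z) ** 2 for zi in z)
        if szz == 0.0:
            continue  # the transformed term is constant (e.g. p == 0)
        b = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y)) / szz
        a = mean_y - b * mean_z
        sse = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[0]:
            best = (sse, p, a, b)
    return best


# Hypothetical wind-speed variance (x) and EDR-like response (y).
wind_var = [0.4, 0.9, 1.5, 2.2, 3.1, 4.0]
edr = [0.010, 0.016, 0.021, 0.026, 0.031, 0.035]
candidate_powers = [k / 4.0 for k in range(1, 13)]  # 0.25, 0.5, ..., 3.0
day_fit = fit_power_term(wind_var, edr, candidate_powers)
night_fit = fit_power_term(wind_var, [math.log(v) for v in edr], candidate_powers)
```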
Using Generalized Additive Models to Analyze Single-Case Designs
ERIC Educational Resources Information Center
Shadish, William; Sullivan, Kristynn
2013-01-01
Many analyses for single-case designs (SCDs), including nearly all the effect size indicators, currently assume no trend in the data. Regression and multilevel models allow for trend, but usually test only linear trend and have no principled way of knowing whether higher order trends should be represented in the model. This paper shows how Generalized…
NASA Astrophysics Data System (ADS)
Mitra, Ashis; Majumdar, Prabal Kumar; Bannerjee, Debamalya
2013-03-01
This paper presents a comparative analysis of two modeling methodologies for predicting the air permeability of plain woven handloom cotton fabrics. Four basic fabric constructional parameters, namely ends per inch, picks per inch, warp count, and weft count, were used as inputs for artificial neural network (ANN) and regression models. Of the four regression models tried, the interaction model showed very good prediction performance, with a mean absolute error of only 2.017%. However, the ANN models outperformed the regression models in terms of both correlation coefficient and mean absolute error. The ANN model with 10 nodes in a single hidden layer showed very good correlation coefficients of 0.982 and 0.929 and mean absolute errors of only 0.923% and 2.043% for the training and testing data, respectively.
Li, Yankun; Shao, Xueguang; Cai, Wensheng
2007-04-15
Consensus modeling, which combines the results of multiple independent models to produce a single prediction, avoids the instability of a single model. Based on the principle of consensus modeling, a consensus least squares support vector regression (LS-SVR) method for calibrating near-infrared (NIR) spectra was proposed. In the proposed approach, NIR spectra of plant samples were first preprocessed using the discrete wavelet transform (DWT) to filter the spectral background and noise; the consensus LS-SVR technique was then used to build the calibration model. With optimization of the parameters involved in the modeling, a satisfactory model was achieved for predicting the content of reducing sugar in plant samples. The prediction results show that the consensus LS-SVR model is more robust and reliable than the conventional partial least squares (PLS) and LS-SVR methods.
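Below is a sketch of the two ingredients, with several assumptions labeled here. An LS-SVR model is obtained by solving the linear system [[0, 1ᵀ], [1, K + I/γ]]·[b, α] = [0, y] with an RBF kernel K. The consensus prediction is taken as the average of members trained on random subsets of the calibration samples. The member-construction scheme, the kernel form, γ, σ, and the data are all hypothetical, and the DWT preprocessing and parameter optimization are not reproduced.

```python
import math
import random


def rbf(u, v, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / sigma ** 2)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_lssvr(xs, ys, gamma, sigma):
    """LS-SVR dual system: [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(xs[i], xs[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma
        a.append(row)
    solution = solve(a, [0.0] + list(ys))
    return solution[0], solution[1:], xs, sigma


def predict_lssvr(model, x):
    bias, alpha, support, sigma = model
    return bias + sum(ai * rbf(x, xi, sigma) for ai, xi in zip(alpha, support))


def train_consensus(xs, ys, n_models, subset_size, gamma, sigma, seed=0):
    """Train several LS-SVR members on random subsets (a hypothetical consensus scheme)."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_models):
        idx = rng.sample(range(len(xs)), subset_size)
        members.append(train_lssvr([xs[i] for i in idx], [ys[i] for i in idx], gamma, sigma))
    return members


def predict_consensus(members, x):
    """Consensus prediction: the average of the member predictions."""
    return sum(predict_lssvr(m, x) for m in members) / len(members)


# Hypothetical 1-D "spectral feature" inputs and responses.
xs = [[0.5 * i] for i in range(7)]
ys = [math.sin(x[0]) for x in xs]
members = train_consensus(xs, ys, n_models=5, subset_size=5, gamma=1e4, sigma=0.5)
```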
Retargeted Least Squares Regression Algorithm.
Zhang, Xu-Yao; Wang, Lingfeng; Xiang, Shiming; Liu, Cheng-Lin
2015-09-01
This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to learn the regression targets directly from the data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large-margin constraint that requires each data point to be correctly classified. Compared with traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single, compact model, so there is no need to train two-class (binary) machines that are independent of each other. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure that includes regression and retargeting as substeps. Experimental evaluation over a range of databases confirms the validity of our method.
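The retargeting substep can be sketched for a single data point. The sketch assumes the margin constraint takes the form t[y] - t[j] ≥ 1 for every wrong class j. Under that assumption, the new target t is the vector closest (in squared error) to the current regression output r that satisfies the constraint. Fixing s = t[y] makes the best choice for each other entry min(r[j], s - 1), which leaves a one-dimensional convex problem solvable by bisection. The full ReLSR alternates this with a regression step, which is omitted here, and the function name is hypothetical.

```python
def retarget_row(r, label, n_bisect=200):
    """Closest vector t (in squared error) to the regression output r such that
    t[label] - t[j] >= 1 for every j != label.

    For a fixed s = t[label], the best choice is t[j] = min(r[j], s - 1), so s minimizes
    (s - r[label])**2 + sum_j max(0, r[j] - s + 1)**2, a convex function of s.
    """
    others = [v for j, v in enumerate(r) if j != label]

    def half_derivative(s):
        # Half the derivative of the one-dimensional objective; nondecreasing in s.
        return (s - r[label]) - sum(max(0.0, v - s + 1.0) for v in others)

    lo = r[label]                                     # derivative <= 0 here
    hi = max([r[label]] + [v + 1.0 for v in others])  # derivative >= 0 here
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if half_derivative(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return [s if j == label else min(v, s - 1.0) for j, v in enumerate(r)]


# Hypothetical regression outputs for a 3-class problem whose true class is 0.
targets = retarget_row([0.0, 0.0, 0.0], label=0)
```

Outputs that already satisfy the margin are left unchanged. Otherwise the true-class target rises and the offending targets fall until the margin is exactly 1.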
NASA Astrophysics Data System (ADS)
Salleh, Nur Hanim Mohd; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Saad, Ahmad Ramli; Sulaiman, Husna Mahirah; Ahmad, Wan Muhamad Amir W.
2014-07-01
Polynomial regression is used to model a curvilinear relationship between a response variable and one or more predictor variables. It is a form of least squares linear regression that predicts a single response variable from an nth-order polynomial in the predictor variables. For a single predictor, a polynomial curve has at most one fewer extreme point than the order of its highest-order term. A quadratic model therefore has a single maximum or minimum, whereas a cubic model can have both a relative maximum and a relative minimum. This study used quadratic modeling techniques to analyze the effects of environmental factors (temperature, relative humidity, and rainfall distribution) on the breeding of Aedes albopictus, a type of Aedes mosquito. Data were collected in an urban area in south-west Penang from September 2010 until January 2011. The results indicated that the breeding of Aedes albopictus in the urban area is influenced by all three environmental characteristics. The number of mosquito eggs is estimated to reach a maximum value at a medium temperature, a medium relative humidity, and a high rainfall distribution.
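For one predictor, the quadratic fit y = b₀ + b₁x + b₂x² and its turning point can be sketched directly. The fitted curve is flat where b₁ + 2b₂x = 0, which is a maximum when b₂ < 0. The sketch below handles one predictor, whereas the study used three. The temperature data are hypothetical, and x is centered before fitting to keep the normal equations well conditioned.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations (Cramer's rule)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]                   # s[k] = sum of x**k
    t = [sum(xi ** k * yi for xi, yi in zip(x, y)) for k in range(3)]  # t[k] = sum of x**k * y
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    d = det3(normal)
    if d == 0:
        raise ValueError("need at least three distinct x values")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return coeffs


def stationary_point(b1, b2):
    """x where the fitted curve is flat (b1 + 2*b2*x = 0): a maximum if b2 < 0."""
    return -b1 / (2.0 * b2)


# Hypothetical temperatures (deg C) and egg counts; centering x improves conditioning.
temps = [24.0, 26.0, 28.0, 30.0, 32.0, 34.0]
eggs = [40.0, 62.0, 75.0, 78.0, 66.0, 45.0]
center = sum(temps) / len(temps)
b0, b1, b2 = fit_quadratic([t - center for t in temps], eggs)
peak_temperature = center + stationary_point(b1, b2) if b2 < 0 else None
```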
Sacks, Jason D; Ito, Kazuhiko; Wilson, William E; Neas, Lucas M
2012-10-01
With the advent of multicity studies, uniform statistical approaches have been developed to examine air pollution-mortality associations across cities. To assess the sensitivity of the air pollution-mortality association to different model specifications in single- and multipollutant contexts, the authors applied various regression models developed in previous multicity time-series studies of air pollution and mortality to data from Philadelphia, Pennsylvania (May 1992-September 1995). Single-pollutant analyses used daily cardiovascular mortality, fine particulate matter (particles with an aerodynamic diameter ≤2.5 µm; PM(2.5)), speciated PM(2.5), and gaseous pollutant data, while multipollutant analyses used source factors identified through principal component analysis. In single-pollutant analyses, risk estimates were relatively consistent across models for most PM(2.5) components and gaseous pollutants. However, risk estimates were inconsistent for ozone in all-year and warm-season analyses. Principal component analysis yielded factors with species associated with traffic, crustal material, residual oil, and coal. Risk estimates for these factors were less sensitive to alternative regression models than those from single-pollutant models. Factors associated with traffic and crustal material showed consistently positive associations in the warm season, while the coal combustion factor showed consistently positive associations in the cold season. Overall, the source-oriented approach yielded more stable and precise mortality risk estimates than single-pollutant analyses.
Testing a single regression coefficient in high dimensional linear models
Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling
2017-01-01
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively. PMID:28663668
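The screening-then-OLS idea can be sketched as follows. First, rank the other predictors by the absolute correlation with the target covariate. Next, keep the top `n_screen` of them. Finally, fit ordinary least squares on the target plus those controls and read off the target's coefficient. The z-test, heteroscedasticity-robust variance, and FDR steps are omitted. The screening size and the data are hypothetical, and the paper's exact screening rule may differ in detail.

```python
import math


def correlation(u, v):
    """Pearson correlation of two equal-length columns."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """OLS with an intercept; returns [intercept, coefficient of each column in order]."""
    design = [[1.0] + list(row) for row in zip(*columns)]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def cps_coefficient(columns, y, target, n_screen):
    """Coefficient of columns[target] after controlling for the n_screen other
    predictors most correlated (in absolute value) with it."""
    others = [k for k in range(len(columns)) if k != target]
    others.sort(key=lambda k: abs(correlation(columns[target], columns[k])), reverse=True)
    chosen = [target] + others[:n_screen]
    return ols([columns[k] for k in chosen], y)[1]


# Hypothetical data: x1 is strongly correlated with the target x0; x2 much less so.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8]
x2 = [3.0, -1.0, 2.0, 0.0, -2.0, 1.0]
y = [2.0 * a + 3.0 * b for a, b in zip(x0, x1)]
estimate = cps_coefficient([x0, x1, x2], y, target=0, n_screen=1)
```

Dropping the screened control (n_screen=0) lets x0 absorb part of x1's effect. Controlling for the highly correlated predictor is what makes the target coefficient interpretable.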
Testing a single regression coefficient in high dimensional linear models.
Lan, Wei; Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling
2016-11-01
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively.
Solving large test-day models by iteration on data and preconditioned conjugate gradient.
Lidauer, M; Strandén, I; Mäntysaari, E A; Pösö, J; Kettunen, A
1999-12-01
A preconditioned conjugate gradient method was implemented in an iteration-on-data program for the estimation of breeding values, and its convergence characteristics were studied. A reference algorithm was used in which one fixed effect was solved by the Gauss-Seidel method and the other effects were solved by a second-order Jacobi method. Implementing the preconditioned conjugate gradient required storing four vectors (each of size equal to the number of unknowns in the mixed model equations) in random access memory and reading the data at each round of iteration. The preconditioner comprised diagonal blocks of the coefficient matrix. Algorithms were compared on the basis of solutions of mixed model equations obtained with a single-trait animal model and a single-trait random regression test-day model. Data sets for both models used milk yield records of primiparous Finnish dairy cows. The animal model data comprised 665,629 lactation milk yields, and the random regression test-day model data comprised 6,732,765 test-day milk yields. Both models included pedigree information on 1,099,622 animals. The animal model (random regression test-day model) required 122 (305) rounds of iteration to converge with the reference algorithm, but only 88 (149) with the preconditioned conjugate gradient. Solving the random regression test-day model with the preconditioned conjugate gradient required 237 megabytes of random access memory and took 14% of the computation time needed by the reference algorithm.
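Below is a compact sketch of preconditioned conjugate gradient for a symmetric positive-definite system. It uses a plain diagonal (Jacobi) preconditioner, the simplest case of the diagonal-block preconditioner described above. It assumes the coefficient matrix is held explicitly. In an iteration-on-data program, the product a·p would instead be accumulated by reading the data records each round. The iteration keeps a few length-n vectors (x, r, z, p, plus a temporary a·p). The test system is hypothetical.

```python
import math


def pcg(a, b, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive-definite system a x = b,
    with a diagonal (Jacobi) preconditioner. Returns (x, iterations)."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    inv_diag = [1.0 / a[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual b - a x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)                                  # search direction
    rz = dot(r, z)
    for iteration in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            return x, iteration
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical small symmetric positive-definite system.
a = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = pcg(a, b)
```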
Ye, Jiang-Feng; Zhao, Yu-Xin; Ju, Jian; Wang, Wei
2017-10-01
To discuss the value of the Bedside Index for Severity in Acute Pancreatitis (BISAP), Modified Early Warning Score (MEWS), serum Ca2+, and red cell distribution width (RDW) for predicting the severity grade of acute pancreatitis (AP), and to develop and verify a more accurate scoring system to predict the severity of AP. In 302 patients with AP, we calculated BISAP and MEWS scores and used single-factor logistic regression to analyze the relationships of BISAP, RDW, MEWS, and serum Ca2+ with the severity of AP. The variables that were statistically significant in the single-factor logistic regression were entered into a multi-factor logistic regression model; forward stepwise regression was used to screen variables and build a multi-factor prediction model. A receiver operating characteristic (ROC) curve was constructed, and the area under the ROC curve (AUC) was used to evaluate the multi- and single-factor prediction models for predicting the severity of AP. The internal validity of the model was verified through bootstrapping. Among 302 patients with AP, 209 had mild acute pancreatitis (MAP) and 93 had severe acute pancreatitis (SAP). Single-factor logistic regression analysis showed that BISAP, MEWS and serum Ca2+ are prediction indexes of the severity of AP (P-value<0.001), whereas RDW is not (P-value>0.05). The multi-factor logistic regression analysis showed that BISAP and serum Ca2+ are independent prediction indexes of AP severity (P-value<0.001), and MEWS is not an independent prediction index of AP severity (P-value>0.05); BISAP is negatively correlated with serum Ca2+ (r=-0.330, P-value<0.001). The constructed model is ln[P/(1-P)] = 7.306 + 1.151*BISAP - 4.516*serum Ca2+, where P is the predicted probability of SAP. The predictive ability for SAP ranked as follows: combined BISAP and serum Ca2+ model > serum Ca2+ > BISAP.
The difference in predictive ability between BISAP and serum Ca2+ was not statistically significant (P-value>0.05); however, the predictive ability of the newly built model differed significantly from that of BISAP and of serum Ca2+ individually (P-value<0.01). Bootstrap verification supported the internal validity of the models. BISAP and serum Ca2+ have high predictive value for the severity of AP. However, the model combining BISAP and serum Ca2+ is markedly superior to BISAP or serum Ca2+ alone. Furthermore, this model is simple, practical and appropriate for clinical use. Copyright © 2016. Published by Elsevier Masson SAS.
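The fitted logistic equation can be turned into a predicted probability with the inverse logit. A minimal Python sketch using the coefficients exactly as reported; it assumes serum Ca2+ is entered in mmol/L (the abstract does not state units), the function name is hypothetical, and this is not a validated clinical tool.

```python
import math


def sap_probability(bisap, serum_ca):
    """Predicted probability of severe acute pancreatitis from the reported model.

    Assumes serum_ca is in mmol/L; the source abstract does not give units.
    """
    log_odds = 7.306 + 1.151 * bisap - 4.516 * serum_ca
    return 1.0 / (1.0 + math.exp(-log_odds))   # inverse logit


# Example: BISAP = 2 and serum Ca2+ = 2.0 give log-odds 0.576, i.e. P ≈ 0.64.
p_example = sap_probability(2, 2.0)
```

Consistent with the negative coefficient, a higher serum Ca2+ at a fixed BISAP lowers the predicted probability of SAP.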
Moeyaert, Mariola; Ugille, Maaike; Ferron, John M; Beretvas, S Natasha; Van den Noortgate, Wim
2014-09-01
The quantitative methods for analyzing single-subject experimental data have expanded during the last decade, including the use of regression models to statistically analyze the data, but many questions remain. One question is how to specify predictors in a regression model to account for the specifics of the design and estimate the effect size of interest. These quantitative effect sizes are used in retrospective analyses and allow synthesis of single-subject experimental study results, which is informative for evidence-based decision making, research and theory building, and policy discussions. We discuss different design matrices that can be used for the most common single-subject experimental designs (SSEDs), namely multiple-baseline designs, reversal designs, and alternating treatment designs, and provide empirical illustrations. The purpose of this article is to guide single-subject experimental data analysts interested in analyzing and meta-analyzing SSED data. © The Author(s) 2014.
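To illustrate what a design matrix for the simplest phase design can look like, here is a Python sketch for a single AB (baseline then treatment) series. It uses one common piecewise parameterization: intercept, time, phase dummy, and a phase-by-time-since-treatment-start interaction. The third coefficient is then the immediate level change and the fourth the change in slope. This is an assumption for illustration; the article's own matrices for multiple-baseline, reversal, and alternating treatment designs may be specified differently.

```python
import numpy as np


def ab_design_matrix(n_baseline, n_treatment):
    """Columns: intercept, time, phase dummy, phase dummy * (time - first treatment session)."""
    n = n_baseline + n_treatment
    t = np.arange(n, dtype=float)               # session index starting at 0
    d = (t >= n_baseline).astype(float)         # 0 in baseline (A), 1 in treatment (B)
    return np.column_stack([np.ones(n), t, d, d * (t - n_baseline)])


# Usage: recover level and slope changes from a noise-free hypothetical series.
X = ab_design_matrix(5, 7)
true_coef = np.array([1.0, 0.2, 3.0, 0.5])     # baseline level, baseline slope, level change, slope change
y = X @ true_coef
est, *_ = np.linalg.lstsq(X, y, rcond=None)
```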
NASA Technical Reports Server (NTRS)
Ratnayake, Nalin A.; Koshimoto, Ed T.; Taylor, Brian R.
2011-01-01
The problem of parameter estimation on hybrid-wing-body type aircraft is complicated by the fact that many design candidates for such aircraft involve a large number of aerodynamic control effectors that act in coplanar motion. This adds to the complexity already present in the parameter estimation problem for any aircraft with a closed-loop control system. Decorrelation of system inputs must be performed in order to ascertain individual surface derivatives with any mathematical confidence. Non-standard control surface configurations, such as clamshell surfaces and drag-rudder modes, further complicate the modeling task. In this paper, asymmetric, single-surface maneuvers are used to excite multiple axes of aircraft motion simultaneously. Time history reconstructions of the moment coefficients computed by the solved regression models are then compared to each other in order to assess relative model accuracy. The reduced flight-test time required for inner surface parameter estimation using multi-axis methods was found to come at the cost of slightly reduced accuracy and statistical confidence for linear regression methods. Since the multi-axis maneuvers captured parameter estimates similar to those from the longitudinal and lateral-directional maneuvers combined, the number of test points required for the inner, aileron-like surfaces could in theory have been reduced by 50%. While trends were similar, however, individual parameters estimated by the multi-axis model typically differed from those estimated by the single-axis model by an average absolute difference of roughly 15-20%, with decreased statistical significance. The multi-axis model exhibited an increase in overall fit error of roughly 1-5% for the linear regression estimates with respect to the single-axis model, when applied to flight data designed for each, respectively.
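Why input decorrelation matters can be seen from the conditioning of the least-squares problem. The Python sketch below uses synthetic, hypothetical deflection histories. When two surfaces move almost in unison, the regressor matrix for an equation-error fit becomes nearly singular, so individual surface derivatives cannot be separated reliably. The signals and the function name are assumptions for illustration, not flight data.

```python
import numpy as np


def regressor_condition_number(inputs):
    """Condition number of X^T X for a linear model with an intercept and the given input histories."""
    X = np.column_stack([np.ones(len(inputs[0]))] + list(inputs))
    return np.linalg.cond(X.T @ X)


t = np.linspace(0.0, 10.0, 500)
d1 = np.sin(t)                                          # first surface deflection
d2_ganged = 0.98 * np.sin(t) + 0.02 * np.cos(3.0 * t)   # second surface nearly copying the first
d2_separate = np.sin(1.7 * t + 1.0)                     # second surface excited independently

cond_ganged = regressor_condition_number([d1, d2_ganged])
cond_separate = regressor_condition_number([d1, d2_separate])
```

A large condition number means the separate derivative estimates will have inflated variances, even if the overall fit to the moment coefficient looks good.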
Improved estimation of PM2.5 using Lagrangian satellite-measured aerosol optical depth
NASA Astrophysics Data System (ADS)
Olivas Saunders, Rolando
Suspended particulate matter (aerosols) with aerodynamic diameters less than 2.5 μm (PM2.5) has negative effects on human health, plays an important role in climate change and also causes the corrosion of structures by acid deposition. Accurate estimates of PM2.5 concentrations are thus relevant in air quality, epidemiology, cloud microphysics and climate forcing studies. Aerosol optical depth (AOD) retrieved by the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument has been used as an empirical predictor to estimate ground-level concentrations of PM2.5. These estimates usually have large uncertainties and errors. The main objective of this work is to assess the value of using upwind (Lagrangian) MODIS-AOD as predictors in empirical models of PM2.5. The upwind locations of the Lagrangian AOD were estimated using modeled backward air trajectories. Since the specification of an arrival elevation is somewhat arbitrary, trajectories were calculated to arrive at four different elevations at ten measurement sites within the continental United States. A systematic examination revealed trajectory model calculations to be sensitive to starting elevation. With a 500 m difference in starting elevation, the 48-hr mean horizontal separation of trajectory endpoints was 326 km. When the difference in starting elevation was doubled and tripled to 1000 m and 1500 m, the mean horizontal separation of trajectory endpoints approximately doubled and tripled to 627 km and 886 km, respectively. A seasonal dependence of this sensitivity was also found: the smallest mean horizontal separation of trajectory endpoints was exhibited during the summer and the largest separations during the winter. A daily average AOD product was generated and coupled to the trajectory model in order to determine AOD values upwind of the measurement sites during the period 2003-2007.
Empirical models that included in situ AOD and upwind AOD as predictors of PM2.5 were generated by multivariate linear regressions using the least squares method. The multivariate models showed improved performance over the single variable regression (PM2.5 and in situ AOD) models. The statistical significance of the improvement of the multivariate models over the single variable regression models was tested using the extra sum of squares principle. In many cases, even when the R-squared was high for the multivariate models, the improvement over the single models was not statistically significant. The R-squared of these multivariate models varied with respect to seasons, with the best performance occurring during the summer months. A set of seasonal categorical variables was included in the regressions to exploit this variability. The multivariate regression models that included these categorical seasonal variables performed better than the models that didn't account for seasonal variability. Furthermore, 71% of these regressions exhibited improvement over the single variable models that was statistically significant at a 95% confidence level.
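The extra-sum-of-squares comparison of nested regressions can be sketched in Python as a partial F statistic, F = [(RSS_reduced - RSS_full)/(p_full - p_reduced)] / [RSS_full/(n - p_full)]. The data below are synthetic and hypothetical, and the design matrices are assumed to have full column rank. A p-value could be obtained from the F distribution (for example with `scipy.stats.f.sf(F, df1, df2)`); that step is omitted to keep the sketch dependency-free.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def extra_sum_of_squares_f(X_reduced, X_full, y):
    """Partial F statistic testing the columns in X_full that are absent from X_reduced."""
    n = len(y)
    df1 = X_full.shape[1] - X_reduced.shape[1]     # number of added predictors
    df2 = n - X_full.shape[1]                      # residual df of the full model
    rss_r, rss_f = rss(X_reduced, y), rss(X_full, y)
    F = ((rss_r - rss_f) / df1) / (rss_f / df2)
    return F, df1, df2


# Hypothetical example: in situ AOD only vs. in situ plus upwind AOD.
rng = np.random.default_rng(2)
n = 200
aod = rng.uniform(0.05, 0.8, n)
upwind = rng.uniform(0.05, 0.8, n)
pm = 5.0 + 20.0 * aod + 10.0 * upwind + rng.normal(0.0, 2.0, n)
X_r = np.column_stack([np.ones(n), aod])
X_f = np.column_stack([np.ones(n), aod, upwind])
F, df1, df2 = extra_sum_of_squares_f(X_r, X_f, pm)
```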
Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models
Anderson, Ryan; Clegg, Samuel M.; Frydenvang, Jens; Wiens, Roger C.; McLennan, Scott M.; Morris, Richard V.; Ehlmann, Bethany L.; Dyar, M. Darby
2017-01-01
Accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “sub-model” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. The sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
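A simplified Python sketch of blending two sub-model predictions. It assumes a full-range model whose prediction decides which composition range applies, one "low" and one "high" sub-model, and linear weighting across an overlap interval. The boundaries and function name are hypothetical, and the ChemCam calibration may use more sub-models and differently chosen blending ranges.

```python
def blend_submodels(full_pred, low_pred, high_pred, blend_lo, blend_hi):
    """Blend low- and high-range sub-model predictions using the full-model prediction."""
    if full_pred <= blend_lo:
        return low_pred                                   # clearly in the low-composition range
    if full_pred >= blend_hi:
        return high_pred                                  # clearly in the high-composition range
    w = (full_pred - blend_lo) / (blend_hi - blend_lo)    # 0 at blend_lo, 1 at blend_hi
    return (1.0 - w) * low_pred + w * high_pred


# Usage: a full-model estimate of 25 wt.% falls midway through a hypothetical 20-30 wt.% overlap.
blended = blend_submodels(25.0, 22.0, 28.0, 20.0, 30.0)
```

The linear weight avoids a discontinuity when the full-model estimate crosses from one composition range into the other.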
Development of a Random Field Model for Gas Plume Detection in Multiple LWIR Images.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heasler, Patrick G.
This report develops a random field model that describes gas plumes in LWIR remote sensing images. The random field model serves as a prior distribution that can be combined with LWIR data to produce a posterior that determines the probability that a gas plume exists in the scene and also maps the most probable location of any plume. The random field model is intended to work with a single-pixel regression estimator, a regression model that estimates gas concentration on an individual pixel basis.
Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context.
Martinez, Josue G; Carroll, Raymond J; Müller, Samuel; Sampson, Joshua N; Chatterjee, Nilanjan
2011-11-01
When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
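The variability of the selected model size across repeated cross-validation runs can be explored with a small Python sketch. For brevity it uses the plain Lasso (one of the non-oracle methods the abstract mentions) fitted by coordinate descent rather than SCAD or the Adaptive Lasso. It assumes roughly centred data with no intercept and changes only the random fold assignment between runs; all names and settings are illustrative.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                  # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                 # add back coordinate j's contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]                 # remove the updated contribution
    return b


def cv_selected_count(X, y, lams, m=10, seed=0):
    """One run of m-fold CV over lams; returns the number of nonzero coefficients at the chosen lam."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), m)
    cv_err = np.zeros(len(lams))
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        for i, lam in enumerate(lams):
            b = lasso_cd(X[train], y[train], lam)
            cv_err[i] += np.sum((y[test] - X[test] @ b) ** 2)
    best = lams[int(np.argmin(cv_err))]
    return int(np.count_nonzero(lasso_cd(X, y, best)))


# Usage: weak signals, as in the SNP setting; only the fold split changes between runs.
rng = np.random.default_rng(6)
X = rng.standard_normal((60, 15))
y = X[:, :3] @ np.array([0.3, 0.3, 0.3]) + rng.standard_normal(60)
lams = np.linspace(0.02, 0.3, 6)
counts = [cv_selected_count(X, y, lams, m=5, seed=s) for s in range(4)]
```

Differing entries in `counts` would illustrate the instability the authors describe; for a particular dataset the counts may also coincide.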
THE DISTRIBUTION OF COOK’S D STATISTIC
Muller, Keith E.; Mok, Mario Chen
2013-01-01
Cook (1977) proposed a diagnostic to quantify the impact of deleting an observation on the estimated regression coefficients of a General Linear Univariate Model (GLUM). Simulations of models with Gaussian response and predictors demonstrate that his suggestion of comparing the diagnostic to the median of the F distribution for the overall regression test captures an erratically varying proportion of the values. We describe the exact distribution of Cook’s statistic for a GLUM with Gaussian predictors and response. We also present computational forms, simple approximations, and asymptotic results. A simulation supports the accuracy of the results. The methods allow accurate evaluation of a single value or the maximum value from a regression analysis. The approximations work well for a single value, but less well for the maximum. In contrast, the cut-point suggested by Cook provides widely varying tail probabilities. As with all diagnostics, the data analyst must use scientific judgment in deciding how to treat highlighted observations. PMID:24363487
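Cook's distance for a Gaussian linear model can be computed from the hat matrix without refitting, D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), where e_i is the residual, h_ii the leverage, p the number of coefficients and s^2 the residual mean square. A Python sketch follows; Cook's suggested reference value, the median of the F(p, n - p) distribution, could be obtained with `scipy.stats.f(p, n - p).median()`, which is left out to avoid the dependency. The function name is illustrative.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for every observation of an OLS fit (X includes the intercept column)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages
    e = y - H @ y                           # residuals
    s2 = e @ e / (n - p)                    # residual mean square
    return (e ** 2 / (p * s2)) * h / (1.0 - h) ** 2


# Usage on simulated data.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(30), rng.standard_normal((30, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(30)
D = cooks_distance(X, y)
```

The closed form agrees exactly with the deletion definition, D_i = (b - b_(i))' X'X (b - b_(i)) / (p s^2), so no model has to be refitted n times.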
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model combines the RSS method, which is known as an efficient ensemble technique, with CART, which is a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In developing the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. The RSSCART model performed best (AUC = 0.841) compared with the other popular landslide models: SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
Predictive equations for the estimation of body size in seals and sea lions (Carnivora: Pinnipedia)
Churchill, Morgan; Clementz, Mark T; Kohno, Naoki
2014-01-01
Body size plays an important role in pinniped ecology and life history. However, body size data are often absent for historical, archaeological, and fossil specimens. To estimate the body size of pinnipeds (seals, sea lions, and walruses) for today and the past, we used 14 commonly preserved cranial measurements to develop sets of single variable and multivariate predictive equations for pinniped body mass and total length. Principal components analysis (PCA) was used to test whether separate family-specific regressions were more appropriate than single predictive equations for Pinnipedia. The influence of phylogeny was tested with phylogenetic independent contrasts (PIC). The accuracy of these regressions was then assessed using a combination of coefficient of determination, percent prediction error, and standard error of estimation. Three different methods of multivariate analysis were examined: bidirectional stepwise model selection using the Akaike information criterion (AIC); all-subsets model selection using the Bayesian information criterion (BIC); and partial least squares regression. The PCA showed clear discrimination between Otariidae (fur seals and sea lions) and Phocidae (earless seals) for the 14 measurements, indicating the need for family-specific regression equations. The PIC analysis found that phylogeny had a minor influence on the relationship between morphological variables and body size. The regressions for total length were more accurate than those for body mass, and equations specific to Otariidae were more accurate than those for Phocidae. Of the three multivariate methods, the all-subsets approach required the fewest variables to estimate body size accurately. We then used the single variable predictive equations and the all-subsets approach to estimate the body size of two recently extinct pinniped taxa, the Caribbean monk seal (Monachus tropicalis) and the Japanese sea lion (Zalophus japonicus).
Body size estimates using single variable regressions generally under- or overestimated body size; however, the all-subsets regression produced body size estimates that were close to historically recorded body length for these two species. This indicates that the all-subsets regression equations developed in this study can estimate body size accurately. PMID:24916814
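All-subsets selection by BIC can be sketched in Python by fitting every subset of predictors and keeping the one with the smallest BIC. The sketch assumes ordinary least squares on already log-transformed variables and the Gaussian-likelihood form BIC = n ln(RSS/n) + k ln(n), with k counting the intercept. The study's software may use an equivalent form that differs by a constant. Data and names are hypothetical. Exhaustive search costs 2^p - 1 fits, which is still feasible for 14 measurements (16,383 subsets).

```python
import itertools
import numpy as np


def all_subsets_bic(X, y):
    """Return (best BIC, tuple of column indices) over all non-empty predictor subsets."""
    n, p = X.shape
    best = (np.inf, ())
    for size in range(1, p + 1):
        for subset in itertools.combinations(range(p), size):
            Z = np.column_stack([np.ones(n), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = float(np.sum((y - Z @ beta) ** 2))
            bic = n * np.log(rss / n) + Z.shape[1] * np.log(n)   # k includes the intercept
            if bic < best[0]:
                best = (bic, subset)
    return best


# Hypothetical log-scale data: log body mass driven by two of four cranial measurements.
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 4))
y = 0.5 + 1.2 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(0.0, 0.1, 200)
best_bic, best_subset = all_subsets_bic(X, y)
```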
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
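For logistic regression, the equivalent two-sample problem can be sketched in Python. Choose two equal-sized groups whose log-odds differ by 2 x slope x SD(x) and whose average event probability equals the overall probability (so the expected number of events is unchanged), then apply a two-proportion sample-size formula. The unpooled normal-approximation formula and the bisection solver are assumptions of this sketch, not necessarily the ones used in the paper. The total regression sample size is taken as twice the per-group size.

```python
import math
from statistics import NormalDist


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def equivalent_two_sample_logistic(slope, sd_x, p_overall):
    """Group probabilities (p1, p2) with logit(p2) - logit(p1) = 2*slope*sd_x and mean p_overall."""
    delta = 2.0 * slope * sd_x
    lo, hi = -50.0, 50.0
    for _ in range(200):                       # bisection: the mean probability increases with a
        a = (lo + hi) / 2.0
        if (expit(a) + expit(a + delta)) / 2.0 < p_overall:
            lo = a
        else:
            hi = a
    a = (lo + hi) / 2.0
    return expit(a), expit(a + delta)


def two_proportion_n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Unpooled normal-approximation sample size per group for comparing two proportions."""
    z = NormalDist().inv_cdf
    za, zb = z(1.0 - alpha / 2.0), z(power)
    return math.ceil((za + zb) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)


# Usage: slope 0.5 per unit of x, SD(x) = 1, overall event probability 0.3.
p1, p2 = equivalent_two_sample_logistic(0.5, 1.0, 0.3)
n_group = two_proportion_n_per_group(p1, p2)
n_total = 2 * n_group
```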
Body Fat Percentage Prediction Using Intelligent Hybrid Approaches
Shao, Yuehjen E.
2014-01-01
Excess body fat often leads to obesity. Obesity is typically associated with serious medical diseases, such as cancer, heart disease, and diabetes. Accordingly, knowing one's body fat is an extremely important issue, since it affects everyone's health. Although there are several ways to measure the body fat percentage (BFP), the accurate methods are often associated with hassle and/or high costs. Traditional single-stage approaches may use certain body measurements or explanatory variables to predict the BFP. Diverging from existing approaches, this study proposes new intelligent hybrid approaches that use fewer explanatory variables, and the proposed forecasting models are able to effectively predict the BFP. The proposed hybrid models consist of multiple regression (MR), artificial neural network (ANN), multivariate adaptive regression splines (MARS), and support vector regression (SVR) techniques. The first stage of the modeling uses MR and MARS to obtain smaller but more important sets of explanatory variables. In the second stage, the remaining important variables serve as inputs for the other forecasting methods. A real dataset was used to demonstrate the development of the proposed hybrid models. The prediction results revealed that the proposed hybrid schemes outperformed the typical single-stage forecasting models. PMID:24723804
Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, Ryan B.; Clegg, Samuel M.; Frydenvang, Jens
We report that accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “sub-model” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. Lastly, the sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models
Anderson, Ryan B.; Clegg, Samuel M.; Frydenvang, Jens; ...
2016-12-15
We report that accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “sub-model” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. Lastly, the sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
Doran, Kara S.; Howd, Peter A.; Sallenger, Asbury H.
2016-01-04
Recent studies, and most of their predecessors, use tide gage data to quantify sea-level (SL) acceleration, ASL(t). In the current study, three techniques were used to calculate acceleration from tide gage data. Of those examined, the two techniques based on sliding a regression window through the time series were found to be more robust than the technique that fits a single quadratic form to the entire time series, particularly if there is temporal variation in the magnitude of the acceleration. The single-fit quadratic regression method has been the most commonly used technique in determining acceleration in tide gage data. The inability of the single-fit method to account for time-varying acceleration may explain some of the inconsistent findings between investigators. Properly quantifying ASL(t) from field measurements is of particular importance in evaluating numerical models of past, present, and future sea-level rise (SLR) resulting from anticipated climate change.
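A Python sketch contrasting a single quadratic fit with a sliding quadratic window. Acceleration is taken as twice the quadratic coefficient, i.e. the second derivative of a t^2 + b t + c. The window length and the use of a quadratic within each window are assumptions, since the abstract does not specify how the two sliding-window techniques are constructed. Time is centred within each fit for numerical stability, which does not change the quadratic coefficient.

```python
import numpy as np


def single_fit_acceleration(t, y):
    """Acceleration from one quadratic fit over the whole record: 2 * quadratic coefficient."""
    tc = t - t.mean()                     # centring improves conditioning
    a2, _, _ = np.polyfit(tc, y, 2)
    return 2.0 * a2


def sliding_acceleration(t, y, window):
    """Acceleration from a quadratic fit in each window of `window` consecutive samples."""
    centres, acc = [], []
    for start in range(len(t) - window + 1):
        sl = slice(start, start + window)
        acc.append(single_fit_acceleration(t[sl], y[sl]))
        centres.append(t[sl].mean())      # time the local estimate refers to
    return np.array(centres), np.array(acc)


# Usage on a synthetic annual series with constant acceleration 0.01 (units/yr^2).
t = np.arange(1900.0, 2011.0)
y = 0.005 * (t - 1950.0) ** 2 + 1.5 * (t - 1900.0)
acc_single = single_fit_acceleration(t, y)
centres, acc_window = sliding_acceleration(t, y, window=30)
```

With constant acceleration both approaches agree; they diverge when the acceleration changes over the record, which is the case the abstract highlights.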
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Predicting ecological flow regime at ungaged sites: A comparison of methods
Murphy, Jennifer C.; Knight, Rodney R.; Wolfe, William J.; Gain, W. Scott
2012-01-01
Nineteen ecologically relevant streamflow characteristics were estimated using published rainfall–runoff and regional regression models for six sites with observed daily streamflow records in Kentucky. The regional regression model produced median estimates closer to the observed median for all but two characteristics. The variability of predictions from both models was generally less than the observed variability. The variability of the predictions from the rainfall–runoff model was greater than that from the regional regression model for all but three characteristics. Eight characteristics predicted by the rainfall–runoff model display positive or negative bias across all six sites; biases are not as pronounced for the regional regression model. Results suggest that a rainfall–runoff model calibrated on a single characteristic is less likely to perform well as a predictor of a range of other characteristics (flow regime) when compared with a regional regression model calibrated individually on multiple characteristics used to represent the flow regime. Poor model performance may misrepresent hydrologic conditions, potentially distorting the perceived risk of ecological degradation. Without prior selection of streamflow characteristics, targeted calibration, and error quantification, the widespread application of general hydrologic models to ecological flow studies is problematic. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.
Lindner-Lunsford, J. B.; Ellis, S.R.
1987-01-01
Multievent, conceptually based models and a single-event, multiple linear-regression model for estimating storm-runoff quantity and quality from urban areas were calibrated and verified for four small (57 to 167 acres) basins in the Denver metropolitan area, Colorado. The basins represented different land-use types: light commercial, single-family housing, and multi-family housing. Both types of models were calibrated using the same data set for each basin. A comparison was made between the storm-runoff volume, peak flow, and storm-runoff loads of seven water quality constituents simulated by each of the models by use of identical verification data sets. The models studied were the U.S. Geological Survey's Distributed Routing Rainfall-Runoff Model-Version II (DR3M-II) (a runoff-quantity model designed for urban areas), and a multievent urban runoff quality model (DR3M-QUAL). Water quality constituents modeled were chemical oxygen demand, total suspended solids, total nitrogen, total phosphorus, total lead, total manganese, and total zinc. (USGS)
Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context
Martinez, Josue G.; Carroll, Raymond J.; Müller, Samuel; Sampson, Joshua N.; Chatterjee, Nilanjan
2012-01-01
When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso. PMID:22347720
Adjusted variable plots for Cox's proportional hazards regression model.
Hall, C B; Zeger, S L; Bandeen-Roche, K J
1996-01-01
Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each plot, the regression coefficient and standard error from a Cox proportional hazards regression are obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.
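In the linear-regression case that the paper generalizes, an adjusted (added) variable plot shows residuals of the covariate against residuals of the response, each regressed on the other covariates. By the Frisch-Waugh-Lovell theorem, the slope of a regression through the origin of one residual on the other equals the full-model coefficient. The Python sketch below covers that linear case only (the Cox extensions are not shown), and its names are illustrative.

```python
import numpy as np


def added_variable_coordinates(X_other, x_j, y):
    """Residuals of x_j and of y after regressing each on the other covariates."""
    def residualize(target):
        beta, *_ = np.linalg.lstsq(X_other, target, rcond=None)
        return target - X_other @ beta
    return residualize(x_j), residualize(y)


def slope_through_origin(u, v):
    """Least-squares slope of v on u with no intercept."""
    return float(u @ v / (u @ u))


# Usage on simulated data with a covariate correlated with the others.
rng = np.random.default_rng(5)
n = 50
X_other = np.column_stack([np.ones(n), rng.standard_normal(n)])
x_j = 0.6 * X_other[:, 1] + rng.standard_normal(n)
y = 1.0 + 2.0 * X_other[:, 1] - 1.5 * x_j + rng.standard_normal(n)
u, v = added_variable_coordinates(X_other, x_j, y)   # plot v against u
```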
Du, Qing-Yun; Wang, En-Yin; Huang, Yan; Guo, Xiao-Yi; Xiong, Yu-Jing; Yu, Yi-Ping; Yao, Gui-Dong; Shi, Sen-Lin; Sun, Ying-Pu
2016-04-01
To evaluate the independent effects of the degree of blastocoele expansion and re-expansion and the inner cell mass (ICM) and trophectoderm (TE) grades on predicting live birth after fresh and vitrified/warmed single blastocyst transfer. Retrospective study. Reproductive medical center. Women undergoing 844 fresh and 370 vitrified/warmed single blastocyst transfer cycles. None. Live-birth rate correlated with blastocyst morphology parameters by logistic regression analysis and Spearman correlation analysis. The degree of blastocoele expansion and re-expansion was the only blastocyst morphology parameter that exhibited a significant ability to predict live birth in both fresh and vitrified/warmed single blastocyst transfer cycles by multivariate logistic regression and Spearman correlation analysis. Although the ICM grade was significantly related to live birth in fresh cycles according to the univariate model, its effect was not maintained in the multivariate logistic analysis. In vitrified/warmed cycles, neither ICM nor TE grade was correlated with live birth by logistic regression analysis. This study is the first to confirm that the degree of blastocoele expansion and re-expansion is a better predictor of live birth after both fresh and vitrified/warmed single blastocyst transfer cycles than ICM or TE grade. Copyright © 2016. Published by Elsevier Inc.
78 FR 16808 - Connect America Fund; High-Cost Universal Service Support
Federal Register 2010, 2011, 2012, 2013, 2014
2013-03-19
... to use one regression to generate a single cap on total loop costs for each study area. A single cap.... * * * A preferable, and simpler, approach would be to develop one conditional quantile model for aggregate.... Total universal service support for such carriers was approaching $2 billion annually--more than 40...
A general framework for the use of logistic regression models in meta-analysis.
Simmonds, Mark C; Higgins, Julian P T
2016-12-01
Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy. © The Author(s) 2014.
Avoiding and Correcting Bias in Score-Based Latent Variable Regression with Discrete Manifest Items
ERIC Educational Resources Information Center
Lu, Irene R. R.; Thomas, D. Roland
2008-01-01
This article considers models involving a single structural equation with latent explanatory and/or latent dependent variables where discrete items are used to measure the latent variables. Our primary focus is the use of scores as proxies for the latent variables and carrying out ordinary least squares (OLS) regression on such scores to estimate…
Conditional Density Estimation with HMM Based Support Vector Machines
NASA Astrophysics Data System (ADS)
Hu, Fasheng; Liu, Zhenqiu; Jia, Chunxin; Chen, Dechang
Conditional density estimation is very important in financial engineering, risk management, and other engineering computing problems. However, most regression models carry a latent assumption that the probability density is Gaussian, which is not necessarily true in many real-life applications. In this paper, we give a framework to estimate or predict the conditional density mixture dynamically. By combining the Input-Output HMM with SVM regression and building an SVM model in each state of the HMM, we can estimate a conditional density mixture instead of a single Gaussian. With an SVM in each node, this model can be applied not only to regression but also to classification. We applied this model to denoise ECG data. The proposed method could potentially be applied to other time series, such as stock market return prediction.
NASA Astrophysics Data System (ADS)
Shi, Liangliang; Mao, Zhihua; Wang, Zheng
2018-02-01
Satellite imagery currently plays an important role in monitoring the water quality of lakes and coastal waters, but it has scarcely been applied to inland rivers. This paper assesses the feasibility of applying regression models to quantify and map the concentration of total suspended matter (CTSM) in inland rivers, which have a large spatial scale and a wide CTSM dynamic range, using high-resolution WorldView-2 satellite remote sensing data. An empirical approach was used to quantify CTSM by integrating high-resolution WorldView-2 multispectral data with 21 in situ CTSM measurements. Radiometric, geometric, and atmospheric corrections were applied during image processing to derive the surface reflectance, which was then correlated with CTSM using single-variable and multivariable regression techniques. The regression results show that the single near-infrared (NIR) band 8 of WorldView-2 has a relatively strong relationship with CTSM (R2 = 0.93). Different prediction models were developed from various combinations of WorldView-2 bands, and the Akaike Information Criterion (AIC) was used to choose the best model. The model using bands 1, 3, 5, and 8 of WorldView-2 performed best, with R2 = 0.92 and a standard error of estimate (SEE) of 53.30 g/m3. Spatial distribution maps were produced using the best multiple regression model. The results indicate that it is feasible to apply the empirical model with high-resolution satellite imagery to retrieve CTSM in inland rivers for routine water-quality monitoring.
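Choosing among band combinations by AIC can be sketched as an exhaustive subset search over ordinary least-squares fits. This is a minimal illustration on synthetic data, not the study's WorldView-2 measurements. It assumes Gaussian errors and uses the common form AIC = n ln(RSS/n) + 2k, where k counts the intercept and slopes. Omitting the variance parameter shifts every AIC by the same constant, so the ranking is unaffected. The column names passed in are hypothetical labels.

```python
import itertools
import numpy as np

def ols_aic(X, y):
    """AIC of an OLS fit with intercept, using n*ln(RSS/n) + 2k."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    k = A.shape[1]                                # fitted coefficients
    return n * np.log(rss / n) + 2 * k

def best_subset_by_aic(X, y, names):
    """Try every non-empty subset of columns; keep the lowest AIC."""
    best = None
    for r in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), r):
            aic = ols_aic(X[:, list(cols)], y)
            if best is None or aic < best[0]:
                best = (aic, [names[c] for c in cols])
    return best

# Synthetic data: y depends only on columns 0 and 2 (hypothetical "bands").
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=40)
aic, chosen = best_subset_by_aic(X, y, ["b1", "b3", "b5", "b8"])
print(chosen)
```

Exhaustive search is only practical for a handful of candidate bands (2^p - 1 fits). Stepwise search is the usual fallback when p grows.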
An Analysis of San Diego's Housing Market Using a Geographically Weighted Regression Approach
NASA Astrophysics Data System (ADS)
Grant, Christina P.
San Diego County real estate transaction data was evaluated with a set of linear models calibrated by ordinary least squares and geographically weighted regression (GWR). The goal of the analysis was to determine whether the spatial effects assumed to be in the data are best studied globally with no spatial terms, globally with a fixed effects submarket variable, or locally with GWR. 18,050 single-family residential sales which closed in the six months between April 2014 and September 2014 were used in the analysis. Diagnostic statistics including AICc, R2, Global Moran's I, and visual inspection of diagnostic plots and maps indicate superior model performance by GWR as compared to both global regressions.
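Global Moran's I, one of the diagnostics listed above, measures spatial autocorrelation, typically in regression residuals. The sketch below is minimal and assumes a user-supplied spatial weights matrix `W` with a zero diagonal. How the weights are defined (for example, k-nearest neighbours or distance bands) is left to the analyst, because the study's own choice is not stated here. It computes I = (n / S0) (zᵀ W z) / (zᵀ z), where z holds deviations from the mean and S0 is the sum of all weights.

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I of `values` under spatial weights W (zero diagonal)."""
    x = np.asarray(values, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()            # deviations from the mean
    n = x.size
    s0 = W.sum()                # total weight
    return (n / s0) * (z @ W @ z) / (z @ z)

# Four locations on a line; neighbours share an edge (hypothetical layout).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
residuals = [1.0, 1.0, -1.0, -1.0]   # hypothetical, spatially clustered residuals
print(morans_i(residuals, W))        # about 0.333: similar residuals are neighbours
```

Values well above the expectation under no autocorrelation, -1/(n - 1), suggest that a global model leaves spatial structure in the residuals. That is the kind of evidence that motivates a local method such as GWR.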
Syed, Hamzah; Jorgensen, Andrea L; Morris, Andrew P
2016-06-01
To evaluate the power to detect associations between SNPs and time-to-event outcomes across a range of pharmacogenomic study designs while comparing alternative regression approaches. Simulations were conducted to compare Cox proportional hazards modeling accounting for censoring and logistic regression modeling of a dichotomized outcome at the end of the study. The Cox proportional hazards model was demonstrated to be more powerful than the logistic regression analysis. The difference in power between the approaches was highly dependent on the rate of censoring. Initial evaluation of single-nucleotide polymorphism association signals using computationally efficient software with dichotomized outcomes provides an effective screening tool for some design scenarios, and thus has important implications for the development of analytical protocols in pharmacogenomic studies.
USDA-ARS?s Scientific Manuscript database
Validation of model predictions for independent variables not included in model development can save time and money by identifying conditions for which new models are not needed. A single strain of Salmonella Typhimurium DT104 was used to develop a general regression neural network model for growth...
Agogo, George O.; van der Voet, Hilko; Veer, Pieter van’t; Ferrari, Pietro; Leenders, Max; Muller, David C.; Sánchez-Cantalejo, Emilio; Bamia, Christina; Braaten, Tonje; Knüppel, Sven; Johansson, Ingegerd; van Eeuwijk, Fred A.; Boshuizen, Hendriek
2014-01-01
In epidemiologic studies, measurement error in dietary variables often attenuates the association between dietary intake and disease occurrence. To adjust for the attenuation caused by error in dietary intake, regression calibration is commonly used. To apply regression calibration, unbiased reference measurements are required. Short-term reference measurements for foods that are not consumed daily contain excess zeroes that pose challenges in the calibration model. We adapted the two-part regression calibration model, initially developed for multiple replicates of reference measurements per individual, to a single-replicate setting. We showed how to handle excess zero reference measurements with a two-step modeling approach, how to explore heteroscedasticity in the consumed amount with a variance-mean graph, how to explore nonlinearity with generalized additive modeling (GAM) and empirical logit approaches, and how to select covariates in the calibration model. The performance of the two-part calibration model was compared with that of its one-part counterpart. We used vegetable intake and mortality data from the European Prospective Investigation into Cancer and Nutrition (EPIC) study. In EPIC, reference measurements were taken with 24-hour recalls. For each of the three vegetable subgroups assessed separately, correcting for error with an appropriately specified two-part calibration model resulted in about a threefold increase in the strength of association with all-cause mortality, as measured by the log hazard ratio. We further found that the standard way of including covariates in the calibration model can lead to overfitting the two-part calibration model. Moreover, the extent of adjustment for error is influenced by the number and forms of covariates in the calibration model. For episodically consumed foods, we advise researchers to pay special attention to response distribution, nonlinearity, and covariate inclusion when specifying the calibration model. PMID:25402487
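The basic idea behind regression calibration can be sketched with the simple one-part linear version. It does not handle excess zeroes and is not the two-part model above. Reference measurements R are regressed on the error-prone intake Q in a calibration substudy, and the slope λ of E[R | Q] estimates the attenuation factor. For a linear outcome model, dividing the naive outcome-on-Q slope by λ undoes the attenuation. For a Cox model, as in EPIC, this is only an approximation. All data below are synthetic and the function name is hypothetical.

```python
import numpy as np

def regression_calibration(q_main, outcome, q_ref, r_ref):
    """One-part linear regression calibration (a simplification).

    q_main, outcome : error-prone intake and outcome in the main study
    q_ref, r_ref    : error-prone intake and unbiased reference measurement
                      in a calibration substudy
    Returns (naive_slope, calibrated_slope).
    """
    lam = np.polyfit(q_ref, r_ref, 1)[0]        # slope of E[R | Q]: attenuation factor
    naive = np.polyfit(q_main, outcome, 1)[0]   # attenuated outcome-on-intake slope
    return naive, naive / lam

# Synthetic data: true intake T, questionnaire Q = T + error, reference R = T + error.
rng = np.random.default_rng(42)
t_main = rng.normal(size=20000)
q_main = t_main + rng.normal(size=20000)
outcome = 2.0 * t_main + rng.normal(size=20000)   # true slope 2.0
t_ref = rng.normal(size=5000)
q_ref = t_ref + rng.normal(size=5000)
r_ref = t_ref + rng.normal(scale=0.5, size=5000)
naive, calibrated = regression_calibration(q_main, outcome, q_ref, r_ref)
print(naive, calibrated)   # naive near 1.0, calibrated near 2.0
```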
Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.
Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao
2016-04-01
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG), in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes, and to compare power with multivariate pairwise interaction analysis and with single-trait interaction analysis by a single-variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that joint interaction analysis of multiple phenotypes has much higher power to detect interaction than interaction analysis of a single trait and may open a new direction toward fully uncovering the genetic structure of multiple phenotypes.
NASA Astrophysics Data System (ADS)
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to quantify the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single-linkage clustering algorithm. We then propose a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height. Any cluster group that exceeds the stopping rule is classified as a potential outlier. Our aim is to demonstrate the effectiveness of the proposed algorithms, with these similarity distances, in detecting outliers. The proposed methods were found to perform well and to be applicable to circular regression models.
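The clustering idea can be sketched with NumPy alone, using the fact that single-linkage merge heights equal the edge weights of a minimum spanning tree. Everything specific here is an assumption rather than the authors' definitions.

- Each angle is embedded on the unit circle so that an ordinary Euclidean distance respects wrap-around. This stands in for the paper's similarity distance.
- The tree is cut at the mean plus k standard deviations of the merge heights. This is a simplified linear stand-in for the paper's circular stopping rule.
- Points outside the largest resulting cluster are flagged.

The function names are hypothetical.

```python
import numpy as np

def mst_edges(D):
    """Single-linkage merge heights = minimum-spanning-tree edge weights (Prim)."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()                 # cheapest link from the tree to each point
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.where(in_tree, np.inf, best).argmin())
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = D[j] < best
        best = np.where(closer, D[j], best)
        parent = np.where(closer, j, parent)
    return edges

def circular_outliers(x_angles, y_angles, k=3.0):
    """Indices of points outside the largest single-linkage cluster."""
    x = np.asarray(x_angles, dtype=float)
    y = np.asarray(y_angles, dtype=float)
    # Embed each angle on the unit circle so Euclidean distance respects wrap-around.
    pts = np.column_stack([np.cos(x), np.sin(x), np.cos(y), np.sin(y)])
    D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    edges = mst_edges(D)
    heights = np.array([w for _, _, w in edges])
    cut = heights.mean() + k * heights.std()      # simplified stopping rule
    root = list(range(len(x)))                    # union-find over edges below the cut

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a

    for i, j, w in edges:
        if w <= cut:
            root[find(i)] = find(j)
    labels = np.array([find(a) for a in range(len(x))])
    values, counts = np.unique(labels, return_counts=True)
    return np.flatnonzero(labels != values[counts.argmax()])

# Hypothetical circular-circular data: y = x + 0.5 except observation 7.
x = np.linspace(0.0, 1.5, 20)
y = x + 0.5
y[7] += np.pi
print(circular_outliers(x, y))   # [7]
```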
Globally Adaptive Quantile Regression with Ultra-High Dimensional Data
Zheng, Qi; Peng, Limin; He, Xuming
2015-01-01
Quantile regression has become a valuable tool to analyze heterogeneous covariate-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at one or several quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high dimensional setting. We employ adaptive L1 penalties and, more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantile levels, enhancing the flexibility and robustness of existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424
Most analyses of daily time series epidemiology data relate mortality or morbidity counts to PM and other air pollutants by means of single-outcome regression models using multiple predictors, without taking into account the complex statistical structure of the predictor variable...
Estimating Infiltration Rates for a Loessal Silt Loam Using Soil Properties
M. Dean Knighton
1978-01-01
Soil properties were related to infiltration rates as measured by single-ring steady-head infiltrometers. The properties showing strong simple correlations were identified. Regression models were developed to estimate infiltration rate from several soil properties. The best model gave fair agreement with measured rates at another location.
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, Richard M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
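For an ordinary least-squares problem, both statistics follow from the hat-matrix diagonal and closed-form leave-one-out quantities, so no refitting is needed. The sketch below uses the standard linear-regression formulas on hypothetical data. For an approximately linear nonlinear model, `X` would stand in for the sensitivity (Jacobian) matrix at the optimum. That substitution is an assumption, and the sketch is not a reproduction of the software used in the study.

```python
import numpy as np

def influence_stats(X, y):
    """Cook's D and DFBETAS for an OLS fit of y on X (X includes the intercept column)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                                  # residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)       # leverages (hat-matrix diagonal)
    s2 = e @ e / (n - p)                              # residual variance
    cooks_d = e**2 * h / (p * s2 * (1 - h) ** 2)
    # Leave-one-out residual variance and coefficient changes, in closed form.
    s2_loo = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    delta = (XtX_inv @ X.T).T * (e / (1 - h))[:, None]   # row i: beta - beta_(i)
    dfbetas = delta / (np.sqrt(s2_loo)[:, None] * np.sqrt(np.diag(XtX_inv))[None, :])
    return cooks_d, dfbetas

# Hypothetical data: a straight line with small noise and one corrupted response.
x = np.arange(8.0)
X = np.column_stack([np.ones(8), x])
y = 1.0 + 0.5 * x + np.array([0.1, -0.1, 0.05, 0.0, -0.05, 0.1, -0.1, 0.0])
y[7] += 4.0
cooks_d, dfbetas = influence_stats(X, y)
print(int(np.argmax(cooks_d)))   # 7
```

The corrupted point combines a large residual with high leverage (it sits at the end of the x range), which is why Cook's D singles it out.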
David, Ingrid; Garreau, Hervé; Balmisse, Elodie; Billon, Yvon; Canario, Laurianne
2017-01-20
Some genetic studies need to take into account correlations between traits that are repeatedly measured over time. Multiple-trait random regression models are commonly used to analyze repeated traits but suffer from several major drawbacks. In the present study, we developed a multiple-trait extension of the structured antedependence (SAD) model to overcome this issue and validated its usefulness by modeling the association between litter size (LS) and average birth weight (ABW) over parities in pigs and rabbits. The single-trait SAD model assumes that a random effect at a given time can be explained by the previous values of the random effect (i.e., at previous times). The proposed multiple-trait extension of the SAD model consists of adding a cross-antedependence parameter to the single-trait SAD model. This model can be easily fitted using ASReml and the OWN Fortran program that we have developed. For comparison with the random regression model, we used our multiple-trait SAD model to analyze the LS and ABW of 4345 litters from 1817 Large White sows and 8706 litters from 2286 L-1777 does over a maximum of five successive parities. For both species, the multiple-trait SAD model fitted the data better than the random regression model. The differences between the AIC values of the two models (AIC_random regression - AIC_SAD) were 7 and 227 for pigs and rabbits, respectively. A similar pattern of heritability and correlation estimates was obtained for both species. Heritabilities were lower for LS (ranging from 0.09 to 0.29) than for ABW (ranging from 0.23 to 0.39). The general trend was a decrease of the genetic correlation for a given trait between more distant parities. Estimates of genetic correlations between LS and ABW were negative and ranged from -0.03 to -0.52 across parities.
No correlation was observed between the permanent environmental effects, except between the permanent environmental effects of LS and ABW of the same parity, for which the estimate of the correlation was strongly negative (ranging from -0.57 to -0.67). We demonstrated that application of our multiple-trait SAD model is feasible for studying several traits with repeated measurements and showed that it provided a better fit to the data than the random regression model.
NASA Astrophysics Data System (ADS)
Dalkilic, Turkan Erbay; Apaydin, Aysen
2009-11-01
In a regression analysis, it is assumed that the observations come from a single class in a data cluster and that the simple functional relationship between the dependent and independent variables can be expressed using the general model Y = f(X) + ε. However, a data cluster may consist of a combination of observations that have different distributions derived from different clusters. When a regression model must be estimated for fuzzy inputs derived from different distributions, the model is termed the 'switching regression model'; in its expression, l_i indicates the class number of each independent variable and p the number of independent variables [J.-S.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man, and Cybernetics 23 (3) (1993) 665-685; M. Ménard, Fuzzy clustering and switching regression models using ambiguity and distance rejects, Fuzzy Sets and Systems 122 (2001) 363-399; R.E. Quandt, A new approach to estimating switching regressions, Journal of the American Statistical Association 67 (338) (1972) 306-310]. In this study, adaptive networks have been used to construct a model formed by combining the obtained models. Some methods suggest the class numbers of the independent variables heuristically; alternatively, we aim to define the optimal class number of the independent variables using a suggested validity criterion for fuzzy clustering. For the case in which the independent variables have an exponential distribution, an algorithm is suggested for determining the unknown parameters of the switching regression model and for obtaining the estimated values after obtaining an optimal membership function suitable for the exponential distribution.
NASA Astrophysics Data System (ADS)
Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik
2016-03-01
A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.
NASA Astrophysics Data System (ADS)
Febrian Umbara, Rian; Tarwidi, Dede; Budi Setiawan, Erwin
2018-03-01
The paper discusses the prediction of the Jakarta Composite Index (JCI) on the Indonesia Stock Exchange. The study uses 1286 days of historical JCI data to predict the value of the JCI one day ahead. The paper proposes a two-stage prediction: the first stage uses Fuzzy Time Series (FTS) to predict the values of ten technical indicators, and the second stage uses Support Vector Regression (SVR) to predict the value of the JCI one day ahead, resulting in a hybrid FTS-SVR prediction model. The performance of this combined model is compared with that of a single-stage prediction model using SVR only. Ten technical indicators are used as input for each model.
Developing global regression models for metabolite concentration prediction regardless of cell line.
André, Silvère; Lagresle, Sylvain; Da Sliva, Anthony; Heimendinger, Pierre; Hannas, Zahia; Calvosa, Éric; Duponchel, Ludovic
2017-11-01
Following the Process Analytical Technology (PAT) initiative of the Food and Drug Administration (FDA), drug manufacturers are encouraged to develop innovative techniques to monitor and better understand their processes. Within this framework, it has been demonstrated that Raman spectroscopy coupled with chemometric tools allows critical parameters of mammalian cell cultures to be predicted in-line and in real time. However, developing robust and predictive regression models clearly requires many batches in order to account for inter-batch variability and improve model accuracy. Moreover, this heavy procedure has to be repeated for every new cell line, which consumes many resources. We therefore propose in this paper to develop global regression models that take different cell lines into account. Such models can then be transferred to any culture of the cells involved. This article first demonstrates the feasibility of developing regression models not only for mammalian cell lines (CHO and HeLa cell cultures) but also for insect cell lines (Sf9 cell cultures). Global regression models are then generated based on CHO, HeLa, and Sf9 cells. Finally, these models are evaluated on a fourth cell line (HEK cells). In addition to obtaining suitable predictions of glucose and lactate concentrations in HEK cell cultures, we show that adding a single HEK-cell culture to the calibration set substantially increases the predictive ability of the regression models. In this way, we demonstrate that with global models it is not necessary to consider many cultures of a new cell line to obtain accurate models. Biotechnol. Bioeng. 2017;114: 2550-2559. © 2017 Wiley Periodicals, Inc.
Brady, Amie M.G.; Plona, Meg B.
2012-01-01
The Cuyahoga River within Cuyahoga Valley National Park (CVNP) is at times impaired for recreational use due to elevated concentrations of Escherichia coli (E. coli), a fecal-indicator bacterium. During the recreational seasons of mid-May through September during 2009–11, samples were collected 4 days per week and analyzed for E. coli concentrations at two sites within CVNP. Other water-quality and environmental data, including turbidity, rainfall, and streamflow, were measured and (or) tabulated for analysis. Regression models developed to predict recreational water quality in the river were implemented during the recreational seasons of 2009–11 for one site within CVNP, Jaite. For the 2009 and 2010 seasons, the regression models were better at predicting exceedances of Ohio's single-sample standard for primary-contact recreation than the traditional method of using the previous day's E. coli concentration. During 2009, the regression model was based on data collected during 2005 through 2008, excluding available 2004 data. The resulting model for 2009 did not perform as well as expected (based on the calibration data set) and tended to overestimate concentrations (correct responses at 69 percent). During 2010, the regression model was based on data collected during 2004 through 2009, including all of the available data. The 2010 model performed well, correctly predicting 89 percent of the samples above or below the single-sample standard, even though the predictions tended to be lower than actual sample concentrations. During 2011, the regression model was based on data collected during 2004 through 2010 and tended to overestimate concentrations. The 2011 model did not perform as well as the traditional method or as expected, based on the calibration dataset (correct responses at 56 percent).
At a second site, Lock 29, approximately 5 river miles upstream from Jaite, a regression model based on data collected at the site during the recreational seasons of 2008–10 also did not perform as well as the traditional method or as well as expected (correct responses at 60 percent). Above-normal precipitation in the region and a delayed start to the 2011 sampling season (sampling began mid-June) may have affected how well the 2011 models performed. With these new data, however, updated regression models may be better able to predict recreational water-quality conditions, because the calibration data now include a more diverse range of water-quality conditions. Daily recreational water-quality predictions for Jaite were made available on the Ohio Nowcast Web site at www.ohionowcast.info. Other public outreach included signage at trailheads in the park and articles in the park's quarterly schedule of events and volunteer newsletters. A U.S. Geological Survey Fact Sheet was also published to bring attention to water-quality issues in the park.
Gong, Ping; Nan, Xiaofei; Barker, Natalie D; Boyd, Robert E; Chen, Yixin; Wilkins, Dawn E; Johnson, David R; Suedel, Burton C; Perkins, Edward J
2016-03-08
Chemical bioavailability is an important dose metric in environmental risk assessment. Although many approaches have been used to evaluate bioavailability, no single approach is free from limitations. Previously, we developed a new genomics-based approach that integrated microarray technology and regression modeling for predicting bioavailability (tissue residue) of explosive compounds in exposed earthworms. In the present study, we further compared 18 different regression models and performed variable selection simultaneously with parameter estimation. This refined approach was applied to both previously collected and newly acquired earthworm microarray gene expression datasets for three explosive compounds. Our results demonstrate that a prediction accuracy of R(2) = 0.71-0.82 was achievable at relatively low model complexity, with as few as 3-10 predictor genes per model. These results are much more encouraging than our previous ones. This study has demonstrated that our approach is promising for bioavailability measurement, which warrants further studies of mixed contamination scenarios in field settings.
A quadratic regression modelling on paddy production in the area of Perlis
NASA Astrophysics Data System (ADS)
Goh, Aizat Hanis Annas; Ali, Zalila; Nor, Norlida Mohd; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2017-08-01
Polynomial regression models are useful when the relationship between a response variable and predictor variables is curvilinear. Polynomial regression fits the nonlinear relationship within a least squares linear regression model by expanding the predictor variables into a kth-order polynomial. The polynomial order determines the number of inflexions on the curvilinear fitted line. A second-order polynomial forms a quadratic expression (parabolic curve) with a single maximum or minimum, whereas a third-order polynomial forms a cubic expression that can have both a relative maximum and a relative minimum. This study used paddy data from the area of Perlis to model paddy production based on paddy cultivation characteristics and environmental characteristics. The results indicated that a quadratic regression model best fits the data and that paddy production is affected by urea fertilizer application and by the interaction between the amount of average rainfall and the percentage of area affected by pests and disease. Urea fertilizer application has a quadratic effect in the model: as the number of days of urea fertilizer application increases, paddy production is expected to decrease until it reaches a minimum value, and then to increase at higher numbers of days of urea application. The decrease in paddy production with increasing rainfall is greater the higher the percentage of area affected by pests and disease.
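The quadratic effect described above can be illustrated by fitting y = b0 + b1 x + b2 x² by ordinary least squares and locating the turning point at x = -b1 / (2 b2). That point is a minimum when b2 > 0 and a maximum when b2 < 0. The model is still linear in the coefficients, which is why ordinary least squares applies. The data below are synthetic and are not the Perlis paddy data.

```python
import numpy as np

def quadratic_fit(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    A = np.column_stack([np.ones_like(x), x, x**2])   # design is linear in the coefficients
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def turning_point(b1, b2):
    """x at which the fitted parabola has its single minimum (b2 > 0) or maximum (b2 < 0)."""
    return -b1 / (2.0 * b2)

# Synthetic response with a minimum at x = 4 (x could be days of fertilizer application).
x = np.arange(0.0, 9.0)
y = 10.0 - 4.0 * x + 0.5 * x**2
b0, b1, b2 = quadratic_fit(x, y)
print(round(turning_point(b1, b2), 6))  # 4.0
```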
USDA-ARS?s Scientific Manuscript database
Using linear regression models, we studied the main and two-way interaction effects of the predictor variables gender, age, BMI, and 64 folate/vitamin B-12/homocysteine/lipid/cholesterol-related single nucleotide polymorphisms (SNP) on log-transformed plasma homocysteine normalized by red blood cell...
Maggin, Daniel M; Swaminathan, Hariharan; Rogers, Helen J; O'Keeffe, Breda V; Sugai, George; Horner, Robert H
2011-06-01
A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters, producing an effect size that represents the magnitude of the treatment effect from baseline to treatment phases in standard deviation units. In this paper, the method is applied to two published examples using common single-case designs (i.e., withdrawal and multiple-baseline). The results from these studies are described, and the method is evaluated against ten desirable criteria for single-case effect sizes. Based on the results of this application, we conclude with observations about the use of GLS as a support to visual analysis, provide recommendations for future research, and describe implications for practice. Copyright © 2011 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
Pointwise influence matrices for functional-response regression.
Reiss, Philip T; Huang, Lei; Wu, Pei-Shien; Chen, Huaihou; Colcombe, Stan
2017-12-01
We extend the notion of an influence or hat matrix to regression with functional responses and scalar predictors. For responses depending linearly on a set of predictors, our definition is shown to reduce to the conventional influence matrix for linear models. The pointwise degrees of freedom, the trace of the pointwise influence matrix, are shown to have an adaptivity property that motivates a two-step bivariate smoother for modeling nonlinear dependence on a single predictor. This procedure adapts to varying complexity of the nonlinear model at different locations along the function, and thereby achieves better performance than competing tensor product smoothers in an analysis of the development of white matter microstructure in the brain. © 2017, The International Biometric Society.
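For the conventional case to which the pointwise definition is said to reduce, the influence (hat) matrix of a simple linear regression has a closed form, and its trace (the degrees of freedom) equals the number of coefficients. The following Python sketch, with hypothetical function names and data, shows this for scalar responses only; it does not implement the functional-response or pointwise version.

```python
def hat_matrix_simple(xs):
    """Influence (hat) matrix H = X (X'X)^-1 X' for y = b0 + b1*x, in closed form."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [[1 / n + (xi - xbar) * (xj - xbar) / sxx for xj in xs] for xi in xs]


def trace(h):
    """Degrees of freedom of a linear smoother: the trace of its influence matrix."""
    return sum(h[i][i] for i in range(len(h)))


xs = [1.0, 2.0, 3.0, 4.0, 10.0]
H = hat_matrix_simple(xs)
leverages = [H[i][i] for i in range(len(xs))]
print(trace(H), leverages)  # trace equals the number of coefficients (2)
```

The diagonal entries are the leverages; the isolated point at x = 10 has the largest one.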
Smooth Scalar-on-Image Regression via Spatial Bayesian Variable Selection
Goldsmith, Jeff; Huang, Lei; Crainiceanu, Ciprian M.
2013-01-01
We develop scalar-on-image regression models when images are registered multidimensional manifolds. We propose a fast and scalable Bayes inferential procedure to estimate the image coefficient. The central idea is the combination of an Ising prior distribution, which controls a latent binary indicator map, and an intrinsic Gaussian Markov random field, which controls the smoothness of the nonzero coefficients. The model is fit using a single-site Gibbs sampler, which allows fitting within minutes for hundreds of subjects with predictor images containing thousands of locations. The code is simple and is provided in less than one page in the Appendix. We apply this method to a neuroimaging study where cognitive outcomes are regressed on measures of white matter microstructure at every voxel of the corpus callosum for hundreds of subjects. PMID:24729670
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions that has the same prediction accuracy as the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but only 25 of a total of 84 weights are nonzero, a 54% reduction compared to the previous model. In contrast to the linear model based on LASSO, our model suggests that only a few positions influence the efficacy of the siRNA, namely the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO cannot accomplish well because of the task's high dimensionality. Our results demonstrate the superiority of sparse logistic regression over LASSO as a technique for both feature selection and regression in the context of siRNA design.
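The core mechanism of sparse (L1-penalized) logistic regression can be sketched with proximal gradient descent (ISTA), in which a soft-thresholding step sets small weights exactly to zero. This is one standard solver, assumed here for illustration; the abstract does not say which algorithm the authors used, and the toy data (one informative feature, one irrelevant feature) is hypothetical rather than siRNA sequence features.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrinks z toward zero and sets small values exactly to zero."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def sparse_logistic(X, y, lam, step=0.1, iters=2000):
    """L1-penalized logistic regression (unpenalized intercept) fitted by proximal gradient (ISTA)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi  # residual p_i - y_i
            gb += r / n
            for j in range(p):
                gw[j] += r * xi[j] / n
        b -= step * gb
        w = [soft_threshold(wj - step * g, step * lam) for wj, g in zip(w, gw)]
    return b, w


def toy_data():
    """Feature 0 agrees with the label 75% of the time; feature 1 is balanced within every cell."""
    X, y = [], []
    for label, x0, count in [(1, 1, 6), (1, -1, 2), (0, -1, 6), (0, 1, 2)]:
        for k in range(count):
            X.append([x0, 1 if k % 2 == 0 else -1])
            y.append(label)
    return X, y


X, y = toy_data()
b, w = sparse_logistic(X, y, lam=0.01)
print(w)  # the irrelevant feature's weight is driven to zero
```

Raising `lam` zeroes out more weights; with `lam=1.0` on this toy data, every weight stays at zero.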
Metsemakers, W-J; Handojo, K; Reynders, P; Sermon, A; Vanderschot, P; Nijs, S
2015-04-01
Despite modern advances in the treatment of tibial shaft fractures, complications including nonunion, malunion, and infection remain relatively frequent. A better understanding of these injuries and their complications could lead to prevention rather than treatment strategies. A retrospective study was performed to identify risk factors for deep infection and compromised fracture healing after intramedullary nailing (IMN) of tibial shaft fractures. Between January 2000 and January 2012, 480 consecutive patients with 486 tibial shaft fractures were enrolled in the study. Statistical analysis was performed to determine predictors of deep infection and compromised fracture healing. Compromised fracture healing was subdivided into delayed union and nonunion. The following independent variables were selected for analysis: age, sex, smoking, obesity, diabetes, American Society of Anaesthesiologists (ASA) classification, polytrauma, fracture type, open fractures, Gustilo type, primary external fixation (EF), time to nailing (TTN) and reaming. As the primary statistical evaluation, we performed a univariate analysis, followed by a multiple logistic regression model. Univariate regression analysis revealed similar risk factors for delayed union and nonunion, including fracture type, open fractures and Gustilo type. Factors affecting the occurrence of deep infection in this model were primary EF, a prolonged TTN, open fractures and Gustilo type. Multiple logistic regression analysis revealed polytrauma as the single risk factor for nonunion. With respect to delayed union, no risk factors could be identified. In the same statistical model, deep infection was correlated with primary EF. The purpose of this study was to evaluate risk factors of poor outcome after IMN of tibial shaft fractures. The univariate regression analysis showed that the nature of complications after tibial shaft nailing could be multifactorial. 
This was not confirmed in a multiple logistic regression model, which only revealed polytrauma and primary EF as risk factors for nonunion and deep infection, respectively. Future strategies should focus on prevention in high-risk populations such as polytrauma patients treated with EF. Copyright © 2014 Elsevier Ltd. All rights reserved.
Brito-Rocha, E; Schilling, A C; Dos Anjos, L; Piotto, D; Dalmolin, A C; Mielke, M S
2016-01-01
Individual leaf area (LA) is a key variable in studies of tree ecophysiology because it directly influences light interception, photosynthesis and evapotranspiration of adult trees and seedlings. We analyzed the leaf dimensions (length, L, and width, W) of seedlings and adults of seven Neotropical rainforest tree species (Brosimum rubescens, Manilkara maxima, Pouteria caimito, Pouteria torta, Psidium cattleyanum, Symphonia globulifera and Tabebuia stenocalyx) with the objective of testing the feasibility of single regression models to estimate LA of both adults and seedlings. In southern Bahia, Brazil, a first set of data was collected between March and October 2012. Of the seven species analyzed, only two (P. cattleyanum and T. stenocalyx) had very similar relationships between LW and LA in both ontogenetic stages. For these two species, a second set of data was collected in August 2014 to validate the single models encompassing adults and seedlings. Our results show that it is possible to develop models for predicting individual leaf area that encompass different ontogenetic stages of tropical tree species. The development of these models depended more on the species than on the differences in leaf size between seedlings and adults.
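One simple form of a single regression model for leaf area uses the product of length and width as the only predictor, LA = a·(L×W), fitted through the origin. This model form, the function names and the measurements below are assumptions for illustration (the abstract does not give the fitted equations). Fitting seedlings, adults and the pooled set separately shows the kind of comparison used to judge whether one model can cover both ontogenetic stages.

```python
def fit_through_origin(x, y):
    """Least-squares slope a of y = a * x with no intercept: a = sum(x*y) / sum(x*x)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def lw_and_area(leaves):
    """Split (length, width, area) records into the L*W predictor and the LA response."""
    return [l * w for l, w, _ in leaves], [a for _, _, a in leaves]


# Hypothetical leaves: (length cm, width cm, measured area cm^2).
seedlings = [(4.0, 1.5, 4.2), (5.0, 2.0, 7.0), (6.0, 2.2, 9.24)]
adults = [(12.0, 4.0, 33.6), (14.0, 5.0, 49.0), (15.0, 5.5, 57.75)]

fits = {}
for name, leaves in [("seedlings", seedlings), ("adults", adults), ("pooled", seedlings + adults)]:
    lw, la = lw_and_area(leaves)
    fits[name] = fit_through_origin(lw, la)
print(fits)  # similar stage-specific slopes support a single pooled model
```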
NASA Astrophysics Data System (ADS)
Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza
2018-03-01
In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
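The reweighting idea behind robust ML estimation with Student-t errors can be sketched for a straight-line model: each EM iteration solves a weighted least squares problem, then updates the scale and recomputes weights `(nu + 1) / (nu + r**2 / sigma2)` that shrink for large residuals. This simplified Python sketch fixes `nu`, ignores the AR colored-noise part and the estimation of the degrees of freedom, and uses hypothetical data with one gross outlier, so it is only a partial illustration of the paper's ECME algorithm.

```python
def t_irls_line(xs, ys, nu=3.0, iters=200):
    """Fit y = b0 + b1*x assuming i.i.d. scaled Student-t errors with fixed nu (EM / IRLS)."""
    n = len(xs)
    w = [1.0] * n  # start from ordinary least squares
    b0 = b1 = sigma2 = 0.0
    for _ in range(iters):
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        b1 = sxy / sxx  # weighted least squares slope
        b0 = ybar - b1 * xbar
        r = [y - b0 - b1 * x for x, y in zip(xs, ys)]
        sigma2 = sum(wi * ri * ri for wi, ri in zip(w, r)) / n  # scale update
        w = [(nu + 1) / (nu + ri * ri / sigma2) for ri in r]    # E-step weights
    return b0, b1, sigma2, w


xs = list(range(10))
ys = [1 + 2 * x + (0.1 if x % 2 == 0 else -0.1) for x in xs]  # hypothetical line plus small noise
ys[9] += 20.0                                                   # one gross outlier
b0, b1, sigma2, w = t_irls_line(xs, ys)
print(round(b1, 3), round(w[9], 4))  # slope near 2; outlier weight near zero
```

On the same data, ordinary least squares (the first pass, with all weights equal to 1) is pulled noticeably upward by the outlier, while the reweighted fit stays close to the underlying slope of 2.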
Cakir, Ebru; Kucuk, Ulku; Pala, Emel Ebru; Sezer, Ozlem; Ekin, Rahmi Gokhan; Cakmak, Ozgur
2017-05-01
Conventional cytomorphologic assessment is the first step to establish an accurate diagnosis in urinary cytology. In cytologic preparations, the separation of low-grade urothelial carcinoma (LGUC) from reactive urothelial proliferation (RUP) can be exceedingly difficult. The bladder washing cytologies of 32 LGUC and 29 RUP cases were reviewed. The cytologic slides were examined for the presence or absence of 28 cytologic features. The cytologic criteria showing statistical significance in LGUC were increased numbers of monotonous single (non-umbrella) cells, three-dimensional cellular papillary clusters without fibrovascular cores, irregular bordered clusters, atypical single cells, irregular nuclear overlap, cytoplasmic homogeneity, increased N/C ratio, pleomorphism, nuclear border irregularity, nuclear eccentricity, elongated nuclei, and hyperchromasia (p < 0.05), and the cytologic criteria showing statistical significance in RUP were inflammatory background, mixture of small and large urothelial cells, loose monolayer aggregates, and vacuolated cytoplasm (p < 0.05). When these variables were subjected to a stepwise logistic regression analysis, four features were selected to distinguish LGUC from RUP: increased numbers of monotonous single (non-umbrella) cells, increased nuclear-cytoplasmic ratio, hyperchromasia, and presence of small and large urothelial cells (p = 0.0001). Of the 32 cases with proven LGUC, the stepwise logistic regression model correctly predicted 31 (96.9%) as having this diagnosis, and of the 29 patients with RUP, it correctly predicted 26 (89.7%) as having this disease. There are several cytologic features that separate LGUC from RUP. Stepwise logistic regression analysis is a valuable tool for determining the most useful cytologic criteria to distinguish these entities. © 2017 APMIS. Published by John Wiley & Sons Ltd.
R, Jewkes; Y, Sikweyiya; K, Dunkle; R, Morrell
2015-07-07
Studies of rape of women seldom distinguish between men's participation in acts of single and multiple perpetrator rape. Multiple perpetrator rape (MPR) occurs globally with serious consequences for women. In South Africa it is a cultural practice with defined circumstances in which it commonly occurs. Prevention requires an understanding of whether it is a context-specific intensification of single perpetrator rape (SPR), or a distinctly different practice of different men. This paper aims to address this question. We conducted a cross-sectional household study with a multi-stage, randomly selected sample of 1686 men aged 18-49 who completed a questionnaire administered using an Audio-enhanced Personal Digital Assistant. We attempted to fit an ordered logistic regression model for factors associated with rape perpetration. Overall, 27.6% of men had raped and 8.8% had perpetrated MPR. Thus 31.9% of men who had ever raped had done so with other perpetrators. An ordered regression model was fitted, showing that the same factors, albeit at higher prevalence, are associated with SPR and MPR. Multiple perpetrator rape appears as an intensified form of single perpetrator rape, rather than a different form of rape. Prevention approaches need to be mainstreamed among young men.
A Small and Slim Coaxial Probe for Single Rice Grain Moisture Sensing
You, Kok Yeow; Mun, Hou Kit; You, Li Ling; Salleh, Jamaliah; Abbas, Zulkifly
2013-01-01
Moisture detection in single rice grains using a slim, small open-ended coaxial probe is presented. The coaxial probe is suitable for nondestructive measurement of moisture values in rice grains ranging from 9.5% to 26%. Empirical polynomial models are developed to predict the gravimetric moisture content of rice from reflection coefficients measured with a vector network analyzer. The relationship between the reflection coefficient and relative permittivity was also obtained using a regression method and expressed as a polynomial model, whose coefficients were obtained by fitting data from Finite Element-based simulation. The design of the single rice grain sample holder and the experimental setup are also shown. The measurement of single rice grains in this study is more precise than measurement of conventional bulk rice grains, because the random air gap present in bulk rice grains is excluded. PMID:23493127
Deconvolution single shot multibox detector for supermarket commodity detection and classification
NASA Astrophysics Data System (ADS)
Li, Dejian; Li, Jian; Nie, Binling; Sun, Shouqian
2017-07-01
This paper proposes an image detection model to detect and classify commodities on supermarket shelves. Because the features directly affect the accuracy of the final classification, feature maps are combined to merge high-level features with low-level features. Fixed anchors are then set on those feature maps, and finally the label and position of each commodity are generated by box regression and classification. We propose a model named Deconvolution Single Shot MultiBox Detector and evaluated it on 300 images photographed on real supermarket shelves. Following the same protocol as other recent methods, the results showed that our model outperformed other baseline methods.
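Box regression in SSD-style detectors typically predicts offsets of a box relative to a fixed anchor rather than raw coordinates. The Python sketch below shows the common center/size parameterization and its inverse; the original SSD also divides the offsets by fixed "variance" constants, which are omitted here, and it is an assumption that the Deconvolution SSD model uses this exact encoding. The anchor and box values are hypothetical.

```python
import math


def encode(gt, anchor):
    """Regression targets of a ground-truth box relative to an anchor; boxes are (cx, cy, w, h)."""
    gx, gy, gw, gh = gt
    ax, ay, aw, ah = anchor
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode(t, anchor):
    """Invert encode: turn predicted offsets back into a box (cx, cy, w, h)."""
    tx, ty, tw, th = t
    ax, ay, aw, ah = anchor
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))


anchor = (0.45, 0.55, 0.25, 0.25)  # hypothetical anchor in normalized image coordinates
box = (0.50, 0.50, 0.20, 0.40)     # hypothetical labeled commodity box
offsets = encode(box, anchor)
print(offsets, decode(offsets, anchor))
```

Predicting log-scale width and height offsets keeps the targets for large and small commodities on a comparable scale.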
Saunders, Christina T; Blume, Jeffrey D
2017-10-26
Mediation analysis explores the degree to which an exposure's effect on an outcome is diverted through a mediating variable. We describe a classical regression framework for conducting mediation analyses in which estimates of causal mediation effects and their variance are obtained from the fit of a single regression model. The vector of changes in exposure pathway coefficients, which we named the essential mediation components (EMCs), is used to estimate standard causal mediation effects. Because these effects are often simple functions of the EMCs, an analytical expression for their model-based variance follows directly. Given this formula, it is instructive to revisit the performance of routinely used variance approximations (e.g., delta method and resampling methods). Requiring the fit of only one model reduces the computation time required for complex mediation analyses and permits the use of a rich suite of regression tools that are not easily implemented on a system of three equations, as would be required in the Baron-Kenny framework. Using data from the BRAIN-ICU study, we provide examples to illustrate the advantages of this framework and compare it with the existing approaches. © The Author 2017. Published by Oxford University Press.
Zhuo, Lin; Tao, Hong; Wei, Hong; Chengzhen, Wu
2016-01-01
We tried to establish compatible carbon content models of individual trees for a Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) plantation in Fujian province, southeast China. In general, compatibility requires that the sum of components equal the whole tree, meaning that the sum of percentages calculated from component equations should equal 100%. Thus, we used multiple approaches to simulate carbon content in boles, branches, foliage, roots and whole individual trees. The approaches included (i) single optimal fitting (SOF), (ii) nonlinear adjustment in proportion (NAP) and (iii) nonlinear seemingly unrelated regression (NSUR). These approaches were used in combination with variables relating diameter at breast height (D) and tree height (H), such as D, D²H, DH and D&H (where D&H means two separate variables in a bivariate model). Power, exponential and polynomial functions were tested, and a new general function model was proposed by this study. Weighted least squares regression models were employed to eliminate heteroscedasticity. Model performance was evaluated using mean residuals, residual variance, mean square error and the determination coefficient. The results indicated that models with two dimensional variables (DH, D²H and D&H) were always superior to those with a single variable (D). The D&H variable combination was found to be the most useful predictor. Of all the approaches, SOF could establish a single optimal model separately, but its estimates deviated because of existing incompatibilities, while NAP and NSUR could ensure prediction compatibility. We also found that the new general model had better accuracy than the others. In conclusion, we recommend that the new general model be used to estimate carbon content for Chinese fir and considered for other vegetation types as well. PMID:26982054
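Proportional adjustment is one simple way to enforce compatibility: each separately estimated component is rescaled by the ratio of the whole-tree estimate to the sum of the component estimates. The sketch below assumes this basic ratio form and uses hypothetical numbers; the paper's NAP procedure may embed the proportions in its nonlinear equations differently.

```python
def adjust_in_proportion(components, total):
    """Rescale component estimates so that they sum exactly to the whole-tree estimate."""
    s = sum(components.values())
    return {name: value * total / s for name, value in components.items()}


# Hypothetical separately fitted estimates for one tree (kg C); components sum to 50, not 55.
parts = {"bole": 30.0, "branch": 6.0, "foliage": 4.0, "root": 10.0}
whole = 55.0
adjusted = adjust_in_proportion(parts, whole)
print(adjusted, sum(adjusted.values()))
```

The relative shares of the components are preserved; only their common scale changes.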
Wildlife tradeoffs based on landscape models of habitat preference
Loehle, C.; Mitchell, M.S.; White, M.
2000-01-01
Wildlife tradeoffs based on landscape models of habitat preference were presented. Multiscale logistic regression models were used, and based on these models a spatial optimization technique was applied to generate optimal maps. The tradeoffs were analyzed by gradually increasing the weighting on a single species in the objective function over a series of simulations. Results indicated that the efficiency of habitat management for species diversity could be maximized for small landscapes by incorporating spatial context.
Cost Estimation of Naval Ship Acquisition.
1983-12-01
...one a 9-subsystem model, the other a single total cost model. The models were developed using the linear least squares regression technique with... Keywords: Cost estimation; Acquisition; Parametric cost estimate; linear...
Geographically weighted regression and multicollinearity: dispelling the myth
NASA Astrophysics Data System (ADS)
Fotheringham, A. Stewart; Oshan, Taylor M.
2016-10-01
Geographically weighted regression (GWR) extends the familiar regression framework by estimating a set of parameters for any number of locations within a study area, rather than producing a single parameter estimate for each relationship specified in the model. Recent literature has suggested that GWR is highly susceptible to the effects of multicollinearity between explanatory variables and has proposed a series of local measures of multicollinearity as an indicator of potential problems. In this paper, we employ a controlled simulation to demonstrate that GWR is in fact very robust to the effects of multicollinearity. Consequently, the contention that GWR is highly susceptible to multicollinearity issues needs rethinking.
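The local estimation step in GWR can be sketched as a weighted least squares fit at each location, with weights that decay with distance. Here a Gaussian kernel with a fixed bandwidth is assumed, and the function name `gwr_at` and the two-cluster data are hypothetical; bandwidth selection and the paper's multicollinearity simulation are not shown.

```python
import math


def gwr_at(u, v, coords, xs, ys, bandwidth):
    """Local WLS estimates (b0, b1) at location (u, v) with Gaussian kernel weights."""
    w = [math.exp(-0.5 * (math.hypot(cx - u, cy - v) / bandwidth) ** 2) for cx, cy in coords]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Hypothetical data: slope 1 in the western cluster, slope 3 in the eastern cluster.
coords = [(0.0, float(j)) for j in range(5)] + [(10.0, float(j)) for j in range(5)]
xs = [1.0, 2.0, 3.0, 4.0, 5.0] * 2
ys = [2 + 1 * x for x in xs[:5]] + [2 + 3 * x for x in xs[5:]]
print(gwr_at(0.0, 2.0, coords, xs, ys, bandwidth=1.0))
print(gwr_at(10.0, 2.0, coords, xs, ys, bandwidth=1.0))
```

A single global regression on the same data would return one compromise slope of 2, whereas the local fits recover the different slopes in each region.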
Rapid prediction of single green coffee bean moisture and lipid content by hyperspectral imaging.
Caporaso, Nicola; Whitworth, Martin B; Grebby, Stephen; Fisk, Ian D
2018-06-01
Hyperspectral imaging (1000-2500 nm) was used for rapid prediction of moisture and total lipid content in intact green coffee beans on a single bean basis. Arabica and Robusta samples from several growing locations were scanned using a "push-broom" system. Hypercubes were segmented to select single beans, and average spectra were measured for each bean. Partial Least Squares regression was used to build quantitative prediction models on single beans (n = 320-350). The models exhibited good performance and acceptable prediction errors of ∼0.28% for moisture and ∼0.89% for lipids. This study represents the first time that HSI-based quantitative prediction models have been developed for coffee, and specifically green coffee beans. In addition, this is the first attempt to build such models using single intact coffee beans. The composition variability between beans was studied, and fat and moisture distribution were visualized within individual coffee beans. This rapid, non-destructive approach could have important applications for research laboratories, breeding programmes, and for rapid screening for industry.
Bayesian Travel Time Inversion adopting Gaussian Process Regression
NASA Astrophysics Data System (ADS)
Mauerberger, S.; Holschneider, M.
2017-12-01
A major application in seismology is the determination of seismic velocity models. Travel time measurements place an integral constraint on the velocity between source and receiver. We provide insight into travel time inversion from a correlation-based Bayesian point of view. To this end, the concept of Gaussian process regression is adopted to estimate a velocity model. The non-linear travel time integral is approximated by a first-order Taylor expansion. A heuristic covariance describes correlations amongst observations and the a priori model. That approach enables us to assess a proxy of the Bayesian posterior distribution at ordinary computational cost. Neither multidimensional numerical integration nor excessive sampling is necessary. Instead of stacking the data, we suggest building the posterior distribution progressively. Incorporating only a single piece of evidence at a time accounts for the deficit of linearization. As a result, the most probable model is given by the posterior mean, whereas uncertainties are described by the posterior covariance. As a proof of concept, a synthetic, purely 1-D model is addressed. A single source accompanied by multiple receivers is considered on top of a model comprising a discontinuity. We consider travel times of both phases (direct and reflected wave) corrupted by noise. The regions left and right of the interface are assumed independent, with the squared exponential kernel serving as covariance.
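Building the posterior one observation at a time can be sketched for any observation that is linear in the model, d = g·v + noise, which is what a first-order linearization of the travel time integral provides. In this Python sketch the model is a slowness profile on a 1-D grid with a squared exponential prior covariance, and each "ray" is a hypothetical vector of path lengths per grid cell; the grid, kernel parameters, travel times and function names are assumptions, and the paper's heuristic covariance and discontinuity handling are not reproduced.

```python
import math


def se_kernel(a, b, var=1.0, length=1.0):
    """Squared exponential covariance between grid positions a and b."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def gp_prior(grid, mean=0.0, var=1.0, length=1.0):
    """Prior mean vector and covariance matrix of the model on the grid."""
    m = [mean] * len(grid)
    K = [[se_kernel(a, b, var, length) for b in grid] for a in grid]
    return m, K


def assimilate(m, K, g, d, noise_var):
    """Condition N(m, K) on one scalar observation d = g . v + e, with e ~ N(0, noise_var)."""
    Kg = [sum(kij * gj for kij, gj in zip(row, g)) for row in K]  # K g (K is symmetric)
    s = sum(gi * kgi for gi, kgi in zip(g, Kg)) + noise_var       # predictive variance of d
    innov = d - sum(gi * mi for gi, mi in zip(g, m))              # observed minus predicted
    n = len(m)
    m_new = [m[i] + Kg[i] * innov / s for i in range(n)]
    K_new = [[K[i][j] - Kg[i] * Kg[j] / s for j in range(n)] for i in range(n)]
    return m_new, K_new


grid = [0.5 * i for i in range(9)]                 # hypothetical 1-D cell centers (km)
m, K = gp_prior(grid, mean=0.25, var=0.01, length=1.0)
ray_a = [0.5] * 4 + [0.0] * 5                      # hypothetical path length per cell (km)
ray_b = [0.0] * 3 + [0.5] * 6
for ray, t_obs in [(ray_a, 0.55), (ray_b, 0.70)]:  # hypothetical travel times (s)
    m, K = assimilate(m, K, ray, t_obs, noise_var=1e-4)
print([round(v, 3) for v in m])
```

For a purely linear model the order of assimilation does not change the result; the benefit of sequential updating described in the abstract presumably comes from re-linearizing around the current posterior between updates, which this sketch does not do.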
Determinants of single family residential water use across scales in four western US cities.
Chang, Heejun; Bonnette, Matthew Ryan; Stoker, Philip; Crow-Miller, Britt; Wentz, Elizabeth
2017-10-15
A growing body of literature examines urban water sustainability, with increasing evidence that locally based physical and social spatial interactions contribute to water use. These studies, however, are based on single-city analyses and often fail to consider whether these interactions occur more generally. We carry out a multi-city comparison using a common set of spatially explicit water, socioeconomic, and biophysical data. We investigate the relative importance of variables for explaining the variations of single family residential (SFR) water use at Census Block Group (CBG) and Census Tract (CT) scales in four representative western US cities (Austin, Phoenix, Portland, and Salt Lake City), which cover a wide range of climate and development density. We used both ordinary least squares regression and spatial error regression models to identify the influence of spatial dependence on water use patterns. Our results show that older downtown areas have lower water use than newer suburban areas in all four cities. Tax assessed value and building age are the main determinants of SFR water use across the four cities regardless of the scale. Impervious surface area becomes an important variable for summer water use in all cities, and it is important in all seasons for arid environments such as Phoenix. CT-level analysis shows better model predictability than CBG analysis. In all cities, seasons, and spatial scales, spatial error regression models better explain the variations of SFR water use. Such a spatially varying relationship of urban water consumption provides additional evidence for the need to integrate urban land use planning and municipal water planning. Copyright © 2017 Elsevier B.V. All rights reserved.
Mohammed, Mohammed A; Manktelow, Bradley N; Hofer, Timothy P
2016-04-01
There is interest in deriving case-mix adjusted standardised mortality ratios so that comparisons between healthcare providers, such as hospitals, can be undertaken in the controversial belief that variability in standardised mortality ratios reflects quality of care. Typically standardised mortality ratios are derived using a fixed effects logistic regression model, without a hospital term in the model. This fails to account for the hierarchical structure of the data - patients nested within hospitals - and so a hierarchical logistic regression model is more appropriate. However, four methods have been advocated for deriving standardised mortality ratios from a hierarchical logistic regression model, but their agreement is not known and neither do we know which is to be preferred. We found significant differences between the four types of standardised mortality ratios because they reflect a range of underlying conceptual issues. The most subtle issue is the distinction between asking how an average patient fares in different hospitals versus how patients at a given hospital fare at an average hospital. Since the answers to these questions are not the same and since the choice between these two approaches is not obvious, the extent to which profiling hospitals on mortality can be undertaken safely and reliably, without resolving these methodological issues, remains questionable. © The Author(s) 2012.
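The conventional calculation described here, dividing observed deaths by the sum of predicted death probabilities from a fixed-effects logistic model fitted without a hospital term, can be sketched as follows. The coefficients, risk scores and function names are hypothetical, and the four hierarchical variants discussed in the abstract differ in how the expected (and sometimes observed) counts are defined, so they are not shown.

```python
import math


def predicted_prob(risk, b0, b1):
    """Death probability from a fixed-effects logistic model with one case-mix covariate."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * risk)))


def smr(died, probs):
    """Standardised mortality ratio: observed deaths divided by expected deaths."""
    return sum(died) / sum(probs)


# Hypothetical patients at one hospital: (died 0/1, case-mix risk score).
patients = [(1, 2.0), (0, 0.5), (0, -1.0), (1, 1.5), (0, 0.0)]
B0, B1 = -1.0, 0.8  # hypothetical coefficients from a model without a hospital term
probs = [predicted_prob(r, B0, B1) for _, r in patients]
print(round(smr([d for d, _ in patients], probs), 3))
```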
Optimizing separate phase light hydrocarbon recovery from contaminated unconfined aquifers
NASA Astrophysics Data System (ADS)
Cooper, Grant S.; Peralta, Richard C.; Kaluarachchi, Jagath J.
A modeling approach is presented that optimizes separate phase recovery of light non-aqueous phase liquids (LNAPL) for a single dual-extraction well in a homogeneous, isotropic unconfined aquifer. A simulation/regression/optimization (S/R/O) model is developed to predict, analyze, and optimize the oil recovery process. The approach combines detailed simulation, nonlinear regression, and optimization. The S/R/O model utilizes nonlinear regression equations describing system response to time-varying water pumping and oil skimming. Regression equations are developed for residual oil volume and free oil volume. The S/R/O model determines optimized time-varying (stepwise) pumping rates which minimize residual oil volume and maximize free oil recovery while causing free oil volume to decrease a specified amount. This S/R/O modeling approach implicitly immobilizes the free product plume by reversing the water table gradient while achieving containment. Application to a simple representative problem illustrates the S/R/O model utility for problem analysis and remediation design. When compared with the best steady pumping strategies, the optimal stepwise pumping strategy improves free oil recovery by 11.5% and reduces the amount of residual oil left in the system due to pumping by 15%. The S/R/O model approach offers promise for enhancing the design of free phase LNAPL recovery systems and to help in making cost-effective operation and management decisions for hydrogeologists, engineers, and regulators.
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions
Fernandes, Bruno J. T.; Roque, Alexandre
2018-01-01
Height and weight are measurements used to track nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a combination of these factors may not allow accurate estimation or measurement; in those cases, height and weight can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations whose coefficients are obtained using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than those obtained with conventional linear regressions. In all cases, the predictions are insensitive to ethnicity, and to gender if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
ERIC Educational Resources Information Center
Luna, Andrew L.
2007-01-01
This study used two multiple regression analyses to develop an explanatory model and to determine which model might best explain faculty salaries. The central purpose of the study was to determine whether a single market ratio variable was a stronger predictor of faculty salaries than dummy variables representing various disciplines.…
Zhang, J; Feng, J-Y; Ni, Y-L; Wen, Y-J; Niu, Y; Tamba, C L; Yue, C; Song, Q; Zhang, Y-M
2017-06-01
Multilocus genome-wide association studies (GWAS) have become the state-of-the-art procedure to identify quantitative trait nucleotides (QTNs) associated with complex traits. However, implementing a multilocus model in GWAS is still difficult. In this study, we integrated least angle regression with empirical Bayes to perform multilocus GWAS under polygenic background control. We used a model transformation algorithm that whitened the covariance matrix of the polygenic matrix K and environmental noise. Markers on one chromosome were included simultaneously in a multilocus model, and least angle regression was used to select the most potentially associated single-nucleotide polymorphisms (SNPs), whereas the markers on the other chromosomes were used to calculate the kinship matrix as polygenic background control. The selected SNPs in the multilocus model were further tested for association with the trait by empirical Bayes and a likelihood ratio test. We refer to this method as pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes). Results from simulation studies showed that pLARmEB was more powerful in QTN detection and more accurate in QTN effect estimation, had a lower false positive rate, and required less computing time than the Bayesian hierarchical generalized linear model, efficient mixed model association (EMMA), and least angle regression plus empirical Bayes. pLARmEB, the multilocus random-SNP-effect mixed linear model and the fast multilocus random-SNP-effect EMMA methods had almost equal power of QTN detection in simulation experiments. However, only pLARmEB identified 48 previously reported genes for 7 flowering time-related traits in Arabidopsis thaliana.
Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.
Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A
2016-01-01
Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
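After multiple imputation, estimates from the m completed datasets are commonly combined with Rubin's rules: the pooled estimate is the mean, and the total variance adds the average within-imputation variance to (1 + 1/m) times the between-imputation variance. The abstract does not spell out the pooling step, so treat this Python sketch (hypothetical function name and numbers) as the standard textbook rule rather than the authors' exact procedure.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m completed-data estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    qbar = mean(estimates)    # pooled point estimate
    w = mean(variances)       # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b   # total variance
    return qbar, t, math.sqrt(t)


# Hypothetical log-odds ratios and their variances from m = 3 imputed datasets.
print(rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.06]))
```

The between-imputation term is what lets the pooled standard error reflect uncertainty about the censored covariate values, which single imputation ignores.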
NASA Astrophysics Data System (ADS)
Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos
2016-08-01
This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed from Normalized Difference Vegetation Indices (NDVI) derived from low-resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. Both machine learning methods detected the same period of high influence, stretching from March to June. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
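For readers unfamiliar with the predictor, NDVI is computed per pixel from near-infrared (NIR) and red reflectance using the standard formula below. The sketch shows only that formula; how the study aggregated NDVI into its Single, Incremental, and Targeted predictors is not described in the abstract, so none of those are reproduced.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    # Ranges from -1 to 1; dense green vegetation gives high values.
    return (nir - red) / total

value = ndvi(0.5, 0.1)  # hypothetical reflectances, about 0.67
```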
Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina
2013-01-01
Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly contain both linear and nonlinear components, and a single model may not be sufficient to capture all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) models for crime rate forecasting. SVR is robust with small training sets and high-dimensional problems, while ARIMA can model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, and ARIMA is not robust when applied to small data sets. To overcome these problems, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast property crime rates in the United States based on economic indicators. The experimental results show that the proposed hybrid model produces more accurate forecasts than the individual models. PMID:23766729
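The abstract uses particle swarm optimization (PSO) to tune SVR and ARIMA parameters but does not give the PSO variant or its settings. Below is a minimal global-best PSO, assuming common textbook settings (inertia `w=0.7`, acceleration coefficients `c1=c2=1.5`). A toy quadratic stands in for the forecasting error the study would minimize; in a real application the objective would be something like the validation error of an SVR fitted with candidate hyperparameters, which is not shown here.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimizer over a box."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    best_i = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[best_i][:], pbest_val[best_i]  # swarm's best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy stand-in for a forecast-error objective, minimized at (1, -2).
best, err = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                         [(-5, 5), (-5, 5)])
```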
Rasmussen, Patrick P.; Ziegler, Andrew C.
2003-01-01
The sanitary quality of water and its use as a public-water supply and for recreational activities, such as swimming, wading, boating, and fishing, can be evaluated on the basis of fecal coliform and Escherichia coli (E. coli) bacteria densities. This report describes the overall sanitary quality of surface water in selected Kansas streams, the relation between fecal coliform and E. coli, the relation between turbidity and bacteria densities, and how continuous bacteria estimates can be used to evaluate the water-quality conditions in selected Kansas streams. Samples for fecal coliform and E. coli were collected at 28 surface-water sites in Kansas. Of the 318 samples collected, 18 percent exceeded the current Kansas Department of Health and Environment (KDHE) secondary contact recreational, single-sample criterion for fecal coliform (2,000 colonies per 100 milliliters of water). Of the 219 samples collected during the recreation months (April 1 through October 31), 21 percent exceeded the current (2003) KDHE single-sample fecal coliform criterion for secondary contact recreation (2,000 colonies per 100 milliliters of water) and 36 percent exceeded the U.S. Environmental Protection Agency (USEPA) recommended single-sample primary contact recreational criterion for E. coli (576 colonies per 100 milliliters of water). Comparisons of fecal coliform and E. coli criteria indicated that more than one-half of the streams sampled could exceed USEPA recommended E. coli criteria more frequently than the current KDHE fecal coliform criteria. In addition, the ratios of E. coli to fecal coliform (EC/FC) were smallest for sites with slightly saline water (specific conductance greater than 1,000 microsiemens per centimeter at 25 degrees Celsius), indicating that E. coli may not be a good indicator of sanitary quality for those streams. Enterococci bacteria may provide a more accurate assessment of the potential for swimming-related illnesses in these streams.
Ratios of EC/FC and linear regression models were developed for estimating E. coli densities on the basis of measured fecal coliform densities for six individual and six groups of surface-water sites. Regression models developed for the six individual surface-water sites and six groups of sites explain at least 89 percent of the variability in E. coli densities. The EC/FC ratios and regression models are site specific and make it possible to convert historic fecal coliform bacteria data to estimated E. coli densities for the selected sites. The EC/FC ratios can be used to estimate E. coli for any range of historical fecal coliform densities, and in some cases with less error than the regression models. The basin- and statewide regression models explained at least 93 percent of the variance and best represent the sites where a majority of the data used to develop the models were collected (Kansas and Little Arkansas Basins). Comparison of the current (2003) KDHE geometric-mean primary contact criterion for fecal coliform bacteria of 200 col/100 mL to the 2002 USEPA recommended geometric-mean criterion of 126 col/100 mL for E. coli results in an EC/FC ratio of 0.63. The geometric-mean EC/FC ratio for all sites except Rattlesnake Creek (site 21) is 0.77, indicating that considerably more than 63 percent of the fecal coliform is E. coli. This potentially could lead to more exceedances of the recommended E. coli criterion, where the water now meets the current (2003) 200-col/100 mL fecal coliform criterion. In this report, turbidity was found to be a reliable estimator of bacteria densities. Regression models are provided for estimating fecal coliform and E. coli bacteria densities using continuous turbidity measurements. Prediction intervals also are provided to show the uncertainty associated with using the regression models. Eighty percent of all measured sample densities and individual turbidity-based estimates from the regression models were in agreement as exceedi
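The 0.63 figure follows directly from dividing the two geometric-mean criteria, and a site ratio is applied by simple multiplication. The sketch below reproduces that arithmetic. The helper name `estimate_ec` is hypothetical, and because the report stresses that the ratios are site specific, the all-site value 0.77 is used only to illustrate the comparison.

```python
# Geometric-mean criteria quoted in the report (colonies per 100 mL).
KDHE_FC_CRITERION = 200   # 2003 KDHE fecal coliform, primary contact
USEPA_EC_CRITERION = 126  # 2002 USEPA recommended E. coli

# Ratio implied by treating the two criteria as equivalent.
criterion_ratio = USEPA_EC_CRITERION / KDHE_FC_CRITERION  # 0.63

def estimate_ec(fc_density, ec_fc_ratio):
    """Hypothetical helper: estimate E. coli density from a fecal coliform density."""
    return ec_fc_ratio * fc_density

# Reported geometric-mean EC/FC ratio for all sites except Rattlesnake Creek.
est = estimate_ec(KDHE_FC_CRITERION, 0.77)  # about 154 col/100 mL
exceeds = est > USEPA_EC_CRITERION
```

At the fecal coliform criterion of 200 col/100 mL, a ratio of 0.77 gives an estimated 154 col/100 mL of E. coli, above the 126 col/100 mL criterion; this is the mechanism behind the report's warning about additional exceedances.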
Optimal weighted combinatorial forecasting model of QT dispersion of ECGs in Chinese adults.
Wen, Zhang; Miao, Ge; Xinlei, Liu; Minyi, Cen
2016-07-01
This study aims to provide a scientific basis for unifying the reference value standard of QT dispersion of ECGs in Chinese adults. Three predictive models (a regression model, a principal component model, and an artificial neural network model) are combined to establish the optimal weighted combination model. The optimal weighted combination model and the single models are verified and compared. The optimal weighted combination model can reduce the prediction risk of a single model and improve prediction precision. The geographical distribution of reference values for QT dispersion in Chinese adults was mapped using kriging. When the geographical factors of a particular area are known, the reference value of QT dispersion for Chinese adults in that area can be estimated with the optimal weighted combinatorial model, and the reference value anywhere in China can also be read from the geographical distribution map.
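The abstract does not state how the optimal weights are chosen. One classical criterion, assumed here, picks weights that sum to one and minimize the variance of the combined forecast error, giving w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the second-moment matrix of the individual models' errors. The functions below sketch that rule only; they do not fit the regression, principal component, or neural network models, and their names are illustrative.

```python
import numpy as np

def optimal_combination_weights(errors):
    """Minimum-error-variance weights (summing to 1) for combining forecasts.

    errors: array of shape (n_obs, n_models) holding each model's in-sample
    errors (actual minus fitted). Returns one weight per model.
    """
    e = np.asarray(errors, dtype=float)
    sigma = e.T @ e / e.shape[0]        # second-moment matrix of the errors
    ones = np.ones(sigma.shape[0])
    raw = np.linalg.solve(sigma, ones)  # proportional to inv(sigma) @ 1
    return raw / raw.sum()              # normalize so the weights sum to 1

def combine(forecasts, weights):
    """Weighted combination of forecasts with shape (n_obs, n_models)."""
    return np.asarray(forecasts, dtype=float) @ weights

# Two hypothetical models with uncorrelated errors; the second is noisier.
w = optimal_combination_weights([[1, 2], [-1, 2], [1, -2], [-1, -2]])
```

Without a non-negativity constraint these weights can turn negative when model errors are strongly correlated; whether the study constrained them is not stated.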
Spectral Learning for Supervised Topic Models.
Ren, Yong; Wang, Yining; Zhu, Jun
2018-03-01
Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on variational approximation or Monte Carlo sampling, which often suffer from local minima. Spectral methods have been applied to learn unsupervised topic models, such as latent Dirichlet allocation (LDA), with provable guarantees. This paper investigates the possibility of applying spectral methods to recover the parameters of supervised LDA (sLDA). We first present a two-stage spectral method, which recovers the parameters of LDA followed by a power update method to recover the regression model parameters. Then, we further present a single-phase spectral algorithm to jointly recover the topic distribution matrix as well as the regression weights. Our spectral algorithms are provably correct and computationally efficient. We prove a sample complexity bound for each algorithm and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the spectral algorithms. In fact, our results on a large-scale review rating dataset demonstrate that our single-phase spectral algorithm alone achieves comparable or even better performance than state-of-the-art methods, while previous work on spectral methods has rarely reported such promising performance.
Using Multilevel Modeling in Language Assessment Research: A Conceptual Introduction
ERIC Educational Resources Information Center
Barkaoui, Khaled
2013-01-01
This article critiques traditional single-level statistical approaches (e.g., multiple regression analysis) to examining relationships between language test scores and variables in the assessment setting. It highlights the conceptual, methodological, and statistical problems associated with these techniques in dealing with multilevel or nested…
Anastasopoulou, Panagiota; Tubic, Mirnes; Schmidt, Steffen; Neumann, Rainer; Woll, Alexander; Härtel, Sascha
2014-01-01
The measurement of activity energy expenditure (AEE) via accelerometry is the most commonly used objective method for assessing human daily physical activity and has gained increasing importance in medical, sports, and psychological research in recent years. The purpose of this study was to determine which of two procedures, a single regression or an activity-based approach, more accurately determines the energy cost of the most common everyday activities. For this we used a device that utilizes single regression models (GT3X, ActiGraph Manufacturing Technology Inc., FL., USA) and a device using activity-dependent calculation models (move II, movisens GmbH, Karlsruhe, Germany). Nineteen adults (11 male, 8 female; 30.4±9.0 years) wore the activity monitors attached to the waist and a portable indirect calorimeter (IC) as the reference measure for AEE while performing several typical daily activities. The accuracy of the two devices for estimating AEE was assessed as the mean differences between their output and the reference and evaluated using Bland-Altman analysis. The GT3X overestimated the AEE of walking (GT3X minus reference, 1.26 kcal/min), walking fast (1.72 kcal/min), walking up-/downhill (1.45 kcal/min) and walking upstairs (1.92 kcal/min) and underestimated the AEE of jogging (-1.30 kcal/min) and walking upstairs (-2.46 kcal/min). The errors for move II were smaller than those for GT3X for all activities. The move II overestimated AEE of walking (move II minus reference, 0.21 kcal/min), walking up-/downhill (0.06 kcal/min) and stair walking (upstairs: 0.13 kcal/min; downstairs: 0.29 kcal/min) and underestimated AEE of walking fast (-0.11 kcal/min) and jogging (-0.93 kcal/min). Our data suggest that the activity monitor using activity-dependent calculation models is more appropriate for predicting AEE in daily life than the activity monitor using a single regression model.
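The accuracy figures above are mean differences (device minus reference) from a Bland-Altman analysis. A minimal sketch of the standard computation, the bias plus 95% limits of agreement at bias ± 1.96 SD of the differences, is shown below; the function name and the sample values are made up for illustration.

```python
from statistics import mean, stdev

def bland_altman(device, reference, z=1.96):
    """Bias and limits of agreement for paired measurements (device - reference)."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)   # positive bias means the device overestimates
    sd = stdev(diffs)    # sample SD of the differences
    return bias, (bias - z * sd, bias + z * sd)

# Hypothetical AEE values in kcal/min for one activity.
bias, (lower, upper) = bland_altman([5, 6, 7, 8], [4, 5, 5, 7])
```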
Random regression models using different functions to model milk flow in dairy cows.
Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G
2014-09-12
We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milkings by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using a single-trait random regression model that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and the linear and quadratic effects of cow age at calving were included as fixed effects. A fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, with 7 residual classes, proved the most adequate to describe variations in milk flow and was also the most parsimonious. The heritability of milk flow estimated by the most parsimonious model was of moderate to high magnitude.
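Random regression models like this one regress on Legendre polynomials of a standardized time variable. The sketch rescales the 43 weekly classes to [-1, 1] and builds the covariate matrix with NumPy's `legvander`. Two assumptions apply: "degree" here means the highest polynomial degree (order conventions differ between papers), and NumPy's polynomials are unnormalized, whereas animal-breeding software often uses normalized Legendre polynomials (each P_n scaled by sqrt((2n + 1)/2)). The function name is illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariates(t, degree, t_min, t_max):
    """Legendre covariates P_0..P_degree of a time variable rescaled to [-1, 1]."""
    t = np.asarray(t, dtype=float)
    x = 2.0 * (t - t_min) / (t_max - t_min) - 1.0  # map [t_min, t_max] onto [-1, 1]
    return legendre.legvander(x, degree)           # shape (len(t), degree + 1)

# Covariate matrix for weekly classes 1..43 with polynomials up to degree 3.
Z = legendre_covariates(np.arange(1, 44), 3, 1, 43)
```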
Julian, Samuel; Burnham, Carey-Ann D.; Sellenriek, Patricia; Shannon, William D.; Hamvas, Aaron; Tarr, Phillip I.; Warner, Barbara B.
2016-01-01
Objectives Infections cause significant morbidity and mortality in neonatal intensive care units (NICUs). The association between nursery design and nosocomial infections has not been delineated. We hypothesized that rates of colonization by methicillin-resistant Staphylococcus aureus (MRSA), late-onset sepsis, and mortality are reduced in single-patient rooms. Design Retrospective cohort study. Setting NICU in a tertiary referral center. Methods Our NICU is organized into single-patient and open-unit rooms. Clinical datasets including bed location and microbiology results were examined over a 29-month period. Differences in outcomes between bed configurations were determined by Chi-square and Cox regression. Patients All NICU patients. Results Among 1823 patients representing 55,166 patient-days, single-patient and open-unit models had similar incidences of MRSA colonization and MRSA colonization-free survival times. Average daily census was associated with MRSA colonization rates only in single-patient rooms (hazard ratio 1.31, p=0.039), while hand hygiene compliance on room entry and exit was associated with lower colonization rates independent of bed configuration (hazard ratios 0.834 and 0.719 per 1% higher compliance, respectively). Late-onset sepsis rates were similar in single-patient and open-unit models as were sepsis-free survival and the combined outcome of sepsis or death. After controlling for demographic, clinical and unit-based variables, multivariate Cox regression demonstrated that bed configuration had no effect on MRSA colonization, late-onset sepsis, or mortality. Conclusions MRSA colonization rate was impacted by hand hygiene compliance, regardless of room configuration, while average daily census only affected infants in single-patient rooms. Single-patient rooms did not reduce the rates of MRSA colonization, late-onset sepsis or death. PMID:26108888
Kennedy, Jeffrey R.; Paretti, Nicholas V.; Veilleux, Andrea G.
2014-01-01
Regression equations, which allow predictions of n-day flood-duration flows for selected annual exceedance probabilities at ungaged sites, were developed using generalized least-squares regression and flood-duration flow frequency estimates at 56 streamgaging stations within a single, relatively uniform physiographic region in the central part of Arizona, between the Colorado Plateau and Basin and Range Province, called the Transition Zone. Drainage area explained most of the variation in the n-day flood-duration annual exceedance probabilities, but mean annual precipitation and mean elevation were also significant variables in the regression models. Standard error of prediction for the regression equations varies from 28 to 53 percent and generally decreases with increasing n-day duration. Outside the Transition Zone there are insufficient streamgaging stations to develop regression equations, but flood-duration flow frequency estimates are presented at select streamgaging stations.
Potential pitfalls when denoising resting state fMRI data using nuisance regression.
Bright, Molly G; Tench, Christopher R; Murphy, Kevin
2017-07-01
In resting state fMRI, it is necessary to remove signal variance associated with noise sources, leaving cleaned fMRI time-series that more accurately reflect the underlying intrinsic brain fluctuations of interest. This is commonly achieved through nuisance regression, in which a noise model of head motion and physiological processes is fit to the fMRI data in a General Linear Model, and the "cleaned" residuals of this fit are used in further analysis. We examine the statistical assumptions and requirements of the General Linear Model, and whether these are met during nuisance regression of resting state fMRI data. Using toy examples and real data we show how pre-whitening, temporal filtering and temporal shifting of regressors impact model fit. Based on our own observations, existing literature, and statistical theory, we make the following recommendations when employing nuisance regression: pre-whitening should be applied to achieve valid statistical inference of the noise model fit parameters; temporal filtering should be incorporated into the noise model to best account for changes in degrees of freedom; temporal shifting of regressors, although merited, should be achieved via optimisation and validation of a single temporal shift. We encourage all readers to make simple, practical changes to their fMRI denoising pipeline, and to regularly assess the appropriateness of the noise model used. By negotiating the potential pitfalls described in this paper, and by clearly reporting the details of nuisance regression in future manuscripts, we hope that the field will achieve more accurate and precise noise models for cleaning the resting state fMRI time-series. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
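In its simplest form, nuisance regression is an ordinary least-squares fit of confound regressors to each voxel's time series, keeping the residuals. The sketch below shows only that baseline: it assumes white noise and therefore omits the pre-whitening and in-model temporal filtering that the authors recommend. It also returns the residual degrees of freedom, because each added regressor consumes one. The function and argument names are hypothetical.

```python
import numpy as np

def nuisance_regress(ts, confounds):
    """OLS nuisance regression for one voxel time series.

    ts:        1-D array of length n_timepoints.
    confounds: array (n_timepoints, n_regressors), e.g. motion parameters.
    Returns the cleaned residuals, fitted coefficients, and residual dof.
    """
    y = np.asarray(ts, dtype=float)
    C = np.asarray(confounds, dtype=float).reshape(len(y), -1)
    X = np.column_stack([np.ones(len(y)), C])     # intercept + noise model
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
    residuals = y - X @ beta                      # the "cleaned" signal
    dof = len(y) - np.linalg.matrix_rank(X)       # each regressor costs one dof
    return residuals, beta, dof
```

By construction the residuals are orthogonal to every regressor, so any signal of interest that correlates with the noise model is removed along with the noise.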
Zheng, Qi; Peng, Limin
2016-01-01
Quantile regression provides a flexible platform for evaluating covariate effects on different segments of the conditional distribution of response. As the effects of covariates may change with quantile level, contemporaneously examining a spectrum of quantiles is expected to have a better capacity to identify variables with either partial or full effects on the response distribution, as compared to focusing on a single quantile. Under this motivation, we study a general adaptively weighted LASSO penalization strategy in the quantile regression setting, where a continuum of quantile indices is considered and coefficients are allowed to vary with the quantile index. We establish the oracle properties of the resulting estimator of the coefficient function. Furthermore, we formally investigate a BIC-type uniform tuning parameter selector and show that it can ensure consistent model selection. Our numerical studies confirm the theoretical findings and illustrate an application of the new variable selection procedure. PMID:28008212
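Quantile regression replaces squared error with the check (pinball) loss ρ_τ(u) = u(τ − 1{u < 0}). The sketch below shows that minimizing this loss over a constant recovers an empirical τ-quantile; it does not implement the adaptively weighted LASSO penalty or the quantile-continuum estimator studied in the paper, and the function names are illustrative.

```python
def check_loss(u, tau):
    """Quantile-regression check loss: u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def best_constant(data, tau):
    """Constant c minimizing the summed check loss of (y - c).

    The objective is piecewise linear with kinks at the data points,
    so searching over the observed values is enough.
    """
    return min(data, key=lambda c: sum(check_loss(y - c, tau) for y in data))

data = [1, 2, 3, 4, 10]
median_fit = best_constant(data, 0.5)  # 3, the median
upper_fit = best_constant(data, 0.9)   # 10, an upper quantile
```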
Learning Supervised Topic Models for Classification and Regression from Crowds.
Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco C
2017-12-01
The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, most annotation tasks are prone to ambiguity and noise and often involve high volumes of documents, which makes learning under a single-annotator assumption unrealistic or impractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches.
Zhou, Qingping; Jiang, Haiyan; Wang, Jianzhou; Zhou, Jianling
2014-10-15
Exposure to high concentrations of fine particulate matter (PM₂.₅) can cause serious health problems because PM₂.₅ contains microscopic solid or liquid droplets that are small enough to be inhaled deep into human lungs. Thus, daily prediction of PM₂.₅ levels is important for regulatory plans that inform the public and restrict social activities in advance when harmful episodes are foreseen. A hybrid EEMD-GRNN (ensemble empirical mode decomposition-general regression neural network) model based on data preprocessing and analysis is proposed in this paper for one-day-ahead prediction of PM₂.₅ concentrations. The EEMD part is used to decompose the original PM₂.₅ data into several intrinsic mode functions (IMFs), while the GRNN part is used to predict each IMF. The hybrid EEMD-GRNN model is trained using input variables obtained from a principal component regression (PCR) model to remove redundancy. These input variables accurately and succinctly reflect the relationships between PM₂.₅ and both air quality and meteorological data. The model is trained with data from January 1 to November 1, 2013 and is validated with data from November 2 to November 21, 2013 in Xi'an, Shaanxi Province, China. The experimental results show that the developed hybrid EEMD-GRNN model outperforms a single GRNN model without EEMD, a multiple linear regression (MLR) model, a PCR model, and a traditional autoregressive integrated moving average (ARIMA) model. The hybrid model, with its fast and accurate results, can be used to develop rapid air quality warning systems. Copyright © 2014 Elsevier B.V. All rights reserved.
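A GRNN (Specht's general regression neural network) is essentially Gaussian-kernel (Nadaraya-Watson) regression: each prediction is a kernel-weighted average of the training targets. The sketch below shows only that GRNN step for one series. The EEMD decomposition into IMFs and the PCR-based input selection are not reproduced, and the smoothing parameter `sigma` is an assumed value that would normally be tuned.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma=1.0):
    """GRNN / Nadaraya-Watson prediction with a Gaussian kernel.

    x_train: (n_train, n_features), or 1-D for a single feature.
    x_query: (n_query, n_features), or 1-D for a single feature.
    sigma:   kernel width (smoothing parameter).
    """
    X = np.asarray(x_train, dtype=float)
    Q = np.asarray(x_query, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if Q.ndim == 1:
        Q = Q[:, None]
    y = np.asarray(y_train, dtype=float)
    # Pattern layer: squared distance from every query to every training pattern.
    d2 = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Shifting by the row minimum avoids all-zero weights; it cancels in the ratio.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    # Summation and output layers: kernel-weighted average of training targets.
    return (w @ y) / w.sum(axis=1)
```

A small `sigma` makes the GRNN behave like a nearest-neighbour lookup, while a large one pulls every prediction toward the mean of the training targets.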
Avalos, Marta; Adroher, Nuria Duran; Lagarde, Emmanuel; Thiessard, Frantz; Grandvalet, Yves; Contrand, Benjamin; Orriols, Ludivine
2012-09-01
Large data sets with many variables provide particular challenges when constructing analytic models. Lasso-related methods provide a useful tool, although one that remains unfamiliar to most epidemiologists. We illustrate the application of lasso methods in an analysis of the impact of prescribed drugs on the risk of a road traffic crash, using a large French nationwide database (PLoS Med 2010;7:e1000366). In the original case-control study, the authors analyzed each exposure separately. We use the lasso method, which can simultaneously perform estimation and variable selection in a single model. We compare point estimates and confidence intervals using (1) a separate logistic regression model for each drug with a Bonferroni correction and (2) lasso shrinkage logistic regression analysis. Shrinkage regression had little effect on (bias corrected) point estimates, but led to less conservative results, noticeably for drugs with moderate levels of exposure. Carbamates, carboxamide derivative and fatty acid derivative antiepileptics, drugs used in opioid dependence, and mineral supplements of potassium showed stronger associations. Lasso is a relevant method for analyzing databases with a large number of exposures and can be recommended as an alternative to conventional strategies.
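Strategy (1) fits one logistic model per drug and widens each confidence interval with a Bonferroni correction. Assuming Wald-type intervals on the log-odds scale, that amounts to replacing the usual 1.96 with the standard normal quantile at 1 − α/(2m) for m drugs, as sketched below. The lasso fit of strategy (2) is not shown, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def bonferroni_or_ci(log_or, se, n_tests, alpha=0.05):
    """Bonferroni-adjusted Wald confidence interval for an odds ratio."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))  # about 1.96 when n_tests == 1
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical drug: log-OR 0.3 with SE 0.1, one of 10 drugs tested.
lower, upper = bonferroni_or_ci(0.3, 0.1, 10)
```

The more exposures are tested separately, the wider each interval becomes, which is the conservativeness the lasso comparison addresses.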
Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng; Tzeng, Jung-Ying; Conneely, Karen N; Guan, Weihua; Kang, Jian; Li, Yun
2016-05-01
DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this study, we propose a penalized functional regression model to impute missing methylation data. By incorporating functional predictors, our model utilizes information from nonlocal probes to improve imputation quality. Here, we compared the performance of our functional model to linear regression and the best single probe surrogate in real data and via simulations. Specifically, we applied different imputation approaches to an acute myeloid leukemia dataset consisting of 194 samples and our method showed higher imputation accuracy, manifested, for example, by a 94% relative increase in information content and up to 86% more CpG sites passing post-imputation filtering. Our simulated association study further demonstrated that our method substantially improves the statistical power to identify trait-associated methylation loci. These findings indicate that the penalized functional regression model is a convenient and valuable imputation tool for methylation data, and it can boost statistical power in downstream epigenome-wide association study (EWAS). © 2016 WILEY PERIODICALS, INC.
Effects of Microstructural Parameters on Creep of Nickel-Base Superalloy Single Crystals
NASA Technical Reports Server (NTRS)
MacKay, Rebecca A.; Gabb, Timothy P.; Nathal, Michael V.
2013-01-01
Microstructure-sensitive creep models have been developed for Ni-base superalloy single crystals. Creep rupture testing was conducted on fourteen single crystal alloys at two applied stress levels at each of two temperatures, 982 and 1093 °C. The variation in creep lives among the different alloys could be explained with regression models containing relatively few microstructural parameters. At 982 °C, gamma-gamma prime lattice mismatch, gamma prime volume fraction, and initial gamma prime size were statistically significant in explaining the creep rupture lives. At 1093 °C, only lattice mismatch and gamma prime volume fraction were significant. These models could explain from 84 to 94 percent of the variation in creep lives, depending on test condition. Longer creep lives were associated with alloys having more negative lattice mismatch, lower gamma prime volume fractions, and finer gamma prime sizes. The gamma-gamma prime lattice mismatch exhibited the strongest influence of all the microstructural parameters at both temperatures. Although a majority of the alloys in this study were stable with respect to topologically close packed (TCP) phases, it appeared that up to approximately 2 vol% TCP phase did not affect the 1093 °C creep lives under applied stresses that produced lives of approximately 200 to 300 h. In contrast, TCP phase contents of approximately 2 vol% were detrimental at lower applied stresses where creep lives were longer. A regression model was also developed for the as-heat treated initial gamma prime size; this model showed that gamma prime solvus temperature, gamma-gamma prime lattice mismatch, and bulk Re content were all statistically significant.
Gender Role Conflict, Interest in Casual Sex, and Relationship Satisfaction Among Gay Men
Sanchez, Fráncisco J.; Bocklandt, Sven; Vilain, Eric
2010-01-01
This study compared single (n = 129) and partnered gay men (n = 114) to determine if they differed in their concerns over traditional masculine roles and interest in casual sex, and to measure the relationship between concerns over masculine roles and interest in casual sex. Additionally, a regression model to predict relationship satisfaction was tested. Participants were recruited at two Southern California Gay Pride festivals. Group comparisons showed single men were more restrictive in their affectionate behavior with other men (effect-size r = .14) and were more interested in casual sex than partnered men (effect-size r = .13); and partnered men were more concerned with being successful, powerful, and competitive than single men (effect-size r = .20). Different masculine roles were predictive of interest in casual sex among the two groups of men. Finally, a hierarchical regression analysis found that interest in casual sex and the length of one’s current relationship served as unique predictors of relationship satisfaction among the partnered gay men (Cohen’s f2 = .52). PMID:20721305
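Cohen's f² expresses explained variance relative to unexplained variance. Assuming the reported f² = .52 refers to the full model, it corresponds to R² = f²/(1 + f²) ≈ 0.34; if it instead refers to the increment from a hierarchical step, the version with a reduced model applies. Both are sketched below with illustrative function names.

```python
def cohens_f2(r2_full, r2_reduced=0.0):
    """Cohen's f^2: (R2_full - R2_reduced) / (1 - R2_full)."""
    if not 0.0 <= r2_full < 1.0:
        raise ValueError("r2_full must lie in [0, 1)")
    return (r2_full - r2_reduced) / (1.0 - r2_full)

def r2_from_f2(f2):
    """Invert f^2 = R^2 / (1 - R^2) when no reduced model is involved."""
    return f2 / (1.0 + f2)

implied_r2 = r2_from_f2(0.52)  # about 0.342
```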
NASA Astrophysics Data System (ADS)
Lin, M.; Yang, Z.; Park, H.; Qian, S.; Chen, J.; Fan, P.
2017-12-01
Impervious surface area (ISA) has become an important indicator for studying urban environments, but mapping ISA at the regional or global scale is still challenging due to the complexity of impervious surface features. The Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS) nighttime light (NTL) data and Moderate Resolution Imaging Spectroradiometer (MODIS) data are the major remote sensing data sources for regional ISA mapping. Many previous studies established a single regression relationship between fractional ISA and NTL, or various indices derived from NTL and MODIS vegetation index (NDVI) data, for regional ISA mapping. However, due to the varying geographical, climatic, and socio-economic characteristics of different cities, the same regression relationship may vary significantly across cities in the same region in terms of both fitting performance (i.e. R2) and rate of change (slope). In this study, we examined the regression relationship between fractional ISA and the Vegetation Adjusted Nighttime light Urban Index (VANUI) for 120 randomly selected cities around the world with a multilevel regression model. We found that there is indeed substantial variability in both R2 (0.68±0.29) and slopes (0.64±0.40) among the individual regressions, which suggests that multilevel/hierarchical models are needed to improve the accuracy of future regional ISA mapping. Further analysis showed that this variability is affected by climate conditions, socio-economic status, and urban spatial structure. However, all these effects are nonlinear rather than linear, and thus could not be modeled explicitly in multilevel linear regression models.
Regression estimators for generic health-related quality of life and quality-adjusted life years.
Basu, Anirban; Manca, Andrea
2012-01-01
To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, and account for features typical of such data such as a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution. First, both a single equation and a 2-part model are presented, along with estimation algorithms based on maximum-likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions that are encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One and 2-part Beta regression models provide flexible approaches to regress the outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcomes distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.
ERIC Educational Resources Information Center
Kapes, Jerome T.; And Others
Three models of multiple regression analysis (MRA), namely single equation, commonality analysis, and path analysis, were applied to longitudinal data from the Pennsylvania Vocational Development Study. Variables influencing weekly income of vocational education students one year after high school graduation were examined: grade point averages (grades…
Stressed and Losing Sleep: Sleep Duration and Perceived Stress among Affluent Adolescent Females
ERIC Educational Resources Information Center
DeSilva Mousseau, Angela M.; Lund, Terese J.; Liang, Belle; Spencer, Renée; Walsh, Jill
2016-01-01
This study examined the relationship between stress and sleep duration for adolescent females from affluent backgrounds. Participants were 218 students attending two independent single-sex secondary schools. Ordinary Least Squares (OLS) regression models (cross-sectional and longitudinal) were run to examine the association between stress and…
Poverty and Material Hardship in Grandparent-Headed Households
ERIC Educational Resources Information Center
Baker, Lindsey A.; Mutchler, Jan E.
2010-01-01
Using the 2001 Survey of Income and Program Participation, the current study examines poverty and material hardship among children living in 3-generation (n = 486), skipped-generation (n = 238), single-parent (n = 2,076), and 2-parent (n = 6,061) households. Multinomial and logistic regression models indicated that children living in…
Dissimilarity based Partial Least Squares (DPLS) for genomic prediction from SNPs.
Singh, Priyanka; Engel, Jasper; Jansen, Jeroen; de Haan, Jorn; Buydens, Lutgarde Maria Celina
2016-05-04
Genomic prediction (GP) allows breeders to select plants and animals based on their breeding potential for desirable traits, without lengthy and expensive field trials or progeny testing. We propose using Dissimilarity-based Partial Least Squares (DPLS) for GP. As a case study, we use the DPLS approach to predict Bacterial wilt (BW) in tomatoes using SNPs as predictors. The DPLS approach was compared with the Genomic Best-Linear Unbiased Prediction (GBLUP) and single-SNP regression with SNP as a fixed effect to assess the performance of DPLS. Eight genomic distance measures were used to quantify relationships between the tomato accessions from the SNPs. Subsequently, each of these distance measures was used to predict the BW using the DPLS prediction model. The DPLS model was found to be robust to the choice of distance measures; similar prediction performances were obtained for each distance measure. DPLS greatly outperformed the single-SNP regression approach, showing that BW is a comprehensive trait dependent on several loci. Next, the performance of the DPLS model was compared to that of GBLUP. Although GBLUP and DPLS are conceptually very different, the prediction quality (PQ) of the DPLS models was similar to the prediction statistics obtained from GBLUP. A considerable advantage of DPLS is that the genotype-phenotype relationship can easily be visualized in a 2-D scatter plot. This so-called score plot gives breeders insight for selecting candidates for their future breeding programs. DPLS is a highly appropriate method for GP. The model prediction performance was similar to that of GBLUP and far better than that of the single-SNP approach. The proposed method can be used in combination with a wide range of genomic dissimilarity measures and genotype representations such as allele-count, haplotypes or allele-intensity values.
Additionally, the data can be insightfully visualized by the DPLS model, allowing for selection of desirable candidates from the breeding experiments. In this study, we have assessed the DPLS performance on a single trait.
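The core idea of DPLS is to describe each accession by its dissimilarities to all other accessions and then run PLS on that representation. The sketch below illustrates this under stated assumptions. Manhattan distance on allele counts (0/1/2) stands in for one possible distance measure, not necessarily one of the eight used in the study. Only the first PLS component is extracted. A 2-D score plot would need a second component, computed the same way after deflating X.

```python
def manhattan(a, b):
    """Manhattan distance between two allele-count vectors (0/1/2 per SNP)."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def dissimilarity_matrix(genotypes, dist=manhattan):
    """Square matrix of pairwise dissimilarities between accessions."""
    return [[dist(g, h) for h in genotypes] for g in genotypes]


def pls1_one_component(X, y):
    """First PLS component of y on X; returns (scores t, fitted y)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [yi - y_mean for yi in y]
    # Weight vector proportional to the covariance of each column with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(wj * wj for wj in w) ** 0.5
    w = [wj / norm for wj in w]
    # Scores: one axis of the score plot used to inspect accessions.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    fitted = [y_mean + q * ti for ti in t]
    return t, fitted
```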
A novel model incorporating two variability sources for describing motor evoked potentials
Goetz, Stefan M.; Luber, Bruce; Lisanby, Sarah H.; Peterchev, Angel V.
2014-01-01
Objective Motor evoked potentials (MEPs) play a pivotal role in transcranial magnetic stimulation (TMS), e.g., for determining the motor threshold and probing cortical excitability. Sampled across the range of stimulation strengths, MEPs outline an input–output (IO) curve, which is often used to characterize the corticospinal tract. More detailed understanding of the signal generation and variability of MEPs would provide insight into the underlying physiology and aid correct statistical treatment of MEP data. Methods A novel regression model is tested using measured IO data of twelve subjects. The model splits MEP variability into two independent contributions, acting on both sides of a strong sigmoidal nonlinearity that represents neural recruitment. Traditional sigmoidal regression with a single variability source after the nonlinearity is used for comparison. Results The distribution of MEP amplitudes varied across different stimulation strengths, violating statistical assumptions in traditional regression models. In contrast to the conventional regression model, the dual variability source model better described the IO characteristics including phenomena such as changing distribution spread and skewness along the IO curve. Conclusions MEP variability is best described by two sources that most likely separate variability in the initial excitation process from effects occurring later on. The new model enables more accurate and sensitive estimation of the IO curve characteristics, enhancing its power as a detection tool, and may apply to other brain stimulation modalities. Furthermore, it extracts new information from the IO data concerning the neural variability—information that has previously been treated as noise. PMID:24794287
Sowande, O S; Oyewale, B F; Iyasere, O S
2010-06-01
The relationships between live weight and eight body measurements of West African Dwarf (WAD) goats were studied using 211 animals under farm conditions. The animals were categorized based on age and sex. Data obtained on height at withers (HW), heart girth (HG), body length (BL), head length (HL), and length of hindquarter (LHQ) were fitted into simple linear, allometric, and multiple-regression models to predict live weight from the body measurements according to age group and sex. Results showed that live weight, HG, BL, LHQ, HL, and HW increased with the age of the animals. In the multiple-regression model, HG and HL best fit the model for goat kids; HG, HW, and HL for goats aged 13-24 months; and HG, LHQ, HW, and HL for goats aged 25-36 months. Coefficients of determination (R(2)) for the linear and allometric models predicting the live weight of WAD goats increased with age for all body measurements, with HG being the most satisfactory single measurement for predicting live weight. Sex had a significant influence on the model, with R(2) values consistently higher in females except in the models for LHQ and HW.
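The abstract does not state the form of the allometric model. A common choice, assumed here, is W = a·HG^b, which becomes linear after taking logarithms: ln W = ln a + b ln HG. The sketch below fits that log-log line by least squares. The girth and weight values used to exercise it are synthetic, not data from the study.

```python
import math


def fit_allometric(girth, weight):
    """Fit W = a * HG**b by least squares on ln W = ln a + b * ln HG."""
    lx = [math.log(g) for g in girth]
    ly = [math.log(w) for w in weight]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
    a = math.exp(my - b * mx)  # back-transform the log-scale intercept
    return a, b


def predict_weight(a, b, girth):
    """Predicted live weight from heart girth under the fitted allometric model."""
    return a * girth ** b
```

Back-transformed predictions from a log-scale fit estimate a geometric rather than arithmetic mean weight. This is one reason R(2) values from the linear and allometric models are not directly comparable.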
Mckay, Garrett; Huang, Wenxi; Romera-Castillo, Cristina; Crouch, Jenna E; Rosario-Ortiz, Fernando L; Jaffé, Rudolf
2017-05-16
The antioxidant capacity and formation of photochemically produced reactive intermediates (RI) were studied for water samples collected from the Florida Everglades with different spatial (marsh versus estuarine) and temporal (wet versus dry season) characteristics. Measured RI included triplet excited states of dissolved organic matter (³DOM*), singlet oxygen (¹O₂), and the hydroxyl radical (•OH). Single and multiple linear regression modeling was performed using a broad range of extrinsic parameters (to predict RI formation rates, R_RI) and intrinsic parameters (to predict RI quantum yields, Φ_RI). Multiple linear regression models consistently led to better predictions of R_RI and Φ_RI for our data set but poor prediction of Φ_RI for a previously published data set,¹ probably because the predictors are intercorrelated (Pearson's r > 0.5). Single linear regression models were built with data compiled from previously published studies (n ≈ 120) in which E2:E3, S, and Φ_RI values were measured, which revealed a high degree of similarity between RI-optical property relationships across DOM samples of diverse sources. This study reveals that •OH formation is, in general, decoupled from ³DOM* and ¹O₂ formation, providing supporting evidence that ³DOM* is not a •OH precursor. Finally, Φ_RI for ¹O₂ and ³DOM* correlated negatively with antioxidant activity (a surrogate for electron donating capacity) for the collected samples, which is consistent with intramolecular oxidation of DOM moieties by ³DOM*.
Developing a Model for Forecasting Road Traffic Accident (RTA) Fatalities in Yemen
NASA Astrophysics Data System (ADS)
Karim, Fareed M. A.; Abdo Saleh, Ali; Taijoobux, Aref; Ševrović, Marko
2017-12-01
The aim of this paper is to develop a model for forecasting RTA fatalities in Yemen. Yearly fatalities were modeled as the dependent variable, and the candidate independent variables were population, number of vehicles, GNP, GDP, and real GDP per capita. All of these variables were found to be highly correlated (r ≈ 0.9). To avoid multicollinearity in the model, the single variable with the highest r value (real GDP per capita) was selected. A simple regression model was developed. Its fit was very good (R2 = 0.916), but the residuals were serially correlated. The Prais-Winsten procedure was used to overcome this violation of the regression assumptions. Data for the 20-year period 1991-2010 were analyzed to build the model, and the model was validated using data for 2011-2013. The historical fit for the period 1991-2011 was very good, and the validation for 2011-2013 also proved accurate.
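The Prais-Winsten correction mentioned above can be sketched in a few steps. First, fit OLS. Next, estimate the AR(1) coefficient ρ from the lag-1 residual products. Then quasi-difference observations 2..T and rescale the first observation by √(1 − ρ²) instead of dropping it. Finally, refit on the transformed data. This is a one-step (non-iterated) version, written under the assumption of a single predictor such as real GDP per capita. Iterated variants repeat the last three steps until ρ converges. The data used to exercise the sketch are synthetic.

```python
import math


def ols_simple(x, y):
    """OLS intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def ls_two_columns(c, x, y):
    """Least squares for y = a*c + b*x (no extra intercept) via the 2x2 normal equations."""
    scc = sum(ci * ci for ci in c)
    sxx = sum(xi * xi for xi in x)
    scx = sum(ci * xi for ci, xi in zip(c, x))
    scy = sum(ci * yi for ci, yi in zip(c, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = scc * sxx - scx * scx
    return (scy * sxx - scx * sxy) / det, (scc * sxy - scx * scy) / det


def prais_winsten(x, y):
    """One-step Prais-Winsten estimate of y = a + b*x with AR(1) errors; returns (a, b, rho)."""
    a0, b0 = ols_simple(x, y)
    e = [yi - a0 - b0 * xi for xi, yi in zip(x, y)]
    # AR(1) coefficient estimated from lag-1 products of the OLS residuals.
    rho = sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(et * et for et in e[:-1])
    if not -1.0 < rho < 1.0:
        raise ValueError("estimated rho must lie strictly between -1 and 1")
    s = math.sqrt(1.0 - rho * rho)
    # Quasi-difference observations 2..T; rescale observation 1 instead of dropping it.
    c = [s] + [1.0 - rho] * (len(x) - 1)  # transformed intercept column
    xs = [s * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    ys = [s * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    a, b = ls_two_columns(c, xs, ys)
    return a, b, rho
```

Keeping the rescaled first observation is what distinguishes Prais-Winsten from the Cochrane-Orcutt transformation, and it matters for short annual series like a 20-year record.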
Medalie, Laura
2014-01-01
Annual and daily concentrations and fluxes of total and dissolved phosphorus, total nitrogen, chloride, and total suspended solids were estimated for 18 monitored tributaries to Lake Champlain by using the Weighted Regressions on Time, Discharge, and Season regression model. Estimates were made for 21 or 23 years, depending on data availability, for the purpose of providing timely and accessible summary reports as stipulated in the 2010 update to the Lake Champlain “Opportunities for Action” management plan. Estimates of concentration and flux were provided for each tributary based on (1) observed daily discharges and (2) a flow-normalizing procedure, which removed the random fluctuations of climate-related variability. The flux bias statistic, an indicator of the ability of the Weighted Regressions on Time, Discharge, and Season regression models to provide accurate representations of flux, showed acceptable bias (less than ±10 percent) for 68 out of 72 models for total and dissolved phosphorus, total nitrogen, and chloride. Six out of 18 models for total suspended solids had moderate bias (between 10 and 30 percent), an expected result given the frequently nonlinear relation between total suspended solids and discharge. One model for total suspended solids with a very high bias was influenced by a single extreme value; however, removal of that value, although reducing the bias substantially, had little effect on annual fluxes.
Retrieval and Mapping of Heavy Metal Concentration in Soil Using Time Series Landsat 8 Imagery
NASA Astrophysics Data System (ADS)
Fang, Y.; Xu, L.; Peng, J.; Wang, H.; Wong, A.; Clausi, D. A.
2018-04-01
Heavy metal pollution is a critical global environmental problem that has long been a concern. The traditional approach of obtaining heavy metal concentrations through field sampling and lab testing is expensive and time consuming. Many related studies use spectrometer data to build a relational model between heavy metal concentration and spectral information and then apply the model to hyperspectral imagery for prediction. However, this approach can hardly map soil metal concentration over an area quickly and accurately because of the discrepancies between spectrometer data and remote sensing imagery. Taking advantage of the easy accessibility of Landsat 8 data, this study uses Landsat 8 imagery to retrieve soil Cu concentration and map its distribution in the study area. To enlarge the spectral information for more accurate retrieval and mapping, 11 single-date Landsat 8 images from 2013-2017 are selected to form a time series. Three regression methods, partial least squares regression (PLSR), artificial neural network (ANN), and support vector regression (SVR), are used for model construction. After an unbiased comparison of these models, the best model is selected to map the Cu concentration distribution. The resulting distribution map shows good spatial autocorrelation and consistency with the locations of mining areas.
Nonlinear-regression groundwater flow modeling of a deep regional aquifer system
Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.
1986-01-01
A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.
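Sequential F testing, as described above, compares each model with the next more complex model in which it is nested. The sketch below uses the standard nested-model F statistic: the drop in residual sum of squares per added parameter, divided by the residual variance of the fuller model. This is an assumption, since the abstract does not give the exact statistic. For a nonlinear regression such as this groundwater model, the F comparison is only approximate because it relies on local linearization. The residual sums of squares used to exercise it are hypothetical.

```python
def nested_f(rss_reduced, p_reduced, rss_full, p_full, n):
    """F statistic for a reduced model nested in a fuller one.

    rss_*: residual sums of squares; p_*: numbers of estimated parameters;
    n: number of observations. Compare with F(p_full - p_reduced, n - p_full).
    """
    numerator = (rss_reduced - rss_full) / (p_full - p_reduced)
    denominator = rss_full / (n - p_full)
    return numerator / denominator


def sequential_f(stages, n):
    """F statistics for each step in a sequence of increasingly complex models.

    stages: list of (rss, n_params), ordered from simplest to most complex.
    """
    return [nested_f(r0, p0, r1, p1, n)
            for (r0, p0), (r1, p1) in zip(stages, stages[1:])]
```

An added complexity (for example, a zone subdivision or thickness variation) is retained only if its F value exceeds the chosen critical value. Those critical values come from F tables or a statistics library.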
Nonlinear-Regression Groundwater Flow Modeling of a Deep Regional Aquifer System
NASA Astrophysics Data System (ADS)
Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.
1986-12-01
Satisfaction of active duty soldiers with family dental care.
Chisick, M C
1997-02-01
In the fall of 1992, a random, worldwide sample of 6,442 married and single parent soldiers completed a self-administered survey on satisfaction with 22 attributes of family dental care. Simple descriptive statistics for each attribute were derived, as was a composite overall satisfaction score using factor analysis. Composite scores were regressed on demographics, annual dental utilization, and access barriers to identify those factors having an impact on a soldier's overall satisfaction with family dental care. Separate regression models were constructed for single parents, childless couples, and couples with children. Results show below-average satisfaction with nearly all attributes of family dental care, with access attributes having the lowest average satisfaction scores. Factors influencing satisfaction with family dental care varied by family type with one exception: dependent dental utilization within the past year contributed positively to satisfaction across all family types.
Hybrid Rocket Performance Prediction with Coupling Method of CFD and Thermal Conduction Calculation
NASA Astrophysics Data System (ADS)
Funami, Yuki; Shimada, Toru
The final purpose of this study is to develop a design tool for hybrid rocket engines. This tool is a computer code that will be used to investigate rocket performance characteristics and unsteady phenomena lasting through the burning time, such as fuel regression or combustion oscillation. When describing phenomena inside a combustion chamber, namely boundary layer combustion, it is difficult to use rigorous models because the computational cost may be too high. Therefore, simple models are required for this calculation. In this study, the quasi-one-dimensional compressible Euler equations for the flowfield inside a chamber and the equation for thermal conduction inside the solid fuel are solved numerically. The energy balance equation at the solid fuel surface is solved to estimate the fuel regression rate. The heat feedback model is Karabeyoglu's model, which depends on total mass flux. The combustion model is either a global single-step reaction model for 4 chemical species or a chemical equilibrium model for 9 chemical species. As a first step, steady-state solutions are reported.
Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing
2016-01-01
Reporting of surgical complications is common, but few reports provide information about the severity of complications or estimate their risk factors, and those that do often lack specificity. We retrospectively analyzed data on 2795 gastric cancer patients who underwent surgery at the Affiliated Hospital of Qingdao University between June 2007 and June 2012. We then established a multivariate logistic regression model to identify risk factors for postoperative complications graded according to the Clavien-Dindo classification system. Twenty-four of 86 variables were identified as statistically significant in univariate logistic regression analysis, and the 11 variables found significant in multivariate analysis were used to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, intraoperative transfusion, Billroth II anastomosis for reconstruction, malnutrition, surgical volume of surgeons, operating time, and age were independent risk factors for postoperative complications after gastrectomy. Based on the logistic regression equation $p = \exp(\sum_i B_i X_i) / (1 + \exp(\sum_i B_i X_i))$, a multivariate logistic regression model predicting the risk of postoperative morbidity was developed: $p = 1 / \left(1 + e^{4.810 - 1.287X_1 - 0.504X_2 - 0.500X_3 - 0.474X_4 - 0.405X_5 - 0.318X_6 - 0.316X_7 - 0.305X_8 - 0.278X_9 - 0.255X_{10} - 0.138X_{11}}\right)$. The accuracy, sensitivity and specificity of the model in predicting postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model, based on the Clavien-Dindo severity grading system and logistic regression analysis, can predict severe morbidity from an individual patient's risk factors and estimate the patient's risks and benefits of gastric surgery as an accurate decision-making tool. It may also serve as a template for developing risk models for other surgical groups.
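The published equation can be evaluated directly: p = 1/(1 + exp(−η)) with η = −4.810 + 1.287X1 + … + 0.138X11, which is the same as the exponent 4.810 − ΣBiXi in the formula above. The abstract does not say how X1 to X11 map to the listed risk factors or how each one is coded. The sketch therefore takes a list of already-coded predictor values in the order X1..X11. It illustrates the arithmetic only and is not a validated clinical calculator.

```python
import math

# Coefficients as reported in the abstract; the mapping of X1..X11 to
# specific risk factors and their coding is not given there.
INTERCEPT = -4.810
COEFFS = [1.287, 0.504, 0.500, 0.474, 0.405, 0.318,
          0.316, 0.305, 0.278, 0.255, 0.138]


def complication_risk(x):
    """Predicted probability of postoperative complications for coded X1..X11."""
    if len(x) != len(COEFFS):
        raise ValueError("expected 11 coded predictor values (X1..X11)")
    eta = INTERCEPT + sum(b * xi for b, xi in zip(COEFFS, x))
    # 1 / (1 + exp(-eta)) equals 1 / (1 + exp(4.810 - sum(B_i * X_i))).
    return 1.0 / (1.0 + math.exp(-eta))
```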
Silver, Matt; Montana, Giovanni
2012-01-01
Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways. We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our “pathways group lasso with adaptive weights” (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets. In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small. PMID:22499682
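Group lasso selects whole groups of coefficients at once. In proximal-gradient solvers, the key step is group-wise soft-thresholding: each pathway's coefficient vector is shrunk toward zero and set exactly to zero when its norm falls below λ times the pathway's weight. The sketch below shows only that step. It assumes non-overlapping groups and takes the adaptive weights as given, hypothetical inputs. P-GLAW's own weighting scheme, its handling of overlapping pathways, and its bootstrap ranking are not reproduced.

```python
import math


def group_soft_threshold(z, lam, weight):
    """Proximal operator of lam * weight * ||beta_g||_2 applied to vector z.

    Shrinks the whole group toward zero and sets it exactly to zero
    when its Euclidean norm is at most lam * weight (pathway not selected).
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam * weight:
        return [0.0] * len(z)
    scale = 1.0 - lam * weight / norm
    return [scale * v for v in z]


def prox_all(groups, lam, weights):
    """Apply the group step to every pathway; groups maps name -> coefficients."""
    return {g: group_soft_threshold(z, lam, weights[g]) for g, z in groups.items()}
```

A larger weight makes a pathway harder to select. This is the lever an adaptive weighting procedure uses to offset biases such as pathway size.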
Crawford, John R; Garthwaite, Paul H; Denham, Annie K; Chelune, Gordon J
2012-12-01
Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because (a) not all psychologists are aware that regression equations can be built not only from raw data but also using only basic summary data for a sample, and (b) the computations involved are tedious and prone to error. In an attempt to overcome these barriers, Crawford and Garthwaite (2007) provided methods to build and apply simple linear regression models using summary statistics as data. In the present study, we extend this work to set out the steps required to build multiple regression models from sample summary statistics and the further steps required to compute the associated statistics for drawing inferences concerning an individual case. We also develop, describe, and make available a computer program that implements these methods. Although there are caveats associated with the use of the methods, these need to be balanced against pragmatic considerations and against the alternative of either entirely ignoring a pertinent data set or using it informally to provide a clinical "guesstimate." Upgraded versions of earlier programs for regression in the single case are also provided; these add the point and interval estimates of effect size developed in the present article.
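For the simple (one-predictor) case referred to above, standard OLS results give every required quantity from summary data. The slope is b = r·s_y/s_x and the intercept is a = ȳ − b·x̄. The standard error of estimate is s_y·√((1 − r²)(n − 1)/(n − 2)). The standard error for a new individual case is s_y·x·√(1 + 1/n + (x0 − x̄)²/((n − 1)s_x²)). The sketch assumes SDs computed with the n − 1 denominator. It does not cover the multiple-regression extension or the effect-size estimates introduced in the article. Converting t to a p-value needs a t distribution with n − 2 df, which the Python standard library does not provide.

```python
import math


def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r, n):
    """Simple linear regression of y on x built only from summary statistics."""
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    # Standard error of estimate s_{y.x}, with n - 2 residual degrees of freedom.
    s_yx = sd_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))
    return a, b, s_yx


def single_case_test(x0, y0, mean_x, sd_x, mean_y, sd_y, r, n):
    """Compare one case's observed y0 with the value predicted from x0.

    Returns (predicted y, standard error for a new case, t with n - 2 df).
    """
    a, b, s_yx = regression_from_summary(mean_x, sd_x, mean_y, sd_y, r, n)
    y_hat = a + b * x0
    se_new = s_yx * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / ((n - 1) * sd_x ** 2))
    return y_hat, se_new, (y0 - y_hat) / se_new
```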
Han, Hyung Joon; Choi, Sae Byeol; Park, Man Sik; Lee, Jin Suk; Kim, Wan Bae; Song, Tae Jin; Choi, Sang Yong
2011-07-01
Single port laparoscopic surgery has come to the forefront of minimally invasive surgery. For those familiar with conventional techniques, however, this type of operation demands a different type of eye/hand coordination and involves unfamiliar working instruments. Herein, the authors describe the learning curve and the clinical outcomes of single port laparoscopic cholecystectomy for 150 consecutive patients with benign gallbladder disease. All patients underwent single port laparoscopic cholecystectomy using a homemade glove port, performed by one of five operators with different levels of experience in laparoscopic surgery. The learning curve for each operator was fitted using the non-linear ordinary least squares method based on a non-linear regression model. Mean operating time was 77.6 ± 28.5 min. Fourteen patients (6.0%) were converted to conventional laparoscopic cholecystectomy. Complications occurred in 15 patients (10.0%), as follows: bile duct injury (n = 2), surgical site infection (n = 8), seroma (n = 2), and wound pain (n = 3). One operator reached a learning curve plateau of 61.4 min per procedure after 8.5 cases, and his operating time improved by 95.3 min compared with his initial operation time. Younger surgeons showed significant decreases in mean operation time and achieved stable mean operation times. In particular, younger surgeons showed significant decreases in operation times after 20 cases. Experienced laparoscopic surgeons can safely perform single port laparoscopic cholecystectomy using conventional or angled laparoscopic instruments. The present study shows that an operator can overcome the single port laparoscopic cholecystectomy learning curve in about eight cases.
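The abstract does not state the functional form of its non-linear regression model. The sketch below assumes an exponential plateau model T(n) = a + b·exp(−c·n), where a is the plateau operating time and n is the case number. It fits the model by profiling over a grid of rates c and solving for a and b by ordinary least squares at each candidate. The grid range is a hypothetical choice, and the synthetic data used to exercise it are not from the study.

```python
import math


def fit_learning_curve(cases, times, c_grid=None):
    """Fit time = a + b * exp(-c * case), where a is the plateau time.

    For each candidate rate c, a and b follow from ordinary least squares on
    z = exp(-c * case); the c with the smallest residual sum of squares wins.
    Returns (a, b, c).
    """
    if c_grid is None:
        c_grid = [k / 100 for k in range(1, 201)]  # hypothetical search range 0.01..2.00
    mean_t = sum(times) / len(times)
    best = None
    for c in c_grid:
        z = [math.exp(-c * n) for n in cases]
        mean_z = sum(z) / len(z)
        szz = sum((zi - mean_z) ** 2 for zi in z)
        b = sum((zi - mean_z) * (ti - mean_t) for zi, ti in zip(z, times)) / szz
        a = mean_t - b * mean_z
        sse = sum((ti - a - b * zi) ** 2 for zi, ti in zip(z, times))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]
```

Profiling out the linear parameters reduces the non-linear least-squares problem to a one-dimensional search. That is usually stable for short per-operator series.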
MMI: Multimodel inference or models with management implications?
Fieberg, J.; Johnson, Douglas H.
2015-01-01
We consider a variety of regression modeling strategies for analyzing observational data associated with typical wildlife studies, including all subsets and stepwise regression, a single full model, and Akaike's Information Criterion (AIC)-based multimodel inference. Although there are advantages and disadvantages to each approach, we suggest that there is no unique best way to analyze data. Further, we argue that, although multimodel inference can be useful in natural resource management, the importance of considering causality and accurately estimating effect sizes is greater than simply considering a variety of models. Determining causation is far more valuable than simply indicating how the response variable and explanatory variables covaried within a data set, especially when the data set did not arise from a controlled experiment. Understanding the causal mechanism will provide much better predictions beyond the range of data observed. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
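AIC-based multimodel inference typically converts AIC values (AIC = 2k − 2 ln L) into Akaike weights, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), where Δ_i is each model's AIC minus the smallest AIC. The sketch below implements these two standard formulas. The AIC values used to exercise it are hypothetical.

```python
import math


def aic(log_likelihood, k):
    """Akaike's Information Criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def akaike_weights(aics):
    """Relative support for each model in a candidate set, summing to 1."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]  # relative likelihoods
    total = sum(rel)
    return [r / total for r in rel]
```

These weights quantify relative predictive support within the candidate set. They say nothing about causal structure, which is the distinction the authors emphasize.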
NASA Astrophysics Data System (ADS)
Künne, A.; Fink, M.; Kipka, H.; Krause, P.; Flügel, W.-A.
2012-06-01
In this paper, a method is presented to estimate excess nitrogen on large scales while considering single-field processes. The approach was implemented by using the physically based model J2000-S to simulate the nitrogen balance as well as the hydrological dynamics within meso-scale test catchments. The model input data, the parameterization, the results, and a detailed system understanding were used to generate the regression tree models with GUIDE (Loh, 2002). For each landscape type in the federal state of Thuringia, a regression tree was calibrated and validated using the model data and results of excess nitrogen from the test catchments. Hydrological parameters such as precipitation and evapotranspiration were also used to predict excess nitrogen by the regression tree model. Hence, they had to be calculated and regionalized for the state of Thuringia as well. Here, the model J2000g was used to simulate the water balance on the macro scale. With the regression trees, the excess nitrogen was regionalized for each landscape type of Thuringia. The approach allows calculating the potential nitrogen input into the streams of the drainage area. The results show that the applied methodology was able to transfer the detailed model results of the meso-scale catchments to the entire state of Thuringia with low computing time and without losing the detailed knowledge from the nitrogen transport modeling. This was validated with modeling results from Fink (2004) in a catchment lying in the regionalization area. The regionalized and modeled excess nitrogen showed 94% agreement. The study was conducted within the framework of a project in collaboration with the Thuringian Environmental Ministry, whose overall aim was to assess the effect of agro-environmental measures regarding load reduction in the water bodies of Thuringia to fulfill the requirements of the European Water Framework Directive (Bäse et al., 2007; Fink, 2006; Fink et al., 2007).
Multi-model ensemble estimation of volume transport through the straits of the East/Japan Sea
NASA Astrophysics Data System (ADS)
Han, Sooyeon; Hirose, Naoki; Usui, Norihisa; Miyazawa, Yasumasa
2016-01-01
The volume transports measured at the Korea/Tsushima, Tsugaru, and Soya/La Perouse Straits remain quantitatively inconsistent. However, data assimilation models at least provide a self-consistent budget despite subtle differences among the models. This study examined the seasonal variation of the volume transport using multiple linear regression and ridge regression multi-model ensemble (MME) methods, applied to four different data assimilation models, to estimate transport at these straits more accurately. The MME outperformed all of the single models by reducing uncertainties, and ridge regression in particular mitigated the multicollinearity problem. However, the regression constants turned out to be inconsistent with each other when the MME was applied separately to each strait. The MME for a connected system was thus performed to find common constants for these straits. The estimation of this MME was found to be similar to the MME result of sea level difference (SLD). The estimated mean transport (2.43 Sv) was smaller than the measurement data at the Korea/Tsushima Strait, but the calibrated transport of the Tsugaru Strait (1.63 Sv) was larger than the observed data. The MME results of transport and SLD also suggested that the standard deviation (STD) at the Korea/Tsushima Strait is larger than the STD of the observations, whereas the estimated results were almost identical to those observed for the Tsugaru and Soya/La Perouse Straits. The similarity between MME results enhances the reliability of the present MME estimation.
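A regression-based MME expresses the observed transport as a weighted combination of the member models' transports. Ridge regression stabilizes those weights when members are highly correlated by solving (XᵀX + λI)β = Xᵀy. The sketch below assumes centered data so that the intercept is not penalized, and uses a small Gaussian elimination solver to stay dependency-free. It does not reproduce the study's connected-system formulation. The rows of X stand for hypothetical outputs of different member models.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_mme(X, y, lam):
    """Ridge-regression ensemble weights; X rows are times, columns are member models.

    Returns (intercept, weights). The intercept is not penalized (data are centered).
    """
    n, p = len(X), len(X[0])
    xm = [sum(r[j] for r in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[r[j] - xm[j] for j in range(p)] for r in X]
    yc = [v - ym for v in y]
    # Normal equations with lam added to the diagonal: (Xc'Xc + lam*I) beta = Xc'yc.
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve(A, rhs)
    intercept = ym - sum(bj * mj for bj, mj in zip(beta, xm))
    return intercept, beta
```

With lam = 0 this reduces to the multiple linear regression MME. Increasing lam trades a little bias for much lower variance in the weights.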
Riley, Richard D; Ensor, Joie; Jackson, Dan; Burke, Danielle L
2017-01-01
Many meta-analysis models contain multiple parameters, for example due to multiple outcomes, multiple treatments or multiple regression coefficients. In particular, meta-regression models may contain multiple study-level covariates, and one-stage individual participant data meta-analysis models may contain multiple patient-level covariates and interactions. Here, we propose how to derive percentage study weights for such situations, in order to reveal the (otherwise hidden) contribution of each study toward the parameter estimates of interest. We assume that studies are independent, and utilise a decomposition of Fisher's information matrix to decompose the total variance matrix of parameter estimates into study-specific contributions, from which percentage weights are derived. This approach generalises how percentage weights are calculated in a traditional, single parameter meta-analysis model. Application is made to one- and two-stage individual participant data meta-analyses, meta-regression and network (multivariate) meta-analysis of multiple treatments. These reveal percentage study weights toward clinically important estimates, such as summary treatment effects and treatment-covariate interactions, and are especially useful when some studies are potential outliers or at high risk of bias. We also derive percentage study weights toward methodologically interesting measures, such as the magnitude of ecological bias (difference between within-study and across-study associations) and the amount of inconsistency (difference between direct and indirect evidence in a network meta-analysis).
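The approach above generalizes the traditional single-parameter case. In that case, each study's percentage weight is its inverse-variance weight divided by the total. The sketch below shows only that baseline: common-effect weights when tau2 = 0, and random-effects weights 1/(v_i + τ²) otherwise. It is not the Fisher-information decomposition proposed for multi-parameter models. Variances and estimates used to exercise it are hypothetical.

```python
def percentage_weights(variances, tau2=0.0):
    """Percentage study weights in a single-parameter inverse-variance meta-analysis.

    tau2 = 0 gives common-effect (fixed-effect) weights; tau2 > 0 gives
    random-effects weights 1 / (v_i + tau2).
    """
    w = [1.0 / (v + tau2) for v in variances]
    total = sum(w)
    return [100.0 * wi / total for wi in w]


def pooled_estimate(estimates, variances, tau2=0.0):
    """Inverse-variance weighted summary estimate."""
    w = [1.0 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
```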
Jorgensen, Bradley S; Martin, John F; Pearce, Meryl; Willis, Eileen
2013-01-30
Research employing household water consumption data has sought to test models of water demand and conservation using variables from attitude theory. A significant, albeit unrecognised, challenge has been that attitude models describe individual-level motivations while consumption data are recorded at the household level, thereby creating an inconsistency between units of theory and measurement. This study employs structural equation modelling and moderated regression techniques to address the level-of-analysis problem, and tests hypotheses by isolating effects on water conservation in single-person households. The results question the explanatory utility of habit strength, perceived behavioural control, and intentions for understanding metered water conservation in single-person households. For example, the evidence on whether intentions predict water conservation, or interact with habit strength, in single-person households was contrary to theoretical expectations. On the other hand, habit strength, self-reports of past water conservation, and perceived behavioural control were good predictors of intentions to conserve water. Copyright © 2012 Elsevier Ltd. All rights reserved.
When ab ≠ c - c': published errors in the reports of single-mediator models.
Petrocelli, John V; Clarkson, Joshua J; Whitmire, Melanie B; Moon, Paul E
2013-06-01
Accurate reports of mediation analyses are critical to the assessment of inferences related to causality, since these inferences are consequential for both the evaluation of previous research (e.g., meta-analyses) and the progression of future research. However, upon reexamination, approximately 15% of published articles in psychology contain at least one incorrect statistical conclusion (Bakker & Wicherts, Behavior Research Methods, 43, 666-678, 2011), a rate that raises the question of how accurate mediation reports are. To quantify this inaccuracy, articles reporting standard use of single-mediator models in three high-impact journals in personality and social psychology during 2011 were examined. More than 24% of the 156 models coded failed the equivalence test ab = c - c', suggesting that one or more regression coefficients in mediation analyses are frequently misreported. The authors cite common sources of errors, provide recommendations for enhanced accuracy in reports of single-mediator models, and discuss implications for alternative methods.
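For OLS regressions with intercepts fitted to the same cases, the identity ab = c - c' holds exactly, which is why it can serve as an equivalence check on reported coefficients. A minimal sketch with simulated data (all names and values hypothetical); the identity does not hold exactly for nonlinear models such as logistic regression, or when the three regressions are fitted to different subsets of cases.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """OLS with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)            # mediator
y = 0.3 * x + 0.6 * m + rng.normal(size=n)  # outcome

(c,) = ols_slopes(y, x)            # total effect: Y on X
(a,) = ols_slopes(m, x)            # path a: M on X
c_prime, b = ols_slopes(y, x, m)   # direct effect c' and path b: Y on X and M
print(f"ab = {a * b:.6f}, c - c' = {c - c_prime:.6f}")
```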
Vegetation Monitoring with Gaussian Processes and Latent Force Models
NASA Astrophysics Data System (ADS)
Camps-Valls, Gustau; Svendsen, Daniel; Martino, Luca; Campos, Manuel; Luengo, David
2017-04-01
Monitoring vegetation by retrieving biophysical parameters from Earth observation data is a challenging problem in which machine learning currently plays a key role. Neural networks, kernel methods, and Gaussian Process (GP) regression have excelled in parameter retrieval tasks at both local and global scales. GP regression is grounded in solid Bayesian statistics, yields efficient and accurate parameter estimates, and offers advantages over competing machine learning approaches, such as confidence intervals for its predictions. However, GP models are hampered by a lack of interpretability, which has prevented their widespread adoption by a larger community. In this presentation we will summarize some of our latest developments to address this issue. We will review the main characteristics of GPs and their advantages in standard vegetation monitoring applications. Then, three advanced GP models will be introduced. First, we will derive sensitivity maps for the GP predictive function that allow us to obtain a feature ranking from the model and to assess the influence of examples on the solution. Second, we will introduce a Joint GP (JGP) model that combines in situ measurements and simulated radiative transfer data in a single GP model. The JGP regression provides more sensible confidence intervals for the predictions, respects the physics of the underlying processes, and allows for transferability across time and space. Finally, we will present a latent force model (LFM) for GP modeling that encodes ordinary differential equations to blend data-driven modeling with physical models of the system. The LFM performs multi-output regression, adapts to the signal characteristics, copes with missing data in the time series, and provides explicit latent functions that allow system analysis and evaluation. Empirical evidence of the performance of these models will be presented through illustrative examples.
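As background to the standard GP regression the talk starts from, here is a minimal sketch of exact GP prediction with a squared-exponential kernel, assuming 1-D inputs, a zero prior mean, and fixed (not optimised) hyperparameters. It shows where the confidence intervals come from; it does not implement the sensitivity maps, the JGP, or the LFM. Function names and data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2, length_scale=1.0, signal_var=1.0):
    """Posterior mean and variance of the latent function under a zero-mean GP prior."""
    K = rbf_kernel(x_train, x_train, length_scale, signal_var) + noise_var * np.eye(x_train.size)
    K_s = rbf_kernel(x_train, x_test, length_scale, signal_var)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0)  # prior variance minus explained variance
    return mean, np.maximum(var, 0.0)        # clip tiny negative values from round-off

# Hypothetical 1-D example: retrieve a smooth parameter from a single input feature.
x = np.linspace(0.0, 5.0, 20)
y = np.sin(x)
x_new = np.array([2.5, 10.0])  # inside and far outside the training range
mean, var = gp_predict(x, y, x_new, noise_var=1e-4)
lower, upper = mean - 1.96 * np.sqrt(var), mean + 1.96 * np.sqrt(var)
print(np.round(mean, 3), np.round(var, 4))
```

The predictive variance is small inside the training range and returns to the prior variance far from the data, which is what makes the intervals informative. In practice the length scale, signal variance and noise variance would be fitted, for example by maximising the marginal likelihood.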
An overview of longitudinal data analysis methods for neurological research.
Locascio, Joseph J; Atri, Alireza
2011-01-01
The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, intended as a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analysis of a single summary number per subject that indexes that subject's change and (3) a general linear model approach with a fixed subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-effect (random- and fixed-effect) regression models for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measures analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models.
Eash, David A.
2015-01-01
An examination was conducted to understand why the 1987 single-variable regional regression equations (RREs) seem to provide better accuracy and less bias than either the 2013 multi- or single-variable RREs. A comparison of 1-percent annual exceedance-probability regression lines for hydrologic regions 1-4 from the 1987 single-variable RREs and for flood regions 1-3 from the 2013 single-variable RREs indicates that the 1987 single-variable regional-regression lines generally have steeper slopes and lower discharges than the 2013 single-variable regional-regression lines for corresponding areas of Iowa. The combination of the definition of hydrologic regions, the lower discharges, and the steeper slopes of regression lines associated with the 1987 single-variable RREs seems to provide better accuracy and less bias than the 2013 multi- or single-variable RREs; better accuracy and less bias were found particularly for drainage areas less than 2 mi², and also for some drainage areas between 2 and 20 mi². The 2013 multi- and single-variable RREs are considered to provide better accuracy and less bias for larger drainage areas. Results of this study indicate that additional research is needed to address the curvilinear relation between drainage area and annual exceedance-probability discharges (AEPDs) for areas of Iowa.
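A single-variable RRE relates a flood quantile to drainage area in log space, log10(Q) = a + b log10(DA), so the regression-line slope compared above corresponds to b. A minimal sketch using ordinary least squares and hypothetical data; published RREs are typically fitted with weighted or generalized least squares, so this only illustrates the functional form.

```python
import numpy as np

def fit_single_variable_rre(drainage_area_mi2, discharge_cfs):
    """OLS fit of log10(Q) = a + b * log10(DA); returns (a, b)."""
    b, a = np.polyfit(np.log10(drainage_area_mi2), np.log10(discharge_cfs), deg=1)
    return a, b  # polyfit returns the highest-degree coefficient first

def predict_rre(a, b, drainage_area_mi2):
    """Back-transform the log-space line to a discharge estimate."""
    return 10 ** (a + b * np.log10(drainage_area_mi2))

# Hypothetical basins that follow Q = 200 * DA**0.6 exactly.
da = np.array([0.5, 2.0, 10.0, 50.0, 300.0])
q = 200.0 * da**0.6
a, b = fit_single_variable_rre(da, q)
print(round(a, 4), round(b, 4), predict_rre(a, b, 1.0))
```

Because two such lines with different slopes cross at some drainage area, one equation can give lower discharges than the other for small basins and higher ones for large basins. A curvilinear relation could be explored, for instance, with `deg=2` in log space, though that is an illustrative assumption rather than the study's method.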
Yuan, Zuoqiang; Wang, Shaopeng; Gazol, Antonio; Mellard, Jarad; Lin, Fei; Ye, Ji; Hao, Zhanqing; Wang, Xugao; Loreau, Michel
2016-12-01
Biodiversity can be measured by taxonomic, phylogenetic, and functional diversity. How ecosystem functioning depends on these measures of diversity can vary from site to site and depends on successional stage. Here, we measured taxonomic, phylogenetic, and functional diversity, and examined their relationship with biomass in two successional stages of the broad-leaved Korean pine forest in northeastern China. Functional diversity was calculated from six plant traits, and aboveground biomass (AGB) and coarse woody productivity (CWP) were estimated using data from three forest censuses (10 years) in two large, fully mapped forest plots (25 and 5 ha). Eleven of the 12 regressions between biomass variables (AGB and CWP) and indices of diversity showed significant positive relationships, especially those with phylogenetic diversity. The mean tree diversity-biomass regressions increased from 0.11 in secondary forest to 0.31 in old-growth forest, implying a stronger biodiversity effect in more mature forest. Multi-model selection results showed that models including species richness, phylogenetic diversity, and single functional traits explained more variation in forest biomass than other candidate models. The models with a single functional trait, i.e., leaf area in secondary forest and wood density in mature forest, provided better explanations for forest biomass than models that combined all six functional traits. This finding may reflect different strategies in growth and resource acquisition in secondary and old-growth forests.
D'Archivio, Angelo Antonio; Incani, Angela; Ruggieri, Fabrizio
2011-01-01
In this paper, we use a quantitative structure-retention relationship (QSRR) method to predict the retention times of polychlorinated biphenyls (PCBs) in comprehensive two-dimensional gas chromatography (GC×GC). We analyse GC×GC retention data taken from the literature by comparing the predictive capability of different regression methods. The various models are generated using 70 of the 209 PCB congeners in the calibration stage, while their predictive performance is evaluated on the remaining 139 compounds. The two-dimensional chromatogram is initially estimated by separately modelling the retention times of PCBs in the first and second columns (¹tR and ²tR, respectively). In particular, multilinear regression (MLR) combined with genetic algorithm (GA) variable selection is performed to extract two small subsets of predictors for ¹tR and ²tR from a large set of theoretical molecular descriptors provided by the popular software Dragon, which, after removal of highly correlated or almost constant variables, consists of 237 structure-related quantities. Based on GA-MLR analysis, a four-dimensional and a five-dimensional relationship modelling ¹tR and ²tR, respectively, are identified. Single-response partial least squares (PLS-1) regression is alternatively applied to model ¹tR and ²tR independently, without the need for preliminary GA variable selection. Further, we explore the possibility of predicting the two-dimensional chromatogram of PCBs in a single calibration procedure by using either a two-response PLS (PLS-2) model or a feed-forward artificial neural network (ANN) with two output neurons. In the first case, regression is carried out on the full set of 237 descriptors, while in the second, the variables previously selected by GA-MLR are initially considered as ANN inputs and subjected to a sensitivity analysis to remove the redundant ones.
Results show that PLS-1 regression exhibits noticeably better descriptive and predictive performance than the other investigated approaches. The determination coefficients for ¹tR and ²tR in calibration (0.9999 and 0.9993, respectively) and prediction (0.9987 and 0.9793, respectively) provided by PLS-1 demonstrate that the GC×GC behaviour of PCBs is properly modelled. In particular, the predicted two-dimensional GC×GC chromatogram of the 139 PCBs not involved in the calibration stage closely resembles the experimental one. Based on this evidence, the proposed approach ensures accurate simulation of the whole GC×GC chromatogram of PCBs from experimentally determined retention data for only one-third of the congeners (a representative subset).
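PLS-1 models one response at a time, as was done here for ¹tR and ²tR separately. A minimal NIPALS sketch of PLS1, assuming a numeric descriptor matrix X (compounds × descriptors) and a single retention-time vector y; the names and data are hypothetical, and choosing the number of components (e.g., by cross-validation) is left out.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS (PLS1) by NIPALS.

    Returns (x_mean, y_mean, coef) such that y_hat = y_mean + (X_new - x_mean) @ coef.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)   # weight vector: direction of max covariance with y
        t = E @ w                # X scores
        tt = t @ t
        p = E.T @ t / tt         # X loadings
        q = f @ t / tt           # y loading
        E = E - np.outer(t, p)   # deflate X
        f = f - q * t            # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # regression vector B = W (P^T W)^-1 q
    return x_mean, y_mean, coef

def pls1_predict(model, X_new):
    x_mean, y_mean, coef = model
    return y_mean + (X_new - x_mean) @ coef

# Hypothetical data: 30 compounds, 5 descriptors.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -0.5, 0.0, 2.0, 0.3]) + rng.normal(0, 0.1, 30)
full = pls1_fit(X, y, n_components=5)
two = pls1_fit(X, y, n_components=2)
print(np.round(full[2], 3), np.round(two[2], 3))
```

With as many components as linearly independent descriptors, PLS1 coincides with OLS; its value lies in using few components when descriptors are many and collinear, as with 237 descriptors and 70 calibration compounds, where OLS is not even identifiable.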
Williams, Richard V.; Zak, Victor; Ravishankar, Chitra; Altmann, Karen; Anderson, Jeffrey; Atz, Andrew M.; Dunbar-Masterson, Carolyn; Ghanayem, Nancy; Lambert, Linda; Lurito, Karen; Medoff-Cooper, Barbara; Margossian, Renee; Pemberton, Victoria L.; Russell, Jennifer; Stylianou, Mario; Hsu, Daphne
2011-01-01
Objectives: To describe growth patterns in infants with single ventricle physiology and determine factors influencing growth. Study design: Data from 230 subjects enrolled in the Pediatric Heart Network Infant Single Ventricle Enalapril Trial were used to assess factors influencing change in weight-for-age z-score (Δz) from study enrollment (0.7 ± 0.4 months) to pre-superior cavopulmonary connection (SCPC) (5.1 ± 1.8 months, period 1), and pre-SCPC to final study visit (14.1 ± 0.9 months, period 2). Predictor variables included patient characteristics, feeding regimen, clinical center, and medical factors during neonatal (period 1) and SCPC hospitalizations (period 2). Univariate regression analysis was performed, followed by backward stepwise regression and bootstrapping reliability to inform a final multivariable model. Results: Weights were available for 197/230 subjects for period 1 and 173/197 for period 2. For period 1, greater gestational age, younger age at study enrollment, tube feeding at neonatal discharge, and clinical center were associated with a greater negative Δz (poorer growth) in multivariable modeling (adjusted R² = 0.39, p < 0.001). For period 2, younger age at SCPC and greater daily caloric intake were associated with greater positive Δz (better growth) (R² = 0.10, p = 0.002). Conclusions: Aggressive nutritional support and earlier SCPC are modifiable factors associated with a favorable change in weight-for-age z-score. PMID:21784436
Liang, Hao; Gao, Lian; Liang, Bingyu; Huang, Jiegang; Zang, Ning; Liao, Yanyan; Yu, Jun; Lai, Jingzhen; Qin, Fengxiang; Su, Jinming; Ye, Li; Chen, Hui
2016-01-01
Background: Hepatitis is a serious public health problem in Heng County, with increasing case numbers and property damage. It is necessary to develop a model to predict the hepatitis epidemic that could be useful for preventing this disease. Methods: The autoregressive integrated moving average (ARIMA) model and the generalized regression neural network (GRNN) model were used to fit the incidence data from the Heng County CDC (Center for Disease Control and Prevention) from January 2005 to December 2012. Then, the ARIMA-GRNN hybrid model was developed. The incidence data from January 2013 to December 2013 were used to validate the models. Four metrics, mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE) and mean square error (MSE), were used to compare the performance of the three models. Results: The morbidity of hepatitis from Jan 2005 to Dec 2012 showed seasonal variation and a slightly rising trend. The ARIMA(0,1,2)(1,1,1)₁₂ model was the most appropriate, with the residual test showing a white noise sequence. The smoothing factors of the basic GRNN model and the combined model were 1.8 and 0.07, respectively. All four metrics of the hybrid model were lower than those of the two single models in validation. In fitting, the GRNN model had the lowest metric values of the three models. Conclusions: The hybrid ARIMA-GRNN model showed better hepatitis incidence forecasting in Heng County than the single ARIMA model and the basic GRNN model. It is a potential decision-support tool for controlling hepatitis in Heng County. PMID:27258555
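A GRNN prediction is a Gaussian-kernel weighted average of training targets, with the smoothing factor sigma as its only parameter. In ARIMA-GRNN combination schemes of this kind, the ARIMA fitted values are used as GRNN inputs and the observed incidence as the targets. A minimal sketch, assuming the ARIMA fitted values are already available from a separate ARIMA fit; the series, the value of sigma, and the function names are hypothetical. The four comparison metrics are computed alongside.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_new, sigma):
    """GRNN prediction: Gaussian-kernel weighted average of training targets.

    sigma is the smoothing factor (spread). Inputs may be 1-D or (n, d) arrays.
    """
    y_train = np.asarray(y_train, float)
    x_train = np.asarray(x_train, float).reshape(y_train.size, -1)
    x_new = np.asarray(x_new, float).reshape(-1, x_train.shape[1])
    d2 = ((x_new[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    d2 -= d2.min(axis=1, keepdims=True)  # stabilise exp(); the shift cancels in the ratio
    k = np.exp(-d2 / (2.0 * sigma**2))
    return k @ y_train / k.sum(axis=1)

def error_metrics(actual, predicted):
    """MAE, MSE, RMSE and MAPE (in percent) of predictions against actual values."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    err = actual - predicted
    mse = float(np.mean(err**2))
    return {"MAE": float(np.mean(np.abs(err))), "MSE": mse, "RMSE": mse**0.5,
            "MAPE": 100.0 * float(np.mean(np.abs(err / actual)))}

# Hypothetical monthly incidence and ARIMA fitted values for the hybrid step.
arima_fit = np.array([3.1, 2.8, 3.5, 4.0, 4.4, 3.9, 3.3, 2.9])
observed = np.array([3.0, 2.6, 3.9, 4.2, 4.1, 4.0, 3.1, 3.0])
hybrid_fit = grnn_predict(arima_fit, observed, arima_fit, sigma=0.05)
print(error_metrics(observed, arima_fit))
print(error_metrics(observed, hybrid_fit))
```

A very small sigma makes the GRNN nearly reproduce its training targets, which flatters in-sample fit; as in the study, the comparison that matters is on a held-out validation period.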
Huang, Jian; Zhang, Cun-Hui
2013-01-01
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
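A minimal sketch of a weighted ℓ1-penalized least-squares estimator solved by coordinate descent, with a simple multistage variant that re-runs it using adaptive-Lasso weights 1/|b_j| from the previous stage. Squared-error loss is assumed (the paper covers general convex losses), and the 1/|b_j| reweighting is a stand-in: the paper's multistage approximation of concave regularization may use different weights. Function names, `lam`, and the data are hypothetical.

```python
import numpy as np

def weighted_lasso(X, y, lam, weights, n_sweeps=1000, tol=1e-10):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j weights[j] * |b_j|.

    Assumes centred columns of X and centred y (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    r = y.astype(float).copy()  # current residual y - X b
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            # soft-thresholding with a coefficient-specific threshold lam * weights[j]
            new = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                r -= X[:, j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def multistage_adaptive_lasso(X, y, lam, n_stages=3, eps=1e-8):
    """Stage 1 is the ordinary Lasso; later stages reweight by 1/|b_j| (adaptive Lasso)."""
    weights = np.ones(X.shape[1])
    for _ in range(n_stages):
        b = weighted_lasso(X, y, lam, weights)
        weights = 1.0 / (np.abs(b) + eps)  # zero coefficients get a very large penalty
    return b

# Hypothetical sparse problem: only the first two of ten predictors matter.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + rng.normal(0, 0.1, n)
y = y - y.mean()
b_lasso = weighted_lasso(X, y, lam=0.1, weights=np.ones(p))
b_multi = multistage_adaptive_lasso(X, y, lam=0.1)
print(np.round(b_lasso, 3))
print(np.round(b_multi, 3))
```

Reweighting keeps zero coefficients heavily penalised and shrinks large ones less, so later stages reduce the bias a single Lasso fit places on strong effects.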
Quantitative Analysis of Single and Mix Food Antiseptics Basing on SERS Spectra with PLSR Method
NASA Astrophysics Data System (ADS)
Hou, Mengjing; Huang, Yu; Ma, Lingwei; Zhang, Zhengjun
2016-06-01
The usage and dosage of food antiseptics are of great concern because of their decisive influence on food safety. The surface-enhanced Raman scattering (SERS) effect was employed in this research to detect trace potassium sorbate (PS) and sodium benzoate (SB). An Ag nanorod (NR) array coated with an ultrathin HfO₂ film was fabricated as the SERS substrate. Protected by the HfO₂ film, the SERS substrate has good acid resistance, which makes it applicable in the acidic environments where PS and SB work. The regression relationship between the SERS spectra of 0.3-10 mg/L PS solutions and their concentrations was calibrated by the partial least squares regression (PLSR) method, and the concentration prediction performance was quite satisfactory. Furthermore, mixed solutions of PS and SB were also quantitatively analyzed by the PLSR method. Spectral data from the characteristic peak sections corresponding to PS and SB were used to establish regression models for these two solutes, respectively, and their concentrations were determined accurately even though their characteristic peak sections overlap. It is possible that the unique modeling process of the PLSR method prevented the overlapping Raman signals from reducing model accuracy.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim
2014-01-01
Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
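The global test that the eruption plot complements can be sketched as a nested-model F test: fit a main-effects model and a model that adds interactions, then compare residual sums of squares. A minimal sketch for one gene and two factors with treatment coding; factor levels, effect sizes and function names are hypothetical, and it does not draw the eruption plot itself (which compares effect estimates between the two nested fits). A p-value could be obtained from the F distribution with (df1, df2) degrees of freedom, e.g. with `scipy.stats.f.sf`.

```python
import numpy as np

def design(a, b, interaction):
    """Treatment-coded design matrix for two categorical factors coded 0, 1, 2, ..."""
    cols = [np.ones(a.size)]
    cols += [(a == i).astype(float) for i in range(1, a.max() + 1)]
    cols += [(b == j).astype(float) for j in range(1, b.max() + 1)]
    if interaction:
        cols += [((a == i) & (b == j)).astype(float)
                 for i in range(1, a.max() + 1) for j in range(1, b.max() + 1)]
    return np.column_stack(cols)

def rss(X, y):
    """Residual sum of squares and coefficients of an OLS fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid), coef

def nested_f(y, X_small, X_big):
    """F statistic for the extra columns of X_big over X_small (nested OLS models)."""
    rss0, _ = rss(X_small, y)
    rss1, _ = rss(X_big, y)
    df1 = X_big.shape[1] - X_small.shape[1]
    df2 = y.size - X_big.shape[1]
    return ((rss0 - rss1) / df1) / (rss1 / df2)

# Hypothetical expression of one gene: factor a (2 levels) x factor b (3 levels), 4 replicates.
a = np.repeat([0, 1], 12)
b = np.tile(np.repeat([0, 1, 2], 4), 2)
rng = np.random.default_rng(3)
y = 1.0 + 0.5 * a + 1.0 * (b == 1) + 3.0 * ((a == 1) & (b == 2)) + rng.normal(0, 0.3, a.size)

X_main = design(a, b, interaction=False)
X_full = design(a, b, interaction=True)
f_stat = nested_f(y, X_main, X_full)
_, coef_main = rss(X_main, y)
_, coef_full = rss(X_full, y)
print(f"F = {f_stat:.1f}")
print("shared coefficients, main vs full:", np.round(coef_main, 2), np.round(coef_full[:4], 2))
```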
Genetic prediction of type 2 diabetes using deep neural network.
Kim, J; Kim, J; Kwak, M J; Bajaj, M
2018-04-01
Type 2 diabetes (T2DM) has strong heritability, but genetic models to explain heritability have been challenging. We tested a deep neural network (DNN) to predict T2DM using the nested case-control study of the Nurses' Health Study (3326 females, 45.6% T2DM) and the Health Professionals Follow-up Study (2502 males, 46.5% T2DM). We selected 96, 214, 399, and 678 single-nucleotide polymorphisms (SNPs) through Fisher's exact test and L1-penalized logistic regression. We randomly split each dataset 4:1 to train prediction models and test their performance. DNN and logistic regressions showed a better area under the ROC curve (AUC) than the clinical model when 399 or more SNPs were included. DNN was superior to logistic regressions in AUC with 399 or more SNPs in males and 678 SNPs in females. Addition of clinical factors consistently increased the AUC of DNN but failed to improve logistic regressions with 214 or more SNPs. In conclusion, we show that DNN can be a versatile tool to predict T2DM incorporating large numbers of SNPs and clinical information. Limitations include a relatively small number of subjects, mostly of European ethnicity. Further studies are warranted to confirm and improve the performance of genetic prediction models using DNN in different ethnic groups. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Koch, Cosima; Posch, Andreas E; Goicoechea, Héctor C; Herwig, Christoph; Lendl, Bernhard
2014-01-07
This paper presents the quantification of Penicillin V and phenoxyacetic acid, a precursor, inline during Penicillium chrysogenum fermentations by FTIR spectroscopy and partial least squares (PLS) regression and multivariate curve resolution - alternating least squares (MCR-ALS). First, the applicability of an attenuated total reflection FTIR fiber optic probe was assessed offline by measuring standards of the analytes of interest and investigating matrix effects of the fermentation broth. Then measurements were performed inline during four fed-batch fermentations with online HPLC for the determination of Penicillin V and phenoxyacetic acid as reference analysis. PLS and MCR-ALS models were built using these data and validated by comparison of single analyte spectra with the selectivity ratio of the PLS models and the extracted spectral traces of the MCR-ALS models, respectively. The achieved root mean square errors of cross-validation for the PLS regressions were 0.22 g L⁻¹ for Penicillin V and 0.32 g L⁻¹ for phenoxyacetic acid, and the root mean square errors of prediction for MCR-ALS were 0.23 g L⁻¹ for Penicillin V and 0.15 g L⁻¹ for phenoxyacetic acid. A general work-flow for building and assessing chemometric regression models for the quantification of multiple analytes in bioprocesses by FTIR spectroscopy is given. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Mai, W.; Zhang, J.-F.; Zhao, X.-M.; Li, Z.; Xu, Z.-W.
2017-11-01
Wastewater from the dye industry is typically analyzed using a standard method for measurement of chemical oxygen demand (COD) or by a single-wavelength spectroscopic method. To overcome the disadvantages of these methods, ultraviolet-visible (UV-Vis) spectroscopy was combined with principal component regression (PCR) and partial least squares regression (PLSR) in this study. Unlike the standard method, this method does not require digestion of the samples for preparation. Experiments showed that the PLSR model offered high prediction performance for COD, with a mean relative error of about 5% for two dyes. This error is similar to that obtained with the standard method. In this study, the precision of the PLSR model decreased as the number of dye compounds present increased. Multiple models are likely to be required in practice, and because a single PLSR model can include several dyes, using it would greatly reduce the complexity of a COD monitoring system. UV-Vis spectroscopy with PLSR successfully enhanced the performance of COD prediction for dye wastewater and showed good potential for application in on-line water quality monitoring.
Bayesian Correction for Misclassification in Multilevel Count Data Models.
Nelson, Tyler; Song, Joon Jin; Chin, Yoo-Mi; Stamey, James D
2018-01-01
Covariate misclassification is well known to yield biased estimates in single level regression models. The impact on hierarchical count models has been less studied. A fully Bayesian approach to modeling both the misclassified covariate and the hierarchical response is proposed. Models with a single diagnostic test and with multiple diagnostic tests are considered. Simulation studies show the ability of the proposed model to appropriately account for the misclassification by reducing bias and improving performance of interval estimators. A real data example further demonstrated the consequences of ignoring the misclassification. Ignoring misclassification yielded a model that indicated there was a significant, positive impact on the number of children of females who observed spousal abuse between their parents. When the misclassification was accounted for, the relationship switched to negative, but not significant. Ignoring misclassification in standard linear and generalized linear models is well known to lead to biased results. We provide an approach to extend misclassification modeling to the important area of hierarchical generalized linear models.
Impact of job characteristics on psychological health of Chinese single working women.
Yeung, D Y; Tang, C S
2001-01-01
This study investigates the impact of individual and contextual job characteristics (control, psychological and physical demand, and security) on the psychological distress of 193 Chinese single working women in Hong Kong. The mediating role of job satisfaction in the job characteristics-distress relation is also assessed. Multiple regression analysis results show that job satisfaction mediates the effects of job control and security in predicting psychological distress, whereas psychological job demand has an independent effect on mental distress after considering the effect of job satisfaction. This main effect model indicates that psychological distress is best predicted by small company size, high psychological job demand, and low job satisfaction. Results from a separate regression analysis fail to support an overall combined effect of job demand-control on psychological distress. However, a significant physical job demand-control interaction effect on mental distress is noted, which is slightly reduced after controlling for the effect of job satisfaction.
Data mining for materials design: A computational study of single molecule magnet
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dam, Hieu Chi; Faculty of Physics, Vietnam National University, 334 Nguyen Trai, Hanoi; Pham, Tien Lam
2014-01-28
We develop a method that combines data mining and first-principles calculation to guide the design of distorted cubane Mn⁴⁺Mn³⁺₃ single molecule magnets. The essential idea of the method is a process consisting of sparse regressions and cross-validation for analyzing calculated data of the materials. The method allows us to demonstrate that the exchange coupling between Mn⁴⁺ and Mn³⁺ ions can be predicted with high accuracy from the electronegativities of constituent ligands and the structural features of the molecule by a linear regression model. The relations between the structural features and magnetic properties of the materials are quantitatively and consistently evaluated and presented as a graph. We also discuss the properties of the materials and guide the material design based on the obtained results.
The Necessity-Concerns-Framework: A Multidimensional Theory Benefits from Multidimensional Analysis
Phillips, L. Alison; Diefenbach, Michael; Kronish, Ian M.; Negron, Rennie M.; Horowitz, Carol R.
2014-01-01
Background: Patients’ medication-related concerns and necessity-beliefs predict adherence. Evaluation of the potentially complex interplay of these two dimensions has been limited because of methods that reduce them to a single dimension (difference scores). Purpose: We use polynomial regression to assess the multidimensional effect of stroke-event survivors’ medication-related concerns and necessity-beliefs on their adherence to stroke-prevention medication. Methods: Survivors (n=600) rated their concerns, necessity-beliefs, and adherence to medication. Confirmatory and exploratory polynomial regression determined the best-fitting multidimensional model. Results: As posited by the Necessity-Concerns Framework (NCF), the greatest and lowest adherence was reported by those with strong necessity-beliefs/weak concerns and strong concerns/weak necessity-beliefs, respectively. However, in a finding that a difference-score model could not have detected, patients with ambivalent beliefs were less adherent than those exhibiting indifference. Conclusions: Polynomial regression allows for assessment of the multidimensional nature of the NCF. Clinicians/Researchers should be aware that concerns and necessity dimensions are not polar opposites. PMID:24500078
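Polynomial regression of this kind is often specified as a quadratic response surface in the two beliefs, Y = b0 + b1 N + b2 C + b3 N² + b4 NC + b5 C², where N and C are (centred) necessity and concern scores. A difference-score model forces b2 = -b1 and b3 = b4 = b5 = 0, so its slope along the line N = C, b1 + b2, is zero and it cannot distinguish ambivalence (both high) from indifference (both low). A minimal sketch with hypothetical, noiseless data whose coefficients were chosen for illustration; it is not the authors' fitted model.

```python
import numpy as np

def fit_polynomial_surface(nec, con, y):
    """OLS fit of y = b0 + b1*N + b2*C + b3*N^2 + b4*N*C + b5*C^2."""
    X = np.column_stack([np.ones_like(nec), nec, con, nec**2, nec * con, con**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def surface_features(coef):
    """Slopes and curvatures of the fitted surface along the lines N = C and N = -C."""
    b0, b1, b2, b3, b4, b5 = coef
    return {"congruence_slope": b1 + b2, "congruence_curvature": b3 + b4 + b5,
            "incongruence_slope": b1 - b2, "incongruence_curvature": b3 - b4 + b5}

# Hypothetical centred ratings and a noiseless adherence score.
rng = np.random.default_rng(4)
nec = rng.uniform(-2.0, 2.0, 600)
con = rng.uniform(-2.0, 2.0, 600)
y = 0.5 + 0.6 * nec - 0.8 * con - 0.1 * nec * con
coef = fit_polynomial_surface(nec, con, y)
feats = surface_features(coef)
print({k: round(float(v), 3) for k, v in feats.items()})
```

Here the negative congruence slope means that, moving from "both low" to "both high", the score falls, which is the kind of pattern a single difference score cannot represent.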
The necessity-concerns framework: a multidimensional theory benefits from multidimensional analysis.
Phillips, L Alison; Diefenbach, Michael A; Kronish, Ian M; Negron, Rennie M; Horowitz, Carol R
2014-08-01
Patients' medication-related concerns and necessity-beliefs predict adherence. Evaluation of the potentially complex interplay of these two dimensions has been limited because of methods that reduce them to a single dimension (difference scores). We use polynomial regression to assess the multidimensional effect of stroke-event survivors' medication-related concerns and necessity beliefs on their adherence to stroke-prevention medication. Survivors (n = 600) rated their concerns, necessity beliefs, and adherence to medication. Confirmatory and exploratory polynomial regression determined the best-fitting multidimensional model. As posited by the necessity-concerns framework (NCF), the greatest and lowest adherence was reported by those with strong necessity beliefs/weak concerns and strong concerns/weak necessity beliefs, respectively. However, in a finding that a difference-score model could not have detected, patients with ambivalent beliefs were less adherent than those exhibiting indifference. Polynomial regression allows for assessment of the multidimensional nature of the NCF. Clinicians/Researchers should be aware that concerns and necessity dimensions are not polar opposites.
Simple and multiple linear regression: sample size considerations.
Hanley, James A
2016-11-01
The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
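One of the long-established closed-form results behind regression standard errors is Var(b_j) = σ² / ((n - 1) s_j² (1 - R_j²)), where s_j² is the sample variance of predictor j and R_j² comes from regressing that predictor on the other predictors. Rearranged, it gives the n needed for a target standard error of a single coefficient, the etiological case described above. A minimal sketch, assuming guesses for σ, s_j and R_j² are available at the design stage; the function names are hypothetical.

```python
import numpy as np

def coef_variance_formula(X, j, sigma2):
    """sigma^2 / ((n - 1) * s_j^2 * (1 - R_j^2)) for predictor column j of X.

    X holds the predictors only; an intercept is added internally.
    """
    n = X.shape[0]
    xj = X[:, j]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    fit, *_ = np.linalg.lstsq(others, xj, rcond=None)
    resid = xj - others @ fit
    ss_j = np.sum((xj - xj.mean()) ** 2)  # (n - 1) * s_j^2
    r2_j = 1.0 - resid @ resid / ss_j     # R^2 of x_j on the other predictors
    return sigma2 / (ss_j * (1.0 - r2_j))

def n_for_target_se(sigma, sd_x, r2_j, target_se):
    """Smallest n with sigma / sqrt((n - 1) * sd_x^2 * (1 - r2_j)) <= target_se."""
    return int(np.ceil(1 + sigma**2 / (target_se**2 * sd_x**2 * (1 - r2_j))))

# Hypothetical design: three predictors, the second correlated with the first.
rng = np.random.default_rng(5)
X = rng.normal(size=(50, 3))
X[:, 1] += 0.7 * X[:, 0]
print(coef_variance_formula(X, 1, sigma2=4.0))
print(n_for_target_se(sigma=10, sd_x=1, r2_j=0.0, target_se=1),
      n_for_target_se(sigma=10, sd_x=1, r2_j=0.5, target_se=1))
```

The factor 1/(1 - R_j²) shows directly how adjustment for correlated confounders inflates the sample size needed for the same precision on one coefficient.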
Zhou, Pei-pei; Shan, Jin-feng; Jiang, Jian-lan
2015-12-01
To optimize the microwave-assisted extraction of curcuminoids from Curcuma longa. On the basis of single-factor experiments, the ethanol concentration, the ratio of liquid to solid and the microwave time were selected for further optimization. Support Vector Regression (SVR) and Central Composite Design-Response Surface Methodology (CCD) were used to design and establish models, respectively, while Particle Swarm Optimization (PSO) was introduced to optimize the parameters of the SVR models and to search for the optimal points of the models. The evaluation indicator was the sum of curcumin, demethoxycurcumin and bisdemethoxycurcumin, determined by HPLC. The optimal parameters of microwave-assisted extraction were as follows: ethanol concentration of 69%, ratio of liquid to solid of 21 : 1, and microwave time of 55 s. Under those conditions, the sum of the three curcuminoids was 28.97 mg/g (per gram of rhizome powder). Both the CCD model and the SVR model were credible, as they predicted similar process conditions and the deviation of yield was less than 1.2%.
Dunham, J.B.; Cade, B.S.; Terrell, J.W.
2002-01-01
We used regression quantiles to model potentially limiting relationships between the standing crop of cutthroat trout Oncorhynchus clarki and measures of stream channel morphology. Regression quantile models indicated that variation in fish density was inversely related to the width:depth ratio of streams but not to stream width or depth alone. The temporal and spatial stability of model predictions was examined across years and streams, respectively. Variation in fish density with width:depth ratio (10th-90th regression quantiles) modeled for streams sampled in 1993-1997 predicted the variation observed in 1998-1999, indicating similar habitat relationships across years. Both linear and nonlinear models described the limiting relationships well, the latter performing slightly better. Although estimated relationships were transferable in time, results were strongly dependent on the influence of spatial variation in fish density among streams. Density changes with width:depth ratio in a single stream were responsible for the significant (P < 0.10) negative slopes estimated for the higher quantiles (>80th). This suggests that stream-scale factors other than width:depth ratio play a more direct role in determining population density. Much of the variation in densities of cutthroat trout among streams was attributed to the occurrence of nonnative brook trout Salvelinus fontinalis (a possible competitor) or connectivity to migratory habitats. Regression quantiles can be useful for estimating the effects of limiting factors when ecological responses are highly variable, but our results indicate that spatiotemporal variability in the data should be explicitly considered. In this study, data from individual streams and stream-specific characteristics (e.g., the occurrence of nonnative species and habitat connectivity) strongly affected our interpretation of the relationship between width:depth ratio and fish density.
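A regression quantile minimizes the asymmetric "pinball" (check) loss rather than squared error. For a straight line, the minimization is a linear program, and it has an optimal solution that passes through two data points. The brute-force sketch below exploits that fact. It is practical only for small samples, and the width:depth and density values are hypothetical, not the study's data.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Check loss: tau*r for r >= 0, (tau - 1)*r for r < 0."""
    return sum(r * (tau - (1.0 if r < 0 else 0.0)) for r in residuals)


def quantile_line(xs, ys, tau):
    """Exact tau-th regression quantile line by searching lines through pairs of points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        loss = pinball_loss(resid, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical width:depth ratios and fish densities.
WD = [5, 8, 10, 12, 15, 18, 20, 25, 30, 35]
DENSITY = [0.9, 1.2, 0.5, 0.8, 0.6, 0.3, 0.4, 0.2, 0.25, 0.1]
a90, b90 = quantile_line(WD, DENSITY, 0.90)  # upper (limiting) boundary
```

At the optimum, at most a fraction tau of points lie strictly below the line and at most 1 - tau lie strictly above it. This is why upper quantiles trace a "limiting" ceiling on density.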
Computing group cardinality constraint solutions for logistic regression problems.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
2017-01-01
We derive an algorithm that directly solves logistic regression subject to a group cardinality constraint (group sparsity) and use it to distinguish intra-subject MRI sequences (e.g., cine MRIs) of healthy subjects from those of diseased subjects. Group cardinality constraint models are often applied to medical images to avoid overfitting the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding feature weighting, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining the series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from the enforcement of the group sparsity constraint. The minimum of the smooth and convex logistic regression problem is determined via gradient descent, while we derive a closed-form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients who received reconstructive surgery for Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significantly higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
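The abstract mentions a closed-form sparse approximation of the unconstrained minimum. For disjoint groups, the Euclidean projection onto "at most k nonzero groups" keeps the k groups with the largest L2 norm and zeros the rest. The sketch below assumes this standard group hard-thresholding step. The paper's exact update, which sits inside Penalty Decomposition, may include additional penalty terms.

```python
import math


def group_hard_threshold(w, groups, k):
    """Project w onto vectors with at most k nonzero groups (groups are disjoint index lists)."""
    norms = [math.sqrt(sum(w[i] ** 2 for i in g)) for g in groups]
    keep = sorted(range(len(groups)), key=lambda j: norms[j], reverse=True)[:k]
    out = [0.0] * len(w)
    for j in keep:
        for i in groups[j]:
            out[i] = w[i]  # copy the surviving group unchanged
    return out


# Hypothetical weights: 3 features, each observed at 2 time points (one group per feature).
W = [0.1, 0.2, 3.0, -1.0, 0.5, 0.5]
GROUPS = [[0, 1], [2, 3], [4, 5]]
```

Grouping a feature's values across the cine sequence means a feature is selected or discarded as a whole, matching the "repeated samples of the same disease pattern" assumption.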
Confounder summary scores when comparing the effects of multiple drug exposures.
Cadarette, Suzanne M; Gagne, Joshua J; Solomon, Daniel H; Katz, Jeffrey N; Stürmer, Til
2010-01-01
Little information is available comparing methods to adjust for confounding when considering multiple drug exposures. We compared three analytic strategies to control for confounding based on measured variables: conventional multivariable, exposure propensity score (EPS), and disease risk score (DRS). Each method was applied to a dataset (2000-2006) recently used to examine the comparative effectiveness of four drugs. Risedronate, nasal calcitonin, and raloxifene were each compared with alendronate for their relative effectiveness in preventing non-vertebral fracture. EPSs were derived both by using multinomial logistic regression (single model EPS) and by three separate logistic regression models (separate model EPS). DRSs were derived and event rates compared using Cox proportional hazard models. DRSs derived among the entire cohort (full cohort DRS) were compared with DRSs derived only among the referent alendronate users (unexposed cohort DRS). Less than 8% deviation from the base estimate (conventional multivariable) was observed when applying single model EPS, separate model EPS or full cohort DRS. Applying the unexposed cohort DRS when background risk for fracture differed between comparison drug exposure cohorts resulted in -7% to +13% deviation from our base estimate. With sufficient numbers of exposed patients and outcomes, either conventional multivariable, EPS or full cohort DRS may be used to adjust for confounding to compare the effects of multiple drug exposures. However, our data also suggest that unexposed cohort DRS may be problematic when background risks differ between referent and exposed groups. Further empirical and simulation studies will help to clarify the generalizability of our findings.
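A "single model EPS" from multinomial logistic regression gives each patient one probability per drug. The sketch below shows only the final step: turning one patient's linear predictors into those probabilities with a softmax, with the referent drug's logit fixed at 0. The linear-predictor values are hypothetical, and model fitting is not shown.

```python
import math


def multinomial_eps(linear_predictors):
    """Softmax over [referent = 0] + one linear predictor per non-referent drug."""
    scores = [0.0] + list(linear_predictors)  # referent category logit fixed at 0
    top = max(scores)                         # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear predictors for one patient (e.g., three comparison drugs vs referent).
probs = multinomial_eps([1.0, -2.0, 0.5])
```

The separate-model EPS approach would instead fit one binary logistic model per comparison drug versus the referent. Its probabilities need not be mutually consistent in this way.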
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
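The weighting step can be sketched for one DOF with a 1-D feature and three equal-variance Gaussian classes. Several details here are assumptions, not taken from the study: equal priors, a scalar feature (real EMG features are multivariate), and weighting by the posterior of the class matching the sign of the regression output. The class means are hypothetical.

```python
import math


def class_posteriors(feature, class_means, shared_sd):
    """P(class | feature) for 1-D Gaussians with equal variance and equal priors."""
    # With a shared variance the normalizing constants cancel, so squared distances suffice.
    log_lik = {c: -0.5 * ((feature - m) / shared_sd) ** 2 for c, m in class_means.items()}
    top = max(log_lik.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in log_lik.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


def probability_weighted_output(reg_output, feature, class_means, shared_sd):
    """Scale a DOF's regression output by P(movement in the commanded direction)."""
    post = class_posteriors(feature, class_means, shared_sd)
    p_move = post["pos"] if reg_output > 0 else post["neg"]
    return reg_output * p_move


MEANS = {"neg": -1.0, "rest": 0.0, "pos": 1.0}  # hypothetical class means
```

A small spurious regression output while the feature sits near the "rest" class is driven toward zero. This is the extraneous-movement suppression the abstract reports.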
Effect Size Measure and Analysis of Single Subject Designs
ERIC Educational Resources Information Center
Society for Research on Educational Effectiveness, 2013
2013-01-01
One of the vexing problems in the analysis of single subject designs (SSD) is the assessment of the intervention effect. Serial dependence notwithstanding, the linear model approach that has been advanced involves, in general, the fitting of regression lines (or curves) to the set of observations within each phase of the design and comparing the parameters of these…
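Phase-wise line fitting can be sketched as follows. An OLS line is fitted to the baseline (A) phase and another to the intervention (B) phase, and the two are compared. The level change is measured at the first intervention session and the slope change is the difference in slopes. These comparison choices are assumptions; like the passage above, the sketch ignores serial dependence.

```python
def ols_line(t, y):
    """Intercept and slope of an ordinary least-squares line."""
    n = len(t)
    tm, ym = sum(t) / n, sum(y) / n
    sxx = sum((a - tm) ** 2 for a in t)
    sxy = sum((a - tm) * (b - ym) for a, b in zip(t, y))
    slope = sxy / sxx
    return ym - slope * tm, slope


def compare_phases(t_a, y_a, t_b, y_b):
    """Level change at the first B session and slope change (B minus A)."""
    ia, sa = ols_line(t_a, y_a)
    ib, sb = ols_line(t_b, y_b)
    t0 = t_b[0]  # first session of the intervention phase
    level_change = (ib + sb * t0) - (ia + sa * t0)
    return level_change, sb - sa
```
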
A Meta-Analysis of Peer-Mediated Interventions for Young Children with Autism Spectrum Disorders
ERIC Educational Resources Information Center
Zhang, Jie; Wheeler, John J.
2011-01-01
This meta-analysis investigated the efficacy of peer-mediated interventions for promoting social interactions among children from birth to eight years of age diagnosed with ASD. Forty-five single-subject design studies were analyzed and the effect sizes were calculated by the regression model developed by Allison and Gorman (1993). The overall…
NIR spectroscopic measurement of moisture content in Scots pine seeds.
Lestander, Torbjörn A; Geladi, Paul
2003-04-01
When tree seeds are used for seedling production, it is important that they are of high quality in order to be viable. One of the factors influencing viability is moisture content, and an ideal quality control system should be able to measure this factor quickly for each seed. Seed moisture content within the range 3-34% was determined by near-infrared (NIR) spectroscopy on Scots pine (Pinus sylvestris L.) single seeds and on bulk seed samples consisting of 40-50 seeds. The models for predicting water content from the spectra were made by partial least squares (PLS) and ordinary least squares (OLS) regression. Different conditions were simulated, involving both the use of fewer wavelengths and the move from bulk samples to single seeds. Reflectance and transmission measurements were used. Different spectral pretreatment methods were tested on the spectra. Including bias, the lowest prediction errors for PLS models based on reflectance within 780-2280 nm from bulk samples and single seeds were 0.8% and 1.9%, respectively. Reduction of the single seed reflectance spectrum to 850-1048 nm gave higher biases and prediction errors in the test set. In transmission (850-1048 nm), the prediction error was 2.7% for single seeds. OLS models based on a simulated four-sensor single-seed system consisting of optical filters with Gaussian transmission indicated prediction errors of more than 3.4%. A practical F-test based on test sets to differentiate models is introduced.
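"Prediction error including bias" is commonly reported as the root mean square error of prediction (RMSEP). RMSEP decomposes into a bias term and a bias-corrected standard error of prediction (SEP). This sketch assumes the usual chemometrics definitions; the abstract does not state which divisors were used. The moisture values in the usage check are hypothetical.

```python
import math


def prediction_errors(y_true, y_pred):
    """Return (RMSEP, bias, SEP) for a test set."""
    e = [p - t for t, p in zip(y_true, y_pred)]
    n = len(e)
    bias = sum(e) / n
    rmsep = math.sqrt(sum(x * x for x in e) / n)                # includes bias
    sep = math.sqrt(sum((x - bias) ** 2 for x in e) / (n - 1))  # bias-corrected
    return rmsep, bias, sep
```

With these definitions, RMSEP^2 = bias^2 + SEP^2 (n - 1) / n. That identity is why a model with a larger bias can have a larger RMSEP even when its SEP is similar.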
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
2017-06-01
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work neither fully covers the GWAS setting nor extends to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given that logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies: first, analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold; second, fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
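The two-step proposal is essentially a screen-then-refit pipeline. In the sketch below, `screen_pvalue` (the fast logistic-regression P-value for one SNP) and `refit` (the slower, appropriately weighted Cox fit) are hypothetical callables supplied by the user. No particular library's API is implied, and the threshold must be chosen by the analyst.

```python
def two_step_gwas(snps, screen_pvalue, refit, threshold):
    """Step 1: screen every SNP cheaply; step 2: refit only the hits with the slower model."""
    hits = [s for s in snps if screen_pvalue(s) < threshold]
    return {s: refit(s) for s in hits}
```

The computational saving comes from running `refit` only on the SNPs that pass the screen, which is typically a small fraction of the total.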
Gruber, Susan; Logan, Roger W; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A
2015-01-15
Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. Copyright © 2014 John Wiley & Sons, Ltd.
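Whatever model supplies the denominator probabilities (logistic regression, SL or EL), the stabilized weight for a time-fixed binary treatment is P(A = a) / P(A = a | L). The sketch below covers only that time-fixed case. In the longitudinal setting of the study, the weight is the product of such ratios over visits. The predicted probabilities are assumed to come from an already fitted model.

```python
def stabilized_weights(treatment, p_denominator):
    """treatment: list of 0/1; p_denominator: P(A=1 | L) for each subject from any fitted model."""
    n = len(treatment)
    p_marg = sum(treatment) / n  # numerator model: marginal P(A=1)
    weights = []
    for a, p in zip(treatment, p_denominator):
        num = p_marg if a == 1 else 1 - p_marg
        den = p if a == 1 else 1 - p
        weights.append(num / den)
    return weights
```
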
Gruber, Susan; Logan, Roger W.; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A.
2014-01-01
Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. PMID:25316152
Effect of marital status on treatment and survival of extremity soft tissue sarcoma
Alamanda, V. K.; Song, Y.; Holt, G. E.
2014-01-01
Background Spousal support has been hypothesized to provide important psychosocial support for patients and, as such, has been noted to provide a survival advantage in a number of chronic diseases and cancers. However, the specific effect of marital status on survival in soft tissue sarcomas (STSs) of the extremity has not been explored in detail. Patients and methods A total of 7384 patients were evaluated for this study using a Surveillance, Epidemiology, and End Results (SEER) registry query for patients over 20 years old with extremity STS diagnosed between 2004 and 2009. Survival outcomes were analyzed using Gray's test after patients were stratified by marital status. The Fine and Gray model, a multivariable competing-risks regression model, was used to assess whether marital status was an independent predictor of sarcoma-specific death. Statistical significance was set at P < 0.05. Results Analysis of the SEER database showed that single patients were more likely to die of their STS, and at a faster rate, than married patients. No differences were noted in tumor size and tumor site on presentation between married and single patients. However, single patients presented with higher grade tumors more frequently (P = 0.013), received radiotherapy less often (P < 0.001), and underwent surgery less often (P < 0.001) than their married peers. Regression analysis showed that after accounting for tumor size, grade, site, histology, use of radiotherapy, age, gender, region where the patients were from, and income, being single continued to serve as an independent predictor of sarcoma-specific death (P < 0.0001). Conclusion Among patients with extremity STS, overall survival is worse for single patients than for married patients. Single patients do not undergo surgical resection or receive radiation therapy as frequently as their married counterparts.
Social support systems and barriers to care should be evaluated at time of diagnosis and addressed in single patients to potentially improve survival outcomes. PMID:24504446
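Gray's test compares cumulative incidence functions (CIFs) across groups when patients can die of competing causes. The sketch below computes the nonparametric (Aalen-Johansen) CIF for one cause within one group, for example single or married patients. Neither Gray's test statistic nor the Fine and Gray model is implemented here, and the data in the check are hypothetical.

```python
def cumulative_incidence(times, events, cause):
    """events: 0 = censored, 1, 2, ... = cause of failure. Returns [(t, CIF_cause(t))]."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0  # all-cause Kaplan-Meier survival just before t
    cif = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_all = d_cause = n_t = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_t += 1
            if data[i][1] != 0:
                d_all += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        cif += surv * d_cause / n_at_risk   # probability of failing from `cause` at t
        surv *= 1 - d_all / n_at_risk       # update all-cause survival
        n_at_risk -= n_t                    # remove failures and censorings
        out.append((t, cif))
    return out
```

Without censoring, the final CIF equals the observed proportion failing from that cause, and the CIFs of all causes sum to the overall failure proportion.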
Changren Weng; Thomas L. Kubisiak; C. Dana Nelson; James P. Geaghan; Michael Stine
1999-01-01
Single marker regression and single marker maximum likelihood estimation were used to detect quantitative trait loci (QTLs) controlling the early height growth of longleaf pine and slash pine using a ((longleaf pine x slash pine) x slash pine) BC1 population consisting of 83 progeny. Maximum likelihood estimation was found to be more powerful than regression and could...
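In a backcross, each marker has two genotype classes. Single-marker regression therefore regresses the trait on a 0/1 marker code, and the slope equals the difference between the two class means. The sketch below returns that slope and its t statistic. The 0/1 coding and the phenotype values in the check are hypothetical.

```python
import math


def single_marker_regression(genotypes, phenotypes):
    """Regress trait on a 0/1 backcross marker; return (slope, t statistic)."""
    n = len(genotypes)
    gm = sum(genotypes) / n
    pm = sum(phenotypes) / n
    sxx = sum((g - gm) ** 2 for g in genotypes)
    sxy = sum((g - gm) * (p - pm) for g, p in zip(genotypes, phenotypes))
    b = sxy / sxx                 # difference in mean height between marker classes
    a = pm - b * gm
    rss = sum((p - a - b * g) ** 2 for g, p in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, b / se
```

Running this test across every marker and comparing t (or its P-value) with a threshold is the basic single-marker QTL scan.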
An Overview of Longitudinal Data Analysis Methods for Neurological Research
Locascio, Joseph J.; Atri, Alireza
2011-01-01
The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, aimed to be a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analyzing a single summary subject level number that indexes changes for each subject and (3) a general linear model approach with a fixed-subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-random and fixed-effect regression models for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measure analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models. PMID:22203825
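The second "quick" approach listed above reduces each subject's series to one summary number, such as that subject's own OLS slope over time. Those numbers can then be compared across groups with ordinary methods. The sketch below computes the per-subject slopes. The record layout (subject -> list of (time, value)) and the data in the check are hypothetical, and each subject needs at least two distinct time points.

```python
def ols_slope(times, values):
    """Ordinary least-squares slope of values on times."""
    n = len(times)
    tm = sum(times) / n
    vm = sum(values) / n
    sxx = sum((t - tm) ** 2 for t in times)
    sxy = sum((t - tm) * (v - vm) for t, v in zip(times, values))
    return sxy / sxx


def subject_slopes(records):
    """records: dict subject -> list of (time, value). Returns subject -> slope."""
    return {sid: ols_slope([t for t, _ in obs], [v for _, v in obs])
            for sid, obs in records.items()}
```

Unlike a mixed model, this summary gives every subject equal weight regardless of how many visits they contributed, which is one reason the article reserves it for preliminary analyses.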
Jackson, Dan; White, Ian R; Riley, Richard D
2013-01-01
Multivariate meta-analysis is becoming more commonly used. Methods for fitting the multivariate random effects model include maximum likelihood, restricted maximum likelihood, Bayesian estimation and multivariate generalisations of the standard univariate method of moments. Here, we provide a new multivariate method of moments for estimating the between-study covariance matrix with the properties that (1) it allows for either complete or incomplete outcomes and (2) it allows for covariates through meta-regression. Further, for complete data, it is invariant to linear transformations. Our method reduces to the usual univariate method of moments, proposed by DerSimonian and Laird, in a single dimension. We illustrate our method and compare it with some of the alternatives using a simulation study and a real example. PMID:23401213
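In one dimension, the new estimator reduces to the DerSimonian and Laird method of moments. That univariate estimator of the between-study variance tau^2 is sketched below. The effect sizes and within-study variances in the check are hypothetical.

```python
def dersimonian_laird_tau2(y, v):
    """Method-of-moments between-study variance from effects y and within-study variances v."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw        # fixed-effect mean
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(y) - 1)) / c)                    # truncate at zero
```

For y = (0.1, 0.3, 0.5) with variances (0.01, 0.02, 0.04), Q = 26/7 and the estimate is tau^2 = 12/700, about 0.0171.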
Efficient inference for genetic association studies with multiple outcomes.
Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina
2017-10-01
Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes. © The Author 2017. Published by Oxford University Press. All rights reserved.
Very-short-term wind power prediction by a hybrid model with single- and multi-step approaches
NASA Astrophysics Data System (ADS)
Mohammed, E.; Wang, S.; Yu, J.
2017-05-01
Very-short-term wind power prediction (VSTWPP) plays an essential role in the operation of electric power systems. This paper aims to improve and apply a hybrid VSTWPP method based on historical data. The hybrid method combines multiple linear regression and least squares (MLR&LS) with the aim of reducing prediction errors. The predicted values are obtained through two sub-processes: 1) transform the time-series data of actual wind power into the power ratio, and then predict the power ratio; 2) use the predicted power ratio to predict the wind power. In addition, the proposed method supports two prediction approaches: single-step prediction (SSP) and multi-step prediction (MSP). The WPP results are compared with those of an auto-regressive moving average (ARMA) model in terms of predicted values and errors. The validity of the proposed hybrid method is confirmed by error analysis using the probability density function (PDF), mean absolute percentage error (MAPE) and mean square error (MSE). Meanwhile, a comparison of the correlation coefficients between the actual and predicted values for different prediction times and windows confirmed that the MSP approach using the hybrid model is more accurate than both the SSP approach and ARMA. The MLR&LS method is accurate and promising for solving problems in WPP.
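The abstract does not specify the MLR&LS equations. The sketch below therefore uses a generic least-squares AR(1) model as a stand-in to contrast the two prediction modes. Single-step prediction uses the latest observed value for each forecast. Multi-step (recursive) prediction feeds each forecast back in as the next input. The MAPE and MSE error measures from the abstract are included. The function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((a - xm) ** 2 for a in x)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    phi = sxy / sxx
    return ym - phi * xm, phi


def single_step(c, phi, observed):
    """One-step-ahead forecasts, each from the actual previous observation."""
    return [c + phi * v for v in observed]


def multi_step(c, phi, last, steps):
    """Recursive forecasts: each prediction becomes the next input."""
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


def mape(actual, predicted):
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```
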
A New Metric for Land-Atmosphere Coupling Strength: Applications on Observations and Modeling
NASA Astrophysics Data System (ADS)
Tang, Q.; Xie, S.; Zhang, Y.; Phillips, T. J.; Santanello, J. A., Jr.; Cook, D. R.; Riihimaki, L.; Gaustad, K.
2017-12-01
A new metric is proposed to quantify the land-atmosphere (LA) coupling strength; it is constructed by correlating the surface evaporative fraction with the land and atmosphere variables that affect it (e.g., soil moisture, vegetation, and radiation). Based upon multiple linear regression, this approach simultaneously considers multiple factors and thus represents complex LA coupling mechanisms better than existing single-variable metrics. The standardized regression coefficients quantify the relative contributions from individual drivers in a consistent manner, avoiding the potential inconsistency in relative influence of conventional metrics. Moreover, the unique expandable feature of the new method allows us to verify and explore potentially important coupling mechanisms. Our observation-based application of the new metric shows moderate coupling with large spatial variations at the U.S. Southern Great Plains. The relative importance of soil moisture vs. vegetation varies by location. We also show that LA coupling strength is generally underestimated by single-variable methods due to their incompleteness. We also apply this new metric to evaluate the representation of LA coupling in the Accelerated Climate Modeling for Energy (ACME) V1 Contiguous United States (CONUS) regionally refined model (RRM). This work is performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-734201
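Standardized regression coefficients solve R_xx beta = r_xy, where R_xx holds the correlations among the drivers and r_xy their correlations with the response. For two drivers this has a closed form, sketched below. The choice of exactly two drivers (e.g., soil moisture and vegetation) is an assumption for illustration; the metric itself may use more, which requires solving the general linear system.

```python
import math


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    am, bm = sum(a) / n, sum(b) / n
    sab = sum((x - am) * (y - bm) for x, y in zip(a, b))
    saa = sum((x - am) ** 2 for x in a)
    sbb = sum((y - bm) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def standardized_coefs_2(y, x1, x2):
    """Standardized coefficients of y on two predictors via the 2x2 normal equations."""
    r1, r2, r12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    d = 1 - r12 ** 2  # zero if the drivers are perfectly collinear
    return (r1 - r2 * r12) / d, (r2 - r1 * r12) / d
```

Because each coefficient is adjusted for the other driver, the pair differs from the two simple correlations whenever the drivers are themselves correlated. This is one way single-variable metrics can misstate coupling.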
Liu, Chia-Chuan; Shih, Chih-Shiun; Pennarun, Nicolas; Cheng, Chih-Tao
2016-01-01
The feasibility and radicality of lymph node dissection for lung cancer surgery by a single-port technique have frequently been challenged. We performed a retrospective cohort study to investigate this issue. Two chest surgeons initiated multiple-port thoracoscopic surgery in a 180-bed cancer centre in 2005 and gradually shifted to a single-port technique after 2010. Data, including demographic and clinical information, from 389 patients receiving multiport thoracoscopic lobectomy or segmentectomy and 149 consecutive patients undergoing either single-port lobectomy or segmentectomy for primary non-small-cell lung cancer were retrieved and entered for statistical analysis by multivariable linear regression models and Box-Cox transformed multivariable analysis. The mean number of total dissected lymph nodes in the lobectomy group was 28.5 ± 11.7 for the single-port group versus 25.2 ± 11.3 for the multiport group; the mean number of total dissected lymph nodes in the segmentectomy group was 19.5 ± 10.8 for the single-port group versus 17.9 ± 10.3 for the multiport group. In both the linear multivariable and the Box-Cox transformed multivariable analyses, the single-port approach was still associated with a higher total number of dissected lymph nodes. The total number of dissected lymph nodes for primary lung cancer surgery by single-port video-assisted thoracoscopic surgery (VATS) was higher than by multiport VATS in univariable, multivariable linear regression and Box-Cox transformed multivariable analyses. This study confirmed that highly effective lymph node dissection could be achieved through single-port VATS in our setting. © The Author 2015. Published by Oxford University Press on behalf of the European Association for Cardio-Thoracic Surgery. All rights reserved.
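The Box-Cox family transforms a positive outcome, such as a lymph node count, so that regression residuals are closer to normal. The sketch below shows the transform and a grid search over lambda by profile log-likelihood for a single sample without covariates. That simplification is an assumption: in a regression setting, the variance term would come from the model residuals.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform; requires strictly positive y."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]  # limit as lambda -> 0
    return [(v ** lam - 1) / lam for v in y]


def boxcox_loglik(y, lam):
    """Profile log-likelihood (up to a constant), including the Jacobian term."""
    n = len(y)
    z = boxcox(y, lam)
    zm = sum(z) / n
    var = sum((v - zm) ** 2 for v in z) / n
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(v) for v in y)


def best_lambda(y, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))
```
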
Jiang, Feng; Han, Ji-zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot effectively evaluate the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problems that occur in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088
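Locally weighted linear regression fits a separate weighted least-squares line for every query point, with training points weighted by a kernel of their distance to the query. The sketch below uses a 1-D input and a Gaussian kernel with a user-chosen `bandwidth`. Both are assumptions for illustration; FCLWLR operates on the constructed multi-domain features.

```python
import math


def lwlr_predict(x_query, xs, ys, bandwidth):
    """Predict y at x_query with a Gaussian-kernel weighted least-squares line."""
    w = [math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    xm = sum(wi * x for wi, x in zip(w, xs)) / sw   # weighted means
    ym = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xm) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xm) * (y - ym) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return ym + slope * (x_query - xm)
```

A small bandwidth tracks local structure, while a very large one approaches ordinary least squares. The bandwidth is the knob that trades off the under- and overfitting mentioned in the abstract.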
Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot effectively evaluate the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problems that occur in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
ADCYAP1R1 and asthma in Puerto Rican children.
Chen, Wei; Boutaoui, Nadia; Brehm, John M; Han, Yueh-Ying; Schmitz, Cassandra; Cressley, Alex; Acosta-Pérez, Edna; Alvarez, María; Colón-Semidey, Angel; Baccarelli, Andrea A; Weeks, Daniel E; Kolls, Jay K; Canino, Glorisa; Celedón, Juan C
2013-03-15
Epigenetic and/or genetic variation in the gene encoding the receptor for adenylate-cyclase activating polypeptide 1 (ADCYAP1R1) has been linked to post-traumatic stress disorder in adults and anxiety in children. Psychosocial stress has been linked to asthma morbidity in Puerto Rican children. To examine whether epigenetic or genetic variation in ADCYAP1R1 is associated with childhood asthma in Puerto Ricans. We conducted a case-control study of 516 children ages 6-14 years living in San Juan, Puerto Rico. We assessed methylation at a CpG site in the promoter of ADCYAP1R1 (cg11218385) using a pyrosequencing assay in DNA from white blood cells. We tested whether cg11218385 methylation (range, 0.4-6.1%) is associated with asthma using logistic regression. We also examined whether exposure to violence (assessed by the Exposure to Violence [ETV] Scale in children 9 yr and older) is associated with cg11218385 methylation (using linear regression) or asthma (using logistic regression). Logistic regression was used to test for association between a single nucleotide polymorphism in ADCYAP1R1 (rs2267735) and asthma under an additive model. All multivariate models were adjusted for age, sex, household income, and principal components. Each 1% increment in cg11218385 methylation was associated with increased odds of asthma (adjusted odds ratio, 1.3; 95% confidence interval, 1.0-1.6; P = 0.03). Among children 9 years and older, exposure to violence was associated with cg11218385 methylation. The C allele of single nucleotide polymorphism rs2267735 was significantly associated with increased odds of asthma (adjusted odds ratio, 1.3; 95% confidence interval, 1.02-1.67; P = 0.03). Epigenetic and genetic variants in ADCYAP1R1 are associated with asthma in Puerto Rican children.
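An odds ratio "per 1% increment" or "per allele" (additive model, genotype coded as 0, 1 or 2 copies of the C allele) comes from exponentiating a logistic regression coefficient. The sketch below converts a coefficient and its standard error into an OR with a Wald 95% confidence interval for a chosen number of units. The coefficient values in the check are hypothetical, not the study's estimates.

```python
import math


def odds_ratio_ci(beta, se, units=1.0, z=1.959964):
    """OR and Wald 95% CI for a `units`-unit increase, given a per-unit logistic coefficient."""
    b, s = beta * units, se * abs(units)
    return math.exp(b), math.exp(b - z * s), math.exp(b + z * s)
```

Scaling by `units` shows why an OR per 1% change and an OR per 5% change in methylation differ: the log-odds scale is linear, so the OR is raised to the power of the number of units.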
GWAS with longitudinal phenotypes: performance of approximate procedures
Sikorska, Karolina; Montazeri, Nahid Mostafavi; Uitterlinden, André; Rivadeneira, Fernando; Eilers, Paul HC; Lesaffre, Emmanuel
2015-01-01
Analysis of genome-wide association studies with longitudinal data using standard procedures, such as linear mixed model (LMM) fitting, leads to discouragingly long computation times. There is a need to speed up the computations significantly. In our previous work (Sikorska et al: Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 2012; 32.1: 165–180), we proposed the conditional two-step (CTS) approach as a fast method providing an approximation to the P-value for the longitudinal single-nucleotide polymorphism (SNP) effect. In the first step a reduced conditional LMM is fit, omitting all the SNP terms. In the second step, the estimated random slopes are regressed on SNPs. The CTS has been applied to the bone mineral density data from the Rotterdam Study and proved to work very well even in unbalanced situations. In another article (Sikorska et al: GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14: 166), we suggested semi-parallel computations, greatly speeding up fitting many linear regressions. Combining CTS with fast linear regression reduces the computation time from several weeks to a few minutes on a single computer. Here, we explore further the properties of the CTS both analytically and by simulations. We investigate the performance of our proposal in comparison with a related but different approach, the two-step procedure. It is analytically shown that for the balanced case, under mild assumptions, the P-value provided by the CTS is the same as from the LMM. For unbalanced data and in realistic situations, simulations show that the CTS method does not inflate the type I error rate and implies only a minimal loss of power. PMID:25712081
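The two steps above, together with the semi-parallel regression idea, can be sketched as follows. Two simplifications are assumed here. First, per-subject least-squares slopes replace the random slopes of a reduced conditional linear mixed model. Second, step 2 regresses those slopes on all SNPs at once via the Frisch-Waugh-Lovell theorem: y and every SNP column are residualized on the shared covariates, so each SNP fit becomes a ratio of inner products. The function names and the simulated data are hypothetical.

```python
import numpy as np

def subject_slopes(times, values, subject_ids):
    """Step 1 stand-in: one least-squares slope of value on time per subject.

    (The CTS approach instead takes random slopes from a reduced conditional
    linear mixed model; per-subject OLS is a simplification for illustration.)
    """
    ids = np.unique(subject_ids)
    slopes = np.empty(len(ids))
    for k, s in enumerate(ids):
        m = subject_ids == s
        tc = times[m] - times[m].mean()
        slopes[k] = tc @ (values[m] - values[m].mean()) / (tc @ tc)
    return ids, slopes


def semi_parallel_regression(y, covariates, snps):
    """Step 2: regress y on every SNP column separately, all at once.

    covariates: (n, k) array; snps: (n, m) array of 0/1/2 allele counts.
    Returns per-SNP effect estimates and their standard errors.
    """
    n = len(y)
    C = np.column_stack([np.ones(n), covariates])
    Q, _ = np.linalg.qr(C)  # orthonormal basis of the covariate space
    y_res = y - Q @ (Q.T @ y)
    s_res = snps - Q @ (Q.T @ snps)
    ss = np.sum(s_res ** 2, axis=0)
    beta = s_res.T @ y_res / ss
    df = n - C.shape[1] - 1  # intercept + covariates + one SNP
    sigma2 = (y_res @ y_res - beta ** 2 * ss) / df
    return beta, np.sqrt(sigma2 / ss)


# Hypothetical simulated data: 300 subjects, 4 visits each, 50 SNPs.
rng = np.random.default_rng(1)
n_sub, n_vis, n_snp = 300, 4, 50
snps = rng.binomial(2, 0.3, size=(n_sub, n_snp)).astype(float)
age = rng.normal(60, 5, size=(n_sub, 1))
true_slope = 0.5 + 0.3 * snps[:, 0] + rng.normal(0, 0.1, n_sub)  # only SNP 0 matters
subject_ids = np.repeat(np.arange(n_sub), n_vis)
times = np.tile(np.arange(n_vis, dtype=float), n_sub)
values = (rng.normal(0, 1, n_sub)[subject_ids] + true_slope[subject_ids] * times
          + rng.normal(0, 0.2, n_sub * n_vis))

ids, slopes = subject_slopes(times, values, subject_ids)
beta, se = semi_parallel_regression(slopes, age, snps)
```

Each per-SNP estimate matches a separate ordinary least-squares fit with the same covariates, but the whole SNP matrix is handled in a few matrix products.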
NASA Astrophysics Data System (ADS)
Visser, H.; Molenaar, J.
1995-05-01
The detection of trends in climatological data has become central to the discussion on climate change due to the enhanced greenhouse effect. To prove detection, a method is needed (i) to make inferences on significant rises or declines in trends, (ii) to take into account natural variability in climate series, and (iii) to compare output from GCMs with the trends in observed climate data. To meet these requirements, flexible mathematical tools are needed. A structural time series model is proposed with which a stochastic trend, a deterministic trend, and regression coefficients can be estimated simultaneously. The stochastic trend component is described using the class of ARIMA models. The regression component is assumed to be linear. However, the regression coefficients corresponding to the explanatory variables may be time dependent, which allows this assumption to be validated. The mathematical technique used to estimate this trend-regression model is the Kalman filter. The main features of the filter are discussed. Examples of trend estimation are given using annual mean temperatures at a single station in the Netherlands (1706-1990) and annual mean temperatures at Northern Hemisphere land stations (1851-1990). The inclusion of explanatory variables is shown by regressing the latter temperature series on four variables: Southern Oscillation index (SOI), volcanic dust index (VDI), sunspot numbers (SSN), and a simulated temperature signal, induced by increasing greenhouse gases (GHG). In all analyses, the influence of SSN on global temperatures is found to be negligible. The correlations between temperatures and SOI and VDI appear to be negative. For SOI, this correlation is significant, but for VDI it is not, probably because of a lack of volcanic eruptions during the sample period. The relation between temperatures and GHG is positive, which is in agreement with the hypothesis of a warming climate because of increasing levels of greenhouse gases. The prediction performance of the model is rather poor, and possible explanations are discussed.
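A trend-plus-regression model of this kind can be filtered as sketched below. The stochastic trend is an integrated random walk, which is one member of the ARIMA class. The regression coefficient follows a random walk, so it can drift over time. Both choices are assumptions for illustration. The variances are fixed inputs rather than estimated by maximum likelihood as a full analysis would do, and the data are simulated.

```python
import numpy as np

def kalman_trend_regression(y, x, sigma_eps2, q_slope, q_beta, p0=1e4):
    """Filter a smooth stochastic trend plus one time-varying regression coefficient.

    Measurement: y_t = mu_t + beta_t * x_t + eps_t,   eps_t ~ N(0, sigma_eps2)
    Transition:  mu_t   = mu_{t-1} + nu_{t-1}
                 nu_t   = nu_{t-1} + zeta_t,         zeta_t ~ N(0, q_slope)
                 beta_t = beta_{t-1} + xi_t,          xi_t ~ N(0, q_beta)
    Returns the filtered state [mu, nu, beta] for every t.
    """
    T = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    Q = np.diag([0.0, q_slope, q_beta])
    a = np.zeros(3)
    P = np.eye(3) * p0  # vague (approximately diffuse) initial state
    out = np.empty((len(y), 3))
    for t in range(len(y)):
        # Prediction step
        a = T @ a
        P = T @ P @ T.T + Q
        # Update step with the time-varying observation vector Z_t = [1, 0, x_t]
        Z = np.array([1.0, 0.0, x[t]])
        v = y[t] - Z @ a              # one-step-ahead prediction error
        F = Z @ P @ Z + sigma_eps2    # its variance
        K = P @ Z / F                 # Kalman gain
        a = a + K * v
        P = P - np.outer(K, Z @ P)
        P = 0.5 * (P + P.T)           # keep P numerically symmetric
        out[t] = a
    return out


# Hypothetical series: linear warming of 0.01 per step and a negative regression effect.
rng = np.random.default_rng(2)
n = 300
x = rng.normal(0, 1, n)  # stand-in explanatory series (e.g. an index like SOI)
y = 0.01 * np.arange(n) - 0.5 * x + rng.normal(0, 0.1, n)
states = kalman_trend_regression(y, x, sigma_eps2=0.01, q_slope=1e-8, q_beta=1e-8)
mu_T, nu_T, beta_T = states[-1]
```

Setting `q_beta` to zero fixes the coefficient. Larger values let it drift, which is how time-varying coefficients can be used to check the linearity assumption.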
Xing, Jian; Burkom, Howard; Tokars, Jerome
2011-12-01
Automated surveillance systems require statistical methods to recognize increases in visit counts that might indicate an outbreak. In prior work we presented methods to enhance the sensitivity of C2, a commonly used time series method. In this study, we compared the enhanced C2 method with five regression models. We used emergency department chief complaint data from the US CDC BioSense surveillance system, aggregated by city (total of 206 hospitals, 16 cities) during 5/2008-4/2009. Data for six syndromes (asthma, gastrointestinal, nausea and vomiting, rash, respiratory, and influenza-like illness) were used and were stratified by mean count (1-19, 20-49, ≥50 per day) into 14 syndrome-count categories. We compared the sensitivity for detecting single-day artificially-added increases in syndrome counts. Four modifications of the C2 time series method, and five regression models (two linear and three Poisson), were tested. A constant alert rate of 1% was used for all methods. Among the regression models tested, we found that a Poisson model controlling for the logarithm of total visits (i.e., visits both meeting and not meeting a syndrome definition), day of week, and 14-day time period was best. Among 14 syndrome-count categories, time series and regression methods produced approximately the same sensitivity (<5% difference) in six; in six categories, the regression method had higher sensitivity (range 6-14% improvement), and in two categories the time series method had higher sensitivity. When automated data are aggregated to the city level, a Poisson regression model that controls for total visits produces the best overall sensitivity for detecting artificially added visit counts. This improvement was achieved without increasing the alert rate, which was held constant at 1% for all methods. These findings will improve our ability to detect outbreaks in automated surveillance system data. Published by Elsevier Inc.
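For orientation, the sketch below shows the basic (unenhanced) EARS C2 statistic. It standardizes today's count against a 7-day baseline that ends two days earlier, i.e. days t-9 to t-3. It does not implement the enhanced variants from the authors' prior work. The floor on the standard deviation and the counts are assumptions. Matching the study's design, the alert threshold is set empirically for a target alert rate rather than fixed at the customary value of 3.

```python
import math
from statistics import mean, stdev

def c2_statistic(counts, t, baseline=7, guard=2, min_sd=0.5):
    """C2-style statistic for day t.

    The baseline is the `baseline` days that end `guard` days before t
    (days t-9 .. t-3 with the defaults). `min_sd` is a floor on the baseline
    standard deviation so flat baselines do not divide by zero; its value is
    an assumption, not taken from the study.
    """
    start, end = t - guard - baseline, t - guard
    if start < 0:
        raise ValueError("not enough history before day t")
    window = counts[start:end]
    sd = max(stdev(window), min_sd)
    return (counts[t] - mean(window)) / sd


def threshold_for_alert_rate(stats, rate):
    """Smallest observed statistic such that at most `rate` of stats exceed it."""
    s = sorted(stats)
    k = max(math.ceil((1 - rate) * len(s)) - 1, 0)
    return s[k]


counts = [10, 12, 9, 11, 10, 13, 8, 10, 11, 12, 11, 40]  # hypothetical daily visits
spike = c2_statistic(counts, t=11)
quiet = c2_statistic(counts, t=10)
```

An alert is raised on day t when its statistic exceeds the threshold chosen from historical statistics.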
A Functional Varying-Coefficient Single-Index Model for Functional Response Data
Li, Jialiang; Huang, Chao; Zhu, Hongtu
2016-01-01
Motivated by the analysis of imaging data, we propose a novel functional varying-coefficient single index model (FVCSIM) to carry out the regression analysis of functional response data on a set of covariates of interest. FVCSIM represents a new extension of varying-coefficient single index models for scalar responses collected from cross-sectional and longitudinal studies. An efficient estimation procedure is developed to iteratively estimate varying coefficient functions, link functions, index parameter vectors, and the covariance function of individual functions. We systematically examine the asymptotic properties of all estimators including the weak convergence of the estimated varying coefficient functions, the asymptotic distribution of the estimated index parameter vectors, and the uniform convergence rate of the estimated covariance function and their spectrum. Simulation studies are carried out to assess the finite-sample performance of the proposed procedure. We apply FVCSIM to investigating the development of white matter diffusivities along the corpus callosum skeleton obtained from Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. PMID:29200540
A Functional Varying-Coefficient Single-Index Model for Functional Response Data.
Li, Jialiang; Huang, Chao; Zhu, Hongtu
2017-01-01
Motivated by the analysis of imaging data, we propose a novel functional varying-coefficient single index model (FVCSIM) to carry out the regression analysis of functional response data on a set of covariates of interest. FVCSIM represents a new extension of varying-coefficient single index models for scalar responses collected from cross-sectional and longitudinal studies. An efficient estimation procedure is developed to iteratively estimate varying coefficient functions, link functions, index parameter vectors, and the covariance function of individual functions. We systematically examine the asymptotic properties of all estimators including the weak convergence of the estimated varying coefficient functions, the asymptotic distribution of the estimated index parameter vectors, and the uniform convergence rate of the estimated covariance function and their spectrum. Simulation studies are carried out to assess the finite-sample performance of the proposed procedure. We apply FVCSIM to investigating the development of white matter diffusivities along the corpus callosum skeleton obtained from Alzheimer's Disease Neuroimaging Initiative (ADNI) study.
Srinivas, N R
2016-08-01
Linear regression models utilizing a single time point (Cmax) have been reported for pravastatin and simvastatin. A new model was developed for the prediction of the AUC of statins that utilized the slopes of the above 2 models, with pharmacokinetic (Cmax) and pharmacodynamic (IC50) components for the statins. The prediction of AUCs for various statins (pravastatin, atorvastatin, simvastatin and rosuvastatin) was carried out using the newly developed dual pharmacokinetic and pharmacodynamic model. Generally, the AUC predictions were contained within a 0.5- to 2-fold difference of the observed AUC, suggesting utility of the new models. The root mean square error of the predictions was <45% for the 2 models. On the basis of the present work, it is feasible to utilize both pharmacokinetic (Cmax) and pharmacodynamic (IC50) data for effectively predicting the AUC for statins. The concept described in this work may have utility in both drug discovery and development stages. © Georg Thieme Verlag KG Stuttgart · New York.
Single Motherhood, Alcohol Dependence, and Smoking During Pregnancy: A Propensity Score Analysis.
Waldron, Mary; Bucholz, Kathleen K; Lian, Min; Lessov-Schlaggar, Christina N; Miller, Ruth Huang; Lynskey, Michael T; Knopik, Valerie S; Madden, Pamela A F; Heath, Andrew C
2017-09-01
Few studies linking single motherhood and maternal smoking during pregnancy consider correlated risk from problem substance use beyond history of smoking and concurrent use of alcohol. In the present study, we used propensity score methods to examine whether the risk of smoking during pregnancy associated with single motherhood is the result of potential confounders, including alcohol dependence. Data were drawn from mothers participating in a birth cohort study of their female like-sex twin offspring (n = 257 African ancestry; n = 1,711 European or other ancestry). We conducted standard logistic regression models predicting smoking during pregnancy from single motherhood at twins' birth, followed by propensity score analyses comparing single-mother and two-parent families stratified by predicted probability of single motherhood. In standard models, single motherhood predicted increased risk of smoking during pregnancy in European ancestry but not African ancestry families. In propensity score analyses, rates of smoking during pregnancy were elevated in single-mother relative to two-parent European ancestry families across much of the spectrum of a priori risk of single motherhood. Among African ancestry families, within-strata comparisons of smoking during pregnancy by single-mother status were nonsignificant. These findings highlight single motherhood as a unique risk factor for smoking during pregnancy in European ancestry mothers, over and above alcohol dependence. Additional research is needed to identify risks, beyond single motherhood, associated with smoking during pregnancy in African ancestry mothers.
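Stratifying by the predicted probability of exposure works as follows. A logistic model estimates each family's propensity score. Families are grouped into propensity quantiles, and exposed and unexposed families are compared only within each stratum. The sketch below makes several assumptions: it uses five strata, a single simulated confounder, and a size-weighted average of within-stratum risk differences. It is not the authors' exact specification.

```python
import numpy as np

def fit_propensity(X, treated, n_iter=25):
    """Logistic regression by Newton-Raphson; X includes an intercept column.
    Returns the fitted probabilities (propensity scores)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (treated - p))
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


def stratified_risk_difference(covariates, treated, outcome, n_strata=5):
    """Size-weighted average of within-stratum outcome differences
    (treated minus untreated); strata are propensity-score quantiles.
    Strata lacking either group are skipped."""
    n = len(treated)
    ps = fit_propensity(np.column_stack([np.ones(n), covariates]), treated)
    edges = np.quantile(ps, np.linspace(0.0, 1.0, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, n_strata - 1)
    diffs, weights = [], []
    for s in range(n_strata):
        in_s = strata == s
        t, c = in_s & (treated == 1), in_s & (treated == 0)
        if t.any() and c.any():
            diffs.append(outcome[t].mean() - outcome[c].mean())
            weights.append(in_s.sum())
    return float(np.average(diffs, weights=weights))


# Hypothetical data: a confounder z raises both exposure and outcome risk.
rng = np.random.default_rng(3)
n = 20000
z = rng.normal(0, 1, n)
treated = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * z)))).astype(float)
p_outcome = np.clip(0.2 + 0.1 * treated + 0.1 * z, 0.0, 1.0)  # true effect: +0.10
outcome = (rng.random(n) < p_outcome).astype(float)

crude = outcome[treated == 1].mean() - outcome[treated == 0].mean()
adjusted = stratified_risk_difference(z[:, None], treated, outcome)
```

The crude difference absorbs the confounder's effect. The stratified estimate removes most of it, although coarse strata leave some residual confounding.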
de Souza e Silva, Rebeca; Andreoni, Solange
2012-07-01
The scope of this study was to evaluate the association between having had an induced abortion and marital status (being single or legally married) in women residing in the city of São Paulo. This analysis is derived from a broader population survey on abortion conducted in 2008. In this study we focus on the subset of 389 single and legally married women between 15 and 49 years of age. Logistic regression models were used to evaluate the association between induced abortion and being single or married, controlling for age, education, income, number of live births, contraceptive use and acceptance of the practice of abortion. Being single was the only characteristic associated with having had an induced abortion; when faced with a pregnancy, single women had nearly four times the odds of having an abortion compared with married women (OR=3.9; p=0.009).
Single-parent households and children's educational achievement: A state-level analysis.
Amato, Paul R; Patterson, Sarah; Beattie, Brett
2015-09-01
Although many studies have examined associations between family structure and children's educational achievement at the individual level, few studies have considered how the increase in single-parent households may have affected children's educational achievement at the population level. We examined how changes in the percentage of children living with single parents between 1990 and 2011 related to state mathematics and reading scores on the National Assessment of Educational Progress. Regression models with state and year fixed effects revealed that changes in the percentage of children living with single parents were not associated with test scores. Increases in maternal education, however, were associated with improvements in children's test scores during this period. These results do not support the notion that increases in single parenthood have had serious consequences for U.S. children's school achievement. Copyright © 2015 Elsevier Inc. All rights reserved.
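In a balanced state-by-year panel, state and year fixed effects can be absorbed by double demeaning. Each value has its state mean and its year mean subtracted, and the grand mean is added back, before an ordinary regression is run. The sketch below uses simulated data in which single parenthood has no effect and maternal education does. The variable names are hypothetical, and the shortcut is only exact for balanced panels.

```python
import numpy as np

def two_way_fixed_effects(Y, covariate_panels):
    """Two-way (unit and year) fixed-effects slopes for a balanced panel.

    Y and each covariate panel are (n_units, n_years) arrays. Double demeaning
    removes unit and year effects; this equals including unit and year dummies
    only when the panel is balanced.
    """
    def demean(A):
        return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()

    y = demean(Y).ravel()
    X = np.column_stack([demean(A).ravel() for A in covariate_panels])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical panel: 50 states x 10 survey years.
rng = np.random.default_rng(4)
n_states, n_years = 50, 10
state_fx = rng.normal(0, 5, (n_states, 1))
year_fx = rng.normal(0, 2, (1, n_years))
pct_single = (20 + state_fx / 2 + np.linspace(0, 5, n_years)
              + rng.normal(0, 1, (n_states, n_years)))
maternal_edu = 12 + np.linspace(0, 1, n_years) + rng.normal(0, 1, (n_states, n_years))
scores = (250 + state_fx + year_fx + 0.0 * pct_single + 2.0 * maternal_edu
          + rng.normal(0, 1, (n_states, n_years)))

beta = two_way_fixed_effects(scores, [pct_single, maternal_edu])
```

The state effects in this simulation are correlated with single parenthood. A pooled regression without fixed effects would therefore show a spurious association, while the within estimate recovers the null effect.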
Housing Satisfaction of Older (55+) Single-Person Householders in U.S. Rural Communities.
Ahn, Mira; Lee, Sung-Jin
2016-08-01
This study aims to understand the housing satisfaction of older (55+) single-person householders in U.S. rural communities using the available variables from a secondary data set, the 2011 American Housing Survey (AHS). In this study, housing satisfaction was considered to be an indicator of quality of life. Based on previous studies, we developed a model to test a hypothesized relationship between older (55+) single-person householders' (N = 1,017) housing satisfaction and their personal, physical, financial, and environmental characteristics. Multiple regression results supported the model, indicating that significant variables in housing satisfaction include age, gender, health status, age of house, structure type, and unit location. Among the significant variables, health status was the strongest factor in housing satisfaction. Housing satisfaction was discussed as a potential indicator of quality of life. © The Author(s) 2015.
Potential for Bias When Estimating Critical Windows for Air Pollution in Children's Health.
Wilson, Ander; Chiu, Yueh-Hsiu Mathilda; Hsu, Hsiao-Hsien Leon; Wright, Robert O; Wright, Rosalind J; Coull, Brent A
2017-12-01
Evidence supports an association between maternal exposure to air pollution during pregnancy and children's health outcomes. Recent interest has focused on identifying critical windows of vulnerability. An analysis based on a distributed lag model (DLM) can yield estimates of a critical window that differ from those of an analysis that regresses the outcome on each of the 3 trimester-average exposures (TAEs). Using a simulation study, we assessed bias in estimates of critical windows obtained using 3 regression approaches: 1) 3 separate models to estimate the association with each of the 3 TAEs; 2) a single model to jointly estimate the association between the outcome and all 3 TAEs; and 3) a DLM. We used weekly fine-particulate-matter exposure data for 238 births in a birth cohort in and around Boston, Massachusetts, and a simulated outcome and time-varying exposure effect. Estimates using separate models for each TAE were biased and identified incorrect windows. This bias arose from seasonal trends in particulate matter that induced correlation between TAEs. Including all TAEs in a single model reduced bias. DLM produced unbiased estimates and added flexibility to identify windows. Analysis of body mass index z score and fat mass in the same cohort highlighted inconsistent estimates from the 3 methods. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Clary, Christelle; Lewis, Daniel J; Flint, Ellen; Smith, Neil R; Kestens, Yan; Cummins, Steven
2016-12-01
Studies that explore associations between the local food environment and diet routinely use global regression models, which assume that relationships are invariant across space, yet such stationarity assumptions have been little tested. We used global and geographically weighted regression models to explore associations between the residential food environment and fruit and vegetable intake. Analyses were performed in 4 boroughs of London, United Kingdom, using data collected between April 2012 and July 2012 from 969 adults in the Olympic Regeneration in East London Study. Exposures were assessed both as absolute densities of healthy and unhealthy outlets, taken separately, and as a relative measure (proportion of total outlets classified as healthy). Overall, local models performed better than global models (lower Akaike information criterion). Locally estimated coefficients varied across space, regardless of the type of exposure measure, although changes of sign were observed only when absolute measures were used. Although global models showed a significant association with fruit and vegetable intake only for the relative measure (β = 0.022; P < 0.01), geographically weighted regression models using absolute measures outperformed models using relative measures. This study suggests that greater attention should be given to nonstationary relationships between the food environment and diet. It further challenges the idea that a single measure of exposure, whether relative or absolute, can reflect the many ways the food environment may shape health behaviors. © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
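Geographically weighted regression refits a weighted least-squares model at every location, giving nearby observations more weight. The result is one coefficient vector per location instead of a single global one. The sketch below assumes a fixed Gaussian kernel with a hand-picked bandwidth. In practice the bandwidth is usually chosen by AICc or cross-validation. The data are simulated so that the true slope changes sign across space.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local weighted least-squares fit at every observation location.

    coords: (n, 2) locations; X: (n, k) predictors; y: (n,) outcome.
    Uses a fixed Gaussian kernel w_ij = exp(-0.5 * (d_ij / bandwidth)**2).
    Returns an (n, k + 1) array of local [intercept, slopes].
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = Xd.T * w  # multiplies column j of X^T by w_j, i.e. X^T W
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas


# Hypothetical data whose true slope runs from -1 (west) to +1 (east).
rng = np.random.default_rng(5)
n = 400
coords = rng.random((n, 2))
x = rng.normal(0, 1, n)
true_slope = -1.0 + 2.0 * coords[:, 0]
y = true_slope * x + rng.normal(0, 0.3, n)

local = gwr_coefficients(coords, x[:, None], y, bandwidth=0.1)
global_fit, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)
```

The global slope averages out to roughly zero, while the local slopes recover the sign change. This is the kind of nonstationarity the study reports.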
Glavatskikh, Marta; Madzhidov, Timur; Solov'ev, Vitaly; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre
2016-12-01
In this work, we report QSPR modeling of the free energy ΔG of 1:1 hydrogen bond complexes of different H-bond acceptors and donors. The modeling was performed on a large and structurally diverse set of 3373 complexes featuring a single hydrogen bond, for which ΔG was measured at 298 K in CCl4. The models were prepared using Support Vector Machine and Multiple Linear Regression, with ISIDA fragment descriptors. The marked atoms strategy was applied at the fragmentation stage in order to capture the location of H-bond donor and acceptor centers. Different strategies of model validation have been suggested, including the targeted omission of individual H-bond acceptors and donors from the training set, in order to check whether the predictive ability of the model is not limited to the interpolation of H-bond strength between two already encountered partners. Successfully cross-validating individual models were combined into a consensus model, and challenged to predict external test sets of 629 and 12 complexes, in which donor and acceptor formed single and cooperative H-bonds, respectively. In all cases, SVM models outperform MLR. The SVM consensus model performs well both in 3-fold cross-validation (RMSE=1.50 kJ/mol), and on the external test sets containing complexes with single (RMSE=3.20 kJ/mol) and cooperative H-bonds (RMSE=1.63 kJ/mol). © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Muhlestein, Whitney E; Akagi, Dallin S; Kallos, Justiss A; Morone, Peter J; Weaver, Kyle D; Thompson, Reid C; Chambless, Lola B
2018-04-01
Objective Machine learning (ML) algorithms are powerful tools for predicting patient outcomes. This study pilots a novel approach to algorithm selection and model creation using prediction of discharge disposition following meningioma resection as a proof of concept. Materials and Methods A diverse set of ML algorithms was trained on a single-institution database of meningioma patients to predict discharge disposition. Algorithms were ranked by predictive power and top performers were combined to create an ensemble model. The final ensemble was internally validated on never-before-seen data to demonstrate generalizability. The predictive power of the ensemble was compared with a logistic regression. Further analyses were performed to identify how important variables impact the ensemble. Results Our ensemble model predicted disposition significantly better than a logistic regression (area under the curve of 0.78 and 0.71, respectively, p = 0.01). Tumor size, presentation at the emergency department, body mass index, convexity location, and preoperative motor deficit most strongly influence the model, though the independent impact of individual variables is nuanced. Conclusion Using a novel ML technique, we built a guided ML ensemble model that predicts discharge destination following meningioma resection with greater predictive power than a logistic regression, and that provides greater clinical insight than a univariate analysis. These techniques can be extended to predict many other patient outcomes of interest.
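The area under the ROC curve used to compare the ensemble with logistic regression equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half (the Mann-Whitney formulation). A minimal sketch on hypothetical scores follows. It computes the AUC only and does not implement the test behind the reported p-value.

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observed outcomes (1 = non-home discharge).
example = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

Here three of the four positive-negative pairs are ordered correctly, so `example` is 0.75. The pairwise loop is O(n²), which is fine for a sketch. Rank-based formulas scale better.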
Separation mechanism of nortriptyline and amytriptyline in RPLC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gritti, Fabrice; Guiochon, Georges A
2005-08-01
The single and the competitive equilibrium isotherms of nortriptyline and amytriptyline were acquired by frontal analysis (FA) on the C18-bonded discovery column, using a 28/72 (v/v) mixture of acetonitrile and water buffered with phosphate (20 mM, pH 2.70). The adsorption energy distributions (AED) of each compound were calculated from the raw adsorption data. Both the fitting of the adsorption data using multi-linear regression analysis and the AEDs are consistent with a trimodal isotherm model. The single-component isotherm data fit well to the tri-Langmuir isotherm model. The extension to a competitive two-component tri-Langmuir isotherm model based on the best parameters of the single-component isotherms does not account well for the breakthrough curves nor for the overloaded band profiles measured for mixtures of nortriptyline and amytriptyline. However, it was possible to derive adjusted parameters of a competitive tri-Langmuir model based on the fitting of the adsorption data obtained for these mixtures. A very good agreement was then found between the calculated and the experimental overloaded band profiles of all the mixtures injected.
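A tri-Langmuir isotherm sums three Langmuir terms, one for each type of adsorption site: q = Σ a_i C / (1 + b_i C). The saturation capacity of site i is a_i / b_i. The competitive form below is the common textbook extension, assuming both compounds compete for the same three site types. It is the kind of extension the abstract says did not fit the mixture data until its parameters were re-adjusted. All parameter values are hypothetical.

```python
def tri_langmuir(c, a, b):
    """Single-component tri-Langmuir isotherm: q = sum_i a_i C / (1 + b_i C)."""
    return sum(ai * c / (1.0 + bi * c) for ai, bi in zip(a, b))


def competitive_tri_langmuir(c1, c2, a1, b1, b2):
    """Loading of compound 1 in a binary mixture, assuming the same three
    site types compete: q1 = sum_i a1_i C1 / (1 + b1_i C1 + b2_i C2)."""
    return sum(ai * c1 / (1.0 + bi1 * c1 + bi2 * c2)
               for ai, bi1, bi2 in zip(a1, b1, b2))


# Hypothetical parameters for three site types (not the study's fitted values).
a_nor, b_nor = [50.0, 5.0, 0.5], [0.05, 1.0, 20.0]
a_ami, b_ami = [55.0, 6.0, 0.6], [0.06, 1.2, 25.0]

q_alone = tri_langmuir(2.0, a_nor, b_nor)
q_mixed = competitive_tri_langmuir(2.0, 2.0, a_nor, b_nor, b_ami)
```

With no competitor present (C2 = 0), the competitive form reduces to the single-component isotherm. A competitor always lowers the loading of compound 1.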
Jiang, Rong; French, John E.; Stober, Vandy P.; Kang-Sickel, Juei-Chuan C.; Zou, Fei
2012-01-01
Background: Individual genetic variation that results in differences in systemic response to xenobiotic exposure is not accounted for as a predictor of outcome in current exposure assessment models. Objective: We developed a strategy to investigate individual differences in single-nucleotide polymorphisms (SNPs) as genetic markers associated with naphthyl–keratin adduct (NKA) levels measured in the skin of workers exposed to naphthalene. Methods: The SNP-association analysis was conducted in PLINK using candidate-gene analysis and genome-wide analysis. We identified significant SNP–NKA associations and investigated the potential impact of these SNPs along with personal and workplace factors on NKA levels using a multiple linear regression model and the Pratt index. Results: In candidate-gene analysis, a SNP (rs4852279) located near the CYP26B1 gene contributed to the 2-naphthyl–keratin adduct (2NKA) level. In the multiple linear regression model, the SNP rs4852279, dermal exposure, exposure time, task replacing foam, age, and ethnicity all were significant predictors of 2NKA level. In genome-wide analysis, no single SNP reached genome-wide significance for NKA levels (all p ≥ 1.05 × 10⁻⁵). Pathway and network analyses predicted that the SNPs associated with NKA levels are involved in the regulation of cellular processes and homeostasis. Conclusions: These results provide evidence that a quantitative biomarker can be used as an intermediate phenotype when investigating the association between genetic markers and exposure–dose relationship in a small, well-characterized exposed worker population. PMID:22391508
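The Pratt index splits a regression's R² among predictors as d_j = β_j r_j / R². Here β_j is the standardized coefficient and r_j is the zero-order correlation of predictor j with the outcome. Because R² = Σ β_j r_j, the shares sum to one. A minimal sketch on simulated data follows. The predictor names are placeholders loosely echoing the abstract and are not the study's variables.

```python
import numpy as np

def pratt_index(X, y):
    """Pratt's relative importance d_j = beta_j * r_j / R^2 (shares sum to 1).

    beta_j: standardized regression coefficient; r_j: zero-order correlation
    between predictor j and y. Returns (d, R^2).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    zy = (y - y.mean()) / y.std()
    beta, *_ = np.linalg.lstsq(Z, zy, rcond=None)
    r = Z.T @ zy / len(zy)  # correlations, since all columns are standardized
    r2 = float(beta @ r)
    return beta * r / r2, r2


# Hypothetical predictors of an adduct level (names are placeholders).
rng = np.random.default_rng(6)
n = 200
dermal = rng.normal(0, 1, n)
hours = 0.5 * dermal + rng.normal(0, 1, n)
age = rng.normal(0, 1, n)
y = 1.0 * dermal + 0.5 * hours + 0.2 * age + rng.normal(0, 1, n)
X = np.column_stack([dermal, hours, age])
d, r2 = pratt_index(X, y)
```

Individual shares can be negative when a predictor's coefficient and its correlation with the outcome have opposite signs. That usually signals suppression or collinearity.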
Multi-model ensemble combinations of the water budget in the East/Japan Sea
NASA Astrophysics Data System (ADS)
HAN, S.; Hirose, N.; Usui, N.; Miyazawa, Y.
2016-02-01
The water balance of the East/Japan Sea is determined mainly by inflow and outflow through the Korea/Tsushima, Tsugaru and Soya/La Perouse Straits. However, the volume transports measured at the three straits remain quantitatively unbalanced. This study examined the seasonal variation of the volume transport using the multiple linear regression and ridge regression of multi-model ensemble (MME) methods to estimate physically consistent circulation in the East/Japan Sea by using four different data assimilation models. The MME outperformed all of the single models by reducing uncertainties, especially the multicollinearity problem with the ridge regression. However, the regression constants turned out to be inconsistent with each other if the MME was applied separately for each strait. The MME for a connected system was thus performed to find common constants for these straits. The estimation of this MME was found to be similar to the MME result of sea level difference (SLD). The estimated mean transport (2.42 Sv) was smaller than the measurement data at the Korea/Tsushima Strait, but the calibrated transport of the Tsugaru Strait (1.63 Sv) was larger than the observed data. The MME results of transport and SLD also suggested that the standard deviation (STD) of the Korea/Tsushima Strait is larger than the STD of the observation, whereas the estimated results were almost identical to those observed for the Tsugaru and Soya/La Perouse Straits. The similarity between MME results enhances the reliability of the present MME estimation.
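A regression-based multi-model ensemble regresses observations on the outputs of several models. When those outputs are strongly correlated (multicollinearity), ordinary least-squares weights become unstable. Ridge regression stabilizes them by adding a penalty λ to the diagonal of XᵀX. In the sketch below, predictors and target are centred so the regression constant is not penalized. The penalty value and the simulated monthly transports are assumptions. Setting λ = 0 recovers the multiple linear regression version.

```python
import numpy as np

def ridge_ensemble(model_outputs, observed, lam):
    """Ridge-regression weights for combining several model estimates.

    model_outputs: (n_times, n_models); observed: (n_times,).
    Data are centred so the constant is not penalized:
    w = (Xc^T Xc + lam I)^-1 Xc^T yc,  constant = mean(y) - mean(X) @ w.
    """
    x_mean, y_mean = model_outputs.mean(axis=0), observed.mean()
    Xc, yc = model_outputs - x_mean, observed - y_mean
    k = Xc.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    return w, y_mean - x_mean @ w


# Hypothetical monthly transports (Sv) from four strongly correlated models.
rng = np.random.default_rng(7)
n = 120
signal = 2.5 + 0.5 * np.sin(2 * np.pi * np.arange(n) / 12)
models = signal[:, None] + 0.3 + rng.normal(0, 0.05, (n, 4))  # shared bias of 0.3
observed = signal + rng.normal(0, 0.1, n)

w_ols, c_ols = ridge_ensemble(models, observed, lam=0.0)
w_ridge, c_ridge = ridge_ensemble(models, observed, lam=1.0)
prediction = models @ w_ridge + c_ridge
```

The ridge weights are shrunk toward each other and toward zero. This trades a little bias for much lower variance when the member models are nearly collinear.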
Genotype-phenotype association study via new multi-task learning model
Huo, Zhouyuan; Shen, Dinggang
2018-01-01
Research on the associations between genetic variations and imaging phenotypes is developing with advances in high-throughput genotyping and brain imaging techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. ℓ2,1-norm, leading to better predictive results and insights into SNPs. However, group sparsity is not enough to represent the correlation between multiple tasks, and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We hypothesize that a low-rank structure is also beneficial for uncovering the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than the compared methods and presents new insights into SNPs. PMID:29218896
PharmML in Action: an Interoperable Language for Modeling and Simulation.
Bizzotto, R; Comets, E; Smith, G; Yvon, F; Kristensen, N R; Swat, M J
2017-10-01
PharmML is an XML-based exchange format created with a focus on nonlinear mixed-effect (NLME) models used in pharmacometrics, but providing a very general framework that also allows describing mathematical and statistical models such as single-subject or nonlinear and multivariate regression models. This tutorial provides an overview of the structure of this language, brief suggestions on how to work with it, and use cases demonstrating its power and flexibility. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Decoding of finger trajectory from ECoG using deep learning.
Xie, Ziqian; Schwartz, Odelia; Prasad, Abhishek
2018-06-01
A conventional decoding pipeline for brain-machine interfaces (BMIs) consists of a chain of separate stages: feature extraction, time-frequency analysis and statistical learning models. Each of these stages uses a different algorithm trained in a sequential manner, which makes it difficult to make the whole system adaptive. The goal was to create an adaptive online system with a single objective function and a single learning algorithm so that the whole system can be trained in parallel to increase the decoding performance. Here, we used deep neural networks consisting of convolutional neural networks (CNN) and a special kind of recurrent neural network (RNN) called long short-term memory (LSTM) to address these needs. We used electrocorticography (ECoG) data collected by Kubanek et al. The task consisted of individual finger flexions upon a visual cue. Our model combined a hierarchical CNN feature extractor and an RNN that was able to process sequential data and recognize temporal dynamics in the neural data. The CNN was used as the feature extractor and the LSTM as the regression algorithm to capture the temporal dynamics of the signal. We predicted the finger trajectory using ECoG signals and compared results for least angle regression (LARS), CNN-LSTM, random forest, an LSTM model using hard-coded features (LSTM_HC), and a decoding pipeline consisting of band-pass filtering, energy extraction, feature selection and linear regression. The results showed that the deep learning models performed better than the commonly used linear model. The deep learning models not only gave smoother and more realistic trajectories but also learned the transition between movement and rest states. This study demonstrated a decoding network for BMI that involved a convolutional and recurrent neural network model. It integrated the feature extraction pipeline into the convolution and pooling layers and used an LSTM layer to capture the state transitions. The discussed network eliminated the need to separately train the model at each step in the decoding pipeline. The whole system can be jointly optimized using stochastic gradient descent and is capable of online learning.
Decoding of finger trajectory from ECoG using deep learning
NASA Astrophysics Data System (ADS)
Xie, Ziqian; Schwartz, Odelia; Prasad, Abhishek
2018-06-01
Objective. A conventional decoding pipeline for brain-machine interfaces (BMIs) consists of a chain of separate stages: feature extraction, time-frequency analysis and statistical learning models. Each of these stages uses a different algorithm trained in a sequential manner, which makes it difficult to make the whole system adaptive. The goal was to create an adaptive online system with a single objective function and a single learning algorithm so that the whole system can be trained in parallel to increase the decoding performance. Here, we used deep neural networks consisting of convolutional neural networks (CNN) and a special kind of recurrent neural network (RNN) called long short-term memory (LSTM) to address these needs. Approach. We used electrocorticography (ECoG) data collected by Kubanek et al. The task consisted of individual finger flexions upon a visual cue. Our model combined a hierarchical CNN feature extractor and an RNN that was able to process sequential data and recognize temporal dynamics in the neural data. The CNN was used as the feature extractor and the LSTM as the regression algorithm to capture the temporal dynamics of the signal. Main results. We predicted the finger trajectory using ECoG signals and compared results for least angle regression (LARS), CNN-LSTM, random forest, an LSTM model using hard-coded features (LSTM_HC), and a decoding pipeline consisting of band-pass filtering, energy extraction, feature selection and linear regression. The results showed that the deep learning models performed better than the commonly used linear model. The deep learning models not only gave smoother and more realistic trajectories but also learned the transition between movement and rest states. Significance. This study demonstrated a decoding network for BMI that involved a convolutional and recurrent neural network model. It integrated the feature extraction pipeline into the convolution and pooling layers and used an LSTM layer to capture the state transitions. The discussed network eliminated the need to separately train the model at each step in the decoding pipeline. The whole system can be jointly optimized using stochastic gradient descent and is capable of online learning.
Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally
2018-02-01
1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. The area under the plasma concentration versus time curve (AUCinf) of dalbavancin is a key parameter, and the AUCinf/MIC ratio is a critical pharmacodynamic marker. 2. Using the end-of-infusion concentration (i.e. Cmax), the Cmax versus AUCinf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. AUCinf was predicted from published Cmax data by applying the regression equations. The quotient of observed/predicted values gave the fold difference. The mean absolute error (MAE), root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. Cmax and AUCinf were highly correlated (r > 0.9488). The internal data evaluation showed fold differences narrowly confined to 0.84-1.14, with an RMSE < 10.3%. The external data evaluation showed that the models predicted AUCinf with an RMSE of 3.02-27.46%, with fold differences largely contained within 0.64-1.48. 5. Regardless of the regression model, a single-time-point strategy using Cmax (i.e. the end of a 30-min infusion) is suitable as a prospective tool for predicting the AUCinf of dalbavancin in patients.
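The single-time-point approach in points 2 and 3 amounts to regressing AUCinf on Cmax and reusing the fitted equation for new Cmax values. The Python sketch below illustrates only the log-log model, ln AUCinf = ln a + b ln Cmax, fitted by ordinary least squares and back-transformed to a power law, together with the fold difference (observed/predicted) and a percent RMSE. The function names are hypothetical, the separately listed power model may have been fitted differently and is not reproduced, and the relative-error form of the RMSE is an assumption, since the abstract does not define it.

```python
import math
from typing import List, Sequence, Tuple


def fit_power_model(cmax: Sequence[float], auc: Sequence[float]) -> Tuple[float, float]:
    """Fit AUC = a * Cmax**b by ordinary least squares on ln(AUC) versus ln(Cmax)."""
    xs = [math.log(c) for c in cmax]
    ys = [math.log(v) for v in auc]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log-log scale is the exponent b
    a = math.exp(mean_y - b * mean_x)  # back-transform the intercept to get a
    return a, b


def predict_auc(a: float, b: float, cmax: float) -> float:
    """Predicted AUCinf for a single end-of-infusion concentration."""
    return a * cmax ** b


def fold_differences(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Observed/predicted ratio per subject; 1.0 means a perfect prediction."""
    return [o / p for o, p in zip(observed, predicted)]


def percent_rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square of relative errors, in percent (one common definition, assumed here)."""
    n = len(observed)
    return 100.0 * math.sqrt(sum(((p - o) / o) ** 2 for o, p in zip(observed, predicted)) / n)
```

In use, `a, b = fit_power_model(cmax_obs, auc_obs)` followed by `predict_auc(a, b, new_cmax)` reproduces the prediction step; the linear and log-linear variants differ only in which variables are log-transformed before the least squares fit.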
Vakilian, Katayon; Mousavi, Seyed Abbas; Keramat, Afsaneh
2014-01-13
In many countries, negative social attitudes towards sensitive issues such as sexual behavior have resulted in false and invalid data on these issues. This is an analytical cross-sectional study in which a total of 1500 single students from universities in Shahroud City were sampled using a multi-stage technique. The students were assured that the information they disclosed to the researcher would be treated as private and confidential. The results were analyzed using the crosswise model, crosswise regression, t-tests and chi-square tests. The prevalence of sexual behavior among Iranian youth was estimated at 41% (CI = 36-53). These findings indicate that the prevalence of sexual relationships among single Iranian youth is high. Thus, devising training models according to the Islamic-Iranian culture is necessary in order to prevent risky sexual behavior.
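The abstract names the crosswise model without describing it. In the standard crosswise design, each respondent states only whether his or her answers to the sensitive question and to an innocuous question with known prevalence p coincide, so no individual answer is revealed; the prevalence π is then recovered from P(same) = πp + (1 − π)(1 − p). The sketch below assumes that standard design and simple random sampling; the function name and arguments are hypothetical, and it does not implement the crosswise regression or the multi-stage sampling used in the study.

```python
import math
from typing import Tuple


def crosswise_estimate(n_same: int, n_total: int, p: float) -> Tuple[float, Tuple[float, float]]:
    """Estimate the prevalence of a sensitive trait under the crosswise model.

    n_same  -- respondents reporting that their answers to the sensitive and the
               innocuous question are the same (both "yes" or both "no")
    n_total -- number of respondents
    p       -- known prevalence of the innocuous trait; it must differ from 0.5
               for the model to be identifiable
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p must differ from 0.5")
    lam = n_same / n_total  # observed share of "same" answers
    # P(same) = pi * p + (1 - pi) * (1 - p), solved for pi
    pi_hat = (lam + p - 1) / (2 * p - 1)
    se = math.sqrt(lam * (1 - lam) / n_total) / abs(2 * p - 1)
    # Normal-approximation 95% interval; not truncated to [0, 1]
    return pi_hat, (pi_hat - 1.96 * se, pi_hat + 1.96 * se)
```

For example, 560 "same" answers out of 1000 with p = 0.2 give an estimate of about 0.40. The factor 1/|2p − 1| in the standard error shows the price of privacy: the closer p is to 0.5, the wider the interval.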
Kumar, S.; Spaulding, S.A.; Stohlgren, T.J.; Hermann, K.A.; Schmidt, T.S.; Bahls, L.L.
2009-01-01
The diatom Didymosphenia geminata is a single-celled alga found in lakes, streams, and rivers. Nuisance blooms of D. geminata affect the diversity, abundance, and productivity of other aquatic organisms. Because D. geminata can be transported by humans on waders and other gear, accurate spatial prediction of habitat suitability is urgently needed for early detection and rapid response, as well as for evaluation of monitoring and control programs. We compared four modeling methods to predict D. geminata's habitat distribution; two methods use presence-absence data (logistic regression and classification and regression tree [CART]), and two involve presence data (maximum entropy model [Maxent] and genetic algorithm for rule-set production [GARP]). Using these methods, we evaluated spatially explicit, bioclimatic and environmental variables as predictors of diatom distribution. The Maxent model provided the most accurate predictions, followed by logistic regression, CART, and GARP. The most suitable habitats were predicted to occur in the western US, in relatively cool sites, and at high elevations with a high base-flow index. The results provide insights into the factors that affect the distribution of D. geminata and a spatial basis for the prediction of nuisance blooms. © The Ecological Society of America.
Flexible Meta-Regression to Assess the Shape of the Benzene–Leukemia Exposure–Response Curve
Vlaanderen, Jelle; Portengen, Lützen; Rothman, Nathaniel; Lan, Qing; Kromhout, Hans; Vermeulen, Roel
2010-01-01
Background Previous evaluations of the shape of the benzene–leukemia exposure–response curve (ERC) were based on a single set or on small sets of human occupational studies. Integrating evidence from all available studies that are of sufficient quality combined with flexible meta-regression models is likely to provide better insight into the functional relation between benzene exposure and risk of leukemia. Objectives We used natural splines in a flexible meta-regression method to assess the shape of the benzene–leukemia ERC. Methods We fitted meta-regression models to 30 aggregated risk estimates extracted from nine human observational studies and performed sensitivity analyses to assess the impact of a priori assessed study characteristics on the predicted ERC. Results The natural spline showed a supralinear shape at cumulative exposures less than 100 ppm-years, although this model fitted the data only marginally better than a linear model (p = 0.06). Stratification based on study design and jackknifing indicated that the cohort studies had a considerable impact on the shape of the ERC at high exposure levels (> 100 ppm-years) but that predicted risks for the low exposure range (< 50 ppm-years) were robust. Conclusions Although limited by the small number of studies and the large heterogeneity between studies, the inclusion of all studies of sufficient quality combined with a flexible meta-regression method provides the most comprehensive evaluation of the benzene–leukemia ERC to date. The natural spline based on all data indicates a significantly increased risk of leukemia [relative risk (RR) = 1.14; 95% confidence interval (CI), 1.04–1.26] at an exposure level as low as 10 ppm-years. PMID:20064779
Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan
2016-10-01
Red light running (RLR) has become a major safety concern at signalized intersections. To prevent RLR-related crashes, it is critical to identify the factors that significantly affect drivers' RLR behavior and to predict potential RLR in real time. In this research, nine months of RLR events, extracted from high-resolution traffic data collected by loop detectors at three signalized intersections, were used to identify the factors that significantly affect RLR behavior. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle ran through the intersection during yellow, and whether a vehicle was passing through the intersection in the adjacent lane were significant factors for RLR behavior. Furthermore, because RLR events are rare, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and has shown impressive performance, but no previous research has applied it to RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method relies solely on data from a single advance loop detector located 400 feet upstream of the stop bar. This gives the method great potential for field application, since loop detectors are already installed at many intersections and can collect data in real time. This research is expected to contribute significantly to improving intersection safety. Copyright © 2016 Elsevier Ltd. All rights reserved.
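Rare events logistic regression, in the sense of King and Zeng, adjusts standard logistic regression for small-sample bias and for samples in which events are over-represented. The sketch below shows only one ingredient, the prior correction of the intercept when the event fraction in the estimation sample (ȳ) differs from the true population fraction (τ); the slopes are left unchanged and the bias correction of the coefficients is omitted. The article's "modified" model is not described in the abstract, so this is not the authors' method, and all names are hypothetical.

```python
import math
from typing import Sequence


def prior_corrected_intercept(beta0: float, tau: float, ybar: float) -> float:
    """King-Zeng prior correction of a logistic-regression intercept.

    beta0 -- intercept estimated on the (possibly event-enriched) sample
    tau   -- assumed population fraction of events, e.g. cycles with RLR
    ybar  -- fraction of events in the estimation sample
    """
    return beta0 - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))


def predict_probability(beta0: float, betas: Sequence[float], x: Sequence[float]) -> float:
    """Logistic prediction P(y = 1 | x) with the given intercept and slopes."""
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-eta))
```

Hypothetically, if the analysis sample were balanced (ȳ = 0.5) while RLR occurred in 1% of cycles (τ = 0.01), the intercept would drop by ln 99 ≈ 4.6, lowering every predicted RLR probability accordingly; when ȳ equals τ the correction vanishes.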
Bian, Xihui; Li, Shujuan; Lin, Ligang; Tan, Xiaoyao; Fan, Qingjie; Li, Ming
2016-06-21
Accurate model prediction is fundamental to the successful analysis of complex samples. To utilize the abundant information embedded in the frequency and time domains, a novel regression model is presented for quantitative analysis of hydrocarbon contents in fuel oil samples. The proposed method, named high and low frequency unfolded PLSR (HLUPLSR), integrates empirical mode decomposition (EMD) and an unfolding strategy with partial least squares regression (PLSR). In the proposed method, the original signals are first decomposed by EMD into a finite number of intrinsic mode functions (IMFs) and a residue. Second, the earlier, high-frequency IMFs are summed into a high-frequency matrix, and the later IMFs and the residue are summed into a low-frequency matrix. Finally, the two matrices are unfolded into an extended matrix along the variable dimension, and a PLSR model is built between the extended matrix and the target values. Coupled with ultraviolet (UV) spectroscopy, HLUPLSR has been applied to determine hydrocarbon contents of light gas oil and diesel fuel samples. Compared with single PLSR and other signal processing techniques, the proposed method shows superior prediction ability and better model interpretation. Therefore, HLUPLSR provides a promising tool for quantitative analysis of complex samples. Copyright © 2016 Elsevier B.V. All rights reserved.
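The summing and unfolding steps can be illustrated for spectra whose EMD output is already available. The sketch below assumes the IMFs are ordered from highest to lowest frequency and that a hypothetical split index k marks where the high-frequency part ends (the abstract does not say how this boundary is chosen); EMD itself and the PLSR fit are not implemented. The resulting rows would form the extended matrix passed to a PLSR routine such as scikit-learn's PLSRegression.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def unfold_sample(imfs: Sequence[Sequence[float]], residue: Sequence[float], k: int) -> Vector:
    """Build one row of an HLUPLSR-style extended matrix from a single spectrum's EMD output.

    imfs    -- intrinsic mode functions, highest frequency first
    residue -- EMD residue (same length as each IMF)
    k       -- hypothetical split: the first k IMFs form the high-frequency part
    """
    n = len(residue)
    high = [sum(imf[i] for imf in imfs[:k]) for i in range(n)]
    low = [sum(imf[i] for imf in imfs[k:]) + residue[i] for i in range(n)]
    # Unfolding along the variable dimension: the row is twice as wide as the spectrum
    return high + low


def unfold_dataset(decompositions: Sequence[Tuple[Sequence[Sequence[float]], Sequence[float]]],
                   k: int) -> List[Vector]:
    """Stack the unfolded rows of all samples into the extended matrix X."""
    return [unfold_sample(imfs, residue, k) for imfs, residue in decompositions]
```

Because EMD is additive, high[i] + low[i] reconstructs the original signal, so unfolding rearranges the information rather than discarding it; doubling the number of variables lets PLSR weight high- and low-frequency content separately.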
A flexible count data regression model for risk analysis.
Guikema, Seth D; Coffelt, Jeremy P
2008-02-01
In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLMs) are limited in their ability to handle the variance structures often encountered when using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data, and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM can fit overdispersed data sets as well as the commonly used existing models, while outperforming those models for underdispersed data sets.
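The dispersion behaviour that motivates the COM GLM can be seen directly in the Conway-Maxwell-Poisson probability mass function, P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), where ν = 1 gives the Poisson distribution, ν < 1 gives overdispersion and ν > 1 gives underdispersion. The sketch below uses this standard (λ, ν) parameterization, not the article's reformulation, truncates the normalizing constant Z at a fixed number of terms (an assumption adequate for moderate λ), and omits the regression link to explanatory variables.

```python
import math
from typing import Tuple


def com_poisson_pmf(y: int, lam: float, nu: float, max_terms: int = 200) -> float:
    """P(Y = y) for the Conway-Maxwell-Poisson distribution in the standard (lam, nu) form.

    The normalizing constant Z = sum_j lam**j / (j!)**nu is truncated at max_terms
    and evaluated on the log scale (log-sum-exp) to avoid overflow.
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)


def mean_and_variance(lam: float, nu: float, max_terms: int = 200) -> Tuple[float, float]:
    """Mean and variance computed from the truncated probability mass function."""
    probs = [com_poisson_pmf(y, lam, nu, max_terms) for y in range(max_terms)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var
```

Comparing `mean_and_variance(3.0, 0.5)` with `mean_and_variance(3.0, 2.0)` shows variance above the mean in the first case and below it in the second, which is the flexibility a single Poisson or negative binomial GLM lacks.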
The conditional resampling model STARS: weaknesses of the modeling concept and development
NASA Astrophysics Data System (ADS)
Menz, Christoph
2016-04-01
The Statistical Analogue Resampling Scheme (STARS) is based on a modeling concept of Werner and Gerstengarbe (1997). The model uses a conditional resampling technique to create a simulated time series from daily observations. Unlike other time series generators (such as stochastic weather generators), STARS only needs a linear regression specification of a single variable as the target condition for the resampling. Since its first implementation, the algorithm has been extended to allow for a spatially distributed trend signal and to preserve the seasonal cycle and the autocorrelation of the observed time series (Orlovsky, 2007; Orlovsky et al., 2008). This evolved version was used successfully in several climate impact studies. However, a detailed evaluation of the simulations revealed two fundamental weaknesses of the resampling technique. 1. Restricting the resampling condition to a single variable can lead to a misinterpretation of the change signal of other variables when the model is applied to a multivariate time series (F. Wechsung and M. Wechsung, 2014). For example, the short-term correlation between precipitation and temperature (cooling of the near-surface air layer after a rainfall event) can be misinterpreted as a climate change signal in the simulated series. 2. The model restricts the linear regression specification to the annual mean time series, precluding the specification of seasonally varying trends. To overcome these fundamental weaknesses, the whole algorithm was redeveloped. The poster discusses the main weaknesses of the earlier model implementation and the methods applied to overcome them in the new version. Idealized simulations with the new model were conducted to illustrate the improvements.
Temperature-viscosity models reassessed.
Peleg, Micha
2017-05-04
The temperature effect on the viscosity of liquid and semi-liquid foods has traditionally been described by the Arrhenius equation, a few other mathematical models, and more recently by the WLF and VTF (or VFT) equations. The essence of the Arrhenius equation is that the logarithm of viscosity is a linear function of the reciprocal of the absolute temperature, governed by a single parameter, namely, the energy of activation. However, if the absolute temperature in K in the Arrhenius equation is replaced by T + b, where both T and the adjustable b are in °C, the result is a two-parameter model that fits experimental viscosity-temperature data better. This modified version of the Arrhenius equation is also mathematically equivalent to the WLF and VTF equations, which are known to be equivalent to each other. Thus, despite their dissimilar appearances, all three equations are essentially the same model, and when used to fit experimental temperature-viscosity data they yield exactly the same, very high, regression coefficient. It is shown that three new hybrid two-parameter mathematical models, whose formulation bears little resemblance to any of the conventional models, can also fit excellently, with r² ≈ 1. This is demonstrated by comparing the various models' regression coefficients for published viscosity-temperature relationships of 40% sucrose solution, soybean oil, and 70°Bx pear juice concentrate over different temperature ranges. Also compared are temperature-viscosity curves reconstructed using parameters calculated directly from 2 or 3 data points and fitted curves obtained by nonlinear regression using a larger number of experimental viscosity measurements.
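The equivalence claimed above can be checked numerically. Writing the modified Arrhenius (VTF) model as ln η = A + B/(T + b) with T in °C, the identity ln(η/η_ref) = B/(T + b) − B/(T_ref + b) = −C1(T − T_ref)/(C2 + T − T_ref) holds with C1 = B/(T_ref + b) and C2 = T_ref + b, which is the WLF form; setting b = 273.15 recovers the original Arrhenius equation with B = Ea/R. The sketch below uses natural logarithms (the classical WLF equation is often written with log10, which only rescales C1), and the parameter values in the check are hypothetical.

```python
from typing import Tuple


def vtf_log_viscosity(t_c: float, a: float, b_coef: float, b_shift: float) -> float:
    """Modified Arrhenius / VTF form: ln(eta) = a + B / (T + b), with T and b in deg C."""
    return a + b_coef / (t_c + b_shift)


def vtf_to_wlf(b_coef: float, b_shift: float, t_ref: float) -> Tuple[float, float]:
    """WLF constants (C1, C2) equivalent to the VTF parameters at reference temperature t_ref."""
    c1 = b_coef / (t_ref + b_shift)
    c2 = t_ref + b_shift
    return c1, c2


def wlf_log_ratio(t_c: float, t_ref: float, c1: float, c2: float) -> float:
    """WLF form (natural log): ln(eta / eta_ref) = -C1 (T - Tref) / (C2 + T - Tref)."""
    return -c1 * (t_c - t_ref) / (c2 + t_c - t_ref)


# Hypothetical parameters: both forms give the same curve at every temperature
A, B, b, t_ref = -5.0, 800.0, 120.0, 20.0
c1, c2 = vtf_to_wlf(B, b, t_ref)
for t in (10.0, 30.0, 60.0):
    vtf_diff = vtf_log_viscosity(t, A, B, b) - vtf_log_viscosity(t_ref, A, B, b)
    print(t, vtf_diff, wlf_log_ratio(t, t_ref, c1, c2))
```

Because the two forms trace the same curve, fitting either one to the same data gives the same fitted values and the same r², which is the point made in the abstract.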
NASA Astrophysics Data System (ADS)
Holburn, E. R.; Bledsoe, B. P.; Poff, N. L.; Cuhaciyan, C. O.
2005-05-01
Using over 300 R/EMAP sites in OR and WA, we examine the relative explanatory power of watershed, valley, and reach scale descriptors in modeling variation in benthic macroinvertebrate indices. Innovative metrics describing flow regime, geomorphic processes, and hydrologic-distance weighted watershed and valley characteristics are used in multiple regression and regression tree modeling to predict EPT richness, % EPT, EPT/C, and % Plecoptera. A nested design using seven ecoregions is employed to evaluate the influence of geographic scale and environmental heterogeneity on the explanatory power of individual and combined scales. Regression tree models are constructed to explain variability while identifying threshold responses and interactions. Cross-validated models demonstrate differences in the explanatory power associated with single-scale and multi-scale models as environmental heterogeneity is varied. Models explaining the greatest variability in biological indices result from multi-scale combinations of physical descriptors. Results also indicate that substantial variation in benthic macroinvertebrate response can be explained with process-based watershed and valley scale metrics derived exclusively from common geospatial data. This study outlines a general framework for identifying key processes driving macroinvertebrate assemblages across a range of scales and establishing the geographic extent at which various levels of physical description best explain biological variability. Such information can guide process-based stratification to avoid spurious comparison of dissimilar stream types in bioassessments and ensure that key environmental gradients are adequately represented in sampling designs.
Disconcordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08
Casey, Martin F.; Neidell, Matthew
2013-01-01
Background Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods And Findings We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205
Factors associated with single-vehicle and multi-vehicle road traffic collision injuries in Ireland.
Donnelly-Swift, Erica; Kelly, Alan
2016-12-01
Generalised linear regression models were used to identify factors associated with fatal/serious road traffic collision injuries for single- and multi-vehicle collisions. Single-vehicle collisions and multi-vehicle collisions occurring during the hours of darkness or on a wet road surface had a reduced likelihood of a fatal/serious injury. Single-vehicle 'driver with passengers' collisions occurring at junctions or on a hill/gradient were less likely to result in a fatal/serious injury. Multi-vehicle rear-end/angle collisions had a reduced likelihood of a fatal/serious injury. Single-vehicle 'driver only' collisions and multi-vehicle collisions occurring on a public/bank holiday or on a hill/gradient were more likely to result in a fatal/serious injury. Single-vehicle collisions involving male drivers had an increased likelihood of a fatal/serious injury, and single-vehicle 'driver with passengers' collisions involving drivers under the age of 25 years also had an increased likelihood of a fatal/serious injury. These findings can alert decision-makers to the circumstances leading to fatal/serious injuries.
Improved prediction of biochemical recurrence after radical prostatectomy by genetic polymorphisms.
Morote, Juan; Del Amo, Jokin; Borque, Angel; Ars, Elisabet; Hernández, Carlos; Herranz, Felipe; Arruza, Antonio; Llarena, Roberto; Planas, Jacques; Viso, María J; Palou, Joan; Raventós, Carles X; Tejedor, Diego; Artieda, Marta; Simón, Laureano; Martínez, Antonio; Rioja, Luis A
2010-08-01
Single nucleotide polymorphisms are inherited genetic variations that can predispose or protect individuals against clinical events. We hypothesized that single nucleotide polymorphism profiling may improve the prediction of biochemical recurrence after radical prostatectomy. We performed a retrospective, multi-institutional study of 703 patients treated with radical prostatectomy for clinically localized prostate cancer who had at least 5 years of followup after surgery. All patients were genotyped for 83 prostate cancer related single nucleotide polymorphisms using a low density oligonucleotide microarray. Baseline clinicopathological variables and single nucleotide polymorphisms were analyzed to predict biochemical recurrence within 5 years using stepwise logistic regression. Discrimination was measured by ROC curve AUC, specificity, sensitivity, predictive values, net reclassification improvement and integrated discrimination index. The overall biochemical recurrence rate was 35%. The model with the best fit combined 8 covariates, including the 5 clinicopathological variables prostate specific antigen, Gleason score, pathological stage, lymph node involvement and margin status, and 3 single nucleotide polymorphisms at the KLK2, SULT1A1 and TLR4 genes. Model predictive power was defined by 80% positive predictive value, 74% negative predictive value and an AUC of 0.78. The model based on clinicopathological variables plus single nucleotide polymorphisms showed significant improvement over the model without single nucleotide polymorphisms, as indicated by 23.3% net reclassification improvement (p = 0.003), integrated discrimination index (p <0.001) and likelihood ratio test (p <0.001). Internal validation proved model robustness (bootstrap corrected AUC 0.78, range 0.74 to 0.82). The calibration plot showed close agreement between biochemical recurrence observed and predicted probabilities. 
Predicting biochemical recurrence after radical prostatectomy based on clinicopathological data can be significantly improved by including patient genetic information. Copyright (c) 2010 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
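The net reclassification improvement reported above measures how often the model with SNPs moves predicted risk in the right direction: up for patients who had biochemical recurrence and down for those who did not. The abstract does not say whether risk categories were used, so the sketch below assumes the category-free (continuous) version, NRI = [P(up | event) − P(down | event)] + [P(down | non-event) − P(up | non-event)]; the function and argument names are hypothetical.

```python
from typing import Sequence


def net_reclassification_improvement(old_risk: Sequence[float], new_risk: Sequence[float],
                                     outcome: Sequence[int]) -> float:
    """Category-free NRI of a new risk model relative to an old one.

    outcome is 1 for patients with the event (e.g. biochemical recurrence), else 0.
    """
    ev_up = ev_down = ne_up = ne_down = 0
    n_events = n_nonevents = 0
    for p_old, p_new, y in zip(old_risk, new_risk, outcome):
        moved_up, moved_down = p_new > p_old, p_new < p_old  # ties count as neither
        if y == 1:
            n_events += 1
            ev_up += moved_up  # booleans add as 0 or 1
            ev_down += moved_down
        else:
            n_nonevents += 1
            ne_up += moved_up
            ne_down += moved_down
    # Reward upward moves for events and downward moves for non-events
    return (ev_up - ev_down) / n_events + (ne_down - ne_up) / n_nonevents
```

With risk categories instead, "up" and "down" would refer to moves between categories rather than any change in predicted probability, which generally gives a different number for the same data.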
Snow, David P.
2016-01-01
This study investigates infants’ transition from nonverbal to verbal communication using evidence from regression patterns. As an example of such a regression, prelinguistic infants learning American Sign Language (ASL) use pointing gestures to communicate. At the onset of single signs, however, these gestures disappear. Petitto (1987) attributed the regression to the children’s discovery that pointing has two functions, namely, deixis and linguistic pronouns. The 1:2 relation (1 form, 2 functions) violates the simple 1:1 pattern that infants are believed to expect. This kind of conflict, Petitto argued, explains the regression. Based on the additional observation that the regression coincided with the boundary between prelinguistic and linguistic communication, Petitto concluded that the prelinguistic and linguistic periods are autonomous. The purpose of the present study was to evaluate the 1:1 model and to determine whether it explains a previously reported regression of intonation in English. Background research showed that gestures and intonation have different forms but the same pragmatic meanings, a 2:1 form-function pattern that plausibly precipitates the regression. The hypothesis of the study was that gestures and intonation are closely related. Moreover, because gestures and intonation change in opposite directions, a negative correlation between them would indicate a robust inverse relationship. To test this prediction, speech samples of 29 infants (8 to 16 months) were analyzed acoustically and compared to parent-report data on several verbal and gestural scales. In support of the hypothesis, gestures alone were inversely correlated with intonation. In addition, the regression model explains nonlinearities stemming from different form-function configurations. However, the results failed to support the claim that regressions linked to early words or signs reflect autonomy. 
The discussion ends with a focus on the special role of intonation in children’s transition from “prelinguistic” communication to language. PMID:28729753
The validation of a human force model to predict dynamic forces resulting from multi-joint motions
NASA Technical Reports Server (NTRS)
Pandya, Abhilash K.; Maida, James C.; Aldridge, Ann M.; Hasson, Scott M.; Woolford, Barbara J.
1992-01-01
This paper examines the development and validation of a dynamic strength model for humans. The model is based on empirical data. The shoulder, elbow, and wrist joints were characterized in terms of maximum isolated torque as a function of position and velocity, in all rotational planes. These data were reduced by a least squares regression technique into a table of single-variable second-degree polynomial equations determining torque as a function of position and velocity. The isolated joint torque equations were then used to compute forces resulting from a composite motion, in this case a ratchet wrench push and pull operation. A comparison of the model's predictions with the measured values for the composite motion indicates that forces derived from a composite motion of joints (ratcheting) can be predicted from isolated joint measures. Calculated t values comparing model versus measured values for 14 subjects were well within statistically acceptable limits, and regression analysis revealed the coefficient of variation between model and measured values to be between 0.72 and 0.80.
Weissman-Miller, Deborah
2013-11-02
Point estimation is particularly important in predicting weight loss in individuals or small groups. In this analysis, a new health response function, based on a model of human response over time, is used to estimate long-term health outcomes from a change point in short-term linear regression. This estimation capability is addressed for small groups and single-subject designs in pilot studies for clinical trials and in medical and therapeutic clinical practice. The estimates are based on a change point given by parameters derived from short-term participant data in ordinary least squares (OLS) regression. The development of the change point in the initial OLS data and the point estimates are given in a new semiparametric ratio estimator (SPRE) model. The new response function is taken as a ratio of two-parameter Weibull distributions times a prior outcome value that steps estimated outcomes forward in time, where the shape and scale parameters are estimated at the change point. The Weibull distributions used in this ratio are derived from a Kelvin model in mechanics, taken here to represent human beings. A distinct feature of the SPRE model is that the initial treatment response of a small group or a single subject is reflected in the long-term response to treatment. The model is applied to weight loss in obesity in a secondary analysis of data from a classic weight loss study, selected because of the dramatic increase in obesity in the United States over the past 20 years. The estimates show a very small relative error with respect to the test data for obesity treatment with the weight loss medication phentermine or placebo. One application of SPRE in clinical medicine or occupational therapy is to estimate long-term weight loss for a single subject or a small group near the beginning of treatment.
Hallén, Jonas; Jensen, Jesper K; Fagerland, Morten W; Jaffe, Allan S; Atar, Dan
2010-12-01
To investigate the ability of cardiac troponin I (cTnI) to predict functional recovery and left ventricular remodelling following primary percutaneous coronary intervention (pPCI) in ST-elevation myocardial infarction (STEMI). Post hoc study extending from a randomised controlled trial. 132 patients with STEMI receiving pPCI. Left ventricular ejection fraction (LVEF), end-diastolic and end-systolic volume index (EDVI and ESVI) and changes in these parameters from day 5 to 4 months after the index event. Cardiac magnetic resonance examination performed at 5 days and 4 months for evaluation of LVEF, EDVI and ESVI. cTnI was sampled at 24 and 48 h. In linear regression models adjusted for early (5 days) assessment of LVEF, ESVI and EDVI, single-point cTnI at either 24 or 48 h was an independent and strong predictor of changes in LVEF (p<0.01), EDVI (p<0.01) and ESVI (p<0.01) during the follow-up period. In a logistic regression analysis for prediction of an LVEF below 40% at 4 months, single-point cTnI significantly improved the prognostic strength of the model (area under the curve = 0.94, p<0.01) in comparison with the combination of clinical variables and LVEF at 5 days. Single-point sampling of cTnI after pPCI for STEMI provides important prognostic information on the time-dependent evolution of left ventricular function and volumes.
NASA Astrophysics Data System (ADS)
Medina, Hanoi; Tian, Di; Srivastava, Puneet; Pelosi, Anna; Chirico, Giovanni B.
2018-07-01
Reference evapotranspiration (ET0) plays a fundamental role in agronomic, forestry, and water resources management. Estimating and forecasting ET0 have long been recognized as a major challenge for researchers and practitioners in these communities. This work explored the potential of several leading numerical weather prediction models (NWPs) for estimating and forecasting summer ET0 at 101 U.S. Regional Climate Reference Network stations in nine climate regions across the contiguous United States (CONUS). Three leading global NWP model forecasts from the THORPEX Interactive Grand Global Ensemble (TIGGE) dataset were used in this study, including the single-model ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (EC), the National Centers for Environmental Prediction Global Forecast System (NCEP), and the United Kingdom Meteorological Office (MO), as well as multi-model ensemble forecasts from combinations of these NWP models. A regression calibration was employed to bias-correct the ET0 forecasts. The impact of individual forecast variables on ET0 forecasts was also evaluated. The results showed that the EC forecasts provided the least error and the highest skill and reliability, followed by the MO and NCEP forecasts. The multi-model ensembles constructed from the combination of EC and MO forecasts performed slightly better than the single-model EC forecasts. The regression process greatly improved ET0 forecast performance, particularly in regions with stations near the coast or with complex orography. The performance of EC forecasts was only slightly influenced by ensemble size, particularly at short lead times. Even with fewer ensemble members, EC still performed better than the other two NWPs. Errors in the radiation forecasts, followed by those in the wind forecasts, had the most detrimental effects on ET0 forecast performance.
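The abstract does not give the form of the regression calibration. A minimal assumed version is a simple linear regression of observed ET0 on forecast ET0 over a training period, whose fitted intercept and slope are then applied to new forecasts. The sketch below implements that assumption in plain Python with hypothetical names, and applies it to single values (for example an ensemble mean) rather than to full ensembles.

```python
from typing import List, Sequence, Tuple


def fit_linear_calibration(forecast: Sequence[float], observed: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of observed ET0 on forecast ET0: observed ~ alpha + beta * forecast."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    sff = sum((f - mean_f) ** 2 for f in forecast)
    sfo = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    beta = sfo / sff               # corrects scale errors in the forecast
    alpha = mean_o - beta * mean_f  # corrects the remaining additive bias
    return alpha, beta


def calibrate(forecast: Sequence[float], alpha: float, beta: float) -> List[float]:
    """Apply the fitted calibration to new (e.g. ensemble-mean) ET0 forecasts."""
    return [alpha + beta * f for f in forecast]
```

In practice such a calibration would be fitted separately for each station and lead time on past forecast-observation pairs, then applied to future forecasts.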
Ciesielski, K T; Lesnik, P G; Benzel, E C; Hart, B L; Sanders, J A
1999-06-01
Neurotoxic intrathecal chemotherapy for childhood acute lymphoblastic leukemia (ALL) affects developing structures and functions of memory and learning subsystems selectively. Results show significant reductions in magnetic resonance imaging morphometry of mamillary bodies, components of the corticolimbic-diencephalic subsystem subserving functionally later developing, single-trial memory, nonsignificant changes in bilateral heads of the caudate nuclei, components of the corticostriatal subsystem subserving functionally earlier developing, multitrial learning, significant reductions in prefrontal cortical volume, visual and verbal single-trial memory deficits, and visuospatial, but not verbal, multitrial learning deficits. Multiple regression models provide evidence for partial dissociation and connectivity between the subsystems, and suggest that greater involvement of caudate may compensate for inefficient corticolimbic-diencephalic components.
Time trend of polycyclic aromatic hydrocarbon emission factors from motor vehicles
NASA Astrophysics Data System (ADS)
Tao, Shu; Shen, Huizhong; Wang, Rong; Sun, Kang
2010-05-01
Motor vehicles are an important emission source of polycyclic aromatic hydrocarbons (PAHs), particularly in urban areas. Motor vehicle emission factors (EFs) for individual PAH compounds reported in the literature vary by 4 to 5 orders of magnitude, leading to high uncertainty in emission estimation. In this study, the major factors affecting EFs were investigated and characterized by regression models. Based on the models developed, a country-level motor vehicle PAH emission inventory was compiled. Country and model year were found to be the most important factors affecting EFs for PAHs. The influence of the two factors can be quantified by a single parameter, per capita gross domestic product (purchasing power parity), which was used as the independent variable of the regression models. The models, developed using a randomly selected 80% of the measurements and tested with the remaining data, accounted for 28 to 48% of the variation in EFs for PAHs measured in 16 countries over 50 years. The regression coefficients of the EF prediction models were molecular weight dependent. Motor vehicle emissions of PAHs from individual countries in 1985, 1995, 2005, 2015, and 2025 were calculated; the global emissions of total PAHs were 470, 390, and 430 Gg in 1985, 1995, and 2005 and will be 290 and 130 Gg in 2015 and 2025, respectively. Emissions are currently passing their peak and will decrease owing to significant decreases in China and other developing countries.
Reynolds, Penny S; Tamariz, Francisco J; Barbee, Robert Wayne
2010-04-01
Exploratory pilot studies are crucial to best practice in research but are frequently conducted without a systematic method for maximizing the amount and quality of information obtained. We describe the use of response surface regression models and simultaneous optimization methods to develop a rat model of hemorrhagic shock in the context of chronic hypertension, a clinically relevant comorbidity. A response surface regression model was applied to determine optimal levels of two inputs, dietary NaCl concentration (0.49%, 4%, and 8%) and time on the diet (4, 6, 8 weeks), to achieve clinically realistic and stable target measures of systolic blood pressure while simultaneously maximizing critical oxygen delivery (a measure of vulnerability to hemorrhagic shock) and body mass M. Simultaneous optimization of the three response variables was performed through a dimensionality reduction strategy involving calculation of a single aggregate measure, the "desirability" function. Optimal conditions for inducing systolic blood pressure of 208 mmHg, critical oxygen delivery of 4.03 mL/min, and M of 290 g were determined to be 4% [NaCl] for 5 weeks. Rats on the 8% diet did not survive past 7 weeks. Response surface regression and simultaneous optimization techniques are commonly used in process engineering but have found little application to date in animal pilot studies. These methods will ensure both the scientific and ethical integrity of experimental trials involving animals and provide powerful tools for the development of novel models of clinically interacting comorbidities with shock.
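The "desirability" aggregation can be illustrated with the widely used Derringer-Suich form: each response is mapped to a score in [0, 1] and the scores are combined by their geometric mean. The abstract does not give the exact transformation or limits used, so the functional forms below are an assumption and any limits you pass in (for example, a 160 to 240 mmHg window around a blood pressure target) are hypothetical.

```python
import math


def d_maximize(y, low, high, s=1.0):
    """Desirability for a response to be maximized: 0 at or below `low`,
    1 at or above `high`, power-scaled (exponent s) in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s


def d_target(y, low, target, high, s1=1.0, s2=1.0):
    """Desirability for a response with a target value inside [low, high]."""
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s1
    return ((high - y) / (high - target)) ** s2


def overall_desirability(ds):
    """Geometric mean of the individual desirabilities; any zero makes D zero."""
    if any(d == 0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))
```

A candidate diet condition would be scored as, e.g., `overall_desirability([d_target(sbp, ...), d_maximize(do2crit, ...), d_maximize(mass, ...)])`, and the condition with the largest D chosen. The geometric mean is what makes the reduction "simultaneous": a condition that fails any single response completely gets D = 0.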
Estimating the global incidence of traumatic spinal cord injury.
Fitzharris, M; Cripps, R A; Lee, B B
2014-02-01
Population modelling--forecasting. To estimate the global incidence of traumatic spinal cord injury (TSCI). An initiative of the International Spinal Cord Society (ISCoS) Prevention Committee. Regression techniques were used to derive regional and global estimates of TSCI incidence. Using the findings of 31 published studies, a regression model was fitted using a known number of TSCI cases as the dependent variable and the population at risk as the single independent variable. In the process of deriving TSCI incidence, an alternative TSCI model was specified in an attempt to arrive at an optimal way of estimating the global incidence of TSCI. The global incidence of TSCI was estimated to be 23 cases per 1,000,000 persons in 2007 (179,312 cases per annum). World Health Organization's regional results are provided. Understanding the incidence of TSCI is important for health service planning and for the determination of injury prevention priorities. In the absence of high-quality epidemiological studies of TSCI in each country, the estimation of TSCI obtained through population modelling can be used to overcome known deficits in global spinal cord injury (SCI) data. The incidence of TSCI is context specific, and an alternative regression model demonstrated how TSCI incidence estimates could be improved with additional data. The results highlight the need for data standardisation and comprehensive reporting of national level TSCI data. A step-wise approach from the collation of conventional epidemiological data through to population modelling is suggested.
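A regression with case counts as the dependent variable and population at risk as the single independent variable yields a slope that can be read directly as an incidence rate. The sketch below assumes a no-intercept least-squares fit (the abstract does not say whether an intercept was included), and the function names and any inputs are hypothetical.

```python
def fit_through_origin(population, cases):
    """Least-squares slope b for cases = b * population (no intercept).
    b is an incidence rate in cases per person per year."""
    sxy = sum(p * c for p, c in zip(population, cases))
    sxx = sum(p * p for p in population)
    return sxy / sxx


def incidence_per_million(b):
    """Express the fitted slope as cases per 1,000,000 persons."""
    return b * 1_000_000


def predict_cases(b, population):
    """Expected annual cases for a region with the given population at risk."""
    return b * population
```

Applying `predict_cases` to each region's population and summing is one way such a model could be turned into regional and global totals.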
NASA Astrophysics Data System (ADS)
Theologou, I.; Patelaki, M.; Karantzalos, K.
2015-04-01
Assessing and monitoring water quality status in a timely, cost-effective and accurate manner is of fundamental importance for numerous environmental management and policy-making purposes. There is therefore a current need for validated methodologies that can effectively exploit, in an unsupervised way, the enormous amount of earth observation imaging datasets from various high-resolution satellite multispectral sensors. To this end, many research efforts are based on building concrete relationships and empirical algorithms from concurrent satellite and in-situ data collection campaigns. We have experimented with Landsat 7 and Landsat 8 multi-temporal satellite data, coupled with hyperspectral data from a field spectroradiometer and in-situ ground truth data for several physico-chemical and other key monitoring indicators. All available datasets, covering a 4-year period for our case study, Lake Karla in Greece, were processed and fused under a quantitative evaluation framework. The comprehensive analysis raised certain questions regarding the applicability of single empirical models across multi-temporal, multi-sensor datasets for the accurate prediction of key water quality indicators in shallow inland systems. Single linear regression models did not establish concrete relations across multi-temporal, multi-sensor observations. Moreover, in accordance with the literature, the shallower parts of the inland system followed different regression patterns. Landsat 7 and 8 data yielded quite promising results, indicating that, from the re-creation of the lake onward, consistent per-sensor, per-depth prediction models can be successfully established. The highest coefficients of determination were obtained for chl-a (r2=89.80%), dissolved oxygen (r2=88.53%), conductivity (r2=88.18%), ammonium (r2=87.2%) and pH (r2=86.35%), while total phosphorus (r2=70.55%) and nitrates (r2=55.50%) yielded lower values.
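The contrast between a single pooled model and per-sensor, per-depth models can be sketched as fitting one simple linear regression per group key. This is an illustration only: the grouping keys, the single predictor (for instance a band ratio), and the example data are assumptions, not the study's variables.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def r_squared(xs, ys, a, b):
    """Coefficient of determination of the line a + b*x on (xs, ys)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def per_group_models(records):
    """records: iterable of (group_key, predictor, response) triples, where the
    key could be (sensor, depth_class). Returns {key: (a, b, r2)}."""
    groups = defaultdict(lambda: ([], []))
    for key, x, y in records:
        groups[key][0].append(x)
        groups[key][1].append(y)
    models = {}
    for key, (xs, ys) in groups.items():
        a, b = fit_line(xs, ys)
        models[key] = (a, b, r_squared(xs, ys, a, b))
    return models
```

When groups follow different relationships, each per-group fit can be strong while a single line fitted to the pooled records explains little variance, which mirrors the abstract's finding.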
Evaluation strategies for isotope ratio measurements of single particles by LA-MC-ICPMS.
Kappel, S; Boulyga, S F; Dorta, L; Günther, D; Hattendorf, B; Koffler, D; Laaha, G; Leisch, F; Prohaska, T
2013-03-01
Data evaluation is a crucial step when it comes to the determination of accurate and precise isotope ratios computed from transient signals measured by multi-collector-inductively coupled plasma mass spectrometry (MC-ICPMS) coupled to, for example, laser ablation (LA). In the present study, the applicability of different data evaluation strategies (i.e. 'point-by-point', 'integration' and 'linear regression slope' method) for the computation of (235)U/(238)U isotope ratios measured in single particles by LA-MC-ICPMS was investigated. The analyzed uranium oxide particles (i.e. 9073-01-B, CRM U010 and NUSIMEP-7 test samples), having sizes down to the sub-micrometre range, are certified with respect to their (235)U/(238)U isotopic signature, which enabled evaluation of the applied strategies with respect to precision and accuracy. The different strategies were also compared with respect to their expanded uncertainties. Even though the 'point-by-point' method proved to be superior, the other methods are advantageous, as they take weighted signal intensities into account. For the first time, the use of a 'finite mixture model' is presented for the determination of an unknown number of different U isotopic compositions of single particles present on the same planchet. The model uses an algorithm that determines the number of isotopic signatures by attributing individual data points to computed clusters. The (235)U/(238)U isotope ratios are then determined by means of the slopes of linear regressions estimated for each cluster. The model was successfully applied for the accurate determination of different (235)U/(238)U isotope ratios of particles deposited on the NUSIMEP-7 test samples.
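The three data evaluation strategies named above can be contrasted on a single transient signal. The sketch assumes background-corrected, time-aligned 235U and 238U intensity readings and an ordinary least-squares slope with intercept; it omits mass bias correction and the uncertainty budget, and the function names are hypothetical.

```python
from statistics import mean


def point_by_point(i235, i238):
    """Mean of the per-reading 235U/238U ratios; every reading weighs equally."""
    return mean(a / b for a, b in zip(i235, i238))


def integration(i235, i238):
    """Ratio of summed (integrated) intensities; intense readings weigh more."""
    return sum(i235) / sum(i238)


def regression_slope(i235, i238):
    """OLS slope of the 235U signal regressed on the 238U signal (with intercept)."""
    mx, my = mean(i238), mean(i235)
    sxy = sum((x - mx) * (y - my) for x, y in zip(i238, i235))
    sxx = sum((x - mx) ** 2 for x in i238)
    return sxy / sxx
```

On an ideal transient all three agree. If a constant offset (for example, residual background) remains on the minor isotope, the ratio-based estimates are biased most at low intensities, whereas the slope is unaffected because the intercept absorbs the offset.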
A Comparison of Mean Phase Difference and Generalized Least Squares for Analyzing Single-Case Data
ERIC Educational Resources Information Center
Manolov, Rumen; Solanas, Antonio
2013-01-01
The present study focuses on single-case data analysis, specifically on two procedures for quantifying differences between baseline and treatment measurements. The first technique tested is based on generalized least squares regression analysis and is compared to a proposed non-regression technique, which allows obtaining similar information. The…
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Deep supervised dictionary learning for no-reference image quality assessment
NASA Astrophysics Data System (ADS)
Huang, Yuge; Liu, Xuesong; Tian, Xiang; Zhou, Fan; Chen, Yaowu; Jiang, Rongxin
2018-03-01
We propose a deep convolutional neural network (CNN) for general no-reference image quality assessment (NR-IQA), i.e., accurate prediction of image quality without a reference image. The proposed model consists of three components: a local feature extractor that is a fully convolutional network, an encoding module with an inherent dictionary that aggregates local features into a fixed-length global quality-aware image representation, and a regression module that maps the representation to an image quality score. Our model can be trained in an end-to-end manner, and all of the parameters, including the weights of the convolutional layers, the dictionary, and the regression weights, are simultaneously learned from the loss function. In addition, the model can predict quality scores for input images of arbitrary sizes in a single step. We tested our method on commonly used image quality databases and showed that its performance is comparable with that of state-of-the-art general-purpose NR-IQA algorithms.
Yamazaki, Takeshi; Takeda, Hisato; Hagiya, Koichi; Yamaguchi, Satoshi; Sasaki, Osamu
2018-03-13
Because lactation periods in dairy cows lengthen with increasing total milk production, it is important to predict individual productivity after 305 days in milk (DIM) to determine the optimal lactation period. We therefore examined whether the random regression (RR) coefficients from 306 to 450 DIM (M2) can be predicted from those during the first 305 DIM (M1) by using a random regression model. We analyzed test-day milk records from 85,690 Holstein cows in their first lactations and 131,727 cows in their later (second to fifth) lactations. Data in M1 and M2 were analyzed separately by using different single-trait RR animal models. We then performed a multiple regression analysis of the RR coefficients of M2 on those of M1 during the first and later lactations. First-order Legendre polynomials were practical covariates of random regression for the milk yields of M2. All RR coefficients for the additive genetic (AG) effect and the intercept for the permanent environmental (PE) effect of M2 had moderate to strong correlations with the intercept for the AG effect of M1. The coefficients of determination for multiple regression of the combined intercepts for the AG and PE effects of M2 on the coefficients for the AG effect of M1 were moderate to high. The daily milk yields of M2 predicted by using the RR coefficients for the AG effect of M1 were highly correlated with those obtained by using the coefficients of M2. Milk production after 305 DIM can be predicted by using the RR coefficient estimates of the AG effect during the first 305 DIM.
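Legendre covariates for a random regression model are evaluated after rescaling the time axis (here DIM) to [-1, 1]. The sketch uses the normalized polynomials phi_k(x) = sqrt((2k+1)/2) * P_k(x), a common convention in animal breeding; whether the study used exactly this normalization is an assumption, and the function names are hypothetical. The 306 to 450 DIM window for M2 comes from the abstract.

```python
import math


def standardize(t, t_min, t_max):
    """Map days in milk from [t_min, t_max] onto [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre_covariates(t, t_min, t_max, order=1):
    """Normalized Legendre covariates phi_0..phi_order at time t.
    P_k comes from Bonnet's recursion: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    x = standardize(t, t_min, t_max)
    p = [1.0, x]
    for k in range(1, order):
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt((2 * k + 1) / 2) * p[k] for k in range(order + 1)]


def trajectory(coefficients, t, t_min, t_max):
    """Value of one animal's random regression curve at t: sum_k a_k * phi_k(t)."""
    phi = legendre_covariates(t, t_min, t_max, order=len(coefficients) - 1)
    return sum(a * p for a, p in zip(coefficients, phi))
```

With first-order polynomials, each animal's M2 curve is described by two RR coefficients (an intercept and a slope term), which is what makes regressing the M2 coefficients on the M1 coefficients a compact problem.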
Retrieving relevant factors with exploratory SEM and principal-covariate regression: A comparison.
Vervloet, Marlies; Van den Noortgate, Wim; Ceulemans, Eva
2018-02-12
Behavioral researchers often linearly regress a criterion on multiple predictors, aiming to gain insight into the relations between the criterion and predictors. Obtaining this insight from the ordinary least squares (OLS) regression solution may be troublesome, because OLS regression weights show only the effect of a predictor on top of the effects of other predictors. Moreover, when the number of predictors grows larger, it becomes likely that the predictors will be highly collinear, which makes the regression weights' estimates unstable (i.e., the "bouncing beta" problem). Among other procedures, dimension-reduction-based methods have been proposed for dealing with these problems. These methods yield insight into the data by reducing the predictors to a smaller number of summarizing variables and regressing the criterion on these summarizing variables. Two promising methods are principal-covariate regression (PCovR) and exploratory structural equation modeling (ESEM). Both simultaneously optimize reduction and prediction, but they are based on different frameworks. The resulting solutions have not yet been compared; it is thus unclear what the strengths and weaknesses of each method are. In this article, we focus on the extent to which PCovR and ESEM are able to extract the factors that truly underlie the predictor scores and can predict a single criterion. The results of two simulation studies showed that for a typical behavioral dataset, ESEM (using the BIC for model selection) is successful in this regard more often than PCovR. Yet, in 93% of the datasets PCovR performed equally well, and in the case of 48 predictors, 100 observations, and large differences in the strengths of the factors, PCovR even outperformed ESEM.
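The "bouncing beta" problem is easy to reproduce with two nearly collinear predictors: a tiny change in the criterion swings the individual OLS weights wildly, while a regression on a single summarizing variable stays stable. In the sketch the summary is just the equal-weight mean of the predictors, a deliberately crude stand-in chosen for illustration; it is neither PCovR nor ESEM, both of which estimate the reduction from the data.

```python
from statistics import mean


def ols_two(x1, x2, y):
    """OLS weights (b1, b2) for y ~ 1 + x1 + x2 via 2x2 normal equations on centered data."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # close to zero when x1 and x2 are collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def slope(x, y):
    """OLS slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def composite(*predictors):
    """Equal-weight mean of the predictors: a crude one-variable summary."""
    return [mean(vals) for vals in zip(*predictors)]
```

Perturbing y by a few hundredths at two observations can move the OLS weights from (1, 1) to roughly (2.3, -0.3), while the slope on the composite barely changes.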
Genetic aspect of Alzheimer disease: Results of complex segregation analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadonvick, A.D.; Lee, I.M.L.; Bailey-Wilson, J.E.
1994-09-01
The study was designed to evaluate the possibility that a single major locus could explain the segregation of Alzheimer disease (AD). The data were from the population-based AD Genetic Database and consisted of 402 consecutive, unrelated probands, diagnosed as having either "probable" or "autopsy-confirmed" AD, and their 2,245 first-degree relatives. In this analysis, a relative was considered affected with AD only when there were sufficient medical/autopsy data to support a diagnosis of AD as the most likely cause of the dementia. Transmission probability models allowing for a genotype-dependent and logistically distributed age of onset were used. The program REGTL in the S.A.G.E. computer program package was used for a complex segregation analysis. The models included correction for single ascertainment. Regressive familial effects were not estimated. The data were analyzed to test the single major locus (SML), random transmission, and no transmission (environmental) hypotheses. The results of the complex segregation analysis showed that (1) the SML model gave the best fit, and (2) the non-genetic models could be rejected.
Dillon, Paul; Phillips, L Alison; Gallagher, Paul; Smith, Susan M; Stewart, Derek; Cousins, Gráinne
2018-02-05
The Necessity-Concerns Framework (NCF) is a multidimensional theory describing the relationship between patients' positive and negative evaluations of their medication, which interplay to influence adherence. Most studies evaluating the NCF have failed to account for the multidimensional nature of the theory, placing the separate dimensions of medication "necessity beliefs" and "concerns" onto a single dimension (e.g., the Beliefs about Medicines Questionnaire difference-score model). To assess the multidimensional effect of patient medication beliefs (concerns and necessity beliefs) on medication adherence using polynomial regression with response surface analysis. Community-dwelling older adults >65 years (n = 1,211) presenting their own prescription for antihypertensive medication to 106 community pharmacies in the Republic of Ireland rated their concerns and necessity beliefs about antihypertensive medications at baseline and their adherence to antihypertensive medication at 12 months via structured telephone interview. Confirmatory polynomial regression found the difference-score model to be inaccurate; subsequent exploratory analysis identified a quadratic model as the best-fitting polynomial model. Adherence was lowest among those with strong medication concerns and weak necessity beliefs, and greatest among those with weak concerns and strong necessity beliefs (slope β = -0.77, p<.001; curvature β = -0.26, p = .004). However, novel nonreciprocal effects were also observed; patients with simultaneously high concerns and necessity beliefs had lower adherence than those with simultaneously low concerns and necessity beliefs (slope β = -0.36, p = .004; curvature β = -0.25, p = .003). The difference-score model fails to account for these potential nonreciprocal effects. The results extend the evidence supporting the use of polynomial regression to assess the multidimensional effect of medication beliefs on adherence.
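In response surface analysis, the quadratic model Z = b0 + b1*X + b2*Y + b3*X² + b4*X*Y + b5*Y² is summarized by slopes and curvatures along two lines: congruence (X = Y) and incongruence (X = -Y). The sketch below computes these standard surface parameters from fitted coefficients. Which belief is X and which is Y, and the coefficient values in any example, are assumptions; the function names are hypothetical.

```python
def rsa_surface_parameters(b1, b2, b3, b4, b5):
    """Surface parameters for Z = b0 + b1*X + b2*Y + b3*X^2 + b4*X*Y + b5*Y^2."""
    return {
        "a1_slope_congruence": b1 + b2,          # along X = Y
        "a2_curvature_congruence": b3 + b4 + b5,  # along X = Y
        "a3_slope_incongruence": b1 - b2,         # along X = -Y
        "a4_curvature_incongruence": b3 - b4 + b5,  # along X = -Y
    }


def surface(b, x, y):
    """Evaluate the fitted quadratic surface at (x, y); b = (b0, ..., b5)."""
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b1 * x + b2 * y + b3 * x * x + b4 * x * y + b5 * y * y
```

Substituting Y = X gives Z = b0 + a1*X + a2*X², and Y = -X gives Z = b0 + a3*X + a4*X², which is why these sums describe the surface along each line. An algebraic difference-score model Z = c0 + c1*(X - Y) implies b1 = -b2 and b3 = b4 = b5 = 0, so a nonzero slope along the congruence line (the "nonreciprocal" effect) is something it cannot represent.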
Pattern Recognition Analysis of Age-Related Retinal Ganglion Cell Signatures in the Human Eye
Yoshioka, Nayuta; Zangerl, Barbara; Nivison-Smith, Lisa; Khuu, Sieu K.; Jones, Bryan W.; Pfeiffer, Rebecca L.; Marc, Robert E.; Kalloniatis, Michael
2017-01-01
Purpose: To characterize macular ganglion cell layer (GCL) changes with age and provide a framework to assess changes in ocular disease. This study used data clustering to analyze macular GCL patterns from optical coherence tomography (OCT) in a large cohort of subjects without ocular disease. Methods: Single eyes of 201 patients evaluated at the Centre for Eye Health (Sydney, Australia) were retrospectively enrolled (age range, 20–85); 8 × 8 grid locations obtained from Spectralis OCT macular scans were analyzed with unsupervised classification into statistically separable classes sharing common GCL thickness and change with age. The resulting classes and gridwise data were fitted with linear and segmented linear regression curves. Additionally, normalized data were analyzed to express the age-related regression as a percentage. The accuracy of each model was examined by comparing the predicted 50-year-old equivalent macular GCL thickness for the entire cohort with a true 50-year-old reference cohort. Results: Pattern recognition clustered GCL thickness across the macula into five to eight spatially concentric classes. An F-test showed segmented linear regression to be the most appropriate model for macular GCL change. The pattern recognition–derived and normalized models revealed less difference between the predicted macular GCL thickness and the reference cohort (average ± SD 0.19 ± 0.92 and −0.30 ± 0.61 μm) than a gridwise model (average ± SD 0.62 ± 1.43 μm). Conclusions: Pattern recognition successfully identified statistically separable macular areas that undergo a segmented linear reduction with age. This regression model better predicted macular GCL thickness. The various unique spatial patterns revealed by pattern recognition, combined with core GCL thickness data, provide a framework to analyze GCL loss in ocular disease. PMID:28632847
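A segmented (broken-stick) linear regression and the F-test against a single line can be sketched as follows. This is an illustrative implementation, not the study's: the breakpoint is found by grid search, the segmented model is counted as having 4 parameters (intercept, two slopes, breakpoint), which is one common convention, and all names are hypothetical. Candidate breakpoints must leave at least one observation beyond them, or the hinge column is all zeros.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ s = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (m[r][n] - sum(m[r][k] * s[k] for k in range(r + 1, n))) / m[r][r]
    return s


def least_squares(rows, y):
    """Solve the normal equations X'X beta = X'y; return (beta, SSE)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((v - sum(bi * ri for bi, ri in zip(beta, r))) ** 2 for r, v in zip(rows, y))
    return beta, sse


def fit_segmented(ages, thickness, breakpoints):
    """Grid-search c in y = b0 + b1*age + b2*max(0, age - c); return (c, beta, SSE)."""
    best = None
    for c in breakpoints:
        rows = [[1.0, a, max(0.0, a - c)] for a in ages]
        beta, sse = least_squares(rows, thickness)
        if best is None or sse < best[2]:
            best = (c, beta, sse)
    return best


def f_test(sse_lin, sse_seg, n, p_lin=2, p_seg=4):
    """F statistic for the nested comparison of a straight line vs. a segmented line."""
    return ((sse_lin - sse_seg) / (p_seg - p_lin)) / (sse_seg / (n - p_seg))
```

The fitted b2 is the change in slope after the breakpoint, i.e. how much faster (if negative) the layer thins beyond the age c.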
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. 
The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.
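The slope and intercept rules described above (median of all pairwise slopes; line passing through the medians of the data) can be written in a few lines. This is a sketch of the Kendall-Theil estimator itself, not a port of the KTRLine program: it omits the multisegment option, error component, and bias correction, and it simply skips pairs with tied X values, which is one of several possible conventions.

```python
from statistics import median


def kendall_theil_line(x, y):
    """Kendall-Theil (Theil-Sen) robust line.
    Slope: median of all pairwise slopes. Intercept: chosen so the line
    passes through (median(x), median(y))."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]  # tied X values give an undefined slope; skip them
    ]
    b1 = median(slopes)
    b0 = median(y) - b1 * median(x)
    return b0, b1
```

Because every pair of points contributes a slope, the work grows with n², which matches the note above that the program becomes perceptibly slow beyond about 1,000 points. A single gross outlier moves only a minority of the pairwise slopes, so the median slope is unaffected.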
Speidel, S E; Peel, R K; Crews, D H; Enns, R M
2016-02-01
Genetic evaluation research designed to reduce the required days to a specified end point has received very little attention in the pertinent scientific literature, even though its economic importance was first discussed in 1957. There are many production scenarios in today's beef industry, making a prediction for the required number of days to a single end point a suboptimal option. Random regression is an attractive alternative for calculating days to weight (DTW), days to ultrasound back fat (DTUBF), and days to ultrasound rib eye area (DTUREA) genetic predictions that could overcome the weaknesses of a single end point prediction. The objective of this study was to develop random regression approaches for the prediction of DTW, DTUREA, and DTUBF. Data were obtained from the Agriculture and Agri-Food Canada Research Centre, Lethbridge, AB, Canada, and consisted of records on 1,324 feedlot cattle spanning 1999 to 2007. Individual animals averaged 5.77 observations, with weights, ultrasound rib eye area (UREA), ultrasound back fat depth (UBF), and ages ranging from 293 to 863 kg, 73.39 to 129.54 cm², 1.53 to 30.47 mm, and 276 to 519 d, respectively. Random regression models using Legendre polynomials were used to regress age of the individual on weight, UREA, and UBF. Fixed effects in the model included an overall fixed regression of age on end point (weight, UREA, and UBF) nested within breed, to account for the mean relationship between age and weight, as well as a contemporary group effect consisting of breed of the animal (Angus, Charolais, and Charolais sired), feedlot pen, and year of measure. Likelihood ratio tests were used to determine the appropriate random polynomial order. Use of the quadratic polynomial did not account for any additional genetic variation in days for DTW (P > 0.11), for DTUREA (P > 0.18), or for DTUBF (P > 0.20) when compared with the linear random polynomial.
Heritability estimates from the linear random regression for DTW ranged from 0.54 to 0.74, corresponding to end points of 293 and 863 kg, respectively. Heritability for DTUREA ranged from 0.51 to 0.34 and for DTUBF ranged from 0.55 to 0.37. These estimates correspond to UREA end points of 35 and 125 cm² and UBF end points of 1.53 and 30 mm, respectively. This range of heritability shows DTW, DTUREA, and DTUBF to be highly heritable and indicates that selection pressure aimed at reducing the number of days to reach a finish weight end point can result in genetic change given sufficient data.
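In a random regression model, heritability is a function of the trajectory variable (here the end point, e.g. weight), which is how a range such as 0.54 to 0.74 across 293 to 863 kg arises. A minimal sketch, assuming normalized first-order Legendre covariates, a 2×2 additive genetic covariance matrix G, an optional permanent environmental matrix P, and a constant residual variance; none of these matrices are reported in the abstract, so any values used are hypothetical.

```python
import math


def phi_linear(t, t_min, t_max):
    """Normalized first-order Legendre covariates at end point t rescaled to [-1, 1]."""
    x = -1.0 + 2.0 * (t - t_min) / (t_max - t_min)
    return [math.sqrt(0.5), math.sqrt(1.5) * x]


def quad_form(v, m):
    """v' M v for a small square matrix M given as nested lists."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))


def heritability_at(t, g, p, sigma_e2, t_min, t_max):
    """h^2(t) = phi'G phi / (phi'G phi + phi'P phi + sigma_e^2)."""
    phi = phi_linear(t, t_min, t_max)
    vg = quad_form(phi, g)
    vp = quad_form(phi, p)
    return vg / (vg + vp + sigma_e2)
```

Evaluating `heritability_at` over a grid of end points between 293 and 863 kg traces the heritability curve; a nonzero genetic slope variance (or covariance) in G is what makes h² change along the trajectory.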
Air Leakage of US Homes: Regression Analysis and Improvements from Retrofit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chan, Wanyu R.; Joh, Jeffrey; Sherman, Max H.
2012-08-01
The LBNL Residential Diagnostics Database (ResDB) contains blower door measurements and other diagnostic test results of homes in the United States. Of these, approximately 134,000 single-family detached homes have sufficient information for the analysis of air leakage in relation to a number of housing characteristics. We performed regression analysis to consider the correlation between normalized leakage and a number of explanatory variables: IECC climate zone, floor area, height, year built, foundation type, duct location, and other characteristics. The regression model explains 68% of the observed variability in normalized leakage. ResDB also contains the before and after retrofit air leakage measurements of approximately 23,000 homes that participated in weatherization assistance programs (WAPs) or residential energy efficiency programs. The two types of programs achieve rather similar reductions in normalized leakage: 30% for WAPs and 20% for other energy programs.
Parametric Human Body Reconstruction Based on Sparse Key Points.
Cheng, Ke-Li; Tong, Ruo-Feng; Tang, Min; Qian, Jing-Ye; Sarkis, Michel
2016-11-01
We propose an automatic parametric human body reconstruction algorithm which can efficiently construct a model using a single Kinect sensor. A user needs to stand still in front of the sensor for a couple of seconds to measure the range data. The user's body shape and pose will then be automatically constructed in several seconds. Traditional methods optimize dense correspondences between range data and meshes. In contrast, our proposed scheme relies on sparse key points for the reconstruction. It employs regression to find the corresponding key points between the scanned range data and some annotated training data. We design two kinds of feature descriptors as well as corresponding regression stages to make the regression robust and accurate. Our scheme then performs a dense refinement, in which a pre-factorization method is applied to improve computational efficiency. Compared with other methods, our scheme achieves similar reconstruction accuracy but significantly reduces runtime.
Logsdon, Benjamin A.; Carty, Cara L.; Reiner, Alexander P.; Dai, James Y.; Kooperberg, Charles
2012-01-01
Motivation: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provides model over-fitting diagnostics through a novel normally distributed statistic defined for every marker within the GWAS, based on results from a variational Bayes spike regression algorithm. Results: We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate by reaching a stringent cutoff of marginal association in a larger cohort. Availability: An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html. Contact: blogsdon@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22563072
Dual regression physiological modeling of resting-state EPI power spectra: Effects of healthy aging.
Viessmann, Olivia; Möller, Harald E; Jezzard, Peter
2018-02-02
Aging and disease-related changes in the arteriovasculature have been linked to elevated levels of cardiac cycle-induced pulsatility in the cerebral microcirculation. Functional magnetic resonance imaging (fMRI), acquired fast enough to unalias the cardiac frequency contributions, can be used to study these physiological signals in the brain. Here, we propose an iterative dual regression analysis in the frequency domain to model single voxel power spectra of echo planar imaging (EPI) data using external recordings of the cardiac and respiratory cycles as input. We further show that a data-driven variant, without external physiological traces, produces comparable results. We use this framework to map and quantify cardiac and respiratory contributions in healthy aging. We found a significant increase in the spatial extent of cardiac modulated white matter voxels with age, whereas the overall strength of cardiac-related EPI power did not show an age effect. Copyright © 2018. Published by Elsevier Inc.
Ali, Selman A; Lynam, June; McLean, Cornelia S; Entwisle, Claire; Loudon, Peter; Rojas, José M; McArdle, Stephanie E B; Li, Geng; Mian, Shahid; Rees, Robert C
2002-04-01
Direct intratumor injection of a disabled infectious single cycle HSV-2 virus encoding the murine GM-CSF gene (DISC/mGM-CSF) into established murine colon carcinoma CT26 tumors induced a significant delay in tumor growth and complete tumor regression in up to 70% of animals. Pre-existing immunity to HSV did not reduce the therapeutic efficacy of DISC/mGM-CSF, and, when administered in combination with syngeneic dendritic cells, further decreased tumor growth and increased the incidence of complete tumor regression. Direct intratumor injection of DISC/mGM-CSF also inhibited the growth of CT26 tumor cells implanted on the contralateral flank or seeded into the lungs following i.v. injection of tumor cells (experimental lung metastasis). Proliferation of splenocytes in response to Con A was impaired in progressor and tumor-bearer, but not regressor, mice. A potent tumor-specific CTL response was generated from splenocytes of all mice with regressing, but not progressing, tumors following in vitro peptide stimulation; this response was specific for the gp70 AH-1 peptide SPSYVYHQF and correlated with IFN-gamma, but not IL-4, cytokine production. Depletion of CD8(+) T cells from regressor splenocytes before in vitro stimulation with the relevant peptide abolished their cytolytic activity, while depletion of CD4(+) T cells only partially inhibited CTL generation. Tumor regression induced by DISC/mGM-CSF virus immunotherapy provides a unique model for evaluating the immune mechanism(s) involved in tumor rejection, upon which tumor immunotherapy regimens may be based.
Ho, Sean Wei Loong; Tan, Teong Jin Lester; Lee, Keng Thiam
2016-03-01
To evaluate whether pre-operative anthropometric data can predict the optimal diameter and length of hamstring tendon autograft for anterior cruciate ligament (ACL) reconstruction. This was a cohort study that involved 169 patients who underwent single-bundle ACL reconstruction (single surgeon) with 4-stranded MM Gracilis and MM Semi-Tendinosus autografts. Height, weight, body mass index (BMI), gender, race, age and smoking status were recorded pre-operatively. Intra-operatively, the diameter and functional length of the 4-stranded autograft were recorded. Multiple regression analysis was used to determine the relationship between the anthropometric measurements and the length and diameter of the implanted autografts. Height and weight showed the strongest correlations with 4-stranded hamstring autograft diameter. This correlation was stronger in females than in males. BMI had a moderate correlation with the diameter of the graft in females. Females had significantly smaller grafts, in both diameter and length, than males. Linear regression models did not show any significant correlation between hamstring autograft length and height or weight (p>0.05). Simple regression analysis demonstrated that height and weight can be used to predict hamstring graft diameter. The following regression equation was obtained for females: Graft diameter = 0.012 + 0.034*Height + 0.026*Weight (R2=0.358, p=0.004). The following regression equation was obtained for males: Graft diameter = 5.130 + 0.012*Height + 0.007*Weight (R2=0.086, p=0.002). Pre-operative anthropometric data have a positive correlation with the diameter of 4-stranded hamstring autografts but no significant correlation with their length. These data can be utilised to predict the autograft diameter and may be useful for pre-operative planning and patient counseling for graft selection.
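The two sex-specific equations above can be applied directly for pre-operative planning. The sketch simply encodes the reported coefficients; the abstract does not state units, so height in cm, weight in kg, and diameter in mm are assumptions, and the function name is hypothetical.

```python
def predict_graft_diameter(sex, height, weight):
    """Point prediction of 4-stranded hamstring graft diameter from the
    sex-specific regression equations reported in the abstract.
    Assumed units: height in cm, weight in kg, diameter in mm."""
    if sex == "female":
        return 0.012 + 0.034 * height + 0.026 * weight
    if sex == "male":
        return 5.130 + 0.012 * height + 0.007 * weight
    raise ValueError("sex must be 'female' or 'male'")
```

Note the very different R² values: the female equation explains about 36% of the variance in diameter, the male equation under 9%, so male predictions carry much wider uncertainty than the point estimate suggests.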
Asano, Junichi; Hirakawa, Akihiro
2017-01-01
The Cox proportional hazards cure model is a survival model incorporating a cure rate with the assumption that the population contains both uncured and cured individuals. It contains a logistic regression for the cure rate, and a Cox regression to estimate the hazard for uncured patients. A single predictive model for both the cure and hazard can be developed by using a cure model that simultaneously predicts the cure rate and hazards for uncured patients; however, model selection is a challenge because of the lack of a measure for quantifying the predictive accuracy of a cure model. Recently, we developed an area under the receiver operating characteristic curve (AUC) for determining the cure rate in a cure model (Asano et al., 2014), but no corresponding measure for the hazards of uncured patients was available. In this article, we propose novel C-statistics, weighted by the patients' cure status (i.e., cured, uncured, or censored cases), for the cure model. The operating characteristics of the proposed C-statistics and their confidence intervals were examined in simulation analyses. We also illustrate methods for predictive model selection and for further interpretation of variables using the proposed AUCs and C-statistics via application to breast cancer data.
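The proposed weighted C-statistics build on the ordinary concordance index. For reference only, a minimal Python sketch of the standard, unweighted Harrell's C is shown below. It is not the authors' cure-status-weighted statistic, and skipping pairs with tied follow-up times is a simplifying assumption.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter time had an
    observed event. It is concordant when that subject also has the higher
    predicted risk; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or earlier subject censored: not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A value of 1 means the risk ordering matches the observed event ordering perfectly, and 0.5 corresponds to random ordering.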
NASA Astrophysics Data System (ADS)
Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.
2015-03-01
During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally, based on linear profiles and weak learners; together, these limitations lead to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the corresponding reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions of the landmark position contribute to a common voting map that reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, and using a simple candidate selection and a single-resolution approach, excellent results were achieved, with faster convergence and better segmentation of concavities; together, these results underline the potential of our approach to increase robustness through distinct landmark detection and better search coverage.
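The voting step can be illustrated with a minimal NumPy sketch. It assumes the per-point displacement predictions are already available; the trained regression forest and the Haar-like features are not reproduced. Each sample point casts one vote at the voxel it predicts for the landmark, and the peak of the accumulated map is taken as the position estimate. The function name `vote_landmark` is hypothetical.

```python
import numpy as np


def vote_landmark(sample_points, predicted_offsets, grid_shape):
    """Accumulate displacement votes into a 3D map and return its peak.

    sample_points: (N, 3) voxel coordinates of the sampled points.
    predicted_offsets: (N, 3) predicted displacement from each point to the
        landmark (in the method above, a regression forest's output).
    """
    targets = np.rint(np.asarray(sample_points) + np.asarray(predicted_offsets)).astype(int)
    # Discard votes that land outside the search volume.
    inside = np.all((targets >= 0) & (targets < np.array(grid_shape)), axis=1)
    targets = targets[inside]
    votes = np.zeros(grid_shape, dtype=np.int64)
    # np.add.at accumulates correctly when several votes hit the same voxel.
    np.add.at(votes, (targets[:, 0], targets[:, 1], targets[:, 2]), 1)
    peak = np.unravel_index(np.argmax(votes), votes.shape)
    return tuple(int(c) for c in peak), votes
```

Because every point votes, a few poor predictions are outvoted by the consistent majority. This is the robustness argument made for regression voting.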
Ben Hassen, Manel; Bartholomé, Jérôme; Valè, Giampiero; Cao, Tuong-Vi; Ahmadi, Nourollah
2018-05-09
Developing rice varieties adapted to alternate wetting and drying water management is crucial for the sustainability of irrigated rice cropping systems. Here we report the first study exploring the feasibility of breeding rice for adaptation to alternate wetting and drying using genomic prediction methods that account for genotype by environment interactions. Two breeding populations (a reference panel of 284 accessions and a progeny population of 97 advanced lines) were evaluated under alternate wetting and drying and continuous flooding management systems. The predictive ability of genomic prediction for response variables (index of relative performance and the slope of the joint regression) and for multi-environment genomic prediction models were compared. For the three traits considered (days to flowering, panicle weight and nitrogen-balance index), significant genotype by environment interactions were observed in both populations. In cross validation, predictive ability for the index was on average lower (0.31) than that of the slope of the joint regression (0.64), regardless of the trait considered. Similar results were found for progeny validation. Both cross-validation and progeny validation experiments showed that the performance of multi-environment models predicting unobserved phenotypes of untested entries was similar to the performance of single environment models, with differences in predictive ability ranging from -6% to 4% depending on the trait and on the statistical model concerned. The predictive ability of multi-environment models predicting unobserved phenotypes of entries evaluated under both water management systems outperformed single environment models by an average of 30%. Practical implications for breeding rice for adaptation to alternate wetting and drying system are discussed. Copyright © 2018, G3: Genes, Genomes, Genetics.
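The "slope of the joint regression" is commonly computed as a Finlay-Wilkinson regression of each genotype's performance on an environmental index. The sketch below assumes that definition, with the environment mean as the index. The abstract itself does not spell out the computation. With only two water management systems, each slope reduces to the genotype's difference between environments divided by the difference in environment means.

```python
import numpy as np


def joint_regression_slopes(y):
    """Finlay-Wilkinson joint regression slope for each genotype.

    y: (n_genotypes, n_environments) array of phenotypic means.
    Each genotype's values are regressed on the environment means (the
    environmental index); a slope of 1 indicates an average response.
    """
    y = np.asarray(y, dtype=float)
    env_index = y.mean(axis=0)
    env_dev = env_index - env_index.mean()
    geno_dev = y - y.mean(axis=1, keepdims=True)
    # Ordinary least-squares slope of each row on the environmental index.
    return geno_dev @ env_dev / np.sum(env_dev ** 2)
```

By construction the slopes average to exactly 1 across genotypes, so they describe relative sensitivity to the environment.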
An Ionospheric Index Model based on Linear Regression and Neural Network Approaches
NASA Astrophysics Data System (ADS)
Tshisaphungo, Mpho; McKinnell, Lee-Anne; Bosco Habarulema, John
2017-04-01
The ionosphere is well known to reflect radio wave signals in the high frequency (HF) band due to the presence of electrons and ions within the region. To optimise the use of long-distance HF communications, it is important to understand the drivers of ionospheric storms and to accurately predict propagation conditions, especially during disturbed days. This paper presents the development of an ionospheric storm-time index over the South African region for use by HF communication users. The model is expected to become a valuable tool for measuring complex ionospheric behaviour in an operational space weather monitoring and forecasting environment. The development of the ionospheric storm-time index is based on data from a single ionosonde station at Grahamstown (33.3°S, 26.5°E), South Africa. Measurements of the critical frequency of the F2 layer (foF2) for the period 1996-2014 were considered for this study. The model was developed based on linear regression and neural network approaches. In this talk, validation results for low, medium and high solar activity periods will be discussed to demonstrate the model's performance.
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.
2013-01-01
Background: Most attempts to address undernutrition, which is responsible for one third of global child deaths, have fallen short of expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design: Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
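Quantile regression replaces the squared-error loss with the asymmetric check (pinball) loss. The minimal sketch below is not the additive model used in the study. It only shows that minimizing this loss over a constant recovers the corresponding sample quantile, for example the 35% level used above. Fitting covariate effects (linear, non-linear, spatial) generalizes the same objective. The helper names are hypothetical.

```python
import numpy as np


def check_loss(residuals, tau):
    """Mean check (pinball) loss for quantile level tau in (0, 1)."""
    r = np.asarray(residuals, dtype=float)
    # Positive residuals cost tau * r; negative residuals cost (1 - tau) * |r|.
    return np.mean(np.maximum(tau * r, (tau - 1) * r))


def best_constant(y, tau, grid):
    """Grid-search the constant that minimises the check loss."""
    y = np.asarray(y, dtype=float)
    return min(grid, key=lambda c: check_loss(y - c, tau))
```

For example, `best_constant(np.arange(1, 11), 0.35, np.arange(0, 11.5, 0.5))` returns 4.0, the lower 35% point of the values 1 to 10. With `tau=0.5`, the same search returns the median.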
Glass-Kaastra, Shiona K; Pearl, David L; Reid-Smith, Richard J; McEwen, Beverly; Slavic, Durda; Fairles, Jim; McEwen, Scott A
2014-10-01
Susceptibility results for Pasteurella multocida and Streptococcus suis isolated from swine clinical samples were obtained from January 1998 to October 2010 from the Animal Health Laboratory at the University of Guelph, Guelph, Ontario, and used to describe variation in antimicrobial resistance (AMR) to 4 drugs of importance in the Ontario swine industry: ampicillin, tetracycline, tiamulin, and trimethoprim-sulfamethoxazole. Four temporal data-analysis options were used: visualization of trends in 12-month rolling averages, logistic-regression modeling, temporal-scan statistics, and a scan with the "What's strange about recent events?" (WSARE) algorithm. The AMR trends varied among the antimicrobial drugs for a single pathogen and between pathogens for a single antimicrobial, suggesting that pathogen-specific AMR surveillance may be preferable to indicator data. The 4 methods provided complementary and, at times, redundant results. The most appropriate combination of analysis methods for surveillance using these data included temporal-scan statistics with a visualization method (rolling-average or predicted-probability plots following logistic-regression models). The WSARE algorithm provided interesting results for quality control and has the potential to detect new resistance patterns; however, missing data created problems for displaying the results in a way that would be meaningful to all surveillance stakeholders.
PMID:25355992
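The first of the four methods, 12-month rolling averages, can be sketched as a trailing window over monthly values. An example input is the monthly proportion of resistant isolates. That input is an assumption, because the abstract does not say how the monthly values were formed.

```python
def rolling_mean(values, window=12):
    """Trailing rolling mean; returns one value per complete window."""
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window forward by one month.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means
```

Averaging monthly proportions weights every month equally. Pooling resistant and tested counts over each window would instead weight months by the number of isolates.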
Use of partial least squares regression to impute SNP genotypes in Italian cattle breeds.
Dimauro, Corrado; Cellesi, Massimo; Gaspa, Giustino; Ajmone-Marsan, Paolo; Steri, Roberto; Marras, Gabriele; Macciotta, Nicolò P P
2013-06-05
The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low-density single nucleotide polymorphism (SNP) panels (i.e., 3K or 7K) to a high-density panel with 50K SNP. No pedigree information was used. Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; the corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, the computing time required by the partial least squares regression method was on average around 10 times lower than that required by Beagle. Using the partial least squares regression method in the multi-breed analysis resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.
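The imputation idea can be sketched as one PLS1 regression per high-density SNP. Each regression predicts that SNP's allele dosage (0/1/2) from the low-density genotypes, and the prediction is then rounded. This NumPy sketch uses the standard NIPALS algorithm. The per-SNP setup, the rounding rule and the function names are assumptions, not the authors' implementation. In practice the number of latent components would be tuned, for example by cross-validation.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break  # nothing left to explain
        w /= norm
        t = Xk @ w                  # latent scores
        tt = t @ t
        p = Xk.T @ t / tt           # X loadings
        qk = yk @ t / tt            # y loading
        Xk = Xk - np.outer(t, p)    # deflate X
        yk = yk - qk * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients in the original X space: W (P'W)^-1 q.
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def impute_genotype(X_new, model):
    """Predict allele dosage and round to the nearest genotype code 0/1/2."""
    x_mean, y_mean, coef = model
    pred = y_mean + (np.asarray(X_new, dtype=float) - x_mean) @ coef
    return np.clip(np.rint(pred), 0, 2).astype(int)
```

With as many components as predictors, PLS1 reduces to ordinary least squares. Using fewer components is what makes it useful when low-density SNPs are numerous and correlated.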
Dron, Julien; Dodi, Alain
2011-06-15
The removal of chloride, nitrate and sulfate ions from aqueous solutions by a macroporous resin is studied through the ion exchange systems OH⁻/Cl⁻, OH⁻/NO₃⁻, OH⁻/SO₄²⁻, HCO₃⁻/Cl⁻, Cl⁻/NO₃⁻ and Cl⁻/SO₄²⁻. These systems are investigated by means of Langmuir, Freundlich, Dubinin-Radushkevitch (D-R) and Dubinin-Astakhov (D-A) single-component adsorption isotherms. The sorption parameters and the fit of the models are determined by nonlinear regression and discussed. The Langmuir model provides a fair estimate of the sorption capacity regardless of the system under study, in contrast to the Freundlich and D-R models. The adsorption energies deduced from the Dubinin and Langmuir isotherms are in good agreement, and the surface parameter of the D-A isotherm appears consistent. All models agree on the order of affinity OH⁻
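A nonlinear regression fit of the Langmuir isotherm, q = q_max·K·C / (1 + K·C), can be sketched without a general-purpose optimizer. Once K is fixed, the model is linear in q_max, so q_max has a closed-form least-squares value and only K needs to be searched. This grid-search profile is an illustrative choice, not necessarily the fitting procedure used in the study, and the grid range is an assumption.

```python
import numpy as np


def fit_langmuir(c_eq, q_eq, k_grid=None):
    """Least-squares fit of q = q_max * K * C / (1 + K * C).

    For each candidate K, the optimal q_max is computed in closed form;
    K is then chosen by minimising the resulting sum of squared errors.
    """
    c = np.asarray(c_eq, dtype=float)
    q = np.asarray(q_eq, dtype=float)
    if k_grid is None:
        k_grid = np.logspace(-3, 3, 601)  # assumed search range for K
    best = None
    for k in k_grid:
        f = k * c / (1.0 + k * c)
        q_max = (f @ q) / (f @ f)        # optimal q_max for this K
        sse = np.sum((q - q_max * f) ** 2)
        if best is None or sse < best[0]:
            best = (sse, q_max, k)
    _, q_max, k = best
    return q_max, k
```

The fitted q_max is the sorption capacity that the Langmuir model is said to estimate fairly across systems. Its precision is limited by the spacing of `k_grid`, which a local refinement step could improve.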
Zimmerman, Tammy M.
2006-01-01
The Lake Erie shoreline in Pennsylvania spans nearly 40 miles and is a valuable recreational resource for Erie County. Nearly 7 miles of the Lake Erie shoreline lies within Presque Isle State Park in Erie, Pa. Concentrations of Escherichia coli (E. coli) bacteria at permitted Presque Isle beaches occasionally exceed the single-sample bathing-water standard, resulting in unsafe swimming conditions and closure of the beaches. E. coli concentrations and other water-quality and environmental data collected at Presque Isle Beach 2 during the 2004 and 2005 recreational seasons were used to develop models using tobit regression analyses to predict E. coli concentrations. All variables statistically related to E. coli concentrations were included in the initial regression analyses, and after several iterations, only those explanatory variables that made the models significantly better at predicting E. coli concentrations were included in the final models. Regression models were developed using data from 2004, 2005, and the combined 2-year dataset. Variables in the 2004 model and the combined 2004-2005 model were log10 turbidity, rain weight, wave height (calculated), and wind direction. Variables in the 2005 model were log10 turbidity and wind direction. Explanatory variables not included in the final models were water temperature, streamflow, wind speed, and current speed; model results indicated these variables did not meet significance criteria at the 95-percent confidence level (probabilities were greater than 0.05). The predicted E. coli concentrations produced by the models were used to develop probabilities that concentrations would exceed the single-sample bathing-water standard for E. coli of 235 colonies per 100 milliliters. 
Analysis of the exceedance probabilities helped determine a threshold probability for each model, chosen such that the number of correctly predicted exceedances and nonexceedances was maximized and the number of false positives and false negatives was minimized. Future samples with computed exceedance probabilities higher than the selected threshold probability will likely exceed the E. coli standard, and a beach advisory or closing may need to be issued; computed exceedance probabilities lower than the threshold probability will likely indicate that the standard will not be exceeded. Additional data collected each year can be used to test and possibly improve the model. This study will aid beach managers in more rapidly determining when waters are not safe for recreational use and, subsequently, when to issue beach advisories or closings.
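The threshold rule can be sketched as a search over candidate cut-offs. Because the total number of samples is fixed, maximizing correct exceedance and nonexceedance calls is the same as minimizing false positives plus false negatives, so one criterion covers both. The function name and the strict "greater than" rule are assumptions.

```python
def choose_threshold(probabilities, exceeded):
    """Pick the probability cut-off that maximises correct classifications.

    A sample is predicted to exceed the standard when its probability is
    strictly greater than the threshold. Ties keep the lowest threshold.
    """
    candidates = [0.0] + sorted(set(probabilities))
    best_threshold, best_correct = None, -1
    for threshold in candidates:
        correct = sum(
            (p > threshold) == bool(obs) for p, obs in zip(probabilities, exceeded)
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct
```

For example, probabilities `[0.1, 0.2, 0.6, 0.8]` with observed exceedances `[False, False, True, True]` give a threshold of 0.2 with all four samples classified correctly.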
Non-homogeneous hybrid rocket fuel for enhanced regression rates utilizing partial entrainment
NASA Astrophysics Data System (ADS)
Boronowsky, Kenny
A concept was developed and tested to enhance the performance and regression rate of hydroxyl-terminated polybutadiene (HTPB), a commonly used hybrid rocket fuel. Adding small nodules of paraffin to the HTPB fuel created a non-homogeneous mixture, resulting in increased regression rates. The goal was to develop a fuel with a simplified single-core geometry and a tailorable regression rate. The new fuel would benefit from the structural stability of HTPB without suffering from the large void fraction characteristic of typical HTPB core geometries. Regression rates were compared among traditional HTPB single-core grains, cores of 85% HTPB mixed with 15% (by weight) paraffin, cores of 70% HTPB mixed with 30% paraffin, and plain paraffin single-core grains. Each fuel combination was tested at gaseous oxygen flow rates ranging from 0.9 to 3.3 g/s in a small-scale hybrid test rocket, and average regression rates were measured. Although large uncertainties were present in the experimental setup, the overall data showed that the regression rate increased with paraffin concentration. Further testing would be required at larger scales of interest, but the trends are encouraging. Inclusion of paraffin nodules in the HTPB grain may offer a greater advantage than the more noxious additives in current use. In addition, it may lead to safer rocket motors with higher integrated thrust due to the decreased void fraction.
Modeling non-linear growth responses to temperature and hydrology in wetland trees
NASA Astrophysics Data System (ADS)
Keim, R.; Allen, S. T.
2016-12-01
Growth responses of wetland trees to flooding and climate variations are difficult to model because they depend on multiple, apparently interacting factors, but they are a critical link in the hydrological control of wetland carbon budgets. To better understand tree growth responses to hydrological forcing in general, we modeled non-linear responses of tree-ring growth to flooding and climate at sub-annual time steps, using Vaganov-Shashkin response functions. We calibrated the model to six baldcypress tree-ring chronologies from two hydrologically distinct sites in southern Louisiana, and tested several hypotheses of plasticity in wetland tree responses to interacting environmental variables. The model outperformed traditional multiple linear regression. More importantly, optimized response parameters were generally similar among sites with varying hydrological conditions, suggesting that the functions generalize. Model forms that included interacting responses to multiple forcing factors were more effective than single response functions, indicating that the principle of a single limiting factor does not hold in wetlands and that both climatic and hydrological variables must be considered in predicting responses to hydrological or climate change.
Validation of Single-Item Screening Measures for Provider Burnout in a Rural Health Care Network.
Waddimba, Anthony C; Scribani, Melissa; Nieves, Melinda A; Krupa, Nicole; May, John J; Jenkins, Paul
2016-06-01
We validated three single-item measures for emotional exhaustion (EE) and depersonalization (DP) among rural physician/nonphysician practitioners. We linked cross-sectional survey data (on provider demographics, satisfaction, resilience, and burnout) with administrative information from an integrated health care network (1 academic medical center, 6 community hospitals, 31 clinics, and 19 school-based health centers) in an eight-county underserved area of upstate New York. In total, 308 physicians and advanced-practice clinicians completed a self-administered, multi-instrument questionnaire (65.1% response rate). Substantial proportions of respondents reported high EE (36.1%) and DP (9.9%). In multivariable linear mixed models, scores on EE/DP subscales of the Maslach Burnout Inventory were regressed on each single-item measure. The Physician Work-Life Study's single-item measure (classifying 32.8% of respondents as burning out/completely burned out) was correlated with EE and DP (Spearman's ρ = .72 and .41, p < .0001; Kruskal-Wallis χ(2) = 149.9 and 56.5, p < .0001, respectively). In multivariable models, it predicted high EE (but neither low EE nor low/high DP). EE/DP single items were correlated with parent subscales (Spearman's ρ = .89 and .81, p < .0001; Kruskal-Wallis χ(2) = 230.98 and 197.84, p < .0001, respectively). In multivariable models, the EE item predicted high/low EE, whereas the DP item predicted only low DP. Therefore, the three single-item measures tested varied in effectiveness as screeners for EE/DP dimensions of burnout. © The Author(s) 2015.
Depression and quality of life for women in single-parent and nuclear families.
Landero Hernández, René; Estrada Aranda, Benito; González Ramírez, Mónica Teresa
2009-05-01
This is a cross-sectional study whose objectives are 1) to determine the predictors of perceived quality of life and 2) to analyze the differences in quality of life, depression and family income between women from single-parent families and women from two-parent families. We worked with a non-probabilistic sample of 140 women from Monterrey, N.L., Mexico: 107 from two-parent families and 33 from single-parent families. Some of the results show that women from single-parent families have lower quality of life (Z = -2.224, p = .026), lower income (Z = -2.727, p = .006) and greater depression (Z = -6.143, p = .001) than women from two-parent families. In a multiple regression model (n = 140), the predictors of perceived quality of life were depression, income and number of children, which together explained 25.4% of the variance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hall, Matthew D.; Schultheiss, Timothy E., E-mail: schultheiss@coh.org; Smith, David D.
Purpose/Objective(s): To perform a meta-regression on published data and to model the 5-year probability of cataract development after hematopoietic stem cell transplantation (HSCT) with and without total body irradiation (TBI). Methods and Materials: Eligible studies reporting cataract incidence after HSCT with TBI were identified by a PubMed search. Seventeen publications provided complete information on radiation dose schedule, fractionation, dose rate, and actuarial cataract incidence. Chemotherapy-only regimens were included as zero radiation dose regimens. Multivariate meta-regression with a weighted generalized linear model was used to model the 5-year cataract incidence and contributory factors. Results: Data from 1386 patients in 21 series were included for analysis. TBI was administered to a total dose of 0 to 15.75 Gy with single or fractionated schedules with a dose rate of 0.04 to 0.16 Gy/min. Factors significantly associated with 5-year cataract incidence were dose, dose times dose per fraction (D•dpf), pediatric versus adult status, and the absence of an ophthalmologist as an author. Dose rate, graft versus host disease, steroid use, hyperfractionation, and number of fractions were not significant. Five-fold internal cross-validation showed a model validity of 83% ± 8%. Regression diagnostics showed no evidence of lack-of-fit and no patterns in the studentized residuals. The α/β ratio from the linear quadratic model, estimated as the ratio of the coefficients for dose and D•dpf, was 0.76 Gy (95% confidence interval [CI], 0.05-1.55). The odds ratio for pediatric patients was 2.8 (95% CI, 1.7-4.6) relative to adults. Conclusions: Dose, D•dpf, pediatric status, and regimented follow-up care by an ophthalmologist were predictive of 5-year cataract incidence after HSCT. The low α/β ratio indicates the importance of fractionation in reducing cataracts. 
Dose rate effects have been observed in single institution studies but not in the combined data analyzed here. Although data were limited to articles with 5-year actuarial estimates, the development of radiation-induced cataracts extends beyond this time.
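If the model's linear predictor contains the terms b1·D + b2·D·d, matching them to the linear-quadratic form α·D + β·D·d gives α/β = b1/b2. The consequence of a low α/β can be illustrated with the standard biologically effective dose, BED = D(1 + d/(α/β)). The sketch below uses the reported point estimate of 0.76 Gy. The two schedules compared are hypothetical illustrations and are not from the study.

```python
def biologically_effective_dose(total_dose, dose_per_fraction, alpha_beta):
    """Linear-quadratic BED = D * (1 + d / (alpha/beta)), all doses in Gy."""
    return total_dose * (1.0 + dose_per_fraction / alpha_beta)


alpha_beta = 0.76  # Gy, point estimate reported above

# Hypothetical schedules, for illustration only.
single = biologically_effective_dose(10.0, 10.0, alpha_beta)       # 1 x 10 Gy
fractionated = biologically_effective_dose(12.0, 2.0, alpha_beta)  # 6 x 2 Gy
print(f"single: {single:.1f} Gy, fractionated: {fractionated:.1f} Gy")
```

Even though the fractionated schedule delivers more total dose, its BED for cataract is about a third of the single-fraction value. This is the sense in which a low α/β makes fractionation important.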
Combined Effects of Prenatal Exposures to Environmental Chemicals on Birth Weight.
Govarts, Eva; Remy, Sylvie; Bruckers, Liesbeth; Den Hond, Elly; Sioen, Isabelle; Nelen, Vera; Baeyens, Willy; Nawrot, Tim S; Loots, Ilse; Van Larebeke, Nick; Schoeters, Greet
2016-05-12
Prenatal chemical exposure has frequently been associated with reduced fetal growth in single-pollutant regression models, although results have been inconsistent. Our study estimated the effects of exposure to single pollutants and mixtures on birth weight in 248 mother-child pairs. Arsenic, copper, lead, manganese and thallium were measured in cord blood, cadmium in maternal blood, methylmercury in maternal hair, and five organochlorines, two perfluorinated compounds and diethylhexyl phthalate metabolites in cord plasma. Daily exposure to particulate matter was modeled and averaged over the duration of gestation. In single-pollutant models, arsenic was significantly associated with reduced birth weight. The effect estimate increased when co-exposure to cadmium and mono-(2-ethyl-5-carboxypentyl) phthalate (MECPP) was included. Combining exposures by principal component analysis generated an exposure factor, loaded by cadmium and arsenic, that was associated with reduced birth weight. MECPP induced gender-specific effects. In girls, the effect estimate was doubled with co-exposure to thallium, PFOS, lead, cadmium, manganese, and mercury, while in boys, the mixture of MECPP with cadmium showed the strongest association with birth weight. In conclusion, birth weight was consistently inversely associated with exposure to pollutant mixtures. Chemicals not showing significant associations at the single-pollutant level contributed to stronger effects when analyzed as mixtures.
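Combining exposures by principal component analysis can be sketched as PCA on standardized exposure columns. The first component's loadings show which pollutants dominate the factor, and its scores could then enter a birth-weight regression. This NumPy sketch illustrates that assumption only; column meanings and the function name are hypothetical.

```python
import numpy as np


def first_principal_component(X):
    """Loadings, scores and explained-variance ratio of the first PC.

    Columns are standardised first, so this is PCA on the correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    if loadings[np.argmax(np.abs(loadings))] < 0:
        loadings = -loadings  # fix the arbitrary sign of the component
    scores = Z @ loadings     # one exposure-factor value per subject
    explained = s[0] ** 2 / np.sum(s ** 2)
    return loadings, scores, explained
```

When two exposures are strongly correlated, as cadmium and arsenic might be, both load heavily on the same component. An unrelated exposure then loads near zero.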
Huang, An-Min; Fei, Ben-Hua; Jiang, Ze-Hui; Hse, Chung-Yun
2007-09-01
Near infrared (NIR) spectroscopy is widely used as a quantitative method, and its main multivariate techniques are regression methods used to build prediction models; however, the accuracy of the analysis results is affected by many factors. In the present paper, the influence of sample roughness on the mathematical model for NIR quantitative analysis of wood density was studied. The experiments showed that the prediction results were good when the roughness of the predicted samples matched that of the calibration samples, and that the error was much higher otherwise. The roughness-mixed model was more flexible and adaptable to different sample roughness, and its prediction ability was much better than that of the single-roughness model.
Jung, Seung-Hyun; Cho, Sung-Min; Yim, Seon-Hee; Kim, So-Hee; Park, Hyeon-Chun; Cho, Mi-La; Shim, Seung-Cheol; Kim, Tae-Hwan; Park, Sung-Hwan; Chung, Yeun-Jun
2016-12-01
To develop a genotype-based ankylosing spondylitis (AS) risk prediction model that is more sensitive and specific than HLA-B27 typing. To develop the AS genetic risk scoring (AS-GRS) model, 648 individuals (285 cases and 363 controls) were examined for 5 copy number variants (CNV), 7 single-nucleotide polymorphisms (SNP), and an HLA-B27 marker by TaqMan assays. The AS-GRS model was developed using logistic regression and validated with a larger independent set (576 cases and 680 controls). Through logistic regression, we built the AS-GRS model consisting of 5 genetic components: HLA-B27, 3 CNV (1q32.2, 13q13.1, and 16p13.3), and 1 SNP (rs10865331). All significant associations of genetic factors in the model were replicated in the independent validation set. The discriminative ability of the AS-GRS model measured by the area under the curve was excellent: 0.976 (95% CI 0.96-0.99) in the model construction set and 0.951 (95% CI 0.94-0.96) in the validation set. The AS-GRS model showed higher specificity and accuracy than the HLA-B27-only model when the sensitivity was set to over 94%. When we categorized the individuals into quartiles based on the AS-GRS scores, the ORs of the 4 groups (low, intermediate-1, intermediate-2, and high risk) showed an increasing trend with the AS-GRS scores (r² = 0.950), and the highest-risk group showed a 494× higher risk of AS than the lowest-risk group (95% CI 237.3-1029.1). Our AS-GRS could be used to identify individuals at high risk for AS before major symptoms appear, which may improve their prognosis through early treatment.
Genomic Selection in Multi-environment Crop Trials.
Oakey, Helena; Cullis, Brian; Thompson, Robin; Comadran, Jordi; Halpin, Claire; Waugh, Robbie
2016-05-03
Genomic selection in crop breeding introduces modeling challenges not found in animal studies. These include the need to accommodate replicate plants for each line, consider spatial variation in field trials, address line by environment interactions, and capture nonadditive effects. Here, we propose a flexible single-stage genomic selection approach that resolves these issues. Our linear mixed model incorporates spatial variation through environment-specific terms, and also randomization-based design terms. It considers marker, and marker by environment interactions using ridge regression best linear unbiased prediction to extend genomic selection to multiple environments. Since the approach uses the raw data from line replicates, the line genetic variation is partitioned into marker and nonmarker residual genetic variation (i.e., additive and nonadditive effects). This results in a more precise estimate of marker genetic effects. Using barley height data from trials of up to 477 cultivars in 2 different years, we demonstrate that our new genomic selection model improves predictions compared to current models. Analyzing single trials revealed improvements in predictive ability of up to 5.7%. For the multiple environment trial (MET) model, combining both year trials improved predictive ability up to 11.4% compared to a single environment analysis. Benefits were significant even when fewer markers were used. Compared to a single-year standard model run with 3490 markers, our partitioned MET model achieved the same predictive ability using between 500 and 1000 markers depending on the trial. Our approach can be used to increase accuracy and confidence in the selection of the best lines for breeding and/or to reduce costs by using fewer markers. Copyright © 2016 Oakey et al.
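The marker component of the model uses ridge regression best linear unbiased prediction (RR-BLUP). The sketch below shows only that step, estimating marker effects by ridge regression on centered data. The spatial, design and marker-by-environment terms of the single-stage model are not reproduced, and the shrinkage parameter is treated as known rather than estimated from variance components.

```python
import numpy as np


def rr_blup(Z, y, lam):
    """Ridge-regression marker effects: beta = (Zc'Zc + lam*I)^-1 Zc'yc.

    Z: (n_lines, n_markers) marker matrix; y: phenotypes; lam: shrinkage,
    which in RR-BLUP equals sigma_e^2 / sigma_marker^2.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    beta = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ yc)
    intercept = y_mean - z_mean @ beta
    return intercept, beta
```

Larger `lam` shrinks all marker effects toward zero. This shrinkage is what keeps the estimates stable when markers outnumber lines.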
Hwang, Bosun; Han, Jonghee; Choi, Jong Min; Park, Kwang Suk
2008-11-01
The purpose of this study was to develop an unobtrusive energy expenditure (EE) measurement system using an infrared (IR) sensor-based activity monitoring system to measure indoor activities and to estimate individual quantitative EE. IR-sensor activation counts were measured with a Bluetooth-based monitoring system and the standard EE was calculated using an established regression equation. Ten male subjects participated in the experiment and three different EE measurement systems (gas analyzer, accelerometer, IR sensor) were used simultaneously in order to determine the regression equation and evaluate the performance. As a standard measurement, oxygen consumption was simultaneously measured by a portable metabolic system (Metamax 3X, Cortex, Germany). A single room experiment was performed to develop a regression model of the standard EE measurement from the proposed IR sensor-based measurement system. In addition, correlation and regression analyses were done to compare the performance of the IR system with that of the Actigraph system. We determined that our proposed IR-based EE measurement system shows a similar correlation to the Actigraph system with the standard measurement system.
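The calibration step described here (regressing standard EE on IR-sensor activation counts, then comparing systems by correlation) is ordinary least squares plus a Pearson correlation. A minimal sketch, assuming one predictor (counts per time window) and one outcome (EE per window); the variable names and units are hypothetical, and the study's actual regression equation is not reproduced.

```python
import math
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit y ≈ a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    ir_counts = [0.0, 4.0, 9.0, 15.0, 22.0]   # hypothetical counts per minute
    ee_kcal = [1.1, 1.6, 2.3, 3.0, 3.9]        # hypothetical EE per minute
    a, b = fit_line(ir_counts, ee_kcal)
    print(f"EE ≈ {a:.3f} + {b:.3f} * counts, r = {pearson_r(ir_counts, ee_kcal):.3f}")
```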
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brink, A.; Kilpinen, P.; Hupa, M.
1996-01-01
Two methods to improve the modeling of NOx emissions in numerical flow simulation of combustion are investigated. The models used are a reduced mechanism for nitrogen chemistry in methane combustion and a new model based on regression analysis of perfectly stirred reactor simulations using detailed comprehensive reaction kinetics. The applicability of the methods to numerical flow simulation of practical furnaces, especially in the near-burner region, is tested against experimental data from a pulverized-coal-fired single-burner furnace. The results are also compared with those obtained using a commonly used description of the overall reaction rate of NO.
Inferring gene regression networks with model trees
2010-01-01
Background: Novel strategies are required to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. These are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, our approach generates a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each gene pair is statistically significant; for this purpose we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces cerevisiae and E. coli. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results with those of a correlation-based method. This experiment shows that REGNET detects true gene associations more accurately than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data and differs from correlation-based methods in that the relationship between one gene and all the others is calculated simultaneously. Model trees are very useful techniques for estimating the numerical values of target genes by linear regression functions.
They are often more precise than linear regression models because they fit different linear regressions to separate regions of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
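The core idea of a model tree, fitting separate linear regressions in separate regions of the input space, can be seen in its smallest form: one predictor and one split. The sketch below is a simplification under that assumption; real model trees (and REGNET, which regresses each gene on all remaining genes) use several predictors, deeper trees and pruning, none of which is shown. `ols` and `best_split` are hypothetical helper names.

```python
from typing import List, Tuple


def ols(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Least-squares line y ≈ a + b*x; returns (a, b, sum of squared errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


def best_split(x: List[float], y: List[float], min_leaf: int = 3):
    """One-split model tree: try every threshold on x, fit a separate line on each
    side, and keep the split with the smallest total SSE.

    Returns (total_sse, threshold, (a_left, b_left), (a_right, b_right)), or None
    if no split leaves at least `min_leaf` points on each side.
    """
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # tied x values cannot be separated by a threshold
        a_l, b_l, sse_l = ols(xs[:i], ys[:i])
        a_r, b_r, sse_r = ols(xs[i:], ys[i:])
        total = sse_l + sse_r
        if best is None or total < best[0]:
            best = (total, (xs[i - 1] + xs[i]) / 2, (a_l, b_l), (a_r, b_r))
    return best
```

A single global line would fit the piecewise data in the test poorly (a "global similarity" view), while the split recovers both local slopes.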
Wang, Ying; Goh, Joshua O; Resnick, Susan M; Davatzikos, Christos
2013-01-01
In this study, we used high-dimensional pattern regression methods based on structural (gray and white matter; GM and WM) and functional (positron emission tomography of regional cerebral blood flow; PET) brain data to identify cross-sectional imaging biomarkers of cognitive performance in cognitively normal older adults from the Baltimore Longitudinal Study of Aging (BLSA). We focused on specific components of executive and memory domains known to decline with aging, including manipulation, semantic retrieval, long-term memory (LTM), and short-term memory (STM). For each imaging modality, brain regions associated with each cognitive domain were generated by adaptive regional clustering. A relevance vector machine was adopted to model the nonlinear continuous relationship between brain regions and cognitive performance, with cross-validation used to select the most informative brain regions (using recursive feature elimination) as imaging biomarkers and to optimize model parameters. Cognitive scores predicted by our regression algorithm from the resulting brain regions correlated well with actual performance. Also, regression models obtained using combined GM, WM, and PET imaging modalities outperformed models based on single modalities. Imaging biomarkers related to memory performance included the orbito-frontal and medial temporal cortical regions, with LTM showing a stronger correlation with the temporal lobe than STM. Brain regions predicting executive performance included orbito-frontal and occipito-temporal areas. The PET modality made a larger contribution to most cognitive domains, except manipulation, which had a larger WM contribution from the superior longitudinal fasciculus and the genu of the corpus callosum. These findings, based on machine-learning methods, demonstrate the importance of combining structural and functional imaging data in understanding complex cognitive mechanisms, and also their potential use as biomarkers that predict cognitive status.
Keithley, Richard B; Wightman, R Mark
2011-06-07
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can be implemented rapidly with present-day computer programming languages. Here, we evaluate several methods for assessing and improving multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool for determining whether the calculated multivariate model is chemically appropriate. Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrates the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations, owing to electrode drift. Taken together, these tools allow the construction of more robust multivariate calibration models and provide the first approach to assessing the predictive ability of a procedure that is inherently impossible to validate because in vivo standards are lacking. PMID:21966586
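Cook's distance measures how much the fitted model changes when one observation is removed, combining the size of its residual with its leverage. The abstract applies it to principal component regression training sets; the sketch below shows the same statistic in the simplest setting, a single-predictor linear regression, which is a simplification of the multivariate case. The data and the 4/n flagging threshold (a common rule of thumb, not taken from the paper) are illustrative assumptions.

```python
from typing import List


def cooks_distance(x: List[float], y: List[float]) -> List[float]:
    """Cook's distance for each point of a simple linear regression y ≈ a + b*x.

    D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2, with p = 2 parameters,
    s^2 = SSE / (n - p), and leverage h_ii = 1/n + (x_i - mean(x))^2 / Sxx.
    """
    n, p = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    out = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - mx) ** 2 / sxx  # leverage of this point
        out.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 30.0]  # last point is a planted outlier
    d = cooks_distance(x, y)
    print([i for i, di in enumerate(d) if di > 4 / len(x)])
```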
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize the predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared with the standard SGLMM.
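What distinguishes a group lasso from an ordinary lasso is that it penalizes the Euclidean norm of whole coefficient groups, so a group is shrunk or zeroed as a unit. The sketch below shows the frequentist building block, the group soft-thresholding (proximal) step; it is not the authors' Bayesian prior or their MCMC sampler, and treating, e.g., the regression coefficients and the spatial random effect as separate groups is only an illustrative assumption.

```python
import math
from typing import List


def group_soft_threshold(beta: List[float], lam: float) -> List[float]:
    """Proximal operator of lam * ||beta||_2 for one coefficient group.

    Shrinks the whole group toward zero and sets it exactly to zero when its
    Euclidean norm does not exceed lam (group-level selection).
    """
    norm = math.sqrt(sum(b * b for b in beta))
    if norm <= lam:
        return [0.0] * len(beta)
    scale = 1.0 - lam / norm
    return [scale * b for b in beta]


def prox_groups(groups: List[List[float]], lam: float) -> List[List[float]]:
    """Apply group shrinkage to each group, scaling lam by sqrt(group size),
    a common convention so that larger groups are not favoured."""
    return [group_soft_threshold(g, lam * math.sqrt(len(g))) for g in groups]
```

The single tuning parameter `lam` controls shrinkage for every group at once, echoing the single tuning parameter described in the abstract.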
Nonparametric Methods in Astronomy: Think, Regress, Observe—Pick Any Three
NASA Astrophysics Data System (ADS)
Steinhardt, Charles L.; Jermyn, Adam S.
2018-02-01
Telescopes are much more expensive than astronomers, so it is essential to minimize required sample sizes by using the most data-efficient statistical methods possible. However, the most commonly used model-independent techniques for finding the relationship between two variables in astronomy are flawed. In the worst case they can lead without warning to subtly yet catastrophically wrong results, and even in the best case they require more data than necessary. Unfortunately, there is no single best technique for nonparametric regression. Instead, we provide a guide for how astronomers can choose the best method for their specific problem and provide a Python library with both wrappers for the most useful existing algorithms and implementations of two new algorithms developed here.
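The abstract does not name the methods it compares. As a concrete reference point for what "nonparametric regression" means, the sketch below implements the Nadaraya-Watson kernel estimator, a long-established technique: the prediction at `x0` is a kernel-weighted average of nearby observations. It is not one of the paper's two new algorithms and does not reflect the API of the authors' library; the Gaussian kernel and the fixed `bandwidth` are assumptions, and in practice the bandwidth is usually chosen by cross-validation.

```python
import math
from typing import Sequence


def nadaraya_watson(x_train: Sequence[float], y_train: Sequence[float],
                    x0: float, bandwidth: float) -> float:
    """Gaussian-kernel Nadaraya-Watson estimate of E[y | x = x0]."""
    weights = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("bandwidth too small: no training point has non-zero weight")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total
```

Too small a bandwidth chases noise and too large a one flattens real structure, one concrete instance of the abstract's point that method choice matters.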
Linden, Ariel; Adams, John L
2011-12-01
Often, when conducting programme evaluations or studying the effects of policy changes, researchers may only have access to aggregated time series data, presented as observations spanning both the pre- and post-intervention periods. The most basic analytic model using these data requires only a single group and models the intervention effect using repeated measurements of the dependent variable. This model controls for regression to the mean and is likely to detect a treatment effect if it is sufficiently large. However, many potential sources of bias still remain. Adding one or more control groups to this model could strengthen causal inference if the groups are comparable on pre-intervention covariates and on the level and trend of the dependent variable. If this condition is not met, the validity of the study findings could be called into question. In this paper we describe a propensity score-based weighted regression model, which overcomes these limitations by weighting the control groups to represent the average outcome that the treatment group would have exhibited in the absence of the intervention. We illustrate this technique by studying cigarette sales in California before and after the passage of Proposition 99 in 1989. While our results were similar to those of the Synthetic Control method, the weighting approach has several advantages: it is technically less complicated, is rooted in regression techniques familiar to most researchers, is easy to implement using any basic statistical software, can accommodate any number of treatment units, and allows greater flexibility in the choice of treatment effect estimators. © 2010 Blackwell Publishing Ltd.
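One standard way to make control units "represent the average outcome that the treatment group would have exhibited" is odds weighting for the average treatment effect on the treated (ATT): treated units get weight 1 and controls get p / (1 - p), where p is the propensity score. The sketch below assumes that weighting scheme, which the abstract does not spell out, and reduces the paper's weighted regression on time series to a simple difference of weighted means; all names are illustrative.

```python
from typing import List, Sequence


def att_weights(pscores: Sequence[float], treated: Sequence[bool]) -> List[float]:
    """Treated units get weight 1; control units get odds weight p / (1 - p)."""
    return [1.0 if t else p / (1.0 - p) for p, t in zip(pscores, treated)]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def att_difference(outcomes: Sequence[float], pscores: Sequence[float],
                   treated: Sequence[bool]) -> float:
    """Treated mean minus odds-weighted control mean (a simple ATT estimate)."""
    w = att_weights(pscores, treated)
    t_y = [y for y, t in zip(outcomes, treated) if t]
    t_w = [wi for wi, t in zip(w, treated) if t]
    c_y = [y for y, t in zip(outcomes, treated) if not t]
    c_w = [wi for wi, t in zip(w, treated) if not t]
    return weighted_mean(t_y, t_w) - weighted_mean(c_y, c_w)
```

Controls that look more like treated units (higher p) receive larger weights, so the weighted control mean approximates the treated group's counterfactual outcome.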
Assessing the effect of different treatments on decomposition rate of dairy manure.
Khalil, Tariq M; Higgins, Stewart S; Ndegwa, Pius M; Frear, Craig S; Stöckle, Claudio O
2016-11-01
Confined animal feeding operations (CAFOs) contribute to greenhouse gas emissions, but the magnitude of these emissions as a function of operation size, infrastructure, and manure management is difficult to assess. Modeling is a viable option for estimating gaseous emissions and nutrient flows from CAFOs. These models use a decomposition rate constant for carbon mineralization. However, this constant is usually determined assuming a homogeneous mix of manure, ignoring the effects of emerging manure treatments. The aim of this study was to measure and compare the decomposition rate constants of dairy manure in single- and three-pool decomposition models, and to develop an empirical model based on the chemical composition of manure for predicting a decomposition rate constant. Decomposition rate constants of manure before and after an anaerobic digester (AD), following coarse fiber separation, and following fine solids removal were determined under anaerobic conditions for single- and three-pool decomposition models. The decomposition rates of treated manure effluents differed significantly from those of untreated manure for both single- and three-pool decomposition models. In the single-pool decomposition model, AD effluent containing only suspended solids had a relatively high decomposition rate of 0.060 d⁻¹, while liquid with coarse fiber and fine solids removed had the lowest rate, 0.013 d⁻¹. In the three-pool decomposition model, the fast and slow decomposition rate constants (0.25 d⁻¹ and 0.016 d⁻¹, respectively) of untreated AD influent were also significantly different from those of treated manure fractions. A regression model to predict the decomposition rate of treated dairy manure fitted the observed data well (R² = 0.83). Copyright © 2016 Elsevier Ltd. All rights reserved.
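A single-pool model treats degradable carbon as decaying by first-order kinetics, C(t)/C0 = exp(-k t), so a rate constant of 0.060 d⁻¹ implies a half-life of ln 2 / 0.060 ≈ 11.6 days. The sketch below shows that model, a least-squares estimate of k from remaining fractions, and a three-pool variant. The three-pool form (fast and slow first-order pools plus a non-decomposing remainder) and the pool fractions are assumptions for illustration; the abstract reports only the fast and slow rate constants.

```python
import math
from typing import Sequence


def single_pool_remaining(k: float, t: float) -> float:
    """Fraction of degradable carbon remaining after t days: C(t)/C0 = exp(-k t)."""
    return math.exp(-k * t)


def half_life(k: float) -> float:
    """Days for a single first-order pool to fall to half its initial size."""
    return math.log(2.0) / k


def three_pool_remaining(t: float, k_fast: float, k_slow: float,
                         f_fast: float, f_slow: float) -> float:
    """Assumed form: fast + slow first-order pools plus an inert remainder."""
    f_inert = 1.0 - f_fast - f_slow
    return f_fast * math.exp(-k_fast * t) + f_slow * math.exp(-k_slow * t) + f_inert


def estimate_k(times: Sequence[float], remaining: Sequence[float]) -> float:
    """Least-squares k from ln(C/C0) = -k t (regression through the origin)."""
    num = sum(t * math.log(c) for t, c in zip(times, remaining))
    den = sum(t * t for t in times)
    return -num / den
```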
Addressing data privacy in matched studies via virtual pooling.
Saha-Chaudhuri, P; Weinberg, C R
2017-09-07
Data confidentiality and shared use of research data are two desirable but sometimes conflicting goals in research with multi-center studies and distributed data. Although a single dataset that includes covariate information on all participants would be ideal for straightforward analysis, confidentiality restrictions forbid its creation. Current approaches such as aggregate data sharing, distributed regression, meta-analysis and score-based methods can have important limitations. We propose a novel application of an existing epidemiologic tool, specimen pooling, to enable confidentiality-preserving analysis of data arising from a matched case-control, multi-center design. Instead of pooling specimens prior to assay, we apply the methodology to virtually pool (aggregate) covariates within nodes. Such virtual pooling retains most of the information used in an analysis with individual data, and because individual participant data are not shared externally, within-node virtual pooling preserves data confidentiality. We show that aggregated covariate levels can be used in a conditional logistic regression model to estimate individual-level odds ratios of interest. The parameter estimates from the standard conditional logistic regression are compared with the estimates from a conditional logistic regression model with aggregated data. The estimates from pooled data are shown to be similar to those without pooling and to have comparable standard errors and confidence interval coverage. Virtual data pooling can be used to maintain the confidentiality of data from multi-center studies and can be particularly useful in research with large-scale distributed data.
NASA Astrophysics Data System (ADS)
Bradshaw, Tyler; Fu, Rau; Bowen, Stephen; Zhu, Jun; Forrest, Lisa; Jeraj, Robert
2015-07-01
Dose painting relies on the ability of functional imaging to identify resistant tumor subvolumes to be targeted for additional boosting. This work assessed the ability of FDG, FLT, and Cu-ATSM PET imaging to predict the locations of residual FDG PET uptake in canine tumors following radiotherapy. Nineteen canines with spontaneous sinonasal tumors underwent PET/CT imaging with the radiotracers FDG, FLT, and Cu-ATSM prior to hypofractionated radiotherapy. Therapy consisted of 10 fractions of 4.2 Gy to the sinonasal cavity with or without an integrated boost of 0.8 Gy to the GTV. Patients had an additional FLT PET/CT scan after fraction 2, a Cu-ATSM PET/CT scan after fraction 3, and follow-up FDG PET/CT scans after radiotherapy. Following image registration, simple and multiple linear and logistic voxel regressions were performed to assess how well pre- and mid-treatment PET imaging predicted post-treatment FDG uptake. R² and pseudo-R² were used to assess goodness of fit. For simple linear regression models, regression coefficients for all pre- and mid-treatment PET images were significantly positive across the population (P < 0.05). However, goodness of fit varied widely among patients: R² ranged from 0.00 to 0.85, with a median of 0.12. Results for logistic regression models were similar. Multiple linear regression models resulted in better fits (median R² = 0.31), but there was still large variability in R² between patients. The R² values from regression models with different predictor variables were highly correlated across patients (R ≈ 0.8), indicating that tumors poorly predicted with one tracer were also poorly predicted by the other tracers. In conclusion, the high inter-patient variability in goodness of fit indicates that PET was able to predict locations of residual tumor in some patients, but not others. This suggests that not all patients would be good candidates for dose painting based on a single biological target.
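The summary statistics in this abstract (a per-patient R² from a voxelwise simple linear regression, then the median across patients) can be computed as below. This is a minimal sketch assuming each patient's registered voxels are already paired as (pre-treatment uptake, post-treatment uptake) lists; the data layout and helper names are hypothetical, and logistic models and pseudo-R² are not covered.

```python
import statistics
from typing import Dict, List, Tuple


def r_squared(x: List[float], y: List[float]) -> float:
    """R² of the simple linear regression y ≈ a + b*x (equals squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def per_patient_r2(voxels: Dict[str, Tuple[List[float], List[float]]]) -> Dict[str, float]:
    """voxels maps patient id -> (pre-treatment uptake per voxel, post-treatment uptake per voxel)."""
    return {pid: r_squared(pre, post) for pid, (pre, post) in voxels.items()}


def median_r2(voxels: Dict[str, Tuple[List[float], List[float]]]) -> float:
    """Median of the per-patient R² values, the population summary used in the abstract."""
    return statistics.median(per_patient_r2(voxels).values())
```

Reporting the range and median of per-patient R², rather than one pooled R², is what exposes the inter-patient variability the authors emphasize.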
Reulen, Holger; Kneib, Thomas
2016-04-01
One important goal in multi-state modelling is to explore information about conditional transition-type-specific hazard rate functions by estimating the effects of explanatory variables. This may be done using single transition-type-specific models if these covariate effects are assumed to differ across transition types. To investigate whether this assumption holds, or whether an effect is equal across several transition types (a cross-transition-type effect), a combined model has to be applied, for instance using a stratified partial likelihood formulation. Prior knowledge about the underlying covariate effect mechanisms is often sparse, especially about whether transition-type-specific or cross-transition-type effects are ineffective. As a consequence, data-driven variable selection is an important task: a large number of estimable effects must be taken into account if all transition types are modelled jointly. A related but subsequent task is model choice: is an effect satisfactorily estimated assuming linearity, or does the true underlying relationship deviate strongly from linearity? This article introduces component-wise functional gradient descent boosting (boosting, for short) for multi-state models, an approach that performs unsupervised variable selection and model choice simultaneously within a single estimation run. We demonstrate that the features and advantages of boosting introduced and illustrated in classical regression scenarios carry over to multi-state models. As a consequence, boosting provides an effective means of answering questions about the ineffectiveness and non-linearity of single transition-type-specific or cross-transition-type effects.
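Component-wise boosting performs variable selection because each iteration updates only the single base learner that best fits the current residuals; covariates that are never chosen keep a coefficient of exactly zero. The sketch below shows this for the classical regression case the abstract mentions (squared-error loss, simple linear base learners), not the multi-state hazard model; the step length `nu` and the number of steps are illustrative defaults, and early stopping (normally tuned by cross-validation) is omitted.

```python
from typing import List, Tuple


def componentwise_l2_boost(X: List[List[float]], y: List[float],
                           n_steps: int = 100, nu: float = 0.1) -> Tuple[float, List[float]]:
    """Component-wise L2 boosting with simple linear base learners.

    Each step fits a least-squares line (on centred data) for every covariate to
    the current residuals, keeps only the best-fitting covariate, and adds nu times
    its fit. Returns (intercept, coefficients) on the original covariate scale.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    ybar = sum(y) / n
    resid = [v - ybar for v in y]
    coef = [0.0] * p
    for _ in range(n_steps):
        best = None  # (sse, covariate index, slope)
        for j, col in enumerate(cols):
            sxx = sum(c * c for c in col)
            if sxx == 0:
                continue  # constant covariate: nothing to fit
            b = sum(c * r for c, r in zip(col, resid)) / sxx
            sse = sum((r - b * c) ** 2 for c, r in zip(col, resid))
            if best is None or sse < best[0]:
                best = (sse, j, b)
        if best is None:
            break
        _, j, b = best
        coef[j] += nu * b
        resid = [r - nu * b * c for r, c in zip(resid, cols[j])]
    intercept = ybar - sum(cj * mj for cj, mj in zip(coef, means))
    return intercept, coef
```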
Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto
2013-09-01
In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten-month classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated by random regression analyses using Bayesian inference, applying an animal model via Gibbs sampling. Contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, and the mean lactation curve of the population, which was modeled by fourth-order orthogonal Legendre polynomials. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment effects, respectively, and the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (0.21 to 0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that selection for any test-day trait will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy of estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) than with single-trait random regression. This difference may be due to the greater amount of information available per animal. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
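In random regression models, each animal's lactation curve is expressed as a combination of Legendre polynomials evaluated at days in milk rescaled to [-1, 1]. The sketch below computes those covariates with Bonnet's recurrence. The days-in-milk range (5 to 305) is a hypothetical choice, and the sqrt((2n+1)/2) scaling is a normalisation often used in animal breeding; whether the authors used it is not stated in the abstract.

```python
import math
from typing import List


def standardize(t: float, t_min: float, t_max: float) -> float:
    """Map days in milk from [t_min, t_max] onto [-1, 1], the Legendre domain."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre(x: float, order: int) -> List[float]:
    """P_0(x) .. P_order(x) via Bonnet's recurrence (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    p = [1.0, x]
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: order + 1]


def normalized_legendre(x: float, order: int) -> List[float]:
    """Legendre values scaled by sqrt((2n+1)/2), a normalisation common in random regression."""
    return [math.sqrt((2 * n + 1) / 2) * pn for n, pn in enumerate(legendre(x, order))]


if __name__ == "__main__":
    # Hypothetical lactation from day 5 to day 305; fourth-order covariates at day 150.
    x = standardize(150, 5, 305)
    print(normalized_legendre(x, 4))
```

A fourth-order fit uses five coefficients per curve (orders 0 to 4), which is why higher orders for the permanent environment effect add parameters quickly.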
Bayesian logistic regression in detection of gene-steroid interaction for cancer at PDLIM5 locus.
Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun
2016-06-01
The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid use and the PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. A multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4, was used to detect gene-steroid interactions influencing cancer. Single-marker analysis using PLINK identified 12 SNPs associated with cancer (P < 0.05); in particular, SNP rs6532496 showed the strongest association with cancer (P = 6.84 × 10⁻³), and the next best signal was rs951613 (P = 7.46 × 10⁻³). Classic logistic regression in PROC GENMOD showed strong gene-steroid interaction effects for both rs6532496 (OR = 2.18, 95% CI = 1.31-3.63, P = 2.9 × 10⁻³) and rs951613 (OR = 2.07, 95% CI = 1.24-3.45, P = 5.43 × 10⁻³). Bayesian logistic regression showed stronger interaction effects (OR = 2.26, 95% CI = 1.2-3.38 for rs6532496 and OR = 2.14, 95% CI = 1.14-3.2 for rs951613). All 12 SNPs associated with cancer showed significant gene-steroid interaction effects (P < 0.05), whereas 13 SNPs showed gene-steroid interaction effects without a main effect on cancer. SNP rs4634230 showed the strongest gene-steroid interaction effect (OR = 2.49, 95% CI = 1.5-4.13, P = 4.0 × 10⁻⁴, from classic logistic regression; OR = 2.59, 95% CI = 1.4-3.97, from Bayesian logistic regression). This study provides evidence of common genetic variants within the PDLIM5 gene and of interactions between PDLIM5 gene polymorphisms and steroid use influencing cancer.
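In a logistic model with an interaction term, logit P(cancer) = b0 + b_g·g + b_s·s + b_gs·g·s, the interaction odds ratio is exp(b_gs), and a Wald 95% CI is exp(b_gs ± 1.96·SE). The sketch below shows those conversions and the reverse step of recovering the log-scale SE from a reported CI; for the classic estimate for rs6532496 (OR 2.18, CI 1.31-3.63) this gives SE ≈ 0.26. The coding of `g` (e.g. allele count) and `s` (steroid use, 0/1) is an assumption, and Bayesian credible intervals are not computed this way.

```python
import math
from typing import Tuple

Z95 = 1.959963984540054  # standard normal 97.5% quantile


def or_with_ci(beta: float, se: float, z: float = Z95) -> Tuple[float, float, float]:
    """Odds ratio exp(beta) with Wald CI exp(beta ± z*se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = Z95) -> float:
    """Back out the log-scale standard error from a reported Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def interaction_logit(g: float, s: float, b0: float, b_g: float,
                      b_s: float, b_gs: float) -> float:
    """Linear predictor b0 + b_g*g + b_s*s + b_gs*g*s; exp(b_gs) is the interaction OR."""
    return b0 + b_g * g + b_s * s + b_gs * g * s
```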
Family structure and the treatment of childhood asthma.
Chen, Alex Y; Escarce, José J
2008-02-01
Family structure is known to influence children's behavioral, educational, and cognitive outcomes, and recent studies suggest that family structure affects children's access to health care as well. However, no study has addressed whether family structure is associated with the care children receive for particular conditions or with their physical health outcomes. To assess the effects of family structure on the treatment and outcomes of children with asthma. Our data sources were the 1996-2003 Medical Expenditure Panel Survey (MEPS) and the 2003 National Survey of Children's Health (NSCH). The study samples consisted of children 2-17 years of age with asthma who lived in single-mother or 2-parent families. We assessed the effect of number of parents and number of other children in the household on office visits for asthma and use of asthma medications using negative binomial regression, and we assessed the effect of family structure on the severity of asthma symptoms using binary and ordinal logistic regression. Our regression models adjusted for sociodemographic characteristics, parental experience in child-rearing and in caring for an asthmatic child and, when appropriate, measures of children's health status. Asthmatic children in single-mother families had fewer office visits for asthma and filled fewer prescriptions for controller medications than children with 2 parents. In addition, children living in families with 3 or more other children had fewer office visits and filled fewer prescriptions for reliever and controller medications than children living with no other children. Children from single-mother families had more health difficulties from asthma than children with 2 parents, and children living with 2 or more other children were more likely to have an asthma attack in the past 12 months than children living with no other children. 
For children with asthma, living with a single mother and the presence of additional children in the household are associated with less treatment for asthma and worse asthma outcomes.
Development of a mobbing short scale in the Gutenberg Health Study.
Garthus-Niegel, Susan; Nübling, Matthias; Letzel, Stephan; Hegewald, Janice; Wagner, Mandy; Wild, Philipp S; Blettner, Maria; Zwiener, Isabella; Latza, Ute; Jankowiak, Sylvia; Liebers, Falk; Seidler, Andreas
2016-01-01
Despite its highly detrimental potential, most standard questionnaires assessing psychosocial stress at work do not include mobbing as a risk factor. In the German standard version of the COPSOQ, mobbing is assessed with a single item. In the Gutenberg Health Study, this version was used together with a newly developed short scale based on the Leymann Inventory of Psychological Terror. The purpose of the present study was to evaluate the psychometric properties of these two measures, to compare them and to test their differential impact on relevant outcome parameters. This analysis is based on a population-based sample of 1441 employees participating in the Gutenberg Health Study. Exploratory and confirmatory factor analyses and reliability analyses were used to assess the mobbing scale. To determine their predictive validities, multiple linear regression analyses with six outcome parameters and log-binomial regression models for two of the outcome aspects were run. Factor analyses of the five-item scale confirmed a one-factor solution, and reliability was α = 0.65. Both the single-item and the five-item scales were associated with all six outcome scales. Effect sizes were similar for both mobbing measures. Mobbing is an important risk factor for health-related outcomes. For the purpose of psychosocial risk assessment in the workplace, both the single-item and the five-item constructs were psychometrically appropriate. Associations with outcomes were about equivalent. However, the single item has the advantage of parsimony, whereas the five-item construct depicts several distinct forms of mobbing.
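The reported reliability α = 0.65 is, assuming it is Cronbach's alpha (the usual meaning of α for scale reliability), computed as α = k/(k−1) · (1 − Σ var(item) / var(total score)) for a k-item scale. A minimal sketch using the standard library; the data layout (one list of respondent scores per item) is an assumption.

```python
import statistics
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; items[i][r] is the score of respondent r on item i."""
    k = len(items)
    item_vars = [statistics.variance(item) for item in items]
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent scale score
    return k / (k - 1) * (1 - sum(item_vars) / statistics.variance(totals))
```

Perfectly consistent items give α = 1 and mutually uncorrelated items give α = 0; a single-item measure has no internal-consistency alpha at all, one reason the abstract compares the two instruments by their associations with outcomes.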
Rexrode, Kathryn M; Ridker, Paul M; Hegener, Hillary H; Buring, Julie E; Manson, JoAnn E; Zee, Robert Y L
2008-05-01
Androgen receptors (AR) are expressed in endothelial cells and vascular smooth-muscle cells. Some studies suggest an association between AR gene variation and risk of cardiovascular disease (CVD) in men; however, the relationship has not been examined in women. Six haplotype block-tagging single nucleotide polymorphisms (rs962458, rs6152, rs1204038, rs2361634, rs1337080, rs1337082), as well as the cytosine, adenine, guanine (CAG) microsatellite in exon 1 of the AR gene, were evaluated among 300 white postmenopausal women who developed CVD (158 myocardial infarctions and 142 ischemic strokes) and an equal number of matched controls within the Women's Health Study. Genotype distributions were similar between cases and controls, and genotypes were not significantly related to risk of CVD, myocardial infarction or ischemic stroke in conditional logistic regression models. Seven common haplotypes were observed, but their distributions did not differ between cases and controls, nor were significant associations observed in logistic regression analysis. The median CAG repeat length was 21. In conditional logistic regression, there was no association between the number of alleles with CAG repeat length ≥21 (or ≥22) and risk of CVD, myocardial infarction or ischemic stroke. No association between AR genetic variation, as measured by haplotype-tagging single nucleotide polymorphisms and CAG repeat number, and risk of CVD was observed in women.
NASA Astrophysics Data System (ADS)
Pandremmenou, K.; Tziortziotis, N.; Paluri, S.; Zhang, W.; Blekas, K.; Kondi, L. P.; Kumar, S.
2015-03-01
We propose the use of the Least Absolute Shrinkage and Selection Operator (LASSO) regression method to predict the Cumulative Mean Squared Error (CMSE) incurred by the loss of individual slices in video transmission. We extract a number of quality-relevant features from the H.264/AVC video sequences, which are given as input to the LASSO. This method not only keeps the subset of features that have the strongest effect on video quality but also produces accurate CMSE predictions. In particular, we study LASSO regression in two different architectures: Global LASSO (G.LASSO) and Local LASSO (L.LASSO). In G.LASSO, a single regression model is trained for all slice types together, while in L.LASSO, motivated by the fact that the values of some features depend closely on the slice type considered, each slice type has its own regression model, in an effort to improve LASSO's prediction capability. Based on the predicted CMSE values, we group the video slices into four priority classes. Additionally, we consider a video transmission scenario over a noisy channel, where Unequal Error Protection (UEP) is applied to all prioritized slices. The results demonstrate the efficiency of LASSO in estimating CMSE with high accuracy, using only a few features.
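The abstract does not specify how the LASSO is solved. As an illustration only, cyclic coordinate descent with soft-thresholding is one common way to minimize the LASSO objective (1/2n)·||y − Xβ||² + λ·||β||₁. The feature matrix below is a hypothetical stand-in for the H.264/AVC slice features and is assumed to be centered, so no intercept is fitted.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the LASSO proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows (n x p); X and y are assumed centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero feature: leave its coefficient at 0
            # Correlation of feature j with the partial residual that excludes feature j
            rho = sum(row[j] * (r + row[j] * beta[j]) for row, r in zip(X, resid)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient
                resid = [r - row[j] * delta for row, r in zip(X, resid)]
                beta[j] = new_bj
    return beta


# Hypothetical centered features: a strong one and a weak one
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3.1, -2.9, 2.9, -3.1]  # = 3*x1 + 0.1*x2
print(lasso_coordinate_descent(X, y, lam=0.5))
```

The weak feature's coefficient is set exactly to zero, which is the feature-selection behaviour the abstract relies on when it reports accurate CMSE predictions from only a few features.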
Photonic single nonlinear-delay dynamical node for information processing
NASA Astrophysics Data System (ADS)
Ortín, Silvia; San-Martín, Daniel; Pesquera, Luis; Gutiérrez, José Manuel
2012-06-01
An electro-optical system with a delay loop based on semiconductor lasers is investigated for information processing by performing numerical simulations. This system can replace a complex network of many nonlinear elements for the implementation of Reservoir Computing. We show that a single nonlinear-delay dynamical system has the basic properties to perform as reservoir: short-term memory and separation property. The computing performance of this system is evaluated for two prediction tasks: Lorenz chaotic time series and nonlinear auto-regressive moving average (NARMA) model. We sweep the parameters of the system to find the best performance. The results achieved for the Lorenz and the NARMA-10 tasks are comparable to those obtained by other machine learning methods.
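The NARMA-10 benchmark mentioned above is usually generated with the recurrence y(t+1) = 0.3·y(t) + 0.05·y(t)·Σ_{i=0..9} y(t−i) + 1.5·u(t−9)·u(t) + 0.1, with inputs u(t) drawn uniformly from [0, 0.5]. The sketch below assumes that standard form (the abstract does not state its parameters) and only produces the input and target series; the delay-based laser reservoir itself is not modelled.

```python
import random


def narma10(length, seed=0):
    """Generate an input series u and a NARMA-10 target series y (standard form, assumed)."""
    rng = random.Random(seed)
    u = [rng.uniform(0.0, 0.5) for _ in range(length)]
    y = [0.0] * length
    for t in range(9, length - 1):
        window_sum = sum(y[t - 9:t + 1])  # y(t-9) ... y(t): ten terms
        y[t + 1] = 0.3 * y[t] + 0.05 * y[t] * window_sum + 1.5 * u[t - 9] * u[t] + 0.1
    return u, y


u, y = narma10(1000)
# A reservoir would be trained (e.g. with a linear readout) to map the input history u[..t] to y[t]
```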
Poverty and Material Hardship in Grandparent-Headed Households.
Baker, Lindsey A; Mutchler, Jan E
2010-08-01
Using the 2001 Survey of Income and Program Participation, the current study examines poverty and material hardship among children living in 3-generation (n = 486), skipped-generation (n = 238), single-parent (n = 2,076), and 2-parent (n = 6,061) households. Multinomial and logistic regression models indicated that children living in grandparent-headed households experience elevated risk of health insecurity (as measured by receipt of public insurance and uninsurance), a risk that is disproportionate given the rates of poverty within those households. Children living with single parents did not share this substantial risk. Risk of food and housing insecurity did not differ significantly from that in 2-parent households once characteristics of the household and caregivers were taken into account.
Random regression analyses using B-spline functions to model growth of Nellore cattle.
Boligon, A A; Mercadante, M E Z; Lôbo, R B; Baldi, F; Albuquerque, L G
2012-02-01
The objective of this study was to estimate (co)variance components using random regression on B-spline functions for weight records obtained from birth to adulthood. A total of 82 064 weight records of 8145 females, obtained from the data bank of the Nellore Breeding Program (PMGRN/Nellore Brazil), which started in 1987, were used. The models included direct additive and maternal genetic effects and animal and maternal permanent environmental effects as random. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of age (cubic regression) were considered as a random covariate. The random effects were modeled using B-spline functions considering linear, quadratic and cubic polynomials for each individual segment. Residual variances were grouped in five age classes. Direct additive genetic and animal permanent environmental effects were modeled using up to seven knots (six segments). A single segment with two knots at the end points of the curve was used for the estimation of maternal genetic and maternal permanent environmental effects. A total of 15 models were studied, with the number of parameters ranging from 17 to 81. The models that used B-splines were compared with multi-trait analyses of nine weight traits and with a random regression model that used orthogonal Legendre polynomials. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most appropriate and parsimonious model to describe the covariance structure of the data. Selection for higher weight, such as at young ages, should take into account an accompanying increase in mature cow weight. This is particularly important in most Nellore beef cattle production systems, where the cow herd is maintained under range conditions.
There is limited modification of the growth curve of Nellore cattle with respect to the aim of selecting them for rapid growth at young ages while maintaining constant adult weight.
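The B-spline random regressions described above rest on a basis of piecewise polynomials defined by a knot vector. A minimal sketch of the Cox-de Boor recursion follows. The clamped knot vector on a rescaled age axis is a hypothetical example: it has four distinct knots and three quadratic segments, mirroring the structure of the selected model, but the knot positions are not those used in the study.

```python
def bspline_basis(i, k, t, knots):
    """Value at t of the i-th B-spline basis function of degree k (Cox-de Boor recursion)."""
    if k == 0:
        # Half-open intervals, with the right end point assigned to the last non-empty interval
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = 0.0
    denom = knots[i + k] - knots[i]
    if denom > 0:
        left = (t - knots[i]) / denom * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    denom = knots[i + k + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + k + 1] - t) / denom * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


knots = [0, 0, 0, 1, 2, 3, 3, 3]  # hypothetical clamped knots: 3 segments on [0, 3]
degree = 2                        # quadratic segments
n_basis = len(knots) - degree - 1  # 5 basis functions
# One row of the random-regression design matrix for a record at rescaled age 1.5
row = [bspline_basis(i, degree, 1.5, knots) for i in range(n_basis)]
print(row)
```

Each animal's random effect curve is then a weighted sum of these basis functions, and only the few functions that are non-zero at a given age contribute to that record.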
Mumford, Jeanette A.
2017-01-01
Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifacts and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers, including models that use the first level variance or the group level residual magnitude to differentially weight subjects. The most commonly used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression, using a heteroscedasticity and autocorrelation consistent (HAC) estimator that produces robust slope and variance estimates, has been shown to perform well, with better Type I error control, but only for large sample sizes (500–1000 subjects). The Type I error control with smaller sample sizes has not been studied in this model and has not been compared to other modeling approaches that handle outliers, such as FSL’s Flame 1 and FSL’s outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degrees of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL’s Flame 1, Flame 1 with the outlier de-weighting algorithm and Kendall’s Tau. Additionally, subject omission using the Cook’s Distance measure with OLS and nonparametric inference with the OLS statistic are studied.
Pros and cons of these models, as well as general strategies for detecting outliers in data and taking precautions to avoid inflated Type I error rates, are discussed. PMID:28030782
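Cook's distance, one of the outlier-handling strategies compared above, combines each subject's residual with its leverage. Below is a minimal sketch for an OLS group-level model with a single continuous covariate. The data are hypothetical, and the 4/n cut-off used for flagging subjects is a common rule of thumb assumed here, not necessarily the paper's choice.

```python
def ols_cooks_distance(x, y):
    """Fit y = intercept + slope*x by OLS and return (slope, intercept, Cook's distances)."""
    n = len(x)
    p = 2  # number of fitted parameters (intercept and slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    # Leverage (hat-matrix diagonal) for simple linear regression
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    if mse == 0.0:
        return slope, intercept, [0.0] * n  # perfect fit: no influential points
    cooks = [(e * e / (p * mse)) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return slope, intercept, cooks


# Hypothetical group-level data: covariate per subject and a subject-level contrast estimate
covariate = list(range(1, 11))
contrast = [3.1, 4.9, 7.0, 9.1, 10.9, 13.0, 15.1, 16.9, 19.0, 40.0]  # last subject is an outlier
slope, intercept, cooks = ols_cooks_distance(covariate, contrast)
flagged = [i for i, d in enumerate(cooks) if d > 4 / len(cooks)]  # assumed cut-off
print(slope, flagged)
```

Dropping flagged subjects and refitting is the "subject omission" strategy; as the abstract notes, its effect on Type I error has to be checked rather than assumed.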
A kernel regression approach to gene-gene interaction detection for case-control studies.
Larson, Nicholas B; Schaid, Daniel J
2013-11-01
Gene-gene interactions are increasingly being addressed as a potentially important contributor to the variability of complex traits. Consequently, attention has moved beyond single-locus analysis of association to more complex genetic models. Although several single-marker approaches toward interaction analysis have been developed, such methods suffer from very high testing dimensionality and do not take advantage of existing information, notably the definition of genes as functional units. Here, we propose a comprehensive family of gene-level score tests for identifying genetic elements of disease risk, in particular pairwise gene-gene interactions. Using kernel machine methods, we devise score-based variance component tests under a generalized linear mixed model framework. We conducted simulations based upon coalescent genetic models to evaluate the performance of our approach under a variety of disease models. These simulations indicate that our methods generally have higher power than alternative gene-level approaches and are at worst competitive with exhaustive SNP-level (where SNP is single-nucleotide polymorphism) analyses. Furthermore, we observe that simulated epistatic effects resulted in significant marginal testing results for the involved genes regardless of whether or not true main effects were present. We detail the benefits of our methods and discuss potential genome-wide analysis strategies for gene-gene interaction analysis in a case-control study design. © 2013 WILEY PERIODICALS, INC.
Zoellner, Jamie M; Porter, Kathleen J; Chen, Yvonnes; Hedrick, Valisa E; You, Wen; Hickman, Maja; Estabrooks, Paul A
2017-05-01
Guided by the theory of planned behaviour (TPB) and health literacy concepts, SIPsmartER is a six-month multicomponent intervention that is effective at improving sugar-sweetened beverage (SSB) behaviours. Using SIPsmartER data, this study explores prediction of SSB behavioural intention (BI) and behaviour from TPB constructs using: (1) cross-sectional and prospective models and (2) 11 single-item assessments from interactive voice response (IVR) technology. The study used a quasi-experimental design, including pre- and post-outcome data and repeated-measures process data from 155 intervention participants. Measures were validated multi-item TPB measures, single-item TPB measures, and self-reported SSB behaviours. Hypothesised relationships were investigated using correlation and multiple regression models. TPB constructs explained 32% of the variance in BI cross-sectionally and 20% prospectively, and 13-20% of the variance in behaviour cross-sectionally and 6% prospectively. Single-item scale models were significant, yet explained less variance. All IVR models predicting BI (average 21%, range 6-38%) and behaviour (average 30%, range 6-55%) were significant. Findings are interpreted in the context of other cross-sectional, prospective and experimental TPB health and dietary studies. Findings advance experimental application of the TPB, including understanding constructs at outcome and process time points and applying theory in all intervention development, implementation and evaluation phases.
Genomewide predictions from maize single-cross data.
Massman, Jon M; Gordillo, Andres; Lorenzana, Robenzon E; Bernardo, Rex
2013-01-01
Maize (Zea mays L.) breeders evaluate many single-cross hybrids each year in multiple environments. Our objective was to determine the usefulness of genomewide predictions, based on marker effects from maize single-cross data, for identifying the best untested single crosses and the best inbreds within a biparental cross. We considered 479 experimental maize single crosses between 59 Iowa Stiff Stalk Synthetic (BSSS) inbreds and 44 non-BSSS inbreds. The single crosses were evaluated in multilocation experiments from 2001 to 2009, and the BSSS and non-BSSS inbreds had genotypic data for 669 single nucleotide polymorphism (SNP) markers. Single-cross performance was predicted by a previous best linear unbiased prediction (BLUP) approach that utilized marker-based relatedness and information on relatives, and from genomewide marker effects calculated by ridge-regression BLUP (RR-BLUP). With BLUP, the mean prediction accuracy (r(MG)) of single-cross performance was 0.87 for grain yield, 0.90 for grain moisture, 0.69 for stalk lodging, and 0.84 for root lodging. The BLUP and RR-BLUP models did not lead to r(MG) values that differed significantly. We then used the RR-BLUP model, developed from single-cross data, to predict the performance of testcrosses within 14 biparental populations. The r(MG) values within each testcross population were generally low and were often negative. These results were obtained despite the above-average level of linkage disequilibrium, i.e., r² between adjacent markers of 0.35 in the BSSS inbreds and 0.26 in the non-BSSS inbreds. Overall, our results suggested that genomewide marker effects estimated from maize single crosses are not advantageous (compared with BLUP) for predicting single-cross performance and have erratic usefulness for predicting testcross performance within a biparental cross.
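RR-BLUP treats all marker effects as random with a common variance, which makes it equivalent to ridge regression with penalty λ = σ²_e / σ²_β. As a minimal sketch, assuming a hypothetical marker coding and phenotypes that are centered so no intercept or other fixed effects are fitted, the ridge solution β = (XᵀX + λI)⁻¹Xᵀy can be computed directly:

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_marker_effects(X, y, lam):
    """Ridge estimates (X'X + lam*I)^-1 X'y; X rows are individuals, columns are markers."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)


# Hypothetical marker scores (e.g. -1/0/1 coding) and centered phenotypes
X = [[1, -1, 0], [0, 1, -1], [-1, 0, 1], [1, 1, -1], [-1, -1, 1]]
y = [1.2, -0.3, -0.8, 0.9, -1.0]
effects = ridge_marker_effects(X, y, lam=1.0)
# Predicted genetic value of an untested genotype = sum of its marker effects
print(sum(m * e for m, e in zip([1, 0, -1], effects)))
```

Every marker is shrunk by the same amount, which is one reason RR-BLUP and relationship-based BLUP tend to give similar accuracies, as reported above.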
Revisiting crash spatial heterogeneity: A Bayesian spatially varying coefficients approach.
Xu, Pengpeng; Huang, Helai; Dong, Ni; Wong, S C
2017-01-01
This study was performed to investigate the spatially varying relationships between crash frequency and related risk factors. A Bayesian spatially varying coefficients model was introduced as a methodological alternative to simultaneously account for the unstructured and spatially structured heterogeneity of the regression coefficients in predicting crash frequencies. The proposed method was appealing in that the parameters were modeled via a conditional autoregressive prior distribution, which involved a single set of random effects and a spatial correlation parameter whose extreme values correspond to purely unstructured or purely spatially correlated random effects. A case study using a three-year crash dataset from Hillsborough County, Florida, was conducted to illustrate the proposed model. Empirical analysis confirmed the presence of both unstructured and spatially correlated variations in the effects of contributory factors on severe crash occurrences. The findings also suggested that ignoring spatially structured heterogeneity may result in biased parameter estimates and incorrect inferences, while assuming that the regression coefficients are only spatially clustered is probably subject to over-smoothing. Copyright © 2016 Elsevier Ltd. All rights reserved.
Miles, Jeremy N.V.; Kulesza, Magdalena; Ewing, Brett; Shih, Regina A.; Tucker, Joan S.; D’Amico, Elizabeth J.
2015-01-01
Purpose: When researchers find an association between two variables, it is useful to evaluate the role of other constructs in this association. While assessing these mediation effects, it is important to determine whether results are equal for different groups. It is possible that the strength of a mediation effect may differ for males and females, for example; such an effect is known as moderated mediation. Design: Participants were 2532 adolescents from diverse ethnic/racial backgrounds and equally distributed across gender. The goal of this study was to investigate parental respect as a potential mediator of the relationship between gender and delinquency and mental health, and to determine whether observed mediation is moderated by gender. Findings: Parental respect mediated the association between gender and both delinquency and mental health. Specifically, parental respect was a protective factor against delinquency and mental health problems for both females and males. Practical implications: The study demonstrated the process of estimating models in lavaan, using two approaches (i.e. single group regression and multiple group regression model), and including covariates in both models. PMID:26500722
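The abstract estimates its mediation models in lavaan (R). The product-of-coefficients idea behind a simple mediation model can be sketched in plain Python: a is the slope of the mediator on the predictor, b is the mediator's coefficient when the outcome is regressed on both, and a·b is the indirect effect. The data below are hypothetical, lavaan's estimation and standard errors are not reproduced, and moderated mediation would fit these paths separately within each group.

```python
def _mean(v):
    return sum(v) / len(v)


def simple_mediation(x, m, y):
    """Product-of-coefficients mediation by OLS: returns (a, b, direct, indirect, total)."""
    mx, mm, my = _mean(x), _mean(m), _mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx                            # path X -> M
    det = sxx * smm - sxm ** 2               # 2x2 normal equations for Y ~ X + M
    b = (sxx * smy - sxm * sxy) / det        # path M -> Y, adjusting for X
    direct = (smm * sxy - sxm * smy) / det   # c': X -> Y, adjusting for M
    total = sxy / sxx                        # c: X -> Y alone
    return a, b, direct, a * b, total


# Hypothetical data: x = group indicator (0/1), m = mediator score, y = outcome score
x = [0, 0, 0, 0, 1, 1, 1, 1]
m = [0, 1, 2, 3, 1, 2, 3, 4]
y = [2 * mi + xi for mi, xi in zip(m, x)]
print(simple_mediation(x, m, y))
```

With OLS the decomposition is exact: total effect = direct effect + indirect effect.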
NASA Astrophysics Data System (ADS)
Lolli, Simone; Campbell, James R.; Lewis, Jasper R.; Gu, Yu; Welton, Ellsworth J.
2017-06-01
We compare, for the first time, the performance of a simplified atmospheric radiative transfer algorithm package, the Corti-Peter (CP) model, versus the more complex Fu-Liou-Gu (FLG) model, for resolving top-of-the-atmosphere radiative forcing characteristics from single-layer cirrus clouds obtained from the NASA Micro-Pulse Lidar Network database in 2010 and 2011 at Singapore and in Greenbelt, Maryland, USA, in 2012. Specifically, CP simplifies calculation of both clear-sky longwave and shortwave radiation through regression analysis applied to radiative calculations, which contributes significantly to differences between the two. The results of the intercomparison show that differences in annual net top-of-the-atmosphere (TOA) cloud radiative forcing can reach 65 %. This is particularly true when land surface temperatures are warmer than 288 K, where the CP regression analysis becomes less accurate. CP proves useful for first-order estimates of TOA cirrus cloud forcing, but may not be suitable for quantitative accuracy, including the absolute sign of cirrus cloud daytime TOA forcing that can readily oscillate around zero globally.
Vanderick, S; Harris, B L; Pryce, J E; Gengler, N
2009-03-01
In New Zealand, a large proportion of cows are currently crossbreds, mostly Holstein-Friesians (HF) x Jersey (JE). The genetic evaluation system for milk yields assumes the same additive genetic effects for all breeds. The objective was to model different additive effects according to parental breeds to obtain first estimates of correlations among breed-specific effects and to study the usefulness of this type of random regression test-day model. Estimates of (co)variance components for purebred HF and JE cattle in purebred herds were computed by using a single-breed model. This analysis showed differences between the 2 breeds, with a greater variability in the HF breed. (Co)variance components for purebred HF and JE and crossbred HF x JE cattle were then estimated by using a complete multibreed model in which computations of complete across-breed (co)variances were simplified by correlating only eigenvectors for HF and JE random regressions of the same order as obtained from the single-breed analysis. Parameter estimates differed more strongly than expected between the single-breed and multibreed analyses, especially for JE. This could be due to differences between animals and management in purebred and non-purebred herds. In addition, the model used only partially accounted for heterosis. The multibreed analysis showed additive genetic differences between the HF and JE breeds, expressed as genetic correlations of additive effects in both breeds, especially in linear and quadratic Legendre polynomials (respectively, 0.807 and 0.604). The differences were small for overall milk production (0.926). Results showed that permanent environmental lactation curves were highly correlated across breeds; however, intraherd lactation curves were also affected by the breed-environment interaction. This result may indicate the existence of breed-specific competition effects that vary through the different lactation stages.
In conclusion, a multibreed model similar to the one presented could optimally use the environmental and genetic parameters and provide breed-dependent additive breeding values. This model could also be a useful tool to evaluate crossbred dairy cattle populations like those in New Zealand. However, a routine evaluation would still require the development of an improved methodology. It would also be computationally very challenging because of the simultaneous presence of a large number of breeds.
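Random regression test-day models such as the one above typically use Legendre polynomials of standardized days in milk as covariates. A minimal sketch, assuming the common normalization sqrt((2k+1)/2)·P_k(x), the Bonnet recurrence for P_k, and a hypothetical 5-305 day lactation range:

```python
import math


def legendre_covariates(t, t_min, t_max, order):
    """Normalized Legendre covariates phi_0..phi_order for a record at time t (assumed scheme)."""
    # Standardize t to the interval [-1, 1]
    x = -1.0 + 2.0 * (t - t_min) / (t_max - t_min)
    P = [1.0, x]
    # Bonnet recurrence: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
    for k in range(1, order):
        P.append(((2 * k + 1) * x * P[k] - k * P[k - 1]) / (k + 1))
    P = P[:order + 1]
    return [math.sqrt((2 * k + 1) / 2) * P[k] for k in range(order + 1)]


# Covariates for a hypothetical test day at 100 days in milk, quadratic order
print(legendre_covariates(100, 5, 305, order=2))
```

The linear and quadratic terms correspond to the polynomial orders for which the abstract reports the lowest across-breed genetic correlations.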
Wang, Jake; Perry, Curtis J; Meeth, Katrina; Thakral, Durga; Damsky, William; Micevic, Goran; Kaech, Susan; Blenman, Kim; Bosenberg, Marcus
2017-07-01
Human melanomas exhibit relatively high somatic mutation burden compared to other malignancies. These somatic mutations may produce neoantigens that are recognized by the immune system, leading to an antitumor response. By irradiating a parental mouse melanoma cell line carrying three driver mutations with UVB and expanding a single-cell clone, we generated a mutagenized model that exhibits high somatic mutation burden. When inoculated at low cell numbers in immunocompetent C57BL/6J mice, YUMMER1.7 (Yale University Mouse Melanoma Exposed to Radiation) regresses after a brief period of growth. This regression phenotype is dependent on T cells as YUMMER1.7 tumors grow significantly faster in immunodeficient Rag1 -/- mice and C57BL/6J mice depleted of CD4 and CD8 T cells. Interestingly, regression can be overcome by injecting higher cell numbers of YUMMER1.7, which results in tumors that grow without effective rejection. Mice that have previously rejected YUMMER1.7 tumors develop immunity against higher doses of YUMMER1.7 tumor challenge. In addition, escaping YUMMER1.7 tumors are sensitive to anti-CTLA-4 and anti-PD-1 therapy, establishing a new model for the evaluation of immune checkpoint inhibition and antitumor immune responses. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aksu, Z.; Acikel, U.; Kutsal, T.
1999-02-01
Although the biosorption of single metal ions to various kinds of microorganisms has been extensively studied and adsorption isotherms have been developed for the single metal ion situation only, very little attention has been given to the bioremoval and expression of adsorption isotherms of multimetal ion systems. In this study the simultaneous biosorption of copper(II) and chromium(VI) to Chlorella vulgaris from a binary metal mixture was studied and compared with the single metal ion situation in a batch stirred system. The effects of pH and single- and dual-metal ion concentrations on the equilibrium uptakes were investigated. In previous studies the optimum biosorption pH had been determined as 4.0 for copper(II) and as 2.0 for chromium(VI). Multimetal ion biosorption studies were performed at these two pH values. It was observed that the equilibrium uptakes of copper(II) or chromium(VI) ions changed with the biosorption pH and the presence of other metal ions. Adsorption isotherms were developed for both single- and dual-metal ion systems at these two pH values, and expressed by the mono- and multicomponent Langmuir and Freundlich adsorption models. Model parameters were estimated by nonlinear regression. The adsorption equilibrium data fitted the competitive Freundlich model very well in the concentration ranges studied.
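The competitive (multicomponent) Freundlich model is not reproduced here. As a simpler, swapped-in illustration, the single-component Freundlich isotherm q = K_F·C^(1/n) can be fitted by ordinary least squares on log-transformed data. Note that the study itself used nonlinear regression, which weights errors differently, and the concentrations and uptakes below are hypothetical.

```python
import math


def fit_freundlich_loglinear(conc, uptake):
    """Fit q = K_F * C**(1/n) via OLS on ln q = ln K_F + (1/n) ln C; returns (K_F, n)."""
    lx = [math.log(c) for c in conc]
    ly = [math.log(q) for q in uptake]
    m = len(lx)
    mx, my = sum(lx) / m, sum(ly) / m
    slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    intercept = my - slope * mx
    return math.exp(intercept), 1.0 / slope


# Hypothetical equilibrium data: residual concentration C (mg/L) and uptake q (mg/g)
conc = [10, 25, 50, 100, 200]
uptake = [14.8, 19.9, 25.3, 31.6, 40.2]
k_f, n_f = fit_freundlich_loglinear(conc, uptake)
print(k_f, n_f)
```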
Sensitivity of ALOS/PALSAR imagery to forest degradation by fire in northern Amazon
NASA Astrophysics Data System (ADS)
Martins, Flora da Silva Ramos Vieira; dos Santos, João Roberto; Galvão, Lênio Soares; Xaud, Haron Abrahim Magalhães
2016-07-01
We evaluated the sensitivity of the full polarimetric Phased Array type L-band Synthetic Aperture Radar (PALSAR), onboard the Advanced Land Observing Satellite (ALOS), to forest degradation caused by fires in northern Amazon, Brazil. We searched for changes in PALSAR signal and tri-dimensional polarimetric responses for different classes of fire disturbance defined by fire frequency and severity. Since the aboveground biomass (AGB) is affected by fire, multiple regression models to estimate AGB were obtained for the whole set of coherent and incoherent attributes (general model) and for each set separately (specific models). The results showed that the polarimetric L-band PALSAR attributes were sensitive to variations in canopy structure and AGB caused by forest fire. However, except for the unburned and thrice burned classes, no single PALSAR attribute was able to discriminate between the intermediate classes of forest degradation by fire. Both the coherent and incoherent polarimetric attributes were important to explain AGB variations in tropical forests affected by fire. The HV backscattering coefficient, anisotropy, double-bounce component, orientation angle, volume index and HH-VV phase difference were PALSAR attributes selected from multiple regression analysis to estimate AGB. The general regression model, combining phase and power radar metrics, presented better results than specific models using coherent or incoherent attributes. The polarimetric responses indicated the dominance of VV-oriented backscattering in primary forest and lightly burned forests. The HH-oriented backscattering predominated in heavily and frequently burned forests. The results suggested a greater contribution of horizontally arranged constituents such as fallen trunks or branches in areas severely affected by fire.
Miller, Nathan; Prevatt, Frances
2017-10-01
The purpose of this study was to reexamine the latent structure of ADHD and sluggish cognitive tempo (SCT) due to issues with construct validity. Two proposed changes to the construct include viewing hyperactivity and sluggishness (hypoactivity) as a single continuum of activity level, and viewing inattention as a separate dimension from activity level. Data were collected from 1,398 adults using Amazon's MTurk. A new scale measuring activity level was developed, and scores of Inattention were regressed onto scores of Activity Level using curvilinear regression. The Activity Level scale showed acceptable levels of internal consistency, normality, and unimodality. Curvilinear regression indicates that a quadratic (curvilinear) model accurately explains a small but significant portion of the variance in levels of inattention. Hyperactivity and hypoactivity may be viewed as a continuum, rather than separate disorders. Inattention may have a U-shaped relationship with activity level. Linear analyses may be insufficient and inaccurate for studying ADHD.
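A quadratic term is what lets a regression capture a U-shaped relationship such as the one suggested above. The sketch below fits y = b0 + b1·x + b2·x² by solving the 3x3 normal equations with Cramer's rule; the data are hypothetical, and a positive b2 indicates a U shape.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(x, y):
    """OLS coefficients [b0, b1, b2] for y = b0 + b1*x + b2*x**2."""
    s = [sum(xi ** k for xi in x) for k in range(5)]                 # sums of x^0 .. x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]  # sums of y*x^0 .. y*x^2
    A = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    d = det3(A)
    coefs = []
    for j in range(3):
        # Replace column j with the right-hand side (Cramer's rule)
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = t[i]
        coefs.append(det3(Aj) / d)
    return coefs


# Hypothetical activity-level scores (low = sluggish, high = hyperactive) and inattention scores
activity = [-3, -2, -1, 0, 1, 2, 3]
inattention = [7.5, 5.8, 4.9, 4.6, 5.1, 6.0, 7.2]
b0, b1, b2 = fit_quadratic(activity, inattention)
print(b2 > 0)  # True here: inattention rises at both extremes of activity level
```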
NASA Astrophysics Data System (ADS)
Fullard, James H.; Ter Hofstede, Hannah M.; Ratcliffe, John M.; Pollack, Gerald S.; Brigidi, Gian S.; Tinghitella, Robin M.; Zuk, Marlene
2010-01-01
The auditory thresholds of the AN2 interneuron and the behavioural thresholds of the anti-bat flight-steering responses that this cell evokes are less sensitive in female Pacific field crickets that live where bats have never existed (Moorea) compared with individuals subjected to intense levels of bat predation (Australia). In contrast, the sensitivity of the auditory interneuron ON1, which participates in the processing of both social signals and bat calls, and the thresholds for flight orientation to a model of the calling song of male crickets show few differences between the two populations. Genetic analyses confirm that the two populations are significantly distinct, and we conclude that the absence of bats has caused partial regression in the nervous control of a defensive behaviour in this insect. This study represents the first examination of natural evolutionary regression in the neural basis of a behaviour along a selection gradient within a single species.
Computational tools for exact conditional logistic regression.
Corcoran, C; Mehta, C; Patel, N; Senchaudhuri, P
Logistic regression analyses are often challenged by the inability of unconditional likelihood-based approximations to yield consistent, valid estimates and p-values for model parameters. This can be due to sparseness or separability in the data. Conditional logistic regression, though useful in such situations, can also be computationally infeasible when the sample size or number of explanatory covariates is large. We review recent developments that allow efficient approximate conditional inference, including Monte Carlo sampling and saddlepoint approximations. We demonstrate through real examples that these methods enable the analysis of significantly larger and more complex data sets. We find in this investigation that for these moderately large data sets Monte Carlo seems a better alternative, as it provides unbiased estimates of the exact results and can be executed in less CPU time than the single saddlepoint approximation. Moreover, the double saddlepoint approximation, while computationally the easiest to obtain, offers little practical advantage. It produces unreliable results and cannot be computed when a maximum likelihood solution does not exist. Copyright 2001 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Patnaik, Surya N.
2000-01-01
A preliminary aircraft engine design methodology is being developed that utilizes a cascade optimization strategy together with neural network and regression approximation methods. The cascade strategy employs different optimization algorithms in a specified sequence. The neural network and regression methods are used to approximate solutions obtained from the NASA Engine Performance Program (NEPP), which implements engine thermodynamic cycle and performance analysis models. The new methodology is proving to be more robust and computationally efficient than the conventional optimization approach of using a single optimization algorithm with direct reanalysis. The methodology has been demonstrated on a preliminary design problem for a novel subsonic turbofan engine concept that incorporates a wave rotor as a cycle-topping device. Computations of maximum thrust were obtained for a specific design point in the engine mission profile. The results (depicted in the figure) show a significant improvement in the maximum thrust obtained using the new methodology in comparison to benchmark solutions obtained using NEPP in a manual design mode.
A canonical correlation neural network for multicollinearity and functional data.
Gou, Zhenkun; Fyfe, Colin
2004-03-01
We review a recent neural implementation of Canonical Correlation Analysis and show, using ideas suggested by Ridge Regression, how to make the algorithm robust. The network is shown to operate on data sets which exhibit multicollinearity. We develop a second model which performs equally well on multicollinear data and also on general data sets. This model allows us to vary a single parameter so that the network is capable of performing Partial Least Squares regression (at one extreme), Canonical Correlation Analysis (at the other) and every intermediate operation between the two. On multicollinear data, the parameter setting is shown to be important, but on more general data no particular parameter setting is required. Finally, we develop a second penalty term which acts on such data as a smoother, in that the resulting weight vectors are much smoother and more interpretable than the weights without the robustification term. We illustrate our algorithms on both artificial and real data.
Probabilistic forecasting for extreme NO2 pollution episodes.
Aznarte, José L
2017-10-01
In this study, we investigate the suitability of quantile regression for predicting extreme concentrations of NO2. In contrast to the usual point forecasting, where a single value is forecast for each horizon, probabilistic forecasting through quantile regression allows the full probability distribution to be predicted. This in turn allows building models specifically fitted to the tails of this distribution. Using data from the city of Madrid, including NO2 concentrations as well as meteorological measures, we build models that predict extreme NO2 concentrations and outperform point-forecasting alternatives, and we show that the predictions are accurate, reliable and sharp. In addition, we study the relative importance of the independent variables involved and show that the variables important for the median quantile differ from those important for the upper quantiles. Furthermore, we present a method to compute the probability of exceeding thresholds, which is a simple and comprehensible way to present probabilistic forecasts that maximizes their usefulness. Copyright © 2017 Elsevier Ltd. All rights reserved.
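Two ingredients of this approach can be sketched briefly. The first is the pinball (quantile) loss that quantile regression minimizes. The second is one simple way to turn a set of predicted quantiles into a probability of exceeding a threshold. The exceedance step below linearly interpolates the predictive CDF between the forecast quantile levels. This interpolation is an assumption for illustration, not necessarily the method used in the study, and the quantile values and the 200 µg/m³ threshold are hypothetical.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    # Quantile (pinball) loss for quantile level tau in (0, 1).
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

def exceedance_probability(levels, quantiles, threshold):
    """P(Y > threshold) from forecast quantiles, by linear CDF interpolation.

    Thresholds outside the forecast quantile range are clipped to the
    outermost levels (a simplifying assumption).
    """
    levels = np.asarray(levels, dtype=float)
    quantiles = np.sort(np.asarray(quantiles, dtype=float))  # enforce non-crossing
    return 1.0 - np.interp(threshold, quantiles, levels)

levels = [0.05, 0.25, 0.50, 0.75, 0.95]
forecast = [60.0, 95.0, 120.0, 150.0, 210.0]   # hypothetical NO2 quantiles (µg/m³)
p = exceedance_probability(levels, forecast, threshold=200.0)   # 1/12, about 0.083
```

Fitting one model per level `tau` by minimizing `pinball_loss` gives the forecast quantiles. Upper levels such as 0.95 weight under-prediction more heavily, which is what makes them suited to the tail of the distribution.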
Modeling Array Stations in SIG-VISA
NASA Astrophysics Data System (ADS)
Ding, N.; Moore, D.; Russell, S.
2013-12-01
We add support for array stations to SIG-VISA, a system for nuclear monitoring using probabilistic inference on seismic signals. Array stations comprise a large portion of the IMS network, and they can provide increased sensitivity and more accurate directional information compared to single-component stations. Our existing model assumed that signals were independent at each station, which does not hold when many stations are close together, as in an array. The new model removes that assumption by jointly modeling signals across array elements. This is done by extending our existing Gaussian process (GP) regression models, also known as kriging, from a 3-dimensional single-component space of events to a 6-dimensional space of station-event pairs. For each array and each event attribute (including coda decay, coda height, amplitude transfer and travel time), we model the joint distribution across array elements using a Gaussian process that learns the correlation lengthscale across the array. This incorporates information from array stations into the probabilistic inference framework. To evaluate the effectiveness of our model, we perform 'probabilistic beamforming' on new events using our GP model, i.e., we compute the event azimuth having the highest posterior probability under the model, conditioned on the signals at array elements. We compare the results from our probabilistic inference model to the beamforming currently performed by IMS station processing.
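The core building block here is Gaussian process (kriging) regression with a correlation lengthscale. The following is a minimal sketch of GP posterior prediction with a squared-exponential kernel in one input dimension. The kernel choice, the fixed `lengthscale` and `noise` values and the toy data are all assumptions. The sketch does not model the 6-dimensional station-event space, and it does not learn the lengthscale, which in practice is usually done by maximizing the marginal likelihood.

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale, variance=1.0):
    # Squared-exponential covariance between 1-D input arrays a and b.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, lengthscale, noise=1e-4):
    """Posterior mean and variance of a zero-mean, unit-variance GP at x_test."""
    K = sq_exp_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    Ks = sq_exp_kernel(x_test, x_train, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - np.sum(v**2, axis=0)    # prior variance is 1.0
    return mean, var

x = np.linspace(0.0, 5.0, 8)      # toy positions (e.g. of array elements)
y = np.sin(x)                     # toy observed attribute
mean, var = gp_predict(x, y, np.array([2.5]), lengthscale=1.0)
```

A longer lengthscale makes distant inputs more strongly correlated. That is the quantity the array model learns so that nearby array elements share information.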
Gupta, Nidhi; Heiden, Marina; Mathiassen, Svend Erik; Holtermann, Andreas
2016-05-01
We aimed to develop and evaluate statistical models that predict objectively measured occupational time spent sedentary or in physical activity from self-reported information available in large epidemiological studies and surveys. Two hundred and fourteen blue-collar workers responded to a questionnaire containing information about personal and work-related variables available in most large epidemiological studies and surveys. Workers also wore accelerometers for 1-4 days, measuring time spent sedentary and in physical activity, defined as non-sedentary time. Least-squares linear regression models were developed, predicting objectively measured exposures from selected predictors in the questionnaire. A full prediction model based on age, gender, body mass index, job group, self-reported occupational physical activity (OPA), and self-reported occupational sedentary time (OST) explained 63% (adjusted R²) of the variance of both objectively measured time spent sedentary and time spent in physical activity, since these two exposures were complementary. Single-predictor models based only on self-reported information about either OPA or OST explained 21% and 38%, respectively, of the variance of the objectively measured exposures. Internal validation using bootstrapping suggested that the full and single-predictor models would perform almost as well in new datasets as in the one used for modelling. Both full and single-predictor models based on self-reported information typically available in most large epidemiological studies and surveys were able to predict objectively measured occupational time spent sedentary or in physical activity, with explained variances ranging from 21% to 63%.
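Internal validation by bootstrapping is often done with the optimism-corrected bootstrap, which works in three steps. First, refit the model on each bootstrap sample. Second, measure how much better it performs on that sample than on the original data. Third, subtract the average of this optimism from the apparent R². The sketch below assumes that procedure, since the abstract does not specify the bootstrap variant. It also uses ordinary least squares on synthetic data with six predictors, and only the sample size of 214 is taken from the study.

```python
import numpy as np

def ols_fit(X, y):
    Xd = np.column_stack([np.ones(len(X)), X])    # add an intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def r2(X, y, beta):
    Xd = np.column_stack([np.ones(len(X)), X])
    resid = y - Xd @ beta
    return 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2_value, n, p):
    # p = number of predictors, excluding the intercept.
    return 1.0 - (1.0 - r2_value) * (n - 1) / (n - p - 1)

def optimism_corrected_r2(X, y, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    apparent = r2(X, y, ols_fit(X, y))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))     # resample rows with replacement
        b = ols_fit(X[idx], y[idx])
        optimism.append(r2(X[idx], y[idx], b) - r2(X, y, b))
    return apparent, apparent - np.mean(optimism)

rng = np.random.default_rng(42)
n = 214
X = rng.normal(size=(n, 6))                       # six synthetic predictors
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.3, 0.8]) + rng.normal(size=n)
apparent, corrected = optimism_corrected_r2(X, y)
adj = adjusted_r2(apparent, n, X.shape[1])
```

A small gap between `apparent` and `corrected` is what "almost the same performance in new datasets" corresponds to in this procedure.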
Reppas-Chrysovitsinos, Efstathios; Sobek, Anna; MacLeod, Matthew
2016-06-15
Polymeric materials flowing through the technosphere are repositories of organic chemicals throughout their life cycle. Equilibrium partition ratios of organic chemicals between these materials and air (KMA) or water (KMW) are required for models of fate and transport, high-throughput exposure assessment and passive sampling. KMA and KMW have been measured for a growing number of chemical/material combinations, but significant data gaps still exist. We assembled a database of 363 KMA and 910 KMW measurements for 446 individual compounds and nearly 40 individual polymers and biopolymers, collected from 29 studies. We used the EPI Suite and ABSOLV software packages to estimate physicochemical properties of the compounds. We also employed an empirical correlation based on Trouton's rule to adjust the measured KMA and KMW values to a standard reference temperature of 298 K. Then, we used a thermodynamic triangle with Henry's law constant to calculate a complete set of 1273 KMA and KMW values. Using simple linear regression, we developed a suite of single-parameter linear free energy relationship (spLFER) models to estimate KMA from the EPI Suite-estimated octanol-air partition ratio (KOA) and KMW from the EPI Suite-estimated octanol-water partition ratio (KOW). Similarly, using multiple linear regression, we developed a set of polyparameter linear free energy relationship (ppLFER) models to estimate KMA and KMW from ABSOLV-estimated Abraham solvation parameters. We explored the two LFER approaches to investigate (1) their performance in estimating partition ratios, and (2) the uncertainties associated with treating all the different polymers as a single "bulk" polymeric material compartment. The models we have developed are suitable for screening assessments of the tendency of organic chemicals to be emitted from materials, and for use in multimedia models of the fate of organic chemicals in the indoor environment. In screening applications we recommend that KMA and KMW be modeled as 0.06 × KOA and 0.06 × KOW, respectively, with an uncertainty range of a factor of 15.
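The screening recommendation amounts to simple arithmetic. The sketch below applies it and reads "an uncertainty range of a factor of 15" as the interval from the estimate divided by 15 to the estimate multiplied by 15, which is our interpretation. It also shows the thermodynamic-triangle relation KMW = KMA × KAW. Here KAW is the dimensionless air-water partition ratio derived from Henry's law constant. The function names and the input values are hypothetical.

```python
def screening_partition_ratio(k_octanol, factor=15.0, slope=0.06):
    """Screening estimate K_M = 0.06 * K_O with a multiplicative uncertainty range.

    k_octanol is KOA (to estimate KMA) or KOW (to estimate KMW); all ratios
    are dimensionless. Returns (estimate, lower bound, upper bound).
    """
    estimate = slope * k_octanol
    return estimate, estimate / factor, estimate * factor

def kmw_from_triangle(k_ma, k_aw):
    # Thermodynamic triangle: (C_M / C_A) * (C_A / C_W) = C_M / C_W.
    return k_ma * k_aw

k_oa = 1.0e8                                      # hypothetical KOA
k_ma, k_ma_low, k_ma_high = screening_partition_ratio(k_oa)
# k_ma is about 6e6, with a screening range of about 4e5 to 9e7
```

Because the range is multiplicative, it is symmetric on a log scale: about 1.18 log units on either side of the estimate.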
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In the machine learning field, decision tree learners are powerful and easy to interpret. They employ a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The diversity in random forests comes from random sampling of the data and from the restricted set of input variables available for selection at each split. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
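The recursive binary partitioning step can be sketched as follows. Note one substitution: conditional inference trees choose the split variable using permutation-test p-values, whereas this minimal sketch uses the CART-style criterion of the largest reduction in squared error. It stops at a fixed depth or minimum node size, and both values are assumed parameters.

```python
import numpy as np

def best_split(X, y, min_leaf):
    """Return (feature, threshold) minimizing the children's summed squared error."""
    best_j, best_t, best_sse = None, None, np.sum((y - y.mean()) ** 2)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:        # candidate cut points
            left = X[:, j] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            sse = (np.sum((y[left] - y[left].mean()) ** 2)
                   + np.sum((y[~left] - y[~left].mean()) ** 2))
            if sse < best_sse:
                best_j, best_t, best_sse = j, t, sse
    return best_j, best_t

def grow_tree(X, y, depth=0, max_depth=3, min_leaf=5):
    j, t = best_split(X, y, min_leaf) if depth < max_depth else (None, None)
    if j is None:                                 # a stopping criterion was met
        return {"leaf": float(y.mean())}
    left = X[:, j] <= t
    return {"feature": j, "threshold": float(t),
            "left": grow_tree(X[left], y[left], depth + 1, max_depth, min_leaf),
            "right": grow_tree(X[~left], y[~left], depth + 1, max_depth, min_leaf)}

def predict(tree, x):
    # Follow the splits from the root down to a leaf.
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = np.where(X[:, 1] > 0.5, 5.0, 0.0) + 0.1 * rng.normal(size=200)
tree = grow_tree(X, y)            # root split should be on feature 1 near 0.5
```

Re-running `grow_tree` on slightly perturbed data can change the lower splits. That instability is what averaging many trees in a random forest is meant to reduce.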
Development of an empirically based dynamic biomechanical strength model
NASA Technical Reports Server (NTRS)
Pandya, A.; Maida, J.; Aldridge, A.; Hasson, S.; Woolford, B.
1992-01-01
The focus here is on the development of a dynamic strength model for humans. Our model is based on empirical data. The shoulder, elbow, and wrist joints are characterized in terms of maximum isolated torque, position, and velocity in all rotational planes. This information is reduced by a least-squares regression technique into a table of single-variable second-degree polynomial equations determining the torque as a function of position and velocity. The isolated joint torque equations are then used to compute forces resulting from a composite motion, which in this case is a ratchet wrench push and pull operation. We present a comparison of the model's computed (predicted) results with the actual measured values for the composite motion.
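Reducing the measurements to single-variable second-degree polynomials is an ordinary least-squares fit. The sketch fits torque as a quadratic in joint angle with `numpy.polyfit`. The data are synthetic and noise-free so that the fitted coefficients can be checked, and the values are not from the study.

```python
import numpy as np

# Synthetic joint-angle (deg) and peak-torque (N·m) data (hypothetical).
angle = np.linspace(0.0, 120.0, 13)
torque = 20.0 + 0.9 * angle - 0.006 * angle**2

# Least-squares fit of torque = c2*angle^2 + c1*angle + c0.
c2, c1, c0 = np.polyfit(angle, torque, deg=2)
torque_at_60 = np.polyval([c2, c1, c0], 60.0)   # about 52.4 N·m
```

The same call with joint velocity in place of angle gives the velocity polynomial. Each fitted equation is one row in a table of the kind the abstract describes.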
NASA Astrophysics Data System (ADS)
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference for a linear regression model using ordinary least squares (OLS) estimators is severely affected and can produce misleading results. Many approaches have been investigated to overcome this. These include robust methods, which have been reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. To mitigate both problems, this study discusses a combination of ridge regression and robust methods. The superiority of this approach was examined when multicollinearity and multiple outliers occurred simultaneously in multiple linear regression. This study compared the performance of several well-known estimators (M, MM and RIDGE) with robust ridge regression estimators, namely the Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM) and Ridge MM (RMM), in such a situation. The results showed that when multicollinearity and multiple outliers (in both the x- and y-directions) were present simultaneously, RMM and RIDGE performed roughly equally well and were superior to the other estimators. This held regardless of the number of observations, the level of collinearity and the percentage of outliers used. However, when outliers occurred in only one direction (the y-direction), WRMM was the best of the robust ridge regression estimators, producing the smallest variance. In conclusion, robust ridge regression is the best alternative to robust and conventional least squares estimators when multicollinearity and outliers are present simultaneously.
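The two ingredients being combined can be sketched in a few lines. The first is the ridge estimator β = (XᵀX + kI)⁻¹Xᵀy, which stabilizes estimates under multicollinearity. The second is a Huber M-type reweighting loop that downweights large residuals. The combined `robust_ridge` function is an illustrative weighted ridge M-type estimator under assumed settings: ridge constant `k`, Huber constant 1.345, a MAD-based scale and no intercept. It is not the exact WRM, WRMM or RMM estimator evaluated in the study.

```python
import numpy as np

def ridge(X, y, k, w=None):
    """Weighted ridge estimate (X'WX + kI)^-1 X'Wy; X is assumed centered."""
    w = np.ones(len(y)) if w is None else w
    XtW = X.T * w
    return np.linalg.solve(XtW @ X + k * np.eye(X.shape[1]), XtW @ y)

def huber_weights(r, c=1.345):
    # Scale the residuals by a robust (MAD-based) estimate of sigma.
    s = np.median(np.abs(r - np.median(r))) / 0.6745
    u = np.abs(r) / max(s, 1e-12)
    return np.where(u <= c, 1.0, c / u)

def robust_ridge(X, y, k, n_iter=50):
    beta = ridge(X, y, k)
    for _ in range(n_iter):              # iteratively reweighted ridge regression
        beta = ridge(X, y, k, huber_weights(y - X @ beta))
    return beta

rng = np.random.default_rng(3)
z = rng.normal(size=100)
X = np.column_stack([z, z + 0.05 * rng.normal(size=100)])   # near-collinear columns
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=100)
idx = np.where((z > 0.5) & (z < 1.5))[0][:10]
y[idx] += 20.0                                              # y-direction outliers
beta_ridge = ridge(X, y, k=1.0)
beta_robust = robust_ridge(X, y, k=1.0)
```

With near-collinear columns, the individual coefficients are poorly determined, so the comparison focuses on their sum (true value 2). Plain ridge is pulled away from 2 by the outliers, while the reweighted version stays close to it.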
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early control of heart disease can be achieved through efficient disease prediction and diagnosis. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Poisson mixture regression models are analysed and applied here under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary generalized linear Poisson regression model, as shown by its lower Bayesian Information Criterion (BIC) value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart disease prediction overall. It both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise, given the available clusters. It is deduced that heart disease prediction can be done effectively by identifying the major risks componentwise using a Poisson mixture regression model. PMID:27999611
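The model comparison rests on fitting mixtures and comparing their BIC values. Below is a minimal sketch of the EM algorithm for a two-component Poisson mixture, together with its BIC. The sketch is intercept-only, so it omits the regression covariates and concomitant variables of the paper. That simplification, the initial values and the simulated counts are all assumptions.

```python
import numpy as np
from math import lgamma

def poisson_logpmf(y, lam):
    return y * np.log(lam) - lam - np.array([lgamma(v + 1) for v in y])

def em_poisson_mixture(y, lam=(1.0, 5.0), pi=0.5, n_iter=200):
    lam = np.array(lam, dtype=float)
    pis = np.array([pi, 1.0 - pi])
    for _ in range(n_iter):
        # E-step: posterior component-membership probabilities (responsibilities).
        logp = np.column_stack([np.log(pis[k]) + poisson_logpmf(y, lam[k]) for k in range(2)])
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update the mixing proportions and the component means.
        pis = resp.mean(axis=0)
        lam = (resp * y[:, None]).sum(axis=0) / resp.sum(axis=0)
    dens = sum(pis[k] * np.exp(poisson_logpmf(y, lam[k])) for k in range(2))
    return lam, pis, np.sum(np.log(dens))

def bic(loglik, n_params, n):
    # Lower BIC is preferred.
    return -2.0 * loglik + n_params * np.log(n)

rng = np.random.default_rng(7)
y = np.concatenate([rng.poisson(1.0, 300), rng.poisson(8.0, 200)])  # low / high risk
lam, pis, ll2 = em_poisson_mixture(y)
ll1 = np.sum(poisson_logpmf(y, y.mean()))         # single Poisson, MLE rate
bic1, bic2 = bic(ll1, 1, len(y)), bic(ll2, 3, len(y))
```

The mixture uses three parameters (two rates and one mixing proportion) against one for a single Poisson. BIC charges for the extra parameters, yet the mixture still wins here because the counts really come from two groups.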
Valid Statistical Analysis for Logistic Regression with Multiple Sources
NASA Astrophysics Data System (ADS)
Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.
Considerable effort has gone into understanding issues of privacy protection of individual information in single databases. Various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the problem becomes far more complex. A number of privacy issues then arise for the linked individual files that go well beyond those considered for the data within individual sources. In this paper, we propose an approach that performs a full statistical analysis of the combined database without actually combining it. We focus mainly on logistic regression, but the methods and tools described can, in essence, be applied to other statistical models as well.
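One standard way to fit a logistic regression on data held by several sources without pooling the records is to share only aggregates. Each source computes its own gradient and Hessian contributions at the current coefficients, and these are summed across sources in every Newton-Raphson step. The sketch below shows that idea for horizontally partitioned data (each source holds different individuals). It is an illustrative assumption, not necessarily the protocol of the paper, and it omits the secure-summation machinery that would hide each source's contribution.

```python
import numpy as np

def local_stats(X, y, beta):
    """Gradient and Hessian contributions computed privately by one source."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = (X.T * (p * (1.0 - p))) @ X
    return grad, hess

def distributed_logistic(sources, n_features, n_iter=25):
    beta = np.zeros(n_features)
    for _ in range(n_iter):
        # Only summed aggregates leave each source, never individual records.
        stats = [local_stats(X, y, beta) for X, y in sources]
        grad = sum(g for g, _ in stats)
        hess = sum(h for _, h in stats)
        beta = beta + np.linalg.solve(hess, grad)   # Newton-Raphson update
    return beta

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(600), rng.normal(size=(600, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.uniform(size=600) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
sources = [(X[:200], y[:200]), (X[200:450], y[200:450]), (X[450:], y[450:])]
beta_split = distributed_logistic(sources, 3)
beta_pooled = distributed_logistic([(X, y)], 3)
```

Because the log-likelihood is a sum over records, the summed statistics equal the pooled ones. The split fit therefore reproduces the pooled fit up to rounding.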
NASA Astrophysics Data System (ADS)
Ramesh, S. T.; Rameshbabu, N.; Gandhimathi, R.; Nidheesh, P. V.; Srikanth Kumar, M.
2012-09-01
Removal of heavy metals is very important for environmental reasons. This study investigated the sorption of copper (Cu) and zinc (Zn) in single and binary aqueous systems onto laboratory-prepared hydroxyapatite (HA) surfaces. Batch experiments were carried out using synthetic HA at 30 °C. Parameters that influence adsorption, such as contact time, adsorbent dosage and solution pH, were investigated. In the single system, maximum adsorption was found at contact times of 12 and 9 h, HA dosages of 0.4 and 0.7 g/l and pH values of 6 and 8 for Cu and Zn, respectively. Adsorption kinetics data were analyzed using the pseudo-first-order, pseudo-second-order and intraparticle diffusion models. The results indicated that the adsorption kinetic data were best described by the pseudo-second-order model. Langmuir and Freundlich isotherm models were applied to analyze the adsorption data, and the Langmuir isotherm was found to be applicable to this adsorption system, based on its relatively high regression coefficients. The removal capacity of HA was found to be 125 mg Cu/g and 30.3 mg Zn/g in the single system, and 50 mg Cu/g and 15.16 mg Zn/g in the binary system. The results indicated that the HA used in this work is an effective material for removing Cu and Zn from aqueous solutions.
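The pseudo-second-order model, q_t = k q_e² t / (1 + k q_e t), is usually fitted through its linear form t/q_t = 1/(k q_e²) + t/q_e. In that form q_e is the reciprocal of the slope, and k follows from the intercept. The sketch below does this with synthetic, noise-free uptake data, and the parameter values are hypothetical.

```python
import numpy as np

def pseudo_second_order(t, qe, k):
    # Uptake q_t (mg/g) at contact time t (h) under the pseudo-second-order model.
    return k * qe**2 * t / (1.0 + k * qe * t)

def fit_pso_linear(t, q):
    """Fit via t/q = 1/(k qe^2) + t/qe and return (qe, k)."""
    slope, intercept = np.polyfit(t, t / q, deg=1)
    qe = 1.0 / slope
    k = 1.0 / (intercept * qe**2)
    return qe, k

t = np.array([0.5, 1, 2, 4, 6, 9, 12], dtype=float)   # contact times (h)
q = pseudo_second_order(t, qe=120.0, k=0.01)          # synthetic uptake (mg/g)
qe_hat, k_hat = fit_pso_linear(t, q)                   # recovers 120.0 and 0.01
```

With real, noisy data, the R² of this linear fit is the usual basis for statements such as "best described by the pseudo-second-order model".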
Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A
2015-01-01
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time required to carry out a single cycle of tree improvement. This quality is particularly appealing in tree breeding, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) using a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements at ages 3–40 years permitted the testing of GS methods over time. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA) and their marker effects. Moderate levels of PA (0.31–0.55) were observed, sufficient to deliver an improved selection response over TS. Additionally, PA varied substantially over time in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99) and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition imputation yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models yielded equal PA, and both outperformed the generalized ridge regression heteroscedastic effect model for the traits evaluated. PMID:26126540
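The workflow can be sketched in three steps: mean imputation, ridge-regression marker effects (the idea behind rrBLUP) and predictive accuracy. Predictive accuracy is measured as the correlation between predicted and observed values in a validation set. The SNP coding (0/1/2), the ridge parameter `lam`, the 30% missing rate and the simulated phenotypes are all assumptions. k-nearest-neighbour or SVD imputation would replace the `impute_mean` step.

```python
import numpy as np

def impute_mean(G):
    # Replace missing genotype calls (NaN) with the per-SNP mean.
    col_means = np.nanmean(G, axis=0)
    return np.where(np.isnan(G), col_means, G)

def ridge_marker_effects(Z, y, lam):
    # Ridge estimate of marker effects on centered genotypes and phenotypes.
    Zc = Z - Z.mean(axis=0)
    return np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ (y - y.mean()))

rng = np.random.default_rng(11)
n, m = 300, 500
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # SNPs coded 0/1/2
effects = rng.normal(scale=0.1, size=m)
y = G @ effects + rng.normal(scale=1.0, size=n)
G[rng.uniform(size=G.shape) < 0.3] = np.nan           # 30% missing calls
Z = impute_mean(G)

train, valid = np.arange(0, 240), np.arange(240, n)
u = ridge_marker_effects(Z[train], y[train], lam=100.0)
y_hat = (Z[valid] - Z[train].mean(axis=0)) @ u + y[train].mean()
pa = np.corrcoef(y_hat, y[valid])[0, 1]                # predictive accuracy
```

Using training and validation sets measured at different ages would give the temporal PA described above. The abstract reports that accuracy falls as the age gap grows.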
Yu, Yan; Xie, Zhilan; Wang, Jihan; Chen, Chu; Du, Shuli; Chen, Peng; Li, Bin; Jin, Tianbo; Zhao, Heping
2016-12-01
The proportion of alcohol-induced osteonecrosis of the femoral head (ONFH) among all ONFH patients was 30.7%, with males predominating among ONFH patients in mainland China (70.1%). Matrix metalloproteinase 2 (MMP2), a member of the MMP gene family, encodes the enzyme MMP2, which can promote osteoclast migration, attachment, and bone matrix degradation. In this case-control study, we aimed to investigate the association between MMP2 and alcohol-induced ONFH in Chinese males. In total, 299 patients with alcohol-induced ONFH and 396 healthy controls were recruited for a case-control association study. Five single-nucleotide polymorphisms within the MMP2 locus were genotyped and examined for their correlation with the risk of alcohol-induced ONFH and treatment response using the Pearson χ² test and unconditional logistic regression analysis. We identified three risk-associated variants. The allele "T" of rs243849 increased the risk of alcohol-induced ONFH in the allele model, in the log-additive model without adjustment, and in the log-additive model with adjustment for age. Conversely, the genotypes "CC" in rs7201 and "CC" in rs243832 decreased the risk of alcohol-induced ONFH, as revealed by the recessive model. After Bonferroni correction for multiple testing, no significant association was found. Furthermore, haplotype analysis by unconditional logistic regression adjusted for age showed that the "TT" haplotype of MMP2 was more frequent among patients with alcohol-induced ONFH. In conclusion, there may be an association between MMP2 and the risk of alcohol-induced ONFH in North-Chinese males. However, studies in larger populations are needed to confirm this hypothesis. These data may provide a theoretical foundation for future studies.
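For the allele model, a 2×2 table of allele counts in cases and controls gives an odds ratio and a Pearson χ² statistic with 1 degree of freedom. The Bonferroni correction for five SNPs divides the significance level by five. The sketch below computes these quantities. The allele counts are hypothetical and were chosen only to match the sample sizes (299 cases, or 598 alleles; 396 controls, or 792 alleles), not the study's data.

```python
import math

def allele_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio and Pearson chi-square (1 df) for a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    rows = [sum(r) for r in table]
    cols = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    p = math.erfc(math.sqrt(chi2 / 2.0))      # upper tail of chi-square with 1 df
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p

n_snps = 5
alpha_bonferroni = 0.05 / n_snps               # per-SNP threshold of 0.01
or_, chi2, p = allele_test(case_alt=300, case_ref=298, ctrl_alt=340, ctrl_ref=452)
```

A SNP is declared significant after correction only if its p-value falls below `alpha_bonferroni` rather than 0.05. This is how nominal associations can disappear after the Bonferroni adjustment.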
Multicomponent blood lipid analysis by means of near infrared spectroscopy, in geese.
Bazar, George; Eles, Viktoria; Kovacs, Zoltan; Romvari, Robert; Szabo, Andras
2016-08-01
This study provides accurate near infrared (NIR) spectroscopic models for several laboratory-determined clinicochemical parameters of blood serum samples from fattened geese: total lipid (5.57±1.95 g/l), triglyceride (2.59±1.36 mmol/l), total cholesterol (3.81±0.68 mmol/l) and high density lipoprotein (HDL) cholesterol (2.45±0.58 mmol/l). To improve the performance of the multivariate chemometrics, samples deviating significantly from the regression models, which suggested laboratory error, were excluded from the final calibration datasets. Reference data of excluded samples having outlier spectra in principal component analysis (PCA) were not marked as false. Samples deviating from the regression models but having non-outlier spectra in PCA were identified as having false reference constituent values. Based on the NIR selection methods, 5% of the reference measurement data were rated as doubtful. During independent validation, the models reached R² values of 0.864, 0.966, 0.850 and 0.793, and RMSE values of 0.639 g/l, 0.232 mmol/l, 0.210 mmol/l and 0.241 mmol/l for total lipid, triglyceride, total cholesterol and HDL cholesterol, respectively. Classical analytical techniques focus on single constituents and often require chemicals, time-consuming measurements, and experienced technicians. The NIR technique provides a quick, cost-effective, non-hazardous alternative method for analysing several constituents from a single spectrum of each sample. It also offers the possibility of looking at the laboratory reference data critically. Evaluating reference data to identify and exclude falsely analyzed samples can provide warning feedback to the reference laboratory. This is especially useful for analyses where the laboratory methods are not perfectly suited to the material analysed and there is an increased chance of laboratory error. Copyright © 2016 Elsevier B.V. All rights reserved.
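The screening logic uses two rules. A sample whose reference value deviates strongly from the regression model while its spectrum is not an outlier in PCA is treated as a suspected laboratory error. The sketch below expresses those two rules in NumPy. The regression is principal component regression, standing in for whatever calibration method the study used. The cut-offs (`resid_cut`, `dist_cut`), the number of components and the simulated spectra are all assumptions.

```python
import numpy as np

def flag_reference_errors(spectra, reference, n_pc=3, resid_cut=3.0, dist_cut=3.5):
    """Boolean mask of samples suspected of having false reference values."""
    Xc = spectra - spectra.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pc] * s[:n_pc]
    # Spectral outlyingness: standardized distance in PC-score space.
    dist = np.sqrt(np.sum((scores / scores.std(axis=0)) ** 2, axis=1))
    spectral_outlier = dist > dist_cut
    # Principal component regression of the reference values on the scores.
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, reference, rcond=None)
    resid = reference - A @ coef
    model_outlier = np.abs(resid) > resid_cut * resid.std()
    # Deviates from the model, but the spectrum itself looks normal.
    return model_outlier & ~spectral_outlier

rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 3))
latent[0] = 0.0                                  # sample 0 has a typical spectrum
loadings = rng.normal(size=(3, 50))
spectra = latent @ loadings + 0.01 * rng.normal(size=(100, 50))
reference = 2.0 + 1.5 * latent[:, 0] + 0.1 * rng.normal(size=100)
reference[0] += 50.0                             # simulated laboratory error
suspect = flag_reference_errors(spectra, reference)
```

A sample with an unusual spectrum is left unflagged even if it fits badly, because its poor fit may come from the spectrum rather than from the reference value. This mirrors the rule of not marking reference data of samples with outlier spectra as false.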
Brewer, Michael J; Armstrong, J Scott; Parker, Roy D
2013-06-01
The ability to monitor the verde plant bug, Creontiades signatus Distant (Hemiptera: Miridae), and the progression of cotton (Gossypium hirsutum L.) boll responses to its feeding and associated boll rot provided an opportunity for two assessments. The first was whether single in-season measurements had value in evaluating at-harvest damage to bolls. The second was whether using multiple in-season measurements in combination improved on them. One in-season verde plant bug density measurement, three in-season plant injury measurements, and two at-harvest damage measurements were taken in 15 cotton fields in South Texas in 2010. Linear regression selected two measurements as potentially useful indicators of at-harvest damage: verde plant bug density (adjusted r² = 0.68; P = 0.0004) and internal boll injury of the carpel wall (adjusted r² = 0.72; P = 0.004). For the use of multiple measurements, a stepwise multiple regression of the four in-season measurements selected two models as indicators of at-harvest damage. With a 0.15 selection criterion it selected a univariate model (verde plant bug density; adjusted r² = 0.74; P = 0.0002). With a 0.25 selection criterion it selected a bivariate model (verde plant bug density-internal boll injury; adjusted r² = 0.76; P = 0.0007). In a validation using cultivar and water regime treatments experiencing low verde plant bug pressure in 2011 and 2012, the bivariate model performed better than models using verde plant bug density or internal boll injury separately. Overall, verde plant bug damage to cotton bolls exemplified the benefits of using multiple in-season measurements in pest monitoring programs. This is particularly true in the challenging situation where at-harvest damage results from a sequence of plant responses initiated by in-season insect feeding.
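Stepwise selection adds candidate predictors one at a time. The study used p-value entry criteria (0.15 and 0.25). This minimal sketch swaps in a simpler rule: at each step it adds the candidate that most increases adjusted R², and it stops when no candidate improves it. The data cover 15 synthetic "fields" with four in-season measurements, and the variable names are hypothetical.

```python
import numpy as np

def adj_r2(X, y):
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r2 = 1.0 - np.sum((y - Xd @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
    n, p = len(y), X.shape[1]
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def forward_select(X, y, names):
    chosen, best = [], -np.inf
    while len(chosen) < X.shape[1]:
        scores = {j: adj_r2(X[:, chosen + [j]], y)
                  for j in range(X.shape[1]) if j not in chosen}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:
            break                                 # no candidate improves adjusted R^2
        chosen.append(j_best)
        best = scores[j_best]
    return [names[j] for j in chosen], best

rng = np.random.default_rng(4)
names = ["bug_density", "boll_injury", "measure_3", "measure_4"]   # hypothetical
X = rng.normal(size=(15, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.3 * rng.normal(size=15)
selected, best_adj = forward_select(X, y, names)
```

A looser entry rule admits more variables. This parallels how the 0.25 criterion produced a bivariate model where the 0.15 criterion stopped at one variable.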
New Insights into Handling Missing Values in Environmental Epidemiological Studies
Roda, Célina; Nicolis, Ioannis; Momas, Isabelle; Guihenneuc, Chantal
2014-01-01
Missing data are unavoidable in environmental epidemiologic surveys. The aim of this study was to compare methods for handling large amounts of missing values: omission of missing values, single and multiple imputation (through linear regression or partial least squares regression), and a fully Bayesian approach. These methods were applied to the PARIS birth cohort, where indoor domestic pollutant measurements were performed in a random sample of babies' dwellings. A simulation study was conducted to assess the performance of the different approaches with a high proportion of missing values (from 50% to 95%). Different simulation scenarios were carried out, controlling the true value of the association (odds ratio of 1.0, 1.2, and 1.4) and varying the health outcome prevalence. When a large amount of data was missing, omitting the missing data reduced statistical power and inflated standard errors, which affected the significance of the association. Single imputation underestimated the variability and considerably increased the risk of type I error. All approaches were conservative, except the Bayesian joint model. For a common health outcome, the fully Bayesian approach was the most efficient (low root mean square error, reasonable type I error, and high statistical power). Nevertheless, for a less prevalent outcome, the type I error increased and the statistical power decreased. The estimated posterior distribution of the OR is useful for refining the conclusion. No method for handling missing values is best in every situation. However, when large amounts of data are missing and the usual approaches (e.g. single imputation) are not sufficient, jointly modelling the missing-data process and the health association is more efficient. PMID:25226278
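Estimates from multiple imputation are usually pooled with Rubin's rules. The pooled estimate is the mean of the per-imputation estimates. Its total variance adds the average within-imputation variance to the between-imputation variance multiplied by (1 + 1/m). The sketch below uses hypothetical log odds ratios and squared standard errors from m = 5 imputed datasets.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                      # pooled estimate
    w = u.mean()                          # within-imputation variance
    b = q.var(ddof=1)                     # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b           # total variance
    return q_bar, t

log_or = [0.18, 0.22, 0.15, 0.25, 0.20]   # hypothetical log(OR), one per imputation
se2 = [0.010, 0.011, 0.009, 0.012, 0.010]
est, total_var = rubin_pool(log_or, se2)  # 0.20 and 0.01214
odds_ratio = np.exp(est)
```

A single imputation has no between-imputation term, so its variance is only the within part (here about 0.0104 instead of 0.0121). This is one way to see why single imputation underestimates variability and inflates the type I error.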
A hybrid group method of data handling with discrete wavelet transform for GDP forecasting
NASA Astrophysics Data System (ADS)
Isa, Nadira Mohamed; Shabri, Ani
2013-09-01
This study proposes a hybrid model combining the Group Method of Data Handling (GMDH) and the Discrete Wavelet Transform (DWT) for time series forecasting. The objective of this paper is to examine the flexibility of the hybrid GMDH in forecasting Gross Domestic Product (GDP). A real-life GDP time series is used to demonstrate the effectiveness of the forecasting model. The experiment compares the performance of the hybrid model with that of single models: Wavelet-Linear Regression (WR), Artificial Neural Network (ANN), and conventional GMDH. It is shown that the proposed model can provide a promising alternative technique for GDP forecasting.
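The two building blocks of such a hybrid can be sketched separately. The first is a one-level Haar discrete wavelet transform that splits a series into approximation and detail coefficients. The second is a single GMDH "neuron": the quadratic Ivakhnenko polynomial in two inputs, fitted by least squares. The Haar wavelet, the single decomposition level, the lag structure and the toy series are assumptions, since the abstract does not state which wavelet or how many GMDH layers were used.

```python
import numpy as np

def haar_dwt(x):
    # One-level Haar DWT of an even-length series.
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)    # approximation (low-pass)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)    # detail (high-pass)
    return a, d

def haar_idwt(a, d):
    # Inverse transform: rebuild the original series from a and d.
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def gmdh_neuron(xi, xj, y):
    """Fit y ~ a0 + a1 xi + a2 xj + a3 xi xj + a4 xi^2 + a5 xj^2 by least squares."""
    A = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

steps = np.arange(32)
gdp = 100.0 * 1.02**steps + np.sin(steps)             # toy GDP-like series
a, d = haar_dwt(gdp)
coef, fitted = gmdh_neuron(a[:-2], a[1:-1], a[2:])    # lag-2 model on the approximation
```

A full GMDH network would fit many such neurons on pairs of inputs. It would keep the best ones by an external validation criterion and stack them into further layers.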
van der Meer, D; Hoekstra, P J; van Donkelaar, M; Bralten, J; Oosterlaan, J; Heslenfeld, D; Faraone, S V; Franke, B; Buitelaar, J K; Hartman, C A
2017-01-01
Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors such as stress exposure. Random forest regression is well suited to exploring this complexity, as it allows many predictors to be analysed simultaneously while taking into account any higher-order interactions among them. Using random forest regression, we predicted ADHD severity, measured by Conners’ Parent Rating Scales, in 686 adolescents and young adults (281 of whom were diagnosed with ADHD). The analysis included 17,374 single-nucleotide polymorphisms (SNPs) across 29 genes previously linked to hypothalamic–pituitary–adrenal (HPA) axis activity, together with information on exposure to 24 individual long-term difficulties or stressful life events. The model explained 12.5% of the variance in ADHD severity. The most important SNP, which also showed the strongest interaction with stress exposure, was located in a region regulating the expression of telomerase reverse transcriptase (TERT). Other high-ranking SNPs were found in or near NPSR1, ESR1, GABRA6, PER3, NR3C2 and DRD4. Chronic stressors were more influential than single, severe life events. Top hits were partly shared with conduct problems. We conclude that random forest regression may be used to investigate how multiple genetic and environmental factors jointly contribute to ADHD. It is able to implicate novel SNPs of interest that interact with stress exposure, and it may explain inconsistent findings in ADHD genetics. This exploratory approach may be best combined with more hypothesis-driven research; top predictors and their interactions with one another should be replicated in independent samples. PMID:28585928
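Random forest regression and the variable-importance ranking used to identify top SNPs can be sketched in plain NumPy. The sketch has four parts: bootstrap samples; depth-limited regression trees that consider a random subset of `mtry` predictors at each split; averaging of the tree predictions; and permutation importance. Permutation importance is the increase in mean squared error when one predictor is shuffled. Tree depth, number of trees, `mtry`, the quartile cut points and the toy SNP-by-stress data are assumptions. For brevity, importance is computed on the training data rather than out-of-bag.

```python
import numpy as np

def grow(X, y, rng, depth, mtry, min_leaf=5):
    if depth == 0 or len(y) < 2 * min_leaf:
        return float(y.mean())                            # leaf value
    best = None
    for j in rng.choice(X.shape[1], size=mtry, replace=False):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):   # a few candidate cuts
            left = X[:, j] <= t
            if min_leaf <= left.sum() <= len(y) - min_leaf:
                sse = y[left].var() * left.sum() + y[~left].var() * (~left).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, left)
    if best is None:
        return float(y.mean())
    _, j, t, left = best
    return (j, t, grow(X[left], y[left], rng, depth - 1, mtry, min_leaf),
            grow(X[~left], y[~left], rng, depth - 1, mtry, min_leaf))

def tree_predict(node, x):
    while isinstance(node, tuple):                        # internal node
        j, t, lo, hi = node
        node = lo if x[j] <= t else hi
    return node

def forest_predict(forest, X):
    return np.array([np.mean([tree_predict(tr, x) for tr in forest]) for x in X])

def fit_forest(X, y, n_trees=30, depth=4, mtry=2, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), len(y))             # bootstrap sample
        forest.append(grow(X[idx], y[idx], rng, depth, mtry))
    return forest

def permutation_importance(forest, X, y, seed=0):
    rng = np.random.default_rng(seed)
    base = np.mean((forest_predict(forest, X) - y) ** 2)
    importance = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])              # break the link to y
        importance.append(np.mean((forest_predict(forest, Xp) - y) ** 2) - base)
    return np.array(importance)

rng = np.random.default_rng(9)
n = 300
snps = rng.binomial(2, 0.4, size=(n, 4)).astype(float)   # four toy SNPs (0/1/2)
stress = rng.uniform(0, 1, size=n)                        # toy stress exposure
X = np.column_stack([snps, stress])
y = 2.0 * snps[:, 0] * stress + 0.3 * rng.normal(size=n) # SNP-by-stress interaction
forest = fit_forest(X, y)
importance = permutation_importance(forest, X, y)
```

In this toy data only SNP 0 and stress matter, and they matter only jointly. The trees capture the interaction without it being specified, which is the property the abstract relies on.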
Kanada, Yoshikiyo; Sakurai, Hiroaki; Sugiura, Yoshito; Arai, Tomoaki; Koyama, Soichiro; Tanabe, Shigeo
2017-11-01
[Purpose] To create a regression formula to estimate the one-repetition maximum (1RM) of the knee extensors, based on maximal isometric muscle strength measured with a hand-held dynamometer and on body composition data. [Subjects and Methods] Measurements were performed in 21 healthy males in their twenties to thirties. Simple regression analysis was performed, with the measured 1RM and the maximal isometric muscle strength as the dependent and independent variables, respectively. Furthermore, multiple regression analysis was performed, with body composition data included as independent variables in addition to the maximal isometric muscle strength. [Results] Simple regression analysis with the maximal isometric muscle strength as the independent variable yielded the following regression formula: 1RM (kg) = 0.714 + 0.783 × maximal isometric muscle strength (kgf). In the multiple regression analysis, only total muscle mass was selected as an additional predictor. [Conclusion] A highly accurate regression formula to estimate 1RM was created based on both the maximal isometric muscle strength and body composition. Using a hand-held dynamometer and a body composition analyzer, it was possible to measure these items in a short time and obtain clinically useful results.
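The reported simple-regression formula can be applied directly. The snippet below evaluates it for a hypothetical maximal isometric strength of 50 kgf. The function name is ours. The formula is probably only valid for people similar to the study sample (21 healthy men in their twenties and thirties).

```python
def estimate_1rm_kg(isometric_kgf):
    # Simple-regression formula reported in the study:
    # 1RM (kg) = 0.714 + 0.783 * maximal isometric strength (kgf).
    return 0.714 + 0.783 * isometric_kgf

one_rm = estimate_1rm_kg(50.0)   # about 39.86 kg
```

The slope of 0.783 means each additional kgf of isometric strength adds a little under 0.8 kg to the estimated 1RM.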
Matsushima, Kazuhide; Peng, Monica; Velasco, Carlos; Schaefer, Eric; Diaz-Arrastia, Ramon; Frankel, Heidi
2012-04-01
Significant glycemic excursions (so-called glucose variability) affect the outcome of critically ill patients in general but have not been well studied in patients with traumatic brain injury (TBI). The purpose of this study was to evaluate the impact of glucose variability on the long-term functional outcome of patients with TBI. A noncomputerized tight glucose control protocol was used in our intensivist-model surgical intensive care unit. The relationship between glucose variability and long-term functional outcome (a median of 6 months after injury), defined by the extended Glasgow Outcome Scale (GOSE), was analyzed using ordinal logistic regression models. Glucose variability was defined by the SD and by the percentage of excursion (POE) from the preset glucose range. A total of 109 patients with TBI under tight glucose control had a long-term GOSE evaluation. In univariable analysis, lower GOSE score was significantly associated with higher mean glucose, higher SD, POE more than 60, POE 80 to 150, and a single episode of glucose less than 60 mg/dL, but not with POE 80 to 110. After adjustment for possible confounding variables in multivariable ordinal logistic regression models, higher SD, POE more than 60, POE 80 to 150, and a single episode of glucose less than 60 mg/dL were significantly associated with lower GOSE score. Glucose variability was significantly associated with poorer long-term functional outcome, as measured by the GOSE score, in patients with TBI. Well-designed protocols to minimize glucose variability may be key to improving long-term functional outcome. Copyright © 2012 Elsevier Inc. All rights reserved.
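Both glucose variability measures are simple to compute from a patient's series of readings. In the sketch below, POE for a range is taken to be the percentage of readings falling outside that range, for example 80 to 150 mg/dL. This definition and the readings are assumptions for illustration, since the abstract does not give the exact formula.

```python
import statistics

def glucose_variability(readings, low, high):
    """SD of readings and percentage of readings outside [low, high] (mg/dL)."""
    sd = statistics.stdev(readings)            # sample standard deviation
    outside = sum(1 for g in readings if g < low or g > high)
    poe = 100.0 * outside / len(readings)
    return sd, poe

readings = [95, 130, 160, 110, 75, 145, 190, 120]   # hypothetical values (mg/dL)
sd, poe_80_150 = glucose_variability(readings, 80, 150)   # POE = 37.5%
```

Three of the eight readings (160, 75 and 190 mg/dL) fall outside 80 to 150 mg/dL, which gives the 37.5% shown.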
Substitute CT generation from a single ultra short time echo MRI sequence: preliminary study
NASA Astrophysics Data System (ADS)
Ghose, Soumya; Dowling, Jason A.; Rai, Robba; Liney, Gary P.
2017-04-01
In MR-guided radiation therapy planning, both MR and CT images of a patient are acquired and co-registered to obtain a tissue-specific HU map. Generating the HU map directly from the MRI would eliminate the CT acquisition and may improve radiation therapy planning. In this preliminary study of substitute CT (sCT) generation, two porcine leg phantoms were scanned using a 3D ultrashort echo time (PETRA) sequence and co-registered to the corresponding CT images to build tissue-specific regression models. The model was created from one co-registered CT-PETRA pair to generate the sCT for the other PETRA image. Expectation-maximization-based clustering was performed on the co-registered PETRA image to estimate class membership probabilities for soft tissue, dense bone and air. A tissue-specific nonlinear regression model was built from one registered CT-PETRA pair to predict the sCT of the second PETRA image in a two-fold cross-validation scheme. A complete substitute CT is generated in 3 min. The mean absolute HU error was 0.3 HU for air, 95 HU for bone, 30 HU for fat and 10 HU for muscle. The mean surface reconstruction error for the bone was 1.3 mm. The PETRA sequence enabled a low mean absolute surface distance for the bone and a low HU error for the other classes. The sCT generated from a single PETRA sequence shows promise for fast sCT generation in MRI-based radiation therapy planning.
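The two-fold, tissue-specific regression scheme can be sketched as below. This is a simplified stand-in, not the authors' model: it assumes hard class labels instead of EM membership probabilities, uses a quadratic polynomial as the nonlinear intensity-to-HU map, and runs on synthetic, noise-free voxels, so the per-class errors come out near zero by construction.

```python
import numpy as np


def fit_tissue_models(intensity, hu, labels, degree=2):
    """Fit one polynomial mapping (PETRA intensity -> HU) per tissue class."""
    return {int(cls): np.polyfit(intensity[labels == cls], hu[labels == cls], degree)
            for cls in np.unique(labels)}


def predict_sct(intensity, labels, models):
    """Predict a substitute-CT value for every voxel from its class model."""
    sct = np.full(intensity.shape, np.nan)
    for cls, coeffs in models.items():
        mask = labels == cls
        sct[mask] = np.polyval(coeffs, intensity[mask])
    return sct


def mae_by_class(pred, truth, labels):
    """Mean absolute HU error per tissue class."""
    return {int(cls): float(np.mean(np.abs(pred[labels == cls] - truth[labels == cls])))
            for cls in np.unique(labels)}


def synthetic_image(rng, n=300):
    """Hypothetical voxels: class 0 = air, 1 = soft tissue, 2 = bone."""
    labels = rng.integers(0, 3, size=n)
    intensity = rng.uniform(0.0, 1.0, size=n)
    offsets = np.array([-1000.0, 40.0, 700.0])
    hu = offsets[labels] + 200.0 * intensity - 50.0 * intensity**2
    return intensity, hu, labels


rng = np.random.default_rng(0)
image_a, image_b = synthetic_image(rng), synthetic_image(rng)

# Two-fold scheme: train on one registered pair, predict the other, then swap.
fold_errors = []
for train, test in [(image_a, image_b), (image_b, image_a)]:
    models = fit_tissue_models(*train)
    pred = predict_sct(test[0], test[2], models)
    fold_errors.append(mae_by_class(pred, test[1], test[2]))
print(fold_errors)
```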
A Nonlinear Model for Gene-Based Gene-Environment Interaction.
Sa, Jian; Liu, Xu; He, Tao; Liu, Guifen; Cui, Yuehua
2016-06-04
A vast amount of literature has confirmed the role of gene-environment (G×E) interaction in the etiology of complex human diseases. Traditional methods predominantly focus on the interaction between a single nucleotide polymorphism (SNP) and an environmental variable. Given that genes are the functional units, it is crucial to understand how gene effects (rather than single SNP effects) are influenced by an environmental variable to affect disease risk. Motivated by the increasing awareness of the power of gene-based association analysis over single-variant-based approaches, in this work we proposed a sparse principal component regression (sPCR) model to understand the gene-based G×E interaction effect on complex disease. We first extracted the sparse principal components for the SNPs in a gene, then modeled the effect of each principal component with a varying-coefficient (VC) model. The model can jointly model the variants in a gene whose effects are nonlinearly influenced by an environmental variable. In addition, the varying-coefficient sPCR (VC-sPCR) model is readily interpretable, since the sparsity of the principal component loadings indicates the relative importance of the corresponding SNPs in each component. We applied our method to a human birth weight dataset from a Thai population. We analyzed 12,005 genes across 22 chromosomes and found one significant interaction effect using the Bonferroni correction and one suggestive interaction. The model performance was further evaluated through simulation studies. Our model provides a systematic approach to evaluating gene-based G×E interaction.
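For 12,005 gene-level tests, the Bonferroni correction divides the family-wise significance level by the number of tests. The sketch below assumes alpha = 0.05 (not stated in the abstract) and uses hypothetical gene names and p-values; the threshold for "suggestive" interactions is not given, so it is not modeled.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 12_005)  # alpha = 0.05 is an assumption
print(f"{threshold:.2e}")  # 4.16e-06

# Hypothetical gene-level interaction p-values
p_values = {"GENE_A": 1.2e-6, "GENE_B": 3.0e-5, "GENE_C": 0.2}
significant = [gene for gene, p in p_values.items() if p < threshold]
print(significant)  # ['GENE_A']
```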
Parametric regression model for survival data: Weibull regression model as an example
2016-01-01
The Weibull regression model is one of the most popular forms of parametric regression model because it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature compared with the semi-parametric proportional hazards model. To familiarize clinical investigators with the Weibull regression model, this article introduces some basic knowledge on the model and then illustrates how to fit it with R software. The SurvRegCensCov package is useful for converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by a categorical variable. The eha package provides an alternative method for fitting the Weibull regression model. The check.dist() function helps to assess the goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function. Alternatively, backward elimination starting from a full model is an efficient way to develop the model. Visualizing the Weibull regression model after model development is worthwhile, as it provides another way to report findings. PMID:28149846
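The conversion from fitted coefficients to HR and ETR can be sketched by hand. This assumes the accelerated-failure-time parametrization used by R's survreg, log(T) = b0 + beta·x + scale·W with W extreme-value distributed, under which the Weibull shape is 1/scale. The coefficient and scale below are hypothetical values, and this is not the SurvRegCensCov code itself.

```python
import math


def weibull_aft_to_hr_etr(beta: float, scale: float):
    """Convert a Weibull AFT coefficient (log-time scale) to HR and ETR.

    Under log(T) = b0 + beta * x + scale * W (W extreme-value),
    the Weibull shape is 1/scale, so HR = exp(-beta / scale) and ETR = exp(beta).
    """
    hazard_ratio = math.exp(-beta / scale)
    event_time_ratio = math.exp(beta)
    return hazard_ratio, event_time_ratio


# Hypothetical fitted values: coefficient 0.5, scale 0.5 (shape 2)
hr, etr = weibull_aft_to_hr_etr(beta=0.5, scale=0.5)
print(f"HR = {hr:.3f}, ETR = {etr:.3f}")  # HR = 0.368, ETR = 1.649
```

A positive AFT coefficient lengthens event times (ETR > 1) and therefore corresponds to a lower hazard (HR < 1).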
Data mining: Potential applications in research on nutrition and health.
Batterham, Marijka; Neale, Elizabeth; Martin, Allison; Tapsell, Linda
2017-02-01
Data mining enables further insights from nutrition-related research, but caution is required. The aim of this analysis was to demonstrate and compare the utility of data mining methods in classifying a categorical outcome derived from a nutrition-related intervention. Baseline data (23 variables, 8 categorical) on participants (n = 295) in an intervention trial were used to classify participants in terms of meeting the criterion of achieving 10 000 steps per day. Results from classification and regression trees (CARTs), random forests, adaptive boosting, logistic regression, support vector machines and neural networks were compared using the area under the curve (AUC) and error assessments. CART produced the best model when considering the AUC (0.703), overall error (18%) and within-class error (28%). Logistic regression also performed reasonably well compared with the other models (AUC 0.675, overall error 23%, within-class error 36%). All the methods gave different rankings of variable importance. CART found that body fat, quality of life using the SF-12 Physical Component Summary (PCS) and the cholesterol:HDL ratio were the most important predictors of meeting the 10 000 steps criterion, while logistic regression showed the SF-12 PCS, glucose levels and level of education to be the most significant predictors (P ≤ 0.01). The differing outcomes suggest that caution is required when relying on a single data mining method, particularly in a dataset with nonlinear relationships and outliers, and when exploring relationships that were not the primary outcomes of the research. © 2017 Dietitians Association of Australia.
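The two comparison metrics can be sketched in plain Python: AUC as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and overall error at a fixed threshold. The scores, labels and the 0.5 threshold are hypothetical; the abstract does not say how the within-class error was defined, so it is not reproduced here.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case outscores a random negative one.

    Ties count as one half (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def overall_error(scores, labels, threshold=0.5):
    """Fraction of cases misclassified when predicting 1 for scores >= threshold."""
    wrong = sum(1 for s, y in zip(scores, labels) if (s >= threshold) != (y == 1))
    return wrong / len(labels)


# Hypothetical predicted probabilities of meeting the 10 000 steps criterion
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels), overall_error(scores, labels))  # 0.75 0.5
```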
NASA Astrophysics Data System (ADS)
Yang, Guang; Ye, Xujiong; Slabaugh, Greg; Keegan, Jennifer; Mohiaddin, Raad; Firmin, David
2016-03-01
In this paper, we propose a novel self-learning based single-image super-resolution (SR) method, which is coupled with dual-tree complex wavelet transform (DTCWT) based denoising to better recover high-resolution (HR) medical images. Unlike previous methods, this self-learning based SR approach enables us to reconstruct HR medical images from a single low-resolution (LR) image without extra training on HR image datasets in advance. The relationships between the given image and its scaled-down versions are modeled using support vector regression with sparse coding and dictionary learning, without explicitly assuming recurrence or self-similarity across image scales. In addition, we perform DTCWT based denoising to initialize the HR images at each scale instead of simple bicubic interpolation. We evaluate our method on a variety of medical images. Both quantitative and qualitative results show that the proposed approach outperforms bicubic interpolation and state-of-the-art single-image SR methods while effectively removing noise.
Laurens, L M L; Wolfrum, E J
2013-12-18
One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of algal composition. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy combined with multivariate linear regression analysis can accurately predict the full composition of algal biomass samples from three strains with varying lipid, protein, and carbohydrate content. We also demonstrate high-quality predictions for an independent validation set. A high-throughput 96-well configuration for spectroscopy gives predictions as good as those of a ring-cup configuration, and thus spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows single and multiple linear regression on the respective wavelengths to be used for predicting biomass lipid content. This is not the case for carbohydrate and protein content, so multivariate statistical modeling approaches remain necessary.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models, the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs, so they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed depending on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrative examples from cancer research.
Liang, Zhaohui; Liu, Jun; Huang, Jimmy X; Zeng, Xing
2018-01-01
The genetic polymorphism of cytochrome P450 (CYP450) is considered one of the main causes of adverse drug reactions (ADRs). To explore latent correlations between ADRs and potentially corresponding single-nucleotide polymorphisms (SNPs) in CYP450, three algorithms based on information theory are used as the main method to predict the possible relations. The study uses a retrospective case-control design to explore the potential relation of ADRs to specific genomic locations and SNPs. Genomic data collected from 53 healthy volunteers are used for the analysis; another group of genomic data, collected from 30 healthy volunteers excluded from the study, is used as the control group. SNPs at five loci (CYP2D6*2, *10, *14 and CYP1A2*1C, *1F) are detected with the Applied Biosystems 3130xl. The raw data are processed with ChromasPro to detect the specific alleles at these loci in each sample. The secondary data are reorganized and processed in R together with the ADRs documented in clinical reports. Three information-theory-based algorithms are implemented for the screening task: JMI, CMIM, and mRMR. If a SNP is selected by more than two algorithms, we conclude with confidence that it is related to the corresponding ADR. The selection results are compared with those of a decision tree + LASSO regression model used as the comparator. In the study group where ADRs occur, 10 SNPs are considered relevant to the occurrence of a specific ADR by the combined information theory model. In comparison, only 5 SNPs are considered relevant to a specific ADR by the decision tree + LASSO regression model. In addition, the new method detects more relevant SNP-ADR pairs that are affected by both SNP and dosage. This implies that the new information-theory-based model is effective in discovering correlations between ADRs and CYP450 SNPs and is helpful in predicting potentially vulnerable genotypes for some ADRs.
The newly proposed information-theory-based model shows superior performance in detecting the relation between SNPs and ADRs compared with the decision tree + LASSO regression model. The new model is more sensitive in detecting ADRs than the old method, while the old method is more reliable. Therefore, the choice of algorithm should depend on pragmatic needs. Copyright© Bentham Science Publishers.
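JMI, CMIM and mRMR all rank candidate features using mutual-information terms. The sketch below shows only two building blocks, a plug-in estimate of the mutual information between a discrete genotype and ADR status, and the vote-counting rule for combining selectors; it does not implement JMI, CMIM or mRMR themselves. Genotype codes, ADR labels and SNP names are hypothetical, and `min_votes=3` mirrors "selected by more than two" of three algorithms.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((n_xy / n) * math.log((n_xy / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), n_xy in joint.items())


def consensus_snps(selections, min_votes):
    """SNPs chosen by at least `min_votes` of the selection algorithms."""
    votes = Counter(snp for selected in selections for snp in set(selected))
    return {snp for snp, count in votes.items() if count >= min_votes}


# Hypothetical genotypes (0/1/2 minor-allele counts) and ADR status
genotype = [0, 0, 1, 1, 2, 2, 0, 1]
adr = [0, 0, 0, 1, 1, 1, 0, 1]
print(round(mutual_information(genotype, adr), 3))

# Hypothetical outputs of three selectors (e.g. JMI, CMIM, mRMR)
picked = [["snp_a", "snp_b"], ["snp_a", "snp_c"], ["snp_a", "snp_b"]]
print(consensus_snps(picked, min_votes=3))  # {'snp_a'}
```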
SPSS and SAS programming for the testing of mediation models.
Dudley, William N; Benuzillo, Jose G; Carrico, Mineh S
2004-01-01
Mediation modeling can explain the nature of the relation among three or more variables. In addition, it can be used to show how a variable mediates the relation between levels of an intervention and an outcome. The Sobel test, developed in 1990, provides a statistical method for determining the influence of a mediator on an intervention or outcome. Although interactive Web-based and stand-alone methods exist for computing the Sobel test, SPSS and SAS programs that automatically run the required regression analyses and computations increase the accessibility of mediation modeling to nursing researchers. The objective was to illustrate the utility of the Sobel test and to make this programming available to the Nursing Research audience in both SAS and SPSS. The history, logic, and technical aspects of mediation testing are introduced. The syntax files sobel.sps and sobel.sas, created to automate the computation of the regression analyses and test statistic, are available from the corresponding author. The reported programming allows users to complete mediation testing with their own data in a single step. A technical manual included with the programming provides instructions on program use and interpretation of the output. Mediation modeling is a useful tool for describing the relation among three or more variables. Programming and manuals for using this model are made available.
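The Sobel statistic divides the indirect effect a·b by its first-order standard error, sqrt(b²·se_a² + a²·se_b²). This is a minimal Python sketch, not the authors' sobel.sps or sobel.sas files. Here a is the intervention coefficient in the mediator model, b is the mediator coefficient in the outcome model, and the estimates are hypothetical.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Sobel z statistic and two-sided normal p-value for the indirect effect a*b.

    a: coefficient of the intervention in the mediator model
    b: coefficient of the mediator in the outcome model (adjusting for the intervention)
    """
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = (a * b) / se_ab
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical estimates from the two regressions
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"z = {z:.2f}, p = {p:.4f}")
```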
Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients
NASA Astrophysics Data System (ADS)
Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.
2017-12-01
The multivariate image analysis (MIA) descriptors used in quantitative structure-activity relationships (QSAR) are direct representations of chemical structures, since they are simply numerical encodings of the pixels forming the 2D chemical images. These descriptors have found great utility in modeling diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that project the data space onto orthogonal components, e.g. partial least squares (PLS), have generally been used. However, the chemical interpretation of PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity, has not been straightforward. This work describes 2D contour maps based on the PLS regression coefficients as a means of assessing the relevance of single MIA predictors to the response variable, thus allowing structural, electronic and physicochemical interpretation of MIA-QSAR models. A sample study demonstrating the utility of the 2D contour maps for designing novel drug-like molecules is performed using a dataset of anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, different schemes for encoding atomic properties in molecules are discussed and evaluated.
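The core idea is that each PLS regression coefficient belongs to one pixel, so reshaping the coefficient vector to the image shape yields a map that can be contoured. The sketch below uses a generic PLS1 (NIPALS) implementation, not the authors' software, and a hypothetical 2 x 3 pixel "image" per molecule with synthetic responses; real MIA-QSAR images are far larger.

```python
import numpy as np


def pls1(X, y, n_components):
    """PLS1 regression via NIPALS; returns (coefficients, intercept) on the original scale."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w                      # component scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - c * t                 # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W = np.column_stack(weights)
    P = np.column_stack(loadings)
    q = np.array(y_loadings)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Hypothetical data: 30 "molecules", each a 2 x 3 pixel image flattened to 6 descriptors
rng = np.random.default_rng(1)
image_shape = (2, 3)
X = rng.normal(size=(30, image_shape[0] * image_shape[1]))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0]) + rng.normal(scale=0.1, size=30)

coef, intercept = pls1(X, y, n_components=3)
coef_map = coef.reshape(image_shape)  # one coefficient per pixel, ready for a contour plot
print(np.round(coef_map, 2))
```

With as many components as descriptors, PLS1 reproduces ordinary least squares, which gives a convenient correctness check.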
Adachi, Daiki; Nishiguchi, Shu; Fukutani, Naoto; Hotta, Takayuki; Tashiro, Yuto; Morino, Saori; Shirooka, Hidehiko; Nozaki, Yuma; Hirata, Hinako; Yamaguchi, Moe; Yorozu, Ayanori; Takahashi, Masaki; Aoyama, Tomoki
2017-05-01
The purpose of this study was to investigate which spatial and temporal parameters of the Timed Up and Go (TUG) test are associated with motor function in elderly individuals. This study included 99 community-dwelling women aged 72.9 ± 6.3 years. Step length, step width, single support time, the variability of these parameters, gait velocity, cadence, reaction time from the starting signal to the first step, and the minimum distance between the foot and a marker placed 3 m in front of the chair were measured using our analysis system. The 10-m walk test, five times sit-to-stand (FTSTS) test, and one-leg standing (OLS) test were used to assess motor function. Stepwise multivariate linear regression analysis was used to determine which TUG test parameters were associated with each motor function test. Finally, we calculated a predictive model for each motor function test using each regression coefficient. In the stepwise linear regression analysis, step length and cadence were significantly associated with the 10-m walk test, FTSTS and OLS test. Reaction time was associated with the FTSTS test, and step width was associated with the OLS test. Each predictive model showed a strong correlation with the 10-m walk test and OLS test (P < 0.01), although these correlations were not significantly higher than that of the TUG test time. We showed which TUG test parameters were associated with each motor function test. Moreover, the TUG test time, regarded as a measure of lower-extremity function and mobility, has strong predictive ability for each motor function test. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Prevalence and Severity of Dementia in Nursing Home Residents.
Helvik, Anne-Sofie; Engedal, Knut; Benth, Jūratė Šaltytė; Selbæk, Geir
2015-01-01
The aim of this study was to compare the presence and severity of dementia in two large cross-sectional samples of nursing home residents from 2004/2005 and 2010/2011. Demographic information as well as data on the type of nursing home unit, length of stay before assessment, physical health, regularly used prescribed drugs and Clinical Dementia Rating scale scores were used in the analyses. Logistic and linear regression models for hierarchical data were estimated. The odds of dementia, and of more severe dementia, were higher in 2010/2011 than in 2004/2005. Regardless of the time of study, married men had more severe dementia than single men, and single women had more severe dementia than single men. The findings may reflect an increased need, between 2004/2005 and 2010/2011, for nursing home beds designed for people with dementia. © 2015 S. Karger AG, Basel.
Mushkudiani, Nino A; Hukkelhoven, Chantal W P M; Hernández, Adrián V; Murray, Gordon D; Choi, Sung C; Maas, Andrew I R; Steyerberg, Ewout W
2008-04-01
To describe the modeling techniques used for early prediction of outcome in traumatic brain injury (TBI) and to identify aspects for potential improvement. We reviewed key methodological aspects of studies published between 1970 and 2005 that proposed a prognostic model for the Glasgow Outcome Scale of TBI based on admission data. We included 31 papers. Twenty-four were single-center studies, and 22 reported on fewer than 500 patients. The median number of initially considered predictors was eight, and on average five of these were selected for the prognostic model, generally including age, Glasgow Coma Score (or only the motor score), and pupillary reactivity. The most common statistical technique was logistic regression with stepwise selection of predictors. Model performance was often quantified by accuracy rate rather than by more appropriate measures such as the area under the receiver-operating characteristic curve. Model validity was addressed in 15 studies, but mostly with a simple split-sample approach, and external validation was performed in only four studies. Although most models agree on the three most important predictors, many were developed on small samples within single centers and hence lack generalizability. Modeling strategies need to be improved and should include external validation.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
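As one concrete case of coefficient interpretation: in linear regression a coefficient is the expected change in the outcome per unit change in the predictor, whereas in logistic regression exp(coefficient) is an odds ratio (and, in ordinal logistic regression under proportional odds, a common cumulative odds ratio). The sketch converts a hypothetical logistic coefficient and standard error into an odds ratio with a Wald 95% confidence interval.

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.959964):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for a binary exposure
or_, lower, upper = odds_ratio_ci(beta=math.log(2.0), se=0.25)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```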
Personalized Modeling for Prediction with Decision-Path Models
Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.
2015-01-01
Deriving predictive models in medicine typically relies on a population approach, in which a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model, called a decision-path model, that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods with that of Classification And Regression Tree (CART), a population decision tree, in predicting seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better than CART on area under the ROC curve (AUC) and Brier skill score. Learning personalized decision-path models is a new approach to predictive modeling that can perform better than a population approach. PMID:26098570
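The Brier skill score compares a model's Brier score with that of a reference forecast. The abstract does not state the reference; this sketch assumes the common choice of always predicting the outcome base rate, and the predictions are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """1 - BS / BS_ref, with the outcome base rate as the reference forecast (an assumption)."""
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("reference Brier score is zero (all outcomes identical)")
    return 1.0 - brier_score(probs, outcomes) / reference


# Hypothetical predictions from a model for four patients
print(round(brier_skill_score([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]), 3))  # 0.9
```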
Using species abundance distribution models and diversity indices for biogeographical analyses
NASA Astrophysics Data System (ADS)
Fattorini, Simone; Rigal, François; Cardoso, Pedro; Borges, Paulo A. V.
2016-01-01
We examine whether Species Abundance Distribution models (SADs) and diversity indices can describe how species colonization status influences species community assembly on oceanic islands. Our hypothesis is that, because of the lack of source-sink dynamics at the archipelago scale, Single Island Endemics (SIEs), i.e. endemic species restricted to only one island, should be represented by few rare species and consequently have abundance patterns that differ from those of more widespread species. To test our hypothesis, we used arthropod data from the Azorean archipelago (North Atlantic). We divided the species into three colonization categories: SIEs, archipelagic endemics (AZEs, present on at least two islands) and native non-endemics (NATs). For each category, we modelled rank-abundance plots using both the geometric series and the Gambin model, a measure of distributional amplitude. We also calculated Shannon entropy and Buzas and Gibson's evenness. We show that the slopes of the regression lines modelling SADs were significantly higher for SIEs, which indicates a relative predominance of a few highly abundant species and a lack of rare species, which also depresses diversity indices. This may be a consequence of two factors: (i) some forest specialist SIEs may be at an advantage over other, less adapted species; (ii) the entire populations of SIEs are by definition concentrated on a single island, without the possibility of inter-island source-sink dynamics; hence all populations must have a minimum number of individuals to survive natural, often unpredictable, fluctuations. These findings are supported by higher values of the α parameter of the Gambin model for SIEs. In contrast, AZEs and NATs had lower regression slopes, lower α but higher diversity indices, resulting from their widespread distribution over several islands.
We conclude that these differences in the SAD models and diversity indices demonstrate that the study of these metrics is useful for biogeographical purposes.
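The diversity metrics and the geometric-series slope can be sketched as follows. Under a geometric series, ln(abundance) falls linearly with rank, so a steeper decline signals dominance by a few species. Reading the paper's "higher slope" as the magnitude of this decline is an assumption, the abundance vectors are hypothetical, and the Gambin model is not implemented here.

```python
import math

import numpy as np


def shannon_entropy(abundances):
    """Shannon entropy H (natural log) of an abundance vector."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def buzas_gibson_evenness(abundances):
    """Buzas and Gibson's evenness: exp(H) / S, where S is species richness."""
    richness = sum(1 for a in abundances if a > 0)
    return math.exp(shannon_entropy(abundances)) / richness


def rank_abundance_slope(abundances):
    """Slope of ln(abundance) against rank; more negative means stronger dominance."""
    ranked = sorted((a for a in abundances if a > 0), reverse=True)
    ranks = np.arange(1, len(ranked) + 1)
    slope, _intercept = np.polyfit(ranks, np.log(ranked), 1)
    return float(slope)


# Hypothetical abundance vectors for two colonization categories
dominated = [120, 40, 15, 5, 2]
even = [30, 28, 25, 22, 20]
for name, counts in [("dominated", dominated), ("even", even)]:
    print(name, round(rank_abundance_slope(counts), 3),
          round(shannon_entropy(counts), 3), round(buzas_gibson_evenness(counts), 3))
```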
Sewage sludge disintegration by combined treatment of alkaline+high pressure homogenization.
Zhang, Yuxuan; Zhang, Panyue; Zhang, Guangming; Ma, Weifang; Wu, Hao; Ma, Boqiang
2012-11-01
Alkaline pretreatment combined with high pressure homogenization (HPH) was applied to promote sewage sludge disintegration. For sewage sludge with a total solid content of 1.82%, the sludge disintegration degree (DD(COD)) with combined treatment was higher than the sum of DD(COD) with alkaline treatment alone and HPH treatment alone. NaOH dosage ≤ 0.04 mol/L, homogenization pressure ≤ 60 MPa and a single homogenization cycle were the suitable conditions for combined sludge treatment. The combined sludge treatment showed a maximum DD(COD) of 59.26%. By regression analysis, the combined sludge disintegration model was established as $\frac{1}{1-DD_{COD}} = 0.713\,C^{0.334}P^{0.234}N^{0.119}$, where C is the NaOH dosage, P the homogenization pressure and N the number of homogenization cycles, showing that the effect of the operating parameters on sludge disintegration followed the order: NaOH dosage > homogenization pressure > number of homogenization cycles. The energy efficiency of combined sludge treatment increased significantly compared with that of single HPH treatment, and high energy efficiency was achieved at low homogenization pressure with a single homogenization cycle. Copyright © 2012 Elsevier Ltd. All rights reserved.
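Reading the fitted model as 1/(1 - DD(COD)) = 0.713·C^0.334·P^0.234·N^0.119 (an interpretation of the flattened formula in the abstract), each exponent acts as an elasticity, which is what ranks NaOH dosage above pressure and cycle number. Because the abstract does not state the units of C, P and N, this sketch compares only unit-free ratios: the factor by which the right-hand side grows when one input doubles.

```python
EXPONENTS = {"NaOH dosage (C)": 0.334,
             "homogenization pressure (P)": 0.234,
             "homogenization cycles (N)": 0.119}


def model_rhs(c, p, n, k=0.713):
    """Right-hand side k * C^0.334 * P^0.234 * N^0.119 of the fitted model (assumed reading)."""
    return k * c**0.334 * p**0.234 * n**0.119


def doubling_effect(exponent):
    """Factor by which the right-hand side grows when one input doubles."""
    return 2.0 ** exponent


for name, exponent in sorted(EXPONENTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: x{doubling_effect(exponent):.3f}")
```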
Shirali, M; Strathe, A B; Mark, T; Nielsen, B; Jensen, J
2017-03-01
A novel Horizontal model is presented for multitrait analysis of longitudinal traits through random regression analysis combined with single recorded traits. Weekly ADFI on test for Danish Duroc, Landrace, and Yorkshire boars were available from the national test station and were collected from 30 to 100 kg BW. Single recorded production traits of ADG from birth to 30 kg BW (ADG30), ADG from 30 to 100 kg BW (ADG100), and lean meat percentage (LMP) were available from breeding herds or the national test station. The Horizontal model combined random regression analysis of feed intake (FI) with single recorded traits of ADG100, LMP, and ADG30. In the Horizontal model, the FI data were horizontally structured with FI on each week as a "trait." The additive genetic and litter effects were modeled to be common across different FI records by reducing the rank of the covariance matrices using second- and first-order Legendre polynomials of age on test, respectively. The fixed effect and random residual variance were estimated for each weekly FI trait. Residual feed intake (RFI) was derived from the conditional distribution of FI given the breeding values of ADG100 and LMP. The heritability of FI varied by week on test in Duroc (0.12 to 0.19), Landrace (0.13 to 0.22), and Yorkshire (0.21 to 0.23). The heritability of RFI was lowest and highest in wk 6 (0.03) and 10 (0.10), respectively, in Duroc and wk 7 (0.04 and 0.02) and 1 (0.09 and 0.20), respectively, in Landrace and Yorkshire. The proportion of FI genetic variance explained by RFI ranged from 20 to 75% in Duroc, from 19 to 75% in Landrace, and from 11 to 91% in Yorkshire. Average daily gain from 30 to 100 kg BW and ADG30 heritabilities were moderate in Duroc (0.24 and 0.22, respectively), Landrace (0.34 and 0.25, respectively), and Yorkshire (0.34 and 0.22, respectively). Lean meat percentage heritability was moderate in Duroc (0.37) and large in Landrace (0.62) and Yorkshire (0.60). 
The genetic correlation of FI with ADG100 increased by week on test followed by a 32% decrease from wk 7 in Duroc and a 7% decrease in dam line breeds. Defining RFI as genetically independent of production traits leads to consistent and easily interpretable breeding values. The genetic parameters of traits in the feed efficiency complex and their dynamics over the test period showed breed differences that could be related to the fatness and growth potential of the breeds. The Horizontal model can be used to simultaneously analyze repeated and single recorded traits through proper modeling of the environmental variances and covariances.
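Random regression on "second- and first-order Legendre polynomials of age on test" means building a covariate matrix from Legendre polynomials evaluated at age rescaled to [-1, 1]. The sketch uses NumPy's `legvander` with the normalization sqrt((2k + 1)/2) that is commonly used in animal-breeding random regression models; the test weeks are hypothetical, and this only builds covariates, not the mixed model itself.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_covariates(age, order):
    """Normalized Legendre polynomial covariates for random regression.

    Ages are rescaled to [-1, 1]; column k holds P_k(x) * sqrt((2k + 1) / 2).
    """
    age = np.asarray(age, dtype=float)
    x = 2.0 * (age - age.min()) / (age.max() - age.min()) - 1.0
    raw = legendre.legvander(x, order)          # columns P_0(x) .. P_order(x)
    norms = np.sqrt((2.0 * np.arange(order + 1) + 1.0) / 2.0)
    return raw * norms


# Hypothetical test weeks 1..10
weeks = np.arange(1, 11)
phi_additive = legendre_covariates(weeks, order=2)  # second order, as for additive genetic effects
phi_litter = legendre_covariates(weeks, order=1)    # first order, as for litter effects
print(phi_additive.shape, phi_litter.shape)  # (10, 3) (10, 2)
```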
Prediction of Human Cytochrome P450 Inhibition Using a Multitask Deep Autoencoder Neural Network.
Li, Xiang; Xu, Youjun; Lai, Luhua; Pei, Jianfeng
2018-05-30
Adverse side effects of drug-drug interactions induced by human cytochrome P450 (CYP450) inhibition are an important consideration in drug discovery. It is highly desirable to develop computational models that can predict the inhibitory effect of a compound against a specific CYP450 isoform. In this study, we developed a multitask model for concurrent inhibition prediction of five major CYP450 isoforms, namely, 1A2, 2C9, 2C19, 2D6, and 3A4. The model was built by training a multitask autoencoder deep neural network (DNN) on a large dataset containing more than 13 000 compounds, extracted from the PubChem BioAssay Database. We demonstrate that, averaged over the five prediction tasks, the multitask model gave better prediction results than single-task models, previously reported classifiers, and traditional machine learning methods. Our multitask DNN model gave average prediction accuracies of 86.4% for 10-fold cross-validation and 88.7% for the external test datasets. In addition, we built linear regression models to quantify how the other tasks contributed to the prediction difference of a given task between single-task and multitask models, and we explained under what conditions the multitask model will outperform the single-task model, which suggests how to use multitask DNN models more effectively. We applied sensitivity analysis to extract useful knowledge about CYP450 inhibition, which may shed light on the structural features of these isoforms and give hints about how to avoid side effects during drug development. Our models are freely available at http://repharma.pku.edu.cn/deepcyp/home.php or http://www.pkumdl.cn/deepcyp/home.php .
Single versus multiple sets of resistance exercise: a meta-regression.
Krieger, James W
2009-09-01
There has been considerable debate over the optimal number of sets per exercise to improve musculoskeletal strength during a resistance exercise program. The purpose of this study was to use hierarchical, random-effects meta-regression to compare the effects of single and multiple sets per exercise on dynamic strength. English-language studies comparing single with multiple sets per exercise, while controlling for other variables, were considered eligible for inclusion. The analysis comprised 92 effect sizes (ESs) nested within 30 treatment groups and 14 studies. Multiple sets were associated with a larger ES than a single set (difference = 0.26 +/- 0.05; confidence interval [CI]: 0.15, 0.37; p < 0.0001). In a dose-response model, 2 to 3 sets per exercise were associated with a significantly greater ES than 1 set (difference = 0.25 +/- 0.06; CI: 0.14, 0.37; p = 0.0001). There was no significant difference between 1 set per exercise and 4 to 6 sets per exercise (difference = 0.35 +/- 0.25; CI: -0.05, 0.74; p = 0.17) or between 2 to 3 sets per exercise and 4 to 6 sets per exercise (difference = 0.09 +/- 0.20; CI: -0.31, 0.50; p = 0.64). There were no interactions between set volume and training program duration, subject training status, or whether the upper or lower body was trained. Sensitivity analysis revealed no highly influential studies, and no evidence of publication bias was observed. In conclusion, 2 to 3 sets per exercise are associated with 46% greater strength gains than 1 set, in both trained and untrained subjects.
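Pooling effect sizes under a random-effects model can be sketched with the DerSimonian-Laird estimator. This is a deliberately simpler, swapped-in technique: the study used a hierarchical meta-regression with effect sizes nested in treatment groups and studies, plus set-volume moderators, none of which is modeled here. The effect sizes and variances are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate with the DerSimonian-Laird tau^2 estimator.

    Returns (pooled effect, its standard error, tau^2).
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Hypothetical effect sizes (multiple minus single sets) and their variances
pooled, se, tau2 = dersimonian_laird([0.10, 0.35, 0.30, 0.45], [0.02, 0.03, 0.015, 0.05])
print(f"pooled ES = {pooled:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}")
```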
Ding, H; Chen, C; Zhang, X
2016-01-01
The linear solvation energy relationship (LSER) was applied to predict the adsorption coefficient (K) of synthetic organic compounds (SOCs) on single-walled carbon nanotubes (SWCNTs). A total of 40 log K values were used to develop and validate the LSER model. The adsorption data for 34 SOCs were collected from 13 published articles, and the other six were obtained in our experiment. The optimal model, composed of four descriptors, was developed by stepwise multiple linear regression (MLR). The adjusted r² (r²adj) and root mean square error (RMSE) were 0.84 and 0.49, respectively, indicating a good fit. The leave-one-out cross-validated Q² was 0.79, suggesting that the robustness of the model was satisfactory. The external Q² and RMSE (RMSEext) were 0.72 and 0.50, respectively, showing the model's strong predictive ability. Hydrogen bond donating interaction (bB) and cavity formation and dispersion interactions (vV) stood out as the two most influential factors controlling the adsorption of SOCs onto SWCNTs. The equilibrium concentration would affect the fit and predictive ability of the model, while the coefficients varied only slightly.
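For an ordinary least-squares model, the leave-one-out Q² does not require refitting n times: each deleted residual equals e_i / (1 - h_ii), where h_ii is the leverage from the hat matrix, and Q² = 1 - PRESS / SS_tot. The sketch assumes this common Q² definition (conventions vary, especially for external Q²) and uses hypothetical descriptor data rather than the study's LSER descriptors.

```python
import numpy as np


def loo_q2(X, y):
    """Leave-one-out Q^2 for an OLS model with intercept, via the hat-matrix shortcut."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                    # hat matrix
    residuals = y - H @ y
    press_residuals = residuals / (1.0 - np.diag(H))
    press = np.sum(press_residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot


rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))                     # hypothetical: 4 descriptors for 40 compounds
y = 0.5 + X @ np.array([1.0, -0.8, 0.3, 1.5]) + rng.normal(scale=0.5, size=40)
print(round(loo_q2(X, y), 3))
```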
Revisiting the Table 2 fallacy: A motivating example examining preeclampsia and preterm birth.
Bandoli, Gretchen; Palmsten, Kristin; Chambers, Christina D; Jelliffe-Pawlowski, Laura L; Baer, Rebecca J; Thompson, Caroline A
2018-05-21
A "Table 2 Fallacy," as coined by Westreich and Greenland, is the practice of reporting multiple adjusted effect estimates from a single model. This practice, which remains common in the published literature, can be problematic when different types of effect estimates are presented together in a single table. The purpose of this paper is to quantitatively illustrate this potential for misinterpretation with an example estimating the effects of preeclampsia on preterm birth (PTB). We analysed a retrospective population-based cohort of 2 963 888 singleton births in California between 2007 and 2012. We performed a modified Poisson regression to calculate the total effect of preeclampsia on the risk of PTB, adjusting for previous preterm birth, pregnancy alcohol abuse, maternal education, and maternal socio-demographic factors (Model 1). In subsequent models, we report the total effects of previous preterm birth, alcohol abuse, and education on the risk of PTB and compare them with the corresponding estimates from Model 1, which represent controlled direct effects, total effects, or confounded effects. The effect estimate for previous preterm birth (a controlled direct effect in Model 1) increased by 10% when estimated as a total effect. The risk ratio for alcohol abuse, biased by an uncontrolled confounder in Model 1, was reduced by 23% when adjusted for drug abuse. The risk ratio for maternal education, solely a predictor of the outcome, was essentially unchanged. Reporting multiple effect estimates from a single model may lead to misinterpretation and lack of reproducibility. This example highlights the need for careful consideration of the types of effects estimated in statistical models. © 2018 John Wiley & Sons Ltd.
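"Modified Poisson regression" fits a log-link Poisson model to a binary outcome and uses a robust (sandwich) variance, so exp(beta) can be read as a risk ratio. The minimal sketch below covers only the simplest case, a single binary exposure with no covariates. In that saturated case the fitted exp(beta) equals p1 / p0 and the sandwich variance reduces to a closed form. The adjusted models in the paper would instead need a full GLM fit with robust standard errors. The counts are hypothetical.

```python
import math


def risk_ratio_robust(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Risk ratio with a robust (sandwich) Wald CI for one binary exposure.

    With a log-link Poisson model containing only the exposure, exp(beta) equals
    p1 / p0, and the sandwich variance of beta reduces to
    (1 - p1) / (n1 * p1) + (1 - p0) / (n0 * p0).
    """
    p1 = cases_exp / n_exp
    p0 = cases_unexp / n_unexp
    log_rr = math.log(p1 / p0)
    se = math.sqrt((1 - p1) / (n_exp * p1) + (1 - p0) / (n_unexp * p0))
    return math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed had the outcome
rr, lo, hi = risk_ratio_robust(20, 100, 10, 100)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The ordinary (model-based) Poisson variance would be too large for a binary outcome. The sandwich form corrects this, which is the point of the "modified" approach.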
Optimizing complex phenotypes through model-guided multiplex genome engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuznetsov, Gleb; Goodman, Daniel B.; Filsinger, Gabriel T.; ...
2017-05-25
Here, we present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.ΔA. By introducing targeted combinations of changes in multiplex we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.
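The abstract does not say which form of "regularized multivariate linear regression" was used. The minimal sketch below assumes ridge (L2) regression as one such form. Each clone is coded as a 0/1 vector of which edits it carries, the phenotype is a doubling time, and the coding and data are hypothetical. Because the fit is joint, each allele's effect is estimated while holding the co-occurring edits fixed, which is how hitchhiking mutations are kept from borrowing credit.

```python
def solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge(X, y, lam):
    """Ridge regression with an unpenalized intercept.

    X   -- rows of allele indicators (1 = edit present, 0 = absent); hypothetical coding
    y   -- phenotype per clone, e.g. doubling time
    lam -- L2 penalty; lam = 0 gives ordinary least squares
    Returns (intercept, list of per-allele effects).
    """
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]   # centre so the intercept is not shrunk
    yc = [yi - my for yi in y]
    a = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(Xc, yc)) for i in range(p)]
    beta = solve(a, b)
    intercept = my - sum(bj * mj for bj, mj in zip(beta, mx))
    return intercept, beta
```

In practice the penalty `lam` would be chosen by cross-validation. Larger values shrink poorly supported allele effects toward zero.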
NASA Astrophysics Data System (ADS)
Rao, M.; Vuong, H.
2013-12-01
The overall objective of this study is to develop a method for estimating the total aboveground biomass of redwood stands in Jackson Demonstration State Forest, Mendocino, California, using airborne LiDAR data. Owing to their vertical and horizontal accuracy, LiDAR data are increasingly used to characterize landscape features, including ground surface elevation and canopy height. These LiDAR-derived metrics, which capture structural signatures with high precision and accuracy, can help us better understand ecological processes at various spatial scales. Our study focuses on the two major species of the forest: redwood (Sequoia sempervirens [D.Don] Endl.) and Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco). Specifically, the objectives included fitting linear regression models relating tree diameter at breast height (dbh) to LiDAR-derived height for each species. At 23 random points in the study area, field measurements (dbh and tree coordinates) were collected for more than 500 redwood and Douglas-fir trees in 0.2-ha plots. The USFS FUSION software and its LiDAR Data Viewer (LDV) were used to extract a canopy height model (CHM), from which tree heights were derived. Based on the LiDAR-derived height and ground-based dbh, a linear regression model was developed to predict dbh. The predicted dbh was used to estimate biomass at the single-tree level using the equations of Jenkins et al. (2003). The linear regression models explained 65% of the variability in redwood dbh and 80% of the variability in Douglas-fir dbh.
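The two-step workflow above (regress dbh on LiDAR height, then feed the predicted dbh into an allometric biomass equation) can be sketched as follows. The sketch assumes the Jenkins et al. (2003) functional form, biomass (kg) = exp(beta0 + beta1 * ln(dbh in cm)), whose coefficients are species-group specific and must be taken from the published table. The heights, diameters, and coefficients in the usage lines are hypothetical placeholders, not values from this study.

```python
import math


def fit_line(x, y):
    """Least-squares fit y = b0 + b1 * x; returns (b0, b1, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot


def biomass_kg(dbh_cm, beta0, beta1):
    """Jenkins et al. (2003)-style allometry: biomass = exp(beta0 + beta1 * ln(dbh))."""
    return math.exp(beta0 + beta1 * math.log(dbh_cm))


# Hypothetical LiDAR heights (m) and field dbh (cm) for one species
heights = [20.0, 28.0, 35.0, 41.0, 50.0]
dbh = [30.0, 45.0, 58.0, 70.0, 88.0]
b0, b1, r2 = fit_line(heights, dbh)
predicted_dbh = b0 + b1 * 38.0                # dbh predicted for a tree 38 m tall
BETA0, BETA1 = -2.0, 2.4                      # placeholders; use the published species-group values
print(f"R^2 = {r2:.2f}, biomass ~ {biomass_kg(predicted_dbh, BETA0, BETA1):.0f} kg")
```

The R² from `fit_line` corresponds to the "variability explained" figures reported for each species.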
Hoffman, Kate; Webster, Thomas F.; Bartell, Scott M.; Weisskopf, Marc G.; Fletcher, Tony; Vieira, Verónica M.
2011-01-01
Background The C8 Health Project was established in 2005 to collect data on perfluorooctanoic acid (PFOA, or C8) and human health in Ohio and West Virginia communities contaminated by a fluoropolymer production facility. Objective We assessed PFOA exposure via contaminated drinking water in a subset of C8 Health Project participants who drank water from private wells. Methods Participants provided demographic information and residential, occupational, and medical histories. Laboratory analyses were conducted to determine serum-PFOA concentrations. PFOA data were collected from 2001 through 2005 from 62 private drinking water wells. We examined the relationship between drinking water and PFOA levels in serum using robust regression methods. As a comparison with regression models, we used a first-order, single-compartment pharmacokinetic model to estimate the serum:drinking-water concentration ratio at steady state. Results The median serum PFOA concentration in 108 study participants who used private wells was 75.7 μg/L, approximately 20 times greater than the levels in the U.S. general population but similar to those of local residents who drank public water. Each 1 μg/L increase in PFOA levels in drinking water was associated with an increase in serum concentrations of 141.5 μg/L (95% confidence interval, 134.9–148.1). The serum:drinking-water concentration ratio for the steady-state pharmacokinetic model was 114. Conclusions PFOA-contaminated drinking water is a significant contributor to PFOA levels in serum in the study population. Regression methods and pharmacokinetic modeling produced similar estimates of the relationship. PMID:20920951
Kashuba, Roxolana; Cha, YoonKyung; Alameddine, Ibrahim; Lee, Boknam; Cuffney, Thomas F.
2010-01-01
Multilevel hierarchical modeling methodology has been developed for use in ecological data analysis. The effect of urbanization on stream macroinvertebrate communities was measured across a gradient of basins in each of nine metropolitan regions across the conterminous United States. The hierarchical nature of this dataset was harnessed in a multi-tiered model structure, predicting both invertebrate response at the basin scale and differences in invertebrate response at the region scale. Ordination site scores, total taxa richness, Ephemeroptera, Plecoptera, Trichoptera (EPT) taxa richness, and richness-weighted mean tolerance of organisms at a site were used to describe invertebrate responses. Percentage of urban land cover was used as a basin-level predictor variable. Regional mean precipitation, air temperature, and antecedent agriculture were used as region-level predictor variables. Multilevel hierarchical models were fit to both levels of data simultaneously, borrowing statistical strength from the complete dataset to reduce uncertainty in regional coefficient estimates. Additionally, whereas non-hierarchical regressions were only able to show differing relations between invertebrate responses and urban intensity separately for each region, the multilevel hierarchical regressions were able to explain and quantify those differences within a single model. In this way, this modeling approach directly establishes the importance of antecedent agricultural conditions in masking the response of invertebrates to urbanization in metropolitan regions such as Milwaukee-Green Bay, Wisconsin; Denver, Colorado; and Dallas-Fort Worth, Texas. Also, these models show that regions with high precipitation, such as Atlanta, Georgia; Birmingham, Alabama; and Portland, Oregon, start out with better regional background conditions of invertebrates prior to urbanization but experience faster negative rates of change with urbanization. 
Ultimately, this urbanization-invertebrate response example is used to detail the multilevel hierarchical construction methodology, showing how the result is a set of models that are both statistically more rigorous and ecologically more interpretable than simple linear regression models.
Watanabe, Hiroyuki; Miyazaki, Hiroyasu
2006-01-01
Over- or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions or mask a drug's potential to prolong the QT interval. This study examines a nonparametric regression model (loess smoother) for adjusting the QT interval for differences in heart rate, with improved fit over a wide range of heart rates. A total of 240 (QT, RR) observations collected from each of 8 conscious, untreated beagle dogs were used. The fit of the nonparametric regression model to the QT-RR relationship was compared with that of four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correction formulas) using Akaike's Information Criterion (AIC). Residuals were assessed visually. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Whereas the parametric models fit poorly, the nonparametric regression model improved the fit at both fast and slow heart rates. The nonparametric regression model is more flexible than the parametric models. The mathematical fit of the linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
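For comparison, the two fixed-formula corrections and a local linear smoother can be sketched in a few lines. The sketch assumes QT in ms and RR in seconds, a tricube kernel over the nearest `frac * n` points (the usual loess choice), and no robustness iterations. The `loess_correct` helper shifts each QT along the fitted curve to a reference RR. It is a hypothetical illustration of how a smoother can be used for correction, not necessarily the procedure used in the study, and the AIC comparison is not shown.

```python
def bazett(qt_ms, rr_s):
    """Bazett's correction: QTc = QT / sqrt(RR), with RR in seconds."""
    return qt_ms / rr_s ** 0.5


def fridericia(qt_ms, rr_s):
    """Fridericia's correction: QTc = QT / RR^(1/3), with RR in seconds."""
    return qt_ms / rr_s ** (1.0 / 3.0)


def loess_at(x, y, x0, frac=0.5):
    """Local linear (loess-type) estimate of E[y | x = x0] with tricube weights."""
    n = len(x)
    q = min(n, max(3, int(round(frac * n))))         # neighbourhood size
    dists = sorted(abs(xi - x0) for xi in x)
    h = max(dists[q - 1], 1e-12) * (1 + 1e-9)        # widen slightly so the q-th neighbour is not weighted 0
    w = [(1 - (abs(xi - x0) / h) ** 3) ** 3 if abs(xi - x0) < h else 0.0 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:                                   # degenerate neighbourhood: local weighted mean
        return my
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return my + (sxy / sxx) * (x0 - mx)


def loess_correct(qt, rr, rr_ref, frac=0.5):
    """Hypothetical correction: move each QT along the fitted QT-RR curve to RR = rr_ref."""
    ref = loess_at(rr, qt, rr_ref, frac)
    return [q - loess_at(rr, qt, r, frac) + ref for q, r in zip(qt, rr)]
```

Unlike Bazett's and Fridericia's fixed exponents, the smoother lets the data determine the curve's shape separately at fast and slow heart rates.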
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Reducing market impact cost, the most significant portion of implicit transaction costs, can reduce the overall transaction cost, although market impact cost cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural networks, Gaussian processes, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data from the US stock market via the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, on four error measures. Although these models have difficulty separating the permanent and temporary costs directly, nonparametric machine learning models can be good alternatives for reducing transaction costs because they considerably improve prediction performance. PMID:26926235
Littlejohn, B P; Riley, D G; Welsh, T H; Randel, R D; Willard, S T; Vann, R C
2018-05-12
The objective was to estimate genetic parameters of temperament in beef cattle across an age continuum. The population consisted predominantly of Brahman-British crossbred cattle. Temperament was quantified by: 1) pen score (PS), the reaction of a calf to a single experienced evaluator on a scale of 1 to 5 (1 = calm, 5 = excitable); 2) exit velocity (EV), the rate (m/sec) at which a calf traveled 1.83 m upon exiting a squeeze chute; and 3) temperament score (TS), the numerical average of PS and EV. Covariates included days of age and proportion of Bos indicus in the calf and dam. Random regression models included the fixed effects determined from the repeated measures models, except for calf age. Likelihood ratio tests were used to determine the most appropriate random structures. In repeated measures models, the proportion of Bos indicus in the calf was positively related with each calf temperament trait (0.41 ± 0.20, 0.85 ± 0.21, and 0.57 ± 0.18 for PS, EV, and TS, respectively; P < 0.01). There was an effect of contemporary group (combinations of season, year of birth, and management group) and dam age (P < 0.001) in all models. From repeated records analyses, estimates of heritability (h2) were 0.34 ± 0.04, 0.31 ± 0.04, and 0.39 ± 0.04, while estimates of permanent environmental variance as a proportion of the phenotypic variance (c2) were 0.30 ± 0.04, 0.31 ± 0.03, and 0.34 ± 0.04 for PS, EV, and TS, respectively. Quadratic additive genetic random regressions on Legendre polynomials of age were significant for all traits. Quadratic permanent environmental random regressions were significant for PS and TS, but linear permanent environmental random regressions were significant for EV. Random regression results suggested that these components change across the age dimension of these data. 
There appeared to be an increasing influence of permanent environmental effects and decreasing influence of additive genetic effects corresponding to increasing calf age for EV, and to a lesser extent for TS. Inherited temperament may be overcome by accumulating environmental stimuli with increases in age, especially after weaning.
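Random regressions "on Legendre polynomials of age" use, as covariates, Legendre polynomials evaluated at each record's age after the age is rescaled to [-1, 1]. A quadratic random regression uses orders 0 to 2. The minimal sketch below assumes the normalization sqrt((2n + 1) / 2) * P_n that is common in animal-breeding software. The age range in the usage line is hypothetical.

```python
import math


def legendre_covariates(age, age_min, age_max, order=2, normalized=True):
    """Legendre polynomial covariates (orders 0..order) for one age record."""
    x = -1.0 + 2.0 * (age - age_min) / (age_max - age_min)   # rescale age to [-1, 1]
    p = [1.0, x]
    # Bonnet recurrence: (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    p = p[: order + 1]
    if normalized:
        p = [math.sqrt((2 * n + 1) / 2.0) * pn for n, pn in enumerate(p)]
    return p


# Hypothetical: a calf scored at 250 days, with ages in the data spanning 100 to 400 days
print(legendre_covariates(250, 100, 400))
```

Each animal's additive genetic and permanent environmental effects are then modelled as random coefficients on these covariates. This is what allows their variances to change with age.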
Feng, Mei-chen; Xiao, Lu-jie; Zhang, Mei-jun; Yang, Wu-de; Ding, Guang-wei
2014-01-01
In this study, relationships between the normalized difference vegetation index (NDVI) and plant (winter wheat) nitrogen content (PNC), and between PNC and grain protein content (GPC), were investigated using multi-temporal moderate-resolution imaging spectroradiometer (MODIS) data at different growth stages of winter wheat in Linfen (Shanxi, P. R. China). An anticipating (prediction) model for the GPC of winter wheat was also established from NDVI at the different growth stages. The results showed that the spectral models of PNC passed the F test. The PNC model based on NDVI4.14 performed best for irrigated winter wheat, and the model based on NDVI4.30 performed best for dry-land winter wheat. The PNC of irrigated and dry-land winter wheat was significantly (P<0.01) and positively correlated with GPC. The protein spectral anticipating models for both irrigated and dry-land winter wheat passed a significance test (P<0.01). Multiple anticipating models (MAM) were established by linking NDVI from two periods and PNC to the GPC anticipating model for irrigated and dry-land winter wheat. The coefficient of determination (R²) of the MAM was greater than that of the other two single-factor models, and the relative root mean square error (RRMSE) and relative error (RE) of the MAM were lower. Therefore, the multiple protein anticipating models performed better than the single-factor models, and their application to predicting the protein content (PC) of irrigated and dry-land winter wheat was more accurate and reliable. A regionalization analysis of GPC was performed using the inverse distance weighted (IDW) function of GIS, which is likely to provide a scientific basis for rational winter wheat planting in Linfen city, China. PMID:24404124
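The regionalization step relies on inverse distance weighting (IDW), which estimates a value at an unsampled location as a weighted mean of the sampled values, with weights 1 / d^p. The minimal sketch below assumes planar coordinates and the common default power p = 2; the GIS settings used in the study are not stated. The sample coordinates and GPC values are hypothetical.

```python
def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from sampled (x, y) points."""
    num = den = 0.0
    for (px, py), v in zip(points, values):
        d = ((px - target[0]) ** 2 + (py - target[1]) ** 2) ** 0.5
        if d == 0.0:               # target coincides with a sample: return it exactly
            return v
        w = 1.0 / d ** power       # closer samples get larger weights
        num += w * v
        den += w
    return num / den


# Hypothetical field sites (km) and predicted GPC (%)
sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 5.0)]
gpc = [12.0, 14.0, 13.0]
print(round(idw(sites, gpc, (1.0, 1.0)), 2))
```

Because every weight is positive, IDW estimates always fall between the smallest and largest sampled values. A larger `power` makes the surface more local.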
Single Marital Status and Infectious Mortality in Women With Cervical Cancer in the United States.
Machida, Hiroko; Eckhardt, Sarah E; Castaneda, Antonio V; Blake, Erin A; Pham, Huyen Q; Roman, Lynda D; Matsuo, Koji
2017-10-01
Unmarried status, including single marital status, is associated with increased mortality in women with malignancy. Infectious disease accounts for a significant proportion of mortality in patients with malignancy. Here, we examined the association between single marital status and infectious mortality in cervical cancer. This is a retrospective observational study examining 86,555 women with invasive cervical cancer identified in the Surveillance, Epidemiology, and End Results Program between 1973 and 2013. Characteristics of 18,324 single women were compared with those of 38,713 married women in multivariable binary logistic regression models. Propensity score matching was performed to examine the cumulative risk of all-cause and infectious mortality between the 2 groups. Single marital status was significantly associated with young age, black/Hispanic ethnicity, residence in the Western US, uninsured status, high-grade tumor, squamous histology, and advanced-stage disease on multivariable analysis (all, P < 0.05). In a prematched model, single marital status was significantly associated with increased cumulative risk of all-cause mortality (5-year rate: 32.9% vs 29.7%, P < 0.001) and infectious mortality (0.5% vs 0.3%, P < 0.001) compared with married status. After propensity score matching, single marital status remained an independent prognostic factor for increased cumulative risk of all-cause mortality (adjusted hazard ratio [HR], 1.15; 95% confidence interval [CI], 1.11-1.20; P < 0.001) and of infectious mortality on multivariable analysis (adjusted HR, 1.71; 95% CI, 1.27-2.32; P < 0.001). In a sensitivity analysis for stage I disease, single marital status remained significantly associated with increased risk of infectious mortality after propensity score matching (adjusted HR, 2.24; 95% CI, 1.34-3.73; P = 0.002). Single marital status was associated with increased infectious mortality in women with invasive cervical cancer.
An Optimization of Inventory Demand Forecasting in University Healthcare Centre
NASA Astrophysics Data System (ADS)
Bon, A. T.; Ng, T. K.
2017-01-01
The healthcare industry has become an important field because it concerns human health. Forecasting demand for health services is therefore an important step in managerial decision making for all healthcare organizations. A case study was conducted at a university health centre to collect 68 months of historical demand data for Panadol 650 mg, from January 2009 to August 2014. The aim of the research is to optimize overall inventory demand through forecasting techniques. A quantitative (time series) forecasting approach was used to forecast future demand as a function of past data. Because the data pattern must be identified before forecasting techniques are applied, the pattern was examined first and found to be a trend. Ten forecasting techniques were then applied using Risk Simulator software, and the technique with the lowest forecasting error was selected as the best. The ten techniques were single moving average, single exponential smoothing, double moving average, double exponential smoothing, regression, Holt-Winters additive, seasonal additive, Holt-Winters multiplicative, seasonal multiplicative, and Autoregressive Integrated Moving Average (ARIMA). According to the forecasting accuracy measures, the best forecasting technique was regression analysis.
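The finding that regression beat smoothing methods on trending data is easy to see in a small one-step-ahead comparison. The minimal sketch below implements three of the ten techniques: single moving average, single exponential smoothing, and a linear-trend regression refitted on the data before each period. It scores them by mean absolute error (MAE). This does not reproduce Risk Simulator, the error measures used in the study are not specified in the abstract, and the demand series is hypothetical.

```python
def sma_forecasts(series, window):
    """One-step-ahead single moving average: forecast for t = mean of the previous `window` values."""
    return {t: sum(series[t - window:t]) / window for t in range(window, len(series))}


def ses_forecasts(series, alpha):
    """One-step-ahead single exponential smoothing, initialised with the first observation."""
    level = series[0]
    out = {}
    for t in range(1, len(series)):
        out[t] = level                                  # forecast for t uses data up to t-1
        level = alpha * series[t] + (1 - alpha) * level
    return out


def trend_forecasts(series, start=2):
    """One-step-ahead linear-trend regression: fit y = b0 + b1 * t on data before t."""
    out = {}
    for t in range(start, len(series)):
        xs, ys = list(range(t)), series[:t]
        mx, my = sum(xs) / t, sum(ys) / t
        b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
        out[t] = my + b1 * (t - mx)
    return out


def mae(series, forecasts, start):
    """Mean absolute error over periods t >= start."""
    errs = [abs(series[t] - f) for t, f in forecasts.items() if t >= start]
    return sum(errs) / len(errs)


# Hypothetical monthly demand with an upward trend
demand = [120, 126, 129, 137, 140, 147, 151, 155, 163, 166, 172, 179]
for name, fc in [("SMA(3)", sma_forecasts(demand, 3)),
                 ("SES(0.5)", ses_forecasts(demand, 0.5)),
                 ("trend regression", trend_forecasts(demand))]:
    print(name, round(mae(demand, fc, start=3), 2))
```

On a trending series, both smoothing methods lag systematically behind the data, whereas the trend regression extrapolates the slope. That lag is one plausible reason regression came out best here.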
Huber, Stefan; Klein, Elise; Moeller, Korbinian; Willmes, Klaus
2015-10-01
In neuropsychological research, single cases are often compared with a small control sample. Crawford and colleagues developed inferential methods (i.e., the modified t-test) for such a research design. In the present article, we suggest an extension of the methods of Crawford and colleagues employing linear mixed models (LMM). We first show that a t-test for the significance of a dummy-coded predictor variable in a linear regression is equivalent to the modified t-test of Crawford and colleagues. As an extension of this idea, we then generalized the modified t-test to repeated-measures data by using LMMs to compare the performance difference between two conditions observed in a single participant with that of a small control group. The Type I error rates and statistical power of LMMs were evaluated in Monte Carlo simulations. We found that, with control samples of about 15-20 participants or more, Type I error rates were close to the nominal rate when the Satterthwaite approximation was used for the degrees of freedom. Moreover, statistical power was acceptable. Therefore, we conclude that LMMs can be applied successfully to statistically evaluate performance differences between a single case and a control sample. Copyright © 2015 Elsevier Ltd. All rights reserved.
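The equivalence stated above can be checked numerically. Crawford and Howell's modified t-test uses t = (x* - mean) / (s * sqrt((n + 1) / n)) with n - 1 degrees of freedom. Regressing all scores on a dummy variable (1 for the case, 0 for controls) gives the same t, because the case's residual is zero and the residual variance therefore equals the control variance. The minimal sketch below computes both; the scores are hypothetical, and a p-value would additionally require a t-distribution CDF, which is not shown.

```python
import math


def crawford_howell_t(case, controls):
    """Modified t-test comparing one case with a small control sample; returns (t, df)."""
    n = len(controls)
    mean = sum(controls) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in controls) / (n - 1))
    return (case - mean) / (sd * math.sqrt((n + 1) / n)), n - 1


def dummy_regression_t(case, controls):
    """t statistic of the dummy (case = 1, control = 0) slope in a simple OLS regression."""
    y = list(controls) + [case]
    d = [0.0] * len(controls) + [1.0]
    N = len(y)
    md, my = sum(d) / N, sum(y) / N
    sxx = sum((di - md) ** 2 for di in d)
    b1 = sum((di - md) * (yi - my) for di, yi in zip(d, y)) / sxx
    b0 = my - b1 * md
    ss_res = sum((yi - b0 - b1 * di) ** 2 for di, yi in zip(d, y))
    sigma2 = ss_res / (N - 2)                    # residual variance; the case's residual is 0
    return b1 / math.sqrt(sigma2 / sxx), N - 2


# Hypothetical scores: six controls and one patient
print(crawford_howell_t(5, [10, 12, 11, 13, 9, 12]))
print(dummy_regression_t(5, [10, 12, 11, 13, 9, 12]))
```

The regression form generalizes naturally. Adding a condition factor and random participant effects turns it into the LMM extension described in the abstract.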
Abdul Kadir, Nor Ba'yah; Bifulco, Antonia
2013-12-30
The role of marital breakdown in women's mental health is of key concern in Malaysia and internationally. A cross-sectional questionnaire study of married and separated/divorced and widowed women examined insecure attachment style as an associated risk factor for depression among 1002 mothers in an urban community in Malaysia. A previous report replicated a UK-based vulnerability-provoking agent model of depression involving negative evaluation of self (NES) and negative elements in close relationships (NECRs) interacting with severe life events to model depression. This article reports on the additional contribution of insecure attachment style to the model using the Vulnerable Attachment Style Questionnaire (VASQ). The results showed that VASQ scores were highly correlated with NES, NECR and depression. A multiple regression analysis of depression with backward elimination found that VASQ scores had a significant additional effect. Group comparisons showed different risk patterns for single and married mothers. NES was the strongest risk factor for both groups, with the 'anxious style' subset of the VASQ being the best additional predictor for married mothers and the total VASQ score (general attachment insecurity) for single mothers. The findings indicate that attachment insecurity adds to a psychosocial vulnerability model of depression among mothers cross-culturally and is important in understanding and identifying risk. © 2013 Elsevier Ireland Ltd. All rights reserved.
Zoellner, Jamie M.; Porter, Kathleen J.; Chen, Yvonnes; Hedrick, Valisa E.; You, Wen; Hickman, Maja; Estabrooks, Paul A.
2017-01-01
Objective Guided by the theory of planned behaviour (TPB) and health literacy concepts, SIPsmartER is a six-month multicomponent intervention that is effective at improving sugar-sweetened beverage (SSB) behaviours. Using SIPsmartER data, this study explores the prediction of SSB behavioural intention (BI) and behaviour from TPB constructs using (1) cross-sectional and prospective models and (2) 11 single-item assessments from interactive voice response (IVR) technology. Design Quasi-experimental design, including pre- and post-outcome data and repeated-measures process data from 155 intervention participants. Main Outcome Measures Validated multi-item TPB measures, single-item TPB measures, and self-reported SSB behaviours. Hypothesised relationships were investigated using correlation and multiple regression models. Results TPB constructs explained 32% of the variance in BI cross-sectionally and 20% prospectively, and 13–20% of the variance in behaviour cross-sectionally and 6% prospectively. Single-item scale models were significant, yet explained less variance. All IVR models predicting BI (average 21%, range 6–38%) and behaviour (average 30%, range 6–55%) were significant. Conclusion Findings are interpreted in the context of other cross-sectional, prospective and experimental TPB health and dietary studies. Findings advance the experimental application of the TPB, including understanding constructs at outcome and process time points and applying theory in all intervention development, implementation and evaluation phases. PMID:28165771
Grogan-Kaylor, Andrew; Perron, Brian E.; Kilbourne, Amy M.; Woltmann, Emily; Bauer, Mark S.
2013-01-01
Objective Prior meta-analysis indicates that collaborative chronic care models (CCMs) improve mental and physical health outcomes for individuals with mental disorders. This study aimed to investigate the stability of evidence over time and identify patient and intervention factors associated with CCM effects in order to facilitate implementation and sustainability of CCMs in clinical practice. Method We reviewed 53 CCM trials that analyzed depression, mental quality of life (QOL), or physical QOL outcomes. Cumulative meta-analysis and meta-regression were supplemented by descriptive investigations across and within trials. Results Most trials targeted depression in the primary care setting, and cumulative meta-analysis indicated that effect sizes favoring CCM quickly achieved significance for depression outcomes, and more recently achieved significance for mental and physical QOL. Four of six CCM elements (patient self-management support, clinical information systems, system redesign, and provider decision support) were common among reviewed trials, while two elements (healthcare organization support and linkages to community resources) were rare. No single CCM element was statistically associated with the success of the model. Similarly, meta-regression did not identify specific factors associated with CCM effectiveness. Nonetheless, results within individual trials suggest that increased illness severity predicts CCM outcomes. Conclusions Significant CCM trials have been derived primarily from four original CCM elements. Nonetheless, implementing and sustaining this established model will require healthcare organization support. While CCMs have typically been tested as population-based interventions, evidence supports stepped care application to more severely ill individuals. 
Future priorities include developing implementation strategies to support adoption and sustainability of the model in clinical settings while maximizing fit of this multi-component framework to local contextual factors. PMID:23938600
Modified Regression Correlation Coefficient for Poisson Regression Model
NASA Astrophysics Data System (ADS)
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study focuses on indicators of the predictive power of generalized linear models (GLMs), which are widely used but often have some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)], where the dependent variable is Poisson distributed. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the cases of two or more independent variables and of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient outperforms the traditional one in terms of bias and root mean square error (RMSE).
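The traditional regression correlation coefficient described above is corr(Y, E(Y|X)), with E(Y|X) replaced by the fitted means of a Poisson regression. The minimal sketch below fits a one-predictor log-link Poisson model by Newton-Raphson and then computes that correlation. The paper's modified coefficient is not reproduced here, and the count data are hypothetical.

```python
import math


def fit_poisson(x, y, iters=50):
    """Newton-Raphson MLE for log E(Y|x) = b0 + b1 * x (requires sum(y) > 0)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0        # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (observed = expected for the canonical log link)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


def regression_correlation(y, mu):
    """Pearson correlation between observed Y and fitted E(Y|X)."""
    n = len(y)
    my, mm = sum(y) / n, sum(mu) / n
    sxy = sum((a - my) * (b - mm) for a, b in zip(y, mu))
    sxx = sum((a - my) ** 2 for a in y)
    syy = sum((b - mm) ** 2 for b in mu)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical count data
x = [0, 1, 2, 3, 4, 5]
y = [1, 1, 2, 4, 6, 9]
b0, b1 = fit_poisson(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
print(round(regression_correlation(y, mu_hat), 3))
```

With several correlated predictors, the fitting step becomes a matrix Newton update, but the coefficient is computed from the fitted means in exactly the same way.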
Empirical Modeling of Plant Gas Fluxes in Controlled Environments
NASA Technical Reports Server (NTRS)
Cornett, Jessie David
1994-01-01
As humans extend their reach beyond the earth, bioregenerative life support systems must replace the resupply and physical/chemical systems now used. The Controlled Ecological Life Support System (CELSS) will utilize plants to recycle the carbon dioxide (CO2) and excrement produced by humans and return oxygen (O2), purified water and food. CELSS design requires knowledge of gas flux levels for net photosynthesis (PS_n), dark respiration (R_d) and evapotranspiration (ET). Full season gas flux data regarding these processes for wheat (Triticum aestivum), soybean (Glycine max) and rice (Oryza sativa) from published sources were used to develop empirical models. Univariate models relating crop age (days after planting) and gas flux were fit by simple regression. Models are either high order (5th to 8th) or more complex polynomials whose curves describe crop development characteristics. The models provide good estimates of gas flux maxima, but are of limited utility. To broaden the applicability, data were transformed to dimensionless or correlation formats and, again, fit by regression. Polynomials, similar to those in the initial effort, were selected as the most appropriate models. These models indicate that, within a cultivar, gas flux patterns appear remarkably similar prior to maximum flux, but exhibit considerable variation beyond this point. This suggests that more broadly applicable models of plant gas flux are feasible, but univariate models defining gas flux as a function of crop age are too simplistic. Multivariate models using CO2 and crop age were fit for PS_n and R_d by multiple regression. In each case, the selected model is a subset of a full third order model with all possible interactions. These models are improvements over the univariate models because they incorporate more than the single factor, crop age, as the primary variable governing gas flux.
They are still limited, however, by their reliance on the other environmental conditions under which the original data were collected. Three-dimensional plots representing the response surface of each model are included. Suitability of using empirical models to generate engineering design estimates is discussed. Recommendations for the use of more complex multivariate models to increase versatility are included.
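The univariate models described above are ordinary least-squares polynomial fits of gas flux against crop age. The sketch below uses Python with NumPy. It assumes a synthetic rise-and-decline season curve in place of the published wheat, soybean or rice data. The curve shape, the 6th-order degree and the function name `fit_flux_polynomial` are illustrative choices, not the report's values.

```python
import numpy as np

def fit_flux_polynomial(days, flux, degree):
    """Least-squares polynomial of gas flux as a function of days after planting."""
    return np.poly1d(np.polyfit(days, flux, deg=degree))

# Synthetic season curve (hypothetical): flux rises, peaks mid-season, then declines.
days = np.linspace(0.0, 84.0, 15)                 # sampling every 6 days
flux = 40.0 * np.sin(np.pi * days / 84.0) ** 2    # arbitrary flux units

model = fit_flux_polynomial(days, flux, degree=6)

# Locate the fitted maximum on a fine grid; flux maxima are where such models do well.
grid = np.linspace(0.0, 84.0, 841)
peak_day = grid[np.argmax(model(grid))]
print(f"estimated day of maximum flux: {peak_day:.1f}")
```

A high-order fit like this can reproduce the shape of one season closely. It says nothing about why flux changes, which is the limitation the abstract points out.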
Grotegut, Chad A; Ngan, Emily; Garrett, Melanie E; Miranda, Marie Lynn; Ashley-Koch, Allison E; Swamy, Geeta K
2017-09-01
Oxytocin is a potent uterotonic agent that is widely used for induction and augmentation of labor. Oxytocin has a narrow therapeutic index and the optimal dosing for any individual woman varies widely. The objective of this study was to determine whether genetic variation in the oxytocin receptor (OXTR) or in the gene encoding G protein-coupled receptor kinase 6 (GRK6), which regulates desensitization of the oxytocin receptor, could explain variation in oxytocin dosing and labor outcomes among women being induced near term. Pregnant women with a singleton gestation residing in Durham County, NC, were prospectively enrolled as part of the Healthy Pregnancy, Healthy Baby cohort study. Those women undergoing an induction of labor at 36 weeks or greater were genotyped for 18 haplotype-tagging single-nucleotide polymorphisms in OXTR and 7 haplotype-tagging single-nucleotide polymorphisms in GRK6 using TaqMan assays. Linear regression was used to examine the relationship between maternal genotype and maximal oxytocin infusion rate, total oxytocin dose received, and duration of labor. Logistic regression was used to test for the association of maternal genotype with mode of delivery. For each outcome, backward selection techniques were utilized to control for important confounding variables and additive genetic models were used. Race/ethnicity was included in all models because of differences in allele frequencies across populations, and Bonferroni correction for multiple testing was used. DNA was available from 482 women undergoing induction of labor at 36 weeks or greater. Eighteen haplotype-tagging single-nucleotide polymorphisms within OXTR and 7 haplotype-tagging single-nucleotide polymorphisms within GRK6 were examined. Five single-nucleotide polymorphisms in OXTR showed nominal significance with maximal infusion rate of oxytocin, and two single-nucleotide polymorphisms in OXTR were associated with total oxytocin dose received. 
One single-nucleotide polymorphism in OXTR and two single-nucleotide polymorphisms in GRK6 were associated with duration of labor, one of which met the multiple testing threshold (P = .0014, rs2731664 [GRK6], mean duration of labor, 17.7 hours vs 20.2 hours vs 23.5 hours for AA, AC, and CC genotypes, respectively). Three single-nucleotide polymorphisms, two in OXTR and one in GRK6, showed nominal significance with mode of delivery. Genetic variation in OXTR and GRK6 is associated with the amount of oxytocin required as well as the duration of labor and risk for cesarean delivery among women undergoing induction of labor near term. With further research, pharmacogenomic approaches may potentially be utilized to develop personalized treatment to improve safety and efficacy outcomes among women undergoing induction of labor. Copyright © 2017 Elsevier Inc. All rights reserved.
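In an additive genetic model, each genotype is coded by its number of copies of one allele (0, 1 or 2). The regression slope is then the change in outcome per extra copy. The minimal sketch below applies this coding to the three reported mean labour durations for rs2731664, using one point per genotype and no covariates, so it is not the authors' adjusted model. It also shows a Bonferroni threshold under the assumption of one test per SNP (18 + 7 = 25 tests). The abstract does not state the exact number of tests the authors corrected for.

```python
def additive_code(genotype, allele):
    """Additive genetic model: copies of `allele` in a two-letter genotype (0, 1 or 2)."""
    return genotype.count(allele)

def ols_slope(x, y):
    """Least-squares slope of y on x (change in outcome per extra allele copy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at `alpha`."""
    return alpha / n_tests

# Reported group means for rs2731664 (AA, AC, CC), used here as one point per genotype.
genotypes = ["AA", "AC", "CC"]
hours = [17.7, 20.2, 23.5]
x = [additive_code(g, "C") for g in genotypes]
slope = ols_slope(x, hours)  # hours of labour per additional C allele (unadjusted)

# Assumption: one test per SNP for a single outcome.
threshold = bonferroni_threshold(0.05, 18 + 7)
print(f"slope = {slope:.2f} h per C allele; threshold = {threshold:.4f}; "
      f"P = 0.0014 below threshold: {0.0014 < threshold}")
```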
Hollier, John M; Czyzewski, Danita I; Self, Mariella M; Weidler, Erica M; Smith, E O'Brian; Shulman, Robert J
2017-03-01
This study evaluates whether certain patient or parental characteristics are associated with gastroenterology (GI) referral versus primary pediatrics care for pediatric irritable bowel syndrome (IBS). A retrospective clinical trial sample of patients meeting pediatric Rome III IBS criteria was assembled from a single metropolitan health care system. Baseline socioeconomic status (SES) and clinical symptom measures were gathered. Various instruments measured participant and parental psychosocial traits. Study outcomes were stratified by GI referral versus primary pediatrics care. Two separate analyses of SES measures and GI clinical symptoms and psychosocial measures identified key factors by univariate and multiple logistic regression analyses. For each analysis, identified factors were placed in unadjusted and adjusted multivariate logistic regression models to assess their impact in predicting GI referral. Of the 239 participants, 152 were referred to pediatric GI, and 87 were managed in primary pediatrics care. Of the SES and clinical symptom factors, child self-assessment of abdominal pain duration and lower percentage of people living in poverty were the strongest predictors of GI referral. Among the psychosocial measures, parental assessment of their child's functional disability was the sole predictor of GI referral. In multivariate logistic regression models, all selected factors continued to predict GI referral in each model. Socioeconomic environment, clinical symptoms, and functional disability are associated with GI referral. Future interventions designed to ameliorate the effect of these identified factors could reduce unnecessary specialty consultations and health care overutilization for IBS.
Anderson, S.C.; Kupfer, J.A.; Wilson, R.R.; Cooper, R.J.
2000-01-01
The purpose of this research was to develop a model that could be used to provide a spatial representation of uneven-aged silvicultural treatments on forest crown area. We began by developing species-specific linear regression equations relating tree DBH to crown area for eight bottomland tree species at White River National Wildlife Refuge, Arkansas, USA. The relationships were highly significant for all species, with coefficients of determination (r²) ranging from 0.37 for Ulmus crassifolia to nearly 0.80 for Quercus nuttallii and Taxodium distichum. We next located and measured the diameters of more than 4000 stumps from a single tree-group selection timber harvest. Stump locations were recorded with respect to an established grid point system and entered into a Geographic Information System (ARC/INFO). The area occupied by the crown of each logged individual was then estimated by using the stump dimensions (adjusted to DBHs) and the regression equations relating tree DBH to crown area. Our model projected that the selection cuts removed roughly 300 m² of basal area from the logged sites, resulting in the loss of approximately 55,000 m² of crown area. The model developed in this research represents a tool that can be used in conjunction with remote sensing applications to assist in forest inventory and management, as well as to estimate the impacts of selective timber harvest on wildlife.
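The two-step procedure above has two parts. First, fit a per-species line from DBH to crown area. Second, sum the predicted crown area of every logged stem. The minimal sketch below assumes the stump diameters have already been converted to DBH. The calibration data, the species key `species_A` and the clipping of negative predictions at zero are all hypothetical choices, not the authors'.

```python
def fit_line(x, y):
    """Ordinary least squares for crown_area = b0 + b1 * dbh; returns (b0, b1, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return b0, b1, 1.0 - ss_res / ss_tot

def total_crown_area(stumps, models):
    """Sum predicted crown area over logged trees.

    stumps: list of (species, dbh) with stump diameters already converted to DBH.
    models: species -> (b0, b1) regression coefficients.
    Negative predictions (possible for very small stems) are clipped to zero.
    """
    total = 0.0
    for species, dbh in stumps:
        b0, b1 = models[species]
        total += max(b0 + b1 * dbh, 0.0)
    return total

# Hypothetical calibration data for one species (DBH in cm, crown area in m^2).
b0, b1, r2 = fit_line([20, 30, 40, 50], [25, 45, 65, 85])
models = {"species_A": (b0, b1)}
stumps = [("species_A", 35.0), ("species_A", 45.0)]
removed = total_crown_area(stumps, models)
print(f"r^2 = {r2:.2f}; crown area removed = {removed:.1f} m^2")
```

In a GIS workflow, each stump's predicted crown area would also be attached to its recorded location rather than only summed.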
Carver, Brett S; Chapinski, Caren; Wongvipat, John; Hieronymus, Haley; Chen, Yu; Chandarlapaty, Sarat; Arora, Vivek K; Le, Carl; Koutcher, Jason; Scher, Howard; Scardino, Peter T; Rosen, Neal; Sawyers, Charles L
2011-01-01
Summary: Prostate cancer is characterized by its dependence on the androgen receptor (AR) and frequent activation of PI3K signaling. We find that AR transcriptional output is decreased in human and murine tumors with PTEN deletion and that PI3K pathway inhibition activates AR signaling by relieving feedback inhibition of HER kinases. Similarly, AR inhibition activates AKT signaling by reducing levels of the AKT phosphatase PHLPP. Thus, these two oncogenic pathways cross-regulate each other by reciprocal feedback. Inhibition of one activates the other, thereby maintaining tumor cell survival. However, combined pharmacologic inhibition of PI3K and AR signaling caused near-complete prostate cancer regressions in a Pten-deficient murine prostate cancer model and in human prostate cancer xenografts, indicating that both pathways coordinately support survival. Significance: The two most frequently activated signaling pathways in prostate cancer are driven by AR and PI3K. Inhibitors of the PI3K pathway are in early clinical trials and AR inhibitors confer clinical responses in most patients. However, these inhibitors rarely induce tumor regression in preclinical models. Here we show that these pathways regulate each other by reciprocal negative feedback, such that inhibition of one activates the other. Therefore, tumor cells can adapt and survive when either single pathway is inhibited pharmacologically. Our demonstration of profound tumor regressions with combined pathway inhibition in preclinical prostate tumor models provides rationale for combination therapy in patients. PMID:21575859
De Haas, Y; Janss, L L G; Kadarmideen, H N
2007-10-01
Genetic correlations between body condition score (BCS) and fertility traits in dairy cattle were estimated using bivariate random regression models. BCS was recorded by the Swiss Holstein Association on 22,075 lactating heifers (primiparous cows) from 856 sires. Fertility data during first lactation were extracted for 40,736 cows. The fertility traits were days to first service (DFS), days between first and last insemination (DFLI), calving interval (CI), number of services per conception (NSPC) and conception rate to first insemination (CRFI). A bivariate model was used to estimate genetic correlations between BCS, treated as a longitudinal trait through random regression components, and daughters' fertility at the sire level, treated as a single lactation measurement. Heritability of BCS was 0.17, and heritabilities for fertility traits were low (0.01-0.08). Genetic correlations between BCS and fertility over the lactation varied from -0.45 to -0.14 for DFS, from -0.75 to 0.03 for DFLI, from -0.59 to -0.02 for CI, from -0.47 to 0.33 for NSPC and from 0.08 to 0.82 for CRFI. These results show (genetic) interactions between fat reserves and reproduction along the lactation trajectory of modern dairy cows, which can be useful in genetic selection as well as in management. Maximum genetic gain in fertility from indirect selection on BCS should be based on measurements taken in mid lactation, when the genetic variance for BCS is largest and the genetic correlations between BCS and fertility are strongest.
Van Hertem, T; Bahr, C; Schlageter Tello, A; Viazzi, S; Steensels, M; Romanini, C E B; Lokhorst, C; Maltz, E; Halachmi, I; Berckmans, D
2016-09-01
The objective of this study was to evaluate whether a multi-sensor system (milk, activity, body posture) was a better classifier for lameness than single-sensor-based detection models. Between September 2013 and August 2014, 3629 cow observations were collected on a commercial dairy farm in Belgium. Human locomotion scoring was used as the reference for model development and evaluation. Cow behaviour and performance were measured with sensors already present on the farm. A prototype three-dimensional video recording system was used to automatically quantify the back posture of each cow. For the single-predictor comparisons, receiver operating characteristic (ROC) curves were constructed. For the multivariate detection models, logistic regression and generalized linear mixed models (GLMM) were developed. The best lameness classification model was obtained by the multi-sensor analysis (area under the ROC curve (AUC)=0.757±0.029), containing a combination of milk and milking variables, activity, and gait and posture variables from videos. Second, the multivariate video-based system (AUC=0.732±0.011) performed better than the multivariate milk sensors (AUC=0.604±0.026) and the multivariate behaviour sensors (AUC=0.633±0.018). The video-based system also performed better than the combined behaviour and performance-based detection model (AUC=0.669±0.028), indicating that it is worthwhile to consider a video-based lameness detection system regardless of the presence of other existing sensors on the farm. The results suggest that Θ2, the feature variable for the back curvature around the hip joints, with an AUC of 0.719, is the best single predictor variable for lameness detection based on locomotion scoring. In general, this study showed that the video-based back posture monitoring system outperforms the behaviour and performance sensing techniques for locomotion scoring-based lameness detection.
A GLMM with seven specific variables (walking speed, back posture measurement, daytime activity, milk yield, lactation stage, milk peak flow rate and milk peak conductivity) is the best combination of variables for lameness classification. The accuracy on four-level lameness classification was 60.3%. The accuracy improved to 79.8% for binary lameness classification. The binary GLMM obtained a sensitivity of 68.5% and a specificity of 87.6%, which both exceed the sensitivity (52.1%±4.7%) and specificity (83.2%±2.3%) of the multi-sensor logistic regression model. This shows that the repeated measures analysis in the GLMM, taking into account the individual history of the animal, outperforms the classification when thresholds based on herd level (a statistical population) are used.
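The AUC values above can be read as a probability. Each AUC is the chance that a randomly chosen lame cow receives a higher model score than a randomly chosen sound cow. Sensitivity and specificity come from a single decision threshold. The minimal sketch below computes both from classifier scores. The scores, the labels and the 0.5 threshold are hypothetical, not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """ROC AUC via the Mann-Whitney statistic; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def sensitivity_specificity(y_true, scores, threshold):
    """Classify score >= threshold as lame; y_true is 1 for lame, 0 for not lame."""
    tp = fn = tn = fp = 0
    for t, s in zip(y_true, scores):
        predicted_lame = s >= threshold
        if t and predicted_lame:
            tp += 1
        elif t:
            fn += 1
        elif predicted_lame:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical model outputs for seven cows (three lame, four sound).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
pos = [s for s, t in zip(scores, y_true) if t]
neg = [s for s, t in zip(scores, y_true) if not t]

area = auc(pos, neg)
sens, spec = sensitivity_specificity(y_true, scores, threshold=0.5)
```

AUC summarizes performance over all thresholds. The sensitivity and specificity values reported for the GLMM and the logistic model each describe one chosen operating point.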
Bonawitz, Rachael; Brennan, Alana T; Long, Lawrence; Heeren, Timothy; Maskew, Mhairi; Sanne, Ian; Fox, Matthew P
2018-06-01
In April 2010, tenofovir and abacavir replaced stavudine in public sector first-line antiretroviral therapy (ART) for children under 20 years old in South Africa. The association of both abacavir and tenofovir with fewer side effects and toxicities compared to stavudine could translate to increased durability of tenofovir- or abacavir-based regimens. We evaluated changes over time in regimen durability for paediatric patients 3-19 years of age at eight public sector clinics in Johannesburg, South Africa. We conducted a cohort analysis of treatment-naïve, non-pregnant paediatric patients from 3 to 19 years old initiated on ART between April 2004 and December 2013. First-line ART regimens before April 2010 consisted of stavudine or zidovudine with lamivudine and either efavirenz or nevirapine. Tenofovir and/or abacavir were substituted for stavudine in first-line ART after April 2010. We evaluated the frequency and type of single-drug substitutions, treatment interruptions and switches to second-line therapy. Fine and Gray competing risk regression models were used to evaluate the association of antiretroviral drug type with single-drug substitutions, treatment interruptions and second-line switches in the first 24 months on treatment. Three hundred and ninety-eight (15.3%) single-drug substitutions, 187 (7.2%) treatment interruptions and 86 (3.3%) switches to second-line therapy occurred among 2602 paediatric patients over 24 months on ART. Overall, the rate of single-drug substitutions started to increase in 2009, peaked in 2011 at 25% and then declined to 10% in 2013, well after the integration of tenofovir into paediatric regimens; no patients over the age of 3 were initiated on abacavir for first-line therapy. Competing risk regression models showed that patients on zidovudine or stavudine had upwards of a fivefold increase in single-drug substitutions compared with patients initiated on tenofovir in the first 24 months on ART.
Older adolescents also had a two- to threefold increase in treatment interruptions and switches to second-line therapy compared to younger patients in the first 24 months on ART. The decline in single-drug substitutions is associated with the introduction of tenofovir. Tenofovir use could improve regimen durability and treatment outcomes in resource-limited settings. © 2018 John Wiley & Sons Ltd.
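Fine and Gray regression models covariate effects on the subdistribution hazard. The quantity that hazard maps onto is the cumulative incidence of one event type when competing events can occur first. The sketch below is not Fine-Gray regression. It shows the nonparametric Aalen-Johansen estimator of cumulative incidence, which is the usual descriptive companion to such models. The follow-up times and event codes are hypothetical. Which events the authors treated as competing is not stated in the abstract.

```python
def cumulative_incidence(times, events, cause):
    """Aalen-Johansen cumulative incidence of `cause` in the presence of competing events.

    events[i] is 0 for censored, otherwise a positive integer event type.
    Returns a list of (time, CIF) steps at each distinct observed time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    event_free = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        while i < len(data) and data[i][0] == t:
            event = data[i][1]
            if event != 0:
                d_any += 1
                if event == cause:
                    d_cause += 1
            n_here += 1
            i += 1
        if d_any:
            # Probability of surviving to t, then failing from `cause` at t.
            cif += event_free * d_cause / at_risk
            event_free *= 1.0 - d_any / at_risk
        curve.append((t, cif))
        at_risk -= n_here  # events and censorings at t leave the risk set
    return curve

# Hypothetical months on ART: 1 = single-drug substitution, 2 = competing event, 0 = censored.
times = [2, 3, 3, 5, 7, 8]
events = [1, 2, 0, 1, 0, 2]
cif_substitution = cumulative_incidence(times, events, cause=1)[-1][1]
cif_competing = cumulative_incidence(times, events, cause=2)[-1][1]
```

Unlike 1 minus Kaplan-Meier with competing events censored, these incidences cannot sum to more than the all-cause failure probability.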
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets (a subset of the TechTC-300 data sets) to support our theory. Experimental results indicate that the proposed methods perform better than existing feature selection methods.
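Leverage-score sampling of features can be sketched as follows. Compute each feature's leverage with respect to the top-k right singular subspace of the data matrix. Sample features with probability proportional to leverage, rescale the kept columns, and fit ridge regression on them. This is a minimal illustration under stated assumptions. The rank k, the sample size r, the i.i.d. sampling with replacement and the 1/sqrt(r·p) rescaling are common choices that may differ from the paper's exact scheme. Single-set spectral sparsification is not shown.

```python
import numpy as np

def feature_leverage_scores(X, k):
    """Leverage of each feature (column of X) w.r.t. the top-k right singular subspace.

    Scores lie in [0, 1] and sum to k.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    vk = vt[:k].T  # d x k, orthonormal columns
    return np.sum(vk ** 2, axis=1)

def leverage_sample_features(X, k, r, rng):
    """Sample r features i.i.d. with probability proportional to leverage, then rescale."""
    probs = feature_leverage_scores(X, k) / k
    idx = rng.choice(X.shape[1], size=r, replace=True, p=probs)
    return idx, X[:, idx] / np.sqrt(r * probs[idx])

def ridge(X, y, lam):
    """Closed-form ridge regression weights (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: 100 samples, 50 features, outcome driven by the first five features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=100)

lev = feature_leverage_scores(X, k=10)
idx, X_sub = leverage_sample_features(X, k=10, r=20, rng=rng)
w_sub = ridge(X_sub, y, lam=1.0)
```

The method is unsupervised in the sense described above: `y` is never used to choose which features are kept.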
Sensitivity of Chemical Shift-Encoded Fat Quantification to Calibration of Fat MR Spectrum
Wang, Xiaoke; Hernando, Diego; Reeder, Scott B.
2015-01-01
Purpose: To evaluate the impact of different fat spectral models on proton density fat-fraction (PDFF) quantification using chemical shift-encoded (CSE) MRI. Materials and Methods: Simulations and in vivo imaging were performed. In a simulation study, spectral models of fat were compared pairwise. Comparison of magnitude fitting and mixed fitting was performed over a range of echo times and fat fractions. In vivo acquisitions from 41 patients were reconstructed using 7 published spectral models of fat. T2-corrected STEAM-MRS was used as reference. Results: Simulations demonstrate that imperfectly calibrated spectral models of fat result in biases that depend on echo times and fat fraction. Mixed fitting is more robust against this bias than magnitude fitting. Multi-peak spectral models showed much smaller differences among themselves than when compared to the single-peak spectral model. In vivo studies show that all multi-peak models agree better with the reference standard (for mixed fitting, linear regression slope ranged from 0.967 to 1.045) than the single-peak model (for mixed fitting, slope=0.76). Conclusion: It is essential to use a multi-peak fat model for accurate quantification of fat with CSE-MRI. Further, fat quantification techniques using multi-peak fat models are comparable and no specific choice of spectral model is shown to be superior to the rest. PMID:25845713
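The fat spectral model enters the fit through the chemical shift-encoded signal equation. Below is a commonly used form, written here as an assumption, since the paper's exact notation may differ. In it, $\rho_W$ and $\rho_F$ are the water and fat signals, $t_n$ the echo times, $\alpha_m$ and $f_{F,m}$ the relative amplitudes and frequency offsets of the $M$ fat peaks, $\psi$ the field-map offset and $R_2^{*}$ the transverse relaxation rate:

```latex
s(t_n) = \Big( \rho_W + \rho_F \sum_{m=1}^{M} \alpha_m \, e^{i 2\pi f_{F,m} t_n} \Big)
         \, e^{i 2\pi \psi t_n} \, e^{-R_2^{*} t_n},
\qquad \sum_{m=1}^{M} \alpha_m = 1,

\mathrm{PDFF} = \frac{|\rho_F|}{|\rho_W| + |\rho_F|}.
```

A single-peak model is the special case $M = 1$, $\alpha_1 = 1$. A multi-peak model fixes the $\alpha_m$ and $f_{F,m}$ from a calibrated fat spectrum. If those fixed values are miscalibrated, the fitted $\rho_F$ and $\rho_W$ absorb the mismatch. This is one plausible route to the echo-time- and fat-fraction-dependent bias reported above.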
A secure distributed logistic regression protocol for the detection of rare adverse drug events
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat
2013-01-01
Background: There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. Objective: To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. Methods: We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. Results: The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. Conclusion: The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs.
We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models. PMID:22871397
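Why can a distributed fit reproduce the pooled estimates exactly? The Newton-Raphson update for logistic regression depends on the data only through sums of per-site score vectors and information matrices. The sketch below illustrates that equivalence. It is not the authors' protocol: it omits all of the secure multi-party computation that protects these summaries. The function names, the three simulated sites and the coefficients are hypothetical.

```python
import numpy as np

def site_summaries(X, y, beta):
    """Score vector and Fisher information computed locally at one site."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    score = X.T @ (y - p)
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return score, info

def fit_distributed(sites, n_features, iters=25):
    """Newton-Raphson where the analysis center only sees summed per-site summaries."""
    beta = np.zeros(n_features)
    for _ in range(iters):
        score = np.zeros(n_features)
        info = np.zeros((n_features, n_features))
        for X, y in sites:
            site_score, site_info = site_summaries(X, y, beta)
            score += site_score
            info += site_info
        beta = beta + np.linalg.solve(info, score)
    return beta

# Three simulated sites, each with an intercept and two covariates.
rng = np.random.default_rng(1)
true_beta = np.array([-0.5, 1.0, -2.0])
sites = []
for _ in range(3):
    X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
    y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
    sites.append((X, y))

beta_distributed = fit_distributed(sites, 3)
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = fit_distributed([(X_all, y_all)], 3)
```

In a real deployment, the per-site `score` and `info` values are what a secure summation protocol would hide from the analysis center and from other sites.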
Handling limited datasets with neural networks in medical applications: A small-data approach.
Shaikhina, Torgyn; Khovanova, Natalia A
2017-01-01
Single-centre studies in the medical domain are often characterised by limited samples due to the complexity and high cost of patient data collection. Machine learning methods for regression modelling of small datasets (fewer than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for the application of artificial neural networks (NNs) to regression tasks involving small medical datasets. To address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the method of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to the state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied for the prediction of compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85 MPa. When evaluated on independent test samples, the NN achieved an accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows for an NN modelled with as few as 56 samples to generalise on 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for application of regression NNs to medical problems characterised by limited dataset sizes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Ioannidis, J P; McQueen, P G; Goedert, J J; Kaslow, R A
1998-03-01
Complex immunogenetic associations of disease involving a large number of gene products are difficult to evaluate with traditional statistical methods and may require complex modeling. The authors evaluated the performance of feed-forward backpropagation neural networks in predicting rapid progression to acquired immunodeficiency syndrome (AIDS) for patients with human immunodeficiency virus (HIV) infection on the basis of major histocompatibility complex variables. Networks were trained on data from patients from the Multicenter AIDS Cohort Study (n = 139) and then validated on patients from the DC Gay cohort (n = 102). The outcome of interest was rapid disease progression, defined as progression to AIDS in <6 years from seroconversion. Human leukocyte antigen (HLA) variables were selected as network inputs with multivariate regression and a previously described algorithm selecting markers with extreme point estimates for progression risk. Network performance was compared with that of logistic regression. Networks with 15 HLA inputs and a single hidden layer of five nodes achieved a sensitivity of 87.5% and specificity of 95.6% in the training set, vs. 77.0% and 76.9%, respectively, achieved by logistic regression. When validated on the DC Gay cohort, networks averaged a sensitivity of 59.1% and specificity of 74.3%, vs. 53.1% and 61.4%, respectively, for logistic regression. Neural networks offer further support to the notion that HIV disease progression may be dependent on complex interactions between different class I and class II alleles and transporters associated with antigen processing variants. The effect in the current models is of moderate magnitude, and more data as well as other host and pathogen variables may need to be considered to improve the performance of the models. Artificial intelligence methods may complement linear statistical methods for evaluating immunogenetic associations of disease.
A secure distributed logistic regression protocol for the detection of rare adverse drug events.
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat
2013-05-01
There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. 
We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models.
Model selection bias and Freedman's paradox
Lukacs, P.M.; Burnham, K.P.; Anderson, D.R.
2010-01-01
In situations where limited knowledge of a system exists and the ratio of data points to variables is small, variable selection methods can often be misleading. Freedman (Am Stat 37:152-155, 1983) demonstrated how common it is to select completely unrelated variables as highly "significant" when the number of data points is similar in magnitude to the number of variables. A new type of model averaging estimator based on model selection with Akaike's AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly "best" model from a (often large) set of models employing many predictor variables. The new model averaging estimator helps reduce these problems and provides confidence interval coverage at the nominal level while traditional stepwise selection has poor inferential properties. © The Institute of Statistical Mathematics, Tokyo 2009.
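Akaike weights are the usual building block of AIC-based model averaging. A model's weight is its relative likelihood exp(-Δ/2), normalized over the candidate set, where Δ is its AIC minus the smallest AIC. The generic sketch below computes them. The paper's specific estimator may differ in detail. The AIC values and coefficients are hypothetical. Here a predictor absent from a model contributes a coefficient of 0 for that model. As a result, a spurious predictor that appears only in poorly supported models is shrunk toward zero.

```python
import math

def akaike_weights(aics):
    """w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), with delta_i = AIC_i - min AIC."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def model_averaged_coefficient(coefs, aics):
    """Weighted average of one coefficient across models (0.0 where it is excluded)."""
    return sum(w * b for w, b in zip(akaike_weights(aics), coefs))

# Hypothetical candidate set: AIC of each model and the coefficient of one predictor in it.
aics = [100.0, 102.0, 110.0]
coefs = [0.8, 0.0, 0.5]
weights = akaike_weights(aics)
averaged = model_averaged_coefficient(coefs, aics)
```

Reporting `averaged` instead of the coefficient from the single lowest-AIC model (0.8 here) is what counters the selection bias described above.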
Field-scale investigation of pulverized coal mill power consumption
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ganguli, R.; Bandopadhyay, S.
2008-08-15
Twenty field-scale tests were conducted in a 28 MW pulverized coal power plant in Healy, Alaska, to examine mill power consumption in relation to coal grind size. The intent of this field-scale study was to verify whether grind size truly impacted power consumption by a detectable amount. The regression model developed from the data indicates that grind size does impact mill power consumption, with finer grinds consuming significantly more power than coarser grinds. However, other factors such as coal hardness (i.e., the lower the Hardgrove Grindability Index, or the harder the coal, the higher the power consumption) and mill throughput (i.e., the higher the throughput, the higher the power consumption) had to be included before the impact of grind size could be isolated. It was also observed that combining amperage and flow rate into a single parameter, i.e., specific amperage, hurt modeling. Cost analysis based on the regression model indicates a power savings of $19,972 per year if the coal were ground to 50% passing 76 μm rather than the industry standard of 70% passing 76 μm. The study also demonstrated that size reduction constituted a significant portion of the power consumption.
Brady, Amie M.G.; Plona, Meg B.
2009-01-01
During the recreational season of 2008 (May through August), a regression model relating turbidity to concentrations of Escherichia coli (E. coli) was used to predict recreational water quality in the Cuyahoga River at the historical community of Jaite, within the present city of Brecksville, Ohio, a site centrally located within Cuyahoga Valley National Park. Samples were collected three days per week at Jaite and at three other sites on the river. Concentrations of E. coli were determined and compared to environmental and water-quality measures and to concentrations predicted with a regression model. Linear relations between E. coli concentrations and turbidity, gage height, and rainfall were statistically significant for Jaite. Relations between E. coli concentrations and turbidity were statistically significant for the three additional sites, and relations between E. coli concentrations and gage height were significant at the two sites where gage-height data were available. The turbidity model correctly predicted concentrations of E. coli above or below Ohio's single-sample standard for primary-contact recreation for 77 percent of samples collected at Jaite.
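A turbidity model of this kind can be sketched as follows. Regress E. coli on turbidity and predict concentrations for new samples. Then score the model by how often the prediction and the measurement fall on the same side of the single-sample standard. The abstract does not give the model's form, so the log10-log10 relation here is an assumption. All data and the `STANDARD` value are hypothetical placeholders, not Ohio's actual standard.

```python
import math

def fit_log_linear(turbidity, ecoli):
    """OLS fit of log10(E. coli) = b0 + b1 * log10(turbidity); returns (b0, b1)."""
    x = [math.log10(t) for t in turbidity]
    y = [math.log10(c) for c in ecoli]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def predict(b0, b1, turbidity):
    """Predicted E. coli concentration, back-transformed from the log10 scale."""
    return 10 ** (b0 + b1 * math.log10(turbidity))

def exceedance_agreement(predicted, observed, standard):
    """Fraction of samples where prediction and observation fall on the same side of the standard."""
    hits = sum((p > standard) == (o > standard) for p, o in zip(predicted, observed))
    return hits / len(observed)

# Hypothetical calibration data (turbidity in NTU, E. coli in CFU/100 mL).
b0, b1 = fit_log_linear([2, 5, 10, 20, 50], [40, 100, 200, 400, 1000])

# Hypothetical new samples and a placeholder standard; substitute the applicable value.
STANDARD = 250.0
new_turbidity = [3, 12, 15, 30]
new_observed = [50, 300, 200, 700]
predicted = [predict(b0, b1, t) for t in new_turbidity]
agreement = exceedance_agreement(predicted, new_observed, STANDARD)
```

The 77 percent figure reported above is this kind of agreement rate, computed on the actual Jaite samples.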
Nygård, Lotte; Vogelius, Ivan R; Fischer, Barbara M; Kjær, Andreas; Langer, Seppo W; Aznar, Marianne C; Persson, Gitte F; Bentzen, Søren M
2018-04-01
The aim of the study was to build a model of first failure site- and lesion-specific failure probability after definitive chemoradiotherapy for inoperable NSCLC. We retrospectively analyzed 251 patients receiving definitive chemoradiotherapy for NSCLC at a single institution between 2009 and 2015. All patients were scanned by fludeoxyglucose positron emission tomography/computed tomography for radiotherapy planning. Clinical patient data and fludeoxyglucose positron emission tomography standardized uptake values from primary tumor and nodal lesions were analyzed by using multivariate cause-specific Cox regression. In patients experiencing locoregional failure, multivariable logistic regression was applied to assess risk of each lesion being the first site of failure. The two models were used in combination to predict probability of lesion failure accounting for competing events. Adenocarcinoma had a lower hazard ratio (HR) of locoregional failure than squamous cell carcinoma (HR = 0.45, 95% confidence interval [CI]: 0.26-0.76, p = 0.003). Distant failures were more common in the adenocarcinoma group (HR = 2.21, 95% CI: 1.41-3.48, p < 0.001). Multivariable logistic regression of individual lesions at the time of first failure showed that primary tumors were more likely to fail than lymph nodes (OR = 12.8, 95% CI: 5.10-32.17, p < 0.001). Increasing peak standardized uptake value was significantly associated with lesion failure (OR = 1.26 per unit increase, 95% CI: 1.12-1.40, p < 0.001). The electronic model is available at http://bit.ly/LungModelFDG. We developed a failure site-specific competing risk model based on patient- and lesion-level characteristics. Failure patterns differed between adenocarcinoma and squamous cell carcinoma, illustrating the limitation of aggregating them into NSCLC. Failure site-specific models add complementary information to conventional prognostic models. Copyright © 2018 International Association for the Study of Lung Cancer. 
Published by Elsevier Inc. All rights reserved.
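The combination step can be read as a product of two probabilities: the probability that locoregional failure occurs first (from the cause-specific hazards) and the probability that a given lesion is the failure site given locoregional failure (from the lesion-level logistic model). The sketch below is a minimal illustration under stated assumptions, not the published model. It assumes constant cause-specific hazards so the cumulative incidence has a closed form. It uses the odds ratios reported above (12.8 for primary tumor versus node, 1.26 per unit of peak SUV) as logistic slopes. The intercept and hazard rates are hypothetical.

```python
import math


def cumulative_incidence(h_cause, h_competing, t):
    """Cumulative incidence of one cause by time t, assuming constant
    cause-specific hazards (an exponential-model simplification)."""
    total = h_cause + h_competing
    return (h_cause / total) * (1.0 - math.exp(-total * t))


def lesion_failure_given_lr(intercept, is_primary, suv_peak):
    """Logistic probability that a lesion is a first failure site, given
    locoregional failure. Slopes are the log odds ratios quoted in the
    abstract; the intercept is a hypothetical placeholder."""
    logit = intercept + math.log(12.8) * is_primary + math.log(1.26) * suv_peak
    return 1.0 / (1.0 + math.exp(-logit))


def lesion_failure_probability(h_lr, h_competing, t, intercept, is_primary, suv_peak):
    # P(lesion fails first by t) = P(LR failure first by t) * P(this lesion | LR failure)
    return (cumulative_incidence(h_lr, h_competing, t)
            * lesion_failure_given_lr(intercept, is_primary, suv_peak))


# Hypothetical yearly hazards (locoregional vs. distant failure or death) and intercept.
p_primary = lesion_failure_probability(0.20, 0.30, 2.0, -4.0, 1, 10.0)
p_node = lesion_failure_probability(0.20, 0.30, 2.0, -4.0, 0, 10.0)
print(round(p_primary, 3), round(p_node, 3))
```

Treating distant failure and death as the competing event keeps the locoregional cumulative incidence below its naive one-minus-survival value, which is the point of accounting for competing events.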
Linden, Ariel
2018-05-11
Interrupted time series analysis (ITSA) is an evaluation methodology in which a single treatment unit's outcome is studied serially over time and the intervention is expected to "interrupt" the level and/or trend of that outcome. ITSA is commonly analyzed using methods that may produce biased results if model assumptions are violated. In this paper, treatment effects are instead assessed by using forecasting methods to closely fit the preintervention observations and then forecast the post-intervention trend. A treatment effect may be inferred if the actual post-intervention observations diverge from the forecasts by some specified amount. The forecasting approach is demonstrated using the effect of California's Proposition 99 for reducing cigarette sales. Three forecast models are fit to the preintervention series: linear regression (REG), Holt-Winters (HW) non-seasonal smoothing, and autoregressive integrated moving average (ARIMA). Forecasts from each model are then generated into the post-intervention period. The actual observations are then compared with the forecasts to assess intervention effects. The preintervention data were fit best by HW, followed closely by ARIMA. REG fit the data poorly. The actual post-intervention observations were above the HW and ARIMA forecasts, suggesting no intervention effect, but below the REG forecasts (suggesting a treatment effect), thereby raising doubts about any definitive conclusion of a treatment effect. In a single-group ITSA, treatment effects are likely to be biased if the model is misspecified. Therefore, evaluators should consider using forecast models to accurately fit the preintervention data and generate plausible counterfactual forecasts, thereby improving causal inference of treatment effects in single-group ITSA studies. © 2018 John Wiley & Sons, Ltd.
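The forecasting idea can be illustrated with Holt's linear exponential smoothing, the non-seasonal member of the Holt-Winters family. This is a minimal sketch under stated assumptions. The smoothing weights `alpha` and `beta` are fixed by hand here; in practice they would be chosen to minimize in-sample forecast error. The series values are hypothetical, not the Proposition 99 data. The divergence threshold that would count as an effect is left to the analyst.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear (non-seasonal) exponential smoothing.

    Tracks a level and a trend through `series` and returns forecasts for
    the next `horizon` periods."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def divergence(actual, forecast):
    """Actual minus forecast for each post-intervention period."""
    return [a - f for a, f in zip(actual, forecast)]


# Hypothetical annual sales: fit on the pre-period, forecast the post-period.
pre = [120.0, 118.5, 116.0, 115.2, 112.9, 111.0, 108.7]
post = [101.0, 97.5, 93.0]
forecast = holt_forecast(pre, alpha=0.6, beta=0.3, horizon=len(post))
gaps = divergence(post, forecast)
print([round(f, 1) for f in forecast], [round(g, 1) for g in gaps])
```

Negative gaps would indicate observations below the counterfactual forecast. As the abstract shows, the conclusion can hinge on which forecast model supplies that counterfactual.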
Wang, Tianyu; Nabavi, Sheida
2018-04-24
Differential gene expression analysis is a central task in single-cell RNA sequencing (scRNAseq) analysis, used to discover specific changes in the expression levels of individual cell types. Because scRNAseq data exhibit multimodality, large numbers of zero counts, and sparsity, they differ from traditional bulk RNA sequencing (RNAseq) data. These new challenges have driven the development of new methods for identifying differentially expressed (DE) genes. In this study, we proposed a new method, SigEMD, that combines a data imputation approach, a logistic regression model, and a nonparametric method based on the Earth Mover's Distance to identify DE genes in scRNAseq data precisely and efficiently. The regression model and data imputation are used to reduce the impact of the large number of zero counts, and the nonparametric method is used to improve the sensitivity of detecting DE genes from multimodal scRNAseq data. By additionally employing gene interaction network information to adjust the final states of DE genes, we further reduce false positive DE calls. We used simulated and real datasets to evaluate the detection accuracy of the proposed method and to compare its performance with that of other differential expression analysis methods. Results indicate that the proposed method performs well overall in terms of detection precision, sensitivity, and specificity. Copyright © 2018 Elsevier Inc. All rights reserved.
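The Earth Mover's Distance at the core of SigEMD compares the whole distribution of a gene's expression in two groups rather than only their means. That is why it can respond to multimodal differences. For one-dimensional samples it equals the area between the two empirical cumulative distribution functions. The sketch below computes only that distance, for hypothetical expression values. The imputation, logistic regression, significance testing, and network adjustment steps of SigEMD are not shown.

```python
from bisect import bisect_right


def emd_1d(a, b):
    """Earth Mover's (1-Wasserstein) distance between two 1-D samples.

    Equals the area between the two empirical CDFs, which are step
    functions that only change at observed values."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


group_a = [0, 0, 0, 2.1, 2.3, 5.0, 5.2]    # hypothetical, zero-inflated and bimodal
group_b = [0, 2.0, 2.2, 2.4, 2.5, 2.6, 2.2]
print(round(emd_1d(group_a, group_b), 3))
```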
Crosby, Richard A; Mena, Leandro; Smith, Rachel Vickers
2018-06-01
The aim of this study is to determine, among young Black men who have sex with men (YBMSM), the 12-month efficacy of a single-session, clinic-based intervention promoting condom use to enhance sexual pleasure (purpose 1) and the use of condoms from the start-to-finish of anal sex (purpose 2). A pre-test, post-test randomized controlled trial was conducted, using a 12-month period of follow-up observation, in STI clinics. Data from 394 YBMSM completing baseline and 12-month follow-up assessments were analyzed. The experimental condition comprised a one-to-one, interactive program (Focus on the Future) designed for tailored delivery. Regarding study purpose 1, in an age-adjusted linear regression model for 277 HIV-uninfected men, there was a significant effect of the intervention (Beta = 0.13, P = 0.036) relative to more favorable sexual experiences when using condoms. Regarding study purpose 2, in an adjusted logistic regression model, for HIV-uninfected men, there was a significant effect of the intervention (AOR = 0.54, P = 0.048) relative to using condoms from start-to-finish of anal sex. Significant effects for HIV-infected men were not observed. A small, but non-significant, effect was observed relative to men's self-report of always using condoms. This single-session program may be a valuable counseling tool for use in conjunction with pre-exposure prophylaxis-related care for HIV-uninfected YBMSM.
Morrell, Glen R; Ikizler, Talat A; Chen, Xiaorui; Heilbrun, Marta E; Wei, Guo; Boucher, Robert; Beddhu, Srinivasan
2016-07-01
We investigate whether psoas or paraspinous muscle area measured on a single L4-L5 image is a useful measure of whole lean body mass (LBM) compared to dedicated midthigh magnetic resonance imaging (MRI). Observational study. Outpatient dialysis units and a research clinic. One hundred five adult participants on maintenance hemodialysis. No control group was used. Psoas muscle area, paraspinous muscle area, and midthigh muscle area (MTMA) were measured by magnetic resonance imaging. LBM was measured by dual-energy X-ray absorptiometry scan. In separate multivariable linear regression models, psoas, paraspinous, and MTMA were associated with increase in LBM. In separate multivariate logistic regression models, C statistics for diagnosis of sarcopenia (defined as <25th percentile of LBM) were 0.69 for paraspinous muscle area, 0.81 for psoas muscle area, and 0.89 for MTMA. With sarcopenia defined as <10th percentile of LBM, the corresponding C statistics were 0.71, 0.92, and 0.94. We conclude that psoas muscle area provides a good measure of whole-body muscle mass, better than paraspinous muscle area but slightly inferior to midthigh measurement. Hence, in body composition studies a single axial MR image at the L4-L5 level can be used to provide information on both fat and muscle and may eliminate the need for time-consuming measurement of muscle area in the thigh. Copyright © 2016 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.
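The C statistics above measure discrimination: the probability that a randomly chosen participant with sarcopenia is ranked as higher risk than a randomly chosen participant without it, with ties counted as one half. The sketch below is minimal. It assumes the risk score is simply the negated muscle area (smaller muscle, higher risk) and uses hypothetical values. In the study the scores came from the fitted logistic models.

```python
def c_statistic(scores, labels):
    """Concordance (C) statistic for a binary outcome, equal to the ROC AUC.

    Counts case/non-case pairs where the case has the higher score;
    tied pairs count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


psoas_area = [9.8, 12.1, 14.5, 8.7, 16.0, 12.5]   # hypothetical areas, cm^2
sarcopenia = [1, 0, 0, 1, 0, 1]                    # 1 = LBM below the 25th percentile
c_psoas = c_statistic([-a for a in psoas_area], sarcopenia)
print(round(c_psoas, 3))
```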
Confirmatory factor analysis of the female sexual function index.
Opperman, Emily A; Benson, Lindsay E; Milhausen, Robin R
2013-01-01
The Female Sexual Functioning Index (Rosen et al., 2000) was designed to assess the key dimensions of female sexual functioning using six domains: desire, arousal, lubrication, orgasm, satisfaction, and pain. A full-scale score was proposed to represent women's overall sexual function. The fifth revision to the Diagnostic and Statistical Manual (DSM) is currently underway and includes a proposal to combine desire and arousal problems. The objective of this article was to evaluate and compare four models of the Female Sexual Functioning Index: (a) single-factor model, (b) six-factor model, (c) second-order factor model, and (d) five-factor model combining the desire and arousal subscales. Cross-sectional and observational data from 85 women were used to conduct a confirmatory factor analysis on the Female Sexual Functioning Index. Local and global goodness-of-fit measures, the chi-square test of differences, squared multiple correlations, and regression weights were used. The single-factor model fit was not acceptable. The original six-factor model was confirmed, and good model fit was found for the second-order and five-factor models. Delta chi-square tests of differences supported best fit for the six-factor model, validating usage of the six domains. However, when revisions are made to the DSM-5, the Female Sexual Functioning Index can adapt to reflect these changes and remain a valid assessment tool for women's sexual functioning, as the five-factor structure was also supported.
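The delta chi-square test compares nested models. The difference between their chi-square statistics is referred to a chi-square distribution whose degrees of freedom equal the difference in model degrees of freedom. Below is a minimal standard-library sketch. It assumes ordinary maximum-likelihood chi-square statistics; robust or scaled statistics need a corrected difference test, which is not shown. The fit values are hypothetical. The upper-tail probability uses the closed-form recurrence for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1.

    Uses Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def chi_square_difference(chisq_restricted, df_restricted, chisq_general, df_general):
    """Delta chi-square test; the restricted (nested) model has more df."""
    delta = chisq_restricted - chisq_general
    ddf = df_restricted - df_general
    return delta, ddf, chi2_sf(delta, ddf)


# Hypothetical fits: a five-factor model nested in a six-factor model.
delta, ddf, p = chi_square_difference(212.4, 179, 198.7, 174)
print(round(delta, 1), ddf, round(p, 4))
```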
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
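The core of nonlinear regression is an iterative linearization such as the Gauss-Newton method. At each step the model is approximated by its Jacobian with respect to the parameters, and the resulting linear least-squares problem is solved for an update. The sketch below is a generic two-parameter version with step halving for stability. The exponential-decline model, its parameters, and the data are hypothetical stand-ins, not the two-dimensional anisotropic flow model described in the report.

```python
import math


def gauss_newton_2p(model, jacobian, x, y, theta, iterations=50):
    """Damped Gauss-Newton for a two-parameter least-squares fit.

    model(xi, theta) returns a prediction; jacobian(xi, theta) returns the
    partial derivatives of that prediction with respect to both parameters."""
    def sse(t):
        return sum((yi - model(xi, t)) ** 2 for xi, yi in zip(x, y))

    t0, t1 = theta
    for _ in range(iterations):
        # Accumulate J^T J (symmetric 2x2) and J^T r.
        a11 = a12 = a22 = g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            r = yi - model(xi, (t0, t1))
            j0, j1 = jacobian(xi, (t0, t1))
            a11 += j0 * j0
            a12 += j0 * j1
            a22 += j1 * j1
            g0 += j0 * r
            g1 += j1 * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            raise ZeroDivisionError("J^T J is singular; parameters not identifiable")
        # Solve the 2x2 normal equations for the Gauss-Newton step.
        d0 = (a22 * g0 - a12 * g1) / det
        d1 = (a11 * g1 - a12 * g0) / det
        # Halve the step until the sum of squares does not increase.
        step, current = 1.0, sse((t0, t1))
        while step > 1e-8 and sse((t0 + step * d0, t1 + step * d1)) > current:
            step /= 2.0
        t0, t1 = t0 + step * d0, t1 + step * d1
    return t0, t1


# Hypothetical model: head declining exponentially with distance, h = a * exp(-b * x).
def head(xi, t):
    return t[0] * math.exp(-t[1] * xi)


def head_jacobian(xi, t):
    e = math.exp(-t[1] * xi)
    return e, -t[0] * xi * e


distances = [float(i) for i in range(10)]
heads = [10.0 * math.exp(-0.3 * d) for d in distances]   # noise-free synthetic data
a_hat, b_hat = gauss_newton_2p(head, head_jacobian, distances, heads, (8.0, 0.2))
print(round(a_hat, 4), round(b_hat, 4))
```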
NASA Astrophysics Data System (ADS)
Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.
2017-05-01
Selection in plant breeding could be more effective and more efficient if it were based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore interaction effects among genes and higher-order nonlinear effects. A deep belief network (DBN), one of the architectures used in deep learning, can model data at a high level of abstraction that captures nonlinear effects in the data. This study implemented a DBN to develop a GP model using whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of traits in maize. The maize dataset was obtained from the Global Maize Program of CIMMYT (International Maize and Wheat Improvement Center). Based on Pearson correlation, DBN outperformed the other methods, reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for putatively non-additive traits. DBN achieved a correlation of 0.579 (on a scale from -1 to 1).
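Genomic prediction accuracy is commonly reported, as here, as the Pearson correlation between predicted values and observed phenotypes in a held-out set. A minimal sketch with hypothetical predicted and observed values:

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between predicted and observed values."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mp, mo = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


predicted = [2.1, 3.4, 1.8, 4.0, 2.9, 3.1]   # hypothetical genomic predictions
observed = [2.5, 3.0, 2.2, 4.4, 2.6, 3.6]    # hypothetical phenotypes
print(round(pearson_r(predicted, observed), 3))
```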
A New Monte Carlo Method for Estimating Marginal Likelihoods.
Wang, Yu-Bo; Chen, Ming-Hui; Kuo, Lynn; Lewis, Paul O
2018-06-01
Evaluating the marginal likelihood in Bayesian analysis is essential for model selection. Estimators based on a single Markov chain Monte Carlo sample from the posterior distribution include the harmonic mean estimator and the inflated density ratio estimator. We propose a new class of Monte Carlo estimators based on this single Markov chain Monte Carlo sample. This class can be thought of as a generalization of the harmonic mean and inflated density ratio estimators using a partition weighted kernel (likelihood times prior). We show that our estimator is consistent and has better theoretical properties than the harmonic mean and inflated density ratio estimators. In addition, we provide guidelines on choosing optimal weights. Simulation studies were conducted to examine the empirical performance of the proposed estimator. We further demonstrate the desirable features of the proposed estimator with two real data sets: one is from a prostate cancer study using an ordinal probit regression model with latent variables; the other is for the power prior construction from two Eastern Cooperative Oncology Group phase III clinical trials using the cure rate survival model with similar objectives.
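Single-sample estimators of this kind rest on one identity. Suppose the posterior is q(θ)/c, where q is the kernel (likelihood times prior) and c the marginal likelihood. Then the posterior mean of g(θ)/q(θ) equals (∫g)/c for any integrable g. The harmonic mean estimator takes g to be the prior, which makes it unstable. The sketch below instead takes g to be the indicator of a short interval A around the posterior mean, which keeps the ratio bounded. It is a simplified illustration of the identity, not the authors' partition weighted kernel estimator. It assumes a conjugate normal model so that the exact answer is known, and it draws independent posterior samples in place of an MCMC chain.

```python
import math
import random


def log_kernel(theta, y):
    """log q(theta): log-likelihood plus log-prior for y_i ~ N(theta, 1), theta ~ N(0, 1)."""
    log_norm = -0.5 * math.log(2.0 * math.pi)
    loglik = sum(log_norm - 0.5 * (yi - theta) ** 2 for yi in y)
    return loglik + log_norm - 0.5 * theta ** 2


def exact_log_marginal(y):
    """Closed form for the same model: y ~ N(0, I + 11^T)."""
    n, s = len(y), sum(y)
    quad = sum(v * v for v in y) - s * s / (n + 1)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(n + 1) - 0.5 * quad


def ratio_log_marginal(draws, y, lo, hi):
    """log c from c = |A| / E_post[1{theta in A} / q(theta)] with A = [lo, hi].

    A reference value `ref` is factored out so the exponentials stay near 1."""
    ref = log_kernel(0.5 * (lo + hi), y)
    total = sum(math.exp(ref - log_kernel(t, y)) for t in draws if lo <= t <= hi)
    return math.log(hi - lo) + ref - math.log(total / len(draws))


random.seed(1)
y = [0.3, -0.5, 1.2, 0.8, 0.1]
n = len(y)
post_mean, post_sd = sum(y) / (n + 1), math.sqrt(1.0 / (n + 1))
# Independent posterior draws stand in for an MCMC chain (possible here by conjugacy).
draws = [random.gauss(post_mean, post_sd) for _ in range(20000)]
est = ratio_log_marginal(draws, y, post_mean - post_sd, post_mean + post_sd)
exact = exact_log_marginal(y)
print(round(est, 4), round(exact, 4))
```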
Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction.
Cheng, Hao; Garrick, Dorian J; Fernando, Rohan L
2017-01-01
A random multiple-regression model that simultaneously fits all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for best linear unbiased prediction using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model. Naive application of leave-one-out cross validation is computationally intensive because the training and validation analyses must be repeated n times, once for each observation. Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis. The efficient strategies are 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers, and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with the number of observations.
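The saving comes from not refitting. For a linear smoother, each leave-one-out residual can be recovered from a single fit as e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix. The sketch below checks this numerically for ridge regression with a fixed shrinkage parameter. Ridge regression is the form marker-effect BLUP takes when the variance ratio is treated as known. This illustrates the general shortcut and is not necessarily the exact strategy the authors derived. The data are hypothetical, and the intercept is shrunk along with the other coefficients for simplicity. The pure-Python matrix routines are only suitable for tiny problems.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def penalized_inverse(X, lam):
    """(X'X + lam * I)^{-1}; every coefficient, including the intercept, is shrunk."""
    a = matmul(transpose(X), X)
    for i in range(len(a)):
        a[i][i] += lam
    return inverse(a)


def ridge_fit(X, y, lam):
    xty = matmul(transpose(X), [[v] for v in y])
    return [row[0] for row in matmul(penalized_inverse(X, lam), xty)]


def loo_residuals_fast(X, y, lam):
    """One fit: LOO residual_i = e_i / (1 - h_ii), with H = X (X'X + lam I)^{-1} X'."""
    a_inv = penalized_inverse(X, lam)
    beta = [row[0] for row in matmul(a_inv, matmul(transpose(X), [[v] for v in y]))]
    p = len(beta)
    out = []
    for xi, yi in zip(X, y):
        e = yi - sum(b * v for b, v in zip(beta, xi))
        h = sum(xi[j] * a_inv[j][k] * xi[k] for j in range(p) for k in range(p))
        out.append(e / (1.0 - h))
    return out


def loo_residuals_naive(X, y, lam):
    """n refits, each leaving one observation out."""
    out = []
    for i in range(len(y)):
        beta = ridge_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        out.append(y[i] - sum(b * v for b, v in zip(beta, X[i])))
    return out


X = [[1, 0.5, 1.2], [1, -0.3, 0.7], [1, 1.1, -0.4], [1, 0.0, 0.9],
     [1, -1.2, 0.3], [1, 0.8, 1.5], [1, 0.4, -0.8]]   # hypothetical design
y = [2.1, 0.4, 1.9, 1.0, -0.7, 2.8, 0.6]
fast = loo_residuals_fast(X, y, lam=0.5)
naive = loo_residuals_naive(X, y, lam=0.5)
print(max(abs(f - s) for f, s in zip(fast, naive)))
```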
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
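Under a cumulative logit model, the probability that an essay scores at or below category k is a logistic function of a category-specific cutpoint minus a linear predictor built from the essay features. Individual category probabilities are differences of those cumulative probabilities. A minimal sketch with hypothetical cutpoints and a hypothetical linear predictor value for a four-point score scale (sign conventions for the linear predictor differ between software packages):

```python
import math


def cumulative_logit_probs(cutpoints, eta):
    """Category probabilities when P(Y <= k) = 1 / (1 + exp(-(cutpoint_k - eta))).

    cutpoints must be strictly increasing; eta is the linear predictor x'beta.
    Returns len(cutpoints) + 1 probabilities, one per score category."""
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical four-point scale (cutpoints between scores 1|2, 2|3, 3|4).
cuts = [-2.0, 0.0, 2.0]
probs = cumulative_logit_probs(cuts, eta=0.7)
expected_score = sum((k + 1) * p for k, p in enumerate(probs))
print([round(p, 3) for p in probs], round(expected_score, 2))
```

Unlike a linear regression prediction, the output is a full distribution over score points, from which either the most likely score or the expected score can be reported.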
Integrative Analysis of High-throughput Cancer Studies with Contrasted Penalization
Shi, Xingjie; Liu, Jin; Huang, Jian; Zhou, Yong; Shia, BenChang; Ma, Shuangge
2015-01-01
In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms “classic” meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances on published methods by introducing contrast penalties, which can accommodate the within- and across-dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across-dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark with more accurate marker identification. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those identified by the benchmark and has better prediction performance. PMID:24395534
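The inner coordinate descent iteration can be illustrated with the plain lasso, where each coefficient is updated in turn by soft-thresholding its inner product with the partial residual. This is a minimal sketch of that building block only. It uses a single dataset and an ordinary L1 penalty, and it assumes centred columns and a centred response. It does not include the contrast penalties that smooth coefficients across datasets.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, stopping at zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Cyclic coordinate descent for (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X and the response y are already centred."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)                      # y - X b, kept up to date
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual that excludes b_j.
            rho = sum(row[j] * r for row, r in zip(X, resid)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i, row in enumerate(X):
                    resid[i] -= row[j] * delta
                b[j] = new
    return b


X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]   # hypothetical, orthogonal
y = [3.0, 1.0, -1.0, -3.0]
for lam in (0.0, 0.5, 1.5):
    print(lam, lasso_cd(X, y, lam))
```

With this orthogonal design the penalty simply shrinks each least-squares coefficient (2 and 1) toward zero by `lam`, so the weaker coefficient is the first to be dropped.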
Lin, Meihua; Li, Haoli; Zhao, Xiaolei; Qin, Jiheng
2013-01-01
Genome-wide analysis of gene-gene interactions has been recognized as a powerful avenue to identify the missing genetic components that cannot be detected by current single-point association analysis. Recently, several model-free methods (e.g., the commonly used information-based metrics and several logistic regression-based metrics) were developed for detecting non-linear dependence between genetic loci, but they are potentially at risk of inflated false positive error, particularly when the main effects at one or both loci are salient. In this study, we proposed two conditional entropy-based metrics to address this limitation. Extensive simulations demonstrated that, provided the disease is rare, the two proposed metrics consistently maintained the correct false positive rate. In scenarios for a common disease, our proposed metrics achieved better or comparable control of false positive error compared with four previously proposed model-free metrics. In terms of power, our methods outperformed several competing metrics in a range of common disease models. Furthermore, in real data analyses, both metrics succeeded in detecting interactions and were competitive with the originally reported results or the logistic regression approaches. In conclusion, the proposed conditional entropy-based metrics are promising as alternatives to current model-based approaches for detecting genuine epistatic effects. PMID:24339984
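Conditional entropy H(Y | X) measures how much uncertainty about disease status Y remains once a genotype variable X is known. It equals H(X, Y) minus H(X). The sketch below computes that building block in bits from hypothetical (genotype, status) observations, where X can be a single-locus genotype or a two-locus genotype pair. The abstract does not define the two proposed metrics, so they are not reproduced here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def conditional_entropy(pairs):
    """H(Y | X) = H(X, Y) - H(X) in bits, from a list of (x, y) observations."""
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    return entropy(list(joint.values())) - entropy(list(marginal_x.values()))


# Hypothetical observations: ((genotype at locus 1, genotype at locus 2), disease status)
obs = [((0, 0), 0), ((0, 1), 0), ((1, 1), 1), ((1, 1), 1),
       ((2, 0), 0), ((2, 1), 1), ((0, 0), 0), ((1, 0), 0)]
single = [(g[0], d) for g, d in obs]
print(round(conditional_entropy(single), 3), round(conditional_entropy(obs), 3))
```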
Ramus, Sara Mankoc; Cilensek, Ines; Petrovic, Mojca Globocnik; Soucek, Miroslav; Kruzliak, Peter; Petrovic, Daniel
2016-03-01
Oxidative stress plays an important role in the pathogenesis of diabetes and its complications. The aim of this study was to examine the possible association between seven single nucleotide polymorphisms (SNPs) of the Trx2/TXNIP and TrxR2 genes, which encode proteins involved in the thioredoxin antioxidant defence system, and the risk of diabetic retinopathy (DR). Cross-sectional case-control study. A total of 802 Slovenian patients with Type 2 diabetes mellitus (277 with DR and 525 without DR) were enrolled. Patients' genotypes for the SNPs rs8140110, rs7211, rs7212, rs4755, rs1548357, rs4485648 and rs5748469 were determined by the competitive allele-specific PCR method. Each genotype of the examined SNPs was regressed in a logistic model, assuming co-dominant, dominant and recessive models of inheritance, with duration of diabetes, HbA1c, insulin therapy, total cholesterol and LDL cholesterol levels as covariates. In the present study, for the first time, we identified an association between the rs4485648 polymorphism of the TrxR2 gene and DR in Caucasians with Type 2 DM. The estimated ORs of the adjusted logistic regression models were as follows: 4.4 for CT heterozygotes, 4.3 for TT homozygotes (co-dominant genetic model) and 4.4 for CT+TT genotypes (dominant genetic model). In our case-control study we were not able to demonstrate any association between rs8140110, rs7211, rs7212, rs4755, rs1548357, and rs5748469 and DR; however, our findings provide evidence that the rs4485648 polymorphism of the TrxR2 gene might exert an independent effect on the development of DR. Copyright © 2016 Elsevier Inc. All rights reserved.
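The three inheritance models differ only in how each genotype is coded before it enters the logistic regression. The sketch below is minimal and handles a biallelic SNP written as a two-letter genotype. For illustration it assumes T is the risk allele, as in the CT and TT genotypes reported for rs4485648. Covariates such as diabetes duration and HbA1c would be added as further columns.

```python
def encode_genotype(genotype, risk_allele, model):
    """Covariate coding of a two-letter genotype such as 'CT'.

    codominant: [heterozygote indicator, risk-allele homozygote indicator]
    dominant:   [carries at least one risk allele]
    recessive:  [carries two risk alleles]"""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype")
    copies = sum(1 for allele in genotype if allele == risk_allele)
    if model == "codominant":
        return [int(copies == 1), int(copies == 2)]
    if model == "dominant":
        return [int(copies >= 1)]
    if model == "recessive":
        return [int(copies == 2)]
    raise ValueError("unknown inheritance model: " + model)


for g in ("CC", "CT", "TT"):
    print(g, encode_genotype(g, "T", "codominant"),
          encode_genotype(g, "T", "dominant"), encode_genotype(g, "T", "recessive"))
```

Under the co-dominant coding the homozygous reference genotype (CC) is the baseline, so each of the two indicators yields its own odds ratio, matching the separate ORs reported for CT and TT.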
Henriksson, Tommy; Vescovi, Jason D; Fjellman-Wiklund, Anncristine; Gilenstam, Kajsa
2016-01-01
Objectives. The purpose of this study was to examine whether field-based and/or laboratory-based assessments are valid tools for predicting key performance characteristics of skating in competitive-level female hockey players. Design. Cross-sectional study. Methods. Twenty-three female ice hockey players aged 15–25 years (body mass: 66.1±6.3 kg; height: 169.5±5.5 cm), with 10.6±3.2 years of playing experience, volunteered to participate in the study. The field-based assessments included 20 m sprint, squat jump, countermovement jump, 30-second repeated jump test, standing long jump, single-leg standing long jump, 20 m shuttle run test, isometric leg pull, one-repetition maximum bench press, and one-repetition maximum squats. The laboratory-based assessments included body composition (dual energy X-ray absorptiometry), maximal aerobic power, and isokinetic strength (Biodex). The on-ice tests included agility cornering s-turn, cone agility skate, transition agility skate, and modified repeat skate sprint. Data were analyzed using stepwise multivariate linear regression analysis. Linear regression analysis was used to establish the relationship between key performance characteristics of skating and the predictor variables. Results. Regression models (adjusted R²) for the on-ice variables ranged from 0.244 to 0.663 for the field-based assessments and from 0.136 to 0.420 for the laboratory-based assessments. Single-leg tests were the strongest predictors for key performance characteristics of skating. Single-leg standing long jump alone explained 57.1%, 38.1%, and 29.1% of the variance in skating time during transition agility skate, agility cornering s-turn, and modified repeat skate sprint, respectively. Isokinetic peak torque in the quadriceps at 90° explained 42.0% and 32.2% of the variance in skating time during agility cornering s-turn and modified repeat skate sprint, respectively.
Conclusion. Field-based assessments, particularly single-leg tests, are an adequate substitute for more expensive and time-consuming laboratory assessments if the purpose is to gain knowledge about key performance characteristics of skating. PMID:27574474
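Statements such as "explained 57.1% of the variance" refer to the coefficient of determination R² or its adjusted form. Adjusted R² penalizes the number of predictors p relative to the sample size n, which matters with only 23 players. The sketch below computes both quantities for a single-predictor least-squares fit, using hypothetical jump distances and skating times.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(y, fitted):
    """Share of the total sum of squares accounted for by the fitted values."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (plus an intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


jump = [1.62, 1.75, 1.58, 1.81, 1.69, 1.72, 1.55, 1.78]     # hypothetical, metres
skate_time = [9.9, 9.4, 10.1, 9.2, 9.6, 9.5, 10.3, 9.3]      # hypothetical, seconds
b0, b1 = simple_ols(jump, skate_time)
fitted = [b0 + b1 * v for v in jump]
r2 = r_squared(skate_time, fitted)
print(round(r2, 3), round(adjusted_r_squared(r2, len(jump), 1), 3))
```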
Moderation analysis using a two-level regression model.
Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott
2014-10-01
Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
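The notation below is generic, for a single predictor x and a single moderator z; the paper's exact specification may differ. In this notation, the two-level model and its relation to moderated multiple regression can be written as follows, assuming the coefficient-level errors u are independent of the residual e. Substituting the second-level equations into the first gives the MMR mean structure, including the product term x z. The composite error has a variance that depends on x whenever the slope has its own random component u_{1i}, which is one way heteroscedasticity arises. When u_{1i} = 0 the error variance is constant and the model reduces to MMR.

```latex
\begin{aligned}
y_i &= \beta_{0i} + \beta_{1i} x_i + e_i,\\
\beta_{0i} &= \gamma_{00} + \gamma_{01} z_i + u_{0i},\qquad
\beta_{1i} = \gamma_{10} + \gamma_{11} z_i + u_{1i},\\
y_i &= \gamma_{00} + \gamma_{01} z_i + \gamma_{10} x_i + \gamma_{11} x_i z_i
      + \left(u_{0i} + u_{1i} x_i + e_i\right),\\
\operatorname{Var}\!\left(u_{0i} + u_{1i} x_i + e_i \mid x_i\right)
  &= \sigma_{u_0}^2 + 2\sigma_{u_0 u_1} x_i + \sigma_{u_1}^2 x_i^2 + \sigma_e^2 .
\end{aligned}
```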
Income and Child Maltreatment in Unmarried Families: Evidence from the Earned Income Tax Credit.
Berger, Lawrence M; Font, Sarah A; Slack, Kristen S; Waldfogel, Jane
2017-12-01
This study estimates the associations of income with both (self-reported) child protective services (CPS) involvement and parenting behaviors that proxy for child abuse and neglect risk among unmarried families. Our primary strategy follows the instrumental variables (IV) approach employed by Dahl and Lochner (2012), which leverages variation between states and over time in the generosity of the total state and federal Earned Income Tax Credit for which a family is eligible to identify exogenous variation in family income. As a robustness check, we also estimate standard OLS regressions (linear probability models), reduced form OLS regressions, and OLS regressions with the inclusion of a control function (each with and without family-specific fixed effects). Our micro-level data are drawn from the Fragile Families and Child Wellbeing Study, a longitudinal birth-cohort of relatively disadvantaged urban children who have been followed from birth to age nine. Results suggest that an exogenous increase in income is associated with reductions in behaviorally-approximated child neglect and CPS involvement, particularly among low-income single-mother families.
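With one instrument and one endogenous regressor, the instrumental variables estimate (equivalent to two-stage least squares in that case) is the ratio cov(z, y) / cov(z, x). It uses only the variation in x that is driven by the instrument. The sketch below is a minimal illustration on simulated data, not the study's specification, which also involves state and year variation, controls, and fixed effects. Here a hypothetical instrument stands in for EITC generosity, and an unobserved confounder biases the OLS slope upward.

```python
import random


def cov(a, b):
    """Sample covariance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def iv_slope(z, x, y):
    """Just-identified IV estimate of the effect of x on y using instrument z."""
    return cov(z, y) / cov(z, x)


def ols_slope(x, y):
    return cov(x, y) / cov(x, x)


random.seed(7)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]        # instrument (hypothetical)
u = [random.gauss(0, 1) for _ in range(n)]        # unobserved confounder
x = [zi + ui for zi, ui in zip(z, u)]             # endogenous "income"
y = [2.0 * xi + ui for xi, ui in zip(x, u)]       # outcome; true effect of x is 2
print(round(ols_slope(x, y), 3), round(iv_slope(z, x, y), 3))
```

In this setup OLS converges to 2.5 because x and the error share the confounder u, while the IV estimate converges to the true value 2.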
Hacisalihoglu, Gokhan; Larbi, Bismark; Settles, A Mark
2010-01-27
The objective of this study was to explore the potential of near-infrared reflectance (NIR) spectroscopy to determine individual seed composition in common bean (Phaseolus vulgaris L.). NIR spectra and analytical measurements of seed weight, protein, and starch were collected from 267 individual bean seeds representing 91 diverse genotypes. Partial least-squares (PLS) regression models were developed with 61 bean accessions randomly assigned to a calibration data set and 30 accessions assigned to an external validation set. Protein gave the most accurate PLS regression, with the external validation set having a standard error of prediction (SEP) = 1.6%. PLS regressions for seed weight and starch had sufficient accuracy for seed sorting applications, with SEP = 41.2 mg and 4.9%, respectively. Seed color had a clear effect on the NIR spectra, with black beans having a distinct spectral type. Seed coat color did not impact the accuracy of PLS predictions. This research demonstrates that NIR is a promising technique for simultaneous sorting of multiple seed traits in single bean seeds with no sample preparation.
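In NIR calibration work, the standard error of prediction is commonly computed on the external validation set as the standard deviation of the prediction residuals after removing their mean (the bias). The abstract does not state its formula, so that definition is an assumption here. The sketch below shows it alongside the root mean squared error of prediction, which keeps the bias, using hypothetical protein values.

```python
import math


def sep(observed, predicted):
    """Bias-corrected standard error of prediction (n - 1 denominator)."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    bias = sum(residuals) / n
    return math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))


def rmsep(observed, predicted):
    """Root mean squared error of prediction, which includes any bias."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


protein_lab = [22.1, 24.5, 19.8, 26.0, 23.3]   # hypothetical reference values, %
protein_nir = [21.5, 25.2, 20.9, 24.8, 23.0]   # hypothetical PLS predictions, %
print(round(sep(protein_lab, protein_nir), 2), round(rmsep(protein_lab, protein_nir), 2))
```

A constant offset between predictions and reference values leaves SEP at zero but not RMSEP, which is why the two are often reported together with the bias.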
The microcomputer scientific software series 2: general linear model--regression.
Rauscher, Harold M.
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Santori, G; Fontana, I; Bertocchi, M; Gasloli, G; Magoni Rossi, A; Tagliamacco, A; Barocci, S; Nocera, A; Valente, U
2010-05-01
A useful approach to reduce the number of discarded marginal kidneys and to increase the nephron mass is double kidney transplantation (DKT). In this study, we retrospectively evaluated the potential predictors for patient and graft survival in a single-center series of 59 DKT procedures performed between April 21, 1999, and September 21, 2008. The kidney recipients of mean age 63.27 +/- 5.17 years included 16 women (27%) and 43 men (73%). The donors of mean age 69.54 +/- 7.48 years included 32 women (54%) and 27 men (46%). The mean posttransplant dialysis time was 2.37 +/- 3.61 days. The mean hospitalization was 20.12 +/- 13.65 days. Average serum creatinine (SCr) at discharge was 1.5 +/- 0.59 mg/dL. In view of the limited numbers of recipient deaths (n = 4) and graft losses (n = 8) that occurred in our series, the proportional hazards assumption for each Cox regression model with P < .05 was tested by using correlation coefficients between transformed survival times and scaled Schoenfeld residuals, and checked with smoothed plots of Schoenfeld residuals. For patient survival, the variables that reached statistical significance were donor SCr (P = .007), donor creatinine clearance (P = .023), and recipient age (P = .047). Each significant model passed the Schoenfeld test. By entering these variables into a multivariate Cox model for patient survival, no further significance was observed. In the univariate Cox models performed for graft survival, statistical significance was noted for donor SCr (P = .027), SCr 3 months post-DKT (P = .043), and SCr 6 months post-DKT (P = .017). All significant univariate models for graft survival passed the Schoenfeld test. A final multivariate model retained SCr at 6 months (beta = 1.746, P = .042) and donor SCr (beta = .767, P = .090). In our analysis, SCr at 6 months seemed to emerge from both univariate and multivariate Cox models as a potential predictor of graft survival among DKT.
Multicenter studies with larger recipient populations and more graft losses should be performed to confirm our findings. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Forcey, G.M.; Linz, G.M.; Thogmartin, W.E.; Bleier, W.J.
2008-01-01
Blackbirds share wetland habitat with many waterfowl species in Bird Conservation Region 11 (BCR 11), the prairie potholes. Because of similar habitat preferences, there may be associations between blackbird populations and populations of one or more species of waterfowl in BCR 11. This study models populations of red-winged blackbirds and yellow-headed blackbirds as a function of multiple waterfowl species using data from the North American Breeding Bird Survey within BCR 11. For each blackbird species, we created a global model with blackbird abundance modeled as a function of 11 waterfowl species; nuisance effects (year, route, and observer) also were included in the model. Hierarchical Poisson regression models were fit using Markov chain Monte Carlo methods in WinBUGS 1.4.1. Waterfowl abundances were weakly associated with blackbird numbers, and no single waterfowl species showed a strong correlation with any blackbird species. These findings suggest waterfowl abundance from a single species is not likely a good bioindicator of blackbird abundance; however, a global model provided good fit for predicting red-winged blackbird abundance. Increased model complexity may be required for accurate predictions of blackbird abundance; the amount of data required to construct appropriate models may limit this approach for predicting blackbird abundance in the prairie potholes. Copyright © Taylor & Francis Group, LLC.
A matching framework to improve causal inference in interrupted time-series analysis.
Linden, Ariel
2018-04-01
Interrupted time-series analysis (ITSA) is a popular evaluation methodology in which a single treatment unit's outcome is studied over time and the intervention is expected to "interrupt" the level and/or trend of the outcome, subsequent to its introduction. When ITSA is implemented without a comparison group, the internal validity may be quite poor. Therefore, adding a comparable control group to serve as the counterfactual is always preferred. This paper introduces a novel matching framework, ITSAMATCH, to create a comparable control group by matching directly on covariates and then use these matches in the outcomes model. We evaluate the effect of California's Proposition 99 (passed in 1988) for reducing cigarette sales, by comparing California to other states not exposed to smoking reduction initiatives. We compare ITSAMATCH results to 2 commonly used matching approaches, synthetic controls (SYNTH), and regression adjustment; SYNTH reweights nontreated units to make them comparable to the treated unit, and regression adjusts covariates directly. Methods are compared by assessing covariate balance and treatment effects. Both ITSAMATCH and SYNTH achieved covariate balance and estimated similar treatment effects. The regression model found no treatment effect and produced inconsistent covariate adjustment. While the matching framework achieved results comparable to SYNTH, it has the advantage of being technically less complicated, while producing statistical estimates that are straightforward to interpret. Conversely, regression adjustment may "adjust away" a treatment effect. Given its advantages, ITSAMATCH should be considered as a primary approach for evaluating treatment effects in multiple-group time-series analysis. © 2017 John Wiley & Sons, Ltd.
Cefepime vs Other Antibacterial Agents for the Treatment of Enterobacter Species Bacteremia
Siedner, Mark J.; Galar, Alicia; Guzmán-Suarez, Belisa B.; Kubiak, David W.; Baghdady, Nour; Ferraro, Mary Jane; Hooper, David C.; O'Brien, Thomas F.; Marty, Francisco M.
2014-01-01
Background. Carbapenems are recommended for treatment of Enterobacter infections with AmpC phenotypes. Although isolates are typically susceptible to cefepime in vitro, there are few data supporting its clinical efficacy. Methods. We reviewed all cases of Enterobacter species bacteremia at 2 academic hospitals from 2005 to 2011. Outcomes of interest were (1) persistent bacteremia ≥1 calendar day and (2) in-hospital mortality. We fit logistic regression models, adjusting for clinical risk factors and Pitt bacteremia score, and performed propensity score analyses to compare the efficacy of cefepime and carbapenems. Results. Three hundred sixty-eight patients experienced Enterobacter species bacteremia and received at least 1 antimicrobial agent, of whom 52 (14%) died during hospitalization. Median age was 59 years; 19% were neutropenic, and 22% were in an intensive care unit on the day of bacteremia. Twenty-nine (11%) patients had persistent bacteremia for ≥1 day after antibacterial initiation. None of the 36 patients who received single-agent cefepime (0%) had persistent bacteremia, as opposed to 4 of 16 (25%) of those who received single-agent carbapenem (P < .01). In multivariable models, there was no association between carbapenem use and persistent bacteremia (adjusted odds ratio [aOR], 1.52; 95% CI, .58–3.98; P = .39), and a nonsignificantly lower odds ratio with cefepime use (aOR, 0.52; 95% CI, .19–1.40; P = .19). In-hospital mortality was similar for use of cefepime and carbapenems in adjusted regression models and propensity-score matched analyses. Conclusions. Cefepime has efficacy similar to that of carbapenems for the treatment of Enterobacter species bacteremia. Its use should be further explored as a carbapenem-sparing agent in this clinical scenario. PMID:24647022
Cefepime vs other antibacterial agents for the treatment of Enterobacter species bacteremia.
Siedner, Mark J; Galar, Alicia; Guzmán-Suarez, Belisa B; Kubiak, David W; Baghdady, Nour; Ferraro, Mary Jane; Hooper, David C; O'Brien, Thomas F; Marty, Francisco M
2014-06-01
Carbapenems are recommended for treatment of Enterobacter infections with AmpC phenotypes. Although isolates are typically susceptible to cefepime in vitro, there are few data supporting its clinical efficacy. We reviewed all cases of Enterobacter species bacteremia at 2 academic hospitals from 2005 to 2011. Outcomes of interest were (1) persistent bacteremia ≥1 calendar day and (2) in-hospital mortality. We fit logistic regression models, adjusting for clinical risk factors and Pitt bacteremia score, and performed propensity score analyses to compare the efficacy of cefepime and carbapenems. Three hundred sixty-eight patients experienced Enterobacter species bacteremia and received at least 1 antimicrobial agent, of whom 52 (14%) died during hospitalization. Median age was 59 years; 19% were neutropenic, and 22% were in an intensive care unit on the day of bacteremia. Twenty-nine (11%) patients had persistent bacteremia for ≥1 day after antibacterial initiation. None of the 36 patients who received single-agent cefepime (0%) had persistent bacteremia, as opposed to 4 of 16 (25%) of those who received single-agent carbapenem (P < .01). In multivariable models, there was no association between carbapenem use and persistent bacteremia (adjusted odds ratio [aOR], 1.52; 95% CI, .58-3.98; P = .39), and a nonsignificantly lower odds ratio with cefepime use (aOR, 0.52; 95% CI, .19-1.40; P = .19). In-hospital mortality was similar for use of cefepime and carbapenems in adjusted regression models and propensity-score matched analyses. Cefepime has efficacy similar to that of carbapenems for the treatment of Enterobacter species bacteremia. Its use should be further explored as a carbapenem-sparing agent in this clinical scenario.
A Multiomics Approach to Identify Genes Associated with Childhood Asthma Risk and Morbidity.
Forno, Erick; Wang, Ting; Yan, Qi; Brehm, John; Acosta-Perez, Edna; Colon-Semidey, Angel; Alvarez, Maria; Boutaoui, Nadia; Cloutier, Michelle M; Alcorn, John F; Canino, Glorisa; Chen, Wei; Celedón, Juan C
2017-10-01
Childhood asthma is a complex disease. In this study, we aim to identify genes associated with childhood asthma through a multiomics "vertical" approach that integrates multiple analytical steps using linear and logistic regression models. In a case-control study of childhood asthma in Puerto Ricans (n = 1,127), we used adjusted linear or logistic regression models to evaluate associations between several analytical steps of omics data, including genome-wide (GW) genotype data, GW methylation, GW expression profiling, cytokine levels, asthma-intermediate phenotypes, and asthma status. At each point, only the top genes/single-nucleotide polymorphisms/probes/cytokines were carried forward for subsequent analysis. In step 1, asthma modified the gene expression-protein level association for 1,645 genes; pathway analysis showed an enrichment of these genes in the cytokine signaling system (n = 269 genes). In steps 2-3, expression levels of 40 genes were associated with intermediate phenotypes (asthma onset age, forced expiratory volume in 1 second, exacerbations, eosinophil counts, and skin test reactivity); of those, methylation of seven genes was also associated with asthma. Of these seven candidate genes, IL5RA was also significant in analytical steps 4-8. We then measured plasma IL-5 receptor α levels, which were associated with asthma age of onset and moderate-severe exacerbations. In addition, in silico database analysis showed that several of our identified IL5RA single-nucleotide polymorphisms are associated with transcription factors related to asthma and atopy. This approach integrates several analytical steps and is able to identify biologically relevant asthma-related genes, such as IL5RA. It differs from other methods that rely on complex statistical models with various assumptions.
Kim, Seon Mi; Yoo, Taekyung; Lee, So Young; Kim, Eun Jeong; Lee, Soo Min; Lee, Min Hee; Han, Min Young; Jung, Seung-Hyun; Choi, Jung-Hye; Ryu, Keun Ho; Kim, Hun-Taek
2015-10-15
Suppression of the hypothalamic-pituitary-gonadal axis has been widely utilized for the management of gonadal-hormone-dependent diseases such as endometriosis. Efforts to develop orally available gonadotropin-releasing hormone (GnRH) antagonists for the treatment of gonadal-hormone-dependent diseases led to the discovery of SKI2670, a novel non-peptide GnRH antagonist. The present study was undertaken to pharmacologically characterize SKI2670 in vitro and in vivo. We measured the binding affinity and antagonistic activity of SKI2670 for the GnRH receptors. Immediate suppression of gonadotropins by a single dose of SKI2670 was examined in castrated monkeys. Subsequently, the influence of prolonged administration of SKI2670 on gonadal hormones was assessed in naive female monkeys. To investigate the in vivo efficacy of SKI2670, regression of ectopic implants by repeated administration of SKI2670 was examined in a rat endometriosis model. SKI2670 is a potent functional antagonist for the human GnRH receptor, with subnanomolar binding affinity. In castrated monkeys, a single administration of SKI2670 lowered serum luteinizing hormone (LH) levels more strongly and for a longer duration than elagolix at equivalent doses. Moreover, repeated dosing of SKI2670 suppressed serum levels of gonadotropins and gonadal hormones in intact female monkeys, whereas elagolix suppressed serum LH levels only. Finally, SKI2670 induced regression of ectopic implants in a rat endometriosis model without bone loss. Our findings demonstrate robust GnRH antagonistic efficacy of SKI2670 in animal models, suggesting that SKI2670-induced suppression of the hypothalamic-pituitary-gonadal axis may be beneficial for the treatment of gonadal-hormone-dependent diseases such as endometriosis in humans. Copyright © 2015 Elsevier Inc. All rights reserved.
Compositional Effects on Nickel-Base Superalloy Single Crystal Microstructures
NASA Technical Reports Server (NTRS)
MacKay, Rebecca A.; Gabb, Timothy P.; Garg, Anita; Rogers, Richard B.; Nathal, Michael V.
2012-01-01
Fourteen nickel-base superalloy single crystals containing 0 to 5 wt% chromium (Cr), 0 to 11 wt% cobalt (Co), 6 to 12 wt% molybdenum (Mo), 0 to 4 wt% rhenium (Re), and fixed amounts of aluminum (Al) and tantalum (Ta) were examined to determine the effect of bulk composition on basic microstructural parameters, including gamma' solvus, gamma' volume fraction, volume fraction of topologically close-packed (TCP) phases, phase chemistries, and gamma-gamma' lattice mismatch. Regression models were developed to describe the influence of bulk alloy composition on the microstructural parameters and were compared to predictions by a commercially available software tool that used computational thermodynamics. Co produced the largest change in gamma' solvus over the wide compositional range used in this study, and Mo produced the largest effect on the gamma lattice parameter and the gamma-gamma' lattice mismatch over its compositional range, although Re had a very potent influence on all microstructural parameters investigated. Changing the Cr, Co, Mo, and Re contents in the bulk alloy had a significant impact on their concentrations in the gamma matrix and, to a smaller extent, in the gamma' phase. The gamma phase chemistries exhibited strong temperature dependencies that were influenced by the gamma and gamma' volume fractions. A computational thermodynamic modeling tool significantly underpredicted gamma' solvus temperatures and grossly overpredicted the amount of TCP phase at 982 °C. Furthermore, the predictions by the software tool for the gamma-gamma' lattice mismatch were typically of the wrong sign and magnitude, but predictions could be improved if TCP formation was suspended within the software program. However, the statistical regression models provided excellent estimations of the microstructural parameters based on bulk alloy composition, thereby demonstrating their usefulness.
Chen, Baisheng; Wu, Huanan; Li, Sam Fong Yau
2014-03-01
To overcome the challenge of selecting an appropriate pathlength for high-accuracy monitoring of wastewater chemical oxygen demand (COD) by UV-vis spectroscopy during wastewater treatment, this study developed a variable pathlength approach combined with partial least squares regression (PLSR). Two new strategies were proposed to extract relevant information from UV-vis spectral data acquired at variable pathlengths. The first strategy used data fusion at two levels: low-level data fusion (LLDF) and mid-level data fusion (MLDF). Predictive accuracy improved, as indicated by lower root-mean-square errors of prediction (RMSEP) than those obtained for single pathlength measurements. Both fusion levels delivered very robust PLSR models, with residual predictive deviations (RPD) greater than 3 (3.22 and 3.29, respectively). The second strategy involved calculating the slope of absorbance against pathlength at each wavelength to generate slope-derived spectra. Without requiring selection of an optimal pathlength, this improved predictive accuracy (RMSEP) by 20-43% compared with single pathlength spectroscopy. Compared with the nine-factor models from the fusion strategy, the PLSR model from slope-derived spectroscopy was more parsimonious, with only five factors, and more robust, with an RPD of 3.72. It also offered excellent correlation between predicted and measured COD values, with R² of 0.936. In sum, variable pathlength spectroscopy with the two proposed data analysis strategies successfully enhanced the prediction performance for COD in wastewater and showed high potential for on-line water quality monitoring. Copyright © 2013 Elsevier B.V. All rights reserved.
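The slope-derived spectrum is simply a per-wavelength least-squares slope of absorbance against pathlength, and RPD is commonly defined as the standard deviation of the reference values divided by the prediction error. The sketch below assumes those definitions (RPD = SD / RMSEP; some authors use SEP instead) and uses hypothetical Beer-Lambert data; it produces the PLSR input only and does not fit the PLSR model itself.

```python
from statistics import mean, stdev


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_spectrum(pathlengths, spectra):
    """spectra[k][w]: absorbance at pathlength k and wavelength index w.

    Returns one slope (absorbance per unit pathlength) per wavelength.
    """
    n_wavelengths = len(spectra[0])
    return [ols_slope(pathlengths, [s[w] for s in spectra])
            for w in range(n_wavelengths)]


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def rpd(y_true, y_pred):
    """Residual predictive deviation, assumed here as SD(reference) / RMSEP."""
    return stdev(y_true) / rmsep(y_true, y_pred)


# Hypothetical Beer-Lambert spectra: A = eps * c * L + a pathlength-independent offset.
eps, conc, baseline = [0.1, 0.05, 0.2], 2.0, [0.01, 0.02, 0.0]
paths = [1.0, 2.0, 5.0, 10.0]
spectra = [[e * conc * L + b for e, b in zip(eps, baseline)] for L in paths]
print(slope_spectrum(paths, spectra))  # one slope per wavelength, near eps * conc
```

Because any pathlength-independent offset drops out of the slope, the slope spectrum does not depend on choosing a single pathlength, which is the motivation given in the abstract.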
NASA Astrophysics Data System (ADS)
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. Using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was inversely related to the number of cases. The comparison of goodness-of-fit and forecasting ability suggests that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
Linear models: permutation methods
Cade, B.S.; Everitt, B.S.; Howell, D.C.
2005-01-01
Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to the effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well known that estimates of the mean in a linear model are extremely sensitive to even a single outlying value of the dependent variable, compared with estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution of responses, providing a more complete statistical picture that has relevance to many biological questions [6]...
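A permutation test for a linear model replaces the parametric null distribution with one generated by reshuffling the data. The sketch below shows the simplest case, assumed here for illustration: a single predictor, where under H0 the responses are exchangeable, so shuffling y against x gives the reference distribution of the least-squares slope. With nuisance covariates, residual-permutation schemes such as Freedman-Lane are typically used instead. The data and function names are hypothetical.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def permutation_test_slope(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the slope of a simple linear model.

    Under H0 (no association) y is exchangeable with respect to x, so each
    shuffle of y yields one draw from the null distribution of the slope.
    """
    rng = random.Random(seed)
    observed = ols_slope(x, y)
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(ols_slope(x, y_perm)) >= abs(observed):
            extreme += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data with a strong linear trend plus a small alternating error.
x = list(range(20))
y = [2 * xi + (1 if xi % 3 == 0 else -1) for xi in x]
slope, p = permutation_test_slope(x, y)
print(round(slope, 3), p)
```

The same loop works with any estimator in place of `ols_slope` (for example a median or quantile regression slope), which is the coupling of permutation inference with alternative estimators that the passage describes.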
A theoretical model to describe progressions and regressions for exercise rehabilitation.
Blanchard, Sam; Glasgow, Phil
2014-08-01
This article aims to describe a new theoretical model to simplify and aid visualisation of the clinical reasoning process involved in progressing a single exercise. Exercise prescription is a core skill for physiotherapists, but it is an area that lacks theoretical models to assist clinicians when designing exercise programs to aid rehabilitation from injury. Historical models of periodization and motor learning theories lack any visual aids to assist clinicians. The concept of the proposed model is that new stimuli, either intrinsic or extrinsic to the participant, can be added or exchanged with other stimuli in order to gradually progress an exercise whilst remaining safe and effective. The proposed model maintains the core skills of physiotherapists by assisting clinical reasoning, exercise prescription and goal setting. It is not limited to any one pathology or rehabilitation setting and can be adapted by clinicians of any skill level. Copyright © 2014 Elsevier Ltd. All rights reserved.
Regression of altitude-produced cardiac hypertrophy.
NASA Technical Reports Server (NTRS)
Sizemore, D. A.; Mcintyre, T. W.; Van Liere, E. J.; Wilson, M. F.
1973-01-01
The rate of regression of cardiac hypertrophy with time has been determined in adult male albino rats. The hypertrophy was induced by intermittent exposure to simulated high altitude. The percentage hypertrophy was much greater in the right ventricle (46%) than in the left (16%). The regression could be adequately fitted to a single exponential function with a half-time of 6.73 ± 0.71 days (90% CI). There was no significant difference in the rates of regression for the two ventricles.
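For a single exponential decay, excess(t) = E0·exp(-k·t), the half-time is ln(2)/k. The abstract does not say how the exponential was fitted; the sketch below assumes a log-linear least-squares fit, which weights points differently from a nonlinear least-squares fit on the original scale. The time points and the 46% starting excess are hypothetical values chosen to mirror the abstract.

```python
import math
from statistics import mean


def half_time(t, excess):
    """Half-time of a single-exponential decay excess(t) = E0 * exp(-k * t).

    Fits log(excess) = log(E0) - k * t by least squares and returns ln(2) / k.
    """
    logs = [math.log(v) for v in excess]
    mt, ml = mean(t), mean(logs)
    slope = (sum((a - mt) * (b - ml) for a, b in zip(t, logs))
             / sum((a - mt) ** 2 for a in t))
    k = -slope  # decay rate constant (per day if t is in days)
    return math.log(2) / k


# Hypothetical data: 46% excess ventricular mass decaying with a 6.73-day half-time.
days = [0, 2, 4, 7, 10, 14, 21]
excess = [46 * math.exp(-math.log(2) * d / 6.73) for d in days]
print(round(half_time(days, excess), 2))
```

Comparing the two ventricles, as in the abstract, would amount to fitting k separately for each and testing whether the rate constants differ.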
Housework: Cause and consequence of gender ideology?
Carlson, Daniel L; Lynch, Jamie L
2013-11-01
Nearly all quantitative studies examining the association between the division of housework and gender ideology have found that gender egalitarianism results in less housework for wives, more for husbands, and more equal sharing of housework by couples. However, a few studies suggest housework has a nontrivial influence on gender ideology. An overreliance on single-direction, single-equation regression models and cross-sectional data has prevented past research from making strong claims about the causal relationship between gender ideology and housework. We use data on married couples from Waves 1 and 2 of the National Survey of Families and Households and nonrecursive simultaneous equation models to assess the causal relationship between housework and gender ideology. Results show a mutual and reciprocal relationship between the division of housework and gender ideology for both husbands and wives. Reciprocity is strongest for husbands, while for wives the relationship is partially indirect and mediated through their husbands' gender ideologies. Copyright © 2013 Elsevier Inc. All rights reserved.
Huen, Jenny M Y; Ip, Brian Y T; Ho, Samuel M Y; Yip, Paul S F
2015-01-01
The present study investigated whether hope and hopelessness are better conceptualized as a single bipolar construct or as two distinct constructs, and whether hope can moderate the relationship between hopelessness and suicidal ideation. Hope, hopelessness, and suicidal ideation were measured in a community sample of 2106 participants through a population-based household survey. Confirmatory factor analyses showed that a measurement model with separate, correlated second-order factors of hope and hopelessness provided a good fit to the data and fit significantly better than a model collapsing hope and hopelessness into a single second-order factor. Negative binomial regression showed that hope and hopelessness interacted such that the effect of hopelessness on suicidal ideation was weaker in individuals with higher hope than in those with lower hope. Hope and hopelessness are two distinct but correlated constructs. Hope can act as a resilience factor that buffers the impact of hopelessness on suicidal ideation. Inducing hope in people may be a promising avenue for suicide prevention.
Hosking, Jonathan; Gibson, Colin
2016-07-01
The introduction of a single point referral system, which prioritises clients by case complexity and uses a review system to remove the need for re-admittance to a waiting list, significantly reduced maximum waiting times for a Posture and Mobility (Special Seating) Service from 102.0 ± 24.33 weeks to 19.2 ± 8.57 weeks (p = 0.015). Under this service model, linear regression revealed a statistically significant improvement in the performance outcome of prescribed seating solutions with shorter Episode of Care completion times (p = 0.023). In addition, the number of Episodes of Care completed per annum was significantly related to the Episode of Care completion time (p = 0.019). In conclusion, applying this service model to other assistive technology services may help reduce waiting times and improve clinical outcomes.
[Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].
Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L
2017-03-10
To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we used a Bayesian log-binomial regression model in OpenBUGS to estimate the PR of medical care-seeking with respect to caregivers' recognition of risk signs of diarrhea in their infants. The results showed that caregivers' recognition of infants' risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. We also compared the point estimates, interval estimates and convergence of three models (model 1: not adjusted for covariates; model 2: adjusted for duration of caregivers' education; model 3: additionally adjusted for distance between village and township and child age in months) between the Bayesian and the conventional log-binomial regression models. All three Bayesian log-binomial regression models converged, and the estimated PRs were 1.130 (95% CI: 1.005-1.265), 1.128 (95% CI: 1.001-1.264) and 1.132 (95% CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, with PRs of 1.130 (95% CI: 1.055-1.206) and 1.126 (95% CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate the PR, which was 1.125 (95% CI: 1.051-1.200). The point and interval estimates of the PRs from the three Bayesian log-binomial regression models differed slightly from those of the conventional log-binomial regression models, but the two approaches were consistent in estimating the PR. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with fewer convergence failures and has more advantages in application than the conventional log-binomial regression model.
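In a log-binomial model, log P(Y = 1 | x) = b0 + b1·x, so exp(b1) is the prevalence ratio. With a single binary exposure and no covariates the maximum-likelihood estimate equals the crude ratio of prevalences, which the sketch below computes with a Wald interval on the log scale. Covariate-adjusted models (models 2 and 3 above) need iterative fitting, which is where the convergence failures arise; this sketch does not reproduce the Bayesian fit. The counts are hypothetical.

```python
import math


def prevalence_ratio(a, b, c, d, z=1.959963984540054):
    """Crude prevalence ratio with a Wald confidence interval on the log scale.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    p_exposed = a / (a + b)
    p_unexposed = c / (c + d)
    pr = p_exposed / p_unexposed
    # Standard error of log(PR) for two independent binomial proportions.
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return pr, pr * math.exp(-z * se_log), pr * math.exp(z * se_log)


# Hypothetical counts: 60 of 100 caregivers who recognised risk signs sought care,
# versus 50 of 100 who did not recognise them.
print(prevalence_ratio(60, 40, 50, 50))
```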
NASA Astrophysics Data System (ADS)
Xing, Wanqiu; Wang, Weiguang; Shao, Quanxi; Yong, Bin
2018-01-01
Quantifying the partition of precipitation (P) into evapotranspiration (E) and runoff (Q) is of great importance for global and regional water availability assessment. The Budyko framework serves as a powerful tool for making a simple and transparent estimate of this partition using a single parameter that characterizes the shape of the Budyko curve for a specific basin; this parameter reflects the overall effect not only of climatic seasonality and catchment characteristics (e.g., soil, topography and vegetation) but also of agricultural activities (e.g., cultivation and irrigation). At the regional scale, these influencing factors are interconnected, and the interactions between them can also affect the estimation of the single parameter of Budyko-type equations. Here we employ the multivariate adaptive regression splines (MARS) model to estimate the Budyko curve shape parameter (n in Choudhury's equation, one form of the Budyko framework) for 96 selected catchments across China, using a data set of long-term averages for climatic seasonality, catchment characteristics and agricultural activities. Results show that average storm depth (ASD), vegetation coverage (M), and the seasonality index of precipitation (SI) are three statistically significant factors affecting the Budyko parameter. More importantly, the MARS model identifies four pairwise interactions. The interaction between CA (percentage of cultivated land area to total catchment area) and ASD shows that cultivation can weaken the reducing effect of high ASD (>46.78 mm) on the Budyko parameter. Drought (represented by a Palmer drought severity index < -0.74) and uneven distribution of annual rainfall (represented by a coefficient of variation of precipitation > 0.23) tend to enhance the reduction of the Budyko parameter caused by large SI (>0.797).
Low vegetation coverage (34.56%) is likely to intensify the effect of IA (percentage of irrigation area to total catchment area) in raising the evapotranspiration ratio. The Budyko n values estimated by the MARS model closely reproduce those calculated from observations for the 96 selected catchments (R = 0.817, MAE = 4.09). Compared with a multiple stepwise regression model that estimates n by treating the influencing factors as independent inputs, the MARS model enhances the capability of the Budyko framework for assessing water availability at the regional scale using readily available data.
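Once n is estimated, Choudhury's equation converts it into a partition of precipitation. The form used below, E = P·E0 / (P^n + E0^n)^(1/n) with E0 the potential evapotranspiration, is the commonly cited one and is stated here as an assumption rather than taken from the abstract; runoff is taken as Q = P - E, assuming negligible long-term storage change. The input values are hypothetical.

```python
import math


def choudhury_et(p, e0, n):
    """Long-term mean evapotranspiration from Choudhury's form of the Budyko curve:
    E = P * E0 / (P**n + E0**n) ** (1/n).

    p: precipitation, e0: potential evapotranspiration (same units, e.g. mm/yr),
    n: catchment shape parameter (the quantity the MARS model estimates).
    """
    return p * e0 / (p ** n + e0 ** n) ** (1.0 / n)


def partition(p, e0, n):
    """Split precipitation into evapotranspiration and runoff, Q = P - E,
    assuming negligible long-term storage change."""
    e = choudhury_et(p, e0, n)
    return e, p - e


# Hypothetical catchment: P = 800 mm/yr, E0 = 1200 mm/yr, n = 2.6.
e, q = partition(800.0, 1200.0, 2.6)
print(round(e, 1), round(q, 1))
```

A larger n pushes E toward the energy or water limit, min(P, E0), so any factor that lowers n (for example high ASD in the abstract) shifts more of P into runoff.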
Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Chen, Charles; Porth, Ilga; El-Kassaby, Yousry A
2015-05-09
Genomic selection (GS) in forestry can substantially reduce the length of the breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies have made it possible to genotype large numbers of trees at a reasonable cost. Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared: mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor method (kNN-Fam). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and Generalized Ridge Regression (GRR) to test different assumptions about trait architecture. Finally, multi-trait GS prediction models were developed using PCA. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than the GRR, indicating that the genetic architecture for these traits is complex. GS prediction accuracies for multi-site models were high and better than those of single-site models, while multi-site predictability produced the lowest accuracies, reflecting type-b genetic correlations, and was deemed unreliable. The incorporation of genomic information in quantitative genetic analyses produced more realistic heritability estimates, as the half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. Principal component scores used as multi-trait GS prediction models produced surprising results, in which negatively correlated traits could be concurrently selected for using PCA2 and PCA3.
The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation, proved effective. Prediction accuracies obtained for all traits strongly support the integration of GS into tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that the ability of single-site GS models to predict other sites is unreliable, supporting the use of a multi-site approach. Principal component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.
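In its common formulation, RR-BLUP is ridge regression of phenotypes on a marker matrix, with every marker effect shrunk by the same penalty λ (in the mixed-model view λ = σ²_e/σ²_β, usually estimated by REML). The sketch below assumes a fixed, hand-chosen λ instead of estimated variance components, markers coded 0/1/2, and an unpenalised intercept; the genotypes and phenotypes are hypothetical.

```python
def solve(A, rhs):
    """Solve the square linear system A x = rhs by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge(X, y, lam):
    """Ridge regression of y on marker matrix X with penalty lam.

    Columns and y are centred so the intercept is not penalised.
    Returns (intercept, marker effects).
    """
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    # Normal equations with a ridge penalty: (Xc'Xc + lam * I) beta = Xc'yc
    lhs = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    rhs = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    beta = solve(lhs, rhs)
    intercept = my - sum(b * m for b, m in zip(beta, mx))
    return intercept, beta


# Hypothetical genotypes (6 trees x 3 markers, coded 0/1/2) and phenotypes.
X = [[0, 1, 2], [1, 0, 1], [2, 2, 0], [1, 1, 1], [0, 2, 1], [2, 0, 2]]
y = [1 + 2 * r[0] - r[1] + 0.5 * r[2] for r in X]
print(ridge(X, y, lam=1.0))
```

Larger λ shrinks all marker effects toward zero together, which suits the many-small-effects architecture that the abstract infers from RR-BLUP outperforming GRR.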
Klimovskaia, Anna; Ganscha, Stefan; Claassen, Manfred
2016-12-01
Stochastic chemical reaction networks constitute a model class to quantitatively describe dynamics and cell-to-cell variability in biological systems. The topology of these networks is typically only partially characterized due to experimental limitations. Current approaches for refining network topology are based on the explicit enumeration of alternative topologies and are therefore restricted to small problem instances with almost complete knowledge. We propose the reactionet lasso, a computational procedure that derives a stepwise sparse regression approach on the basis of the Chemical Master Equation, enabling large-scale structure learning for reaction networks by implicitly accounting for billions of topology variants. We have assessed the structure learning capabilities of the reactionet lasso on synthetic data for the complete TRAIL-induced apoptosis signaling cascade comprising 70 reactions. We find that the reactionet lasso is able to efficiently recover the structure of these reaction systems, ab initio, with high sensitivity and specificity. With < 1% false discoveries, the reactionet lasso is able to recover 45% of all true reactions ab initio among > 6000 possible reactions and over 10^2000 network topologies. In conjunction with information-rich single-cell technologies such as single-cell RNA sequencing or mass cytometry, the reactionet lasso will enable large-scale structure learning, particularly in areas with partial network structure knowledge, such as cancer biology, and thereby enable the detection of pathological alterations of reaction networks. We provide software to allow for wide applicability of the reactionet lasso.
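The structure-learning step rests on the lasso's ability to set the coefficients of irrelevant candidates exactly to zero. The sketch below is a generic lasso solved by cyclic coordinate descent, minimising (1/2n)·||y - Xb||² + λ·||b||₁. It is not the reactionet lasso, which builds its regression problem from the Chemical Master Equation. Columns stand in for hypothetical candidate reactions, and y and X are assumed to be centred so no intercept is fitted.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for min (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes y and the columns of X are centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical orthogonal design: only the first candidate has a sizeable effect.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3 * r[0] + 0.05 * r[1] for r in X]
print(lasso_cd(X, y, lam=0.1))
```

With this orthogonal design the solution is the soft-thresholded correlation of each column with y, so the weak second effect (0.05 < λ) is set exactly to zero and the strong one is shrunk by λ.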
Yuan, XiaoDong; Tang, Wei; Shi, WenWei; Yu, Libao; Zhang, Jing; Yuan, Qing; You, Shan; Wu, Ning; Ao, Guokun; Ma, Tingting
2018-07-01
To develop a convenient and rapid single-kidney CT-GFR technique. One hundred and twelve patients referred for multiphasic renal CT and 99mTc-DTPA renal dynamic imaging Gates-GFR measurement were prospectively included and randomly divided into two groups of 56 patients each: the training group and the validation group. On the basis of the nephrographic phase images, the fractional renal accumulation (FRA) was calculated and correlated with the Gates-GFR in the training group. From this correlation a formula was derived for single-kidney CT-GFR calculation, which was validated by a paired t test and linear regression analysis with the single-kidney Gates-GFR in the validation group. In the training group, the FRA (x-axis) correlated well (r = 0.95, p < 0.001) with single-kidney Gates-GFR (y-axis), producing a regression equation of y = 1665x + 1.5 for single-kidney CT-GFR calculation. In the validation group, the difference between the methods of single-kidney GFR measurement was 0.38 ± 5.57 mL/min (p = 0.471); the regression line is identical to the diagonal (intercept = 0 and slope = 1) (p = 0.727 and p = 0.473, respectively), with a standard deviation of residuals of 5.56 mL/min. A convenient and rapid single-kidney CT-GFR technique was presented and validated in this investigation.
• The new CT-GFR method takes about 2.5 min of patient time.
• The CT-GFR method demonstrated identical results to the Gates-GFR method.
• The CT-GFR method is based on the fractional renal accumulation of iodinated CM.
• The CT-GFR method is achieved without additional radiation dose to the patient.
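Applying the derived formula and checking agreement against a reference method takes only a few lines. The sketch below plugs an FRA value into the training-group equation y = 1665x + 1.5 and summarises paired differences as a mean and SD, the form in which the validation result (0.38 ± 5.57 mL/min) is reported. It assumes FRA is supplied on the same scale used to derive the equation, which the abstract does not specify; the example values are hypothetical.

```python
from statistics import mean, stdev


def single_kidney_ct_gfr(fra):
    """Single-kidney CT-GFR (mL/min) from the fractional renal accumulation (FRA),
    using the training-group regression y = 1665x + 1.5 reported in the abstract."""
    return 1665.0 * fra + 1.5


def agreement(method_a, method_b):
    """Mean and SD of paired differences (method_a - method_b), as used in a
    Bland-Altman-style comparison of two measurement methods."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    return mean(diffs), stdev(diffs)


# Hypothetical FRA values for three kidneys and their reference Gates-GFR values.
fra_values = [0.015, 0.022, 0.030]
gates = [27.0, 38.5, 50.0]
ct = [single_kidney_ct_gfr(f) for f in fra_values]
print([round(v, 2) for v in ct], agreement(ct, gates))
```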
NASA Astrophysics Data System (ADS)
Skrzypek, Grzegorz; Sadler, Rohan; Wiśniewski, Andrzej
2017-04-01
The stable oxygen isotope composition of phosphates (δ18O) extracted from mammalian bone and teeth material is commonly used as a proxy for paleotemperature. Historically, several different analytical and statistical procedures for determining air paleotemperatures from the measured δ18O of phosphates have been applied. This inconsistency in both stable isotope data processing and the application of statistical procedures has led to large and unwanted differences between calculated results. This study presents the uncertainty associated with two of the most commonly used regression methods: least squares inverted fit and transposed fit. We assessed the performance of these methods by designing and applying calculation experiments to multiple real-life data sets, back-calculating temperatures, and comparing them with the true recorded values. Our calculations clearly show that the mean absolute errors are always substantially higher for the inverted fit (a causal model), with the transposed fit (a predictive model) returning mean values closer to the measured values (Skrzypek et al. 2016). The predictive models always performed better than the causal models, with 12-65% lower mean absolute errors. Moreover, the least-squares regression (LSM) model is more appropriate than Reduced Major Axis (RMA) regression for calculating the environmental water stable oxygen isotope composition from phosphate signatures, as well as for calculating air temperature from the δ18O value of environmental water. The transposed fit introduces a lower overall error than the inverted fit for both the δ18O of environmental water and Tair calculations; therefore, the predictive models are more statistically efficient than the causal models in this instance. Direct comparison of paleotemperature results from different laboratories and studies can only be achieved if a single method of calculation is applied.
Reference: Skrzypek G., Sadler R., Wiśniewski A., 2016. Reassessment of recommendations for processing mammal phosphate δ18O data for paleotemperature reconstruction. Palaeogeography, Palaeoclimatology, Palaeoecology 446, 162-167.
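The difference between the two fits can be sketched with ordinary least squares. This is a minimal illustration, not the authors' procedure: it collapses the phosphate-to-water and water-to-temperature steps into a single hypothetical calibration between air temperature and δ18O, and the data below are invented. It reports RMSE rather than the study's mean absolute error, because least squares guarantees (in-sample) that the transposed fit has the lower squared error.

```python
from statistics import mean

def ols(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5

def compare_fits(temp, d18o):
    """Back-calculate temperature both ways; returns (rmse_inverted, rmse_transposed)."""
    # Inverted (causal) fit: regress d18O on T, then solve the fitted line for T.
    a, b = ols(temp, d18o)
    t_inverted = [(d - a) / b for d in d18o]
    # Transposed (predictive) fit: regress T on d18O and predict T directly.
    c, s = ols(d18o, temp)
    t_transposed = [c + s * d for d in d18o]
    return rmse(t_inverted, temp), rmse(t_transposed, temp)

# Hypothetical calibration pairs (air temperature in deg C, d18O in per mil).
temp = [2.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0]
d18o = [-14.1, -12.9, -12.2, -10.6, -10.1, -8.4, -7.9]
print(compare_fits(temp, d18o))
```

Both back-calculated series are linear functions of δ18O, and the transposed fit is by construction the least-squares choice among them, which is why its error cannot exceed the inverted fit's on the calibration data.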
Statistical Approaches for Spatiotemporal Prediction of Low Flows
NASA Astrophysics Data System (ADS)
Fangmann, A.; Haberlandt, U.
2017-12-01
An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion in model ensembles, which are fast and straightforward to apply, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables and averaged over the catchment areas of the streamflow gauges. Four different modeling approaches are analyzed. All are based on multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. In the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices; these models are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three relies on spatiotemporal prediction of an index value, and method four on estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms successive prediction in time and space. 
Method three also performs well in the near-future period, but since it relies on a stationary distribution, its application for prediction of far-future changes may be problematic. Spatiotemporal prediction of L-moments appeared highly uncertain for higher-order moments, resulting in unrealistic future low flow values. Overall, the results support the inclusion of simple statistical methods in climate change impact assessment.
Modeling soil parameters using hyperspectral image reflectance in subtropical coastal wetlands
NASA Astrophysics Data System (ADS)
Anne, Naveen J. P.; Abd-Elrahman, Amr H.; Lewis, David B.; Hewitt, Nicole A.
2014-12-01
Developing spectral models of soil properties is an important frontier in remote sensing and soil science. Several studies have focused on modeling soil properties such as total pools of soil organic matter and carbon in bare soils. We extended this effort to model soil parameters in areas densely covered with coastal vegetation. Moreover, we investigated soil properties indicative of soil functions such as nutrient and organic matter turnover and storage. These properties include the partitioning of mineral and organic soil between particulate (>53 μm) and fine size classes, and the partitioning of soil carbon and nitrogen pools between stable and labile fractions. Soil samples were obtained from Avicennia germinans mangrove forest and Juncus roemerianus salt marsh plots on the west coast of central Florida. Spectra corresponding to the field plot locations were extracted from a Hyperion hyperspectral image and analyzed. The spectral information was regressed against the soil variables to determine the best single bands and the optimal band combinations for the simple ratio (SR) and normalized difference index (NDI) indices. The regression analysis yielded R² values for the soil variables ranging from 0.21 to 0.47 for the best individual bands, 0.28 to 0.81 for two-band indices, and 0.53 to 0.96 for partial least-squares (PLS) regressions of the Hyperion image data. Spectral models using Hyperion data adequately (RPD > 1.4) predicted particulate organic matter (POM), silt + clay, labile carbon (C), and labile nitrogen (N), where RPD is the ratio of the standard deviation to the root mean square error of cross-validation (RMSECV). The SR (0.53 μm, 2.11 μm) model of labile N, with R² = 0.81, RMSECV = 0.28, and RPD = 1.94, produced the best results in this study. 
Our results provide optimism that remote-sensing spectral models can successfully predict soil properties indicative of ecosystem nutrient and organic matter turnover and storage, and do so in areas with dense canopy cover.
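The two-band indices and the RPD statistic named above can be sketched as follows. The SR and NDI formulas are the standard band-ratio definitions; the choice of leave-one-out cross-validation with a simple least-squares line, and of the sample standard deviation in RPD, are assumptions for illustration rather than details taken from the study.

```python
from statistics import mean, stdev

def simple_ratio(r_a, r_b):
    """Simple ratio (SR) of reflectances at two bands."""
    return r_a / r_b

def ndi(r_a, r_b):
    """Normalized difference index (NDI) of reflectances at two bands."""
    return (r_a - r_b) / (r_a + r_b)

def loocv_predictions(x, y):
    """Leave-one-out predictions from a least-squares line y = a + b*x."""
    preds = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        mx, my = mean(xs), mean(ys)
        b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sum((u - mx) ** 2 for u in xs)
        preds.append(my - b * mx + b * x[i])  # predict the held-out sample
    return preds

def rpd(observed, cv_predicted):
    """RPD = sample standard deviation of observed values / RMSECV."""
    n = len(observed)
    rmsecv = (sum((o - p) ** 2 for o, p in zip(observed, cv_predicted)) / n) ** 0.5
    return stdev(observed) / rmsecv
```

In use, `x` would hold an index value (e.g. `simple_ratio(r_530nm, r_2110nm)`) per plot and `y` the measured soil variable; an RPD above 1.4 was the adequacy threshold applied in the study.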
Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.
Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan
2016-11-01
In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended them to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat (Triticum aestivum L.) and maize (Zea mays L.) data sets. For single-environment analyses of the wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models showed up to 60 to 68% superiority over the corresponding single-environment analyses for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that account for small, more complex marker main effects and marker-specific interaction effects. Copyright © 2016 Crop Science Society of America.
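A minimal sketch of the kernels being compared, assuming a small marker matrix coded 0/1/2. The linear (GBLUP-style) kernel is X X'/p on centered markers; the Gaussian kernel uses exp(-h·d²/q), where scaling by q, the median squared distance, is a common convention assumed here. The `averaged_kernel` helper is a simplification: it takes an equal-weight mean over bandwidths, whereas kernel averaging as published may weight the kernels through separately estimated variance components. The G × E extension (combining kernels with an environment covariance) is not shown.

```python
import math
import statistics

def linear_kernel(markers):
    """GBLUP-style relationship matrix G = X X' / p (markers assumed centered)."""
    p = len(markers[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / p for xj in markers] for xi in markers]

def gaussian_kernel(markers, h):
    """K_ij = exp(-h * d_ij^2 / q), q = median of off-diagonal squared distances."""
    n = len(markers)
    d2 = [[sum((a - b) ** 2 for a, b in zip(markers[i], markers[j])) for j in range(n)]
          for i in range(n)]
    q = statistics.median(d2[i][j] for i in range(n) for j in range(n) if i != j)
    return [[math.exp(-h * d2[i][j] / q) for j in range(n)] for i in range(n)]

def averaged_kernel(markers, bandwidths):
    """Equal-weight average of Gaussian kernels over several bandwidths (simplification)."""
    ks = [gaussian_kernel(markers, h) for h in bandwidths]
    n = len(markers)
    return [[sum(k[i][j] for k in ks) / len(ks) for j in range(n)] for i in range(n)]

# Hypothetical genotypes for four lines at three markers.
X = [[0, 1, 2], [2, 1, 0], [1, 1, 1], [0, 0, 2]]
K = averaged_kernel(X, [0.2, 1.0, 5.0])
```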
Linked Lives: Adult Children's Problems and Their Parents' Psychological and Relational Well-Being
Greenfield, Emily A.; Marks, Nadine F.
2006-01-01
This study examined associations between adult children's cumulative problems and their parents' psychological and relational well-being, as well as whether such associations are similar for married and single parents. Regression models were estimated using data from 1,188 parents in the 1995 National Survey of Midlife in the United States whose youngest child was at least 19 years old. Participants reporting children with more problems indicated moderately poorer levels of well-being across all outcomes examined. Single parents reporting more problems indicated less positive affect than a comparable group of married parents, but married parents reporting more problems indicated poorer parent-child relationship quality. Findings are congruent with the family life course perspective, conceptualizing parents and children as occupying mutually influential developmental trajectories. PMID:17710218
Evaluation of weighted regression and sample size in developing a taper model for loblolly pine
Kenneth L. Cormier; Robin M. Reich; Raymond L. Czaplewski; William A. Bechtold
1992-01-01
A stem profile model, fit using pseudo-likelihood weighted regression, was used to estimate merchantable volume of loblolly pine (Pinus taeda L.) in the southeast. The weighted regression increased model fit marginally, but did not substantially increase model performance. In all cases, the unweighted regression models performed as well as the...
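Weighted regression of the kind evaluated here can be illustrated with a weighted least-squares line. This is a generic sketch: the study's pseudo-likelihood weighting and its taper model form are not reproduced, and the weights below are hypothetical (for example, inverse error variances).

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x, minimizing sum w_i (y_i - a - b*x_i)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

# With equal weights this reduces to ordinary least squares; a zero weight drops a point.
print(wls_line([0, 1, 2, 3], [1, 3, 5, 100], [1, 1, 1, 0]))
```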
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
NASA Astrophysics Data System (ADS)
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to model the odds; when its categories are ordered, the model is an ordinal logistic regression model. The GWOLR model is an ordinal logistic regression model in which the parameters depend on the geographical location of the observation site. Parameter estimation is needed to infer the values of a population from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. The parameters are estimated from data on the number of dengue fever patients in Semarang City, using 144 villages in Semarang City as observation units. The research yields a local GWOLR model for each village, giving the probability of each category of the number of dengue fever patients.
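A sketch of the two ingredients of a GWOLR model, under stated assumptions: a cumulative-logit (proportional odds) model with P(Y ≤ k) = logistic(θ_k − x·β), and a Gaussian spatial kernel for the weights w_ij. Sign conventions and kernel choices vary between texts, and the bandwidth and parameter values here are hypothetical. Local estimates at village i would be obtained by maximizing `local_loglik` over the thresholds and coefficients (for example, by Newton-Raphson); the optimizer is not shown.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Spatial weight of an observation at a given distance (Gaussian kernel)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def ordinal_probs(x, thresholds, beta):
    """Category probabilities from P(Y <= k) = 1 / (1 + exp(-(theta_k - x.beta))).
    Thresholds must be increasing; categories are numbered 0..len(thresholds)."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    cum = [1 / (1 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

def local_loglik(i, coords, X, Y, thresholds, beta, bandwidth):
    """Geographically weighted log-likelihood at location i: sum_j w_ij * log P(Y_j = y_j)."""
    cx, cy = coords[i]
    total = 0.0
    for (px, py), x, y in zip(coords, X, Y):
        w = gaussian_weight(math.hypot(cx - px, cy - py), bandwidth)
        total += w * math.log(ordinal_probs(x, thresholds, beta)[y])
    return total
```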
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression imposes strict assumptions, whereas nonparametric regression does not require the form of the model to be assumed. Time series data are observations of a variable recorded over time, so to model time series data by regression, the response and predictor variables must first be determined. The response variable is the value at time t (y_t), while the predictor variables are the significant lags. In nonparametric regression modeling, one developing approach is the Fourier series approach. One advantage of nonparametric regression using Fourier series is that it can handle data following a trigonometric (periodic) pattern. Modeling with Fourier series requires choosing the parameter K, which can be determined using the Generalized Cross Validation (GCV) method. In modeling inflation for the transportation, communication, and financial services sector, the Fourier series approach yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
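The procedure above (response y_t, a lagged predictor, Fourier terms, K chosen by GCV) can be sketched as follows. Assumptions: a single lag-1 predictor, the basis f(x) = b·x + a0/2 + Σ a_k cos(kx) for k = 1..K (one common form of Fourier series regression, not necessarily the authors'), and GCV(K) = (RSS/n) / (1 − p/n)², which uses the fact that the hat matrix of a full-rank least-squares fit has trace p.

```python
import math

def design_row(x, k_terms):
    """Basis: intercept, linear term, and cos(k*x) for k = 1..K (assumed form)."""
    return [1.0, x] + [math.cos(k * x) for k in range(1, k_terms + 1)]

def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol

def fit_fourier(x, y, k_terms):
    """Least-squares fit via the normal equations; returns (coefficients, RSS)."""
    rows = [design_row(xi, k_terms) for xi in x]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    return coef, rss

def gcv(x, y, k_terms):
    """GCV(K) = (RSS/n) / (1 - p/n)^2, with p = number of fitted coefficients."""
    n = len(y)
    coef, rss = fit_fourier(x, y, k_terms)
    return (rss / n) / (1 - len(coef) / n) ** 2

def choose_k(series, k_max):
    """Pick K in 1..k_max minimizing GCV, with y_t as response and y_(t-1) as predictor."""
    x, y = series[:-1], series[1:]
    return min(range(1, k_max + 1), key=lambda k: gcv(x, y, k))
```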
Mendez, Javier; Monleon-Getino, Antonio; Jofre, Juan; Lucena, Francisco
2017-10-01
The present study aimed to establish the kinetics of the appearance of coliphage plaques using the double agar layer titration technique, to evaluate the feasibility of using traditional coliphage plaque forming unit (PFU) enumeration as a rapid quantification method. Repeated measurements of the appearance of plaques of coliphages titrated according to ISO 10705-2 at different times were analysed using non-linear mixed-effects regression to determine the most suitable model of their appearance kinetics. Although this model is adequate, two simpler linear models were developed to predict the numbers of coliphages reliably from the PFU counts determined according to the ISO method after only 3 hours of incubation. When the number of plaques detected after 3 hours was between 4 and 26 PFU, the linear fit was (1.48 × Counts 3 h + 1.97); for values >26 PFU, it was (1.18 × Counts 3 h + 2.95). If the number of plaques detected after 3 hours is <4 PFU, we recommend incubation for (18 ± 3) hours. The study indicates that the traditional coliphage plating technique has a reasonable potential to provide results in a single working day without the need to invest in additional laboratory equipment.
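The two linear models reported above can be applied as a simple piecewise rule. The coefficients are those stated in the abstract; treating 26 PFU as belonging to the first range is an assumption, since the abstract gives "between 4 and 26" and ">26".

```python
def predict_pfu(count_3h):
    """Predicted coliphage count from the PFU count read after 3 h of incubation.

    Returns None when fewer than 4 PFU are seen; the study recommends
    continuing incubation for (18 +/- 3) h in that case.
    """
    if count_3h < 4:
        return None
    if count_3h <= 26:  # 4-26 PFU range (26 assumed inclusive)
        return 1.48 * count_3h + 1.97
    return 1.18 * count_3h + 2.95  # > 26 PFU

print(predict_pfu(10))  # 1.48 * 10 + 1.97, about 16.77
```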
Mathur, P K; Herrero-Medrano, J M; Alexandri, P; Knol, E F; ten Napel, J; Rashidi, H; Mulder, H A
2014-12-01
A method was developed and tested to estimate challenge load due to disease outbreaks and other challenges in sows using reproduction records. The method was based on reproduction records from a farm with known disease outbreaks. It was assumed that the reduction in weekly reproductive output within a farm is proportional to the magnitude of the challenge. As the challenge increases beyond a certain threshold, it manifests as an outbreak. The reproduction records were divided into 3 datasets. The first dataset, called the Training dataset, consisted of 57,135 reproduction records from 10,901 sows from 1 farm in Canada with several outbreaks of porcine reproductive and respiratory syndrome (PRRS). The known disease status of sows was regressed on the traits number born alive, number of losses (a combination of stillborn and mummified piglets), and number of weaned piglets. The regression coefficients from this analysis were then used as weighting factors for the derivation of an index measure called the challenge load indicator. These weighting factors were derived with i) a two-step approach using residuals or year-week solutions estimated from a previous step, and ii) a single-step approach using the trait values directly. Two types of models were used for each approach: a logistic regression model and a general additive model. The estimates of the challenge load indicator were then compared based on their ability to detect PRRS outbreaks in a Test dataset consisting of records from 65,826 sows from 15 farms in the Netherlands. These farms differed from the Canadian farm with respect to PRRS virus strains and the severity and frequency of outbreaks. The single-step approach using a general additive model performed best and detected 14 out of the 15 outbreaks. This approach was then further validated using the third dataset, consisting of reproduction records of 831,855 sows in 431 farms located in different countries in Europe and America. 
A total of 41 out of 48 outbreaks detected using data analysis were confirmed based on diagnostic information received from the farms. Among these, 30 outbreaks were due to PRRS, while 11 were due to other diseases and challenging conditions. The results suggest that the proposed method could be useful for estimating challenge load and detecting challenge phases such as disease outbreaks.
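The idea of using regression coefficients as weighting factors can be sketched as a weighted sum of weekly traits, optionally passed through a logistic transform as in a logistic regression on disease status. This is an illustration only: the weights, intercept, threshold, and weekly values below are hypothetical, and the study's two-step and general additive model variants are not reproduced.

```python
import math

def challenge_load(weekly_traits, weights, intercept=0.0):
    """Weekly challenge load indicator as a weighted sum of reproduction traits.
    weekly_traits: list of (born_alive, losses, weaned) tuples, one per week."""
    return [intercept + sum(w * t for w, t in zip(weights, week)) for week in weekly_traits]

def outbreak_probability(index):
    """Logistic transform of the index (as in a logistic regression on disease status)."""
    return 1 / (1 + math.exp(-index))

def flag_weeks(indices, threshold):
    """Indices of weeks whose challenge load exceeds the threshold."""
    return [week for week, value in enumerate(indices) if value > threshold]

# Hypothetical weights: fewer born alive / weaned and more losses raise the load.
load = challenge_load([(12, 1, 10), (8, 4, 6)], weights=(-1, 2, -1))
print(load, flag_weeks(load, threshold=-10))
```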
NASA Astrophysics Data System (ADS)
Hofer, Marlis; Mölg, Thomas; Marzeion, Ben; Kaser, Georg
2010-06-01
Recently initiated observation networks in the Cordillera Blanca (Peru) provide temporally high-resolution, yet short-term, atmospheric data. The aim of this study is to extend the existing time series into the past. We present an empirical-statistical downscaling (ESD) model that links 6-hourly National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis data to air temperature and specific humidity, measured at the tropical glacier Artesonraju (northern Cordillera Blanca). The ESD modeling procedure includes combined empirical orthogonal function and multiple regression analyses and a double cross-validation scheme for model evaluation. Apart from the selection of predictor fields, the modeling procedure is automated and does not include subjective choices. We assess the ESD model sensitivity to the predictor choice using both single-field and mixed-field predictors. Statistical transfer functions are derived individually for different months and times of day. The forecast skill largely depends on month and time of day, ranging from 0 to 0.8. The mixed-field predictors perform better than the single-field predictors. The ESD model shows added value, at all time scales, against simpler reference models (e.g., the direct use of reanalysis grid point values). The ESD model forecast 1960-2008 clearly reflects interannual variability related to the El Niño/Southern Oscillation but is sensitive to the chosen predictor type.
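Evaluating a downscaling model against simpler reference models, separately by month and time of day, can be sketched with an MSE-based skill score. The abstract does not define its skill measure, so the form 1 − MSE_model / MSE_reference used here is an assumption; the record layout is hypothetical.

```python
from collections import defaultdict

def mse(pred, obs):
    """Mean squared error."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def skill_score(model_pred, ref_pred, obs):
    """MSE-based skill: 1 = perfect, 0 = no better than the reference, < 0 = worse."""
    return 1 - mse(model_pred, obs) / mse(ref_pred, obs)

def skill_by_group(records):
    """records: iterable of (month, hour, model, reference, observed) tuples.
    Returns a skill score for each (month, hour) group."""
    groups = defaultdict(lambda: ([], [], []))
    for month, hour, m, r, o in records:
        g = groups[(month, hour)]
        g[0].append(m)
        g[1].append(r)
        g[2].append(o)
    return {key: skill_score(*g) for key, g in groups.items()}
```

The reference could be, for example, the reanalysis grid-point value nearest the station, which is one of the simpler reference models mentioned above.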
The association of genetic variants of type 2 diabetes with kidney function.
Franceschini, Nora; Shara, Nawar M; Wang, Hong; Voruganti, V Saroja; Laston, Sandy; Haack, Karin; Lee, Elisa T; Best, Lyle G; Maccluer, Jean W; Cochran, Barbara J; Dyer, Thomas D; Howard, Barbara V; Cole, Shelley A; North, Kari E; Umans, Jason G
2012-07-01
Type 2 diabetes is highly prevalent and is the major cause of progressive chronic kidney disease in American Indians. Genome-wide association studies identified several loci associated with diabetes, but their impact on susceptibility to diabetic complications is unknown. We studied the association of 18 type 2 diabetes genome-wide association single-nucleotide polymorphisms (SNPs) with estimated glomerular filtration rate (eGFR; MDRD equation) and urine albumin-to-creatinine ratio in 6958 Strong Heart Study family and cohort participants. Center-specific residuals of eGFR and log urine albumin-to-creatinine ratio, obtained from linear regression models adjusted for age, sex, and body mass index, were regressed onto SNP dosage using variance component models in family data and linear regression in unrelated individuals. Estimates were then combined across centers. Four diabetic loci were associated with eGFR and one locus with urine albumin-to-creatinine ratio. A SNP in the WFS1 gene (rs10010131) was associated with higher eGFR in younger individuals and with increased albuminuria. SNPs in the FTO, KCNJ11, and TCF7L2 genes were associated with lower eGFR, but not albuminuria, and were not significant in prospective analyses. Our findings suggest a shared genetic risk for type 2 diabetes and its kidney complications, and a potential role for WFS1 in early-onset diabetic nephropathy in American Indian populations.
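The two-step analysis (residualize the outcome on covariates, regress the residuals on SNP dosage, then combine center-specific estimates) can be sketched for unrelated individuals. Simplifications and assumptions: a single covariate stands in for age, sex, and BMI; the family-data variance component models are not shown; and the combination uses fixed-effect inverse-variance weighting, since the abstract only says the estimates were "combined".

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares y = a + b*x; returns (a, b, standard error of b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((v - a - b * u) ** 2 for u, v in zip(x, y))
    return a, b, (rss / (len(x) - 2) / sxx) ** 0.5

def residualize(covariate, outcome):
    """Residuals of the outcome after regressing it on one covariate."""
    a, b, _ = fit_line(covariate, outcome)
    return [v - a - b * u for u, v in zip(covariate, outcome)]

def snp_effect(dosage, residuals):
    """Per-allele effect of SNP dosage (0-2) on the residualized outcome: (beta, se)."""
    _, b, se = fit_line(dosage, residuals)
    return b, se

def inverse_variance_combine(estimates):
    """Fixed-effect combination of (beta, se) pairs across centers."""
    weights = [1 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return beta, (1 / sum(weights)) ** 0.5
```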
He, Jie; Zhao, Yunfeng; Zhao, Jingli; Gao, Jin; Han, Dandan; Xu, Pao; Yang, Runqing
2017-11-02
Because of their high economic importance, growth traits in fish are under continuous improvement. For growth traits that are recorded at multiple time points in life, the use of univariate and multivariate animal models is limited because of the variable and irregular timing of these measures. Thus, the univariate random regression model (RRM) was introduced for the genetic analysis of dynamic growth traits in fish breeding. We used a multivariate random regression model (MRRM) to analyze genetic changes in growth traits recorded at multiple time points in genetically improved farmed tilapia. Legendre polynomials of different orders were applied to characterize the influences of fixed and random effects on growth trajectories. The final MRRM was determined by optimizing the univariate RRM for each analyzed trait separately using an adaptively penalized likelihood criterion, which is superior to both the Akaike information criterion and the Bayesian information criterion. In the selected MRRM, the additive genetic effects were modeled by Legendre polynomials of three orders for body weight (BWE) and body length (BL) and of two orders for body depth (BD). Using the covariance functions of the MRRM, estimated heritabilities were between 0.086 and 0.628 for BWE, 0.155 and 0.556 for BL, and 0.056 and 0.607 for BD. Only heritabilities for BD measured from 60 to 140 days of age were consistently higher than those estimated by the univariate RRM. All genetic correlations between growth time points exceeded 0.5 for either single or pairwise time points. Moreover, correlations between early and late growth time points were lower. Thus, for phenotypes that are measured repeatedly in aquaculture, an MRRM can enhance the efficiency of comprehensive selection for BWE and the main morphological traits.
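Random regression models describe a trajectory as a weighted sum of Legendre polynomials evaluated at standardized age. The sketch below rescales age to [−1, 1] and computes P_0 … P_(order−1) by Bonnet's recurrence, (n+1)P_(n+1) = (2n+1)xP_n − nP_(n−1). Reading "order" as the number of polynomials, and the optional normalization factor √((2n+1)/2) often used in animal breeding, are assumptions here; the ages are hypothetical.

```python
def standardize_age(t, t_min, t_max):
    """Map age t in [t_min, t_max] onto [-1, 1]."""
    return -1 + 2 * (t - t_min) / (t_max - t_min)

def legendre(x, order, normalized=False):
    """Legendre polynomials P_0..P_(order-1) at x via Bonnet's recurrence."""
    p = [1.0, x]
    for n in range(1, order - 1):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    p = p[:order]
    if normalized:
        p = [v * ((2 * n + 1) / 2) ** 0.5 for n, v in enumerate(p)]
    return p

# Covariates of a three-term random regression at day 100 of a 60-140 day window.
print(legendre(standardize_age(100, 60, 140), 3))
```

A genetic growth curve is then g(t) = Σ α_k φ_k(t), with one random coefficient α_k per polynomial.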
Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E
2014-06-01
Leaf chlorophyll content provides valuable information about the physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years, rapid and non-destructive alternative methods have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimating chlorophyll content in quinoa and amaranth leaves, based on analysis of the RGB components of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated with leaf chlorophyll determined by the standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables were tested and validated. The performance of the proposed method was compared with that of the widely used non-destructive SPAD method. The sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quicker, more effective, and lower in cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had a lower amount of noise over the whole range of chlorophyll studied, compared with SPAD and other models based on leaf image processing, when applied to quinoa and amaranth.
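A minimal sketch of a single-regression RGB model: average each color channel over a leaf's pixels, fit chlorophyll against one channel by least squares, and report RMSEP on held-out leaves. Which channel (or combination) works best was an empirical result of the study; the channel choice and any data used with these functions are hypothetical.

```python
from statistics import mean

def mean_rgb(pixels):
    """Mean R, G and B values over a list of (r, g, b) pixel tuples."""
    return tuple(mean(p[c] for p in pixels) for c in range(3))

def fit_line(x, y):
    """Least-squares y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

def rmsep(pred, obs):
    """Root mean square error of prediction on validation samples."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5

# Per-leaf mean of one channel (here index 1, green) as the single predictor.
leaf = [(60, 140, 50), (64, 150, 54)]
green = mean_rgb(leaf)[1]
```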
Hewitt, Angela L.; Popa, Laurentiu S.; Pasalar, Siavash; Hendrix, Claudia M.
2011-01-01
Encoding of movement kinematics in Purkinje cell simple spike discharge has important implications for hypotheses of cerebellar cortical function. Several outstanding questions remain regarding the representation of these kinematic signals. It is uncertain whether kinematic encoding occurs in unpredictable, feedback-dependent tasks, or whether kinematic signals are conserved across tasks. Additionally, there is a need to understand the signals encoded in the instantaneous discharge of single cells without averaging across trials or time. To address these questions, this study recorded Purkinje cell firing in monkeys trained to perform a manual random tracking task in addition to circular tracking and center-out reach. Random tracking provides extensive coverage of kinematic workspaces. Direction and speed errors are significantly greater during random than during circular tracking. Cross-correlation analyses comparing hand and target velocity profiles show that hand velocity lags target velocity during random tracking. Correlations between simple spike firing from 120 Purkinje cells and hand position, velocity, and speed were evaluated with linear regression models including a time constant, τ, as a measure of the firing lead/lag relative to the kinematic parameters. Across the population, velocity accounts for the majority of simple spike firing variability (63 ± 30% of R²adj), followed by position (28 ± 24% of R²adj) and speed (11 ± 19% of R²adj). Simple spike firing often leads hand kinematics. Comparison of regression models based on averaged vs. nonaveraged firing and kinematics reveals lower R²adj values for nonaveraged data; however, regression coefficients and τ values are highly similar. Finally, for most cells, model coefficients generated from random tracking accurately estimate simple spike firing in either circular tracking or center-out reach. 
These findings imply that the cerebellum controls movement kinematics, consistent with a forward internal model that predicts upcoming limb kinematics. PMID:21795616
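Estimating the lead/lag τ can be sketched as a scan over candidate shifts: for each τ, regress firing at time t on a kinematic variable at time t + τ and keep the τ with the highest R² (for a one-predictor regression, R² is the squared correlation). The convention that positive τ means firing leads kinematics, the single kinematic predictor, and the synthetic signals are assumptions; the study's models included position, velocity, and speed together.

```python
import math
from statistics import mean

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def best_lag(firing, kinematics, max_lag):
    """Return (tau, R^2) maximizing R^2 of firing[t] regressed on kinematics[t + tau].
    Positive tau: firing leads the kinematics by tau samples."""
    best = None
    for tau in range(-max_lag, max_lag + 1):
        pairs = [(kinematics[t + tau], firing[t])
                 for t in range(len(firing)) if 0 <= t + tau < len(kinematics)]
        x, y = zip(*pairs)
        r2 = r_squared(x, y)
        if best is None or r2 > best[1]:
            best = (tau, r2)
    return best

# Synthetic check: firing is a linear function of velocity two samples ahead.
velocity = [math.sin(0.3 * i) + 0.1 * i for i in range(40)]
firing = [3 * velocity[t + 2] + 1 for t in range(38)]
print(best_lag(firing, velocity, 5))
```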
Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena
2013-01-01
The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), in the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with MLR, single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs, and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques was assessed and discussed on the basis of different validation criteria, and the results generally show better performance of AsNNs in terms of learning ability and prediction of antitubercular behavior compared with all other methods. MLR has, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in the training set and 18 in the test set) were obtained with AsNNs using seven descriptors (R² of 0.874 and RMSE of 0.437, against R² of 0.845 and RMSE of 0.472 for MLR, for the test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from the MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis, showing an activity close to that predicted by the majority of the models. Copyright © 2013 Elsevier Masson SAS. All rights reserved.
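The test-set statistics used to compare the MLR and neural network models can be computed as below. R² is taken here as 1 − SS_res/SS_tot; some QSAR studies instead report the squared correlation between observed and predicted values, so which definition the study used is an assumption. The numbers in the example are invented.

```python
def r2_score(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

def rmse(obs, pred):
    """Root mean square error."""
    return (sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)) ** 0.5

obs, pred = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
print(r2_score(obs, pred), rmse(obs, pred))  # roughly 0.98 and 0.158
```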
Teoh, Shao Thing; Kitamura, Miki; Nakayama, Yasumune; Putri, Sastia; Mukai, Yukio; Fukusaki, Eiichiro
2016-08-01
In recent years, the advent of high-throughput omics technology has made possible a new class of strain engineering approaches, based on identification of possible gene targets for phenotype improvement from omic-level comparison of different strains or growth conditions. Metabolomics, with its focus on the omic level closest to the phenotype, lends itself naturally to this semi-rational methodology. When a quantitative phenotype such as growth rate under stress is considered, regression modeling using multivariate techniques such as partial least squares (PLS) is often used to identify metabolites correlated with the target phenotype. However, linear modeling techniques such as PLS require a consistent metabolite-phenotype trend across the samples, which may not be the case when outliers or multiple conflicting trends are present in the data. To address this, we propose a data-mining strategy that uses random sample consensus (RANSAC) to select subsets of samples with consistent trends for the construction of better regression models. By applying a combination of RANSAC and PLS (RANSAC-PLS) to a dataset from a previous study (gas chromatography/mass spectrometry metabolomics data and 1-butanol tolerance of 19 yeast mutant strains), new metabolites were found to be correlated with tolerance within certain subsets of the samples. The relevance of these metabolites to 1-butanol tolerance was then validated using single-deletion strains of the corresponding metabolic genes. The results showed that RANSAC-PLS is a promising strategy to identify unique metabolites that provide additional hints for phenotype improvement, which could not be detected by traditional PLS modeling using the entire dataset. Copyright © 2016 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
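The RANSAC step can be sketched with a simple straight line standing in for PLS, which keeps the example short; a faithful RANSAC-PLS would refit a PLS model on each candidate subset. The sketch repeatedly fits a line through a random pair of samples, counts the samples within a residual threshold (the consensus set), keeps the largest set, and refits on it. The threshold, iteration count, seed, and data are hypothetical.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Least-squares y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sum((u - mx) ** 2 for u in xs)
    return my - b * mx, b

def ransac_line(x, y, threshold, iterations=500, seed=0):
    """RANSAC with a 2-point line model; returns (a, b, inlier indices)."""
    rng = random.Random(seed)
    n = len(x)
    best = []
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)
        if x[i] == x[j]:
            continue  # degenerate pair: vertical line
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        inliers = [k for k in range(n) if abs(y[k] - (a + b * x[k])) <= threshold]
        if len(inliers) > len(best):
            best = inliers
    a, b = fit_line([x[k] for k in best], [y[k] for k in best])  # refit on consensus set
    return a, b, best

# Ten samples on one trend plus two conflicting samples.
x = list(range(10)) + [3.5, 7.5]
y = [2 * v + 1 for v in range(10)] + [20, -5]
print(ransac_line(x, y, threshold=0.5))
```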
Model-based Bayesian inference for ROC data analysis
NASA Astrophysics Data System (ADS)
Lei, Tianhu; Bae, K. Ty
2013-03-01
This paper presents a study of model-based Bayesian inference for Receiver Operating Characteristic (ROC) data. The model is a simple version of the general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a binary (zero-one) covariate to express the binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by the Markov Chain Monte Carlo (MCMC) method carried out by Bayesian inference Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach treats model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, the posterior distributions of the parameters, as well as the parameters themselves, can be accurately estimated. MCMC-based BUGS adopts the Adaptive Rejection Sampling (ARS) protocol, which requires the probability density function (pdf) from which samples are drawn to be log-concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log-concave with respect to its scale parameter. Therefore, the ARS requirement is satisfied, and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are assessed with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.
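The binormal model underlying this kind of ROC analysis can be sketched with the standard (a, b) parametrization: TPF = Φ(a + b·Φ⁻¹(FPF)) and AUC = Φ(a / √(1 + b²)). If non-diseased scores are N(0, 1) and diseased scores are N(μ, σ²), then a = μ/σ and b = 1/σ. This is the textbook binormal curve, not the paper's probit-link formulation or its Bayesian (MCMC) estimation.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: Phi is STD.cdf, Phi^-1 is STD.inv_cdf

def params_from_normals(mu, sigma):
    """Binormal (a, b) when non-diseased ~ N(0, 1) and diseased ~ N(mu, sigma^2)."""
    return mu / sigma, 1 / sigma

def binormal_roc(fpf, a, b):
    """True positive fraction at a given false positive fraction (0 < fpf < 1)."""
    return STD.cdf(a + b * STD.inv_cdf(fpf))

def binormal_auc(a, b):
    """Area under the binormal ROC curve."""
    return STD.cdf(a / (1 + b * b) ** 0.5)

a, b = params_from_normals(2.0, 2.0)
print(a, b, binormal_auc(a, b))
```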
Applying Kaplan-Meier to Item Response Data
ERIC Educational Resources Information Center
McNeish, Daniel
2018-01-01
Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…
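For reference, the Kaplan-Meier estimator named in the title computes S(t) = Π (1 − d_i / n_i) over distinct event times, where d_i is the number of events and n_i the number still at risk. How item responses are mapped onto "times" and "events" is not stated in the truncated abstract, so the generic version below is only a sketch; any item-response interpretation of the inputs would be hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve as a list of (t, S(t)) at each distinct event time.
    times: event or censoring time per subject; events: 1 = event observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve

print(kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1]))
```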
NASA Astrophysics Data System (ADS)
Danner, Travis W.
Developing technology systems requires all manner of investment: engineering talent, prototypes, test facilities, and more. Even for simple design problems the investment can be substantial; for complex technology systems, the development costs can be staggering. The profitability of a corporation in a technology-driven industry is crucially dependent on maximizing the effectiveness of research and development investment. Decision-makers charged with allocation of this investment are forced to choose between the further evolution of existing technologies and the pursuit of revolutionary technologies. At risk on the one hand is excessive investment in an evolutionary technology which has only limited availability for further improvement. On the other hand, the pursuit of a revolutionary technology may mean abandoning momentum and the potential for substantial evolutionary improvement resulting from the years of accumulated knowledge. The informed answer to this question, evolutionary or revolutionary, requires knowledge of the expected rate of improvement and the potential a technology offers for further improvement. This research is dedicated to formulating the assessment and forecasting tools necessary to acquire this knowledge. The same physical laws and principles that enable the development and improvement of specific technologies also limit the ultimate capability of those technologies. Researchers have long used this concept as the foundation for modeling technological advancement through extrapolation by analogy to biological growth models. These models are employed to depict technology development as it asymptotically approaches limits established by the fundamental principles on which the technological approach is based. This has proven an effective and accurate approach to modeling and forecasting simple single-attribute technologies. 
With increased system complexity and the introduction of multiple system objectives, however, the usefulness of this modeling technique begins to diminish. With the introduction of multiple objectives, researchers often abandon technology growth models for scoring models and technology frontiers. While both approaches possess advantages over current growth models for the assessment of multi-objective technologies, each lacks a necessary dimension for comprehensive technology assessment. By collapsing multiple system metrics into a single, non-intuitive technology measure, scoring models provide a succinct framework for multi-objective technology assessment and forecasting. Yet, with no consideration of physical limits, scoring models provide no insight as to the feasibility of a particular combination of system capabilities. They only indicate that a given combination of system capabilities yields a particular score. Conversely, technology frontiers are constructed with the distinct objective of providing insight into the feasibility of system capability combinations. Yet again, upper limits to overall system performance are ignored. Furthermore, the data required to forecast subsequent technology frontiers are often prohibitive. In an attempt to reincorporate the fundamental nature of technology advancement as bound by physical principles, researchers have sought to normalize multi-objective systems whereby the variability of a single system objective is eliminated as a result of changes in the remaining objectives. This drastically limits the applicability of the resulting technology model because it is only applicable for a single setting of all other system attributes. Attempts to maintain the interaction between the growth curves of each technical objective of a complex system have thus far been limited to qualitative and subjective consideration. 
This research proposes multidimensional growth models as an approach to simulating the advancement of multi-objective technologies toward their upper limits. Multidimensional growth models were formulated by recognizing and exploiting the correspondence between technology growth models and technology frontiers: both are, in fact, frontiers. A technology growth curve is a frontier between the capability level of a single attribute and time, whereas a technology frontier is a frontier between the capability levels of two or more attributes. Multidimensional growth models exploit the mathematical significance of this correspondence. The result is a model that captures both the interaction among multiple system attributes and their expected rates of improvement over time. The fundamental nature of technology development is maintained, and interdependent growth curves are generated for each system metric with minimal data requirements. Because the model is founded on the basic nature of technology advancement relative to physical limits, the potential for further improvement in a single metric can be determined relative to the other system measures of merit. A by-product of this modeling approach is a single n-dimensional technology frontier linking all n system attributes with time. This provides an environment capable of forecasting future system capability in the form of advancing technology frontiers. The ability of a multidimensional growth model to capture the expected improvement of a specific technological approach depends on accurately identifying the physical limitations on each pertinent attribute. This research investigates two potential approaches to identifying those physical limits: a physics-based approach and a regression-based approach. The regression-based approach has found limited acceptance among forecasters, although it does show potential for estimating upper limits with a specified degree of uncertainty. 
Forecasters have long favored physics-based approaches for establishing the upper limits of unidimensional growth models. Accurately identifying upper limits has become increasingly difficult as growth models are extended into multiple dimensions. A lone researcher may be able to identify the physical limitation on a single attribute of a simple system; however, as system complexity and the number of attributes increase, the attention of researchers from multiple fields of study is required. Limit identification is thus itself an area of research and development requiring some level of investment. Whether estimated by physics-based or regression-based approaches, predicted limits will always carry some degree of uncertainty. This research therefore quantifies the impact of that uncertainty on model forecasts rather than strongly endorsing a single technique for limit identification. In addition to formulating the multidimensional growth model, this research provides a systematic procedure for applying that model to specific technology architectures. Researchers and decision-makers can investigate the potential for additional improvement within a given technology architecture and estimate the expected cost of each incremental improvement relative to the cost of past improvements. In this manner, multidimensional growth models provide the information needed to set reasonable program goals for the further evolution of a particular technological approach, or to establish the need for revolutionary approaches in light of the constraining limits of conventional ones.
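To see why uncertainty in the upper limit matters for forecasts, consider a single-attribute logistic curve anchored to today's observed capability. Solving y_now = L / (1 + exp(-k (t_now - t_mid))) for the inflection time gives t_mid = t_now + ln(L / y_now - 1) / k, so each assumed limit L implies a different future trajectory. This is a deliberately simple one-at-a-time sensitivity sketch: the logistic form, the fixed growth constant k and all numbers are assumptions, not the procedure developed in this research.

```python
import math

def forecast_with_limit(limit, y_now, t_now, k, t_future):
    """Logistic forecast passing through the current capability y_now at t_now.

    The inflection time is solved from the current observation, so changing
    the assumed limit shifts the whole curve (k is held fixed by assumption).
    """
    if not 0 < y_now < limit:
        raise ValueError("current capability must lie strictly below the limit")
    t_mid = t_now + math.log(limit / y_now - 1.0) / k
    return limit / (1.0 + math.exp(-k * (t_future - t_mid)))

# Hypothetical case: capability 60 today, three candidate limits.
for limit in (90.0, 100.0, 120.0):
    print(limit, round(forecast_with_limit(limit, 60.0, 2020, 0.1, 2030), 1))
```

The spread of the printed forecasts across candidate limits is a crude measure of how much the limit estimate drives the forecast.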
Bittencourt, Natalia F N; Ocarino, Juliana M; Mendonça, Luciana D M; Hewett, Timothy E; Fonseca, Sergio T
2012-12-01
Cross-sectional. To investigate predictors of increased frontal plane knee projection angle (FPKPA) in athletes. The underlying mechanisms that lead to increased FPKPA are likely multifactorial and depend on how the musculoskeletal system adapts to the possible interactions between its distal and proximal segments. Bivariate and linear analyses traditionally employed to analyze the occurrence of increased FPKPA are not sufficiently robust to capture complex relationships among predictors. The investigation of nonlinear interactions among biomechanical factors is necessary to further our understanding of the interdependence of lower-limb segments and resultant dynamic knee alignment. The FPKPA was assessed in 101 athletes during a single-leg squat and in 72 athletes at the moment of landing from a jump. The investigated predictors were sex, hip abductor isometric torque, passive range of motion (ROM) of hip internal rotation (IR), and shank-forefoot alignment. Classification and regression trees were used to investigate nonlinear interactions among predictors and their influence on the occurrence of increased FPKPA. During single-leg squatting, the occurrence of high FPKPA was predicted by the interaction between hip abductor isometric torque and passive hip IR ROM. At the moment of landing, the shank-forefoot alignment, abductor isometric torque, and passive hip IR ROM were predictors of high FPKPA. In addition, the classification and regression trees established cutoff points that could be used in clinical practice to identify athletes who are at potential risk for excessive FPKPA. The models captured nonlinear interactions between hip abductor isometric torque, passive hip IR ROM, and shank-forefoot alignment.
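A classification tree derives a clinical cutoff by searching every candidate threshold on a predictor and keeping the split that leaves the two child groups most homogeneous. The sketch below performs one such CART-style split on a single numeric predictor using Gini impurity; the torque values and labels are invented for illustration, and the study's full trees (with several predictors and interactions) are not reproduced.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_cutoff(values, labels):
    """Best binary split 'x <= cutoff' on one predictor.

    Candidate cutoffs are midpoints between consecutive distinct sorted values.
    Returns (cutoff, weighted child impurity).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue
        cut = (lo + hi) / 2.0
        left = [y for x, y in pairs if x <= cut]
        right = [y for x, y in pairs if x > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cut, score)
    return best

# Hypothetical data: hip abductor torque (normalized units) and high FPKPA (1 = yes).
torque = [0.8, 0.9, 1.0, 1.1, 1.5, 1.6, 1.7, 1.8]
high_fpkpa = [1, 1, 1, 1, 0, 0, 0, 0]
print(best_cutoff(torque, high_fpkpa))
```

A full tree repeats this search recursively within each child group, which is how interactions between predictors such as torque and hip rotation range emerge.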
Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui
2016-01-01
Phellinus is a genus of fungi and is known as one of the components of drugs for preventing cancer. To find optimized culture conditions for Phellinus production in the laboratory, many single-factor experiments were performed, generating a large amount of experimental data. In this work, we use the data collected from these experiments for regression analysis and obtain a mathematical model for predicting Phellinus production. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of the culture-condition parameters, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. The optimized parameter values are consistent with biological experimental results, which indicates that our method has good predictive ability for culture-condition optimization. PMID:27610365
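The workflow above pairs a fitted regression model with a genetic algorithm that searches for the parameter values maximizing predicted production. The sketch below uses a generic real-coded GA (tournament selection, blend crossover, Gaussian mutation, elitism), not the paper's gene-set based variant, and optimizes only two parameters; the quadratic "fitted model" and the search bounds are hypothetical.

```python
import random

# Hypothetical fitted regression model (NOT the paper's): predicted production
# as a quadratic response surface in temperature (deg C) and pH.
def predicted_yield(params):
    temp, ph = params
    return 10.0 - 0.05 * (temp - 28.0) ** 2 - 0.8 * (ph - 6.0) ** 2

BOUNDS = [(20.0, 35.0), (4.0, 8.0)]  # assumed search ranges

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def genetic_algorithm(fitness, bounds, pop_size=40, generations=100,
                      mutation_sd=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = scored[:2]  # elitism: carry the two best over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(scored, 3), key=fitness)
            p2 = max(rng.sample(scored, 3), key=fitness)
            child = []
            for (lo, hi), a, b in zip(bounds, p1, p2):
                w = rng.random()                                 # blend crossover
                gene = w * a + (1 - w) * b
                gene += rng.gauss(0.0, mutation_sd * (hi - lo))  # Gaussian mutation
                child.append(clip(gene, lo, hi))
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

best = genetic_algorithm(predicted_yield, BOUNDS)
print([round(x, 2) for x in best], round(predicted_yield(best), 3))
```

Extending the search to all seven culture parameters only requires a longer `BOUNDS` list and a model taking seven inputs.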
Influence of salinity and temperature on acute toxicity of cadmium to Mysidopsis bahia molenock
DOE Office of Scientific and Technical Information (OSTI.GOV)
Voyer, R.A.; Modica, G.
1990-01-01
Acute toxicity tests were conducted to compare estimates of toxicity, as modified by salinity and temperature, derived from response surface techniques with those derived using conventional test methods, and to compare the effect of a single episodic exposure to cadmium as a function of salinity with that of continuous exposure. Regression analysis indicated that mortality following continuous 96-hr exposure is related to the linear and quadratic effects of salinity and cadmium at 20 °C, and to the linear and quadratic effects of cadmium only at 25 °C. LC50s decreased with increases in temperature and with decreases in salinity. Based on the regression model developed, 96-hr LC50s ranged from 15.5 to 28.0 µg Cd/L at 10 and 30‰ salinity, respectively, at 25 °C; and from 47 to 85 µg Cd/L at these salinities at 20 °C.
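With a response-surface model that has linear and quadratic terms in cadmium and salinity, an LC50 at a given salinity is the concentration at which predicted mortality reaches 50%. Assuming a logit link (logit 0 corresponds to 50% mortality), that concentration is a root of a quadratic in C. The coefficients below are invented for illustration and are not the study's fitted values; the link function is likewise an assumption.

```python
import math

# Hypothetical coefficients (NOT the study's fit), one temperature:
# logit(p) = b0 + b1*C + b2*C**2 + b3*S + b4*S**2
# with C = Cd concentration (ug/L) and S = salinity (ppt).
COEF = {"b0": -4.0, "b1": 0.15, "b2": 0.001, "b3": -0.02, "b4": -0.001}

def logit_mortality(c, s, b=COEF):
    return b["b0"] + b["b1"] * c + b["b2"] * c ** 2 + b["b3"] * s + b["b4"] * s ** 2

def lc50(s, b=COEF):
    """Smallest positive C with logit_mortality(C, s) = 0 (50% mortality)."""
    quad, lin = b["b2"], b["b1"]
    const = b["b0"] + b["b3"] * s + b["b4"] * s ** 2
    if quad == 0:
        return -const / lin
    disc = lin ** 2 - 4 * quad * const
    if disc < 0:
        raise ValueError("no real concentration gives 50% mortality")
    roots = [(-lin + sign * math.sqrt(disc)) / (2 * quad) for sign in (1, -1)]
    positive = [r for r in roots if r > 0]
    if not positive:
        raise ValueError("no positive root")
    return min(positive)

for salinity in (10, 30):
    print(salinity, round(lc50(salinity), 1))
```

With these made-up coefficients the LC50 rises with salinity, matching the direction reported in the abstract.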
NASA Astrophysics Data System (ADS)
Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui
2015-12-01
This paper presents automatic segmentation and classification of Mycobacterium tuberculosis using conventional light microscopy. First, candidate bacillus objects are segmented by a marker-based watershed transform. The markers are obtained by adaptive threshold segmentation based on an adaptive-scale Gaussian filter, whose scale is determined from the color model of the bacillus objects. The candidate objects are then extracted in their entirety after region merging and contamination removal. Second, the shape features of the bacillus objects are characterized by Hu moments, compactness, eccentricity, and roughness, which are used to classify objects as single, touching, or non-bacillus. We evaluated logistic regression, random forest, and intersection kernel support vector machine classifiers for classifying the bacillus objects. Experimental results demonstrate that the proposed method is highly robust and accurate. The logistic regression classifier performed best, with an accuracy of 91.68%.
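Two of the shape features named above can be computed directly from a binary object mask. The sketch below derives eccentricity from the eigenvalues of the second-order central moments, and compactness as 4·π·area / perimeter², with the perimeter approximated by counting exposed pixel edges. Definitions of compactness vary between papers, so this formula is an assumption; Hu moments and roughness are not implemented.

```python
import math

def shape_features(mask):
    """Area, eccentricity and compactness of the foreground (1) pixels of a 2-D mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pts)
    if area == 0:
        raise ValueError("mask has no foreground pixels")
    rbar = sum(r for r, _ in pts) / area
    cbar = sum(c for _, c in pts) / area
    # Normalized second-order central moments (pixel-coordinate covariance).
    mu20 = sum((c - cbar) ** 2 for _, c in pts) / area
    mu02 = sum((r - rbar) ** 2 for r, _ in pts) / area
    mu11 = sum((r - rbar) * (c - cbar) for r, c in pts) / area
    # Eigenvalues of [[mu20, mu11], [mu11, mu02]].
    common = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam1 = (mu20 + mu02) / 2 + common
    lam2 = (mu20 + mu02) / 2 - common
    ecc = math.sqrt(max(0.0, 1 - lam2 / lam1)) if lam1 > 0 else 0.0
    # Perimeter: count pixel edges that border background or the image edge.
    fg = set(pts)
    perimeter = sum(1 for r, c in pts
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if (r + dr, c + dc) not in fg)
    compactness = 4 * math.pi * area / perimeter ** 2
    return area, ecc, compactness

square = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
rod = [[1] * 10]  # an elongated, bacillus-like object
print(shape_features(square), shape_features(rod))
```

Elongated single bacilli should score high eccentricity and low compactness, while round debris scores the opposite, which is why such features help separate object classes.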
New Approach To Hour-By-Hour Weather Forecast
NASA Astrophysics Data System (ADS)
Liao, Q. Q.; Wang, B.
2017-12-01
Fine-grained hourly forecasts for single stations are required in many production and daily-life applications. Most previous MOS (Model Output Statistics) methods, which use a linear regression model, struggle to capture the nonlinear nature of weather prediction, and their forecast accuracy has not been sufficient at high temporal resolution. This study predicts future meteorological elements, including temperature, precipitation, relative humidity and wind speed, in a local region over a relatively short period at an hourly level. Using hour-by-hour NWP (Numerical Weather Prediction) meteorological fields from Forecast.io (https://darksky.net/dev/docs/forecast) and real-time instrumental observations from 29 stations in Yunnan and 3 stations in Tianjin, China, from June to October 2016, 24-hour-ahead hour-by-hour predictions are made. This study presents an ensemble approach that combines information from the instrumental observations themselves and from NWP. An autoregressive-moving-average (ARMA) model is used to predict future values of the observation time series. The newest NWP products are fed into equations derived with the multiple linear regression MOS technique. The residual series of the MOS outputs are then handled with an autoregressive (AR) model to capture the linear properties of the time series. Because atmospheric flow is complex and nonlinear, a support vector machine (SVM) is also introduced. Basic data quality control and cross-validation are used to optimize the model parameters, and the AR/SVM models perform 24-hour-ahead residual reduction. Results show that the AR model technique is better than the corresponding multivariate MOS regression method, especially in the first 4 hours when the predicted variable is temperature. The combined MOS-AR model, which is comparable to the MOS-SVM model, outperforms MOS. For both combined models, the root mean square error for 2 m temperature is reduced to 1.6 °C and the correlation coefficient reaches 0.91.
The proportion of 24-hour forecasts deviating by no more than 2 °C is 78.75% for the MOS-AR model and 81.23% for the AR model.
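The MOS-AR idea can be illustrated with one NWP predictor: regress observations on the NWP forecast (the MOS step), fit an AR(1) coefficient to the regression residuals, and add the last residual, decayed by that coefficient, to each future MOS prediction. This is a minimal sketch assuming a single predictor and an AR(1) residual model; the study used multiple predictors, ARMA and SVM components that are not shown here.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_ar1(resid):
    """Lag-1 coefficient phi for e_t = phi * e_(t-1), estimated without intercept."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den if den else 0.0

def mos_ar_forecast(nwp_hist, obs_hist, nwp_future):
    """MOS regression on an NWP predictor plus AR(1) residual correction."""
    a, b = fit_line(nwp_hist, obs_hist)
    resid = [o - (a + b * f) for f, o in zip(nwp_hist, obs_hist)]
    phi = fit_ar1(resid)
    last = resid[-1]
    # The residual correction decays as phi**h with lead time h (hours ahead).
    return [a + b * f + last * phi ** h
            for h, f in enumerate(nwp_future, start=1)]
```

Because the AR correction fades with lead time, it helps most in the first few hours, consistent with the early-hour advantage reported above.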
Phadnis, Milind A.; Shireman, Theresa I.; Wetmore, James B.; Rigler, Sally K.; Zhou, Xinhua; Spertus, John A.; Ellerbeck, Edward F.; Mahnken, Jonathan D.
2014-01-01
In a population of chronic dialysis patients with an extensive burden of cardiovascular disease, estimates of the effectiveness of cardioprotective medication in the literature are based on a hazard ratio comparing the hazard of mortality between two groups (with or without drug exposure), measured either at a single point in time or through the cumulative metric of proportion of days covered (PDC) on medication. Although both approaches can be modeled in a time-dependent manner using a Cox regression model, we propose a more complete time-dependent metric for evaluating cardioprotective medication efficacy. We consider that drug effectiveness is potentially the result of interactions among three time-dependent covariate measures: current drug usage status (ON versus OFF), proportion of cumulative exposure to the drug at a given point in time, and the patient’s switching behavior between taking and not taking the medication. We show that modeling all three of these time-dependent measures illustrates more clearly how varying patterns of drug exposure affect drug effectiveness, which could remain obscured when modeled by the more standard single time-dependent covariate approaches. We propose that understanding the nature and directionality of these interactions will help the biopharmaceutical industry better estimate drug efficacy. PMID:25343005
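The three time-dependent covariates described above can be derived from a day-by-day exposure record. The sketch assumes a hypothetical encoding (a list of daily 0/1 flags, 1 meaning the patient had drug supply that day); the resulting values could then be supplied as time-varying covariates to a Cox model, which is not shown.

```python
def exposure_covariates(daily_on, t):
    """Time-dependent exposure covariates at day t (0-based, inclusive).

    daily_on : hypothetical list of 0/1 flags, 1 = drug supply on that day
    Returns (current ON/OFF status, cumulative proportion of days covered,
             number of ON/OFF switches so far).
    """
    history = daily_on[: t + 1]
    status = history[-1]
    pdc = sum(history) / len(history)
    switches = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return status, pdc, switches

record = [1, 1, 0, 0, 1, 1, 1, 0]  # made-up eight-day record
for day in range(len(record)):
    print(day, exposure_covariates(record, day))
```

Two patients can share the same PDC while differing in current status and in how often they switched, which is the information a single-covariate model discards.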
Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold
2015-03-01
A multi-gene genetic programming technique is proposed as a new method to predict syngas yield and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming model is also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well. Copyright © 2014 Elsevier Ltd. All rights reserved.
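In multi-gene genetic programming as commonly described (for example in the GPTIPS toolbox), a model is a weighted sum of several evolved expression trees ("genes") plus a bias, and the weights are fitted by ordinary least squares rather than evolved. The sketch below shows only that weighting step; the two gene expressions are hypothetical fixed functions standing in for evolved trees, and the evolutionary search itself is omitted.

```python
import math

def solve(a, b):
    """Solve a square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_gene_weights(genes, inputs, y):
    """Least-squares weights [b0, b1, ..., bk] for y ~ b0 + sum_i b_i * gene_i(x)."""
    rows = [[1.0] + [g(x) for g in genes] for x in inputs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical genes standing in for evolved trees over two inputs (x1, x2).
genes = [lambda x: x[0] * x[1], lambda x: math.sqrt(x[0])]
inputs = [(1, 2), (2, 1), (3, 3), (4, 0.5), (5, 2.5)]
y = [1 + 2 * genes[0](x) + 3 * genes[1](x) for x in inputs]
print([round(w, 6) for w in fit_gene_weights(genes, inputs, y)])
```

A single-gene model corresponds to a list with one expression, so the same weighting step covers both variants compared in the study.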
NASA Astrophysics Data System (ADS)
Drzewiecki, Wojciech
2017-12-01
We evaluated the performance of nine machine learning regression algorithms and their ensembles for sub-pixel estimation of impervious area coverage from Landsat imagery. The accuracy of imperviousness mapping at individual time points was assessed using RMSE, MAE and R2. These measures were also used to assess estimates of imperviousness change intensity. The applicability for detecting relevant changes in impervious area coverage at the sub-pixel level was evaluated using overall accuracy, F-measure and ROC Area Under Curve. The results indicate that the Cubist algorithm can be recommended for Landsat-based mapping of imperviousness for single dates. Stochastic gradient boosting of regression trees (GBM) may also be considered for this purpose. However, the Random Forest algorithm is recommended for both imperviousness change detection and mapping of its intensity. In all applications, the heterogeneous model ensembles performed at least as well as the best individual models. They may be recommended for improving the quality of sub-pixel imperviousness and imperviousness change mapping. The study also revealed limitations of the investigated methodology for detecting subtle changes of imperviousness inside the pixel. None of the tested approaches could reliably classify changed and unchanged pixels when the relevant change threshold was set at one or three percent. At a five percent threshold, most algorithms still did not achieve a change-map accuracy higher than that of a random classifier. With the relevant change threshold set at ten percent, all approaches performed satisfactorily.
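The evaluation measures named above are straightforward to compute. The sketch below gives RMSE, MAE and R² (here 1 − SSE/SST; some studies instead report the squared Pearson correlation) for coverage estimates, and an F-measure for "relevant change" pixels, where a pixel counts as changed if the absolute coverage change meets a threshold in percentage points. The threshold rule is an assumption about how such a change map could be defined.

```python
import math

def regression_metrics(obs, pred):
    """RMSE, MAE and R^2 (1 - SSE/SST) of predicted vs. reference coverage."""
    n = len(obs)
    err = [p - o for o, p in zip(obs, pred)]
    sse = sum(e * e for e in err)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in err) / n
    mean_obs = sum(obs) / n
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return rmse, mae, 1 - sse / sst

def change_f_measure(true_change, pred_change, threshold):
    """F-measure for pixels whose |coverage change| >= threshold (percentage points)."""
    t = [abs(c) >= threshold for c in true_change]
    p = [abs(c) >= threshold for c in pred_change]
    tp = sum(a and b for a, b in zip(t, p))
    fp = sum((not a) and b for a, b in zip(t, p))
    fn = sum(a and (not b) for a, b in zip(t, p))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Lowering `threshold` to one or three percent makes the changed class depend on differences comparable to the per-pixel estimation error, which is one way to see why the small-threshold change maps were unreliable.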
Real-time soil sensing based on fiber optics and spectroscopy
NASA Astrophysics Data System (ADS)
Li, Minzan
2005-08-01
Using NIR spectroscopic techniques, correlation analysis and regression analysis for soil parameter estimation were conducted on raw soil samples collected in a cornfield and a forage field. The soil parameters analyzed were soil moisture, soil organic matter, nitrate nitrogen, soil electrical conductivity and pH. Results showed that all soil parameters could be evaluated from NIR spectral reflectance. For soil moisture, a linear regression model was adequate at low moisture contents below 30% db, while an exponential model could be used over a wide range of moisture contents up to 100% db. Nitrate nitrogen estimation required a multi-spectral exponential model, and electrical conductivity could be evaluated by a single-spectral regression. Based on these results, a real-time soil sensor system based on fiber optics and spectroscopy was developed. The sensor system was composed of a soil subsoiler with four optical fiber probes, a spectrometer, and a control unit. Two optical fiber probes were used for illumination and the other two for collecting soil reflectance from visible to NIR wavebands at depths of around 30 cm. The spectrometer was used to obtain the spectra of the reflected light. The control unit consisted of a data logging device, a personal computer, and a pulse generator. The experiment showed that clear photo-spectral reflectance was obtained from the underground soil. The soil reflectance was equal to that obtained with a desktop spectrophotometer in laboratory tests. Using the spectral reflectance, soil parameters such as soil moisture, pH, EC and SOM were evaluated.
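An exponential model such as y = a·exp(b·x) is often fitted by ordinary least squares on ln(y), which turns it into a straight-line fit. The sketch below shows that approach, assuming a single reflectance predictor x and a positive response y (for example, moisture content); the study's actual model form and fitting method are not specified in the abstract. Note that fitting on the log scale weights errors differently from nonlinear least squares on the original scale.

```python
import math

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y); requires y > 0."""
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(ly) / n
    b = (sum((xi - mx) * (li - my) for xi, li in zip(x, ly))
         / sum((xi - mx) ** 2 for xi in x))
    a = math.exp(my - b * mx)
    return a, b

# Synthetic, noise-free example: moisture falling exponentially with reflectance.
refl = [0.1, 0.2, 0.3, 0.4, 0.5]
moisture = [2 * math.exp(-3 * r) for r in refl]
print(fit_exponential(refl, moisture))
```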
Fleischer, Adam E; Hshieh, Shenche; Crews, Ryan T; Waverly, Brett J; Jones, Jacob M; Klein, Erin E; Weil, Lowell; Weil, Lowell Scott
2018-05-01
Metatarsal length is believed to play a role in plantar plate dysfunction, although the mechanism through which progressive injury occurs is still uncertain. We aimed to clarify whether length of the second metatarsal was associated with increased plantar pressure measurements in the forefoot while walking. Weightbearing radiographs and corresponding pedobarographic data from 100 patients in our practice walking without a limp were retrospectively reviewed. Radiographs were assessed for several anatomic relationships, including metatarsal length, by a single rater. Pearson correlation analyses and multiple linear regression models were used to determine whether metatarsal length was associated with forefoot loading parameters. The relative length of the second to first metatarsal was positively associated with the ratio of peak pressure beneath the respective metatarsophalangeal joints (r = 0.243, P = .015). The relative length of the second to third metatarsal was positively associated with the ratios of peak pressure (r = 0.292, P = .003), pressure-time integral (r = 0.249, P = .013), and force-time integral (r = 0.221, P = .028) beneath the respective metatarsophalangeal joints. Although the variability in loading predicted by the various regression analyses was not large (4%-14%), the relative length of the second metatarsal (to the first and to the third) was maintained in each of the multiple regression models and remained the strongest predictor (highest standardized β-coefficient) in each of the models. Patients with longer second metatarsals exhibited relatively higher loads beneath the second metatarsophalangeal joint during barefoot walking. These findings provide a mechanism through which elongated second metatarsals may contribute to plantar plate injuries. Level III, comparative study.
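The correlations reported above are Pearson coefficients, and squaring one gives the share of variance it explains on its own: for example, r = 0.292 corresponds to r² ≈ 0.085, or about 8.5%, which is of the same order as the 4%-14% range reported for the regression models. A minimal implementation, applied to hypothetical length ratios and pressure ratios, is sketched below.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: 2nd/3rd metatarsal length ratio vs. peak-pressure ratio.
length_ratio = [1.02, 1.05, 1.04, 1.08, 1.10, 1.03]
pressure_ratio = [0.95, 1.10, 0.98, 1.05, 1.20, 1.01]
r = pearson_r(length_ratio, pressure_ratio)
print(round(r, 3), "variance explained:", round(r * r, 3))
```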
Anti-TNF levels in cord blood at birth are associated with anti-TNF type.
Kanis, Shannon L; de Lima, Alison; van der Ent, Cokkie; Rizopoulos, Dimitris; van der Woude, C Janneke
2018-05-15
Pregnancy guidelines for women with Inflammatory Bowel Disease (IBD) provide recommendations on stopping anti-TNF therapy during pregnancy in order to limit fetal exposure. Although infliximab (IFX) leads to higher anti-TNF concentrations in cord blood than adalimumab (ADA), the recommendations for the two drugs are similar. We aimed to demonstrate the effect of anti-TNF cessation during pregnancy on fetal exposure for IFX and ADA separately. We conducted a prospective single-center cohort study. Women with IBD using IFX or ADA were followed up during pregnancy. In case of sustained disease remission, anti-TNF was stopped in the third trimester. At birth, the anti-TNF concentration was measured in cord blood. A linear regression model was developed to describe the anti-TNF concentration in cord blood at birth. In addition, outcomes such as disease activity, pregnancy outcomes and 1-year health outcomes of infants were collected. We included 131 pregnancies that resulted in a live birth (73 IFX, 58 ADA). At birth, 94 cord blood samples were obtained (52 IFX, 42 ADA), showing significantly higher levels of IFX than ADA (p<0.0001). Anti-TNF type and stop week were used as predictors in the linear regression model. During the third trimester, placental transfer of IFX increases exponentially, whereas placental transfer of ADA is limited and increases linearly. Overall, health outcomes were comparable. Our linear regression model shows that ADA may be continued longer during pregnancy, as its transfer across the placenta is lower than that of IFX. This may reduce the relapse risk of the mother without increasing fetal anti-TNF exposure.
An Update of the Bodeker Scientific Vertically Resolved, Global, Gap-Free Ozone Database
NASA Astrophysics Data System (ADS)
Kremser, S.; Bodeker, G. E.; Lewis, J.; Hassler, B.
2016-12-01
High vertical resolution ozone measurements from multiple satellite-based instruments have been merged with measurements from the global ozonesonde network to calculate monthly mean ozone values in 5° latitude zones. Ozone number densities and ozone mixing ratios are provided on 70 altitude levels (1 to 70 km) and on 70 pressure levels spaced approximately 1 km apart (878.4 hPa to 0.046 hPa). These data are sparse and do not cover the entire globe or altitude range. To provide a gap-free database, a least squares regression model is fitted to these data and then evaluated globally. By applying a single fit at each level, and using the approach of allowing the regression fits to change only slightly from one level to the next, the regression is less sensitive to measurement anomalies at individual stations or to individual satellite-based instruments. Particular attention is paid to ensuring that the low ozone abundances in the polar regions are captured. This presentation reports on updates to an earlier version of the vertically resolved ozone database, including the incorporation of new ozone measurements and new techniques for combining the data. Compared to previous versions of the database, particular attention is paid to avoiding spatial and temporal sampling biases and tracing uncertainties through to the final product. This updated database, developed within the New Zealand Deep South National Science Challenge, is suitable for assessing ozone fields from chemistry-climate model simulations or for providing the ozone boundary conditions for global climate model simulations that do not treat stratospheric chemistry interactively.
Ward-Kavanagh, Lindsay K.; Zhu, Junjia; Cooper, Timothy K.; Schell, Todd D.
2014-01-01
Adoptive immunotherapy has demonstrated efficacy in a subset of clinical and preclinical studies, but the T cells used for therapy are often rapidly rendered non-functional in tumor-bearing hosts. Recent evidence indicates that prostate cancer can be susceptible to immunotherapy, but most studies using autochthonous tumor models demonstrate only short-lived T-cell responses in the tolerogenic prostate microenvironment. Here, we assessed the efficacy of sublethal whole-body irradiation (WBI) to enhance the magnitude and duration of adoptively transferred CD8+ T cells in the transgenic adenocarcinoma of the mouse prostate (TRAMP) model. We demonstrate that WBI promoted high-level accumulation of granzyme B (GzB)-expressing donor T cells both in lymphoid organs and in the prostate of TRAMP mice. Donor T cells remained responsive to vaccination in irradiated recipients, but a single round of WBI-enhanced adoptive immunotherapy failed to significantly affect existing disease. Addition of a second round of immunotherapy promoted regression of established disease in half of the treated mice, with no progressions observed. Regression was associated with long-term persistence of effector/memory phenotype CD8+ donor cells. Administration of the second round of adoptive immunotherapy led to reacquisition of GzB expression by persistent T cells from the first transfer. These results indicate that WBI conditioning amplifies tumor-specific T cells in the TRAMP prostate and lymphoid tissue, and suggest that the initial treatment alters the tolerogenic microenvironment to increase antitumor activity by a second wave of donor cells. PMID:24801834
ERIC Educational Resources Information Center
Liou, Pey-Yan
2009-01-01
The current study examines three regression models: OLS (ordinary least squares) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
Ahn, Jaeil; Mukherjee, Bhramar; Banerjee, Mousumi; Cooney, Kathleen A.
2011-01-01
The stereotype regression model for categorical outcomes, proposed by Anderson (1984), is nested between the baseline-category logits and adjacent-category logits models with proportional odds structure. The stereotype model is more parsimonious than the ordinary baseline-category (or multinomial logistic) model owing to a product representation of the log odds-ratios in terms of a common parameter corresponding to each predictor and category-specific scores. The model can be used for both ordered and unordered outcomes. For ordered outcomes, the stereotype model allows more flexibility than the popular proportional odds model in capturing highly subjective ordinal scaling that does not result from categorization of a single latent variable but is inherently multidimensional. As pointed out by Greenland (1994), an additional advantage of the stereotype model is that it provides unbiased and valid inference under outcome-stratified sampling, as in case-control studies. In addition, for matched case-control studies, the stereotype model is amenable to the classical conditional likelihood principle, whereas there is no reduction due to sufficiency under the proportional odds model. In spite of these attractive features, the model has been applied relatively rarely, because maximum likelihood estimation and likelihood-based testing are complicated by the non-linearity and lack of identifiability of the parameters. We present a comprehensive Bayesian inference and model comparison procedure for this class of models as an alternative to the classical frequentist approach. We illustrate our methodology by analyzing data from The Flint Men’s Health Study, a case-control study of prostate cancer in African-American men aged 40 to 79 years. We use clinical staging of prostate cancer in terms of Tumors, Nodes and Metastasis (TNM) as the categorical response of interest. PMID:19731262
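The product representation mentioned above can be written as log[P(Y=k)/P(Y=1)] = α_k + φ_k·βᵀx, with α_1 = φ_1 = 0 for the baseline category; one common identifiability choice also fixes the last score φ_K = 1. Each predictor thus contributes a single coefficient β_j, scaled by a category score, instead of K−1 separate coefficients. The sketch below evaluates the resulting category probabilities; all parameter values are hypothetical, and the Bayesian estimation procedure is not shown.

```python
import math

def stereotype_probs(x, alpha, phi, beta):
    """Category probabilities under the stereotype logit model.

    log[P(Y=k)/P(Y=1)] = alpha[k] + phi[k] * (beta . x),
    with alpha[0] = phi[0] = 0 for the baseline category.
    """
    score = sum(b * xi for b, xi in zip(beta, x))
    eta = [a + p * score for a, p in zip(alpha, phi)]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]  # subtract the max for numerical stability
    total = sum(w)
    return [v / total for v in w]

# Hypothetical three-category example with ordered scores 0 <= 0.4 <= 1.
alpha, phi, beta = [0.0, 0.5, -0.2], [0.0, 0.4, 1.0], [0.7]
print(stereotype_probs([1.0], alpha, phi, beta))
```

For a one-unit increase in x_j, the log odds of category k versus the baseline change by φ_k·β_j, which is the product structure that makes the model more parsimonious than multinomial logistic regression.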
Anokye, Nana Kwame; Pokhrel, Subhash; Buxton, Martin; Fox-Rushby, Julia
2013-06-01
Little is known about the correlates of meeting recommended levels of participation in physical activity (PA) and how this understanding informs public health policies on behaviour change. The aim was to analyse who meets the recommended level of participation in PA in males and females separately by applying 'process' modelling frameworks (single vs. sequential 2-step process). Using the Health Survey for England 2006 (n = 14 142; aged ≥ 16 years), gender-specific regression models were estimated using bivariate probit with selectivity correction and single probit models. A 'sequential, 2-step process' modelled participation and meeting the recommended level separately, whereas the 'single process' considered both participation and level together. In females, meeting the recommended level was associated with holding a degree [Marginal effect (ME) = 0.013] and with age (ME = -0.001), whereas in males, age was a significant correlate (ME = -0.003 to -0.004). The order of importance of correlates was similar across genders, with ethnicity being the most important correlate in both males (ME = -0.060) and females (ME = -0.133). In females, the 'sequential, 2-step process' performed better (ρ = -0.364, P < 0.001) than in males (ρ = 0.154). The degree to which people undertake the recommended level of PA through vigorous activity varies between males and females, and the process that best predicts such decisions, i.e. whether it is a sequential, 2-step process or a single-step choice, is also different for males and females. Understanding this should help to identify subgroups that are less likely to meet the recommended level of PA (and hence more likely to benefit from any PA promotion intervention).
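Marginal effects like those reported above translate probit coefficients into changes in probability. For a continuous covariate in a single probit model the marginal effect is φ(xᵀβ)·β_j, and for a binary covariate (such as holding a degree) it is usually the discrete change Φ(xᵀβ with the dummy set to 1) − Φ(xᵀβ with the dummy set to 0). The sketch evaluates both at given covariate values; the coefficients are hypothetical, and the bivariate probit with selectivity correction is not implemented.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def dot(x, beta):
    return sum(a * b for a, b in zip(x, beta))

def probit_marginal_effect(x, beta, j, binary=False):
    """Marginal effect of covariate j on P(y = 1) in a single probit model at x."""
    if binary:
        x1, x0 = list(x), list(x)
        x1[j], x0[j] = 1.0, 0.0
        return norm_cdf(dot(x1, beta)) - norm_cdf(dot(x0, beta))
    return norm_pdf(dot(x, beta)) * beta[j]

# Hypothetical model: intercept, age (centered, decades), degree dummy.
beta = [-0.5, -0.1, 0.2]
x = [1.0, 0.3, 0.0]
print(probit_marginal_effect(x, beta, 1), probit_marginal_effect(x, beta, 2, binary=True))
```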
Integrated Cox's model for predicting survival time of glioblastoma multiforme.
Ai, Zhibing; Li, Longti; Fu, Rui; Lu, Jing-Min; He, Jing-Dong; Li, Sen
2017-04-01
Glioblastoma multiforme is the most common primary brain tumor and is highly lethal. This study aims to identify signatures for predicting the survival time of patients with glioblastoma multiforme. Clinical information, messenger RNA expression, microRNA expression, and single-nucleotide polymorphism array data of patients with glioblastoma multiforme were retrieved from The Cancer Genome Atlas. Patients were separated into two groups using 1 year as a cutoff, and a logistic regression model was used to identify variables that could predict whether a patient would live longer than 1 year. Furthermore, Cox's model was used to find features correlated with survival time. Finally, a Cox model integrating the significant clinical variables, messenger RNA expression, microRNA expression, and single-nucleotide polymorphisms was built. Although the classification method failed, signatures of clinical features, messenger RNA expression levels, and microRNA expression levels were identified using Cox's model. However, no single-nucleotide polymorphisms related to prognosis were found. The selected clinical features were age at initial diagnosis, Karnofsky score, and race, all of which had previously been suggested to correlate with survival time. Both significant microRNAs, microRNA-221 and microRNA-222, target the p27 Kip1 protein, which implies an important role of p27 Kip1 in the prognosis of glioblastoma multiforme patients. Our results suggest that survival modeling is more suitable than classification for identifying prognostic biomarkers for patients with glioblastoma multiforme. An integrated model containing clinical features, messenger RNA levels, and microRNA expression levels was built, which has the potential to be used in clinics and thus to improve the survival status of glioblastoma multiforme patients.
The Plumbing of Land Surface Models: Is Poor Performance a Result of Methodology or Data Quality?
NASA Technical Reports Server (NTRS)
Haughton, Ned; Abramowitz, Gab; Pitman, Andy J.; Or, Dani; Best, Martin J.; Johnson, Helen R.; Balsamo, Gianpaolo; Boone, Aaron; Cuntz, Matthias; Decharme, Bertrand;
2016-01-01
The PALS Land sUrface Model Benchmarking Evaluation pRoject (PLUMBER) illustrated the value of prescribing a priori performance targets in model intercomparisons. It showed that the performance of turbulent energy flux predictions from different land surface models (LSMs), at a broad range of flux tower sites using common evaluation metrics, was on average worse than that of relatively simple empirical models. For sensible heat fluxes, all LSMs were outperformed by a linear regression against downward shortwave radiation. For latent heat flux, all LSMs were outperformed by a regression against downward shortwave radiation, surface air temperature, and relative humidity. These results are explored here in greater detail and possible causes are investigated. We examine whether particular metrics or sites unduly influence the collated results, whether results change according to time-scale aggregation, and whether a lack of energy conservation in flux tower data gives the empirical models an unfair advantage in the intercomparison. We demonstrate that energy conservation in the observational data is not responsible for these results. We also show that the partitioning between sensible and latent heat fluxes in LSMs, rather than the calculation of available energy, is the cause of the original findings. Finally, we present evidence suggesting that the nature of this partitioning problem is likely shared among all contributing LSMs. While we do not find a single candidate explanation for why LSMs perform poorly relative to empirical benchmarks in PLUMBER, we do exclude multiple possible explanations and provide guidance on where future research should focus.
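The empirical benchmark for sensible heat is a one-predictor linear regression. Below is a minimal sketch of that kind of fit, assuming ordinary least squares with an intercept and scoring by RMSE. The function names and the five toy data points are hypothetical. The fit is scored on its own training points purely to show the mechanics; it does not reproduce PLUMBER's calibration or evaluation protocol.

```python
from statistics import fmean

def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b * x for one predictor; returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)) ** 0.5

# Hypothetical data: downward shortwave (W m-2) vs sensible heat flux (W m-2)
sw_down = [0.0, 200.0, 400.0, 600.0, 800.0]
qh_obs = [-10.0, 30.0, 70.0, 110.0, 150.0]
a, b = fit_simple_ols(sw_down, qh_obs)
qh_benchmark = [a + b * s for s in sw_down]
print(a, b, rmse(qh_obs, qh_benchmark))
```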
Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G
2011-06-28
We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. 
With random regression models, the highest gains in accuracy were obtained at ages with a low number of weight records. The results indicate that random regression models provide more accurate expected breeding values than the traditional finite multi-trait models. Thus, higher genetic responses are expected for beef cattle growth traits by replacing a multi-trait model with random regression models for genetic evaluation. B-spline functions could be applied as an alternative to Legendre polynomials to model covariance functions for weights from birth to mature age.
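In a random regression on Legendre polynomials, each record's age is mapped onto [-1, 1] and expanded into polynomial covariates whose coefficients are the random effects. Below is a minimal sketch of that covariate construction. It assumes the normalized polynomials common in the animal-breeding literature. The function names and the 8-year age range are illustrative. Papers also differ on whether "fourth order" means degree 4 or four coefficients, so the function takes an explicit degree.

```python
import math

def standardize_age(age, age_min, age_max):
    """Map age onto [-1, 1], the interval on which Legendre polynomials are defined."""
    return -1.0 + 2.0 * (age - age_min) / (age_max - age_min)

def legendre_covariates(x, degree, normalized=True):
    """Return [P_0(x), ..., P_degree(x)] via Bonnet's recurrence.

    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
    With normalized=True each P_n is scaled by sqrt((2n + 1) / 2).
    """
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    p = [1.0, x]
    for n in range(1, degree):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    p = p[: degree + 1]  # handles degree 0 as well
    if normalized:
        p = [math.sqrt((2 * n + 1) / 2) * pn for n, pn in enumerate(p)]
    return p

# Hypothetical: covariates for a weight recorded at 550 days, ages from birth to 8 years
x = standardize_age(550, 0, 8 * 365)
print(legendre_covariates(x, degree=3))
```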
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing their results with those from an interaction term in linear regression. The research questions that each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
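The interaction approach tests a differential effect by letting the slope of a predictor depend on an observed moderator. Below is a minimal numpy sketch of that model, assuming a binary moderator and ordinary least squares; the data are hypothetical. A regression mixture would instead treat class membership as latent and estimate it (for example by EM), which is not shown here.

```python
import numpy as np

def fit_interaction_model(x, group, y):
    """OLS fit of y = b0 + b1*x + b2*group + b3*x*group; returns [b0, b1, b2, b3]."""
    x = np.asarray(x, float)
    group = np.asarray(group, float)
    design = np.column_stack([np.ones_like(x), x, group, x * group])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, float), rcond=None)
    return coef

# Hypothetical data in which the effect of x differs between two known groups
x = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
group = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y = 1.0 + 2.0 * x + 0.5 * group + 1.5 * x * group
b0, b1, b2, b3 = fit_interaction_model(x, group, y)
# The differential effect is b3: the slope of x is b1 in group 0 and b1 + b3 in group 1
print("slope in group 0:", b1, "slope in group 1:", b1 + b3)
```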
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects, by comparing their results with those from an interaction term in linear regression. The research questions which each model answers, their…
Predicting Trihalomethanes (THMs) in the New York City Water Supply
NASA Astrophysics Data System (ADS)
Mukundan, R.; Van Dreason, R.
2013-12-01
Chlorine, a commonly used disinfectant in most water supply systems, can combine with organic carbon to form disinfectant byproducts including carcinogenic trihalomethanes (THMs). We used water quality data from 24 monitoring sites within the New York City (NYC) water supply distribution system, measured between January 2009 and April 2012, to develop site-specific empirical models for predicting total trihalomethane (TTHM) levels. Terms in the model included various combinations of the following water quality parameters: total organic carbon, pH, specific conductivity, and water temperature. Reasonable estimates of TTHM levels were achieved with overall R² of about 0.87 and predicted values within 5 μg/L of measured values. The relative importance of factors affecting TTHM formation was estimated by ranking the model regression coefficients. Site-specific models showed improved model performance statistics compared to a single model for the entire system, most likely because the single model did not consider locational differences in the water treatment process. Although never out of compliance in 2011, the TTHM levels in the water supply increased following tropical storms Irene and Lee, with 45% of the samples exceeding the 80 μg/L Maximum Contaminant Level (MCL) in October and November. This increase was explained by changes in water quality parameters, particularly by the increase in total organic carbon concentration and pH during this period.
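Raw regression coefficients depend on each predictor's units, so a coefficient ranking is usually made comparable by z-scoring the predictors first. The abstract does not say whether the original models did this. Below is a minimal sketch of a standardized ranking, assuming ordinary least squares on z-scored predictors. The function name, the synthetic data and the coefficient values are hypothetical, not the NYC measurements or models.

```python
import numpy as np

def rank_standardized_coefficients(X, y, names):
    """Fit OLS on z-scored predictors and rank predictors by |coefficient|."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # coefficients become "per one SD"
    design = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    slopes = coef[1:]  # drop the intercept
    order = np.argsort(-np.abs(slopes))
    return [(names[k], float(slopes[k])) for k in order]

# Synthetic site data (hypothetical values, not NYC measurements)
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([
    rng.normal(2.5, 0.6, n),     # total organic carbon, mg/L
    rng.normal(7.5, 0.3, n),     # pH
    rng.normal(100.0, 15.0, n),  # specific conductivity, uS/cm
    rng.normal(15.0, 6.0, n),    # water temperature, deg C
])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
tthm = 40.0 + Z @ np.array([6.0, 1.0, 2.0, 4.0])  # noise-free toy response, ug/L
names = ["TOC", "pH", "conductivity", "temperature"]
print(rank_standardized_coefficients(X, tthm, names))
```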
Stochastic search, optimization and regression with energy applications
NASA Astrophysics Data System (ADS)
Hannah, Lauren A.
Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression, and stochastic search with an observable state variable. First, we consider the one-stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage problem. The one-stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values, which depend on the selected portfolio, to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternative, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples of when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response.
We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel-based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.
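The kernel-weighting idea reweights past (state, outcome) observations by their similarity to the query state, then optimizes the weighted average cost. Below is a minimal function-based sketch on a single-product newsvendor. The following are all assumptions rather than the thesis's setup: a Gaussian kernel with a fixed bandwidth, the linear cost form, and a candidate grid made of the observed demands. The history and prices are hypothetical.

```python
import math

def kernel_weights(query_state, states, bandwidth):
    """Normalized Gaussian kernel weights of past observations relative to a query state."""
    raw = [math.exp(-0.5 * ((query_state - s) / bandwidth) ** 2) for s in states]
    total = sum(raw)
    return [r / total for r in raw]

def newsvendor_cost(order, demand, unit_cost, price):
    """Cost of ordering `order` units when `demand` materializes (negative = profit)."""
    return unit_cost * order - price * min(order, demand)

def best_order(query_state, states, demands, bandwidth, unit_cost, price):
    """Pick the order quantity minimizing the kernel-weighted average cost."""
    weights = kernel_weights(query_state, states, bandwidth)

    def weighted_cost(order):
        return sum(w * newsvendor_cost(order, d, unit_cost, price)
                   for w, d in zip(weights, demands))

    # Candidate decisions: the distinct demands seen in the data
    return min(sorted(set(demands)), key=weighted_cost)

# Hypothetical history: an observable state (e.g. a forecast index) and realized demand
states = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
demands = [5, 6, 7, 50, 55, 60]
print(best_order(0.0, states, demands, bandwidth=1.0, unit_cost=1.0, price=2.0))
print(best_order(10.0, states, demands, bandwidth=1.0, unit_cost=1.0, price=2.0))
```

Because the weights depend on the query state, the same history yields different decisions for different states. The Dirichlet process weights studied in the thesis would replace `kernel_weights` and are not shown.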
Bignardi, A B; El Faro, L; Torres Júnior, R A A; Cardoso, V L; Machado, P F; Albuquerque, L G
2011-10-31
We analyzed 152,145 test-day records from 7317 first lactations of Holstein cows recorded from 1995 to 2003. Our objective was to model variations in test-day milk yield during the first lactation of Holstein cows by a random regression model (RRM), using various functions in order to obtain adequate and parsimonious models for the estimation of genetic parameters. Test-day milk yields were grouped into weekly classes of days in milk, ranging from 1 to 44 weeks. The contemporary groups were defined as herd-test-day. The analyses were performed using a single-trait RRM, including the direct additive, permanent environmental and residual random effects. In addition, contemporary group and linear and quadratic effects of the age of cow at calving were included as fixed effects. The mean trend of milk yield was modeled with a fourth-order orthogonal Legendre polynomial. The additive genetic and permanent environmental covariance functions were estimated by random regression on two parametric functions, Ali and Schaeffer and Wilmink, and on B-spline functions of days in milk. The covariance components and the genetic parameters were estimated by the restricted maximum likelihood method. Results from RRM parametric and B-spline functions were compared to RRM on Legendre polynomials and with a multi-trait analysis, using the same data set. Heritability estimates presented similar trends during mid-lactation (13 to 31 weeks) and between week 37 and the end of lactation, for all RRM. Heritabilities obtained by multi-trait analysis were of a lower magnitude than those estimated by RRM. The RRMs with a higher number of parameters were more useful to describe the genetic variation of test-day milk yield throughout the lactation. RRMs using B-spline and Legendre polynomials as basis functions appear to be the most adequate to describe the covariance structure of the data.
NASA Astrophysics Data System (ADS)
Mangla, Rohit; Kumar, Shashi; Nandy, Subrata
2016-05-01
SAR and LiDAR remote sensing have already shown the potential of active sensors for forest parameter retrieval. In fully polarimetric mode, a SAR sensor can retrieve the scattering properties of different components of forest structure, and LiDAR can measure structural information with very high accuracy. This study focused on retrieval of forest aboveground biomass (AGB) using Terrestrial Laser Scanner (TLS) based point clouds and scattering properties of forest vegetation obtained from decomposition modelling of RISAT-1 fully polarimetric SAR data. TLS data were acquired for 14 plots of Timli forest range, Uttarakhand, India. The forest area is dominated by Sal trees, and random sampling with a plot size of 0.1 ha (31.62 m × 31.62 m) was adopted for TLS and field data collection. RISAT-1 data were processed to retrieve SAR-based variables, and 3D imaging of TLS point clouds was done to retrieve LiDAR-based variables. Surface scattering, double-bounce scattering, volume scattering, helix scattering, and wire scattering were the SAR-based variables retrieved from polarimetric decomposition. Tree heights and stem diameters, retrieved with single-tree vertical height and least-squares circle-fit methods respectively, were used as LiDAR-based variables. All the variables obtained for the forest plots were used as input to a machine-learning Random Forest Regression Model, which was developed in this study for forest AGB estimation. Modelled output for forest AGB showed reliable accuracy (RMSE = 27.68 t/ha), and a good coefficient of determination (0.63) was obtained through the linear regression between modelled AGB and field-estimated AGB. The sensitivity analysis showed that the model was more sensitive to the major contributing variables (stem diameter and volume scattering), which were measured by two different remote sensing techniques. This study strongly recommends the integration of SAR and LiDAR data for forest AGB estimation.
Single Parenthood and Children's Reading Performance in Asia
ERIC Educational Resources Information Center
Park, Hyunjoon
2007-01-01
Using data from the Program for International Student Assessment, I examine the gap in reading performance between 15-year-old students in single-parent and intact families in 5 Asian countries in comparison to the United States. The ordinary least squares regression analyses show negligible disadvantages of students with a single parent in Hong…
Adherence in single-parent households in a long-term asthma clinical trial.
Spicher, Mary; Bollers, Nancy; Chinn, Tamara; Hall, Anita; Plunkett, Anne; Rodgers, Denise; Sundström, D A; Wilson, Laura
2012-01-01
Adherence of participants in a long-term clinical trial is necessary to ensure the validity of findings. This article examines adherence differences between single-parent and two-parent families in the Childhood Asthma Management Program (CAMP). Adherence was defined as the percentage of completed daily diary cards and scheduled study visits during the course of the trial. Logistic regression and ordinal logistic regression analyses were used. Children from single-parent families had a lower percentage of completed diary cards (72% vs. 84%) than those from two-parent families. Single-parent families were also more likely to reschedule visits (62% vs. 45%) and to miss more clinic visits (23% vs. 17%) than two-parent families. Suggestions are given for study coordinators to assist participants in completing a long-term clinical trial. Many suggestions may be adapted for nurses in inpatient or outpatient settings for assisting parents of patients with chronic diseases.
Wu, Xue; Sengupta, Kaushik
2018-03-19
This paper demonstrates a methodology to miniaturize THz spectroscopes into a single silicon chip by eliminating traditional solid-state architectural components such as complex tunable THz and optical sources, nonlinear mixing, and amplifiers. The proposed method achieves this by extracting incident THz spectral signatures from the surface of an on-chip antenna itself. The information is sensed through the spectrally sensitive 2D distribution of the surface current impressed on the antenna by the incident THz field. By converting the antenna from a single-port to a massively multi-port architecture with integrated electronics and deep subwavelength sensing, THz spectral estimation is converted into a linear estimation problem. We employ rigorous regression techniques and analysis to demonstrate a single silicon chip system operating at room temperature across 0.04-0.99 THz with 10 MHz accuracy in spectrum estimation of THz tones across the entire spectrum.
Reitemeier, Bernd; Hänsel, Kristina; Kastner, Christian; Weber, Anke; Walter, Michael H
2013-03-01
Metal ceramic restorations are widely used in prosthodontics, but long-term data on their clinical performance in private practice settings based on prospective trials are sparse. This clinical trial was designed to provide realistic long-term survival rates for different outcomes related to tooth loss, crown loss, and metal ceramic defect. Ninety-five participants were provided with 190 noble metal ceramic single crowns and 138 participants with 276 fixed dental prosthesis retainer crowns on vital posterior teeth. Follow-up examinations were scheduled 2 weeks after insertion, annually up to 8 years, and after 10 years. Kaplan-Meier survival analyses, Mantel-Cox logrank tests, and Cox regression analyses were conducted. Because of variations in the time of the last examinations, the maximum observation period was 12.1 years. For the primary outcome 'loss of crown or tooth', the Kaplan-Meier survival rate was 94.3% ±1.8% (standard error) at 8.0 years (last outcome event) for single crowns and 94.4% ±1.5% at 11.0 years for fixed dental prosthesis retainer crowns. The difference between the survival functions was not significant (P>.05). For the secondary outcome 'metal ceramic defect', the survival rate was 88.8% ±3.2% at 11.0 years for single crowns and 81.7% ±3.5% at 11.0 years for fixed dental prosthesis retainer crowns. In Cox regression models, the only significant covariates for the outcome event 'metal ceramic defect' were bruxism in the medical history (single crowns) and signs and symptoms of bruxism (fixed dental prosthesis retainer crowns) with hazard ratios of 3.065 (95% CI 1.063 - 8.832) and 2.554 (95% CI 1.307 - 4.992). Metal ceramic crowns provided in private practice settings show good longevity. Bruxism appears to indicate a risk for metal ceramic defects. Copyright © 2013 The Editorial Council of the Journal of Prosthetic Dentistry. Published by Mosby, Inc. All rights reserved.
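The Kaplan-Meier survival rates above are products of conditional survival fractions at each event time. Below is a minimal pure-Python sketch of that estimator. It assumes the usual convention that subjects censored at an event time remain at risk for it. The data are hypothetical. The standard errors reported in the abstract (commonly obtained from Greenwood's formula) are not computed here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    events[i] is 1 if the outcome occurred at times[i] and 0 if censored.
    Subjects censored at an event time are kept in that time's risk set.
    Returns a list of (time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Gather every subject (event or censored) recorded at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up in years; 1 = crown or tooth lost, 0 = censored
years = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
lost = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(years, lost):
    print(f"{t:.1f} years: S = {s:.3f}")
```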
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neeway, James J.; Rieke, Peter C.; Parruzot, Benjamin P.
In far-from-equilibrium conditions, the dissolution of borosilicate glasses used to immobilize nuclear waste is known to be a function of both temperature and pH. The aim of this paper is to study effects of these variables on three model waste glasses (SON68, ISG, AFCI). To do this, experiments were conducted at temperatures of 23, 40, 70, and 90 °C and pH(RT) values of 9, 10, 11, and 12 with the single-pass flow-through (SPFT) test method. The results from these tests were then used to parameterize a kinetic rate model based on transition state theory. Both the absolute dissolution rates and the rate model parameters are compared with previous results. Discrepancies in the absolute dissolution rates as compared to those obtained using other test methods are discussed. Rate model parameters for the three glasses studied here are nearly equivalent within error and in relative agreement with previous studies. The results were analyzed with a linear multivariate regression (LMR) and a nonlinear multivariate regression performed with the use of the Glass Corrosion Modeling Tool (GCMT), which is capable of providing a robust uncertainty analysis. This robust analysis highlights the high degree of correlation of various parameters in the kinetic rate model. As more data are obtained on borosilicate glasses with varying compositions, the effect of glass composition on the rate parameter values could possibly be obtained. This would allow for the possibility of predicting the forward dissolution rate of glass based solely on composition.
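A linear multivariate regression of this kind works because a transition-state-theory rate law becomes linear after taking logarithms. The sketch below assumes the widely used far-from-equilibrium form rate = k0 · 10^(η·pH) · exp(−Ea/(R·T)), with the affinity term treated as 1. It fits log10(rate) against pH and 1/T. The parameter values are made up, and the sketch is not the GCMT or the authors' fit; it only checks that synthetic rates on the abstract's temperature and pH grid are recovered.

```python
import numpy as np

R_GAS = 8.314  # gas constant, J mol^-1 K^-1

def fit_forward_rate_law(ph, temp_c, rate):
    """Linear multivariate regression of log10(rate) on pH and 1/T.

    Assumed form: rate = k0 * 10**(eta * pH) * exp(-Ea / (R * T)), so
    log10(rate) = log10(k0) + eta * pH - Ea / (ln(10) * R * T).
    Returns (k0, eta, Ea in J/mol).
    """
    ph = np.asarray(ph, float)
    inv_t = 1.0 / (np.asarray(temp_c, float) + 273.15)
    design = np.column_stack([np.ones_like(ph), ph, inv_t])
    coef, *_ = np.linalg.lstsq(design, np.log10(np.asarray(rate, float)), rcond=None)
    log_k0, eta, slope = coef
    ea = -slope * R_GAS * np.log(10.0)
    return 10.0 ** log_k0, eta, ea

# Synthetic check on the temperature/pH grid of the abstract (parameter values are made up)
temp_grid, ph_grid = np.meshgrid([23.0, 40.0, 70.0, 90.0], [9.0, 10.0, 11.0, 12.0])
temp_c, ph = temp_grid.ravel(), ph_grid.ravel()
true_k0, true_eta, true_ea = 1.0e5, 0.4, 80.0e3
rate = true_k0 * 10.0 ** (true_eta * ph) * np.exp(-true_ea / (R_GAS * (temp_c + 273.15)))
print(fit_forward_rate_law(ph, temp_c, rate))
```

With real SPFT data the residual scatter and the strong correlation between k0 and Ea would dominate. That correlation is what the GCMT uncertainty analysis in the abstract addresses.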
Piovesan, Davide; Pierobon, Alberto; DiZio, Paul; Lackner, James R
2012-01-01
This study presents and validates a time-frequency technique for measuring 2-dimensional multijoint arm stiffness throughout a single planar movement as well as during static posture. It is proposed as an alternative to current regression-based methods, which require numerous repetitions to obtain average stiffness on a small segment of the hand trajectory. The method is based on the analysis of the reassigned spectrogram of the arm's response to impulsive perturbations and can estimate arm stiffness on a trial-by-trial basis. Analytic and empirical methods are first derived and tested through modal analysis on synthetic data. The technique's accuracy and robustness are assessed by modeling the estimation of stiffness time profiles changing at different rates and affected by different noise levels. Our method obtains results comparable with two well-known regression-based techniques. We also test how the technique can identify the viscoelastic component of non-linear and higher than second order systems with a non-parametric approach. The technique proposed here is highly robust to noise and can be used easily for both postural and movement tasks. Estimations of stiffness profiles are possible with only one perturbation, making our method a useful tool for estimating limb stiffness during motor learning and adaptation tasks, and for understanding the modulation of stiffness in individuals with neurodegenerative diseases.
Medication adherence among patients in a chronic disease clinic.
Tourkmani, Ayla M; Al Khashan, Hisham I; Albabtain, Monirah A; Al Harbi, Turki J; Al Qahatani, Hala B; Bakhiet, Ahmed H
2012-12-01
To assess motivation and knowledge domains of medication adherence intention, and to determine their predictors in an ambulatory setting. We conducted a cross-sectional survey study among patients attending a chronic disease clinic at the Family and Community Medicine Department, Prince Sultan Military Medical City, Riyadh, Kingdom of Saudi Arabia, between June and September 2010. Adherence intention was assessed using the Modified Morisky Scale. Predictors of low motivation and/or knowledge were determined using logistic regression models. A total of 347 patients were interviewed during the study period. Most patients (75.5%) had 2 or more chronic diseases, with an average of 6.3 ± 2.3 medications and 6.5 ± 2.9 pills per prescription. Adherence intention was low in 4.6%, variable in 37.2%, and high in 58.2% of patients. In multivariate logistic regression analysis, younger age and having asthma were significantly associated with low motivation, while male gender, single status, and not having hypertension were significantly associated with low knowledge. Single status was the only independent predictor of low adherence intention. In a population with multiple chronic diseases and a high illiteracy rate, more than 40% had low or variable intention to adhere to prescribed medications. Identifying predictors in this group may help in providing group-specific interventional programs.
Investigation of Genetic Variants Associated with Alzheimer Disease in Parkinson Disease Cognition.
Barrett, Matthew J; Koeppel, Alexander F; Flanigan, Joseph L; Turner, Stephen D; Worrall, Bradford B
2016-01-01
Meta-analyses of genome-wide association studies have implicated multiple single nucleotide polymorphisms (SNPs) and associated genes in Alzheimer disease (AD). The role of these SNPs in cognitive impairment in Parkinson disease (PD) remains incompletely evaluated. The objective of this study was to test alleles associated with risk of AD for association with cognitive impairment in PD. Two datasets with PD subjects accessed through the NIH database of Genotypes and Phenotypes contained both SNP arrays and mini-mental state exam (MMSE) scores. Genetic data underwent rigorous quality control, and we selected SNPs for genes associated with AD other than APOE. We constructed logistic regression and ordinal regression models, adjusted for sex, age at MMSE, and duration of PD, to assess the association between selected SNPs and MMSE score. In one dataset, PICALM rs3851179 was associated with cognitive impairment (MMSE < 24) in PD subjects > 70 years old (OR = 2.3; adjusted p-value = 0.017; n = 250) but not in PD subjects ≤ 70 years old. Our finding suggests that PICALM rs3851179 could contribute to cognitive impairment in older patients with PD. It is important that future studies consider the interaction of age and genetic risk factors in the development of cognitive impairment in PD.
Limited sampling strategy for determining metformin area under the plasma concentration–time curve
Santoro, Ana Beatriz; Stage, Tore Bjerregaard; Struchiner, Claudio José; Christensen, Mette Marie Hougaard; Brosen, Kim
2016-01-01
Aim: The aim was to develop and validate limited sampling strategy (LSS) models to predict the area under the plasma concentration–time curve (AUC) for metformin. Methods: Metformin plasma concentrations (n = 627) at 0–24 h after a single 500 mg dose were used for LSS development, based on all-subsets linear regression analysis. The LSS-derived AUC(0,24 h) was compared with the parameter 'best estimate' obtained by non-compartmental analysis using all plasma concentration data points. Correlation between the LSS-derived and the best estimated AUC(0,24 h) (r²), bias, and precision of the LSS estimates were quantified. The LSS models were validated in independent cohorts. Results: A two-point (3 h and 10 h) regression equation with no intercept accurately estimated the individual AUC(0,24 h) in the development cohort: r² = 0.927; bias (mean, 95% CI) −0.5% (−2.7% to 1.8%); precision 6.3% (4.9% to 7.7%). The accuracy of the two-point LSS model was verified in study cohorts of individuals receiving single 500 or 1000 mg doses (r² = 0.933–0.934) or seven 1000 mg daily doses (r² = 0.918), as well as using data from 16 published studies covering a wide range of metformin doses, demographics, clinical and experimental conditions (r² = 0.976). The LSS model reproduced previously reported results for effects of polymorphisms in OCT2 and MATE1 genes on AUC(0,24 h) and renal clearance of metformin. Conclusions: The two-point LSS algorithm may be used to assess the systemic exposure to metformin under diverse conditions, with reduced costs of sampling and analysis, and saving time for both subjects and investigators. PMID:27324407
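A two-point, no-intercept LSS is a least-squares fit of AUC on the 3 h and 10 h concentrations. Below is a minimal pure-Python sketch that solves the 2×2 normal equations. It reports bias as the mean percentage prediction error and precision as the root mean squared percentage error. Those are common definitions, assumed here rather than taken from the paper. The coefficients, concentrations and AUC values are hypothetical and do not reproduce the published equation.

```python
import math

def fit_two_point_lss(c_a, c_b, auc):
    """Least-squares fit of auc = b1 * c_a + b2 * c_b with no intercept."""
    s11 = sum(u * u for u in c_a)
    s22 = sum(v * v for v in c_b)
    s12 = sum(u * v for u, v in zip(c_a, c_b))
    t1 = sum(u * y for u, y in zip(c_a, auc))
    t2 = sum(v * y for v, y in zip(c_b, auc))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("concentration columns are collinear")
    # Solve the 2x2 normal equations by Cramer's rule
    b1 = (t1 * s22 - t2 * s12) / det
    b2 = (t2 * s11 - t1 * s12) / det
    return b1, b2

def bias_and_precision(observed, predicted):
    """Mean percentage prediction error (bias) and root mean squared percentage error (precision)."""
    pe = [100.0 * (p - o) / o for o, p in zip(observed, predicted)]
    bias = sum(pe) / len(pe)
    precision = math.sqrt(sum(e * e for e in pe) / len(pe))
    return bias, precision

# Hypothetical development data: concentrations at 3 h and 10 h, reference AUC values
c3 = [1.0, 1.2, 0.8, 1.5, 1.1]
c10 = [0.30, 0.25, 0.40, 0.20, 0.35]
auc = [8.1, 8.3, 7.3, 9.2, 8.6]
b1, b2 = fit_two_point_lss(c3, c10, auc)
pred = [b1 * u + b2 * v for u, v in zip(c3, c10)]
print(b1, b2, bias_and_precision(auc, pred))
```

In practice the equation would be validated on an independent cohort, as the abstract describes, rather than scored on the development data.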
Valdés-Stauber, Juan; Lemaczyk, Rafael; Kilian, Reinhold
2018-06-01
Objective: Our aim was to identify possible patterns of change or durability in sources of meaning for family caregivers of terminally ill patients after the onset of home support by an outreach palliative nursing team during a three-month survey period. The Sources of Meaning and Meaning in Life Questionnaire (SoMe) was administered to 100 caregivers of terminally ill patients at four measurement timepoints: immediately before the onset of palliative care (t0), and at 1 week, 1 month, and 3 months after t0. Time-dependent changes were assessed for the completed subsample (n = 24) by means of bivariate linear as well as quadratic regression models. Multivariate regressions with dimensions of meaning in life as dependent variables were performed for the whole sample by means of random-effects models: dependent variables changed over time (four timepoints), whereas regressors remained constant. No significant differences were found for psychosocial and clinical variables or for sources of meaning between the uncompleted and completed subsamples. Growth curve analyses revealed no statistically significant changes for any dimension or single source of meaning, only a tendency toward parabolic change. In multivariate models, a negative association was found between patient age, psychological burden of family caregivers, and changes in total SoMe score, as well as for the superordinate dimensions. Consistent with our hypothesis, sources of meaning and meaning in life seem to remain robust in relatives caring for terminally ill family members during the three-month survey period. A parabolic development pattern of single sources of meaning indicates an adjustment process. An important limitation of our study is the small number of participants relative to the size of the multivariate models, caused by high dropout rates, primarily due to the death of three-quarters of the participants during the survey period.
Modeling absolute differences in life expectancy with a censored skew-normal regression approach
Clough-Gorr, Kerri; Zwahlen, Marcel
2015-01-01
Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negatively skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper, we show how to use skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544
Hanmer, Janel; Cherepanov, Dasha
2016-09-01
To evaluate a general question about the ability to meet monthly bills as an alternative to direct questions about income and assets in health utility studies. We used data from the National Health Measurement Study, a US nationally representative telephone survey collected in 2005-2006. It included health utility measures (EuroQol-5D-3L, Health Utilities Index Mark 3, Short Form-6D, and Quality of Well-being Index) as well as questions on household income, assets, and financial ability to meet monthly bills. Each utility score was regressed on: income and assets (Model 1); difficulty paying bills (DPB) (Model 2); and income, assets, and DPB (Model 3). All models used survey weights and adjusted for demographics and education. Among 3666 respondents, DPB decreased as income and assets increased. The DPB question had fewer missing values (n = 30) than income (n = 311) or assets (n = 373). Model 2 (DPB only) explained more variance in health utility than Model 1 (income and assets only). Including all measures (Model 3) gave only a modest improvement in R²; for example, for the EuroQol-5D-3L the values were 0.112 (Model 1), 0.166 (Model 2), and 0.175 (Model 3). The single DPB question yields more information and has fewer missing values than the traditionally used income and assets questions.
Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C
2018-06-29
A new multivariate regression model, named Error Covariance Penalized Regression (ECPR), is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid- and near-infrared (MIR and NIR) spectral measurements. Under non-iid conditions, ECPR performs better than traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR), and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.
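The following is a minimal sketch of the penalized-regression idea. It assumes the penalty takes the generalized-ridge form λ·bᵀΣb, with Σ the error covariance matrix, so that Σ = I recovers ordinary ridge regression. That form is an assumption and may not match the published ECPR algorithm exactly. The function names are hypothetical, and estimating Σ from the pooled deviations of replicate spectra around their sample means is only one plausible choice.

```python
import numpy as np

def penalized_regression(X, y, P, lam):
    """Minimize ||y - X b||^2 + lam * b^T P b.

    Closed form: b = (X^T X + lam * P)^{-1} X^T y. With P = I this is ridge
    regression; passing an error covariance matrix for P gives the assumed
    ECPR-style penalty.
    """
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)

def pooled_error_covariance(replicates):
    """Pooled within-sample covariance of replicate spectra (hypothetical helper).

    replicates has shape (n_samples, n_replicates, n_channels); deviations of
    each replicate from its own sample mean are treated as measurement error.
    """
    n_samples, n_reps, n_channels = replicates.shape
    resid = replicates - replicates.mean(axis=1, keepdims=True)
    resid = resid.reshape(-1, n_channels)
    return resid.T @ resid / (n_samples * (n_reps - 1))

# Hypothetical example: 10 samples x 3 replicates x 5 spectral channels.
rng = np.random.default_rng(0)
reps = rng.normal(size=(10, 3, 5))
X = reps.mean(axis=1)          # averaged spectra as predictors
y = rng.normal(size=10)        # hypothetical analyte values
b = penalized_regression(X, y, pooled_error_covariance(reps), lam=1.0)
print(b)
```

Channels with large measurement noise get large diagonal entries in Σ. Their coefficients are therefore shrunk more strongly, unlike ridge regression, which shrinks all coefficients equally.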
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools for modeling the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve to a binary outcome coded as zero or one. The Cox regression model allows investigators to study the duration…
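The sketch below illustrates how the S-shaped logistic curve is fitted to 0/1 outcomes. It uses Newton-Raphson (iteratively reweighted least squares), a standard maximum-likelihood method. The abstract does not say which fitting algorithm was used, and the function names are hypothetical. The Cox model is not illustrated.

```python
import numpy as np

def sigmoid(z):
    """Logistic (S-shaped) curve mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X must include a column of ones for the intercept; y holds 0/1 outcomes.
    Assumes the data are not perfectly separable (otherwise no finite MLE).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)              # fitted probabilities on the S-curve
        w = p * (1.0 - p)                  # IRLS weights
        grad = X.T @ (y - p)               # score vector
        hess = X.T @ (X * w[:, None])      # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical example: pass/fail outcome versus hours of study.
hours = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
passed = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(hours), hours])
beta = fit_logistic(X, passed)
print(beta, sigmoid(X @ beta))
```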
Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)
NASA Astrophysics Data System (ADS)
Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul
2018-05-01
The Geographically Weighted Regression (GWR) model has been widely applied in many practical fields to explore spatial heterogeneity in a regression model. However, the method is inherently not robust to outliers. Outliers commonly exist in data sets and may distort the estimate of the underlying regression model. One way to handle outliers in a regression model is to use robust methods; the resulting model is called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy-making process related to air pollution mitigation by developing a standard index model for air polluters (Air Polluter Standard Index, APSI) based on the RGWR approach. We consider seven variables that are directly related to the air pollution level: traffic velocity, population density, the business center aspect, air humidity, wind velocity, air temperature, and the area of urban forest. The best model is determined by the smallest AIC value. There are significant differences between the regression and RGWR models in this case, but basic GWR with a Gaussian kernel is the best model for APSI because it has the smallest AIC.
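The core of GWR with a Gaussian kernel is a separate weighted least-squares fit at every location, with weights that decay with distance. The sketch below assumes Euclidean coordinates and a fixed, user-chosen bandwidth. It omits two parts of the method described above: the robust reweighting that distinguishes RGWR, and the AIC-based selection of model and bandwidth. Its function names are hypothetical.

```python
import numpy as np

def gaussian_weights(coords, target, bandwidth):
    """Gaussian kernel weights w_i = exp(-0.5 * (d_i / h)^2)."""
    d = np.linalg.norm(coords - target, axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_local_coefficients(X, y, coords, bandwidth):
    """Fit one weighted least-squares regression per observation location.

    Returns an (n, p) array whose row i holds the local coefficients at coords[i].
    """
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        w = gaussian_weights(coords, coords[i], bandwidth)
        XtW = X.T * w                          # X^T W without forming diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Hypothetical example: a slope that drifts from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))
X = np.column_stack([np.ones(50), rng.normal(size=50)])
slope = 1.0 + 0.2 * coords[:, 0]
y = 2.0 + slope * X[:, 1] + rng.normal(scale=0.1, size=50)
B = gwr_local_coefficients(X, y, coords, bandwidth=2.0)
print(B[:3])
```

As the bandwidth grows, every weight approaches 1 and the local fits collapse to a single global regression. The bandwidth therefore controls how much spatial heterogeneity the model can express.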
Lee, Eunyoung; Cumberbatch, Jewel; Wang, Meng; Zhang, Qiong
2017-03-01
Anaerobic co-digestion has the potential to improve biogas production, but limited kinetic information is available for co-digestion. This study introduced regression-based models to estimate the kinetic parameters for the co-digestion of microalgae and Waste Activated Sludge (WAS). The models were developed using the ratios of co-substrates and the kinetic parameters for the single substrates as indicators. The models were applied to the modified first-order kinetics and the Monod model to determine the rates of hydrolysis and methanogenesis for the co-digestion. The results showed that the model using a hyperbolic function was better for estimating the first-order kinetic coefficients, while the model using an inverse tangent function closely estimated the Monod kinetic parameters. The models can be used to estimate kinetic parameters not only for microalgae-WAS co-digestion but also for the co-digestion of other substrates, such as microalgae-swine manure and WAS-aquatic plants. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bayesian Unimodal Density Regression for Causal Inference
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2011-01-01
Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other commonly used parametric and nonparametric regression models in terms of predictive accuracy of the outcome (dependent) variable. The other,…
Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace
ERIC Educational Resources Information Center
Culpepper, Steven Andrew; Park, Trevor
2017-01-01
A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…
Morrell, Glen R.; Ikizler, Talat A.; Chen, Xiaorui; Heilbrun, Marta E.; Wei, Guo; Boucher, Robert; Beddhu, Srinivasan
2016-01-01
Objective We investigate whether psoas or paraspinous muscle area measured on a single L4–5 image is a useful measure of whole-body lean mass compared with dedicated mid-thigh magnetic resonance imaging (MRI). Design Observational study. Setting Outpatient dialysis units and a research clinic. Subjects 105 adult participants on maintenance hemodialysis. No control group was used. Exposure variables Psoas muscle area, paraspinous muscle area, and mid-thigh muscle area (MTMA) were measured by MRI. Main outcome measure Lean body mass was measured by dual-energy X-ray absorptiometry (DEXA). Results In separate multivariable linear regression models, psoas, paraspinous, and mid-thigh muscle areas were each positively associated with lean body mass. In separate multivariable logistic regression models, c-statistics for the diagnosis of sarcopenia (defined as < 25th percentile of lean body mass) were 0.69 for paraspinous muscle area, 0.81 for psoas muscle area, and 0.89 for mid-thigh muscle area. With sarcopenia defined as < 10th percentile of lean body mass, the corresponding c-statistics were 0.71, 0.92, and 0.94. Conclusions We conclude that psoas muscle area provides a good measure of whole-body muscle mass, better than paraspinous muscle area but slightly inferior to mid-thigh measurement. Hence, in body composition studies, a single axial MR image at the L4–L5 level can provide information on both fat and muscle and may eliminate the need for time-consuming measurement of muscle area in the thigh. PMID:26994780
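The c-statistics reported above are areas under the ROC curve. Each equals the probability that a randomly chosen sarcopenic participant has a higher risk score than a randomly chosen non-sarcopenic one. A minimal pairwise sketch follows. Because a smaller muscle area indicates sarcopenia, the negated area serves as the risk score. The example values are hypothetical.

```python
def c_statistic(scores, labels):
    """Concordance (c) statistic, equal to the area under the ROC curve.

    Fraction of (case, non-case) pairs in which the case has the higher
    score; tied pairs count one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Hypothetical muscle areas (cm^2) and sarcopenia flags (1 = sarcopenic).
areas = [12.0, 15.5, 22.0, 25.0, 18.0]
sarcopenic = [1, 1, 0, 0, 0]
print(c_statistic([-a for a in areas], sarcopenic))  # prints 1.0
```

The pairwise loop is O(n²). For large samples, a rank-based computation gives the same value faster.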
Comparative evaluation of urban storm water quality models
NASA Astrophysics Data System (ADS)
Vaze, J.; Chiew, Francis H. S.
2003-10-01
The estimation of urban storm water pollutant loads is required for the development of mitigation and management strategies to minimize impacts to receiving environments. Event pollutant loads are typically estimated using either regression equations or "process-based" water quality models. The relative merit of using regression models compared to process-based models is not clear. A modeling study is carried out here to evaluate the comparative ability of the regression equations and process-based water quality models to estimate event diffuse pollutant loads from impervious surfaces. The results indicate that, once calibrated, both the regression equations and the process-based model can estimate event pollutant loads satisfactorily. In fact, the loads estimated using the regression equation as a function of rainfall intensity and runoff rate are better than the loads estimated using the process-based model. Therefore, if only estimates of event loads are required, regression models should be used because they are simpler and require less data compared to process-based models.
Hoos, Anne B.; Moore, Richard B.; Garcia, Ana Maria; Noe, Gregory B.; Terziotti, Silvia E.; Johnston, Craig M.; Dennis, Robin L.
2013-01-01
Existing Spatially Referenced Regression on Watershed attributes (SPARROW) nutrient models for the northeastern and southeastern regions of the United States were recalibrated to achieve a hydrographically consistent model with which to assess nutrient sources and stream transport and to investigate specific management questions about the effects of wetlands and atmospheric deposition on nutrient transport. Recalibrated nitrogen models for the northeast and southeast were sufficiently similar to be merged into a single nitrogen model for the eastern United States. The atmospheric deposition source in the nitrogen model has been improved to account for individual components of atmospheric input, derived from emissions from agricultural manure, agricultural livestock, vehicles, power plants, other industry, and background sources. This accounting makes it possible to simulate the effects of altering an individual component of atmospheric deposition, such as nitrate emissions from vehicles or power plants. Regional differences in the transport of phosphorus through wetlands and reservoirs were investigated and resulted in two distinct phosphorus models for the northeast and southeast. The recalibrated nitrogen and phosphorus models account explicitly for the influence of wetlands on regional-scale land-phase and aqueous-phase transport of nutrients and therefore allow comparison of the water-quality functions of different wetland systems over large spatial scales. Seven wetland systems were associated with enhanced transport of either nitrogen or phosphorus in streams, probably because of the export of dissolved organic nitrogen and bank erosion. Six wetland systems were associated with mitigating the delivery of either nitrogen or phosphorus to streams, probably because of sedimentation, phosphate sorption, and groundwater infiltration.
2014-01-01
Background This study aims to propose an approach that integrates multilevel models and eigenvector spatial filtering methods and to apply it to a case study of self-rated health status in South Korea. In many previous health-related studies, multilevel models and single-level spatial regression have been used separately. However, the two methods should be used in conjunction because the objectives of both approaches are important in health-related analyses. The multilevel model enables the simultaneous analysis of both individual and neighborhood factors influencing health outcomes. However, the results of conventional multilevel models are potentially misleading when spatial dependency across neighborhoods exists. Spatial dependency in health-related data means that health outcomes in nearby neighborhoods are more similar to each other than those in distant neighborhoods. Spatial regression models can address this problem by modeling spatial dependency. This study explores the possibility of integrating a multilevel model with eigenvector spatial filtering, an advanced spatial regression technique for addressing spatial dependency in datasets. Methods In this spatially filtered multilevel model, eigenvectors function as additional explanatory variables that account for unexplained spatial dependency within the neighborhood-level error. The specification addresses the inability of conventional multilevel models to account for spatial dependency and thereby generates more robust outputs. Results The findings show that sex, employment status, monthly household income, and perceived level of stress are significantly associated with self-rated health status. Residents living in neighborhoods with low deprivation and a high doctor-to-resident ratio tend to report higher health status.
The spatially filtered multilevel model provides unbiased estimates and improves the explanatory power of the model compared with conventional multilevel models, although the signs of the parameters and the significance levels do not change between the two models in this case study. Conclusions The integrated approach proposed in this paper is a useful tool for understanding the geographical distribution of self-rated health status within a multilevel framework. In future research, it would be useful to apply the spatially filtered multilevel model to other datasets to clarify the differences between the two models. It is anticipated that this integrated method will also outperform conventional models when used in other contexts. PMID:24571639
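The following is a minimal sketch of how candidate eigenvectors for spatial filtering are usually derived, assuming a symmetric spatial weights matrix W over neighborhoods. The eigenvectors of MWM, with M = I - 11ᵀ/n, are orthogonal map patterns whose eigenvalues are proportional to their Moran's I. The 0.25 cut-off relative to the largest eigenvalue is a commonly used heuristic and not necessarily the rule used in this study. The selected columns would then enter the multilevel model as neighborhood-level covariates; that model fit is not shown, and the function name is hypothetical.

```python
import numpy as np

def spatial_filter_eigenvectors(W, min_ratio=0.25):
    """Candidate eigenvectors for eigenvector spatial filtering (sketch).

    W: symmetric (n, n) spatial weights matrix, e.g. binary contiguity.
    Keeps eigenvectors of M W M whose eigenvalue exceeds min_ratio times the
    largest eigenvalue (a heuristic for substantial positive autocorrelation).
    """
    n = W.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n        # centring projector
    vals, vecs = np.linalg.eigh(M @ W @ M)     # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]             # most positive first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > min_ratio * vals[0]
    return vecs[:, keep]

# Hypothetical example: six neighborhoods along a line (rook contiguity).
W = np.eye(6, k=1) + np.eye(6, k=-1)
E = spatial_filter_eigenvectors(W)
print(E.shape)
```

Because the retained eigenvectors are centred and mutually orthogonal, adding them as covariates does not introduce multicollinearity among themselves.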
Regression of a vaginal leiomyoma after ovariohysterectomy in a dog: a case report.
Sathya, Suresh; Linn, Kathleen
2014-01-01
An 11-year-old female mixed-breed Siberian husky was presented with a history of sanguineous vaginal discharge, swelling of the perineal area, decreased appetite, and lethargy. A single, large vaginal leiomyoma and multiple mammary tumors were diagnosed. Mastectomy and ovariohysterectomy were performed. The vaginal leiomyoma regressed completely after ovariohysterectomy. This is the first reported case of spontaneous regression of a vaginal leiomyoma after ovariohysterectomy in a dog.
Wang, Shaomeng; Sun, Wei; Zhao, Yujun; ...
2014-08-21
Blocking the MDM2-p53 protein-protein interaction has long been considered to offer a broad cancer therapeutic strategy, despite the potential risk of selecting tumors harboring p53 mutations that escape MDM2 control. In this study, we report a novel small-molecule inhibitor of the MDM2-p53 interaction, SAR405838 (MI-77301), that has been advanced into Phase I clinical trials. SAR405838 binds to MDM2 with Ki = 0.88 nM and has high specificity over other proteins. A co-crystal structure of the SAR405838:MDM2 complex shows that, in addition to mimicking three key p53 amino acid residues, the inhibitor captures additional interactions not observed in the p53-MDM2 complex and induces refolding of the short, unstructured MDM2 N-terminal region to achieve its high affinity. SAR405838 effectively activates wild-type p53 in vitro and in xenograft tumor tissue of leukemia and solid tumors, leading to p53-dependent cell cycle arrest and/or apoptosis. At well-tolerated dose schedules, SAR405838 achieves either durable tumor regression or complete tumor growth inhibition in mouse xenograft models of SJSA-1 osteosarcoma, RS4;11 acute leukemia, LNCaP prostate cancer and HCT-116 colon cancer. Remarkably, a single oral dose of SAR405838 is sufficient to achieve complete tumor regression in the SJSA-1 model. Mechanistically, robust transcriptional up-regulation of PUMA induced by SAR405838 results in strong apoptosis in tumor tissue, leading to complete tumor regression. Lastly, our findings provide a preclinical basis for evaluating SAR405838 as a therapeutic agent in patients whose tumors retain wild-type p53.
Somma, Francesco; Cammarota, Giuseppe; Plotino, Gianluca; Grande, Nicola M; Pameijer, Cornelis H
2008-04-01
The aim of this study was to compare the effectiveness of Mtwo R (Sweden & Martina, Padova, Italy), ProTaper retreatment files (Dentsply-Maillefer, Ballaigues, Switzerland), and a manual Hedström technique in removing three different filling materials (gutta-percha, Resilon [Resilon Research LLC, Madison, CT], and EndoRez [Ultradent Products Inc, South Jordan, UT]) during retreatment. Ninety single-rooted straight premolars were instrumented and randomly divided into 9 groups of 10 teeth each (n = 10) according to filling material and instrument used. For all roots, the following data were recorded: procedural errors, retreatment time, apically extruded material, and canal wall cleanliness assessed by optical stereomicroscopy (OSM) and scanning electron microscopy (SEM). One linear regression analysis and three logistic regression analyses were performed, with the significance level set at p = 0.05. The overall regression models were statistically significant. Mtwo R, ProTaper retreatment files, and the Resilon filling material reduced retreatment time. Both ProTaper retreatment files and Mtwo R showed greater extrusion of debris. In both the OSM and SEM logistic regression models, the apical third of the root canal had the greatest impact on the score values. The EndoRez filling material resulted in cleaner root canal walls on OSM analysis, whereas the Resilon filling material and both engine-driven NiTi rotary techniques resulted in less clean root canal walls on SEM analysis. In conclusion, all instruments left remnants of filling material and debris on the root canal walls irrespective of the root filling material used. Both engine-driven NiTi rotary systems proved to be safe and fast devices for the removal of endodontic filling material.
A generalized right truncated bivariate Poisson regression model with applications to health data.
Islam, M Ataharul; Chowdhury, Rafiqul I
2017-01-01
A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and for over- or underdispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute, and the results show that the models fit the data very well. A comparison of the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model. PMID:28586344
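A building block of the right-truncated models is the truncated Poisson probability P(Y = y | Y ≤ k) = Poisson(y; λ) / Σ_{j≤k} Poisson(j; λ). The sketch below shows only this univariate piece, with an assumed log link λ = exp(b0 + b1·x). In a marginal-conditional approach, the bivariate likelihood would be built as f(y1)·f(y2 | y1); that step is not reproduced here. The function names are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)

def right_truncated_poisson_pmf(y, lam, k):
    """P(Y = y | Y <= k) for Y ~ Poisson(lam), truncated on the right at k."""
    if y < 0 or y > k:
        return 0.0
    norm = sum(poisson_pmf(j, lam) for j in range(k + 1))
    return poisson_pmf(y, lam) / norm

def truncated_loglik(beta, xs, ys, k):
    """Log-likelihood of a right-truncated Poisson regression, lam = exp(b0 + b1 x)."""
    b0, b1 = beta
    return sum(math.log(right_truncated_poisson_pmf(y, math.exp(b0 + b1 * x), k))
               for x, y in zip(xs, ys))

# Hypothetical counts of health conditions, truncated at k = 4.
print(truncated_loglik((0.1, 0.3), [0, 1, 2], [1, 3, 2], k=4))
```

Renormalizing over 0..k moves probability mass from counts above k onto the allowed counts. This is why truncated and untruncated fits of the same data can differ noticeably.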
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model that combines a multiple linear regression model with the fuzzy c-means method. This research examined the relationship between paddy yield at standard fertilizer rates and 20 topsoil variates analyzed prior to planting. Data were taken from the multi-location rice trials carried out by MARDI in the major paddy granaries of Peninsular Malaysia from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicated that the data are normally distributed, with no multicollinearity among the independent variables. Fuzzy c-means analysis grouped the paddy yield into two clusters before the multiple linear regression model was applied. The comparison of the two methods indicates that the hybrid of the multiple linear regression model and fuzzy c-means outperforms the multiple linear regression model alone, with a lower mean square error.
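The fuzzy c-means step alternates between weighted cluster centres and membership updates u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)). The sketch below is the standard algorithm, with fuzzifier m = 2 and a random initial membership matrix. Both settings are assumptions, since the paper's settings are not given. In the hybrid model, each observation could then be assigned to its highest-membership yield cluster, and a separate multiple linear regression fitted within each cluster. The function name is hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means; returns (centres, U) with U[i, j] = membership of i in j."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # guard against zero distances
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
    return centres, U

# Hypothetical paddy yields (t/ha), clustered on yield alone.
yields = np.array([3.1, 3.4, 2.9, 6.8, 7.2, 7.0]).reshape(-1, 1)
centres, U = fuzzy_c_means(yields)
labels = U.argmax(axis=1)   # hard assignment for per-cluster regressions
print(centres.ravel(), labels)
```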
Pomann, Gina-Maria; Sweeney, Elizabeth M; Reich, Daniel S; Staicu, Ana-Maria; Shinohara, Russell T
2015-09-10
Multiple sclerosis (MS) is an immune-mediated neurological disease that causes morbidity and disability. In patients with MS, the accumulation of lesions in the white matter of the brain is associated with disease progression and worse clinical outcomes. Breakdown of the blood-brain barrier in newer lesions is indicative of more active disease-related processes and is a primary outcome considered in clinical trials of treatments for MS. Such abnormalities in active MS lesions are evaluated in vivo using contrast-enhanced structural MRI, during which patients receive an intravenous infusion of a costly magnetic contrast agent. In some instances, the contrast agents can have toxic effects. Recently, local image regression techniques have been shown to have modest performance for assessing the integrity of the blood-brain barrier based on imaging without contrast agents. These models have centered on the problem of cross-sectional classification in which patients are imaged at a single study visit and pre-contrast images are used to predict post-contrast imaging. In this paper, we extend these methods to incorporate historical imaging information, and we find the proposed model to exhibit improved performance. We further develop scan-stratified case-control sampling techniques that reduce the computational burden of local image regression models, while respecting the low proportion of the brain that exhibits abnormal vascular permeability. Copyright © 2015 John Wiley & Sons, Ltd.
Multivariate Models for Prediction of Human Skin Sensitization ...
One of the Interagency Coordinating Committee on the Validation of Alternative Methods' (ICCVAM) top priorities is the development and evaluation of non-animal approaches to identify potential skin sensitizers. The complexity of the biological events necessary to produce skin sensitization suggests that no single alternative method will replace the currently accepted animal tests. ICCVAM is evaluating an integrated approach to testing and assessment, based on the adverse outcome pathway for skin sensitization, that uses machine learning approaches to predict human skin sensitization hazard. We combined data from three in chemico or in vitro assays (the direct peptide reactivity assay (DPRA), the human cell line activation test (h-CLAT), and the KeratinoSens™ assay), six physicochemical properties, and an in silico read-across prediction of skin sensitization hazard into 12 variable groups. The variable groups were evaluated using two machine learning approaches, logistic regression and support vector machine, to predict human skin sensitization hazard. Models were trained on 72 substances and tested on an external set of 24 substances. The six models (three logistic regression and three support vector machine) with the highest accuracy (92%) used: (1) DPRA, h-CLAT and read-across; (2) DPRA, h-CLAT, read-across and KeratinoSens; or (3) DPRA, h-CLAT, read-across, KeratinoSens and log P. The models performed better at predicting human skin sensitization hazard than the murine
Spatial Assessment of Model Errors from Four Regression Techniques
Zhang, Lianjun; Gove, Jeffrey H.
2005-01-01
Forest modelers have attempted to account for the spatial autocorrelation among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?
NASA Astrophysics Data System (ADS)
Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.
2016-12-01
Quantifying the mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF from climate and catchment characteristics. Yet regression models have mostly been developed at a regional scale, and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 10⁶ km² in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope, and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, with lower root-mean-square error values (0.29-0.38 compared with 0.49-0.57) and a higher modified index of agreement (0.80-0.83 compared with 0.72-0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values used to calibrate the model. Performance is lower in water-scarce regions, and further research should focus on improving this aspect of regression-based global hydrological models.
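The two comparison statistics can be computed as shown below. This assumes the "modified index of agreement" is Willmott's absolute-value form d1 = 1 - Σ|P - O| / Σ(|P - Ō| + |O - Ō|). Both statistics would be computed on whatever scale the predictions are expressed in, which the abstract does not state. The example values are hypothetical.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def modified_index_of_agreement(pred, obs):
    """Willmott's modified index of agreement d1 (absolute-value form), 1 = perfect."""
    mean_obs = sum(obs) / len(obs)
    num = sum(abs(p - o) for p, o in zip(pred, obs))
    den = sum(abs(p - mean_obs) + abs(o - mean_obs) for p, o in zip(pred, obs))
    return 1.0 - num / den

# Hypothetical observed and predicted values for four gauging stations.
observed = [1.2, 2.5, 3.1, 4.0]
predicted = [1.0, 2.7, 3.3, 3.6]
print(rmse(predicted, observed), modified_index_of_agreement(predicted, observed))
```

Unlike RMSE, d1 is bounded above by 1 and uses absolute rather than squared deviations. It is therefore less dominated by a few large errors.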
Capan, Muge; Hoover, Stephen; Jackson, Eric V.; Paul, David; Locke, Robert
2016-01-01
Background Accurate prediction of future patient census in hospital units is essential for patient safety, health outcomes, and resource planning. Forecasting census in the Neonatal Intensive Care Unit (NICU) is particularly challenging due to the limited ability to control the census and clinical trajectories. The fixed average census approach, which uses the average census from the previous year, is a forecasting alternative used in clinical practice, but it has limitations due to census variations. Objective Our objectives are to: (i) analyze the daily NICU census at a single health care facility and develop census forecasting models, (ii) explore models with and without patient data characteristics obtained at the time of admission, and (iii) evaluate the accuracy of the models compared with the fixed average census approach. Methods We used five years of retrospective daily NICU census data for model development (January 2008 – December 2012, N=1827 observations) and one year of data for validation (January – December 2013, N=365 observations). The best-fitting ARIMA and linear regression models were applied to various 7-day prediction periods and compared using error statistics. Results The census showed a slightly increasing linear trend. The best-fitting models included a non-seasonal model, ARIMA(1,0,0), the seasonal ARIMA models ARIMA(1,0,0)×(1,1,2)₇ and ARIMA(2,1,4)×(1,1,2)₁₄, and a seasonal linear regression model. The proposed forecasting models improved forecasting accuracy by 36.49% on average compared with the fixed average census approach. Conclusions Time series models provide higher prediction accuracy under different census conditions than the fixed average census approach. The presented methodology is easily applicable in clinical practice, can be generalized to other care settings, supports short- and long-term census forecasting, and can inform staff resource planning. PMID:27437040
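The simplest model in the census study, ARIMA(1,0,0), is an AR(1) process y_t = c + φ·y_{t-1} + ε_t. The sketch below fits it by least squares and iterates it to produce a 7-day forecast, scoring with mean absolute error as one possible error statistic. The function names and data are hypothetical. The seasonal models would normally be fitted with a dedicated library, such as statsmodels' SARIMAX with its order and seasonal_order arguments, rather than by hand. The fixed average census baseline simply forecasts the previous period's mean for every day.

```python
import numpy as np

def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t (an ARIMA(1,0,0) model)."""
    series = np.asarray(series, dtype=float)
    y_prev, y_next = series[:-1], series[1:]
    A = np.column_stack([np.ones_like(y_prev), y_prev])
    c, phi = np.linalg.lstsq(A, y_next, rcond=None)[0]
    return c, phi

def forecast_ar1(c, phi, last_value, horizon=7):
    """Iterate the fitted recursion to produce a multi-step forecast."""
    out, y = [], last_value
    for _ in range(horizon):
        y = c + phi * y
        out.append(y)
    return np.array(out)

def mean_absolute_error(forecast, actual):
    return float(np.mean(np.abs(np.asarray(forecast) - np.asarray(actual))))

# Hypothetical daily census values (not study data).
census = np.array([30, 32, 31, 33, 34, 33, 35, 36, 35, 37, 36, 38, 37, 39], dtype=float)
train, test = census[:7], census[7:]
c, phi = fit_ar1(train)
ar_forecast = forecast_ar1(c, phi, train[-1], horizon=7)
baseline = np.full(7, train.mean())     # fixed-average-style baseline
print(mean_absolute_error(ar_forecast, test), mean_absolute_error(baseline, test))
```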