Predictive and mechanistic multivariate linear regression models for reaction development
Santiago, Celine B.; Guo, Jing-Yao
2018-01-01
Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
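The MLR workflow the review describes reduces to an ordinary least-squares fit of a measured response onto descriptor columns. A minimal sketch follows; the descriptor values and responses are invented for illustration and are not from the review.

```python
import numpy as np

# Rows: reactions; columns: e.g. one steric and one electronic descriptor (made up).
X = np.array([[1.2, 0.5],
              [0.8, 1.1],
              [1.5, 0.2],
              [0.4, 1.4],
              [1.0, 0.9]])
y = np.array([2.9, 3.0, 2.6, 3.2, 3.0])   # e.g. a measured selectivity proxy

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# In-sample fit quality.
y_hat = A @ coef
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

The protocol in the review adds descriptor selection and external validation on top of this core fit.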
Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages.
Choi, Youn-Kyung; Kim, Jinmi; Yamaguchi, Tetsutaro; Maki, Koutaro; Ko, Ching-Chang; Kim, Yong-Il
2016-01-01
This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5-18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanation power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level. PMID:27340668
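The variance-inflation-factor screen mentioned in the abstract can be sketched directly: each predictor is regressed on the others, and VIF = 1/(1 - R²). The three columns below are synthetic stand-ins for the C2, C3, and C4 volume measurements, not the study's data.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X (no intercept column)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
base = rng.normal(size=50)
X = np.column_stack([base + 0.1 * rng.normal(size=50),   # highly correlated pair
                     base + 0.1 * rng.normal(size=50),
                     rng.normal(size=50)])                # independent column
v = vif(X)
```

Columns tracking the same underlying quantity (as adjacent vertebral volumes would) produce large VIFs; the study's threshold of ten is a common rule of thumb.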
Mohammed, Mohammed A; Manktelow, Bradley N; Hofer, Timothy P
2016-04-01
There is interest in deriving case-mix adjusted standardised mortality ratios so that comparisons between healthcare providers, such as hospitals, can be undertaken in the controversial belief that variability in standardised mortality ratios reflects quality of care. Typically standardised mortality ratios are derived using a fixed effects logistic regression model, without a hospital term in the model. This fails to account for the hierarchical structure of the data - patients nested within hospitals - and so a hierarchical logistic regression model is more appropriate. However, four methods have been advocated for deriving standardised mortality ratios from a hierarchical logistic regression model, but their agreement is not known and neither do we know which is to be preferred. We found significant differences between the four types of standardised mortality ratios because they reflect a range of underlying conceptual issues. The most subtle issue is the distinction between asking how an average patient fares in different hospitals versus how patients at a given hospital fare at an average hospital. Since the answers to these questions are not the same and since the choice between these two approaches is not obvious, the extent to which profiling hospitals on mortality can be undertaken safely and reliably, without resolving these methodological issues, remains questionable. © The Author(s) 2012.
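The fixed-effects starting point the paper critiques is indirect standardisation: a reference risk model supplies each patient's expected death probability, and SMR = observed/expected. A toy sketch with a hand-written logistic score and invented patient data:

```python
import math

def risk(age, severity):
    # Hypothetical reference risk model: logit(p) = -5 + 0.04*age + 1.2*severity
    z = -5.0 + 0.04 * age + 1.2 * severity
    return 1.0 / (1.0 + math.exp(-z))

# (age, severity, died) triples for one hospital's patients (invented data)
patients = [(70, 1, 1), (55, 0, 0), (80, 1, 0), (65, 0, 0), (75, 1, 1)]

observed = sum(d for _, _, d in patients)
expected = sum(risk(a, s) for a, s, _ in patients)
smr = observed / expected   # > 1 means more deaths than the case mix predicts
```

The paper's point is that once a hospital term enters a hierarchical model, there are several non-equivalent ways to define "expected", and they answer different questions.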
NASA Astrophysics Data System (ADS)
Zhai, Mengting; Chen, Yan; Li, Jing; Zhou, Jun
2017-12-01
The molecular electronegativity distance vector (MEDV-13) was used to describe the molecular structure of benzyl ether diamidine derivatives in this paper. Based on MEDV-13, a three-parameter (M3, M15, M47) QSAR model of insecticidal activity (pIC50) for 60 benzyl ether diamidine derivatives was constructed by leaps-and-bounds regression (LBR). The conventional correlation coefficient (R) and the cross-validation correlation coefficient (RCV) were 0.975 and 0.971, respectively. The robustness of the regression model was validated by the jackknife method; the correlation coefficients R ranged from 0.971 to 0.983. Meanwhile, the independent variables in the model were tested and showed no autocorrelation. The regression results indicate that the model has good robustness and predictive capability. The research would provide theoretical guidance for the development of a new generation of efficient, low-toxicity anti-African-trypanosomiasis drugs.
QSAR Analysis of 2-Amino or 2-Methyl-1-Substituted Benzimidazoles Against Pseudomonas aeruginosa
Podunavac-Kuzmanović, Sanja O.; Cvetković, Dragoljub D.; Barna, Dijana J.
2009-01-01
A set of benzimidazole derivatives were tested for their inhibitory activities against the Gram-negative bacterium Pseudomonas aeruginosa and minimum inhibitory concentrations were determined for all the compounds. Quantitative structure activity relationship (QSAR) analysis was applied to fourteen of the abovementioned derivatives using a combination of various physicochemical, steric, electronic, and structural molecular descriptors. A multiple linear regression (MLR) procedure was used to model the relationships between molecular descriptors and the antibacterial activity of the benzimidazole derivatives. The stepwise regression method was used to derive the most significant models as a calibration model for predicting the inhibitory activity of this class of molecules. The best QSAR models were further validated by a leave one out technique as well as by the calculation of statistical parameters for the established theoretical models. To confirm the predictive power of the models, an external set of molecules was used. High agreement between experimental and predicted inhibitory values, obtained in the validation procedure, indicated the good quality of the derived QSAR models. PMID:19468332
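The leave-one-out validation used on the QSAR models above can be sketched as follows: refit the MLR model n times, each time predicting the held-out compound, and summarise with the cross-validated Q². Descriptors and activities below are synthetic placeholders, not the study's data.

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out cross-validated Q^2 for an MLR model with intercept."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    A = np.column_stack([np.ones(len(X)), X])
    preds = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i           # drop compound i
        beta, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        preds[i] = A[i] @ beta                   # predict the held-out compound
    press = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))                     # two mock descriptors
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.05 * rng.normal(size=20)
q2 = loo_q2(X, y)
```

A Q² close to the calibration R² (as reported in the abstract) is the usual evidence that the model is not overfitted.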
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross-validity approach to select sample sizes…
Riley, Richard D; Ensor, Joie; Jackson, Dan; Burke, Danielle L
2017-01-01
Many meta-analysis models contain multiple parameters, for example due to multiple outcomes, multiple treatments or multiple regression coefficients. In particular, meta-regression models may contain multiple study-level covariates, and one-stage individual participant data meta-analysis models may contain multiple patient-level covariates and interactions. Here, we propose how to derive percentage study weights for such situations, in order to reveal the (otherwise hidden) contribution of each study toward the parameter estimates of interest. We assume that studies are independent, and utilise a decomposition of Fisher's information matrix to decompose the total variance matrix of parameter estimates into study-specific contributions, from which percentage weights are derived. This approach generalises how percentage weights are calculated in a traditional, single parameter meta-analysis model. Application is made to one- and two-stage individual participant data meta-analyses, meta-regression and network (multivariate) meta-analysis of multiple treatments. These reveal percentage study weights toward clinically important estimates, such as summary treatment effects and treatment-covariate interactions, and are especially useful when some studies are potential outliers or at high risk of bias. We also derive percentage study weights toward methodologically interesting measures, such as the magnitude of ecological bias (difference between within-study and across-study associations) and the amount of inconsistency (difference between direct and indirect evidence in a network meta-analysis).
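The single-parameter case that this approach generalises is worth seeing concretely: in a fixed-effect meta-analysis, percentage study weights are just normalised inverse variances. Effect sizes and standard errors below are invented.

```python
effects = [0.30, 0.10, 0.25, 0.40]      # study effect estimates (made up)
ses     = [0.10, 0.05, 0.20, 0.15]      # their standard errors (made up)

inv_var = [1.0 / se ** 2 for se in ses]
total = sum(inv_var)
pct_weights = [100.0 * w / total for w in inv_var]           # sum to 100%
pooled = sum(w * e for w, e in zip(inv_var, effects)) / total
```

The paper's contribution is the multi-parameter analogue: decomposing Fisher's information so that each study's percentage contribution to each coefficient of a meta-regression or network meta-analysis can be reported.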
NASA Astrophysics Data System (ADS)
Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei
2017-02-01
Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of which were randomly selected as the “derivation cohort” to develop dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
Role of nitric oxide in progression and regression of atherosclerosis.
Cooke, J P
1996-01-01
Endothelium-derived nitric oxide is a potent endogenous vasodilator that is derived from the metabolism of L-arginine. This endothelial factor inhibits circulating blood elements from interacting with the vessel wall. Platelet adherence and aggregation as well as monocyte adherence and infiltration are opposed by this paracrine substance. By virtue of these characteristics, endothelium-derived nitric oxide inhibits atherogenesis in animal models and may even induce regression. Images Figure 1. PMID:8686299
Wu, Lingtao; Lord, Dominique
2017-05-01
This study further examined the use of regression models for developing crash modification factors (CMFs), specifically focusing on the misspecification in the link function. The primary objectives were to validate the accuracy of CMFs derived from the commonly used regression models (i.e., generalized linear models or GLMs with additive linear link functions) when some of the variables have nonlinear relationships and quantify the amount of bias as a function of the nonlinearity. Using the concept of artificial realistic data, various linear and nonlinear crash modification functions (CM-Functions) were assumed for three variables. Crash counts were randomly generated based on these CM-Functions. CMFs were then derived from regression models for three different scenarios. The results were compared with the assumed true values. The main findings are summarized as follows: (1) when some variables have nonlinear relationships with crash risk, the CMFs for these variables derived from the commonly used GLMs are all biased, especially around areas away from the baseline conditions (e.g., boundary areas); (2) with the increase in nonlinearity (i.e., nonlinear relationship becomes stronger), the bias becomes more significant; (3) the quality of CMFs for other variables having linear relationships can be influenced when mixed with those having nonlinear relationships, but the accuracy may still be acceptable; and (4) the misuse of the link function for one or more variables can also lead to biased estimates for other parameters. This study raised the importance of the link function when using regression models for developing CMFs. Copyright © 2017 Elsevier Ltd. All rights reserved.
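For context, this is how a CMF is read off a fitted GLM with the additive log-linear link the study examines: CMF(x) = exp(beta * (x - x0)) relative to a baseline condition x0. The coefficient and variable below are hypothetical, not values from the study.

```python
import math

beta = -0.08     # hypothetical fitted coefficient (e.g. per foot of shoulder width)
x_base = 4.0     # baseline condition

def cmf(x):
    """Crash modification factor relative to the baseline x_base."""
    return math.exp(beta * (x - x_base))

c = cmf(6.0)     # < 1.0 means fewer predicted crashes than at baseline
```

The study's finding is that when the true crash-risk relationship is nonlinear, this exponential-in-x form is most biased exactly where `x` is far from `x_base`.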
Ding, A Adam; Wu, Hulin
2014-10-01
We propose a new method to use a constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models with a goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters in the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy with a small price of computational cost. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method. PMID:26401093
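The two-stage pseudo-least squares baseline that the paper improves on can be sketched for the toy ODE x'(t) = -theta * x(t): stage 1 smooths the noisy observations (here with a global polynomial as a stand-in for local polynomial regression), stage 2 regresses the smoothed derivative on the smoothed state. All data are synthetic.

```python
import numpy as np

theta_true = 0.5
t = np.linspace(0.0, 4.0, 40)
rng = np.random.default_rng(2)
x_obs = np.exp(-theta_true * t) + 0.01 * rng.normal(size=t.size)

# Stage 1: smooth the data and differentiate the smoother.
p = np.polynomial.Polynomial.fit(t, x_obs, deg=5)
x_hat = p(t)
dx_hat = p.deriv()(t)

# Stage 2: least squares for theta in dx = -theta * x.
theta_hat = -np.sum(dx_hat * x_hat) / np.sum(x_hat ** 2)
```

The proposed method instead constrains the smoother itself with the ODE, which is what reduces the bias of this plug-in derivative step.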
Liu, Xun; Li, Ning-shan; Lv, Lin-sheng; Huang, Jian-hua; Tang, Hua; Chen, Jin-xia; Ma, Hui-juan; Wu, Xiao-ming; Lou, Tan-qi
2013-12-01
Accurate estimation of glomerular filtration rate (GFR) is important in clinical practice. Current models derived from regression are limited by the imprecision of GFR estimates. We hypothesized that an artificial neural network (ANN) might improve the precision of GFR estimates. A study of diagnostic test accuracy. 1,230 patients with chronic kidney disease were enrolled, including the development cohort (n=581), internal validation cohort (n=278), and external validation cohort (n=371). Estimated GFR (eGFR) using a new ANN model and a new regression model using age, sex, and standardized serum creatinine level derived in the development and internal validation cohort, and the CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration) 2009 creatinine equation. Measured GFR (mGFR). GFR was measured using a diethylenetriaminepentaacetic acid renal dynamic imaging method. Serum creatinine was measured with an enzymatic method traceable to isotope-dilution mass spectrometry. In the external validation cohort, mean mGFR was 49±27 (SD) mL/min/1.73 m2 and biases (median difference between mGFR and eGFR) for the CKD-EPI, new regression, and new ANN models were 0.4, 1.5, and -0.5 mL/min/1.73 m2, respectively (P<0.001 and P=0.02 compared to CKD-EPI and P<0.001 comparing the new regression and ANN models). Precisions (IQRs for the difference) were 22.6, 14.9, and 15.6 mL/min/1.73 m2, respectively (P<0.001 for both compared to CKD-EPI and P<0.001 comparing the new ANN and new regression models). Accuracies (proportions of eGFRs not deviating >30% from mGFR) were 50.9%, 77.4%, and 78.7%, respectively (P<0.001 for both compared to CKD-EPI and P=0.5 comparing the new ANN and new regression models). Different methods for measuring GFR were a source of systematic bias in comparisons of new models to CKD-EPI, and both the derivation and validation cohorts consisted of a group of patients who were referred to the same institution. 
An ANN model using 3 variables did not perform better than a new regression model. Whether ANN can improve GFR estimation using more variables requires further investigation. Copyright © 2013 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.
Weighted regression analysis and interval estimators
Donald W. Seegrist
1974-01-01
A method is given for deriving the weighted least squares estimators for the parameters of a multiple regression model. Confidence intervals for expected values and prediction intervals for the means of future samples are also given.
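The weighted least squares estimator in question has the closed form beta = (X'WX)^(-1) X'Wy. A minimal numeric sketch with invented data and inverse-variance weights:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])           # intercept column + one regressor
y = np.array([1.1, 1.9, 3.2, 3.9])
W = np.diag([4.0, 1.0, 1.0, 0.25])   # inverse-variance weights (made up)

# Closed-form WLS solution: solve (X'WX) beta = X'Wy.
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Observations with larger weights pull the fitted line toward themselves, which is the whole point of weighting by reciprocal variance.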
Development and evaluation of habitat models for herpetofauna and small mammals
William M. Block; Michael L. Morrison; Peter E. Scott
1998-01-01
We evaluated the ability of discriminant analysis (DA), logistic regression (LR), and multiple regression (MR) to describe habitat use by amphibians, reptiles, and small mammals found in California oak woodlands. We also compared models derived from pitfall and live trapping data for several species. Habitat relations modeled by DA and LR produced similar results,...
Functional Relationships and Regression Analysis.
ERIC Educational Resources Information Center
Preece, Peter F. W.
1978-01-01
Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…
USE OF LETHALITY DATA DURING CATEGORICAL REGRESSION MODELING OF ACUTE REFERENCE EXPOSURES
Categorical regression is being considered by the U.S. EPA as an additional tool for derivation of acute reference exposures (AREs) to be used for human health risk assessment for exposure to inhaled chemicals. Categorical regression is used to calculate probability-response fun...
Influence diagnostics in meta-regression model.
Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua
2017-09-01
This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.
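The simplest version of the case-deletion diagnostic is easy to sketch: recompute the pooled fixed-effect estimate with each study removed and record the shift. Effect sizes and variances below are invented, with study 3 planted as an outlier.

```python
effects = [0.20, 0.25, 0.90, 0.18, 0.22]    # study 3 (index 2) is the outlier
variances = [0.02, 0.03, 0.02, 0.04, 0.03]

def pooled(es, vs):
    """Fixed-effect inverse-variance pooled estimate."""
    w = [1.0 / v for v in vs]
    return sum(wi * e for wi, e in zip(w, es)) / sum(w)

full = pooled(effects, variances)
shifts = []
for i in range(len(effects)):
    es = effects[:i] + effects[i + 1:]
    vs = variances[:i] + variances[i + 1:]
    shifts.append(pooled(es, vs) - full)     # change caused by deleting study i

most_influential = max(range(len(shifts)), key=lambda i: abs(shifts[i]))
```

The paper's subset-deletion formulae avoid the brute-force refits and extend the idea to the heterogeneity variance and to local influence under various perturbation schemes.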
Regional regression models of watershed suspended-sediment discharge for the eastern United States
NASA Astrophysics Data System (ADS)
Roman, David C.; Vogel, Richard M.; Schwarz, Gregory E.
2012-11-01
Estimates of mean annual watershed sediment discharge, derived from long-term measurements of suspended-sediment concentration and streamflow, often are not available at locations of interest. The goal of this study was to develop multivariate regression models to enable prediction of mean annual suspended-sediment discharge from available basin characteristics useful for most ungaged river locations in the eastern United States. The models are based on long-term mean sediment discharge estimates and explanatory variables obtained from a combined dataset of 1201 US Geological Survey (USGS) stations derived from a SPAtially Referenced Regression on Watershed attributes (SPARROW) study and the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES) database. The resulting regional regression models, summarized for major US water resources regions 1-8, exhibited prediction R2 values ranging from 76.9% to 92.7% and corresponding average model prediction errors ranging from 56.5% to 124.3%. Results from cross-validation experiments suggest that a majority of the models will perform similarly to calibration runs. The 36-parameter regional regression models also outperformed a 16-parameter national SPARROW model of suspended-sediment discharge and indicate that mean annual sediment loads in the eastern United States generally correlate with a combination of basin area, land use patterns, seasonal precipitation, soil composition, hydrologic modification, and to a lesser extent, topography.
Mauer, Michael; Caramori, Maria Luiza; Fioretto, Paola; Najafian, Behzad
2015-06-01
Studies of structural-functional relationships have improved understanding of the natural history of diabetic nephropathy (DN). However, in order to consider structural end points for clinical trials, the robustness of the resultant models needs to be verified. This study examined whether structural-functional relationship models derived from a large cohort of type 1 diabetic (T1D) patients with a wide range of renal function are robust. The predictability of models derived from multiple regression analysis and piecewise linear regression analysis was also compared. T1D patients (n = 161) with research renal biopsies were divided into two equal groups matched for albumin excretion rate (AER). Models to explain AER and glomerular filtration rate (GFR) by classical DN lesions in one group (T1D-model, or T1D-M) were applied to the other group (T1D-test, or T1D-T) and regression analyses were performed. T1D-M-derived models explained 70 and 63% of AER variance and 32 and 21% of GFR variance in T1D-M and T1D-T, respectively, supporting the substantial robustness of the models. Piecewise linear regression analyses substantially improved predictability of the models with 83% of AER variance and 66% of GFR variance explained by classical DN glomerular lesions alone. These studies demonstrate that DN structural-functional relationship models are robust, and if appropriate models are used, glomerular lesions alone explain a major proportion of AER and GFR variance in T1D patients. © The Author 2014. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
An INAR(1) Negative Multinomial Regression Model for Longitudinal Count Data.
ERIC Educational Resources Information Center
Bockenholt, Ulf
1999-01-01
Discusses a regression model for the analysis of longitudinal count data in a panel study by adapting an integer-valued first-order autoregressive (INAR(1)) Poisson process to represent time-dependent correlation between counts. Derives a new negative multinomial distribution by combining INAR(1) representation with a random effects approach.…
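The INAR(1) recursion the abstract builds on is X_t = alpha ∘ X_{t-1} + eps_t, where "∘" is binomial thinning (each of the X_{t-1} counts survives with probability alpha) and eps_t is Poisson innovation noise. A simulation sketch with invented parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, lam, T = 0.6, 2.0, 200

x = np.empty(T, dtype=int)
x[0] = rng.poisson(lam / (1 - alpha))          # start near the stationary mean
for t in range(1, T):
    survivors = rng.binomial(x[t - 1], alpha)  # binomial thinning of last count
    x[t] = survivors + rng.poisson(lam)        # plus Poisson innovations

mean_x = x.mean()   # stationary mean is lam / (1 - alpha) = 5.0
```

Thinning keeps the process integer-valued, which is what lets the panel model respect the count nature of the data while still inducing AR(1)-style serial correlation.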
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
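The moment-based correction idea for a linear model can be sketched on synthetic data: the naive slope on an error-prone covariate is attenuated, and subtracting the (known, here heteroscedastic) error variance from the covariate's sample variance undoes the attenuation. This is a generic errors-in-variables sketch, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(0.0, 1.0, n)                 # true covariate (unobserved)
sigma_u = rng.uniform(0.3, 0.7, n)          # known per-subject error SDs
w = x + sigma_u * rng.normal(size=n)        # observed, error-prone covariate
y = 2.0 + 1.5 * x + 0.1 * rng.normal(size=n)

s_wy = np.cov(w, y, ddof=1)[0, 1]
s_ww = np.var(w, ddof=1)
naive = s_wy / s_ww                                    # attenuated slope
corrected = s_wy / (s_ww - np.mean(sigma_u ** 2))      # moment correction
```

The paper's subsampling extrapolation method handles the harder case where the error distribution is unknown and the model may be nonlinear.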
A regression-based 3-D shoulder rhythm.
Xu, Xu; Lin, Jia-hua; McGorry, Raymond W
2014-03-21
In biomechanical modeling of the shoulder, it is important to know the orientation of each bone in the shoulder girdle when estimating the loads on each musculoskeletal element. However, because of the soft tissue overlying the bones, it is difficult to accurately derive the orientation of the clavicle and scapula using surface markers during dynamic movement. The purpose of this study is to develop two regression models which predict the orientation of the clavicle and the scapula. The first regression model uses humerus orientation and individual factors such as age, gender, and anthropometry data as the predictors. The second regression model includes only the humerus orientation as the predictor. Thirty-eight participants performed 118 static postures covering the volume of the right hand reach. The orientation of the thorax, clavicle, scapula and humerus were measured with a motion tracking system. Regression analysis was performed on the Euler angles decomposed from the orientation of each bone from 26 randomly selected participants. The regression models were then validated with the remaining 12 participants. The results indicate that for the first model, the r(2) of the predicted orientation of the clavicle and the scapula ranged between 0.31 and 0.65, and the RMSE obtained from the validation dataset ranged from 6.92° to 10.39°. For the second model, the r(2) ranged between 0.19 and 0.57, and the RMSE obtained from the validation dataset ranged from 6.62° to 11.13°. The derived regression-based shoulder rhythm could be useful in future biomechanical modeling of the shoulder. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Dalkilic, Turkan Erbay; Apaydin, Aysen
2009-11-01
In a regression analysis, it is assumed that the observations come from a single class in a data cluster and that the simple functional relationship between the dependent and independent variables can be expressed by the general model Y = f(X) + ε. However, a data cluster may consist of a combination of observations with different distributions that derive from different clusters. When a regression model must be estimated for fuzzy inputs derived from different distributions, the model is termed a 'switching regression model'. Here l_i indicates the class number of each independent variable and p the number of independent variables [J.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics 23 (3) (1993) 665-685; M. Michel, Fuzzy clustering and switching regression models using ambiguity and distance rejects, Fuzzy Sets and Systems 122 (2001) 363-399; E.Q. Richard, A new approach to estimating switching regressions, Journal of the American Statistical Association 67 (338) (1972) 306-310]. In this study, adaptive networks have been used to construct a model formed by gathering the obtained models. Some methods suggest the class numbers of the independent variables heuristically; here, instead, a suggested validity criterion for fuzzy clustering is used to define the optimal class number of the independent variables. For the case in which the independent variables have an exponential distribution, an algorithm is suggested for defining the unknown parameters of the switching regression model and for obtaining the estimated values once an optimal membership function suitable for the exponential distribution has been obtained.
Estimating parasitic sea lamprey abundance in Lake Huron from heterogenous data sources
Young, Robert J.; Jones, Michael L.; Bence, James R.; McDonald, Rodney B.; Mullett, Katherine M.; Bergstedt, Roger A.
2003-01-01
The Great Lakes Fishery Commission uses time series of transformer, parasitic, and spawning population estimates to evaluate the effectiveness of its sea lamprey (Petromyzon marinus) control program. This study used an inverse variance weighting method to integrate Lake Huron sea lamprey population estimates derived from two estimation procedures: 1) prediction of the lake-wide spawning population from a regression model based on stream size, and 2) whole-lake mark-recapture estimates. In addition, we used a re-sampling procedure to evaluate the effect of trading off sampling effort between the regression and mark-recapture models. Population estimates derived from the regression model ranged from 132,000 to 377,000 while mark-recapture estimates of marked recently metamorphosed juveniles and parasitic sea lampreys ranged from 536,000 to 634,000 and 484,000 to 1,608,000, respectively. The precision of the estimates varied greatly among estimation procedures and years. The integrated estimate of the mark-recapture and spawner regression procedures ranged from 252,000 to 702,000 transformers. The re-sampling procedure indicated that the regression model is more sensitive to reduction in sampling effort than the mark-recapture model. Reliance on either the regression or mark-recapture model alone could produce misleading estimates of abundance of sea lampreys and the effect of the control program on sea lamprey abundance. These analyses indicate that the precision of the lakewide population estimate can be maximized by re-allocating sampling effort from marking sea lampreys to trapping additional streams.
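The inverse-variance integration step above has a simple closed form: each estimate is weighted by the reciprocal of its variance, so the more precise method dominates, and the combined variance is smaller than either input's. The numbers below are invented, not the study's.

```python
def combine(est1, var1, est2, var2):
    """Inverse-variance weighted combination of two independent estimates."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    est = (w1 * est1 + w2 * est2) / (w1 + w2)
    var = 1.0 / (w1 + w2)
    return est, var

# e.g. a regression-model estimate vs a mark-recapture estimate (made-up values)
est, var = combine(300_000, 1.6e9,      # regression estimate, variance
                   550_000, 1.44e10)    # mark-recapture estimate, variance
```

Because the combined estimate always lands between the two inputs and is pulled toward the tighter one, large disagreements between the methods (as the abstract reports) show up as the integrated estimate sitting much closer to the more precise procedure.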
Geodesic regression on orientation distribution functions with its application to an aging study.
Du, Jia; Goh, Alvina; Kushnarev, Sergey; Qiu, Anqi
2014-02-15
In this paper, we treat orientation distribution functions (ODFs) derived from high angular resolution diffusion imaging (HARDI) as elements of a Riemannian manifold and present a method for geodesic regression on this manifold. In order to find the optimal regression model, we pose this as a least-squares problem involving the sum-of-squared geodesic distances between observed ODFs and their model fitted data. We derive the appropriate gradient terms and employ gradient descent to find the minimizer of this least-squares optimization problem. In addition, we show how to perform statistical testing for determining the significance of the relationship between the manifold-valued regressors and the real-valued regressands. Experiments on both synthetic and real human data are presented. In particular, we examine aging effects on HARDI via geodesic regression of ODFs in normal adults aged 22 years old and above. © 2013 Elsevier Inc. All rights reserved.
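The least-squares-on-a-manifold idea can be shown in miniature on the simplest curved space, the unit circle: responses are angles, residuals are geodesic (arc-length) distances with wraparound, and the model is fit by gradient descent, mirroring the paper's procedure on the far richer ODF manifold. Everything here (data, step size, iteration count) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy geodesic regression on S^1: angles theta_i regressed on a scalar x_i.
x = rng.uniform(-1, 1, 100)
theta = 0.5 + 0.8 * x + rng.normal(0, 0.05, 100)   # true alpha=0.5, beta=0.8

def geo_dist(a, b):
    """Signed geodesic distance between angles, wrapped to (-pi, pi]."""
    d = a - b
    return (d + np.pi) % (2 * np.pi) - np.pi

alpha, beta = 0.0, 0.0
lr = 0.1
for _ in range(500):
    r = geo_dist(theta, alpha + beta * x)   # residuals along the geodesic
    # Gradient descent on the mean squared geodesic distance.
    alpha += lr * np.mean(r)
    beta += lr * np.mean(r * x)
```

On a genuine ODF manifold the update requires exponential/logarithm maps instead of the angle wraparound, but the structure of the descent is the same.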
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
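The abstract's formulas are not reproduced here, but the generic asymptotic power calculation for a Wald z-test of a regression slope can be sketched with the standard library alone. The per-observation standard error `se_one` is an assumed input (in practice it is derived from the DIF design), so the concrete numbers are illustrative.

```python
from statistics import NormalDist

def wald_power(beta, se_one, n, alpha=0.05):
    """Asymptotic power of a two-sided Wald z-test for a slope beta,
    where the n-sample standard error is se_one / sqrt(n)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    ncp = abs(beta) * n**0.5 / se_one      # noncentrality of the z statistic
    return nd.cdf(ncp - z) + nd.cdf(-ncp - z)

def sample_size(beta, se_one, target=0.80, alpha=0.05):
    """Smallest n achieving the target power."""
    n = 1
    while wald_power(beta, se_one, n, alpha) < target:
        n += 1
    return n
```

For example, a log-odds effect of 0.5 with a per-observation standard error of 5 needs on the order of 800 examinees for 80% power.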
Modeling absolute differences in life expectancy with a censored skew-normal regression approach
Clough-Gorr, Kerri; Zwahlen, Marcel
2015-01-01
Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negatively skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544
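A minimal sketch of a right-censored skew-normal regression follows, assuming SciPy is available: events contribute the log-density and censored cases the log-survival function, and the group coefficient is read directly on the survival-time scale. The paper's model additionally handles left truncation, which is omitted here, and all data and starting values are synthetic assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import skewnorm

rng = np.random.default_rng(2)

# Synthetic ages at death with a binary covariate and right censoring.
n = 400
x = rng.integers(0, 2, n).astype(float)               # e.g. exposure group
t = skewnorm.rvs(-4.0, loc=70.0 - 5.0 * x, scale=10.0, random_state=3)
c = rng.uniform(40, 90, n)                            # censoring times
obs = np.minimum(t, c)
event = (t <= c).astype(float)

def negloglik(p):
    b0, b1, log_scale, shape = p
    loc = b0 + b1 * x
    scale = np.exp(log_scale)
    # Events contribute the log-density, censored cases the log-survival.
    ll = np.where(event == 1.0,
                  skewnorm.logpdf(obs, shape, loc, scale),
                  skewnorm.logsf(obs, shape, loc, scale))
    return -np.sum(ll)

fit = minimize(negloglik, x0=[65.0, 0.0, np.log(8.0), -2.0],
               method="Nelder-Mead", options={"maxiter": 2000, "maxfev": 4000})
fit = minimize(negloglik, fit.x, method="Nelder-Mead",
               options={"maxiter": 2000, "maxfev": 4000})   # polish
b1_hat = fit.x[1]   # group difference on the survival-time scale (true: -5)
```

Unlike a hazard ratio, `b1_hat` is interpretable directly as a difference in expected survival time between groups, which is the appeal the abstract describes.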
Churpek, Matthew M; Yuen, Trevor C; Winslow, Christopher; Meltzer, David O; Kattan, Michael W; Edelson, Dana P
2016-02-01
Machine learning methods are flexible prediction algorithms that may be more accurate than conventional regression. We compared the accuracy of different techniques for detecting clinical deterioration on the wards in a large, multicenter database. This was an observational cohort study of hospitalized ward patients in five hospitals, from November 2008 until January 2013, with no interventions. Demographic variables, laboratory values, and vital signs were utilized in a discrete-time survival analysis framework to predict the combined outcome of cardiac arrest, intensive care unit transfer, or death. Two logistic regression models (one using linear predictor terms and a second utilizing restricted cubic splines) were compared to several different machine learning methods. The models were derived in the first 60% of the data by date and then validated in the next 40%. For model derivation, each event time window was matched to a non-event window. All models were compared to each other and to the Modified Early Warning Score (MEWS), a commonly cited early warning score, using the area under the receiver operating characteristic curve (AUC). A total of 269,999 patients were admitted, and 424 cardiac arrests, 13,188 intensive care unit transfers, and 2,840 deaths occurred in the study. In the validation dataset, the random forest model was the most accurate model (AUC, 0.80 [95% CI, 0.80-0.80]). The logistic regression model with spline predictors was more accurate than the model utilizing linear predictors (AUC, 0.77 vs 0.74; p < 0.01), and all models were more accurate than the MEWS (AUC, 0.70 [95% CI, 0.70-0.70]). In this multicenter study, we found that several machine learning methods more accurately predicted clinical deterioration than logistic regression. Use of detection algorithms derived from these techniques may result in improved identification of critically ill patients on the wards.
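The kind of head-to-head AUC comparison reported here can be sketched with scikit-learn on synthetic data. The dataset is an invented stand-in for ward observations, with one feature deliberately made nonlinear so the flexible learner has something to gain; the AUC values that result are illustrative, not the study's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for ward data.
X, y = make_classification(n_samples=4000, n_features=10, n_informative=5,
                           random_state=0)
X[:, 0] = X[:, 0] ** 2                    # inject a nonlinearity
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.4, random_state=0)

lr = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)

auc_lr = roc_auc_score(yte, lr.predict_proba(Xte)[:, 1])
auc_rf = roc_auc_score(yte, rf.predict_proba(Xte)[:, 1])
```

The study used a temporal split (first 60% by date for derivation); the random split above is a simplification.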
Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan
2006-01-01
We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS Data Center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, six greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates of richness in amphibians, birds, all vertebrates, reptiles, and mammals. The variation explained by the regression tree for each taxon was: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxon and assess the validity of the predictions resulting from the regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data, and graphical results indicated the models were well fit to the data.
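Why a regression tree suits this problem can be shown on a toy "richness vs. environment" surface containing a threshold interaction, the kind of non-linear structure the abstract mentions, which a tree captures by splitting while a single linear model cannot. The data and thresholds below are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy species-richness surface: high richness only at low elevation AND
# high precipitation -- a threshold interaction.
n = 1000
elev = rng.uniform(0, 3000, n)
precip = rng.uniform(200, 2000, n)
richness = np.where((elev < 1500) & (precip > 800), 40.0, 15.0) \
    + rng.normal(0, 2, n)

X = np.column_stack([elev, precip])
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, richness)
r2 = tree.score(X, richness)   # fraction of variation explained
```

A depth-3 tree recovers the two thresholds and explains almost all of the variation, analogous to the per-taxon percentages reported above.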
Computing group cardinality constraint solutions for logistic regression problems.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
2017-01-01
We derive an algorithm to directly solve logistic regression based on cardinality constraint, group sparsity and use it to classify intra-subject MRI sequences (e.g., cine MRIs) of healthy versus diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. To avoid weighting features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum to the smooth and convex logistic regression problem is determined via gradient descent while we derive a closed form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients that received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significant higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
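The "closed form solution for finding a sparse approximation" in a penalty-decomposition scheme is, in its simplest form, a projection that keeps the k groups with the largest Euclidean norm and zeroes the rest. The sketch below shows just that projection step on made-up coefficients; the full algorithm alternates it with gradient descent on the logistic loss.

```python
import numpy as np

def project_group_sparsity(w, groups, k):
    """Keep the k coefficient groups with the largest Euclidean norm,
    zeroing the rest -- the sparse-projection step of a
    penalty-decomposition-style solver."""
    norms = np.array([np.linalg.norm(w[g]) for g in groups])
    keep = np.argsort(norms)[-k:]
    out = np.zeros_like(w)
    for i in keep:
        out[groups[i]] = w[groups[i]]
    return out

# Example: three groups, each a feature tracked across two time frames.
w = np.array([0.9, 1.1, 0.05, -0.02, 0.4, -0.5])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
sparse_w = project_group_sparsity(w, groups, k=2)
```

Grouping a feature's measurements across time frames, as the paper assumes, means a feature is either kept at every frame or discarded everywhere.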
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR), which does not require ...
On the Latent Regression Model of Item Response Theory. Research Report. ETS RR-07-12
ERIC Educational Resources Information Center
Antal, Tamás
2007-01-01
Full account of the latent regression model for the National Assessment of Educational Progress is given. The treatment includes derivation of the EM algorithm, Newton-Raphson method, and the asymptotic standard errors. The paper also features the use of the adaptive Gauss-Hermite numerical integration method as a basic tool to evaluate…
Artes, Paul H; Crabb, David P
2010-01-01
To investigate why the specificity of the Moorfields Regression Analysis (MRA) of the Heidelberg Retina Tomograph (HRT) varies with disc size, and to derive accurate normative limits for neuroretinal rim area to address this problem. Two datasets from healthy subjects (Manchester, UK, n = 88; Halifax, Nova Scotia, Canada, n = 75) were used to investigate the physiological relationship between the optic disc and neuroretinal rim area. Normative limits for rim area were derived by quantile regression (QR) and compared with those of the MRA (derived by linear regression). Logistic regression analyses were performed to quantify the association between disc size and positive classifications with the MRA, as well as with the QR-derived normative limits. In both datasets, the specificity of the MRA depended on optic disc size. The odds of observing a borderline or outside-normal-limits classification increased by approximately 10% for each 0.1 mm² increase in disc area (P < 0.1). The lower specificity of the MRA with large optic discs could be explained by the failure of linear regression to model the extremes of the rim area distribution (observations far from the mean). In comparison, the normative limits predicted by QR were larger for smaller discs (less specific, more sensitive), and smaller for larger discs, such that false-positive rates became independent of optic disc size. Normative limits derived by quantile regression appear to remove the size-dependence of specificity with the MRA. Because quantile regression does not rely on the restrictive assumptions of standard linear regression, it may be a more appropriate method for establishing normative limits in other clinical applications where the underlying distributions are nonnormal or have nonconstant variance.
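The core contrast, quantile-regression limits that adapt to nonconstant variance versus fixed-width limits from ordinary least squares, can be reproduced on synthetic heteroscedastic data. The disc/rim numbers are invented; the quantile fit minimizes the pinball loss directly rather than using a dedicated QR solver.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Hypothetical disc-area vs. rim-area data whose scatter grows with disc
# size -- the situation in which fixed-width limits misbehave.
n = 500
disc = rng.uniform(1.0, 3.5, n)                        # disc area, mm^2
rim = 0.4 + 0.55 * disc + rng.normal(0, 0.05 * disc)   # heteroscedastic

def pinball(params, tau):
    """Quantile-regression (pinball) loss for the line a + b * disc."""
    a, b = params
    r = rim - (a + b * disc)
    return np.mean(np.maximum(tau * r, (tau - 1) * r))

# 5th-percentile normative limit via quantile regression.
res = minimize(pinball, x0=[0.4, 0.5], args=(0.05,), method="Nelder-Mead")
res = minimize(pinball, res.x, args=(0.05,), method="Nelder-Mead")  # polish
q05 = res.x

# Fixed-width limit implied by OLS: mean line minus 1.645 * residual SD.
A = np.column_stack([np.ones(n), disc])
ols = np.linalg.lstsq(A, rim, rcond=None)[0]
resid_sd = np.std(rim - A @ ols)

def qr_limit(d):
    return q05[0] + q05[1] * d

def ols_limit(d):
    return ols[0] + ols[1] * d - 1.645 * resid_sd
```

As in the study, the QR limit sits above the OLS limit for small discs and below it for large discs, so false-positive rates stop depending on disc size.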
NASA Astrophysics Data System (ADS)
Hadley, Brian Christopher
This dissertation assessed remotely sensed data and geospatial modeling technique(s) to map the spatial distribution of total above-ground biomass present on the surface of the Savannah River National Laboratory's (SRNL) Mixed Waste Management Facility (MWMF) hazardous waste landfill. Ordinary least squares (OLS) regression, regression kriging, and tree-structured regression were employed to model the empirical relationship between in-situ measured Bahia (Paspalum notatum Flugge) and Centipede [Eremochloa ophiuroides (Munro) Hack.] grass biomass against an assortment of explanatory variables extracted from fine spatial resolution passive optical and LIDAR remotely sensed data. Explanatory variables included: (1) discrete channels of visible, near-infrared (NIR), and short-wave infrared (SWIR) reflectance, (2) spectral vegetation indices (SVI), (3) spectral mixture analysis (SMA) modeled fractions, (4) narrow-band derivative-based vegetation indices, and (5) LIDAR-derived topographic variables (i.e. elevation, slope, and aspect). Results showed that a linear combination of the first- (1DZ_DGVI), second- (2DZ_DGVI), and third-derivative of green vegetation indices (3DZ_DGVI) calculated from hyperspectral data recorded over the 400-960 nm wavelengths of the electromagnetic spectrum explained the largest percentage of statistical variation (R² = 0.5184) in the total above-ground biomass measurements. In general, the topographic variables did not correlate well with the MWMF biomass data, accounting for less than five percent of the statistical variation. It was concluded that tree-structured regression represented the optimum geospatial modeling technique due to a combination of model performance and efficiency/flexibility factors.
Modelling and Closed-Loop System Identification of a Quadrotor-Based Aerial Manipulator
NASA Astrophysics Data System (ADS)
Dube, Chioniso; Pedro, Jimoh O.
2018-05-01
This paper presents the modelling and system identification of a quadrotor-based aerial manipulator. The aerial manipulator model is first derived analytically using the Newton-Euler formulation for the quadrotor and the Recursive Newton-Euler formulation for the manipulator. The aerial manipulator is then simulated with the quadrotor under Proportional Derivative (PD) control, with the manipulator in motion. The simulation data are then used for system identification of the aerial manipulator. Auto-Regressive with eXogenous inputs (ARX) models are obtained from the system identification for the linear accelerations ẍ and ÿ and the yaw angular acceleration ψ̈. For the linear acceleration z̈, and the pitch and roll angular accelerations θ̈ and φ̈, Auto-Regressive Moving Average with eXogenous inputs (ARMAX) models are identified.
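ARX identification from simulation data reduces to ordinary least squares on lagged inputs and outputs. The sketch below identifies an ARX(2,1) model of an invented second-order discrete system, a stand-in for one of the quadrotor acceleration channels.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate a stable 2nd-order discrete system (illustrative dynamics):
#   y[t] = 1.2*y[t-1] - 0.5*y[t-2] + 0.3*u[t-1] + noise
N = 2000
u = rng.normal(0, 1, N)
y = np.zeros(N)
for t in range(2, N):
    y[t] = 1.2 * y[t-1] - 0.5 * y[t-2] + 0.3 * u[t-1] + rng.normal(0, 0.01)

# ARX(2,1) identification: regress y[t] on y[t-1], y[t-2], u[t-1].
Phi = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
```

With white equation noise the least-squares ARX estimate is consistent; the ARMAX case adds a moving-average noise model and needs an iterative (e.g. prediction-error) method instead.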
Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data
Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.
2013-01-01
Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
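The likelihood equivalence the abstract re-derives, Poisson regression with a log(time) offset versus exponential survival regression, can be checked numerically: the two negative log-likelihoods differ only by a constant, so their maximizers coincide. The data below are synthetic and the fit is by direct optimization rather than a GLM routine.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)

# Exponential survival times with a covariate effect on the hazard.
n = 300
x = rng.integers(0, 2, n).astype(float)
rate = np.exp(-1.0 + 0.7 * x)
t = rng.exponential(1.0 / rate)
d = np.ones(n)                        # all events observed, for simplicity

def nll_survival(b):
    """Exponential survival log-likelihood: sum d*log(lam) - lam*t."""
    lam = np.exp(b[0] + b[1] * x)
    return -np.sum(d * np.log(lam) - lam * t)

def nll_poisson(b):
    """Poisson log-linear model for the event count, offset log(t)."""
    mu = np.exp(b[0] + b[1] * x + np.log(t))
    return -np.sum(d * np.log(mu) - mu)

b_surv = minimize(nll_survival, [0.0, 0.0]).x
b_pois = minimize(nll_poisson, [0.0, 0.0]).x
```

The two solutions agree to optimizer tolerance, which is exactly why transition-count log-linear models and piecewise-exponential hazard models can be synthesized as the paper does.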
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
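The method of least squares described above has a closed form for the slope and intercept, which a tiny worked example (made-up numbers) makes concrete using only the standard library.

```python
# Least-squares line through a small illustrative dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # S_xy
sxx = sum((x - mx) ** 2 for x in xs)                     # S_xx

slope = sxy / sxx               # b1 = S_xy / S_xx
intercept = my - slope * mx     # b0 = ybar - b1 * xbar
```

The fitted line passes through the point of means (x̄, ȳ), a property that follows directly from the intercept formula.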
Mapping of the DLQI scores to EQ-5D utility values using ordinal logistic regression.
Ali, Faraz Mahmood; Kay, Richard; Finlay, Andrew Y; Piguet, Vincent; Kupfer, Joerg; Dalgard, Florence; Salek, M Sam
2017-11-01
The Dermatology Life Quality Index (DLQI) and the European Quality of Life-5 Dimension (EQ-5D) are separate measures that may be used to gather health-related quality of life (HRQoL) information from patients. The EQ-5D is a generic measure from which health utility estimates can be derived, whereas the DLQI is a specialty-specific measure to assess HRQoL. To reduce the burden of multiple measures being administered and to enable a more disease-specific calculation of health utility estimates, we explored an established mathematical technique known as ordinal logistic regression (OLR) to develop an appropriate model to map DLQI data to EQ-5D-based health utility estimates. Retrospective data from 4010 patients were randomly divided five times into two groups for the derivation and testing of the mapping model. Split-half cross-validation was utilized resulting in a total of ten ordinal logistic regression models for each of the five EQ-5D dimensions against age, sex, and all ten items of the DLQI. Using Monte Carlo simulation, predicted health utility estimates were derived and compared against those observed. This method was repeated for both OLR and a previously tested mapping methodology based on linear regression. The model was shown to be highly predictive and its repeated fitting demonstrated a stable model using OLR as well as linear regression. The mean differences between OLR-predicted health utility estimates and observed health utility estimates ranged from 0.0024 to 0.0239 across the ten modeling exercises, with an average overall difference of 0.0120 (a 1.6% underestimate, not of clinical importance). This modeling framework developed in this study will enable researchers to calculate EQ-5D health utility estimates from a specialty-specific study population, reducing patient and economic burden.
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative structure-activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio)thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R²(CV) = 0.8160; S(PRESS) = 0.5680) proved to be very accurate in both the training and predictive stages.
Modeling time-to-event (survival) data using classification tree analysis.
Linden, Ariel; Yarnold, Paul R
2017-12-01
Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (i.e., easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
Development of surrogate models for the prediction of the flow around an aircraft propeller
NASA Astrophysics Data System (ADS)
Salpigidou, Christina; Misirlis, Dimitris; Vlahostergios, Zinon; Yakinthos, Kyros
2018-05-01
In the present work, the derivation of two surrogate models (SMs) for modelling the flow around a propeller for small aircraft is presented. Both methodologies use derived functions based on computations with the detailed propeller geometry. The computations were performed using the k-ω shear stress transport model for turbulence. In the SMs, the propeller was modelled in a computational domain of disk-like geometry, where source terms were introduced in the momentum equations. In the first SM, the source terms were polynomial functions of swirl and thrust, mainly related to the propeller radius. In the second SM, regression analysis was used to correlate the source terms with the velocity distribution through the propeller. The proposed SMs achieved faster convergence relative to the detailed model, while also providing results closer to the available operational data. The regression-based model was the most accurate and required less computational time for convergence.
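The first surrogate's idea, replacing the resolved blade with radius-dependent polynomial source terms, can be sketched by fitting a polynomial to a sampled source-term profile. The profile below is an invented stand-in for data extracted from a detailed CFD run.

```python
import numpy as np

# Hypothetical thrust source term sampled at radial stations of the disk.
r = np.linspace(0.1, 1.0, 10)       # normalized radius
S = 100 * r**2 * (1 - r)            # synthetic source-term profile

# Polynomial surrogate of the source term, as a function of radius.
coef = np.polyfit(r, S, deg=3)
S_hat = np.polyval(coef, r)
```

In the actuator-disk setting, `S_hat(r)` would then be injected into the momentum equations over the disk cells instead of meshing the blade geometry.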
NASA Astrophysics Data System (ADS)
Zhu, Ting-Lei; Zhao, Chang-Yin; Zhang, Ming-Jiang
2017-04-01
This paper aims to obtain an analytic approximation to the evolution of circular orbits governed by the Earth's J2 and the luni-solar gravitational perturbations. Assuming that the lunar orbital plane coincides with the ecliptic plane, Allan and Cook (Proc. R. Soc. A, Math. Phys. Eng. Sci. 280(1380):97, 1964) derived an analytic solution for the orbital plane evolution of circular orbits. Using their result as an intermediate solution, we establish an approximate analytic model in which the lunar orbital inclination and its node regression are taken into account. Finally, an approximate analytic expression is derived that is accurate compared to the numerical results, except for the resonant cases in which the period of the reference orbit approximately equals an integer multiple (especially 1 or 2 times) of the lunar node regression period.
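For reference, the dominant secular node-regression rate produced by the Earth's J2 term, on which such analytic models build, is the standard textbook expression; for a circular orbit (e = 0) it reduces to

```latex
\dot{\Omega} = -\frac{3}{2}\, n\, J_2 \left(\frac{R_E}{a}\right)^{2} \cos i
```

where n is the mean motion, a the semi-major axis, R_E the Earth's equatorial radius, and i the inclination. This is the classical J2 rate only, not the paper's combined J2 plus luni-solar expression.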
Demidenko, Eugene
2017-09-01
The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike other approximations previously suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.
NASA Astrophysics Data System (ADS)
Safari, A.; Sohrabi, H.
2016-06-01
The role of forests as a reservoir for carbon has prompted the need for timely and reliable estimation of aboveground carbon stocks. Since measurement of the aboveground carbon stocks of forests is a destructive, costly and time-consuming activity, aerial and satellite remote sensing techniques have attracted much attention in this field. Although using aerial data for predicting aboveground carbon stocks has proved to be a highly accurate method, there are challenges related to high acquisition costs, small area coverage, and limited availability of these data. These challenges are more critical for non-commercial forests located in low-income countries. The Landsat program provides repetitive acquisition of high-resolution multispectral data, which are freely available. The aim of this study was to assess the potential of multispectral Landsat 8 Operational Land Imager (OLI) derived texture metrics in quantifying the aboveground carbon stocks of coppice oak forests in the Zagros Mountains, Iran. We used four different window sizes (3×3, 5×5, 7×7, and 9×9) and four different offsets ([0,1], [1,1], [1,0], and [1,-1]) to derive nine texture metrics (angular second moment, contrast, correlation, dissimilarity, entropy, homogeneity, inverse difference, mean, and variance) from four bands (blue, green, red, and infrared). In total, 124 sample plots in two different forests were measured and carbon was calculated using species-specific allometric models. Stepwise regression analysis was applied to estimate biomass from the derived metrics. Results showed that, in general, larger window sizes for deriving texture metrics resulted in models with better fit. In addition, the correlation of the spectral bands used for deriving texture metrics in the regression models was ranked as b4 > b3 > b2 > b5. The best offset was [1,-1].  Among the different metrics, mean and entropy entered most of the regression models.
Overall, the different models based on the derived texture metrics were able to explain about half of the variation in aboveground carbon stocks. These results demonstrate that Landsat 8-derived texture metrics can be applied for mapping the aboveground carbon stocks of coppice oak forests over large areas.
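The texture metrics in question come from a gray-level co-occurrence matrix (GLCM) built for a given pixel offset, such as the study's best offset [1,-1]. The sketch below hand-rolls a small GLCM (rather than using an image-processing library) and computes two of the nine metrics on synthetic images; a spatially correlated image yields lower contrast and entropy than uncorrelated noise.

```python
import numpy as np

def glcm_metrics(img, dx, dy, levels=8):
    """GLCM for one pixel offset (dx, dy), plus two of the texture metrics
    used in the study: contrast and entropy."""
    lo, hi = img.min(), img.max()
    q = ((img - lo) / (hi - lo + 1e-12) * (levels - 1)).round().astype(int)
    P = np.zeros((levels, levels))
    h, w = q.shape
    for i in range(max(0, -dy), min(h, h - dy)):
        for j in range(max(0, -dx), min(w, w - dx)):
            P[q[i, j], q[i + dy, j + dx]] += 1
    P /= P.sum()
    idx = np.arange(levels)
    contrast = float(np.sum((idx[:, None] - idx[None, :]) ** 2 * P))
    entropy = float(-np.sum(P[P > 0] * np.log(P[P > 0])))
    return contrast, entropy

rng = np.random.default_rng(8)
smooth = np.cumsum(np.cumsum(rng.normal(size=(64, 64)), axis=0), axis=1)
noisy = rng.uniform(0, 1, (64, 64))

c_smooth, e_smooth = glcm_metrics(smooth, dx=1, dy=-1)   # offset [1,-1]
c_noisy, e_noisy = glcm_metrics(noisy, dx=1, dy=-1)
```

In the study these metrics, computed per band and window, become the candidate predictors fed to the stepwise regression.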
Estimation of stature from the foot and its segments in a sub-adult female population of North India
2011-01-01
Background: Establishing personal identity is one of the main concerns in forensic investigations. Estimation of stature forms a basic domain of the investigation process in unknown and co-mingled human remains in forensic anthropology case work. The objective of the present study was to set up standards for estimation of stature from the foot and its segments in a sub-adult female population. Methods: The sample for the study constituted 149 young females from the Northern part of India. The participants were aged between 13 and 18 years. Besides stature, seven anthropometric measurements that included length of the foot from each toe (T1, T2, T3, T4, and T5 respectively), foot breadth at ball (BBAL) and foot breadth at heel (BHEL) were measured on both feet in each participant using standard methods and techniques. Results: The results indicated that statistically significant differences (p < 0.05) between left and right feet occur in both the foot breadth measurements (BBAL and BHEL). Foot length measurements (T1 to T5 lengths) did not show any statistically significant bilateral asymmetry. The correlation between stature and all the foot measurements was found to be positive and statistically significant (p-value < 0.001). Linear regression models and multiple regression models were derived for estimation of stature from the measurements of the foot. The present study indicates that anthropometric measurements of the foot and its segments are valuable in the estimation of stature. Foot length measurements estimate stature with greater accuracy when compared to foot breadth measurements. Conclusions: The present study concluded that foot measurements have a strong relationship with stature in the sub-adult female population of North India. Hence, the stature of an individual can be successfully estimated from the foot and its segments using the different regression models derived in the study. 
The regression models derived in the study may be applied successfully for the estimation of stature in sub-adult females, whenever foot remains are brought for forensic examination. Stepwise multiple regression models tend to estimate stature more accurately than linear regression models in female sub-adults. PMID:22104433
Krishan, Kewal; Kanchan, Tanuj; Passi, Neelam
2011-11-21
Ardoino, Ilaria; Lanzoni, Monica; Marano, Giuseppe; Boracchi, Patrizia; Sagrini, Elisabetta; Gianstefani, Alice; Piscaglia, Fabio; Biganzoli, Elia M
2017-04-01
The interpretation of regression model results can often benefit from the generation of nomograms, 'user-friendly' graphical devices that are especially useful for assisting decision-making processes. However, in the case of multinomial regression models, whenever categorical responses with more than two classes are involved, nomograms cannot be drawn in the conventional way. Such difficulty in managing and interpreting the outcome can limit the use of multinomial regression in decision-making support. In the present paper, we illustrate the derivation of a non-conventional nomogram for multinomial regression models, intended to overcome this issue. Although it may appear less straightforward at first sight, the proposed methodology allows an easy interpretation of the results of multinomial regression models and makes them more accessible to clinicians and general practitioners. Development of a prediction model based on multinomial logistic regression, and of the pertinent graphical tool, is illustrated by means of an example involving the prediction of the extent of liver fibrosis in hepatitis C patients from routinely available markers.
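The quantity a multinomial nomogram ultimately displays is the vector of class probabilities implied by the fitted linear predictors. A minimal sketch of that computation, with invented coefficients for three fibrosis classes and two hypothetical markers (none taken from the paper):

```python
import numpy as np

def multinomial_probs(x, coefs):
    """Class probabilities from a fitted multinomial logistic model.

    coefs holds one (intercept, slope-vector) pair per non-reference class;
    the reference class has linear predictor 0.  All numbers here are
    made up for illustration, not estimates from the paper.
    """
    x = np.asarray(x, dtype=float)
    linpred = np.array([0.0] + [b0 + b @ x for b0, b in coefs])  # reference first
    expd = np.exp(linpred - linpred.max())                        # stable softmax
    return expd / expd.sum()

# Three fibrosis classes predicted from two hypothetical markers.
coefs = [(-1.0, np.array([0.8, 0.3])), (-2.5, np.array([1.2, 0.9]))]
p = multinomial_probs([1.5, 0.5], coefs)
print(p.round(3))
```

The nomogram construction in the paper maps each covariate's contribution to these linear predictors onto graphical scales; the softmax step above is the common final link.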
NASA Technical Reports Server (NTRS)
Stolzer, Alan J.; Halford, Carl
2007-01-01
In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
Prediction of sweetness and amino acid content in soybean crops from hyperspectral imagery
NASA Astrophysics Data System (ADS)
Monteiro, Sildomar Takahashi; Minekawa, Yohei; Kosugi, Yukio; Akazawa, Tsuneya; Oda, Kunio
Hyperspectral image data provides a powerful tool for non-destructive crop analysis. This paper investigates a hyperspectral image data-processing method to predict the sweetness and amino acid content of soybean crops. Regression models based on artificial neural networks were developed in order to calculate the level of sucrose, glucose, fructose, and nitrogen concentrations, which can be related to the sweetness and amino acid content of vegetables. A performance analysis was conducted comparing regression models obtained using different preprocessing methods, namely, raw reflectance, second derivative, and principal components analysis. This method is demonstrated using high-resolution hyperspectral data of wavelengths ranging from the visible to the near infrared acquired from an experimental field of green vegetable soybeans. The best predictions were achieved using a nonlinear regression model of the second derivative transformed dataset. Glucose could be predicted with greater accuracy, followed by sucrose, fructose and nitrogen. The proposed method makes it possible to produce relatively accurate maps predicting the chemical content of soybean crop fields.
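The second-derivative preprocessing step mentioned above can be illustrated with a simple finite-difference version on a simulated spectrum (practical pipelines often use Savitzky-Golay smoothing derivatives instead; the spectrum below is synthetic, not the paper's data):

```python
import numpy as np

# Simulated reflectance spectrum (one pixel) over evenly spaced wavelengths.
wavelengths = np.linspace(400, 1000, 301)            # nm, visible to NIR
reflectance = 0.4 + 0.2 * np.exp(-((wavelengths - 680) / 40) ** 2)

# Second-derivative preprocessing, applied before regression; a plain
# finite-difference sketch of the transform named in the abstract.
step = wavelengths[1] - wavelengths[0]
second_deriv = np.gradient(np.gradient(reflectance, step), step)

print(second_deriv.shape)
```

The derivative spectrum, rather than raw reflectance, then serves as the input vector to the neural-network regression.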
NASA Astrophysics Data System (ADS)
Liu, Yande; Ying, Yibin; Lu, Huishan; Fu, Xiaping
2004-12-01
This work evaluates the feasibility of Fourier transform near infrared (FT-NIR) spectrometry for rapidly determining the total soluble solids content and acidity of apple fruit. Intact apple fruit were measured by reflectance FT-NIR in the 800-2500 nm range. FT-NIR models were developed based on partial least squares (PLS) regression and principal component regression (PCR) with respect to the reflectance and its first derivative, and the logarithm of the reflectance reciprocal and its second derivative. These regression models related the FT-NIR spectra to soluble solids content (SSC), titratable acidity (TA) and pH. The best combination, based on the prediction results, was PLS models with respect to the logarithm of the reflectance reciprocal. Predictions with PLS models resulted in standard errors of prediction (SEP) of 0.455, 0.044 and 0.068, and correlation coefficients of 0.968, 0.728 and 0.831 for SSC, TA and pH, respectively. It was concluded that by using the FT-NIR spectrometry measurement system, in the appropriate spectral range, it is possible to nondestructively assess the maturity factors of apple fruit.
Avalos, Marta; Adroher, Nuria Duran; Lagarde, Emmanuel; Thiessard, Frantz; Grandvalet, Yves; Contrand, Benjamin; Orriols, Ludivine
2012-09-01
Large data sets with many variables provide particular challenges when constructing analytic models. Lasso-related methods provide a useful tool, although one that remains unfamiliar to most epidemiologists. We illustrate the application of lasso methods in an analysis of the impact of prescribed drugs on the risk of a road traffic crash, using a large French nationwide database (PLoS Med 2010;7:e1000366). In the original case-control study, the authors analyzed each exposure separately. We use the lasso method, which can simultaneously perform estimation and variable selection in a single model. We compare point estimates and confidence intervals using (1) a separate logistic regression model for each drug with a Bonferroni correction and (2) lasso shrinkage logistic regression analysis. Shrinkage regression had little effect on (bias-corrected) point estimates, but led to less conservative results, noticeably for drugs with moderate levels of exposure. Carbamates, carboxamide derivative and fatty acid derivative antiepileptics, drugs used in opioid dependence, and mineral supplements of potassium showed stronger associations. Lasso is a relevant method in the analysis of databases with a large number of exposures and can be recommended as an alternative to conventional strategies.
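The lasso's simultaneous estimation and variable selection comes from its soft-thresholding update. A minimal coordinate-descent sketch for a *linear* lasso on synthetic data follows (the paper uses lasso logistic regression on a pharmaco-epidemiologic database, which wraps an IRLS loop around the same idea; the data here are simulated, not the French crash database):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso minimizing (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]          # partial residual
            beta[j] = soft_threshold(X[:, j] @ r, lam * n) / col_ss[j]
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0]) + 0.1 * rng.standard_normal(100)
beta = lasso_cd(X, y, lam=0.5)
print(beta.round(2))
```

Truly inactive exposures are shrunk exactly to zero, which is what lets a single lasso model replace the one-model-per-drug strategy of the original study.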
Strand, Matthew; Sillau, Stefan; Grunwald, Gary K; Rabinovitch, Nathan
2014-02-10
Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, methods for estimation in models that account for correlated data. In this work, we derive explicit and novel forms of regression calibration estimators and associated asymptotic variances for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations. The motivating application involves a longitudinal study of exposure to two pollutants (predictors) - outdoor fine particulate matter and cigarette smoke - and their association in interactive form with levels of a biomarker of inflammation, leukotriene E4 (LTE 4 , outcome) in asthmatic children. Because the exposure concentrations could not be directly observed, we used measurements from a fixed outdoor monitor and urinary cotinine concentrations as instrumental variables, and we used concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors as unbiased surrogate variables. We applied the derived regression calibration methods to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. We used simulations to verify accuracy of inferential methods based on asymptotic theory. Copyright © 2013 John Wiley & Sons, Ltd.
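The core two-stage idea of regression calibration can be shown on simulated cross-sectional data (a rough sketch only; the paper's estimator additionally handles interactions, random intercepts, and serial correlation in a linear mixed model, none of which appear below):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.standard_normal(n)                 # true exposure (unobserved)
z = x + rng.standard_normal(n) * 0.3       # instrumental variable
w = x + rng.standard_normal(n)             # unbiased surrogate, heavy error
y = 2.0 * x + rng.standard_normal(n) * 0.5 # outcome; true coefficient is 2

# Naive fit: regress y on the error-prone surrogate (attenuated toward 0).
naive = np.polyfit(w, y, 1)[0]

# Regression calibration: replace w by its fitted value given the instrument
# (stage 1), then regress y on that calibrated exposure (stage 2).
xhat = np.polyval(np.polyfit(z, w, 1), z)
calibrated = np.polyfit(xhat, y, 1)[0]

print(round(naive, 2), round(calibrated, 2))
```

Because E[w | z] = E[x | z] for an unbiased surrogate, stage 1 recovers the calibration function and stage 2 yields an approximately unbiased slope, illustrating why the naive estimate is attenuated while the calibrated one is not.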
Extrinsic local regression on manifold-valued data
Lin, Lizhen; St Thomas, Brian; Zhu, Hongtu; Dunson, David B.
2017-01-01
We propose an extrinsic regression framework for modeling data with manifold valued responses and Euclidean predictors. Regression with manifold responses has wide applications in shape analysis, neuroscience, medical imaging and many other areas. Our approach embeds the manifold where the responses lie onto a higher dimensional Euclidean space, obtains a local regression estimate in that space, and then projects this estimate back onto the image of the manifold. Outside the regression setting both intrinsic and extrinsic approaches have been proposed for modeling i.i.d manifold-valued data. However, to our knowledge our work is the first to take an extrinsic approach to the regression problem. The proposed extrinsic regression framework is general, computationally efficient and theoretically appealing. Asymptotic distributions and convergence rates of the extrinsic regression estimates are derived and a large class of examples are considered indicating the wide applicability of our approach. PMID:29225385
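The embed-regress-project recipe described above is easy to sketch for sphere-valued responses, where the projection back onto the manifold is simply normalization (a toy Nadaraya-Watson version under that assumption, not the paper's general estimator):

```python
import numpy as np

def extrinsic_local_regression(X, Y, x0, h=0.1):
    """Extrinsic kernel regression for unit-sphere responses embedded in R^3:
    take a Gaussian-weighted local mean of the embedded responses in the
    ambient Euclidean space, then project it back onto the sphere."""
    w = np.exp(-((X - x0) ** 2) / (2 * h ** 2))
    m = (w[:, None] * Y).sum(axis=0) / w.sum()   # local mean in ambient space
    return m / np.linalg.norm(m)                 # projection onto the sphere

# Toy data: responses drift along a great circle as the predictor grows.
X = np.linspace(0.0, 1.0, 50)
Y = np.column_stack([np.cos(X), np.sin(X), np.zeros_like(X)])
pred = extrinsic_local_regression(X, Y, x0=0.5)
print(pred.round(3))
```

The local mean generally falls inside the sphere, so the projection step is what guarantees a manifold-valued prediction.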
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and linear pseudo-model for the nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. The present research paper introduces an innovative method to compute the NLSE using principles of multivariate calculus, and is concerned with new optimization techniques used to compute the MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to obtain a linear pseudo-model for a nonlinear regression model. In this research article a new technique is developed to obtain the linear pseudo-model for the nonlinear regression model using multivariate calculus; the linear pseudo-model of Edmond Malinvaud [4] is explained in a very different way. David Pollard et al. used empirical process techniques to study the asymptotics of the least-squares estimator (LSE) for the fitting of a nonlinear regression function in 2006. Jae Myung [13] provided a conceptual guide to maximum likelihood estimation in his work "Tutorial on maximum likelihood estimation".
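In practice the NLSE is usually computed iteratively. A standard Gauss-Newton sketch for the model y = a·exp(b·x) is shown below (this is a common first-derivative scheme, not the paper's matrix-calculus derivation; the data and warm start are contrived so the iteration converges on noiseless data):

```python
import numpy as np

def gauss_newton(x, y, theta, n_iter=50):
    """Nonlinear least squares for y = a*exp(b*x) via Gauss-Newton:
    linearize the model with its Jacobian and solve a least-squares
    step for the parameter update at each iteration."""
    a, b = theta
    for _ in range(n_iter):
        f = a * np.exp(b * x)
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])  # Jacobian
        step, *_ = np.linalg.lstsq(J, y - f, rcond=None)
        a, b = a + step[0], b + step[1]
    return a, b

x = np.linspace(0, 2, 30)
y = 1.5 * np.exp(0.8 * x)             # noiseless data: exact recovery expected
a, b = gauss_newton(x, y, theta=(1.4, 0.7))
print(round(a, 3), round(b, 3))
```

Each Gauss-Newton step is itself a linear least-squares problem, which is precisely the "linear pseudo-model" viewpoint the abstract attributes to Malinvaud.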
Murakami, Takashi; Igarashi, Kentaro; Kawaguchi, Kei; Kiyuna, Tasuku; Zhang, Yong; Zhao, Ming; Hiroshima, Yukihiko; Nelson, Scott D; Dry, Sarah M; Li, Yunfeng; Yanagawa, Jane; Russell, Tara; Federman, Noah; Singh, Arun; Elliott, Irmina; Matsuyama, Ryusei; Chishima, Takashi; Tanaka, Kuniya; Endo, Itaru; Eilber, Fritz C; Hoffman, Robert M
2017-01-31
Osteosarcoma occurs mostly in children and young adults, who are treated with multiple agents in combination with limb-salvage surgery. However, the overall 5-year survival rate for patients with recurrent or metastatic osteosarcoma is 20-30%, which has not improved significantly over 30 years. Refractory patients would benefit from precise individualized therapy. We report here that a patient-derived osteosarcoma growing in a subcutaneous nude-mouse model was regressed by tumor-targeting Salmonella typhimurium A1-R (S. typhimurium A1-R; p < 0.001 compared to untreated control). The osteosarcoma was only partially sensitive to the molecular-targeting drug sorafenib, which did not arrest its growth. S. typhimurium A1-R was significantly more effective than sorafenib (p < 0.001). S. typhimurium grew in the treated tumors and caused extensive necrosis of the tumor tissue. These data show that S. typhimurium A1-R is a powerful therapy for an osteosarcoma patient-derived xenograft model. PMID:28030831
NASA Astrophysics Data System (ADS)
Rao, M.; Vuong, H.
2013-12-01
The overall objective of this study is to develop a method for estimating total aboveground biomass of redwood stands in Jackson Demonstration State Forest, Mendocino, California using airborne LiDAR data. LiDAR data, owing to their vertical and horizontal accuracy, are increasingly being used to characterize landscape features including ground surface elevation and canopy height. These LiDAR-derived metrics, capturing structural signatures at high precision and accuracy, can help better understand ecological processes at various spatial scales. Our study is focused on two major species of the forest: redwood (Sequoia sempervirens [D.Don] Engl.) and Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco). Specifically, the objectives included fitting linear regression models relating tree diameter at breast height (dbh) to LiDAR-derived height for each species. From 23 random points in the study area, field measurements (dbh and tree coordinates) were collected for more than 500 redwood and Douglas-fir trees over 0.2-ha plots. The USFS FUSION application software, along with its LiDAR Data Viewer (LDV), was used to extract the Canopy Height Model (CHM) from which tree heights were derived. Based on the LiDAR-derived height and ground-based dbh, a linear regression model was developed to predict dbh. The predicted dbh was used to estimate the biomass at the single-tree level using Jenkins' formula (Jenkins et al. 2003). The linear regression models were able to explain 65% of the variability associated with redwood dbh and 80% of that associated with Douglas-fir dbh.
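The two-step pipeline (LiDAR height → regressed dbh → allometric biomass) can be sketched as follows. The height/dbh pairs are hypothetical, and the allometric coefficients below are placeholders in the Jenkins et al. (2003) functional form bm = exp(β0 + β1·ln dbh), not the published species-group values:

```python
import numpy as np

# Hypothetical field data: LiDAR-derived height (m) and measured dbh (cm).
height = np.array([20.0, 25.0, 30.0, 35.0, 40.0, 45.0])
dbh    = np.array([28.0, 36.0, 45.0, 52.0, 61.0, 70.0])

# Step 1: linear regression predicting dbh from LiDAR-derived height.
b1, b0 = np.polyfit(height, dbh, 1)

# Step 2: allometric biomass in the Jenkins et al. (2003) form,
# bm = exp(beta0 + beta1 * ln(dbh)); beta0/beta1 here are placeholders,
# not the published species-group coefficients.
def biomass_kg(h, beta0=-2.5, beta1=2.4):
    dbh_pred = b0 + b1 * h
    return np.exp(beta0 + beta1 * np.log(dbh_pred))

print(round(float(biomass_kg(30.0)), 1))
```

Summing the single-tree estimates over a plot and scaling by plot area would give the stand-level aboveground biomass the study targets.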
Wang, Dongliang; Xin, Xiaoping; Shao, Quanqin; Brolly, Matthew; Zhu, Zhiliang; Chen, Jin
2017-01-01
Accurate canopy structure datasets, including canopy height and fractional cover, are required to monitor aboveground biomass as well as to provide validation data for satellite remote sensing products. In this study, the ability of unmanned aerial vehicle (UAV) discrete-return light detection and ranging (lidar) data to model both canopy height and fractional cover in the Hulunber grassland ecosystem was investigated. The extracted mean canopy height, maximum canopy height, and fractional cover were used to estimate the aboveground biomass. The influence of flight height on lidar estimates was also analyzed. The main findings are: (1) the lidar-derived mean canopy height is the most reasonable predictor of aboveground biomass (R2 = 0.340, root-mean-square error (RMSE) = 81.89 g·m−2, and relative error of 14.1%). Adding fractional cover to the regression yields little improvement in R2 and RMSE because mean canopy height and fractional cover are highly correlated; (2) flight height has a pronounced effect on the derived fractional cover and the level of detail of the lidar data, but an insignificant effect on the derived canopy height when the flight height is within range (<100 m). These findings are helpful for building stable regressions to estimate grassland biomass using lidar returns. PMID:28106819
Marabel, Miguel; Alvarez-Taboada, Flor
2013-01-01
Aboveground biomass (AGB) is one of the strategic biophysical variables of interest in vegetation studies. The main objective of this study was to evaluate the Support Vector Machine (SVM) and Partial Least Squares Regression (PLSR) for estimating the AGB of grasslands from field spectrometer data and to find out which data pre-processing approach was the most suitable. The most accurate model to predict the total AGB involved PLSR and the Maximum Band Depth index derived from the continuum-removed reflectance in the absorption features between 916–1,120 nm and 1,079–1,297 nm (R2 = 0.939, RMSE = 7.120 g/m2). Regarding the green fraction of the AGB, the Area Over the Minimum index derived from the continuum-removed spectra provided the most accurate model overall (R2 = 0.939, RMSE = 3.172 g/m2). Identifying the appropriate absorption features proved crucial to improving the performance of PLSR for estimating total and green aboveground biomass, by using indices derived from those spectral regions. Ordinary Least Squares Regression could be used as a surrogate for the PLSR approach with the Area Over the Minimum index as the independent variable, although the resulting model would not be as accurate. PMID:23925082
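The Maximum Band Depth index used above comes from continuum removal over an absorption feature. A minimal version with a straight-line continuum between the feature's shoulders is sketched below (the spectrum is synthetic; real work typically uses a convex-hull continuum over measured spectra):

```python
import numpy as np

def max_band_depth(wl, refl, lo, hi):
    """Maximum Band Depth within an absorption feature: fit a straight-line
    continuum between the feature's shoulders, divide the spectrum by it,
    and return 1 minus the minimum of the continuum-removed reflectance."""
    m = (wl >= lo) & (wl <= hi)
    w, r = wl[m], refl[m]
    continuum = np.interp(w, [w[0], w[-1]], [r[0], r[-1]])
    return 1.0 - (r / continuum).min()

wl = np.linspace(900, 1300, 401)
refl = 0.5 - 0.1 * np.exp(-((wl - 1000) / 30) ** 2)   # absorption near 1,000 nm
mbd = max_band_depth(wl, refl, 916, 1120)
print(round(float(mbd), 3))
```

Indices such as this one then serve as the predictors in the PLSR (or surrogate OLS) biomass models described in the abstract.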
Properties of added variable plots in Cox's regression model.
Lindkvist, M
2000-03-01
The added variable plot is useful for examining the effect of a covariate in regression models. The plot provides information regarding the inclusion of a covariate, and is useful in identifying influential observations on the parameter estimates. Hall et al. (1996) proposed a plot for Cox's proportional hazards model derived by regarding the Cox model as a generalized linear model. This paper proves and discusses properties of this plot. These properties make the plot a valuable tool in model evaluation. Quantities considered include parameter estimates, residuals, leverage, case influence measures and correspondence to previously proposed residuals and diagnostics.
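The construction underlying an added variable plot is easiest to see in ordinary linear regression, where the Frisch-Waugh result guarantees the plot's slope equals the covariate's multiple-regression coefficient (a sketch of that special case; the Cox-model version discussed above replaces OLS with the weighted least-squares representation of the fitted generalized linear model):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.standard_normal(n)

def residuals(target, covariates):
    """Residuals of an OLS fit of target on covariates (with intercept)."""
    X = np.column_stack([np.ones(len(target))] + covariates)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ beta

# Added variable plot for x2: residualize both y and x2 on the other terms;
# plotting ry against rx gives the added variable plot, and its slope equals
# x2's coefficient in the full multiple regression (Frisch-Waugh).
ry = residuals(y, [x1])
rx = residuals(x2, [x1])
slope = (rx @ ry) / (rx @ rx)
print(round(float(slope), 3))
```

Points far from the fitted line in such a plot flag observations that are influential for that covariate's coefficient, which is the diagnostic use examined in the paper.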
Background stratified Poisson regression analysis of cohort data.
Richardson, David B; Langholz, Bryan
2012-03-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
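The key trick above, profiling out the stratum-specific background rates in closed form, can be sketched on a toy cohort. The cell counts below are invented so that both strata imply a rate ratio of 2, and a grid search stands in for a proper optimizer:

```python
import numpy as np

# Toy cohort: cells with stratum id, dose, person-time, and case count.
stratum = np.array([0, 0, 1, 1])
dose    = np.array([0.0, 1.0, 0.0, 1.0])
pt      = np.array([100.0, 100.0, 100.0, 100.0])
cases   = np.array([10.0, 20.0, 30.0, 60.0])   # both strata consistent with RR = 2

def profile_loglik(beta):
    """Poisson log-likelihood for a log-linear dose effect, with each
    stratum's background rate (the 'nuisance' intercept) profiled out
    in closed form given beta."""
    rr = np.exp(beta * dose)
    ll = 0.0
    for s in np.unique(stratum):
        m = stratum == s
        lam = cases[m].sum() / (pt[m] * rr[m]).sum()   # profiled background rate
        mu = lam * pt[m] * rr[m]
        ll += (cases[m] * np.log(mu) - mu).sum()
    return ll

grid = np.linspace(0.0, 1.5, 301)
beta_hat = grid[np.argmax([profile_loglik(b) for b in grid])]
print(round(float(beta_hat), 3))   # maximum near log(2)
```

Because the stratum intercepts never need to be estimated explicitly, the approach scales to the very large numbers of background strata mentioned in the abstract.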
Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard
2016-10-01
In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer within the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x i are equal is strong and may fail to account for overdispersion, in which the variance of the rate parameter exceeds its mean. Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion, including a quasi-likelihood, robust standard error estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed significant inherent overdispersion (p-value < 0.001); however, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2, versus 21.3 for the non-flexible piecewise exponential models). We showed that there were no major differences between methods. However, flexible piecewise regression modelling, with either a quasi-likelihood or robust standard errors, was the best approach, as it deals with both overdispersion due to model misspecification and true (inherent) overdispersion.
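A quick informal check related to the score test above is the Pearson dispersion statistic: chi-square divided by residual degrees of freedom, which should be near 1 for a well-specified Poisson model. A sketch with made-up counts that are clearly overdispersed:

```python
import numpy as np

def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom; values
    well above 1 signal overdispersion in a fitted Poisson model."""
    chi2 = ((observed - fitted) ** 2 / fitted).sum()
    return chi2 / (len(observed) - n_params)

# Counts whose variance clearly exceeds their (fitted) mean of 10.
fitted = np.full(8, 10.0)
observed = np.array([2.0, 25.0, 1.0, 30.0, 4.0, 22.0, 3.0, 28.0])
phi = pearson_dispersion(observed, fitted, n_params=1)
print(round(float(phi), 2))
```

The quasi-likelihood correction discussed in the abstract amounts to inflating the model-based standard errors by the square root of this dispersion estimate.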
Robust inference under the beta regression model with application to health care studies.
Ghosh, Abhik
2017-01-01
Data on rates, percentages, or proportions arise frequently in many different applied disciplines like medical biology, health care, psychology, and several others. In this paper, we develop a robust inference procedure for the beta regression model, which is used to describe such response variables taking values in (0, 1) through some related explanatory variables. In relation to the beta regression model, the issue of robustness has been largely ignored in the literature so far. The existing maximum likelihood-based inference seriously lacks robustness against outliers and can generate drastically different (erroneous) inferences in the presence of data contamination. Here, we develop the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model, along with several applications. We derive their asymptotic properties and describe their robustness theoretically through influence function analyses. Finite sample performances of the proposed estimators and tests are examined through suitable simulation studies and real data applications in the context of health care and psychology. Although we primarily focus on the beta regression models with a fixed dispersion parameter, some indications are also provided for extension to the variable dispersion beta regression models with an application.
Prediction of dynamical systems by symbolic regression
NASA Astrophysics Data System (ADS)
Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.
2016-07-01
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
Agrawal, Vijay K; Sharma, Ruchi; Khadikar, Padmakar V
2002-09-01
QSAR studies on modelling of biological activity (hCAI) for a series of ureido and thioureido derivatives of aromatic/heterocyclic sulfonamides have been made using a pool of topological indices. Regression analysis of the data showed that excellent results were obtained in multiparametric correlations upon introduction of indicator parameters. The predictive abilities of the models are discussed using cross-validation parameters.
A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.
Bersabé, Rosa; Rivas, Teresa
2010-05-01
The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
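With a single total-score predictor, the cut-off between two adjacent categories has a simple closed form: it is the score at which their linear predictors, a_j + b_j·s, are equal. The coefficients below are invented for illustration, not the EAT-26 estimates from the study:

```python
def cutoff(a_j, b_j, a_k, b_k):
    """Score at which categories j and k are equally likely under a
    multinomial logistic model with linear predictors a + b*score:
    solving a_j + b_j*s = a_k + b_k*s gives s = (a_j - a_k)/(b_k - b_j)."""
    return (a_j - a_k) / (b_k - b_j)

# Linear predictors relative to the reference ('asymptomatic') category,
# which has intercept and slope fixed at 0.  Values are hypothetical.
a_sym, b_sym = -4.0, 0.25        # symptomatic
a_ed,  b_ed  = -9.0, 0.45        # eating disorder

cut1 = cutoff(0.0, 0.0, a_sym, b_sym)    # asymptomatic vs symptomatic
cut2 = cutoff(a_sym, b_sym, a_ed, b_ed)  # symptomatic vs eating disorder
print(cut1, cut2)
```

Scores below the first cut-off classify as asymptomatic, scores between the two as symptomatic, and scores above the second as eating disorder, provided the cut-offs are ordered.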
A Continuous Threshold Expectile Model.
Zhang, Feipeng; Li, Qunhua
2017-12-01
Expectile regression is a useful tool for exploring the relation between the response and the explanatory variables beyond the conditional mean. A continuous threshold expectile regression is developed for modeling data in which the effect of a covariate on the response variable is linear but varies below and above an unknown threshold in a continuous way. The estimators for the threshold and the regression coefficients are obtained using a grid search approach. The asymptotic properties for all the estimators are derived, and the estimator for the threshold is shown to achieve root-n consistency. A weighted CUSUM-type test statistic is proposed for the existence of a threshold at a given expectile, and its asymptotic properties are derived under both the null and the local alternative models. This test only requires fitting the model under the null hypothesis in the absence of a threshold, so it is computationally more efficient than likelihood-ratio type tests. Simulation studies show that the proposed estimators and test have desirable finite-sample performance in both homoscedastic and heteroscedastic cases. Application of the proposed method to a Dutch growth dataset and a baseball pitcher salary dataset reveals interesting insights. The proposed method is implemented in the R package cthreshER.
NASA Astrophysics Data System (ADS)
Alloui, Mebarka; Belaidi, Salah; Othmani, Hasna; Jaidane, Nejm-Eddine; Hochlaf, Majdi
2018-03-01
We performed benchmark studies on the molecular geometry, electronic properties and vibrational analysis of imidazole using semi-empirical, density functional theory and post-Hartree-Fock methods. These studies validated the use of AM1 for the treatment of larger systems. We then examined the structural, physical and chemical relationships for a series of imidazole derivatives acting as angiotensin II AT1 receptor blockers using AM1. QSAR studies were carried out for these imidazole derivatives using a combination of various physicochemical descriptors. A multiple linear regression procedure was used to model the relationships between the molecular descriptors and the activity of the imidazole derivatives. The results validate the derived QSAR model.
Miller, Matthew P.; Johnson, Henry M.; Susong, David D.; Wolock, David M.
2015-01-01
Understanding how watershed characteristics and climate influence the baseflow component of stream discharge is a topic of interest to both the scientific and water management communities. Therefore, the development of baseflow estimation methods is a topic of active research. Previous studies have demonstrated that graphical hydrograph separation (GHS) and conductivity mass balance (CMB) methods can be applied to stream discharge data to estimate daily baseflow. While CMB is generally considered to be a more objective approach than GHS, its application across broad spatial scales is limited by a lack of high frequency specific conductance (SC) data. We propose a new method that uses discrete SC data, which are widely available, to estimate baseflow at a daily time step using the CMB method. The proposed approach involves the development of regression models that relate discrete SC concentrations to stream discharge and time. Regression-derived CMB baseflow estimates were more similar to baseflow estimates obtained using a CMB approach with measured high frequency SC data than were the GHS baseflow estimates at twelve snowmelt dominated streams and rivers. There was a near perfect fit between the regression-derived and measured CMB baseflow estimates at sites where the regression models were able to accurately predict daily SC concentrations. We propose that the regression-derived approach could be applied to estimate baseflow at large numbers of sites, thereby enabling future investigations of watershed and climatic characteristics that influence the baseflow component of stream discharge across large spatial scales.
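The conductivity mass balance at the heart of the approach is a two-end-member mixing equation: baseflow = Q·(SC − SC_ro)/(SC_bf − SC_ro). A minimal sketch with invented discharge and SC values follows (in the paper's method, the daily SC series would come from the regression on discharge and time fitted to discrete samples, not from direct measurement):

```python
import numpy as np

def cmb_baseflow(q, sc, sc_bf, sc_ro):
    """Conductivity mass balance: baseflow = Q*(SC - SC_ro)/(SC_bf - SC_ro),
    clipped to [0, Q].  sc_bf and sc_ro are the end-member specific
    conductances of baseflow and surface runoff, respectively."""
    bf = q * (sc - sc_ro) / (sc_bf - sc_ro)
    return np.clip(bf, 0.0, q)

# Hypothetical daily discharge (m^3/s) and specific conductance (uS/cm).
q  = np.array([10.0, 50.0, 120.0, 60.0])
sc = np.array([350.0, 220.0, 120.0, 250.0])
bf = cmb_baseflow(q, sc, sc_bf=400.0, sc_ro=50.0)
print(bf.round(1))
```

High-SC days are attributed mostly to baseflow and low-SC (snowmelt or storm) days mostly to runoff, which is why daily SC estimates are sufficient to partition the hydrograph.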
Threshold regression to accommodate a censored covariate.
Qian, Jing; Chiou, Sy Han; Maye, Jacqueline E; Atem, Folefac; Johnson, Keith A; Betensky, Rebecca A
2018-06-22
In several common study designs, regression modeling is complicated by the presence of censored covariates. Examples of such covariates include maternal age of onset of dementia that may be right censored in an Alzheimer's amyloid imaging study of healthy subjects, metabolite measurements that are subject to limit of detection censoring in a case-control study of cardiovascular disease, and progressive biomarkers whose baseline values are of interest, but are measured post-baseline in longitudinal neuropsychological studies of Alzheimer's disease. We propose threshold regression approaches for linear regression models with a covariate that is subject to random censoring. Threshold regression methods allow for immediate testing of the significance of the effect of a censored covariate. In addition, they provide for unbiased estimation of the regression coefficient of the censored covariate. We derive the asymptotic properties of the resulting estimators under mild regularity conditions. Simulations demonstrate that the proposed estimators have good finite-sample performance, and often offer improved efficiency over existing methods. We also derive a principled method for selection of the threshold. We illustrate the approach in application to an Alzheimer's disease study that investigated brain amyloid levels in older individuals, as measured through positron emission tomography scans, as a function of maternal age of dementia onset, with adjustment for other covariates. We have developed an R package, censCov, for implementation of our method, available at CRAN. © 2018, The International Biometric Society.
Numerical simulations for tumor and cellular immune system interactions in lung cancer treatment
NASA Astrophysics Data System (ADS)
Kolev, M.; Nawrocki, S.; Zubik-Kowal, B.
2013-06-01
We investigate a new mathematical model that describes lung cancer regression in patients treated by chemotherapy and radiotherapy. The model is composed of nonlinear integro-differential equations derived from the so-called kinetic theory for active particles and a new sink function is investigated according to clinical data from carcinoma planoepitheliale. The model equations are solved numerically and the data are utilized in order to find their unknown parameters. The results of the numerical experiments show a good correlation between the predicted and clinical data and illustrate that the mathematical model has potential to describe lung cancer regression.
Ji, Lei; Peters, Albert J.
2004-01-01
The relationship between vegetation and climate in the grassland and cropland of the northern US Great Plains was investigated with Normalized Difference Vegetation Index (NDVI) (1989–1993) images derived from the Advanced Very High Resolution Radiometer (AVHRR), and climate data from automated weather stations. The relationship was quantified using a spatial regression technique that adjusts for the spatial autocorrelation inherent in these data; conventional regression techniques used frequently in previous studies are not adequate because they assume independent observations. NDVI derived from a 10-km weather station buffer was regressed on six climate variables during the growing season: precipitation, potential evapotranspiration, daily maximum and minimum air temperature, soil temperature, and solar irradiation. The regression model identified precipitation and potential evapotranspiration as the most significant climatic variables, indicating that the water balance is the most important factor controlling vegetation condition at an annual timescale. The model indicates that 46% and 24% of the variation in NDVI is accounted for by climate in grassland and cropland, respectively, indicating that grassland vegetation has a more pronounced response to climate variation than cropland. Other factors contributing to NDVI variation include environmental factors (soil, groundwater and terrain), human manipulation of crops, and sensor variation.
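The adjustment for autocorrelated errors can be illustrated with generalized least squares on a one-dimensional transect, a simplified stand-in for the spatial regression used in the study; the data, AR(1) error structure, and coefficients below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical stand-ins for two climate predictors and buffered NDVI
precip = rng.uniform(200, 600, n)
pet = rng.uniform(400, 900, n)
X = np.column_stack([np.ones(n), precip, pet])

# Spatially autocorrelated errors along a 1-D transect (AR(1), rho = 0.6)
rho = 0.6
e = np.zeros(n)
for i in range(1, n):
    e[i] = rho * e[i - 1] + rng.normal(0, 5)
y = 100 + 0.30 * precip - 0.10 * pet + e  # "NDVI" response

# OLS ignores the error correlation; GLS whitens with the AR(1) covariance
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_ols, beta_gls)  # both near (100, 0.30, -0.10)
```

Both estimators are unbiased here; the practical gain from GLS is that its standard errors (and hence significance tests) remain valid under autocorrelation.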
Applicability of Cameriere's and Drusini's age estimation methods to a sample of Turkish adults.
Hatice, Boyacioglu Dogru; Nihal, Avcu; Nursel, Akkaya; Humeyra Ozge, Yilanci; Goksuluk, Dincer
2017-10-01
The aim of this study was to investigate the applicability of Drusini's and Cameriere's methods to a sample of Turkish people. Panoramic images of 200 individuals were allocated into two groups, a study group and a test group, and examined by two observers. Tooth coronal indexes (TCI), the ratio between coronal pulp cavity height and crown height, were calculated for the mandibular first and second premolars and molars. Pulp/tooth area ratios (ARs) were calculated for the maxillary and mandibular canine teeth. Study group measurements were used to derive a regression model. Test group measurements were used to evaluate the accuracy of the regression model. Pearson's correlation coefficients and regression analysis were used. The correlations between TCIs and age were -0.230, -0.301, -0.344 and -0.257 for the mandibular first premolar, second premolar, first molar and second molar, respectively. Those for the maxillary canine (MX) and mandibular canine (MN) ARs were -0.716 and -0.514, respectively. The MX ARs were used to build the linear regression model, which explained 51.2% of the total variation with a standard error of 9.23 years. The mean error of the estimates in the test group was 8 years, and the ages of 64% of the individuals were estimated with an error of <±10 years, which is acceptable in forensic age prediction. The low correlation coefficients between age and TCI indicate that Drusini's method is not applicable to age estimation in a Turkish population. Using Cameriere's method, we derived a regression model.
Testing homogeneity in Weibull-regression models.
Bolfarine, Heleno; Valença, Dione M
2005-10-01
In survival studies with families or geographical units it may be of interest to test whether such groups are homogeneous for given explanatory variables. In this paper we consider score-type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model yields survival times that, conditioned on the random effect, have an accelerated failure time representation. The test statistic requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model, and in the uncensored situation a closed form is obtained for the test statistic. A simulation study is used to compare the power of the tests. The proposed tests are applied to real data sets with censored data.
NASA Astrophysics Data System (ADS)
Shao, G.; Gallion, J.; Fei, S.
2016-12-01
Sound forest aboveground biomass estimation is required to monitor diverse forest ecosystems and their impacts on the changing climate. Lidar-based regression models have provided promising biomass estimates in most forest ecosystems. However, considerable uncertainty in biomass estimates has been reported in temperate hardwood and hardwood-dominated mixed forests. Varied site productivity in temperate hardwood forests strongly diversifies height and diameter growth rates, which significantly reduces the correlation between tree height and diameter at breast height (DBH) in mature and complex forests. It is, therefore, difficult to use height-based lidar metrics to predict DBH-based field-measured biomass through a simple regression model regardless of the variation in site productivity. In this study, we established a multidimensional nonlinear regression model incorporating lidar metrics and site productivity classes derived from soil features. In the regression model, lidar metrics provided horizontal and vertical structural information, and productivity classes differentiated good and poor forest sites. The selection and combination of lidar metrics were discussed. Multiple regression models were employed and compared. Uncertainty analysis was applied to the best-fit model. The effects of site productivity on the lidar-based biomass model were addressed.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.
Kong, Shengchun; Nan, Bin
2014-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso
Kong, Shengchun; Nan, Bin
2013-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses. PMID:24516328
Christensen, A L; Lundbye-Christensen, S; Dethlefsen, C
2011-12-01
Several statistical methods of assessing seasonal variation are available. Brookhart and Rothman [3] proposed a second-order moment-based estimator based on the geometrical model derived by Edwards [1], and reported that this estimator is superior to Edwards' estimator in estimating the peak-to-trough ratio of seasonal variation with respect to bias and mean squared error. Alternatively, seasonal variation may be modelled using a Poisson regression model, which provides flexibility in modelling the pattern of seasonal variation and allows adjustment for covariates. Based on a Monte Carlo simulation study, three estimators, one based on the geometrical model and two based on log-linear Poisson regression models, were evaluated with regard to bias and standard deviation (SD). We evaluated the estimators on data simulated according to schemes varying in seasonal variation and the presence of a secular trend. All methods and analyses in this paper are available in the R package Peak2Trough [13]. Applying a Poisson regression model resulted in lower absolute bias and SD for data simulated according to the corresponding model assumptions. Poisson regression models also had lower bias and SD than the geometrical model for data simulated to deviate from their model assumptions. This simulation study encourages the use of Poisson regression models, as opposed to the geometrical model, in estimating the peak-to-trough ratio of seasonal variation. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
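One common log-linear parameterization of this kind of seasonal variation uses a single sine-cosine harmonic, for which the peak-to-trough ratio is exp(2 x amplitude). The sketch below fits such a Poisson regression by iteratively reweighted least squares on simulated monthly counts; it illustrates the general approach, not the specific estimators compared in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
months = np.tile(np.arange(12), 20)  # 20 years of monthly counts
s = np.sin(2 * np.pi * months / 12)
c = np.cos(2 * np.pi * months / 12)
X = np.column_stack([np.ones_like(s), s, c])
beta_true = np.array([3.0, 0.20, 0.15])
y = rng.poisson(np.exp(X @ beta_true))

# Poisson regression by iteratively reweighted least squares (Fisher scoring)
beta = np.array([np.log(y.mean()), 0.0, 0.0])
for _ in range(25):
    mu = np.exp(X @ beta)
    z = X @ beta + (y - mu) / mu  # working response
    XtW = X.T * mu                # working weights are mu for the log link
    beta = np.linalg.solve(XtW @ X, XtW @ z)

# Peak-to-trough ratio of the fitted sinusoid: exp(2 * amplitude)
ptr = np.exp(2 * np.hypot(beta[1], beta[2]))
print(ptr)
```

With the true amplitude of 0.25 used above, the true peak-to-trough ratio is exp(0.5), about 1.65, and the fitted value should land close to it.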
Hanley, James A
2008-01-01
Most survival analysis textbooks explain how the hazard ratio parameters in Cox's life table regression model are estimated. Fewer explain how the components of the nonparametric baseline survivor function are derived. Those that do often relegate the explanation to an "advanced" section and merely present the components as algebraic or iterative solutions to estimating equations. None comment on the structure of these estimators. This note brings out a heuristic representation that may help to demystify the structure.
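One of the nonparametric baseline survivor estimators in question, Breslow's, has a simple closed form that makes the structure visible. A minimal sketch on a toy dataset, with the Cox coefficient `beta` assumed already estimated (its value here is purely hypothetical):

```python
import math

# Toy right-censored data: (time, event indicator, covariate x)
data = [(2.0, 1, 0.5), (3.0, 0, -0.2), (5.0, 1, 1.0),
        (7.0, 1, -0.5), (9.0, 0, 0.1)]
beta = 0.7  # hypothetical fitted Cox coefficient

def breslow_baseline_survivor(data, beta, t):
    """Breslow estimator: S0(t) = exp(-H0(t)), where H0(t) sums, over
    event times <= t, the number of events divided by the sum of
    exp(beta * x) over subjects still at risk."""
    H = 0.0
    for ti, di, _ in sorted(data):
        if di == 1 and ti <= t:
            risk = sum(math.exp(beta * xj) for tj, _, xj in data if tj >= ti)
            H += 1.0 / risk
    return math.exp(-H)

print([round(breslow_baseline_survivor(data, beta, t), 3) for t in (1, 4, 10)])
```

When beta = 0 the risk sums reduce to the numbers at risk, and the estimator collapses to (an exponentiated form of) the familiar life-table structure.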
Confounder summary scores when comparing the effects of multiple drug exposures.
Cadarette, Suzanne M; Gagne, Joshua J; Solomon, Daniel H; Katz, Jeffrey N; Stürmer, Til
2010-01-01
Little information is available comparing methods to adjust for confounding when considering multiple drug exposures. We compared three analytic strategies to control for confounding based on measured variables: conventional multivariable, exposure propensity score (EPS), and disease risk score (DRS). Each method was applied to a dataset (2000-2006) recently used to examine the comparative effectiveness of four drugs. The relative effectiveness of risedronate, nasal calcitonin, and raloxifene in preventing non-vertebral fracture was in each case compared to alendronate. EPSs were derived both by using multinomial logistic regression (single model EPS) and by three separate logistic regression models (separate model EPS). DRSs were derived and event rates compared using Cox proportional hazards models. DRSs derived among the entire cohort (full cohort DRS) were compared to DRSs derived only among the referent alendronate users (unexposed cohort DRS). Less than 8% deviation from the base estimate (conventional multivariable) was observed applying the single model EPS, separate model EPS, or full cohort DRS. Applying the unexposed cohort DRS when the background risk for fracture differed between comparison drug exposure cohorts resulted in -7% to +13% deviation from our base estimate. With sufficient numbers of exposed subjects and outcomes, conventional multivariable, EPS, or full cohort DRS methods may each be used to adjust for confounding when comparing the effects of multiple drug exposures. However, our data also suggest that the unexposed cohort DRS may be problematic when background risks differ between referent and exposed groups. Further empirical and simulation studies will help to clarify the generalizability of our findings.
Ensemble habitat mapping of invasive plant species
Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.
2010-01-01
Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California; and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. © 2010 Society for Risk Analysis.
Li, Ji; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions of variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for the multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of the methods used to estimate VPCs (by comparing VPCs obtained using Laplace and penalized quasi-likelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPCs in scientific data analysis.
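Two of Goldstein's VPC definitions are easy to state for the two-level logistic case. The sketch below implements the latent-variable formula (level-1 residual fixed at the logistic variance pi^2/3) and the simulation method on the probability scale, for an assumed random-intercept variance; it illustrates the definitions, not the paper's multi-level extension:

```python
import math
import random

def vpc_latent(sigma_u2):
    """Latent-variable VPC for a two-level logistic model: level-2
    variance over total variance, with the level-1 residual fixed
    at the standard logistic variance pi^2 / 3."""
    return sigma_u2 / (sigma_u2 + math.pi ** 2 / 3)

def vpc_simulation(beta0, sigma_u2, n=100_000, seed=0):
    """Simulation definition on the probability scale: between-cluster
    variance of p over (between-cluster + Bernoulli within-cluster)."""
    rng = random.Random(seed)
    ps = [1 / (1 + math.exp(-(beta0 + rng.gauss(0, math.sqrt(sigma_u2)))))
          for _ in range(n)]
    m = sum(ps) / n
    between = sum((p - m) ** 2 for p in ps) / n
    within = sum(p * (1 - p) for p in ps) / n
    return between / (between + within)

print(round(vpc_latent(1.0), 3))  # 0.233
print(round(vpc_simulation(0.0, 1.0), 3))
```

The two definitions generally disagree, since one works on the latent scale and the other on the probability scale; that disagreement is part of what the empirical comparisons in the paper explore.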
Yu, S; Gao, S; Gan, Y; Zhang, Y; Ruan, X; Wang, Y; Yang, L; Shi, J
2016-04-01
Quantitative structure-property relationship modelling can be a valuable alternative method to replace or reduce experimental testing. In particular, some endpoints such as octanol-water (KOW) and organic carbon-water (KOC) partition coefficients of polychlorinated biphenyls (PCBs) are easier to predict and various models have been already developed. In this paper, two different methods, which are multiple linear regression based on the descriptors generated using Dragon software and hologram quantitative structure-activity relationships, were employed to predict suspended particulate matter (SPM) derived log KOC and generator column, shake flask and slow stirring method derived log KOW values of 209 PCBs. The predictive ability of the derived models was validated using a test set. The performances of all these models were compared with EPI Suite™ software. The results indicated that the proposed models were robust and satisfactory, and could provide feasible and promising tools for the rapid assessment of the SPM derived log KOC and generator column, shake flask and slow stirring method derived log KOW values of PCBs.
ERIC Educational Resources Information Center
Maggin, Daniel M.; Swaminathan, Hariharan; Rogers, Helen J.; O'Keeffe, Breda V.; Sugai, George; Horner, Robert H.
2011-01-01
A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of…
ERIC Educational Resources Information Center
Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.
2007-01-01
Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…
Yadav, Dharmendra Kumar; Kalani, Komal; Khan, Feroz; Srivastava, Santosh Kumar
2013-12-01
For the prediction of the anticancer activity of glycyrrhetinic acid (GA-1) analogs against the human lung cancer cell line (A-549), a QSAR model was developed by forward stepwise multiple linear regression. The regression coefficient (r(2)) and prediction accuracy (rCV(2)) of the QSAR model were 0.94 and 0.82, respectively. The QSAR study indicates that the dipole moment, size of the smallest ring, amine counts, and hydroxyl and nitro functional groups correlate well with cytotoxic activity. The docking studies showed high binding affinity of the predicted active compounds against the lung cancer target EGFR. These active glycyrrhetinic acid derivatives were then semi-synthesized, characterized and tested in vitro for anticancer activity. The experimental results were in agreement with the predicted values, and the ethyl oxalyl derivative of GA-1 (GA-3) showed cytotoxic activity equal to that of the standard anticancer drug paclitaxel.
Adjusted variable plots for Cox's proportional hazards regression model.
Hall, C B; Zeger, S L; Bandeen-Roche, K J
1996-01-01
Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each plot, the regression coefficient and standard error from a Cox proportional hazards regression are obtained by a simple linear regression through the origin fitted to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.
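The through-origin fit has an exact counterpart in ordinary linear regression (the Frisch-Waugh-Lovell result), where the slope of an added-variable plot reproduces the multiple-regression coefficient. A small synthetic demonstration of that linear-model property:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)  # covariates deliberately correlated
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Residualize y and x2 on everything else (intercept and x1)
Z = np.column_stack([np.ones(n), x1])
ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
rx = x2 - Z @ np.linalg.lstsq(Z, x2, rcond=None)[0]

# Slope of a regression through the origin fitted to the plotted points
slope = (rx @ ry) / (rx @ rx)
beta = np.linalg.lstsq(np.column_stack([Z, x2]), y, rcond=None)[0]
print(slope, beta[2])  # identical: the plot slope is the coefficient of x2
```

The Cox extension in the paper generalizes exactly this construction to risk-set-based residuals for censored survival data.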
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Min, Seung Nam; Park, Se Jin; Kim, Dong Joon; Subramaniyam, Murali; Lee, Kyung-Sun
2018-01-01
Stroke is the second leading cause of death worldwide and remains an important health burden both for individuals and for national healthcare systems. Potentially modifiable risk factors for stroke include hypertension, cardiac disease, diabetes, dysregulation of glucose metabolism, atrial fibrillation, and lifestyle factors. We aimed to derive a model equation for a stroke pre-diagnosis algorithm based on these potentially modifiable risk factors. We used logistic regression for model derivation, together with data from the database of the Korea National Health Insurance Service (NHIS). We reviewed the NHIS records of 500,000 enrollees. For the regression analysis, data regarding 367 stroke patients were selected. The control group consisted of 500 patients followed up for 2 consecutive years with no history of stroke. We developed a logistic regression model based on information regarding several well-known modifiable risk factors. The developed model could correctly discriminate between normal subjects and stroke patients in 65% of cases. The model developed in the present study can be applied in the clinical setting to estimate the probability of stroke within a year and thus improve stroke prevention strategies in high-risk patients. The approach used to develop the stroke pre-diagnosis algorithm can be applied to develop similar models for the pre-diagnosis of other diseases. © 2018 S. Karger AG, Basel.
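A logistic pre-diagnosis score of this kind reduces to a sigmoid of a weighted sum of risk-factor indicators. The sketch below uses entirely hypothetical coefficients and factor names, since the abstract does not report the fitted NHIS model:

```python
import math

# Hypothetical coefficients for illustration only; the actual
# NHIS-derived model coefficients are not given in the abstract.
coef = {"intercept": -4.0, "hypertension": 0.9, "diabetes": 0.7,
        "atrial_fibrillation": 1.2, "smoking": 0.5}

def stroke_probability(risk_factors):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of active-factor betas)))."""
    eta = coef["intercept"] + sum(coef[f] for f in risk_factors)
    return 1 / (1 + math.exp(-eta))

low = stroke_probability([])
high = stroke_probability(["hypertension", "diabetes", "atrial_fibrillation"])
print(round(low, 4), round(high, 4))  # risk rises with each active factor
```

In a deployed model the continuous factors (e.g., blood pressure, glucose) would enter as measured values times their coefficients rather than as binary indicators.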
Survival curves of Listeria monocytogenes in chorizos modeled with artificial neural networks.
Hajmeer, M; Basheer, I; Cliver, D O
2006-09-01
Using artificial neural networks (ANNs), a highly accurate model was developed to simulate survival curves of Listeria monocytogenes in chorizos as affected by the initial water activity (a(w0)) of the sausage formulation, temperature (T), and air inflow velocity (F) where the sausages are stored. The ANN-based survival model (R(2)=0.970) outperformed the regression-based cubic model (R(2)=0.851), and as such was used to derive other models (using regression) that allow prediction of the times needed to drop the count by 1, 2, 3, and 4 logs (i.e., nD-values, n=1, 2, 3, 4). The nD-value regression models almost perfectly predicted the various times derived from a number of simulated survival curves exhibiting a wide variety of the operating conditions (R(2)=0.990-0.995). The nD-values were found to decrease with decreasing a(w0), and increasing T and F. The influence of a(w0) on nD-values seems to become more significant at some critical value of a(w0), below which the variation is negligible (0.93 for 1D-value, 0.90 for 2D-value, and <0.85 for 3D- and 4D-values). There is greater influence of storage T and F on 3D- and 4D-values than on 1D- and 2D-values.
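The nD-value idea, the time for the count to fall by n log cycles, can be read directly off a survival curve by interpolation. A minimal sketch with a hypothetical smoothed curve standing in for the ANN-simulated ones:

```python
import numpy as np

# Hypothetical smoothed survival curve: log10 counts versus storage days
days = np.array([0, 5, 10, 15, 20, 25, 30], dtype=float)
logN = np.array([6.0, 5.4, 4.6, 3.7, 2.9, 2.2, 1.6])

def nD_value(days, logN, n):
    """Time at which the count has dropped by n log10 cycles from its
    initial value, found by linear interpolation on the curve."""
    drop = logN[0] - logN  # log reduction achieved at each time point
    return float(np.interp(n, drop, days))  # drop is increasing in time

print([round(nD_value(days, logN, n), 2) for n in (1, 2, 3, 4)])
```

Repeating this over curves simulated across the (a(w0), T, F) grid yields the nD-value response surfaces that the paper then fits with regression.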
Whole-genome regression and prediction methods applied to plant and animal breeding.
de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L
2013-02-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding
de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.
2013-01-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
Goltz, Annemarie; Janowitz, Deborah; Hannemann, Anke; Nauck, Matthias; Hoffmann, Johanna; Seyfart, Tom; Völzke, Henry; Terock, Jan; Grabe, Hans Jörgen
2018-06-19
Depression and obesity are widespread and closely linked. Brain-derived neurotrophic factor (BDNF) and vitamin D are both assumed to be associated with depression and obesity. Little is known about the interplay between vitamin D and BDNF. We explored the putative associations and interactions between serum BDNF and vitamin D levels with depressive symptoms and abdominal obesity in a large population-based cohort. Data were obtained from the population-based Study of Health in Pomerania (SHIP)-Trend (n = 3,926). The associations of serum BDNF and vitamin D levels with depressive symptoms (measured using the Patient Health Questionnaire) were assessed with binary and multinomial logistic regression models. The associations of serum BDNF and vitamin D levels with obesity (measured by the waist-to-hip ratio [WHR]) were assessed with binary logistic and linear regression models with restricted cubic splines. Logistic regression models revealed inverse associations of vitamin D with depression (OR = 0.966; 95% CI 0.951-0.981) and obesity (OR = 0.976; 95% CI 0.967-0.985). No linear association of serum BDNF with depression or obesity was found. However, linear regression models revealed a U-shaped association of BDNF with WHR (p < 0.001). Vitamin D was inversely associated with depression and obesity. BDNF was associated with abdominal obesity, but not with depression. At the population level, our results support the relevant roles of vitamin D and BDNF in mental and physical health-related outcomes. © 2018 S. Karger AG, Basel.
A mass transfer model of ethanol emission from thin layers of corn silage
USDA-ARS?s Scientific Manuscript database
A mass transfer model of ethanol emission from thin layers of corn silage was developed and validated. The model was developed based on data from wind tunnel experiments conducted at different temperatures and air velocities. Multiple regression analysis was used to derive an equation that related t...
Data-driven discovery of partial differential equations.
Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan
2017-04-01
We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.
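The core of the sparsity-promoting selection can be sketched with sequential thresholded least squares on synthetic advection data. The published method uses a much richer candidate library and ridge-regularized thresholding (STRidge), so this is only the basic idea:

```python
import numpy as np

# Synthetic data for the advection equation u_t + c u_x = 0, c = 2,
# sampled from the exact solution u(x, t) = sin(x - c t)
c = 2.0
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
t = np.linspace(0, 1, 64)
X, T = np.meshgrid(x, t, indexing="ij")
U = np.sin(X - c * T)

dx, dt = x[1] - x[0], t[1] - t[0]
u_t = np.gradient(U, dt, axis=1).ravel()
u_x = np.gradient(U, dx, axis=0).ravel()
u_xx = np.gradient(np.gradient(U, dx, axis=0), dx, axis=0).ravel()
u = U.ravel()

# Candidate library of possible right-hand-side terms
Theta = np.column_stack([u, u_x, u_xx, u * u_x])

# Sequential thresholded least squares: fit, zero small coefficients, refit
xi = np.linalg.lstsq(Theta, u_t, rcond=None)[0]
for _ in range(10):
    small = np.abs(xi) < 0.1
    xi[small] = 0.0
    big = ~small
    xi[big] = np.linalg.lstsq(Theta[:, big], u_t, rcond=None)[0]

print(xi)  # approximately [0, -2, 0, 0]: recovers u_t = -2 u_x
```

Thresholding trades a combinatorial model search for repeated least-squares fits, which is what makes the library approach tractable for large candidate sets.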
Using a GIS model to assess terrestrial salamander response to alternative forest management plans
Eric J. Gustafson; Nathan L. Murphy; Thomas R. Crow
2001-01-01
A GIS model predicting the spatial distribution of terrestrial salamander abundance based on topography and forest age was developed using parameters derived from the literature. The model was tested by sampling salamander abundance across the full range of site conditions used in the model. A regression of the predictions of our GIS model against these sample data...
Rapid Detection of Volatile Oil in Mentha haplocalyx by Near-Infrared Spectroscopy and Chemometrics.
Yan, Hui; Guo, Cheng; Shao, Yang; Ouyang, Zhen
2017-01-01
Near-infrared spectroscopy combined with partial least squares regression (PLSR) and support vector machine (SVM) modelling was applied for the rapid determination of volatile oil content in Mentha haplocalyx, whose quality is directly linked to clinical efficacy. The effects of data pre-processing methods on the accuracy of the PLSR calibration models were investigated. The performance of the final model was evaluated according to the correlation coefficient (R) and root mean square error of prediction (RMSEP). For the PLSR model, the best pre-processing combination was first-order derivative, standard normal variate transformation (SNV), and mean centering, which gave calibration and prediction correlation coefficients of 0.8805 and 0.8719, an RMSEC of 0.091, and an RMSEP of 0.097. Analysis of the loading weights and variable importance in projection (VIP) scores showed that the wavenumber variables linked to volatile oil lie between 5500 and 4000 cm-1. The SVM model adopted six latent variables (LVs), fewer than the seven LVs of the PLSR model, and outperformed it: the calibration and prediction correlation coefficients were 0.9232 and 0.9202, with an RMSEC of 0.084 and an RMSEP of 0.082, indicating that the predicted values were accurate and reliable. This work demonstrated that near-infrared spectroscopy with chemometrics can be used to rapidly determine the main volatile oil content in M. haplocalyx and thereby help control its quality.
Abbreviations used: 1st der: First-order derivative; 2nd der: Second-order derivative; LOO: Leave-one-out; LVs: Latent variables; MC: Mean centering; NIR: Near-infrared; NIRS: Near-infrared spectroscopy; PCR: Principal component regression; PLSR: Partial least squares regression; RBF: Radial basis function; RMSECV: Root mean square error of cross-validation; RMSEC: Root mean square error of calibration; RMSEP: Root mean square error of prediction; SNV: Standard normal variate transformation; SVM: Support vector machine; VIP: Variable importance in projection.
Efficient least angle regression for identification of linear-in-the-parameters models
Beach, Thomas H.; Rezgui, Yacine
2017-01-01
Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods in that it is neither too greedy nor too slow. It is closely related to L1-norm optimization, which achieves low prediction variance by sacrificing some model bias in order to enhance generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models, with the purpose of accelerating the model selection process. The entire algorithm works in a completely recursive manner: the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes, so direct matrix inversions are avoided. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency compared with the original approach, in which the well-known efficient Cholesky decomposition is used to solve least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140
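The step-by-step entry of terms that least angle regression performs can be seen with scikit-learn's `lars_path` on a synthetic linear-in-the-parameters model. This uses the standard algorithm, not the paper's recursive variant; the data and coefficients are invented:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
# Linear-in-the-parameters model: 10 candidate terms, only 3 active.
X = rng.normal(size=(100, 10))
true_coef = np.zeros(10)
true_coef[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = X @ true_coef + 0.05 * rng.normal(size=100)

# lars_path returns the full piecewise-linear coefficient path;
# variables enter the active set one at a time, mirroring subset selection.
alphas, active, coefs = lars_path(X, y, method="lar")
print(active[:3])  # order in which terms are selected
```

With this strong signal, the three truly active terms are selected first, in order of decreasing correlation with the residual.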
Trans-dimensional joint inversion of seabed scattering and reflection data.
Steininger, Gavin; Dettmer, Jan; Dosso, Stan E; Holland, Charles W
2013-03-01
This paper examines joint inversion of acoustic scattering and reflection data to resolve seabed interface roughness parameters (spectral strength, exponent, and cutoff) and geoacoustic profiles. Trans-dimensional (trans-D) Bayesian sampling is applied with both the number of sediment layers and the order (zeroth or first) of auto-regressive parameters in the error model treated as unknowns. A prior distribution that allows fluid sediment layers over an elastic basement in a trans-D inversion is derived and implemented. Three cases are considered: Scattering-only inversion, joint scattering and reflection inversion, and joint inversion with the trans-D auto-regressive error model. Including reflection data improves the resolution of scattering and geoacoustic parameters. The trans-D auto-regressive model further improves scattering resolution and correctly differentiates between strongly and weakly correlated residual errors.
NASA Astrophysics Data System (ADS)
Novitski, Linda Nicole
Accurate and cost-effective assessment of water quality is necessary for proper management and restoration of inland water bodies susceptible to algal bloom conditions. Landsat and MODIS satellite images were used to create chlorophyll and Secchi depth predictive models for algal assessment of the Great Lakes and other lakes of the United States. Boosted regression tree (BRT) models using satellite imagery are both easy to use and can have high predictive performance. BRT models inferred chlorophyll and Secchi depth more accurately than linear regression models for all study locations. Inferred chlorophyll of inner Saginaw Bay was subsequently used in ecological models to help understand the ecological drivers of algal blooms in this ecosystem. For small lakes (non-Great Lakes), the best national Landsat model for ln-transformed chlorophyll was the BRT model, with a cross-validation R2 of 0.44 and an RMSE of 0.76 ln-transformed μg/L. The best national Landsat model for Secchi depth was also a BRT model, with an adjusted R2 of 0.52 and an RMSE of 0.80 m. We assessed the applicability of the national chlorophyll model for ecological analysis by comparing the total phosphorus-chlorophyll relationship with chlorophyll determined from sampling or remote sensing: the relationship had an adjusted R2 of 0.58 and an RMSE of 1.02 ln-transformed μg/L with sampled chlorophyll, versus an adjusted R2 of 0.56 and an RMSE of 1.04 ln-transformed μg/L with chlorophyll determined by the boosted regression tree remote sensing model. For the Great Lakes models, the MODIS BRT model predicted chlorophyll most accurately of the three BRT models and compared well to other models in the literature. BRT models for Landsat ETM+ and TM predicted chlorophyll more accurately than the MSS model, and all Landsat models had favorable results when compared to the literature.
BRT chlorophyll predictive models are useful in helping to understand historical, long-term chlorophyll trends and to inform us of how climate change may alter ecosystems in the future. In inner Saginaw Bay, annual average and upper-quartile Landsat-derived chlorophyll decreased from 7.44 to 6.62 and from 8.38 to 7.38 μg/L between 1973-1982, and the annual upper quartile of 8-day phosphorus loads increased from 5.29 to 6.79 kg between 1973-2012. Simple linear and multiple regression models and Wilcoxon rank test results for MODIS and Landsat-derived chlorophyll indicate that distance from the Saginaw River mouth influences chlorophyll concentration in Saginaw Bay, with Landsat-derived surface water temperature and phosphorus loads contributing to a lesser extent. Mixed-effect models related MODIS and Landsat-derived chlorophyll to these drivers better than simple linear or multiple regressions, with random effects of pixel and sample date contributing substantially to predictive power (NSE = 0.35-0.70), though phosphorus loads, distance to the Saginaw River mouth, and water temperature were significant fixed effects in most models. Water quality changes in Saginaw Bay between 1972-2012 were influenced by phosphorus loading and distance to the Saginaw River's mouth. Landsat and MODIS imagery are complementary platforms because of the long history of Landsat operation and the finer spectral resolution and image frequency of MODIS. Remote sensing water quality assessment tools can be valuable for limnological study, ecological assessment, and water resource management.
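The BRT-versus-linear-regression comparison described above can be reproduced in miniature with scikit-learn's gradient boosting on a synthetic nonlinear response (invented data, not the lake imagery):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
# Hypothetical reflectance "bands" with a nonlinear link to the target
# (e.g. ln-transformed chlorophyll); purely illustrative.
X = rng.uniform(size=(300, 4))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=300)

brt = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
lin = LinearRegression()

# Cross-validated R2, as in the abstract's model comparison.
r2_brt = cross_val_score(brt, X, y, cv=5, scoring="r2").mean()
r2_lin = cross_val_score(lin, X, y, cv=5, scoring="r2").mean()
print(r2_brt, r2_lin)
```

Because boosted trees capture the sine term and the band interaction that a linear model misses, the BRT cross-validation R2 comes out clearly higher here.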
Incremental online learning in high dimensions.
Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan
2005-12-01
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. To stay computationally efficient and numerically robust, each local model performs its regression analysis with a small number of univariate regressions in selected directions in input space, in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based only on local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of (possibly redundant) inputs, as shown in various empirical evaluations with up to 90-dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental, spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
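The core building block of LWPR, a locally weighted linear model fitted around each query point, can be sketched as follows; the full algorithm's incremental updates and PLS-style projections are omitted, and the data are synthetic:

```python
import numpy as np

def lwlr_predict(xq, X, y, bandwidth=0.3):
    """Predict at a query point with a locally weighted linear fit:
    the local-model building block at the core of LWPR (the full
    algorithm adds incremental updates and PLS-style projections)."""
    w = np.exp(-((X - xq) ** 2) / (2.0 * bandwidth ** 2))  # Gaussian kernel
    A = np.column_stack([np.ones_like(X), X])
    Aw = A * w[:, None]                         # weighted design matrix
    beta = np.linalg.solve(A.T @ Aw, Aw.T @ y)  # weighted normal equations
    return beta[0] + beta[1] * xq

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=200)
y = np.sin(X) + 0.1 * rng.normal(size=200)
print(lwlr_predict(0.0, X, y), lwlr_predict(1.5, X, y))
```

Each prediction fits its own local line, so the ensemble of local models tracks the nonlinear sine function without any global parametric form.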
Estimating standard errors in feature network models.
Frank, Laurence E; Heiser, Willem J
2007-05-01
Feature network models are graphical structures that represent proximity data in a discrete space while using the same formalism that is the basis of least squares methods employed in multidimensional scaling. Existing methods to derive a network model from empirical data only give the best-fitting network and yield no standard errors for the parameter estimates. The additivity properties of networks make it possible to consider the model as a univariate (multiple) linear regression problem with positivity restrictions on the parameters. In the present study, both theoretical and empirical standard errors are obtained for the constrained regression parameters of a network model with known features. The performance of both types of standard error is evaluated using Monte Carlo techniques.
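The reduction of a network model to a regression with positivity restrictions can be sketched with non-negative least squares plus a bootstrap for empirical standard errors. This is a toy illustration under assumed data (random 0/1 feature indicators and invented weights), not the paper's theoretical standard errors:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)
# Toy stand-in for a feature network: 0/1 feature indicators, positive
# feature weights, and noisy observed dissimilarities (all invented).
X = rng.integers(0, 2, size=(60, 4)).astype(float)
true_w = np.array([1.0, 0.5, 2.0, 0.8])
d = X @ true_w + 0.05 * rng.normal(size=60)

w_hat, _ = nnls(X, d)  # least squares subject to w >= 0

# Empirical standard errors via a simple nonparametric bootstrap.
idx = rng.integers(0, 60, size=(200, 60))
boot = np.array([nnls(X[i], d[i])[0] for i in idx])
se = boot.std(axis=0)
print(w_hat, se)
```

The positivity constraint mirrors the additivity property of network distances: every feature contributes a non-negative amount to each dissimilarity.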
Li, Aihua; Dhakal, Shital; Glenn, Nancy F.; Spaete, Luke P.; Shinneman, Douglas; Pilliod, David S.; Arkle, Robert; McIlroy, Susan
2017-01-01
Our study objectives were to model the aboveground biomass in a xeric shrub-steppe landscape with airborne light detection and ranging (Lidar) and explore the uncertainty associated with the models we created. We incorporated vegetation vertical structure information obtained from Lidar with ground-measured biomass data, allowing us to scale shrub biomass from small field sites (1 m subplots and 1 ha plots) to a larger landscape. A series of airborne Lidar-derived vegetation metrics were trained and linked with the field-measured biomass in Random Forests (RF) regression models. A Stepwise Multiple Regression (SMR) model was also explored as a comparison. Our results demonstrated that the important predictors from Lidar-derived metrics had a strong correlation with field-measured biomass in the RF regression models with a pseudo R2 of 0.76 and RMSE of 125 g/m2 for shrub biomass and a pseudo R2 of 0.74 and RMSE of 141 g/m2 for total biomass, and a weak correlation with field-measured herbaceous biomass. The SMR results were similar but slightly better than RF, explaining 77–79% of the variance, with RMSE ranging from 120 to 129 g/m2 for shrub and total biomass, respectively. We further explored the computational efficiency and relative accuracies of using point cloud and raster Lidar metrics at different resolutions (1 m to 1 ha). Metrics derived from the Lidar point cloud processing led to improved biomass estimates at nearly all resolutions in comparison to raster-derived Lidar metrics. Only at 1 m were the results from the point cloud and raster products nearly equivalent. The best Lidar prediction models of biomass at the plot-level (1 ha) were achieved when Lidar metrics were derived from an average of fine resolution (1 m) metrics to minimize boundary effects and to smooth variability. 
Overall, both RF and SMR methods explained more than 74% of the variance in biomass, with the most important Lidar variables being associated with vegetation structure and statistical measures of this structure (e.g., standard deviation of height was a strong predictor of biomass). Using our model results, we developed spatially-explicit Lidar estimates of total and shrub biomass across our study site in the Great Basin, U.S.A., for monitoring and planning in this imperiled ecosystem.
Vector autoregressive models: A Gini approach
NASA Astrophysics Data System (ADS)
Mussard, Stéphane; Ndiaye, Oumar Hamady
2018-02-01
In this paper, it is proven that the usual VAR models may be performed in the Gini sense, that is, on a ℓ1 metric space. The Gini regression is robust to outliers; as a consequence, when data are contaminated by extreme values, we show that semi-parametric VAR-Gini regressions may be used to obtain robust estimators. Inference about the estimators is made with the ℓ1 norm. Impulse response functions and Gini decompositions of forecast errors are also introduced. Finally, Granger causality tests are properly derived based on U-statistics.
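A minimal sketch of a Gini-type regression, minimizing the Gini mean difference of the residuals, is shown below on contaminated data; the paper's semi-parametric VAR-Gini estimator is considerably more involved, and the data here are synthetic:

```python
import numpy as np
from scipy.optimize import minimize

def gini_mean_difference(e):
    # Mean absolute difference over all pairs of residuals: an l1-type
    # dispersion measure that bounds the influence of any single outlier.
    return np.abs(e[:, None] - e[None, :]).mean()

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(size=200)
y[:5] += 20.0  # contaminate the sample with extreme values

# The GMD is invariant to shifting all residuals by a constant, so it
# identifies the slope only; the intercept is recovered from the median.
slope = minimize(lambda b: gini_mean_difference(y - b[0] * x),
                 x0=[1.0], method="Nelder-Mead").x[0]
intercept = np.median(y - slope * x)
print(slope, intercept)
```

Despite the five extreme values, the slope stays near the true 1.5, illustrating the robustness the abstract claims for ℓ1-based estimation.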
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves a least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve-fitting procedure for determining initial nominal estimates of the unknown exponential model parameters is included as an integral part of the technique. A correction matrix is derived and applied to the nominal estimates to produce an improved set of model parameters. The solution cycle is repeated until some predetermined convergence criterion is satisfied.
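The two-stage procedure described, initial nominal estimates from a linearizing log fit followed by iterative Taylor-series correction (Gauss-Newton), can be sketched as follows; the model form y = a·exp(b·t) and all data are assumptions for illustration:

```python
import numpy as np

# Decay-type data y = a * exp(b * t) + noise (synthetic).
rng = np.random.default_rng(6)
t = np.linspace(0.0, 5.0, 50)
a_true, b_true = 3.0, -0.7
y = a_true * np.exp(b_true * t) + 0.02 * rng.normal(size=50)

# Step 1: initial nominal estimates from a linear fit to ln(y).
mask = y > 0
b_hat, ln_a = np.polyfit(t[mask], np.log(y[mask]), 1)
a_hat = np.exp(ln_a)

# Step 2: iterate the Taylor-series linearization. Each cycle solves
# J @ delta = residual in the least-squares sense (the "correction
# matrix" step) and updates the parameters until convergence.
for _ in range(20):
    f = a_hat * np.exp(b_hat * t)
    J = np.column_stack([np.exp(b_hat * t),            # df/da
                         a_hat * t * np.exp(b_hat * t)])  # df/db
    delta, *_ = np.linalg.lstsq(J, y - f, rcond=None)
    a_hat, b_hat = a_hat + delta[0], b_hat + delta[1]
    if np.abs(delta).max() < 1e-10:  # predetermined convergence criterion
        break
print(a_hat, b_hat)
```

The log-fit starting values put the iteration close enough to the solution that a handful of correction cycles suffice.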
Statistical modeling of landslide hazard using GIS
Peter V. Gorsevski; Randy B. Foltz; Paul E. Gessler; Terrance W. Cundy
2001-01-01
A model for spatial prediction of landslide hazard was applied to a watershed affected by landslide events that occurred during the winter of 1995-96, following heavy rains, and snowmelt. Digital elevation data with 22.86 m x 22.86 m resolution was used for deriving topographic attributes used for modeling. The model is based on the combination of logistic regression...
Rubio-Álvarez, Ana; Molina-Alarcón, Milagros; Arias-Arias, Ángel; Hernández-Martínez, Antonio
2018-03-01
Postpartum haemorrhage is one of the leading causes of maternal morbidity and mortality worldwide. Despite the use of uterotonic agents as a preventive measure, it remains a challenge to identify those women who are at increased risk of postpartum bleeding. The aim of this retrospective cohort study, conducted at the "Mancha-Centro Hospital" (Spain), was to develop and validate a predictive model to assess the risk of excessive bleeding in women with vaginal birth. The predictive model was based on a derivation cohort of 2336 women between 2009 and 2011; for validation purposes, a prospective cohort of 953 women between 2013 and 2014 was employed. Women with antenatal fetal demise, multiple pregnancies and gestations under 35 weeks were excluded. We used multivariate analysis with binary logistic regression, ridge regression and areas under the receiver operating characteristic curve to determine the predictive ability of the proposed model. There were 197 (8.43%) women with excessive bleeding in the derivation cohort and 63 (6.61%) in the validation cohort. Predictive factors in the final model were maternal age, primiparity, duration of the first and second stages of labour, neonatal birth weight and antepartum haemoglobin levels. The predictive ability of this model was 0.90 (95% CI: 0.85-0.93) in the derivation cohort and 0.83 (95% CI: 0.74-0.92) in the validation cohort. The model thus showed excellent predictive ability in the derivation cohort and good predictive ability in the validation cohort; it can be employed to identify women at higher risk of postpartum haemorrhage. Copyright © 2017 Elsevier Ltd. All rights reserved.
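A derivation/validation workflow of this kind can be sketched with scikit-learn on synthetic data; the predictors and effect sizes below are invented stand-ins, not the study's variables:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
# Six hypothetical predictors (standing in for maternal age, parity,
# labour durations, birth weight, haemoglobin); synthetic outcome.
n = 2000
X = rng.normal(size=(n, 6))
logit = -2.5 + X @ np.array([0.6, 0.8, 0.5, 0.4, 0.7, -0.5])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_der, y_der = X[:1400], y[:1400]   # derivation cohort
X_val, y_val = X[1400:], y[1400:]   # validation cohort

model = LogisticRegression().fit(X_der, y_der)
# Predictive ability as area under the ROC curve in each cohort.
auc_der = roc_auc_score(y_der, model.predict_proba(X_der)[:, 1])
auc_val = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(auc_der, auc_val)
```

As in the study, the validation-cohort AUC is typically somewhat lower than the derivation-cohort AUC, since the model was tuned to the derivation data.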
Tracking and Explaining Credit-Hour Completion
ERIC Educational Resources Information Center
Kwenda, Maxwell Ndigume
2014-01-01
This study highlights factors associated with changes in earned hours for two cohorts of incoming freshmen during their first year. The objectives of this study are twofold: (a) to derive model(s) regressing the cumulative hours earned and differential hours earned on student demographic, socioeconomic, and academic characteristics; and (b) to…
Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha
2012-05-01
Estimation of stature is an important parameter in the identification of human remains in forensic examinations. The present study aims to compare the reliability and accuracy of stature estimation, and to demonstrate the variability between estimated and actual stature, using the multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements (hand length, hand breadth, foot length and foot breadth), taken on the left side of each subject, were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. The derived multiplication factors and regression formulae were then applied to the hand and foot measurements in the study sample, and the estimated stature was compared with the actual stature to find the error in estimation. The results indicate that the range of error in stature estimation is smaller for the regression analysis method than for the multiplication factor method, confirming that regression analysis is the better of the two for stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
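The multiplication factor versus regression comparison can be illustrated on synthetic data (the stature/foot-length relation below is invented, not the study's measurements):

```python
import numpy as np

rng = np.random.default_rng(8)
# Invented stature (cm) and foot-length (cm) data with a linear relation.
stature = rng.normal(165.0, 8.0, size=300)
foot = stature / 6.6 + rng.normal(0.0, 0.6, size=300)

# Multiplication factor method: estimate = mean(stature/foot) * foot.
mf = np.mean(stature / foot)
rmse_mf = np.sqrt(np.mean((stature - mf * foot) ** 2))

# Regression method: least-squares line of stature on foot length.
slope, intercept = np.polyfit(foot, stature, 1)
rmse_reg = np.sqrt(np.mean((stature - (slope * foot + intercept)) ** 2))
print(rmse_mf, rmse_reg)
```

The regression line minimizes squared error over all lines, while the multiplication factor forces a line through the origin, so the regression's in-sample RMSE can never exceed the multiplication factor's, which is consistent with the study's conclusion.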
Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.
2008-01-01
Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR) and partial least squares coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of the PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of the logistic regression models ranges from 80.0% to 86.7%, while that of the PLS-logistic regression models ranges from 73.3% to 80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
Efficient Regressions via Optimally Combining Quantile Information*
Zhao, Zhibiao; Xiao, Zhijie
2014-01-01
We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics, and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows better handling of uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using publicly available benchmark data sets.
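A plain (non-Bayesian, non-max-margin) Poisson regression baseline for count data of this sort can be sketched with scikit-learn; the metrics and coefficients below are synthetic stand-ins:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(9)
# Hypothetical software metrics (e.g. size and complexity scores) and
# simulated defect counts drawn from a log-linear Poisson model.
X = rng.normal(size=(500, 3))
true_beta = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(0.3 + X @ true_beta))

model = PoissonRegressor(alpha=0.0).fit(X, y)  # unpenalized Poisson GLM
print(model.intercept_, model.coef_)
```

The log link keeps predicted defect counts positive, which is the basic property that motivates count-data regression over ordinary least squares in this setting.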
Table Rock Lake Water-Clarity Assessment Using Landsat Thematic Mapper Satellite Data
Krizanich, Gary; Finn, Michael P.
2009-01-01
Water quality of Table Rock Lake in southwestern Missouri is assessed using Landsat Thematic Mapper satellite data. A pilot study uses multidate satellite image scenes in conjunction with physical measurements of Secchi disk transparency collected by the Lakes of Missouri Volunteer Program to construct a regression model used to estimate water clarity. The natural log of Secchi disk transparency is the dependent variable in the regression, and the independent variables are Thematic Mapper band 1 (blue) reflectance and the ratio of band 1 to band 3 (red) reflectance. The regression model can be used to reliably predict water clarity anywhere within the lake. A pixel-level lake map of predicted water clarity or computed trophic state can be produced from the model output. Information derived from this model can be used by water-resource managers to assess water quality and evaluate effects of changes in the watershed on water quality.
Estimating the global incidence of traumatic spinal cord injury.
Fitzharris, M; Cripps, R A; Lee, B B
2014-02-01
Population modelling (forecasting). To estimate the global incidence of traumatic spinal cord injury (TSCI). An initiative of the International Spinal Cord Society (ISCoS) Prevention Committee. Regression techniques were used to derive regional and global estimates of TSCI incidence. Using the findings of 31 published studies, a regression model was fitted with the known number of TSCI cases as the dependent variable and the population at risk as the single independent variable. In the process of deriving TSCI incidence, an alternative TSCI model was specified in an attempt to arrive at an optimal way of estimating the global incidence of TSCI. The global incidence of TSCI was estimated to be 23 cases per 1,000,000 persons in 2007 (179,312 cases per annum). World Health Organization regional results are provided. Understanding the incidence of TSCI is important for health service planning and for the determination of injury prevention priorities. In the absence of high-quality epidemiological studies of TSCI in each country, estimates obtained through population modelling can be used to overcome known deficits in global spinal cord injury (SCI) data. The incidence of TSCI is context specific, and an alternative regression model demonstrated how TSCI incidence estimates could be improved with additional data. The results highlight the need for data standardisation and comprehensive reporting of national-level TSCI data. A step-wise approach from the collation of conventional epidemiological data through to population modelling is suggested.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Bader, Jon B.
2009-01-01
Calibration data of a wind tunnel sting balance were processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations for the deflection of a cantilever beam are used to show that the search algorithm's two recommended math models can also be obtained from a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of this sting balance calibration data set is a rare example of a situation in which regression models of balance calibration data can be derived directly from first principles of physics and engineering. It is also notable that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
A baseline-free procedure for transformation models under interval censorship.
Gu, Ming Gao; Sun, Liuquan; Zuo, Guoxin
2005-12-01
An important property of the Cox regression model is that the estimation of the regression parameters using the partial likelihood procedure does not depend on its baseline survival function. We call such a procedure baseline-free. Using marginal likelihood, we show that a baseline-free procedure can be derived for a class of general transformation models under the interval-censoring framework. The baseline-free procedure results in a simplified and stable computation algorithm for some complicated and important semiparametric models, such as frailty models and heteroscedastic hazard/rank regression models, for which the estimation procedures available so far involve estimation of the infinite-dimensional baseline function. A detailed computational algorithm using Markov chain Monte Carlo stochastic approximation is presented. The proposed procedure is demonstrated through extensive simulation studies, showing the validity of asymptotic consistency and normality. We also illustrate the procedure with a real data set from a study of breast cancer. A heuristic argument showing that the score function is a mean-zero martingale is provided.
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real-life examples are presented to justify the requirement of suitable robust statistical procedures in place of likelihood-based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Posa, Mihalj; Pilipović, Ana; Lalić, Mladena; Popović, Jovan
2011-02-15
A linear dependence between temperature (t) and the retention coefficient (k, reversed-phase HPLC) of bile acids is obtained. The parameters (a, intercept and b, slope) of the linear function k=f(t) correlate highly with the bile acids' structures. The investigated bile acids form linear congeneric groups on a principal component score plot (calculated from k=f(t)) that are in accordance with the conformations of the hydroxyl and oxo groups in the bile acid steroid skeleton. The partition coefficient (K(p)) of nitrazepam in bile acids' micelles is investigated. Nitrazepam molecules incorporated in micelles show modified bioavailability (depot effect, higher permeability, etc.). Using the multiple linear regression (MLR) method, QSAR models of nitrazepam's partition coefficient K(p) are derived at temperatures of 25°C and 37°C. At both temperatures, the experimentally obtained lipophilicity parameter (PC1 from the k=f(t) data) and in silico descriptors of molecular shape are included as predictors, while at the higher temperature molecular polarisation is introduced as well. This indicates that the mechanism of incorporation of nitrazepam into bile acid micelles changes at higher temperatures. QSAR models are also derived using the partial least squares (PLS) method, and the experimental parameters k=f(t) are shown to be significant predictive variables. Both QSAR models are validated using cross-validation and internal validation. The PLS models have slightly higher predictive capability than the MLR models. Copyright © 2010 Elsevier B.V. All rights reserved.
Soil salt content estimation in the Yellow River delta with satellite hyperspectral data
Weng, Yongling; Gong, Peng; Zhu, Zhi-Liang
2008-01-01
Soil salinization is one of the most common land degradation processes and is a severe environmental hazard. The primary objective of this study is to investigate the potential of predicting salt content in soils with hyperspectral data acquired with EO-1 Hyperion. Both partial least-squares regression (PLSR) and conventional multiple linear regression (MLR), such as stepwise regression (SWR), were tested as the prediction model. PLSR is commonly used to overcome the problem caused by high-dimensional and correlated predictors. Chemical analysis of 95 samples collected from the top layer of soils in the Yellow River delta area shows that salt content was high on average, and the dominant chemicals in the saline soil were NaCl and MgCl2. Multivariate models were established between soil contents and hyperspectral data. Our results indicate that the PLSR technique with laboratory spectral data has a strong prediction capacity. Spectral bands at 1487-1527, 1971-1991, 2032-2092, and 2163-2355 nm possessed large absolute values of regression coefficients, with the largest coefficient at 2203 nm. We obtained a root mean squared error (RMSE) for calibration (with 61 samples) of RMSEC = 0.753 (R2 = 0.893) and a root mean squared error for validation (with 30 samples) of RMSEV = 0.574. The prediction model was applied on a pixel-by-pixel basis to a Hyperion reflectance image to yield a quantitative surface distribution map of soil salt content. The result was validated successfully from 38 sampling points. We obtained an RMSE estimate of 1.037 (R2 = 0.784) for the soil salt content map derived by the PLSR model. The salinity map derived from the SWR model shows that the predicted value is higher than the true value. These results demonstrate that the PLSR method is a more suitable technique than stepwise regression for quantitative estimation of soil salt content in a large area. © 2008 CASI.
Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.
Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R
2018-06-01
Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the number of available diagnostic statistics, in comparison with their parametric counterparts, is small. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the Hilbert-Schmidt Independence Criterion (HSIC). If dependence exists, the model does not capture all the variability in the outcome associated with the covariates; otherwise, the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.
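The residual-dependence idea can be sketched with a biased empirical HSIC estimator using Gaussian kernels. The bandwidth, sample size, and simulated residuals below are illustrative choices, not the authors' implementation:

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    # Gaussian kernel matrix for a 1-D sample
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def hsic(x, y, sigma=1.0):
    # Biased empirical HSIC: (1/n^2) tr(K H L H), with H the centering matrix.
    # Values near zero indicate independence between x and y.
    n = len(x)
    K, L = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(1)
x = rng.normal(size=200)

# Residuals from a well-specified model: independent of x -> small HSIC
resid_good = rng.normal(size=200)
# Residuals from a misspecified model missing an x**2 term -> large HSIC
resid_bad = x**2 - 1 + 0.1 * rng.normal(size=200)

print(hsic(x, resid_good), hsic(x, resid_bad))
```

In the paper's framework a bootstrap over such statistics yields the p-value; here only the dependence measure itself is shown.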
Modeling of Engine Parameters for Condition-Based Maintenance of the MTU Series 2000 Diesel Engine
2016-09-01
An autoregressive distributed lag (ARDL) time series model of engine speed and exhaust gas temperature is derived to model the behavior of the engine. The lag length for the ARDL model is determined by whitening of the residuals.
Potta, Thrimoorthy; Zhen, Zhuo; Grandhi, Taraka Sai Pavan; Christensen, Matthew D.; Ramos, James; Breneman, Curt M.; Rege, Kaushal
2014-01-01
We describe the combinatorial synthesis and cheminformatics modeling of aminoglycoside antibiotics-derived polymers for transgene delivery and expression. Fifty-six polymers were synthesized by polymerizing aminoglycosides with diglycidyl ether cross-linkers. Parallel screening resulted in identification of several lead polymers that resulted in high transgene expression levels in cells. The role of polymer physicochemical properties in determining efficacy of transgene expression was investigated using Quantitative Structure-Activity Relationship (QSAR) cheminformatics models based on Support Vector Regression (SVR) and ‘building block’ polymer structures. The QSAR model exhibited high predictive ability, and investigation of descriptors in the model, using molecular visualization and correlation plots, indicated that physicochemical attributes related to both aminoglycosides and diglycidyl ethers facilitated transgene expression. This work synergistically combines combinatorial synthesis and parallel screening with cheminformatics-based QSAR models for discovery and physicochemical elucidation of effective antibiotics-derived polymers for transgene delivery in medicine and biotechnology. PMID:24331709
Modelling Nitrogen Oxides in Los Angeles Using a Hybrid Dispersion/Land Use Regression Model
NASA Astrophysics Data System (ADS)
Wilton, Darren C.
The goal of this dissertation is to develop models capable of predicting long-term annual average NOx concentrations in urban areas. Predictions from simple meteorological dispersion models and seasonal proxies for NO2 oxidation were included as covariates in a land use regression (LUR) model for NOx in Los Angeles, CA. The NOx measurements were obtained from a comprehensive measurement campaign that is part of the Multi-Ethnic Study of Atherosclerosis Air Pollution Study (MESA Air). Simple land use regression models were initially developed using a suite of GIS-derived land use variables computed for various buffer sizes (R²=0.15). Caline3, a simple steady-state Gaussian line source model, was then incorporated into the land use regression framework. The addition of this spatio-temporally varying Caline3 covariate improved the simple LUR model predictions. The extent of improvement was much more pronounced for models based solely on the summer measurements (simple LUR: R²=0.45; Caline3/LUR: R²=0.70) than it was for models based on all seasons (R²=0.20). We then used a Lagrangian dispersion model to convert static land use covariates for population density and commercial/industrial area into spatially and temporally varying covariates. The inclusion of these covariates resulted in significant improvement in model prediction (R²=0.57). In addition to the dispersion model covariates described above, a two-week average value of daily peak-hour ozone was included as a surrogate for the oxidation of NO2 during the different sampling periods. This additional covariate further improved overall model performance for all models. The best model by 10-fold cross validation (R²=0.73) contained the Caline3 prediction, a static covariate for the length of A3 roads within 50 meters, the Calpuff-adjusted covariates derived from both population density and industrial/commercial land area, and the ozone covariate.
This model was tested against annual average NOx concentrations from an independent data set from the EPA's Air Quality System (AQS) and MESA Air fixed site monitors, and performed very well (R²=0.82).
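The hybrid idea — feeding a dispersion-model output into a land use regression as one more covariate — can be sketched as follows. All covariates, coefficients, and distributions here are hypothetical stand-ins, not the MESA Air data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 150

# Hypothetical covariates at monitoring sites:
road_length_50m = rng.exponential(100, n)      # static GIS covariate
caline3 = rng.gamma(2.0, 5.0, n)               # dispersion-model NOx prediction
ozone = rng.normal(40, 8, n)                   # proxy for NO2 oxidation

# Synthetic NOx surface combining all three, plus unexplained variation
nox = 0.02 * road_length_50m + 1.5 * caline3 - 0.3 * ozone + rng.normal(0, 5, n)

# Hybrid model: the dispersion output enters the LUR alongside static covariates
X = np.column_stack([road_length_50m, caline3, ozone])
lur = LinearRegression().fit(X, nox)
print(f"R^2 = {lur.score(X, nox):.2f}")
```

In the dissertation the gain from the dispersion covariate is judged by cross-validated R², not the in-sample fit shown here.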
Shuguang Liua; Pamela Anderson; Guoyi Zhoud; Boone Kauffman; Flint Hughes; David Schimel; Vicente Watson; Joseph Tosi
2008-01-01
Objectively assessing the performance of a model and deriving model parameter values from observations are critical and challenging in landscape to regional modeling. In this paper, we applied a nonlinear inversion technique to calibrate the ecosystem model CENTURY against carbon (C) and nitrogen (N) stock measurements collected from 39 mature tropical forest sites in...
NASA Astrophysics Data System (ADS)
Wang, Liang-Jie; Sawada, Kazuhide; Moriguchi, Shuji
2013-01-01
To mitigate the damage caused by landslide disasters, different mathematical models have been applied to predict the spatial distribution characteristics of landslides. Although some researchers have achieved excellent results around the world, few studies take the spatial resolution of the database into account. Four digital elevation models (DEMs) with resolutions ranging from 2 to 20 m, derived from light detection and ranging (LiDAR) technology, are used to analyze landslide susceptibility in Mizunami City, Gifu Prefecture, Japan. Fifteen landslide-causative factors are considered using a logistic-regression approach to create models for landslide potential analysis. Pre-existing landslide bodies are used to evaluate the performance of the four models. The results revealed that the 20-m model had the highest classification accuracy (71.9%), whereas the 2-m model had the lowest (68.7%). In the 2-m model, 89.4% of the landslide bodies fell in the medium to very high categories. For the 20-m model, only 83.3% of the landslide bodies were concentrated in the medium to very high classes. When the cell size decreases from 20 to 2 m, the area under the relative operating characteristic increases from 0.68 to 0.77. Therefore, higher-resolution DEMs would provide better results for landslide-susceptibility mapping.
Lu, Lee-Jane W.; Nishino, Thomas K.; Khamapirad, Tuenchit; Grady, James J; Leonard, Morton H.; Brunder, Donald G.
2009-01-01
Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor-intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g., breast compression thickness, radiological thickness, radiation dose, and compression force) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R²=0.93) and %-density (R²=0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively and automatically, bypassing the HSM step, and could greatly facilitate breast cancer research studies. PMID:17671343
Mixed and Mixture Regression Models for Continuous Bounded Responses Using the Beta Distribution
ERIC Educational Resources Information Center
Verkuilen, Jay; Smithson, Michael
2012-01-01
Doubly bounded continuous data are common in the social and behavioral sciences. Examples include judged probabilities, confidence ratings, derived proportions such as percent time on task, and bounded scale scores. Dependent variables of this kind are often difficult to analyze using normal theory models because their distributions may be quite…
Kernel analysis of partial least squares (PLS) regression models.
Shinzawa, Hideyuki; Ritthiruangdej, Pitiporn; Ozaki, Yukihiro
2011-05-01
An analytical technique based on kernel matrix representation is demonstrated to provide further chemically meaningful insight into partial least squares (PLS) regression models. The kernel matrix condenses essential information about scores derived from PLS or principal component analysis (PCA). Thus, it becomes possible to establish the proper interpretation of the scores. A PLS model for the total nitrogen (TN) content in multiple Thai fish sauces is built with a set of near-infrared (NIR) transmittance spectra of the fish sauce samples. The kernel analysis of the scores effectively reveals that the variation of the spectral feature induced by the change in protein content is substantially associated with the total water content and the protein hydration. Kernel analysis is also carried out on a set of time-dependent infrared (IR) spectra representing transient evaporation of ethanol from a binary mixture solution of ethanol and oleic acid. A PLS model to predict the elapsed time is built with the IR spectra and the kernel matrix is derived from the scores. The detailed analysis of the kernel matrix provides penetrating insight into the interaction between the ethanol and the oleic acid.
Data-driven discovery of partial differential equations
Rudy, Samuel H.; Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan
2017-01-01
We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg–de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable. PMID:28508044
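One common sparsity-promoting scheme for this kind of library regression is sequentially thresholded least squares. The sketch below applies it to a synthetic candidate library standing in for derivative terms, rather than to actual PDE measurement data:

```python
import numpy as np

def stlsq(Theta, ut, lam=0.1, iters=10):
    # Sequentially thresholded least squares: repeatedly solve the
    # least-squares problem and zero out small coefficients, yielding a
    # sparse model that keeps only the dominant candidate terms.
    xi = np.linalg.lstsq(Theta, ut, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < lam
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], ut, rcond=None)[0]
    return xi

rng = np.random.default_rng(3)
n = 500
# Candidate library: columns playing the role of [u, u_x, u_xx, u*u_x, u**2]
Theta = rng.normal(size=(n, 5))
# "True" dynamics uses only u_xx and u*u_x (a Burgers-like structure)
xi_true = np.array([0.0, 0.0, 0.5, -1.0, 0.0])
ut = Theta @ xi_true + 0.01 * rng.normal(size=n)

xi = stlsq(Theta, ut, lam=0.1)
print(np.round(xi, 3))
```

In practice the library columns are built from numerically estimated spatial derivatives of the measured field, and the threshold is chosen by the Pareto analysis the abstract mentions.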
Chad Babcock; Andrew O. Finley; John B. Bradford; Randy Kolka; Richard Birdsey; Michael G. Ryan
2015-01-01
Many studies and production inventory systems have shown the utility of coupling covariates derived from Light Detection and Ranging (LiDAR) data with forest variables measured on georeferenced inventory plots through regression models. The objective of this study was to propose and assess the use of a Bayesian hierarchical modeling framework that accommodates both...
Brazil soybean yield covariance model
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
A model based on multiple regression was developed to estimate soybean yields for the seven soybean-growing states of Brazil. The meteorological data of these seven states were pooled and the years 1975 to 1980 were used to model since there was no technological trend in the yields during these years. Predictor variables were derived from monthly total precipitation and monthly average temperature.
Liu, Xiu-ying; Wang, Li; Chang, Qing-rui; Wang, Xiao-xing; Shang, Yan
2015-07-01
Wuqi County of Shaanxi Province, where vegetation recovery measures have been carried out for years, was taken as the study area. A total of 100 loess samples from 24 different profiles were collected. Total nitrogen (TN) and alkali-hydrolysable nitrogen (AHN) contents of the soil samples were analyzed, and the soil samples were scanned in the visible/near-infrared (VNIR) region of 350-2500 nm in the laboratory. Calibration models were developed between TN and AHN contents and VNIR values based on correlation analysis (CA) and partial least squares regression (PLS), and independent samples were used to validate the calibration models. The results indicated that the optimum model for predicting TN of loess was established using the first derivative of reflectance, while the best model for predicting AHN of loess was established using normal derivative spectra. The optimum TN model could effectively predict TN in loess from 0 to 40 cm, but the optimum AHN model could only roughly predict AHN at the same depth. This study provided a good method for rapidly predicting TN of loess where vegetation recovery measures have been adopted, but prediction of AHN needs further study.
Alexeeff, Stacey E.; Schwartz, Joel; Kloog, Itai; Chudnovsky, Alexandra; Koutrakis, Petros; Coull, Brent A.
2016-01-01
Many epidemiological studies use predicted air pollution exposures as surrogates for true air pollution levels. These predicted exposures contain exposure measurement error, yet simulation studies have typically found negligible bias in the resulting health effect estimates. However, previous studies typically assumed a statistical spatial model for air pollution exposure, which may be oversimplified. We address this shortcoming by assuming a realistic, complex exposure surface derived from fine-scale (1 km × 1 km) remote-sensing satellite data. Using simulation, we evaluate the accuracy of epidemiological health effect estimates in linear and logistic regression when using spatial air pollution predictions from kriging and land use regression models. We examined chronic (long-term) and acute (short-term) exposure to air pollution. Results varied substantially across the different scenarios. Exposure models with low out-of-sample R² yielded severe biases in the health effect estimates of some models, ranging from 60% upward bias to 70% downward bias. One land use regression exposure model with greater than 0.9 out-of-sample R² yielded upward biases of up to 13% for acute health effect estimates. Almost all models drastically underestimated the standard errors. Land use regression models performed better in the chronic effects simulations. These results can help researchers when interpreting health effect estimates in these types of studies. PMID:24896768
Chakraborty, Somsubhra; Weindorf, David C; Li, Bin; Ali Aldabaa, Abdalsamad Abdalsatar; Ghosh, Rakesh Kumar; Paul, Sathi; Nasim Ali, Md
2015-05-01
Using 108 petroleum-contaminated soil samples, this pilot study proposed a new analytical approach combining visible near-infrared diffuse reflectance spectroscopy (VisNIR DRS) and portable X-ray fluorescence spectrometry (PXRF) for rapid and improved quantification of soil petroleum contamination. Results indicated that an advanced fused model, in which VisNIR DRS spectra-based penalized spline regression (PSR) was used to predict total petroleum hydrocarbon and PXRF elemental data-based random forest regression was then used to model the PSR residuals, outperformed (R²=0.78, residual prediction deviation (RPD)=2.19) all other models tested, even producing better generalization than using VisNIR DRS alone (RPDs of 1.64, 1.86, and 1.96 for random forest, penalized spline regression, and partial least squares regression, respectively). Additionally, unsupervised principal component analysis using the PXRF+VisNIR DRS system qualitatively separated contaminated soils from control samples. Fusion of PXRF elemental data and VisNIR derivative spectra produced an optimized model for total petroleum hydrocarbon quantification in soils. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Kang, Pilsang; Koo, Changhoi; Roh, Hokyu
2017-11-01
Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
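The two existing approaches the abstract contrasts, classical and inverse calibration, can be illustrated on simulated calibration data. The true line and noise level below are arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Calibration data: reference values x and noisy instrument readings y
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.2, 50)

# Classical calibration: fit y = a + b*x, then invert to estimate x from a new y
b, a = np.polyfit(x, y, 1)            # polyfit returns [slope, intercept]
def classical(y_new):
    return (y_new - a) / b

# Inverse calibration: regress x directly on y
d, c = np.polyfit(y, x, 1)
def inverse(y_new):
    return c + d * y_new

y_new = 2.0 + 3.0 * 4.0               # noiseless reading for a true value of 4.0
print(classical(y_new), inverse(y_new))
```

Both estimators look similar on well-behaved data; the paper's point is that their statistical properties diverge once the measurement errors in the inputs are accounted for.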
Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
2016-01-01
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%, respectively. For the propensity score model, it was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of the logistic regression estimates fell below the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score methods, coverage fell below nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite the better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias and coverage still deviated from nominal.
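A minimal sketch of stabilized inverse probability weighting, one of the compared adjustment methods. The simulated confounders, treatment model, and effect size are illustrative only, and the weighted estimate targets a marginal effect, so it need not equal the conditional coefficient exactly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 2000
conf = rng.normal(size=(n, 3))                       # confounders
p_treat = 1 / (1 + np.exp(-(conf @ np.array([0.5, -0.3, 0.2]))))
treat = rng.binomial(1, p_treat)
# true conditional treatment effect on the log-odds scale = 0.7
logit = 0.7 * treat + conf @ np.array([0.4, 0.4, -0.2])
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Propensity score model, then stabilized inverse probability weights
ps = LogisticRegression().fit(conf, treat).predict_proba(conf)[:, 1]
w = np.where(treat == 1, treat.mean() / ps, (1 - treat.mean()) / (1 - ps))

# Weighted outcome regression on treatment alone recovers the treatment effect
m = LogisticRegression().fit(treat.reshape(-1, 1), outcome, sample_weight=w)
print(m.coef_[0, 0])
```

An unweighted regression on treatment alone would be confounded; the stabilized weights balance the confounder distribution across the two arms before the outcome model is fit.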
Campbell, J Elliott; Moen, Jeremie C; Ney, Richard A; Schnoor, Jerald L
2008-03-01
Estimates of forest soil organic carbon (SOC) have applications in carbon science, soil quality studies, carbon sequestration technologies, and carbon trading. Forest SOC has been modeled using a regression coefficient methodology that applies mean SOC densities (mass/area) to broad forest regions. A higher resolution model is based on an approach that employs a geographic information system (GIS) with soil databases and satellite-derived landcover images. Despite this advancement, the regression approach remains the basis of current state and federal level greenhouse gas inventories. Both approaches are analyzed in detail for Wisconsin forest soils from 1983 to 2001, applying rigorous error-fixing algorithms to soil databases. Resulting SOC stock estimates are 20% larger when determined using the GIS method rather than the regression approach. Average annual rates of increase in SOC stocks are 3.6 and 1.0 million metric tons of carbon per year for the GIS and regression approaches respectively.
Nonparametric instrumental regression with non-convex constraints
NASA Astrophysics Data System (ADS)
Grasmair, M.; Scherzer, O.; Vanhems, A.
2013-03-01
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Schistosomiasis Breeding Environment Situation Analysis in Dongting Lake Area
NASA Astrophysics Data System (ADS)
Li, Chuanrong; Jia, Yuanyuan; Ma, Lingling; Liu, Zhaoyan; Qian, Yonggang
2013-01-01
Monitoring the environmental characteristics, such as vegetation and soil moisture, of the spatial/temporal distribution of Oncomelania hupensis (O. hupensis) is of vital importance to schistosomiasis prevention and control. In this study, the relationship between environmental factors derived from remotely sensed data and the density of O. hupensis was first analyzed by a multiple linear regression model. Secondly, spatial analysis of the regression residual was investigated by the semi-variogram method. Thirdly, spatial analysis of the regression residual and the multiple linear regression model were both employed to estimate the spatial variation of O. hupensis density. Finally, the approach was used to monitor and predict the spatial and temporal variations of oncomelania in the Dongting Lake region, China. The areas of potential O. hupensis habitats were predicted, and the influence of the Three Gorges Dam (TGD) project on the density of O. hupensis was analyzed.
Modeling Stationary Lithium-Ion Batteries for Optimization and Predictive Control
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Kyri A; Shi, Ying; Christensen, Dane T
Accurately modeling stationary battery storage behavior is crucial to understand and predict its limitations in demand-side management scenarios. In this paper, a lithium-ion battery model was derived to estimate lifetime and state-of-charge for building-integrated use cases. The proposed battery model aims to balance speed and accuracy when modeling battery behavior for real-time predictive control and optimization. In order to achieve these goals, a mixed modeling approach was taken, which incorporates regression fits to experimental data and an equivalent circuit to model battery behavior. A comparison of the proposed battery model output to actual data from the manufacturer validates the modeling approach taken in the paper. Additionally, a dynamic test case demonstrates the effects of using regression models to represent internal resistance and capacity fading.
Modeling Stationary Lithium-Ion Batteries for Optimization and Predictive Control: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Raszmann, Emma; Baker, Kyri; Shi, Ying
Accurately modeling stationary battery storage behavior is crucial to understand and predict its limitations in demand-side management scenarios. In this paper, a lithium-ion battery model was derived to estimate lifetime and state-of-charge for building-integrated use cases. The proposed battery model aims to balance speed and accuracy when modeling battery behavior for real-time predictive control and optimization. In order to achieve these goals, a mixed modeling approach was taken, which incorporates regression fits to experimental data and an equivalent circuit to model battery behavior. A comparison of the proposed battery model output to actual data from the manufacturer validates the modeling approach taken in the paper. Additionally, a dynamic test case demonstrates the effects of using regression models to represent internal resistance and capacity fading.
Prasad, Kailash; Jadhav, Ashok
2016-01-01
Atherosclerosis is the primary cause of coronary artery disease, heart attack, stroke, and peripheral vascular disease. Alternative/complementary medicines, although not accepted by the medical community, may be of great help in the suppression, slowing of progression, and regression of atherosclerosis. Numerous natural products are in use for therapy in spite of a lack of evidence. This paper discusses the basic mechanism of atherosclerosis, risk factors for atherosclerosis, and the prevention, slowing of progression, and regression of atherosclerosis with flaxseed-derived secoisolariciresinol diglucoside (SDG). The SDG content of flaxseed varies from 6 mg/g to 18 mg/g; flaxseed is the richest source of SDG. SDG possesses antioxidant, antihypertensive, antidiabetic, hypolipidemic, anti-inflammatory, and antiatherogenic activities. The SDG content of some commonly used foods is described. SDG at a very low dose (15 mg/kg) suppressed the development of hypercholesterolemic atherosclerosis by 73%, and this effect was associated with reductions in serum total cholesterol, LDL-C, and oxidative stress, and an increase in the levels of HDL-C. A summary of the effects of flaxseed and its components on hypercholesterolemic atherosclerosis is provided. Reductions in hypercholesterolemic atherosclerosis by flaxseed, CDC-flaxseed, flaxseed oil, flax lignan complex, and SDG are 46%, 69%, 0%, 34%, and 73%, respectively, in the dietary cholesterol-induced rabbit model of atherosclerosis. SDG slows the progression of atherosclerosis in this animal model, and long-term use of SDG regresses hypercholesterolemic atherosclerosis. Interestingly, a regular diet following a high-cholesterol diet accelerates regression in this animal model of atherosclerosis. In conclusion, SDG suppresses, slows the progression of, and regresses atherosclerosis. It could serve as an alternative medicine for the prevention, slowing of progression, and regression of atherosclerosis and hence for the treatment of coronary artery disease, stroke, and peripheral arterial vascular disease.
Zhang, Xu; Zhang, Mei-Jie; Fine, Jason
2012-01-01
With competing risks failure time data, one often needs to assess covariate effects on the cumulative incidence probabilities. Fine and Gray proposed a proportional hazards regression model to directly model the subdistribution of a competing risk. They developed an estimating procedure for right-censored competing risks data based on the inverse probability of censoring weighting. Right-censored and left-truncated competing risks data sometimes occur in biomedical research. In this paper, we study the proportional hazards regression model for the subdistribution of a competing risk with right-censored and left-truncated data. We adopt a new weighting technique to estimate the parameters in this model, and we have derived the large sample properties of the proposed estimators. To illustrate the application of the new method, we analyze failure time data for children with acute leukemia; in this example, the failure times for children who had bone marrow transplants were left truncated. PMID:21557288
Prognostic model for survival in patients with early stage cervical cancer.
Biewenga, Petra; van der Velden, Jacobus; Mol, Ben Willem J; Stalpers, Lukas J A; Schilthuis, Marten S; van der Steeg, Jan Willem; Burger, Matthé P M; Buist, Marrije R
2011-02-15
In the management of early stage cervical cancer, knowledge about the prognosis is critical. Although many factors have an impact on survival, their relative importance remains controversial. This study aims to develop a prognostic model for survival in early stage cervical cancer patients and to reconsider grounds for adjuvant treatment. A multivariate Cox regression model was used to identify the prognostic weight of clinical and histological factors for disease-specific survival (DSS) in 710 consecutive patients who had surgery for early stage cervical cancer (FIGO [International Federation of Gynecology and Obstetrics] stage IA2-IIA). Prognostic scores were derived by converting the regression coefficients for each prognostic marker and used in a score chart. The discriminative capacity was expressed as the area under the curve (AUC) of the receiver operating characteristic. The 5-year DSS was 92%. Tumor diameter, histological type, lymph node metastasis, depth of stromal invasion, lymph vascular space invasion, and parametrial extension were independently associated with DSS and were included in a Cox regression model. This prognostic model, corrected for the 9% overfit shown by internal validation, showed a fair discriminative capacity (AUC, 0.73). The derived score chart predicting 5-year DSS showed a good discriminative capacity (AUC, 0.85). In patients with early stage cervical cancer, DSS can be predicted with a statistical model. Models, such as that presented here, should be used in clinical trials on the effects of adjuvant treatments in high-risk early cervical cancer patients, both to stratify and to include patients. Copyright © 2010 American Cancer Society.
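The step of converting Cox regression coefficients into an integer score chart can be sketched as follows. The coefficient values below are hypothetical placeholders, not the coefficients fitted in the study:

```python
# Hypothetical Cox regression coefficients for six prognostic markers
# (illustrative values only; not the fitted coefficients from the study)
coefs = {"tumor_diameter>4cm": 0.62, "non-squamous": 0.35,
         "node_metastasis": 1.10, "deep_stromal_invasion": 0.48,
         "LVSI": 0.41, "parametrial_extension": 0.55}

# Score chart: scale each coefficient by the smallest one and round,
# so clinicians can sum small integer points across markers
base = min(coefs.values())
chart = {marker: round(beta / base) for marker, beta in coefs.items()}
print(chart)
```

A patient's total points then map to a predicted survival band; the rounding trades a little discrimination for bedside usability, which is why the paper reports the AUC of the chart separately from the full model.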
An empirical model for estimating annual consumption by freshwater fish populations
Liao, H.; Pierce, C.L.; Larscheid, J.G.
2005-01-01
Population consumption is an important process linking predator populations to their prey resources. Simple tools are needed to enable fisheries managers to estimate population consumption. We assembled 74 individual estimates of annual consumption by freshwater fish populations and their mean annual population size, 41 of which also included estimates of mean annual biomass. The data set included 14 freshwater fish species from 10 different bodies of water. From this data set we developed two simple linear regression models predicting annual population consumption. Log-transformed population size explained 94% of the variation in log-transformed annual population consumption. Log-transformed biomass explained 98% of the variation in log-transformed annual population consumption. We quantified the accuracy of our regressions and three alternative consumption models as the mean percent difference from observed (bioenergetics-derived) estimates in a test data set. Predictions from our population-size regression matched observed consumption estimates poorly (mean percent difference = 222%). Predictions from our biomass regression matched observed consumption reasonably well (mean percent difference = 24%). The biomass regression was superior to an alternative model, similar in complexity, and comparable to two alternative models that were more complex and difficult to apply. Our biomass regression model, log10(consumption) = 0.5442 + 0.9962·log10(biomass), will be a useful tool for fishery managers, enabling them to make reasonably accurate annual population consumption predictions from mean annual biomass estimates. © Copyright by the American Fisheries Society 2005.
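The reported biomass regression can be applied directly. Consumption comes out in the same units as the biomass input (per year); the example biomass value below is arbitrary:

```python
import math

def annual_consumption(biomass):
    # Biomass regression reported above:
    # log10(consumption) = 0.5442 + 0.9962 * log10(biomass)
    return 10 ** (0.5442 + 0.9962 * math.log10(biomass))

# e.g. a population with a mean annual biomass of 1000 (biomass units)
print(round(annual_consumption(1000.0), 1))
```

Because the slope is close to 1, predicted consumption scales nearly proportionally with biomass, at roughly 3.5 times the biomass per year.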
Quantifying scaling effects on satellite-derived forest area estimates for the conterminous USA
Daolan Zheng; L.S. Heath; M.J. Ducey; J.E. Smith
2009-01-01
We quantified the scaling effects on forest area estimates for the conterminous USA using regression analysis and the National Land Cover Dataset 30m satellite-derived maps in 2001 and 1992. The original data were aggregated to: (1) broad cover types (forest vs. non-forest); and (2) coarser resolutions (1km and 10 km). Standard errors of the model estimates were 2.3%...
Phenomapping of rangelands in South Africa using time series of RapidEye data
NASA Astrophysics Data System (ADS)
Parplies, André; Dubovyk, Olena; Tewes, Andreas; Mund, Jan-Peter; Schellberg, Jürgen
2016-12-01
Phenomapping is an approach which allows the derivation of spatial patterns of vegetation phenology and rangeland productivity based on time series of vegetation indices. In our study, we propose a new spatial mapping approach which combines phenometrics derived from high resolution (HR) satellite time series with spatial logistic regression modeling to discriminate land management systems in rangelands. From the RapidEye time series for selected rangelands in South Africa, we calculated bi-weekly noise-reduced Normalized Difference Vegetation Index (NDVI) images. For the growing season of 2011-2012, we further derived principal phenology metrics such as start, end and length of growing season and related phenological variables such as amplitude, left derivative and small integral of the NDVI curve. We then mapped these phenometrics across two different tenure systems, communal and commercial, at the very detailed spatial resolution of 5 m. A binary logistic regression (BLR) showed that the amplitude and the left derivative of the NDVI curve were statistically significant. These indicators are useful to discriminate commercial from communal rangeland systems. We conclude that phenomapping combined with spatial modeling is a powerful tool that allows efficient aggregation of phenology and productivity metrics for spatially explicit analysis of the relationships of crop phenology with site conditions and management. This approach has particular potential for disaggregated and patchy environments such as in farming systems in semi-arid South Africa, where phenology varies considerably among and within years. Further, we see a strong perspective for phenomapping to support spatially explicit modelling of vegetation.
A Bayesian model averaging method for the derivation of reservoir operating rules
NASA Astrophysics Data System (ADS)
Zhang, Jingwen; Liu, Pan; Wang, Hao; Lei, Xiaohui; Zhou, Yanlai
2015-09-01
Because the intrinsic dynamics among optimal decision making, inflow processes and reservoir characteristics are complex, the functional forms of reservoir operating rules are always determined subjectively. As a result, the uncertainty involved in selecting the form and/or model of reservoir operating rules must be analyzed and evaluated. In this study, we analyze the uncertainty of reservoir operating rules using the Bayesian model averaging (BMA) model. Three popular operating rules, namely piecewise linear regression, surface fitting and a least-squares support vector machine, are established based on the optimal deterministic reservoir operation. These individual models provide three-member decisions for the BMA combination, enabling the 90% release interval to be estimated by Markov chain Monte Carlo simulation. A case study of China's Baise reservoir shows that: (1) the optimal deterministic reservoir operation, which is superior to any reservoir operating rule, provides the samples used to derive the rules; (2) the least-squares support vector machine model is more effective than both piecewise linear regression and surface fitting; (3) BMA outperforms any individual model of operating rules based on the optimal trajectories. It is revealed that the proposed model can reduce the uncertainty of operating rules, which is of great potential benefit in evaluating the confidence interval of decisions.
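The BMA point forecast is a posterior-weight-averaged combination of the member models' decisions. A minimal sketch with invented member predictions and weights (in the study, the weights would come from the EM/MCMC estimation, not be set by hand):

```python
import numpy as np

# Hypothetical releases (m^3/s) from the three member models for three
# decision periods, plus hypothetical BMA posterior model weights.
member_preds = np.array([
    [120.0, 135.0, 150.0],   # piecewise linear regression
    [118.0, 140.0, 148.0],   # surface fitting
    [121.0, 138.0, 152.0],   # least-squares support vector machine
])
weights = np.array([0.25, 0.30, 0.45])  # posterior model weights, sum to 1

# BMA point forecast: weighted average of the member decisions
bma_mean = weights @ member_preds
print(bma_mean)
```

The 90% release interval would additionally require sampling from each member's predictive distribution with these weights, as done via Markov chain Monte Carlo in the study.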
Geodesic least squares regression for scaling studies in magnetic confinement fusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, Geert
In regression analyses for deriving scaling laws that occur in various scientific disciplines, usually standard regression methods have been applied, of which ordinary least squares (OLS) is the most popular. However, concerns have been raised with respect to several assumptions underlying OLS in its application to scaling laws. We here discuss a new regression method that is robust in the presence of significant uncertainty on both the data and the regression model. The method, which we call geodesic least squares regression (GLS), is based on minimization of the Rao geodesic distance on a probabilistic manifold. We demonstrate the superiority of the method using synthetic data and we present an application to the scaling law for the power threshold for the transition to the high confinement regime in magnetic confinement fusion devices.
NASA Astrophysics Data System (ADS)
Green, Rebecca E.; Gould, Richard W., Jr.; Ko, Dong S.
2008-06-01
We developed statistically-based, optical models to estimate tripton (sediment/detrital) and colored dissolved organic matter (CDOM) absorption coefficients (a_sd, a_g) from physical hydrographic and atmospheric properties. The models were developed for northern Gulf of Mexico shelf waters using multi-year satellite and physical data. First, empirical algorithms for satellite-derived a_sd and a_g were developed, based on comparison with a large data set of cruise measurements from northern Gulf shelf waters; these algorithms were then applied to a time series of ocean color (SeaWiFS) satellite imagery for 2002-2005. Unique seasonal timing was observed in satellite-derived optical properties, with a_sd peaking most often in fall/winter on the shelf, in contrast to summertime peaks observed in a_g. Next, the satellite-derived values were coupled with the physical data to form multiple regression models. A suite of physical forcing variables was tested for inclusion in the models: discharge from the Mississippi River and Mobile Bay, Alabama; gridded fields for winds, precipitation, solar radiation, sea surface temperature and height (SST, SSH); and modeled surface salinity and currents (Navy Coastal Ocean Model, NCOM). For the satellite-derived a_sd and a_g time series (2002-2004), correlation and stepwise regression analyses revealed the most important physical forcing variables. Over our region of interest, the best predictors of tripton absorption were wind speed, river discharge, and SST, whereas dissolved absorption was best predicted by east-west wind speed, river discharge, and river discharge lagged by 1 month. These results suggest the importance of vertical mixing (as a function of winds and thermal stratification) in controlling a_sd distribution patterns over large regions of the shelf, in comparison to advection as the most important control on a_g.
The multiple linear regression models for estimating a_sd and a_g were applied on a pixel-by-pixel basis and the results were compared to monthly SeaWiFS composite imagery. The models performed well in resolving seasonal and interannual optical variability in the model development years (2002-2004) (mean error of 32% for a_sd and 29% for a_g) and in predicting shelfwide optical patterns in a year independent of model development (2005; mean error of 41% for a_sd and 46% for a_g). The models provide insight into the dominant processes controlling optical distributions in this region, and they can be used to predict the optical fields from the physical properties at monthly timescales.
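A multiple regression of this general shape can be sketched with synthetic data (the predictor effects, units, and coefficients below are invented for illustration, not taken from the study):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
wind = rng.normal(5.0, 2.0, n)        # wind speed (m/s)
discharge = rng.normal(10.0, 3.0, n)  # river discharge (arbitrary units)
sst = rng.normal(25.0, 1.0, n)        # sea surface temperature (deg C)

# synthetic tripton absorption driven by the three predictors plus noise
a_sd = 0.02 + 0.010 * wind + 0.005 * discharge - 0.003 * sst \
       + rng.normal(0.0, 0.005, n)

# ordinary least-squares fit of the multiple regression
X = np.column_stack([np.ones(n), wind, discharge, sst])
coef, *_ = np.linalg.lstsq(X, a_sd, rcond=None)
print(coef.round(4))                  # intercept and the three slopes
```

With enough observations, the fitted slopes recover the generating coefficients closely; in practice, stepwise selection (as in the study) would decide which of the candidate physical predictors to retain.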
Forecast model applications of retrieved three dimensional liquid water fields
NASA Technical Reports Server (NTRS)
Raymond, William H.; Olson, William S.
1990-01-01
Forecasts are made for tropical storm Emily using heating rates derived from the SSM/I physical retrievals described in chapters 2 and 3. Average values of the latent heating rates from the convective and stratiform cloud simulations, used in the physical retrieval, are obtained for individual 1.1 km thick vertical layers. Then, the layer-mean latent heating rates are regressed against the slant path-integrated liquid and ice precipitation water contents to determine the best fit two parameter regression coefficients for each layer. The regression formulae and retrieved precipitation water contents are utilized to infer the vertical distribution of heating rates for forecast model applications. In the forecast model, diabatic temperature contributions are calculated and used in a diabatic initialization, or in a diabatic initialization combined with a diabatic forcing procedure. Our forecasts show that the time needed to spin-up precipitation processes in tropical storm Emily is greatly accelerated through the application of the data.
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study; the two centres served as the derivation and validation cohorts, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, and compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity, identified by multivariate logistic regression analysis. The performance of the CART model (AUC 0.896) was similar to that of the logistic regression model (0.914, P=.382), and both exceeded that of the MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting 3-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
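A CART risk stratification of this general form can be sketched with scikit-learn on synthetic data (the four predictors match the abstract, but the rule generating the synthetic outcomes is invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 1000
bilirubin = rng.uniform(50, 600, n)     # total bilirubin (umol/L)
age = rng.uniform(20, 80, n)
sodium = rng.uniform(120, 145, n)       # serum sodium (mmol/L)
inr = rng.uniform(1.0, 4.0, n)

# invented rule standing in for observed 3-month mortality
died = ((bilirubin > 350) & (inr > 2.2)) | (sodium < 125)

X = np.column_stack([bilirubin, age, sodium, inr])
cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, died)
risk = cart.predict_proba(X)[:, 1]      # per-patient mortality risk strata
print(round(cart.score(X, died), 3))
```

The terminal leaves of such a tree correspond to the low-, intermediate- and high-risk groups reported above, each with its own observed mortality rate.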
Jacob Strunk; Hailemariam Temesgen; Hans-Erik Andersen; James P. Flewelling; Lisa Madsen
2012-01-01
Using lidar in an area-based model-assisted approach to forest inventory has the potential to increase estimation precision for some forest inventory variables. This study documents the bias and precision of a model-assisted (regression estimation) approach to forest inventory with lidar-derived auxiliary variables relative to lidar pulse density and the number of...
ERIC Educational Resources Information Center
Owen, Steven V.; Feldhusen, John F.
This study compares the effectiveness of three models of multivariate prediction for academic success in identifying the criterion variance of achievement in nursing education. The first model involves the use of an optimum set of predictors and one equation derived from a regression analysis on first semester grade average in predicting the…
Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data
NASA Astrophysics Data System (ADS)
Sampson, Paul D.; Szpiro, Adam A.; Sheppard, Lianne; Lindström, Johan; Kaufman, Joel D.
2011-11-01
Statistical analyses of health effects of air pollution have increasingly used GIS-based covariates for prediction of ambient air quality in "land use" regression models. More recently these spatial regression models have accounted for spatial correlation structure in combining monitoring data with land use covariates. We present a flexible spatio-temporal modeling framework and pragmatic, multi-step estimation procedure that accommodates essentially arbitrary patterns of missing data with respect to an ideally complete space by time matrix of observations on a network of monitoring sites. The methodology incorporates a model for smooth temporal trends with coefficients varying in space according to Partial Least Squares regressions on a large set of geographic covariates and nonstationary modeling of spatio-temporal residuals from these regressions. This work was developed to provide spatial point predictions of PM2.5 concentrations for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) using irregular monitoring data derived from the AQS regulatory monitoring network and supplemental short-time scale monitoring campaigns conducted to better predict intra-urban variation in air quality. We demonstrate the interpretation and accuracy of this methodology in modeling data from 2000 through 2006 in six U.S. metropolitan areas and establish a basis for likelihood-based estimation.
NASA Technical Reports Server (NTRS)
Ratnayake, Nalin A.; Waggoner, Erin R.; Taylor, Brian R.
2011-01-01
The problem of parameter estimation on hybrid-wing-body aircraft is complicated by the fact that many design candidates for such aircraft involve a large number of aerodynamic control effectors that act in coplanar motion. This adds to the complexity already present in the parameter estimation problem for any aircraft with a closed-loop control system. Decorrelation of flight and simulation data must be performed in order to ascertain individual surface derivatives with any sort of mathematical confidence. Non-standard control surface configurations, such as clamshell surfaces and drag-rudder modes, further complicate the modeling task. In this paper, time-decorrelation techniques are applied to a model structure selected through stepwise regression for simulated and flight-generated lateral-directional parameter estimation data. A virtual effector model that uses mathematical abstractions to describe the multi-axis effects of clamshell surfaces is developed and applied. Comparisons are made between time history reconstructions and observed data in order to assess the accuracy of the regression model. The Cramér-Rao lower bounds of the estimated parameters are used to assess the uncertainty of the regression model relative to alternative models. Stepwise regression was found to be a useful technique for lateral-directional model design for hybrid-wing-body aircraft, as suggested by available flight data. Based on the results of this study, linear regression parameter estimation methods using abstracted effectors are expected to perform well for hybrid-wing-body aircraft properly equipped for the task.
Beukinga, Roelof J; Hulshoff, Jan B; van Dijk, Lisanne V; Muijs, Christina T; Burgerhof, Johannes G M; Kats-Ugurlu, Gursah; Slart, Riemer H J A; Slump, Cornelis H; Mul, Véronique E M; Plukker, John Th M
2017-05-01
Adequate prediction of tumor response to neoadjuvant chemoradiotherapy (nCRT) in esophageal cancer (EC) patients is important in a more personalized treatment. The current best clinical method to predict pathologic complete response is SUVmax in 18F-FDG PET/CT imaging. To improve the prediction of response, we constructed a model to predict complete response to nCRT in EC based on pretreatment clinical parameters and 18F-FDG PET/CT-derived textural features. Methods: From a prospectively maintained single-institution database, we reviewed 97 consecutive patients with locally advanced EC and a pretreatment 18F-FDG PET/CT scan between 2009 and 2015. All patients were treated with nCRT (carboplatin/paclitaxel/41.4 Gy) followed by esophagectomy. We analyzed clinical, geometric, and pretreatment textural features extracted from both 18F-FDG PET and CT. The current most accurate prediction model with SUVmax as a predictor variable was compared with 6 different response prediction models constructed using least absolute shrinkage and selection operator regularized logistic regression. Internal validation was performed to estimate the models' performances. Pathologic response was defined as complete versus incomplete response (Mandard tumor regression grade system 1 vs. 2-5). Results: Pathologic examination revealed 19 (19.6%) complete and 78 (80.4%) incomplete responders. Least absolute shrinkage and selection operator regularization selected the clinical parameters: histologic type and clinical T stage, the 18F-FDG PET-derived textural feature long run low gray level emphasis, and the CT-derived textural feature run percentage. Introducing these variables to a logistic regression analysis showed areas under the receiver-operating-characteristic curve (AUCs) of 0.78 compared with 0.58 in the SUVmax model. The discrimination slopes were 0.17 compared with 0.01, respectively. After internal validation, the AUCs decreased to 0.74 and 0.54, respectively.
Conclusion: The predictive values of the constructed models were superior to those of the standard method (SUVmax). These results can be considered an initial step in predicting tumor response to nCRT in locally advanced EC. Further research in refining the predictive value of these models is needed to justify omission of surgery. © 2017 by the Society of Nuclear Medicine and Molecular Imaging.
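The LASSO selection step can be sketched with scikit-learn's L1-penalized logistic regression on synthetic data (the feature count matches the small-cohort setting, but the effect sizes and the response-generating rule are invented; only the general workflow follows the abstract):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
n, p = 97, 10                            # 97 patients, 10 candidate features
X = rng.normal(size=(n, p))              # synthetic clinical/textural features
logit = 1.2 * X[:, 0] - 1.0 * X[:, 1] - 1.4    # only features 0 and 1 matter
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))  # ~20% complete responders

# L1 (LASSO-type) penalty shrinks uninformative coefficients toward zero
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
selected = np.flatnonzero(model.coef_[0])        # surviving features
auc = roc_auc_score(y, model.decision_function(X))
print(selected, round(auc, 2))           # apparent (non-validated) AUC
```

As the abstract's internal validation illustrates, the apparent AUC computed on the fitting data is optimistic; resampling-based validation is needed to estimate the shrunken, honest performance.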
NASA Astrophysics Data System (ADS)
Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.
2018-03-01
Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study using Minnesota, USA, during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than those of the models using the image composites, with corresponding individual class accuracies between 6 and 45 percentage points higher.
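First-order harmonic regression of a seasonal time series can be sketched as follows (the NDVI-like series is synthetic; the study fitted Fourier series to tasseled cap time series, and higher-order harmonics would be added the same way):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0.0, 366.0, 16.0)         # one year, 16-day observation step
omega = 2.0 * np.pi / 365.0
y = 0.5 + 0.2 * np.cos(omega * t - 1.0) + rng.normal(0.0, 0.01, t.size)

# first-order harmonic (Fourier) regression: y ~ a0 + a1 cos(wt) + b1 sin(wt)
X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
(a0, a1, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
amplitude = np.hypot(a1, b1)            # seasonal amplitude as a predictor
phase = np.arctan2(b1, a1)              # timing of the seasonal peak
print(round(a0, 3), round(amplitude, 3), round(phase, 2))
```

The estimated coefficients (mean level, amplitude, phase) summarize the whole time series in a few numbers, which is what makes them useful predictor variables for the k-nearest neighbors and random forests models described above.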
Allan, Bruce D; Hassan, Hala; Ieong, Alvin
2015-05-01
To describe and evaluate a new multiple regression-derived nomogram for myopic wavefront laser in situ keratomileusis (LASIK). Moorfields Eye Hospital, London, United Kingdom. Prospective comparative case series. Multiple regression modeling was used to derive a simplified formula for adjusting attempted spherical correction in myopic LASIK. An adaptation of Thibos' power vector method was then applied to derive adjustments to attempted cylindrical correction in eyes with 1.0 diopter (D) or more of preoperative cylinder. These elements were combined in a new nomogram (nomogram II). The 3-month refractive results for myopic wavefront LASIK (spherical equivalent ≤11.0 D; cylinder ≤4.5 D) were compared between 299 consecutive eyes treated using the earlier nomogram (nomogram I) in 2009 and 2010 and 414 eyes treated using nomogram II in 2011 and 2012. There was no significant difference in treatment accuracy (variance in the postoperative manifest refraction spherical equivalent error) between nomogram I and nomogram II (P = .73, Bartlett test). Fewer patients treated with nomogram II had more than 0.5 D of residual postoperative astigmatism (P = .0001, Fisher exact test). There was no significant coupling between adjustments to the attempted cylinder and the achieved sphere (P = .18, t test). Discarding marginal influences from a multiple regression-derived nomogram for myopic wavefront LASIK had no clinically significant effect on treatment accuracy. Thibos' power vector method can be used to guide adjustments to the treatment cylinder alongside nomograms designed to optimize postoperative spherical equivalent results in myopic LASIK. Copyright © 2015 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.
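Thibos' power vector method converts a sphero-cylindrical refraction (sphere S, cylinder C, axis α) into the components M = S + C/2, J0 = -(C/2)cos 2α and J45 = -(C/2)sin 2α. A minimal sketch (the example refraction is hypothetical):

```python
import math

def power_vector(sphere, cyl, axis_deg):
    """Convert a sphero-cylindrical refraction to Thibos power-vector
    components (M, J0, J45), all in diopters."""
    a = math.radians(axis_deg)
    m = sphere + cyl / 2.0                 # spherical equivalent
    j0 = -(cyl / 2.0) * math.cos(2.0 * a)  # with/against-the-rule astigmatism
    j45 = -(cyl / 2.0) * math.sin(2.0 * a) # oblique astigmatism
    return m, j0, j45

# hypothetical refraction: -4.00 DS / -1.50 DC x 180
print(power_vector(-4.00, -1.50, 180))
```

Because (M, J0, J45) live in a vector space, cylinder adjustments can be computed componentwise and converted back, which is what makes the method suitable for guiding nomogram adjustments.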
QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.
Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan
2013-03-01
A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. A statistically significant model was developed, with a squared correlation coefficient (r²) of 0.891 and a cross-validated r² (r²cv) of 0.825. The developed model revealed that electronic properties, shape, size, geometry, substitution information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r²pred = 0.849) as well as Tropsha's test for model predictability. Furthermore, a domain analysis was carried out to evaluate the prediction reliability for external set molecules. The model was statistically robust, had good predictive power, and can be utilized for screening new molecules.
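Conventional r² and leave-one-out cross-validated r² (often written q², the r²cv reported above) can be computed as sketched below on synthetic descriptor data (the descriptors and response are invented; only the validation logic mirrors the abstract):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 30, 3
X = rng.normal(size=(n, p))                     # synthetic molecular descriptors
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(0.0, 0.1, n)

def fit_predict(X_tr, y_tr, X_te):
    """OLS fit on the training rows, prediction on the test rows."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ beta

ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - np.sum((y - fit_predict(X, y, X)) ** 2) / ss_tot

# leave-one-out cross-validated r^2 (q^2): refit with each molecule held out
press = sum((y[i] - fit_predict(np.delete(X, i, 0), np.delete(y, i), X[i:i + 1])[0]) ** 2
            for i in range(n))
q2 = 1.0 - press / ss_tot
print(round(r2, 3), round(q2, 3))
```

Since the leave-one-out prediction error is never smaller than the in-sample residual, q² is always at most r², which is why QSAR validation reports both.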
Self-consistent asset pricing models
NASA Astrophysics Data System (ADS)
Malevergne, Y.; Sornette, D.
2007-08-01
We discuss the foundations of factor or regression models in the light of the self-consistency condition that the market portfolio (and more generally the risk factors) is (are) constituted of the assets whose returns it is (they are) supposed to explain. As already reported in several articles, self-consistency implies correlations between the return disturbances. As a consequence, the alphas and betas of the factor model are unobservable. Self-consistency leads to renormalized betas with zero effective alphas, which are observable with standard OLS regressions. When the conditions derived from internal consistency are not met, the model is necessarily incomplete, which means that some sources of risk cannot be replicated (or hedged) by a portfolio of stocks traded on the market, even for infinite economies. Analytical derivations and numerical simulations show that, for arbitrary choices of the proxy which are different from the true market portfolio, a modified linear regression holds with a non-zero value αi at the origin between an asset i's return and the proxy's return. Self-consistency also introduces “orthogonality” and “normality” conditions linking the betas, alphas (as well as the residuals) and the weights of the proxy portfolio. Two diagnostics based on these orthogonality and normality conditions are implemented on a basket of 323 assets which have been components of the S&P500 in the period from January 1990 to February 2005. These two diagnostics show interesting departures from dynamical self-consistency starting about 2 years before the end of the Internet bubble. Assuming that the CAPM holds with the self-consistency condition, the OLS method automatically obeys the resulting orthogonality and normality conditions and therefore provides a simple way to self-consistently assess the parameters of the model by using proxy portfolios made only of the assets which are used in the CAPM regressions. 
Finally, the factor decomposition with the self-consistency condition derives a risk-factor decomposition in the multi-factor case which is identical to the principal component analysis (PCA), thus providing a direct link between model-driven and data-driven constructions of risk factors. This correspondence shows that PCA will therefore suffer from the same limitations as the CAPM and its multi-factor generalization, namely lack of out-of-sample explanatory power and predictability. In the multi-period context, the self-consistency conditions force the betas to be time-dependent with specific constraints.
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score and then analyzed either using conditional logistic regression or using unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS), are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yields unbiased estimates only under the null for both conditional and unconditional logistic regression adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Security of statistical data bases: invasion of privacy through attribute correlational modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palley, M.A.
This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.
NASA Astrophysics Data System (ADS)
Madonna, Erica; Ginsbourger, David; Martius, Olivia
2018-05-01
In Switzerland, hail regularly causes substantial damage to agriculture, cars and infrastructure; however, little is known about its long-term variability. To study this variability, the monthly number of days with hail in northern Switzerland is modeled in a regression framework using large-scale predictors derived from ERA-Interim reanalysis. The model is developed and verified using radar-based hail observations for the extended summer season (April-September) in the period 2002-2014. The seasonality of hail is explicitly modeled with a categorical predictor (month), and monthly anomalies of several large-scale predictors are used to capture the year-to-year variability. Several regression models are applied and their performance tested with respect to standard scores and cross-validation. The chosen model includes four predictors: the monthly anomaly of the two-meter temperature, the monthly anomaly of the logarithm of the convective available potential energy (CAPE), the monthly anomaly of the wind shear, and the month. This model captures the intra-annual variability well but slightly underestimates the inter-annual variability. The regression model is applied to the reanalysis data back to 1980. The resulting hail day time series shows an increase in the number of hail days per month, which is (in the model) related to an increase in temperature and CAPE. The trend corresponds to approximately 0.5 days per month per decade. The results of the regression model have been compared to two independent data sets. All data sets agree on the sign of the trend, but the trend is weaker in the other data sets.
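A regression of monthly hail-day counts on a categorical month term plus large-scale anomaly predictors can be sketched with a Poisson count model (the abstract does not state the error family used; all data and effect sizes below are synthetic):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(3)
n_years, n_months = 13, 6                    # April-September, 2002-2014
month = np.tile(np.arange(n_months), n_years)
t2m = rng.normal(0, 1, n_years * n_months)   # two-meter temperature anomaly
cape = rng.normal(0, 1, n_years * n_months)  # log-CAPE anomaly
shear = rng.normal(0, 1, n_years * n_months) # wind shear anomaly

# synthetic monthly hail-day counts with a seasonal cycle (invented effects)
lam = np.exp(0.5 + 0.4 * np.sin(np.pi * month / 5)
             + 0.5 * t2m + 0.4 * cape - 0.4 * shear)
y = rng.poisson(lam)

# categorical month predictor as dummy variables (first month as reference)
month_dummies = (month[:, None] == np.arange(1, n_months)).astype(float)
X = np.column_stack([month_dummies, t2m, cape, shear])
model = PoissonRegressor(alpha=1e-6, max_iter=500).fit(X, y)
print(model.coef_[-3:])   # anomaly coefficients (t2m, CAPE, shear)
```

Fitting such a model on a verification period and then applying it to earlier reanalysis predictors is what allows the hail-day series to be extended back in time, as done above for 1980 onward.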
NASA Astrophysics Data System (ADS)
Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza
2018-03-01
In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either (ECME) algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
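The core of the reweighting step is iteratively reweighted least squares with t-distribution weights w_i = (ν+1)/(ν + r_i²/σ²). A simplified sketch that keeps ν and σ² fixed rather than estimating them, and omits the AR error model, both of which the full ECME algorithm handles:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
t = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
beta_true = np.array([1.0, 2.0, -1.0])          # Fourier-type regression
y = X @ beta_true + 0.1 * rng.standard_t(df=3, size=n)  # heavy-tailed noise
y[::50] += 5.0                                  # gross outliers

nu, sigma2 = 3.0, 0.01   # df and scale held fixed in this simplified sketch
beta = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS starting value
for _ in range(20):
    r = y - X @ beta
    w = (nu + 1.0) / (nu + r**2 / sigma2)       # t-weights downweight outliers
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
print(beta.round(2))
```

The weights shrink toward zero for large residuals, so the gross outliers contribute almost nothing to the final fit, which is what makes the estimator adaptively robust.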
Zlotnik, Alexander; Alfaro, Miguel Cuchí; Pérez, María Carmen Pérez; Gallardo-Antolín, Ascensión; Martínez, Juan Manuel Montero
2016-05-01
The usage of decision support tools in emergency departments, based on predictive models capable of estimating the probability of admission for patients in the emergency department, may give nursing staff the possibility of allocating resources in advance. We present a methodology for developing and building one such system for a large specialized care hospital using a logistic regression model and an artificial neural network model based on nine routinely collected variables available right at the end of the triage process. A database of 255,668 triaged nonobstetric emergency department presentations from the Ramon y Cajal University Hospital of Madrid, from January 2011 to December 2012, was used to develop and test the models, with 66% of the data used for derivation and 34% for validation, with an ordered nonrandom partition. On the validation dataset, areas under the receiver operating characteristic curve were 0.8568 (95% confidence interval, 0.8508-0.8583) for the logistic regression model and 0.8575 (95% confidence interval, 0.8540-0.8610) for the artificial neural network model. χ² values for Hosmer-Lemeshow fixed "deciles of risk" were 65.32 for the logistic regression model and 17.28 for the artificial neural network model. A nomogram was generated from the logistic regression model, and an automated software decision support system with a Web interface was built based on the artificial neural network model.
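A logistic-regression admission model with an ordered derivation/validation split can be sketched on synthetic triage data (the three predictors and their effects are invented; the real system used nine routinely collected variables):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(11)
n = 5000
age = rng.uniform(18, 95, n)
triage_level = rng.integers(1, 6, n)        # 1 = most urgent, 5 = least
arrival_by_ambulance = rng.integers(0, 2, n)

# synthetic admission outcome standing in for the real triage data
logit = -6.0 + 0.04 * age - 0.8 * (triage_level - 3) + 1.0 * arrival_by_ambulance
admitted = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([age, triage_level, arrival_by_ambulance])
split = int(0.66 * n)                       # ordered, nonrandom partition
model = LogisticRegression(max_iter=1000).fit(X[:split], admitted[:split])
auc = roc_auc_score(admitted[split:], model.predict_proba(X[split:])[:, 1])
print(round(auc, 3))                        # validation-set AUC
```

The held-out AUC is the quantity reported above for both the logistic regression and neural network models; the fitted coefficients are also what a nomogram would be drawn from.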
Qidwai, Tabish; Yadav, Dharmendra K; Khan, Feroz; Dhawan, Sangeeta; Bhakuni, R S
2012-01-01
This work presents the development of a quantitative structure-activity relationship (QSAR) model to predict the antimalarial activity of artemisinin derivatives. The structures of the molecules are represented by chemical descriptors that encode topological, geometric, and electronic structure features. Screening through the QSAR model suggested that compounds A24, A24a, A53, A54, A62 and A64 possess significant antimalarial activity. A linear model is developed by the multiple linear regression method to link structures to their reported antimalarial activity. The correlation in terms of the regression coefficient (r²) was 0.90, and the prediction accuracy of the model in terms of the cross-validation regression coefficient (r²CV) was 0.82. This study indicates that chemical properties, viz., atom count (all atoms), connectivity index (order 1, standard), ring count (all rings), shape index (basic kappa, order 2), and solvent accessibility surface area, are well correlated with antimalarial activity. The docking study showed high binding affinity of the predicted active compounds against the antimalarial target plasmepsin II (Plm-II). Further studies of oral bioavailability, ADMET and toxicity risk assessment suggest that compounds A24, A24a, A53, A54, A62 and A64 exhibit marked antimalarial activity comparable to standard antimalarial drugs. Later, one of the predicted active compounds, A64, was chemically synthesized, its structure elucidated by NMR, and tested in vivo in mice infected with a multidrug-resistant strain of Plasmodium yoelii nigeriensis. The experimental results obtained agreed well with the predicted values.
Austin, Peter C; Walraven, Carl van
2011-10-01
Logistic regression models that incorporate age, sex, and indicator variables for the Johns Hopkins Aggregated Diagnosis Groups (ADGs) categories have been shown to accurately predict all-cause mortality in adults. The objective was to develop 2 different point-scoring systems using the ADGs. The Mortality Risk Score (MRS) collapses age, sex, and the ADGs to a single summary score that predicts the annual risk of all-cause death in adults. The ADG Score derives weights for the individual ADG diagnosis groups. Retrospective cohort constructed using population-based administrative data. All 10,498,413 residents of Ontario, Canada, between the ages of 20 and 100 years who were alive on their birthday in 2007 participated in this study. Participants were randomly divided into derivation and validation samples. The outcome was death within 1 year. In the derivation cohort, the MRS ranged from -21 to 139 (median value 29, IQR 17 to 44). In the validation group, a logistic regression model with the MRS as the sole predictor significantly predicted the risk of 1-year mortality with a c-statistic of 0.917. A regression model with age, sex, and the ADG Score had similar performance. Both methods accurately predicted the risk of 1-year mortality across the 20 vigintiles of risk. The MRS combines values for a person's age, sex, and the Johns Hopkins ADGs to accurately predict 1-year mortality in adults. The ADG Score is a weighted score representing the presence or absence of the 32 ADG diagnosis groups. These scores will facilitate health services researchers conducting risk adjustment using administrative health care databases.
Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David
2017-10-01
Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed the accuracy of cell processor predicted platelet (PLT) yields with the goal of better predicting DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software-predicted yield and actual PLT yield were statistically evaluated. The software prediction was optimized by linear regression analysis, and its optimal cut-off to obtain a DP assessed by receiver operating characteristic (ROC) curve modeling. Three hundred and two plateletpheresis procedures were performed; on 271 (89.7%) occasions donors were men and on 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486, P < .001). Mean software machine-derived values differed significantly from actual PLT yield: 4.72 × 10¹¹ vs. 6.12 × 10¹¹, respectively (P < .001). The following equation was developed to adjust these values: actual PLT yield = 0.221 + (1.254 × theoretical platelet yield). The ROC curve model showed an optimal apheresis device software prediction cut-off of 4.65 × 10¹¹ to obtain a DP, with a sensitivity of 82.2%, specificity of 93.3%, and an area under the curve (AUC) of 0.909. Trima Accel v6.0 software consistently underestimated PLT yields. A simple correction derived from linear regression analysis accurately corrected this underestimation, and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
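The reported correction and cutoff are simple enough to encode directly. In this sketch the function names are mine and yields are in units of 10¹¹ platelets, as in the abstract:

```python
def corrected_yield(software_yield):
    """Apply the regression correction reported for the consistently
    underestimating device software: actual = 0.221 + 1.254 x predicted.
    Input and output are in units of 1e11 platelets."""
    return 0.221 + 1.254 * software_yield

def predicts_double_product(software_yield, cutoff=4.65):
    """ROC-derived cutoff on the raw software prediction (1e11 PLT)."""
    return software_yield >= cutoff
```

Plugging the mean software value of 4.72 into the correction returns about 6.14, in line with the observed mean actual yield of 6.12 reported above.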
Savjani, Ricky R; Taylor, Brian A; Acion, Laura; Wilde, Elisabeth A; Jorge, Ricardo E
2017-11-15
Finding objective and quantifiable imaging markers of mild traumatic brain injury (TBI) has proven challenging, especially in the military population. Changes in cortical thickness after injury have been reported in animals and in humans, but it is unclear how these alterations manifest in the chronic phase, and they are difficult to characterize accurately with imaging. We used cortical thickness measures derived from Advanced Normalization Tools (ANTs) to predict a continuous demographic variable: age. We trained four different regression models (linear regression, support vector regression, Gaussian process regression, and random forests) to predict age from healthy control brains in publicly available datasets (n = 762). We then used these models to predict brain age in military Service Members with TBI (n = 92) and military Service Members without TBI (n = 34). Our results show that all four models overpredicted age in Service Members with TBI, and the predicted age difference was significantly greater than in military controls. These data extend previous civilian findings and show that cortical thickness measures may reveal accelerated changes over time associated with military TBI.
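The brain-age logic (fit on controls, apply to patients, read off the predicted-age difference) can be sketched with a toy numpy example. Ordinary least squares stands in for the four regressors, and the features and the "atrophy" shift are entirely synthetic assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
w = rng.normal(size=p)                    # true thickness-to-age weights (toy)

# Train a brain-age model on healthy controls.
X_ctrl = rng.normal(size=(400, p))        # stand-in cortical thickness features
age_ctrl = 50.0 + X_ctrl @ w + rng.normal(scale=1.0, size=400)
A = np.column_stack([np.ones(400), X_ctrl])
coef = np.linalg.lstsq(A, age_ctrl, rcond=None)[0]

# "TBI" group: features shifted as if thinned/atrophied, same true ages,
# so the control-trained model overpredicts age.
shift = 0.5 * np.sign(w)
X_tbi = rng.normal(size=(100, p)) + shift
age_tbi = 50.0 + (X_tbi - shift) @ w + rng.normal(scale=1.0, size=100)
pred_age = np.column_stack([np.ones(100), X_tbi]) @ coef
pad = float(np.mean(pred_age - age_tbi))  # predicted-age difference
```

In this construction the predicted-age difference comes out positive for the shifted group, mirroring the overprediction the study observed in Service Members with TBI.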
NASA Astrophysics Data System (ADS)
Haris, A.; Nafian, M.; Riyanto, A.
2017-07-01
The Danish North Sea fields comprise several formations (Ekofisk, Tor, and Cromer Knoll) ranging in age from the Paleocene to the Miocene. In this study, the integration of seismic and well log data is carried out to determine the chalk sand distribution in the Danish North Sea field. The integration of seismic and well log data is performed using seismic inversion analysis and seismic multi-attribute analysis. The seismic inversion algorithm used to derive acoustic impedance (AI) is a model-based technique. The derived AI is then used as an external attribute for the input of the multi-attribute analysis. Moreover, the multi-attribute analysis is used to generate linear and non-linear transformations among well log properties. In the linear case, the selected transformation is obtained by weighted step-wise linear regression (SWR), while the non-linear model is built using probabilistic neural networks (PNN). The porosity estimated by PNN is better suited to the well log data than the SWR result. This can be understood since PNN performs non-linear regression, so that the relationship between the attribute data and the predicted log data can be optimized. The distribution of chalk sand has been successfully identified and characterized by porosity values ranging from 23% up to 30%.
NASA Astrophysics Data System (ADS)
Zhang, Changjiang; Dai, Lijie; Ma, Leiming; Qian, Jinfang; Yang, Bo
2017-10-01
An objective technique is presented for estimating tropical cyclone (TC) inner-core two-dimensional (2-D) surface wind field structure using infrared satellite imagery and machine learning. For a TC with an eye, the eye contour is first segmented by a geodesic active contour model, from which the eye circumference is obtained as the TC eye size. A mathematical model is then established between the eye size and the radius of maximum wind obtained from past official TC reports to derive the 2-D surface wind field within the TC eye. Meanwhile, composite information about the latitude of the TC center, surface maximum wind speed, TC age, and critical wind radii of 34- and 50-kt winds can be combined to build another mathematical model for deriving the inner-core wind structure. After that, least squares support vector machine (LSSVM), radial basis function neural network (RBFNN), and linear regression are introduced, respectively, into the two mathematical models, which are then tested with sensitivity experiments on real TC cases. Verification shows that the inner-core 2-D surface wind field structure estimated by LSSVM is better than that of RBFNN and linear regression.
Chakraborty, Somsubhra; Weindorf, David C; Li, Bin; Ali, Md Nasim; Majumdar, K; Ray, D P
2014-07-01
This pilot study compared penalized spline regression (PSR) and random forest (RF) regression using visible and near-infrared diffuse reflectance spectroscopy (VisNIR DRS) derived spectra of 164 petroleum-contaminated soils after two different spectral pretreatments [first derivative (FD) and standard normal variate (SNV) followed by detrending] for rapid quantification of soil petroleum contamination. Additionally, a new analytical approach was proposed for the recovery of the pure spectral and concentration profiles of n-hexane present in the unresolved mixture of petroleum-contaminated soils using multivariate curve resolution alternating least squares (MCR-ALS). The PSR model using FD spectra (r² = 0.87, RMSE = 0.580 log₁₀ mg kg⁻¹, and residual prediction deviation = 2.78) outperformed all other models tested. Quantitative results obtained by MCR-ALS for n-hexane in the presence of interferences (r² = 0.65 and RMSE = 0.261 log₁₀ mg kg⁻¹) were comparable to those obtained using the FD (PSR) model. Furthermore, MCR-ALS was able to recover the pure spectra of n-hexane. Copyright © 2014 Elsevier Ltd. All rights reserved.
Peeters, Yvette; Boersma, Sandra N; Koopman, Hendrik M
2008-01-01
Background: The aim of this study is to further explore predictors of health-related quality of life in children with asthma using factors derived from the extended stress-coping model. While the stress-coping model has often been used as a frame of reference in studying health-related quality of life in chronic illness, few have actually tested the model in children with asthma. Method: In this survey study, data were obtained by means of self-report questionnaires from seventy-eight children with asthma and their parents. Based on data derived from these questionnaires, the constructs of the extended stress-coping model were assessed using regression analysis and path analysis. Results: The results of both regression analysis and path analysis reveal tentative support for the proposed relationships between predictors and health-related quality of life in the stress-coping model. Moreover, as indicated in the stress-coping model, HRQoL is only directly predicted by coping. Both coping strategies, 'emotional reaction' (significantly) and 'avoidance', are directly related to HRQoL. Conclusion: In children with asthma, the extended stress-coping model appears to be a useful theoretical framework for understanding the impact of the illness on their quality of life. Consequently, the factors suggested by this model should be taken into account when designing optimal psychosocial-care interventions. PMID:18366753
Applicability of linear regression equation for prediction of chlorophyll content in rice leaves
NASA Astrophysics Data System (ADS)
Li, Yunmei
2005-09-01
A modeling approach is used to assess the applicability of derived equations capable of predicting the chlorophyll content of rice leaves at a given view direction. Two radiative transfer models, the PROSPECT model operated at leaf level and the FCR model operated at canopy level, are used in the study. The study consists of three steps: (1) simulation of bidirectional reflectance from canopies with different leaf chlorophyll contents, leaf area index (LAI) and understorey configurations; (2) establishment of prediction relations for chlorophyll content by stepwise regression; and (3) assessment of the applicability of these relations. The result shows that the accuracy of prediction is affected by different understorey configurations; however, the accuracy tends to be greatly improved with increasing LAI.
Odegård, J; Klemetsdal, G; Heringstad, B
2005-04-01
Several selection criteria for reducing incidence of mastitis were developed from a random regression sire model for test-day somatic cell score (SCS). For comparison, sire transmitting abilities were also predicted based on a cross-sectional model for lactation mean SCS. Only first-crop daughters were used in genetic evaluation of SCS, and the different selection criteria were compared based on their correlation with incidence of clinical mastitis in second-crop daughters (measured as mean daughter deviations). Selection criteria were predicted based on both complete and reduced first-crop daughter groups (261 or 65 daughters per sire, respectively). For complete daughter groups, predicted transmitting abilities at around 30 d in milk showed the best predictive ability for incidence of clinical mastitis, closely followed by average predicted transmitting abilities over the entire lactation. Both of these criteria were derived from the random regression model. These selection criteria improved accuracy of selection by approximately 2% relative to a cross-sectional model. However, for reduced daughter groups, the cross-sectional model yielded increased predictive ability compared with the selection criteria based on the random regression model. This result may be explained by the cross-sectional model being more robust, i.e., less sensitive to precision of (co)variance components estimates and effects of data structure.
Aylward, Lesa L; Brunet, Robert C; Starr, Thomas B; Carrier, Gaétan; Delzell, Elizabeth; Cheng, Hong; Beall, Colleen
2005-08-01
Recent studies demonstrating a concentration dependence of elimination of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) suggest that previous estimates of exposure for occupationally exposed cohorts may have underestimated actual exposure, resulting in a potential overestimate of the carcinogenic potency of TCDD in humans based on the mortality data for these cohorts. Using a database on U.S. chemical manufacturing workers potentially exposed to TCDD compiled by the National Institute for Occupational Safety and Health (NIOSH), we evaluated the impact of using a concentration- and age-dependent elimination model (CADM) (Aylward et al., 2005) on estimates of serum lipid area under the curve (AUC) for the NIOSH cohort. These data were used previously by Steenland et al. (2001) in combination with a first-order elimination model with an 8.7-year half-life to estimate cumulative serum lipid concentration (equivalent to AUC) for these workers for use in cancer dose-response assessment. Serum lipid TCDD measurements taken in 1988 for a subset of the cohort were combined with the NIOSH job exposure matrix and work histories to estimate dose rates per unit of exposure score. We evaluated the effect of choices in regression model (regression on untransformed vs. ln-transformed data and inclusion of a nonzero regression intercept) as well as the impact of choices of elimination models and parameters on estimated AUCs for the cohort. Central estimates for dose rate parameters derived from the serum-sampled subcohort were applied with the elimination models to time-specific exposure scores for the entire cohort to generate AUC estimates for all cohort members. Use of the CADM resulted in improved model fits to the serum sampling data compared to the first-order models. Dose rates varied by a factor of 50 among different combinations of elimination model, parameter sets, and regression models. 
Use of a CADM results in increases of up to five-fold in AUC estimates for the more highly exposed members of the cohort compared to estimates obtained using the first-order model with 8.7-year half-life. This degree of variation in the AUC estimates for this cohort would affect substantially the cancer potency estimates derived from the mortality data from this cohort. Such variability and uncertainty in the reconstructed serum lipid AUC estimates for this cohort, depending on elimination model, parameter set, and regression model, have not been described previously and are critical components in evaluating the dose-response data from the occupationally exposed populations.
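For contrast with the CADM, the simple first-order elimination model with the 8.7-year half-life used by Steenland et al. can be written down directly. This is a sketch with arbitrary concentration units; the CADM itself, which makes the elimination rate depend on concentration and age, is not reproduced here:

```python
import math

HALF_LIFE_YEARS = 8.7
K = math.log(2) / HALF_LIFE_YEARS      # first-order elimination rate, 1/yr

def concentration(c0, t_years):
    """Serum lipid TCDD after t years of pure first-order elimination
    from initial concentration c0. The CADM instead slows elimination
    as concentration falls, which raises back-calculated exposures."""
    return c0 * math.exp(-K * t_years)

def auc_first_order(c0, t_years):
    """Analytic area under the concentration-time curve from 0 to t,
    the quantity used as the cumulative-exposure dose metric."""
    return c0 * (1.0 - math.exp(-K * t_years)) / K
```

By construction, the concentration halves every 8.7 years, and the AUC approaches c0/K as t grows large.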
NASA Astrophysics Data System (ADS)
Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos
2016-08-01
This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed from Normalized Difference Vegetation Indices (NDVI) derived from low-resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence, stretching from March to June, was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
Wang, Hongqing; Hladik, C.M.; Huang, W.; Milla, K.; Edmiston, L.; Harwell, M.A.; Schalles, J.F.
2010-01-01
Apalachicola Bay, Florida, accounts for 90% of Florida's and 10% of the nation's eastern oyster (Crassostrea virginica) harvest. Chlorophyll-a concentration and total suspended solids (TSS) are two important water quality variables, among other environmental factors such as salinity, for eastern oyster production in Apalachicola Bay. In this research, we developed regression models of the relationships between the reflectance of Moderate-Resolution Imaging Spectroradiometer (MODIS) Terra 250 m data and the two water quality variables, based on Bay-wide field data collected during 14-17 October 2002, a relatively dry period, and 3-5 April 2006, a relatively wet period, respectively. We then selected the best regression models (highest coefficient of determination, R²) to derive Bay-wide maps of chlorophyll-a concentration and TSS for the two periods. The MODIS-derived maps revealed large spatial and temporal variations in chlorophyll-a concentration and TSS across the entire Apalachicola Bay. © 2010 Taylor & Francis.
Brakebill, J.W.; Preston, S.D.
2003-01-01
The U.S. Geological Survey has developed a methodology for statistically relating nutrient sources and land-surface characteristics to nutrient loads of streams. The methodology is referred to as SPAtially Referenced Regressions On Watershed attributes (SPARROW), and relates measured stream nutrient loads to nutrient sources using nonlinear statistical regression models. A spatially detailed digital hydrologic network of stream reaches, stream-reach characteristics such as mean streamflow, water velocity, reach length, and travel time, and their associated watersheds supports the regression models. This network serves as the primary framework for spatially referencing potential nutrient source information such as atmospheric deposition, septic systems, point sources, land use, land cover, and agricultural sources, and land-surface characteristics such as land use, land cover, average-annual precipitation and temperature, slope, and soil permeability. In the Chesapeake Bay watershed, which covers parts of Delaware, Maryland, Pennsylvania, New York, Virginia, West Virginia, and Washington D.C., SPARROW was used to generate models estimating loads of total nitrogen and total phosphorus representing 1987 and 1992 land-surface conditions. The 1987 models used a hydrologic network derived from an enhanced version of the U.S. Environmental Protection Agency's digital River Reach File and coarse-resolution Digital Elevation Models (DEMs). A new hydrologic network was created to support the 1992 models by generating stream reaches representing surface-water pathways defined by flow direction and flow accumulation algorithms from higher resolution DEMs. On a reach-by-reach basis, stream reach characteristics essential to the modeling were transferred to the newly generated pathways or reaches from the enhanced River Reach File used to support the 1987 models.
To complete the new network, watersheds for each reach were generated using the direction of surface-water flow derived from the DEMs. This network improves upon existing digital stream data by increasing the level of spatial detail and providing consistency between the reach locations and topography. The hydrologic network also aids in illustrating the spatial patterns of predicted nutrient loads and sources contributed locally to each stream, and the percentages of nutrient load that reach Chesapeake Bay.
Gurung, Arun Bahadur; Aguan, Kripamoy; Mitra, Sivaprasad; Bhattacharjee, Atanu
2017-06-01
In Alzheimer's disease (AD), the level of the neurotransmitter acetylcholine (ACh) is reduced. Since acetylcholinesterase (AChE) cleaves ACh, inhibitors of AChE are much sought after for AD treatment. The side effects of current inhibitors necessitate the development of newer AChE inhibitors. Isoalloxazine derivatives have proved to be promising AChE inhibitors. However, their structure-activity relationship studies have not been reported to date. In the present work, various quantitative structure-activity relationship (QSAR) model-building methods such as multiple linear regression (MLR), partial least squares, and principal component regression were employed to derive 3D-QSAR models using steric and electrostatic field descriptors. A statistically significant model was obtained using MLR coupled with a stepwise selection method, having r² = 0.9405, cross-validated r² (q²) = 0.6683, and high predictability (pred_r² = 0.6206 and standard error pred_r²se = 0.2491). The steric and electrostatic contribution plot revealed three electrostatic fields (E_496, E_386 and E_577) and one steric field (S_60) contributing towards biological activity. A ligand-based 3D pharmacophore model was generated consisting of eight pharmacophore features. Isoalloxazine derivatives were docked against human AChE, which revealed critical residues implicated in hydrogen bonds as well as hydrophobic interactions. The binding modes of the docked complexes (AChE_IA1 and AChE_IA14) were validated by molecular dynamics simulation, which showed stable trajectories in terms of root mean square deviation, and molecular mechanics/Poisson-Boltzmann surface area binding free energy analysis revealed key residues contributing significantly to the overall binding energy. The present study may be useful in the design of more potent isoalloxazine derivatives as AChE inhibitors.
NASA Astrophysics Data System (ADS)
Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.
2017-06-01
The recent popularity of synthetic data as a statistical disclosure control technique has enabled the development of several methods for generating and analyzing such data, but these almost always rely on asymptotic distributions and are consequently inadequate for small-sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via posterior predictive sampling. Since it is based on exact distributions, this procedure may even be used on small-sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.
Aulenbach, Brent T.
2013-01-01
A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads, the Adjusted Maximum Likelihood Estimator (AMLE), and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model’s calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals resulted in observed AMLE precision to be significantly worse than the model calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of at least 15-days or shorter, when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration–discharge relationship. The models with the largest errors typically had poor high flow sampling coverage resulting in unrepresentative models. Increasing sampling frequency and/or targeted high flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models, than increasing calibration period length.
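The core of such a regression-based load estimator is a log-log rating curve relating concentration to discharge. The numpy sketch below uses synthetic data, a discharge-only model (real LOADEST-style models add season and time terms), and a simple lognormal smearing-style bias correction in place of AMLE, all of which are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
Q = rng.lognormal(mean=3.0, sigma=1.0, size=365)            # daily discharge
lnC = -2.0 + 1.4 * np.log(Q) + rng.normal(scale=0.2, size=365)
C = np.exp(lnC)                                             # concentration

# Fit the rating curve ln(C) = b0 + b1 * ln(Q) by least squares.
A = np.column_stack([np.ones_like(Q), np.log(Q)])
b0, b1 = np.linalg.lstsq(A, np.log(C), rcond=None)[0]

# Naive back-transformation underestimates the mean concentration, so
# apply a lognormal bias correction exp(s^2/2) from the residual variance.
resid = np.log(C) - A @ np.array([b0, b1])
C_hat = np.exp(b0 + b1 * np.log(Q) + resid.var() / 2)
load = float(np.sum(C_hat * Q))                             # C * Q summed
```

Serial correlation in the residuals, the issue the study highlights, would make the standard errors of `b0` and `b1` optimistic even though the point estimates here recover the true coefficients.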
Validating a Predictive Model of Acute Advanced Imaging Biomarkers in Ischemic Stroke.
Bivard, Andrew; Levi, Christopher; Lin, Longting; Cheng, Xin; Aviv, Richard; Spratt, Neil J; Lou, Min; Kleinig, Tim; O'Brien, Billy; Butcher, Kenneth; Zhang, Jingfen; Jannes, Jim; Dong, Qiang; Parsons, Mark
2017-03-01
Advanced imaging to identify tissue pathophysiology may provide more accurate prognostication than the clinical measures currently used in stroke. This study aimed to derive and validate a predictive model for functional outcome based on acute clinical and advanced imaging measures. A database of prospectively collected sub-4.5-hour patients with ischemic stroke being assessed for thrombolysis at 5 centers, who had computed tomographic perfusion and computed tomographic angiography before a treatment decision, was assessed. Individual variable cut points were derived from a classification and regression tree analysis. The optimal cut points for each assessment variable were then used in a backward logistic regression to predict modified Rankin scale (mRS) scores of 0 to 1 and 5 to 6. The variables remaining in the models were then assessed using receiver operating characteristic curve analysis. Overall, 1519 patients were included in the study, 635 in the derivation cohort and 884 in the validation cohort. The model was highly accurate at predicting mRS score of 0 to 1 in all patients considered for thrombolysis therapy (area under the curve [AUC] 0.91), those who were treated (AUC 0.88) and those with recanalization (AUC 0.89). Next, the model was highly accurate at predicting mRS score of 5 to 6 in all patients considered for thrombolysis therapy (AUC 0.91), those who were treated (AUC 0.89) and those with recanalization (AUC 0.91). The odds ratio of thrombolysed patients who met the model criteria achieving mRS score of 0 to 1 was 17.89 (4.59-36.35, P<0.001) and for mRS score of 5 to 6 was 8.23 (2.57-26.97, P<0.001). This study derived and validated a highly accurate model for predicting patient outcome after ischemic stroke. © 2017 American Heart Association, Inc.
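The AUC figures above come from ROC analysis; the statistic itself reduces to the Mann-Whitney rank-sum identity, the fraction of (event, non-event) pairs that the model ranks correctly. A generic numpy sketch (not the authors' code):

```python
import numpy as np

def roc_auc(y_true, score):
    """Area under the ROC curve via the rank-sum identity: the share of
    positive/negative pairs ranked correctly, with ties counting half."""
    y_true = np.asarray(y_true)
    score = np.asarray(score, dtype=float)
    pos = score[y_true == 1]
    neg = score[y_true == 0]
    diff = pos[:, None] - neg[None, :]           # all pairwise comparisons
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(correct / (len(pos) * len(neg)))
```

An AUC of 0.91, as reported for the mRS 0 to 1 model, means a randomly chosen patient with the outcome is ranked above a randomly chosen patient without it 91% of the time.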
Evaluation of the CEAS model for barley yields in North Dakota and Minnesota
NASA Technical Reports Server (NTRS)
Barnett, T. L. (Principal Investigator)
1981-01-01
The CEAS yield model is based upon multiple regression analysis at the CRD and state levels. For the historical time series, yield is regressed on a set of variables derived from monthly mean temperature and monthly precipitation. Technological trend is represented by piecewise linear and/or quadratic functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test (1970-79) demonstrated that biases are small and that performance, as indicated by the root mean square errors, is acceptable for the intended application; however, model response for individual years, particularly unusual years, is not very reliable and shows some large errors. The model is objective, adequate, timely, simple and not costly. It considers scientific knowledge on a broad scale but not in detail, and does not provide a good current measure of modeled yield reliability.
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p >> n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p, small n' setting is over-fitting: just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and have proved useful in empirical studies. Recently, Wu proposed penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the L1-penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
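The connection the paper formalizes, an L1 penalty yielding "shrunken" centroids, runs through the soft-thresholding operator. A minimal numpy sketch follows; it illustrates the mechanism but not the exact nearest-shrunken-centroid (PAM) within-class standardization:

```python
import numpy as np

def soft_threshold(d, lam):
    """sign(d) * max(|d| - lam, 0): the closed-form minimizer of
    0.5*(d - x)^2 + lam*|x|, i.e. how an L1 penalty shrinks a
    coefficient toward zero and sets small ones exactly to zero."""
    return np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)

def shrunken_centroids(X, y, lam):
    """Shrink each class centroid's deviation from the overall centroid.
    Genes whose deviations vanish for every class drop out of the
    classifier entirely, giving automatic gene selection."""
    y = np.asarray(y)
    overall = X.mean(axis=0)
    return {c: overall + soft_threshold(X[y == c].mean(axis=0) - overall, lam)
            for c in np.unique(y)}
```

Because soft-thresholding zeroes out small deviations, only genes with class-centroid offsets exceeding the penalty survive, which is exactly the over-fitting guard the abstract describes.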
Stone, Wesley W.; Gilliom, Robert J.; Crawford, Charles G.
2008-01-01
Regression models were developed for predicting annual maximum and selected annual maximum moving-average concentrations of atrazine in streams using the Watershed Regressions for Pesticides (WARP) methodology developed by the National Water-Quality Assessment Program (NAWQA) of the U.S. Geological Survey (USGS). The current effort builds on the original WARP models, which were based on the annual mean and selected percentiles of the annual frequency distribution of atrazine concentrations. Estimates of annual maximum and annual maximum moving-average concentrations for selected durations are needed to characterize the levels of atrazine and other pesticides for comparison to specific water-quality benchmarks for evaluation of potential concerns regarding human health or aquatic life. Separate regression models were derived for the annual maximum and annual maximum 21-day, 60-day, and 90-day moving-average concentrations. Development of the regression models used the same explanatory variables, transformations, model development data, model validation data, and regression methods as those used in the original development of WARP. The models accounted for 72 to 75 percent of the variability in the concentration statistics among the 112 sampling sites used for model development. Predicted concentration statistics from the four models were within a factor of 10 of the observed concentration statistics for most of the model development and validation sites. Overall, performance of the models for the development and validation sites supports the application of the WARP models for predicting annual maximum and selected annual maximum moving-average atrazine concentration in streams and provides a framework to interpret the predictions in terms of uncertainty. 
For streams with inadequate direct measurements of atrazine concentrations, the WARP model predictions for the annual maximum and the annual maximum moving-average atrazine concentrations can be used to characterize the probable levels of atrazine for comparison to specific water-quality benchmarks. Sites with a high probability of exceeding a benchmark for human health or aquatic life can be prioritized for monitoring.
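The "within a factor of 10" performance criterion used above is a log-space comparison of predicted and observed concentration statistics. A minimal sketch with hypothetical concentration values (not WARP data):

```python
import numpy as np

# Hypothetical observed vs. predicted annual maximum concentrations (micrograms/L).
observed  = np.array([0.8, 3.0, 12.0, 0.05])
predicted = np.array([1.5, 0.5, 150.0, 0.04])

# A prediction is "within a factor of 10" when |log10(pred/obs)| <= 1.
within_factor_10 = np.abs(np.log10(predicted / observed)) <= 1.0
fraction_within = within_factor_10.mean()  # share of sites meeting the criterion
```

Working in log10 ratios treats over- and under-prediction symmetrically, which matters for concentration data spanning several orders of magnitude.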
Topographic Metric Predictions of Soil Redistribution and Organic Carbon Distribution in Croplands
NASA Astrophysics Data System (ADS)
Mccarty, G.; Li, X.
2017-12-01
Landscape topography is a key factor controlling soil redistribution and soil organic carbon (SOC) distribution in Iowa croplands (USA). In this study, we adopted a combined approach based on carbon (13C) and cesium (137Cs) isotope tracers and digital terrain analysis to understand patterns of SOC redistribution and carbon sequestration dynamics as influenced by landscape topography in tilled cropland under long-term corn/soybean management. The fallout radionuclide 137Cs was used to estimate soil redistribution rates, and a Lidar-derived DEM was used to obtain a set of topographic metrics for digital terrain analysis. Soil redistribution rates and patterns of SOC distribution were examined across 560 sampling locations at two field sites as well as at a larger scale within the watershed. We used δ13C content in SOC to partition C3- and C4-plant-derived C density at 127 locations in one of the two field sites, with corn being the primary source of C4 C. Topography-based models were developed to simulate SOC distribution and soil redistribution using stepwise ordinary least squares regression (SOLSR) and stepwise principal component regression (SPCR). All topography-based models developed through SPCR and SOLSR demonstrated good simulation performance, explaining more than 62% of the variability in SOC density and soil redistribution rates across the two intensively sampled field sites. However, the SOLSR models showed lower reliability than the SPCR models in predicting SOC density at the watershed scale. Spatial patterns of C3-derived SOC density were highly related to those of SOC density. Topographic metrics exerted substantial influence on C3-derived SOC density, with the SPCR model accounting for 76.5% of the spatial variance. In contrast, C4-derived SOC density had poor spatial structure, likely reflecting the substantial contribution of corn vegetation to recently sequestered SOC density.
Results of this study highlight the utility of topographic SPCR models for scaling field measurements of SOC density and soil redistribution rates to the watershed scale, which will allow watershed models to better predict the fate of ecosystem C in agricultural landscapes.
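Principal component regression of the kind used in the SPCR models can be sketched compactly: standardize the topographic metrics, project them onto their leading principal components, and fit ordinary least squares on the component scores. This is a minimal illustration with synthetic data; the stepwise component selection used in the study is omitted:

```python
import numpy as np

def pc_regression(X, y, n_components):
    """Principal component regression: standardize predictors, project onto the
    leading principal components, and fit OLS on the component scores."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    # Principal components from the SVD of the standardized predictor matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef  # fitted values

# Synthetic "topographic metrics" and a response that is linear in them.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3, 0.0])
fitted = pc_regression(X, y, n_components=4)
```

Because the components are orthogonal, dropping the trailing ones discards the least-informative directions of the predictor space, which is what makes SPCR more robust to collinear terrain metrics than stepwise OLS on the raw variables.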
Voit, E O; Knapp, R G
1997-08-15
The linear-logistic regression model and Cox's proportional hazard model are widely used in epidemiology. Their successful application leaves no doubt that they are accurate reflections of observed disease processes and their associated risks or incidence rates. In spite of their prominence, it is not a priori evident why these models work. This article presents a derivation of the two models from the framework of canonical modeling. It begins with a general description of the dynamics between risk sources and disease development, formulates this description in the canonical representation of an S-system, and shows how the linear-logistic model and Cox's proportional hazard model follow naturally from this representation. The article interprets the model parameters in terms of epidemiological concepts as well as in terms of general systems theory and explains the assumptions and limitations generally accepted in the application of these epidemiological models.
Babcock, Chad; Finley, Andrew O.; Bradford, John B.; Kolka, Randall K.; Birdsey, Richard A.; Ryan, Michael G.
2015-01-01
Many studies and production inventory systems have shown the utility of coupling covariates derived from Light Detection and Ranging (LiDAR) data with forest variables measured on georeferenced inventory plots through regression models. The objective of this study was to propose and assess the use of a Bayesian hierarchical modeling framework that accommodates both residual spatial dependence and non-stationarity of model covariates through the introduction of spatial random effects. We explored this objective using four forest inventory datasets that are part of the North American Carbon Program, each comprising point-referenced measures of above-ground forest biomass and discrete LiDAR. For each dataset, we considered at least five regression model specifications of varying complexity. Models were assessed based on goodness of fit criteria and predictive performance using a 10-fold cross-validation procedure. Results showed that the addition of spatial random effects to the regression model intercept improved fit and predictive performance in the presence of substantial residual spatial dependence. Additionally, in some cases, allowing either some or all regression slope parameters to vary spatially, via the addition of spatial random effects, further improved model fit and predictive performance. In other instances, models showed improved fit but decreased predictive performance—indicating over-fitting and underscoring the need for cross-validation to assess predictive ability. The proposed Bayesian modeling framework provided access to pixel-level posterior predictive distributions that were useful for uncertainty mapping, diagnosing spatial extrapolation issues, revealing missing model covariates, and discovering locally significant parameters.
Prediction model for the return to work of workers with injuries in Hong Kong.
Xu, Yanwen; Chan, Chetwyn C H; Lo, Karen Hui Yu-Ling; Tang, Dan
2008-01-01
This study attempts to formulate a prediction model of return to work for a group of workers in Hong Kong who had been suffering from chronic pain and physical injury while also being out of work. The study used the Case-Based Reasoning (CBR) method and compared the result with the statistical method of a logistic regression model. The database for the CBR algorithm was composed of 67 cases that were also used in the logistic regression model. The testing cases were 32 participants who had a similar background and characteristics to those in the database. Constraint setting and a Euclidean distance metric were used in CBR to search the database for the cases closest to the trial case. The usefulness of the algorithm was tested on the 32 new participants, and the accuracy of predicting return-to-work outcomes was 62.5%, which was no better than the 71.2% accuracy derived from the logistic regression model. The results of the study enable a better understanding of CBR as applied in the field of occupational rehabilitation by comparison with conventional regression analysis. The findings also shed light on the development of relevant interventions for the return-to-work process of these workers.
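The case-retrieval step of CBR reduces to a nearest-neighbor search under the Euclidean metric. A minimal sketch with invented feature values (the actual case features and constraints used in the study are not specified here):

```python
import numpy as np

def closest_cases(case, database, k=3):
    """Rank database cases by Euclidean distance to the trial case and
    return the indices and distances of the k closest."""
    d = np.linalg.norm(database - case, axis=1)
    order = np.argsort(d)
    return order[:k], d[order[:k]]

# Toy database: each row is a past case's (age, pain score, months off work).
# In practice features would be scaled so no single one dominates the distance.
db = np.array([[45, 6.0, 12],
               [30, 3.0, 4],
               [52, 7.5, 18],
               [33, 3.5, 5]])
trial = np.array([31, 3.2, 4.5])
idx, dist = closest_cases(trial, db, k=2)
```

The outcome of the retrieved cases (here, whether they returned to work) would then be used as the prediction for the trial case.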
Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States
NASA Astrophysics Data System (ADS)
Yang, J.; Astitha, M.; Schwartz, C. S.
2017-12-01
Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over the northeastern United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for the GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performance of the post-processing technique on a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of the GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real time.
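The core of BLR post-processing has a closed form: under a Gaussian prior on the weights and Gaussian noise, the posterior mean of the regression coefficients is a ridge-like solve on the observation-model pairs. This sketch uses invented forecast/observation numbers and standard textbook precisions alpha and beta, not the study's actual optimization:

```python
import numpy as np

def blr_posterior_mean(X, y, alpha=1e-6, beta=1.0):
    """Posterior mean of regression weights under a zero-mean Gaussian prior
    with precision alpha and Gaussian noise with precision beta
    (equivalent to ridge regression with penalty alpha/beta)."""
    S_inv = alpha * np.eye(X.shape[1]) + beta * X.T @ X
    return beta * np.linalg.solve(S_inv, X.T @ y)

# Toy observation-model pairs: correct a raw forecast toward observations.
y_raw = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # raw model forecasts
obs   = 1.0 + 0.5 * y_raw                      # hypothetical observed values
X = np.column_stack([np.ones_like(y_raw), y_raw])
w = blr_posterior_mean(X, obs)
corrected = X @ w                              # post-processed forecast
```

In the gridded variant, coefficients like w are estimated at stations and then interpolated to every model grid point.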
NASA Astrophysics Data System (ADS)
Park, Kyungjeen
This study aims to develop an objective hurricane initialization scheme which incorporates not only forecast model constraints but also observed features such as the initial intensity and size. It is based on the four-dimensional variational (4D-Var) bogus data assimilation (BDA) scheme originally proposed by Zou and Xiao (1999). The 4D-Var BDA consists of two steps: (i) specifying a bogus sea level pressure (SLP) field based on parameters observed by the Tropical Prediction Center (TPC) and (ii) assimilating the bogus SLP field under a forecast model constraint to adjust all model variables. This research focuses on improving the specification of the bogus SLP in the first step. Numerical experiments are carried out for Hurricane Bonnie (1998) and Hurricane Gordon (2000) to test the sensitivity of hurricane track and intensity forecasts to the specification of the initial vortex. Major results are listed below: (1) A linear regression model is developed for determining the size of the initial vortex based on the TPC-observed radius of 34-kt winds. (2) A method is proposed to derive a radial profile of SLP from QuikSCAT surface winds. This profile is shown to be more realistic than ideal profiles derived from Fujita's and Holland's formulae. (3) It is found that it takes about 1 h for the hurricane prediction model to develop a conceptually correct hurricane structure, featuring a dominant role of hydrostatic balance at the initial time and a dynamic adjustment in less than 30 minutes. (4) Numerical experiments suggest that track prediction is less sensitive to the specification of the initial vortex structure than intensity forecasts. (5) Hurricane initialization using the QuikSCAT-derived initial vortex produced a reasonably good forecast of hurricane landfall, with a position error of 25 km and a 4-h delay in landfall time.
(6) Numerical experiments using the linear regression model for the size specification considerably outperform all the other formulations tested in terms of intensity prediction for both hurricanes. For example, the maximum track error is less than 110 km during the entire three-day forecasts for both hurricanes. The simulated Hurricane Gordon using the linear regression model made a nearly perfect landfall, with no position error and only a 1-h error in landfall time. (7) Diagnosis of model output indicates that the initial vortex specified by the linear regression model produces larger surface fluxes of sensible heat, latent heat and moisture, as well as stronger downward angular momentum transport, than all the other schemes do. These enhanced energy supplies offset the energy loss caused by friction and gravity wave propagation, allowing the model to maintain a strong and realistic hurricane during the entire forward model integration.
NASA Astrophysics Data System (ADS)
Rogers, Jeffrey N.; Parrish, Christopher E.; Ward, Larry G.; Burdick, David M.
2018-03-01
Salt marsh vegetation tends to increase vertical uncertainty in light detection and ranging (lidar) derived elevation data, often causing the data to become ineffective for analysis of topographic features governing tidal inundation or vegetation zonation. Previous attempts at improving lidar data collected in salt marsh environments range from simply computing and subtracting the global elevation bias to more complex methods such as computing vegetation-specific, constant correction factors. The vegetation specific corrections can be used along with an existing habitat map to apply separate corrections to different areas within a study site. It is hypothesized here that correcting salt marsh lidar data by applying location-specific, point-by-point corrections, which are computed from lidar waveform-derived features, tidal-datum based elevation, distance from shoreline and other lidar digital elevation model based variables, using nonparametric regression will produce better results. The methods were developed and tested using full-waveform lidar and ground truth for three marshes in Cape Cod, Massachusetts, U.S.A. Five different model algorithms for nonparametric regression were evaluated, with TreeNet's stochastic gradient boosting algorithm consistently producing better regression and classification results. Additionally, models were constructed to predict the vegetative zone (high marsh and low marsh). The predictive modeling methods used in this study estimated ground elevation with a mean bias of 0.00 m and a standard deviation of 0.07 m (0.07 m root mean square error). These methods appear very promising for correction of salt marsh lidar data and, importantly, do not require an existing habitat map, biomass measurements, or image based remote sensing data such as multi/hyperspectral imagery.
NASA Astrophysics Data System (ADS)
Walawender, Ewelina; Walawender, Jakub P.; Ustrnul, Zbigniew
2017-02-01
The main purpose of the study is to introduce methods for mapping the spatial distribution of the occurrence of selected atmospheric phenomena (thunderstorms, fog, glaze and rime) over Poland from 1966 to 2010 (45 years). Limited in situ observations as well as the discontinuous and location-dependent nature of these phenomena make traditional interpolation inappropriate. Spatially continuous maps were created with the use of geospatial predictive modelling techniques. For each given phenomenon, an algorithm identifying its favourable meteorological and environmental conditions was created on the basis of observations recorded at 61 weather stations in Poland. Annual frequency maps presenting the probability of a day with a thunderstorm, fog, glaze or rime were created with the use of a modelled, gridded dataset by implementing the predefined algorithms. Relevant explanatory variables were derived from NCEP/NCAR reanalysis and downscaled with the use of a Regional Climate Model. The resulting maps of favourable meteorological conditions were found to be valuable and representative on the country scale, but with different correlation (r) strength against in situ data (from r = 0.84 for thunderstorms to r = 0.15 for fog). The weak correlation between gridded estimates of fog occurrence and observation data indicated the very local nature of this phenomenon. For this reason, additional environmental predictors of fog occurrence were also examined. Topographic parameters derived from the SRTM elevation model and reclassified CORINE Land Cover data were used as external explanatory variables for the multiple linear regression kriging used to obtain the final map. The regression model explained 89% of the variability in annual fog frequency in the study area. Regression residuals were interpolated via simple kriging.
NASA Astrophysics Data System (ADS)
Carisi, Francesca; Domeneghetti, Alessio; Kreibich, Heidi; Schröter, Kai; Castellarin, Attilio
2017-04-01
Flood risk is a function of flood hazard and vulnerability; therefore its accurate assessment depends on a reliable quantification of both factors. The scientific literature proposes a number of objective and reliable methods for assessing flood hazard, yet it highlights a limited understanding of the fundamental damage processes. Loss modelling is associated with large uncertainty which is, among other factors, due to a lack of standard procedures; for instance, flood losses are often estimated based on damage models derived in completely different contexts (i.e. different countries or geographical regions) without checking their applicability, or by considering only one explanatory variable (typically water depth). We consider the Secchia river flood event of January 2014, when a sudden levee breach caused the inundation of nearly 200 km2 in Northern Italy. In the aftermath of this event, local authorities collected flood loss data, together with additional information on affected private households and industrial activities (e.g. building surface area and economic value, number of employees, and others). Based on these data we implemented and compared a quadratic-regression damage function, with water depth as the only explanatory variable, and a multi-variable model that combines multiple regression trees and considers several explanatory variables (i.e. bagging decision trees). Our results show the importance of data collection, revealing that (1) a simple quadratic-regression damage function based on empirical data from the study area can be significantly more accurate than literature damage models derived for a different context, and (2) multi-variable modelling may outperform the uni-variable approach, yet it is more difficult to develop and apply due to a much higher demand for detailed data.
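A quadratic-regression damage function of the kind compared above is a one-variable polynomial fit of relative loss against water depth. A minimal sketch with invented depth-loss pairs (not the Secchia survey data):

```python
import numpy as np

# Hypothetical (water depth [m], relative loss in [0, 1]) pairs from a survey.
depth = np.array([0.2, 0.5, 1.0, 1.5, 2.0, 2.5])
loss  = np.array([0.05, 0.12, 0.28, 0.45, 0.60, 0.72])

# Quadratic-regression damage function: loss ~ a*depth^2 + b*depth + c.
a, b, c = np.polyfit(depth, loss, deg=2)

def damage(d):
    """Predicted relative loss at depth d, clipped to the valid range."""
    return np.clip(a * d**2 + b * d + c, 0.0, 1.0)
```

The multi-variable alternative (bagging decision trees) would instead average many regression trees, each fitted on a bootstrap sample of the loss records with several explanatory variables.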
Crop weather models of barley and spring wheat yield for agrophysical units in North Dakota
NASA Technical Reports Server (NTRS)
Leduc, S. (Principal Investigator)
1982-01-01
Models based on multiple regression were developed to estimate barley yield and spring wheat yield from weather data for agrophysical units (APU) in North Dakota. The predictor variables are derived from monthly average temperature and monthly total precipitation data at meteorological stations in the cooperative network. The models are similar in form to the previous models developed for Crop Reporting Districts (CRD). The trends and derived variables were the same, and the approach to selecting the significant predictors was similar to that used in developing the CRD models. The APU models show slight improvements in some of the model statistics, e.g., explained variation. These models are to be independently evaluated and compared to the previously evaluated CRD models. The comparison will indicate the preferred model area for this application, i.e., APU or CRD.
Lateral stability analysis for X-29A drop model using system identification methodology
NASA Technical Reports Server (NTRS)
Raney, David L.; Batterson, James G.
1989-01-01
A 22-percent dynamically scaled replica of the X-29A forward-swept-wing airplane has been flown in radio-controlled drop tests at the NASA Langley Research Center. A system identification study of the recorded data was undertaken to examine the stability and control derivatives that influence the lateral behavior of this vehicle with particular emphasis on an observed wing rock phenomenon. All major lateral stability derivatives and the damping-in-roll derivative were identified for angles of attack from 5 to 80 degrees by using a data-partitioning methodology and a modified stepwise regression algorithm.
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis in the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map land cover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, land cover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database, and the logistic regression coefficient of each factor was computed. The landslide hazard was then analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated for each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were compared with the field-verified landslide locations. Among the three cases in which the logistic regression coefficients were applied to the same study area, Selangor based on the Selangor coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest (86%). Similarly, among the six cases from the cross-application of logistic regression coefficients to the other two areas, Selangor based on the Cameron coefficients showed the highest prediction accuracy (90%), whereas Penang based on the Selangor coefficients showed the lowest (79%).
Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.
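Cross-application of a fitted logistic model amounts to evaluating the logistic function with one area's coefficients on another area's factor values. A minimal sketch with invented coefficients and three of the ten factors (slope, NDVI, rainfall, all assumed standardized):

```python
import numpy as np

def hazard_probability(features, coef, intercept):
    """Landslide probability from fixed logistic-regression coefficients."""
    z = intercept + features @ coef
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients fitted in study area A (slope, NDVI, rainfall)...
coef_area_a = np.array([0.8, -1.2, 0.5])
intercept_a = -2.0

# ...applied to standardized factor values of grid cells in a different area,
# which is the cross-validation step described above.
cells = np.array([[1.5, -0.5, 1.0],
                  [-0.8, 1.2, -0.3]])
probs = hazard_probability(cells, coef_area_a, intercept_a)
```

Ranking cells by these probabilities and comparing high-probability cells against field-verified landslide locations yields the prediction accuracies reported in the abstract.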
Delwiche, Stephen R; Reeves, James B
2010-01-01
In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an over-reliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R2) rather than a term based on residual error.
The graphical method has application to the evaluation of other preprocess functions and various types of spectroscopy data.
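The Savitzky-Golay pretreatments described above fit a low-order polynomial over a sliding window of spectral points and evaluate either the smoothed value or its derivative. A minimal sketch on a synthetic spectrum (the band shape and baseline are invented for illustration):

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic "spectrum": a Gaussian absorption band plus a sloped baseline.
x = np.linspace(0.0, 1.0, 101)
spectrum = np.exp(-((x - 0.5) / 0.08) ** 2) + 0.5 * x

# Savitzky-Golay preprocessing: quadratic polynomial over a 9-point window,
# matching the quadratic convolution functions used in the study.
smoothed = savgol_filter(spectrum, window_length=9, polyorder=2)
first_deriv = savgol_filter(spectrum, window_length=9, polyorder=2, deriv=1)

# A first derivative removes constant offsets; a second derivative also
# removes linear (sloped) baselines -- the pretreatments discussed above.
```

Varying window_length from 5 to 25 points reproduces the range of convolution widths examined in the calibration trials.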
Hong, Xia
2006-07-01
In this letter, a Box-Cox transformation-based radial basis function (RBF) neural network is introduced, using the RBF neural network to represent the transformed system output. Initially a fixed and moderately sized RBF model base is derived based on a rank-revealing orthogonal matrix triangularization (QR decomposition). Then a new fast identification algorithm is introduced that uses the Gauss-Newton algorithm to derive the required Box-Cox transformation based on a maximum likelihood estimator. The main contribution of this letter is to exploit the special structure of the proposed RBF neural network for computational efficiency by utilizing the block matrix inversion lemma. Finally, the Box-Cox transformation-based RBF neural network, with good generalization and sparsity, is identified based on the derived optimal Box-Cox transformation and a D-optimality-based orthogonal forward regression algorithm. The proposed algorithm and its efficacy are demonstrated with an illustrative example in comparison with support vector machine regression.
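The Box-Cox transformation at the heart of the letter has a simple closed form; the network models the output on this transformed scale. A minimal sketch of the transform itself, with a fixed parameter lam (the maximum-likelihood selection of lam via Gauss-Newton is omitted):

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of positive data x for parameter lam.
    lam = 0 is the log transform; lam = 1 is a simple shift by -1."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam
```

The transform is continuous in lam (the power case tends to the log case as lam approaches 0), which is what makes gradient-based selection of lam well behaved.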
Mathematical modeling of tetrahydroimidazole benzodiazepine-1-one derivatives as an anti HIV agent
NASA Astrophysics Data System (ADS)
Ojha, Lokendra Kumar
2017-07-01
The goal of the present work is the study of drug-receptor interaction via QSAR (Quantitative Structure-Activity Relationship) analysis for a set of 89 TIBO (Tetrahydroimidazole Benzodiazepine-1-one) derivatives. The MLR (Multiple Linear Regression) method is utilized to generate predictive models of quantitative structure-activity relationships between a set of molecular descriptors and biological activity (IC50). The best QSAR model was selected, having a correlation coefficient (r) of 0.9299, a Standard Error of Estimation (SEE) of 0.5022, a Fisher Ratio (F) of 159.822 and a Quality factor (Q) of 1.852. This model is statistically significant and strongly favours substitution by a sulphur atom (IS, the indicator parameter for the -Z position of the TIBO derivatives). Two other parameters, logP (octanol-water partition coefficient) and SAG (Surface Area Grid), also played a vital role in the generation of the best QSAR model. All three descriptors show very good stability toward data variation in leave-one-out (LOO) cross-validation.
Brazil wheat yield covariance model
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
A model based on multiple regression was developed to estimate wheat yields for the wheat growing states of Rio Grande do Sul, Parana, and Santa Catarina in Brazil. The meteorological data of these three states were pooled and the years 1972 to 1979 were used to develop the model since there was no technological trend in the yields during these years. Predictor variables were derived from monthly total precipitation, average monthly mean temperature, and average monthly maximum temperature.
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
A model based on multiple regression was developed to estimate soybean yields for the country of Argentina. A meteorological data set was obtained for the country by averaging data for stations within the soybean growing area. Predictor variables for the model were derived from monthly total precipitation and monthly average temperature. A trend variable was included for the years 1969 to 1978 since an increasing trend in yields due to technology was observed between these years.
Asymmetric dimethylarginine, related arginine derivatives, and incident atrial fibrillation.
Schnabel, Renate B; Maas, Renke; Wang, Na; Yin, Xiaoyan; Larson, Martin G; Levy, Daniel; Ellinor, Patrick T; Lubitz, Steven A; McManus, David D; Magnani, Jared W; Atzler, Dorothee; Böger, Rainer H; Schwedhelm, Edzard; Vasan, Ramachandran S; Benjamin, Emelia J
2016-06-01
Oxidative stress plays an important role in the development of atrial fibrillation (AF). Arginine derivatives including asymmetric dimethylarginine (ADMA) are central to nitric oxide metabolism and nitrosative stress. Whether blood concentrations of arginine derivatives are related to the incidence of AF is uncertain. In 3,310 individuals (mean age 58 ± 10 years, 54% women) from the community-based Framingham Study, we prospectively examined the relations of circulating levels of ADMA, l-arginine, symmetric dimethylarginine (SDMA), and the ratio of l-arginine/ADMA to the incidence of AF using proportional hazards regression models. Over a median follow-up time of 10 years, 247 AF cases occurred. Using age- and sex-adjusted regression models, ADMA was associated with a hazard ratio of 1.15 per 1-SD increase in loge-transformed biomarker concentration (95% CI 1.02-1.29, P = .02) for AF, which was no longer significant after further risk factor adjustment (hazard ratio 1.09, 95% CI 0.97-1.23, P = .15). Neither l-arginine nor SDMA was related to new-onset AF. A clinical model comprising clinical risk factors for AF (age, sex, height, weight, systolic blood pressure, diastolic blood pressure, current smoking, diabetes, hypertension treatment, myocardial infarction, and heart failure; c statistic = 0.781; 95% CI 0.753-0.808) was not improved by the addition of ADMA (0.782; 95% CI 0.755-0.809). ADMA and related arginine derivatives were not associated with incident AF in the community after accounting for other clinical risk factors and confounders. Their role in the pathogenesis of AF needs further refinement. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Ghavami, Raouf; Sadeghi, Faridoon; Rasouli, Zolikha; Djannati, Farhad
2012-12-01
Experimental values for the 13C NMR chemical shifts (ppm, TMS = 0) at 300 K, ranging from 96.28 ppm (C4' of indole derivative 17) to 159.93 ppm (C4' of indole derivative 23) relative to deuterated chloroform (CDCl3, 77.0 ppm) or dimethylsulfoxide (DMSO, 39.50 ppm) as internal reference in CDCl3 or DMSO-d6 solutions, were collected from the literature for thirty 2-functionalized 5-(methylsulfonyl)-1-phenyl-1H-indole derivatives containing different substituent groups. Effective quantitative structure-property relationship (QSPR) models were built using a hybrid method combining a genetic algorithm (GA) based on stepwise-selection multiple linear regression (SWS-MLR) as the feature-selection tool with correlation models between each carbon atom of the indole derivatives and calculated descriptors. Each compound was depicted by molecular structural descriptors that encode constitutional, topological, geometrical, electrostatic, and quantum chemical features. The accuracy of all developed models was confirmed using different types of internal and external procedures and various statistical tests. Furthermore, the domain of applicability for each model, which indicates the area of reliable predictions, was defined.
Personalized Modeling for Prediction with Decision-Path Models
Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.
2015-01-01
Deriving predictive models in medicine typically relies on a population approach, where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model, called a decision-path model, that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Trees (CART), a population decision tree method, in predicting seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision-path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570
Estimation of stature from sternum - Exploring the quadratic models.
Saraf, Ashish; Kanchan, Tanuj; Krishan, Kewal; Ateriya, Navneet; Setia, Puneet
2018-04-14
Identification of the dead is significant in the examination of unknown, decomposed and mutilated human remains. Establishing the biological profile is the central issue in such a scenario, and stature estimation remains one of the important criteria in this regard. The present study was undertaken to estimate stature from different parts of the sternum. A sample of 100 sterna was obtained from individuals during medicolegal autopsies. The length of the deceased and various measurements of the sternum were recorded. Student's t-test was performed to find sex differences in stature and the sternal measurements included in the study. Correlations between stature and sternal measurements were analysed using Karl Pearson's correlation, and linear and quadratic regression models were derived. All the measurements were found to be significantly larger in males than in females. Stature correlated best with the combined length of the sternum among males (R = 0.894), females (R = 0.859), and for the total sample (R = 0.891). The study showed that the models derived for stature estimation from the combined length of the sternum are likely to give the most accurate estimates of stature in forensic casework when compared to the manubrium and mesosternum. The accuracy of stature estimation further increased with quadratic models derived for the mesosternum among males and the combined length of the sternum among males and females, when compared to linear regression models. Future studies in different geographical locations and with a larger sample size are proposed to confirm the study observations. Copyright © 2018 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
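The linear-versus-quadratic comparison in the study is a polynomial degree choice in a one-variable regression. A minimal sketch with invented sternum-length and stature values (not the study's measurements):

```python
import numpy as np

# Hypothetical (combined sternum length [mm], stature [cm]) pairs.
length  = np.array([130.0, 140.0, 150.0, 160.0, 170.0, 180.0])
stature = np.array([155.0, 160.0, 166.0, 171.0, 175.0, 178.0])

linear    = np.polyfit(length, stature, deg=1)   # stature = b1*L + b0
quadratic = np.polyfit(length, stature, deg=2)   # stature = a*L^2 + b*L + c

def rmse(coef):
    """In-sample root mean square error of a fitted polynomial model."""
    return np.sqrt(np.mean((np.polyval(coef, length) - stature) ** 2))
```

Because the linear model is nested inside the quadratic one, the quadratic fit can only match or reduce in-sample error; whether that gain is real (as the study reports for some bones) has to be judged on independent cases.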
ERIC Educational Resources Information Center
Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul
2012-01-01
The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, [theta], to underlie the ordinal item scores (Takane & de Leeuw in…
NASA Astrophysics Data System (ADS)
Anand, Jasdeep S.; Monks, Paul S.
2017-07-01
Land use regression (LUR) models have been used in epidemiology to determine the fine-scale spatial variation in air pollutants such as nitrogen dioxide (NO2) in cities and larger regions. However, they are often limited in their temporal resolution, which may potentially be rectified by employing the synoptic coverage provided by satellite measurements. In this work a mixed-effects LUR model is developed to model daily surface NO2 concentrations over the Hong Kong SAR during the period 2005-2015. In situ measurements from the Hong Kong Air Quality Monitoring Network, along with tropospheric vertical column density (VCD) data from the OMI, GOME-2A, and SCIAMACHY satellite instruments, were combined with fine-scale land use parameters to provide the spatiotemporal information necessary to predict daily surface concentrations. Cross-validation with the in situ data shows that the mixed-effects LUR model using OMI data has a high predictive power (adj. R2 = 0.84), especially when compared with surface concentrations derived using the MACC-II reanalysis model dataset (adj. R2 = 0.11). Time series analysis shows no statistically significant trend in NO2 concentrations during 2005-2015, despite a reported decline in NOx emissions. This study demonstrates the utility of combining satellite data with LUR models to derive daily maps of ambient surface NO2 for use in exposure studies.
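The mixed-effects idea behind such a model, a fixed-effect regression on covariates plus a day-specific intercept, can be sketched on synthetic data. Here the random effect is crudely approximated by the mean residual of each day rather than fitted by maximum likelihood, so this is only an illustration of why the day-level term raises predictive power.

```python
# Sketch: fixed-effects OLS vs the same fit plus a day-specific intercept
# (mean residual per day as a crude stand-in for a fitted random effect).
# Data are synthetic.
import numpy as np

rng = np.random.default_rng(2)
n_days, n_sites = 50, 10
day = np.repeat(np.arange(n_days), n_sites)
x = rng.normal(size=n_days * n_sites)               # e.g. satellite NO2 VCD
day_effect = rng.normal(scale=1.0, size=n_days)     # daily meteorology etc.
y = 2.0 + 1.5 * x + day_effect[day] + rng.normal(scale=0.3, size=n_days * n_sites)

# fixed part: pooled OLS
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# day-specific intercepts from mean residuals
day_int = np.array([resid[day == d].mean() for d in range(n_days)])
yhat = X @ beta + day_int[day]

r2_fixed = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
r2_mixed = 1 - np.sum((y - yhat)**2) / np.sum((y - y.mean())**2)
print(round(r2_fixed, 3), round(r2_mixed, 3))
```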
Zhang, Jingyi; Li, Bin; Chen, Yumin; Chen, Meijie; Fang, Tao; Liu, Yongfeng
2018-06-11
This paper proposes a regression model using the Eigenvector Spatial Filtering (ESF) method to estimate ground PM 2.5 concentrations. Covariates are derived from remotely sensed data including aerosol optical depth, normalized difference vegetation index, surface temperature, air pressure, relative humidity, height of the planetary boundary layer, and a digital elevation model. In addition, cultural variables such as factory densities and road densities are also used in the model. With the Yangtze River Delta region as the study area, we constructed ESF-based Regression (ESFR) models at different time scales, using data for the period between December 2015 and November 2016. We found that the ESFR models effectively filtered spatial autocorrelation in the OLS residuals and resulted in increases in the goodness-of-fit metrics as well as reductions in residual standard errors and cross-validation errors, compared to the classic OLS models. The annual ESFR model explained 70% of the variability in PM 2.5 concentrations, 16.7% more than the non-spatial OLS model. With the ESFR models, we performed detailed analyses of the spatial and temporal distributions of PM 2.5 concentrations in the study area. The model predictions are lower than ground observations but match the general trend. The experiment shows that ESFR provides a promising approach to PM 2.5 analysis and prediction.
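A minimal sketch of the ESF mechanism, assuming binary contiguity weights and synthetic data: eigenvectors of the doubly centred spatial weights matrix MCM are appended to the design matrix, so that spatially structured variation is absorbed by the eigenvector covariates rather than left in the OLS residuals.

```python
# Sketch of Eigenvector Spatial Filtering (ESF) on synthetic data:
# leading eigenvectors of M C M (M = centring projector, C = spatial weights)
# are added as covariates to filter spatial autocorrelation.
import numpy as np

rng = np.random.default_rng(3)
n = 100
coords = rng.uniform(0, 1, size=(n, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
C = (dist < 0.15).astype(float)                 # binary contiguity weights
np.fill_diagonal(C, 0.0)

M = np.eye(n) - np.ones((n, n)) / n             # centring projector
vals, vecs = np.linalg.eigh(M @ C @ M)
E = vecs[:, np.argsort(vals)[::-1][:10]]        # 10 leading eigenvectors

x = rng.normal(size=n)
y = 1.0 + 0.8 * x + 5.0 * E[:, 0] + rng.normal(scale=0.3, size=n)

def fit_r2(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - np.sum((y - X @ b) ** 2) / np.sum((y - y.mean()) ** 2)

r2_ols = fit_r2(np.column_stack([np.ones(n), x]), y)    # non-spatial OLS
r2_esf = fit_r2(np.column_stack([np.ones(n), x, E]), y) # ESF-augmented
print(round(r2_ols, 3), round(r2_esf, 3))
```

The gap between the two R² values mirrors the paper's reported improvement of the ESFR models over classic OLS.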
NASA Astrophysics Data System (ADS)
Kawano, N.; Varquez, A. C. G.; Dong, Y.; Kanda, M.
2016-12-01
Numerical models such as the Weather Research and Forecasting model coupled with a single-layer Urban Canopy Model (WRF-UCM) are powerful tools for investigating the urban heat island. Urban parameters such as average building height (Have), plan area index (λp), and frontal area index (λf) are necessary inputs for the model. In general, these parameters are assumed to be uniform in WRF-UCM, but this leads to an unrealistic urban representation. Distributed urban parameters can also be incorporated into WRF-UCM to capture detailed urban effects. The problem is that distributed building information is not readily available for most megacities, especially in developing countries. Furthermore, acquiring real building parameters often requires a huge amount of time and money. In this study, we investigated the potential of globally available satellite-captured datasets for estimating the parameters Have, λp, and λf. The global datasets comprised a high-spatial-resolution population dataset (LandScan by Oak Ridge National Laboratory), nighttime lights (NOAA), and vegetation fraction (NASA). True samples of Have, λp, and λf were acquired from actual building footprints from satellite images and the 3D building databases of Tokyo, New York, Paris, Melbourne, Istanbul, Jakarta, and other cities. Regression equations were then derived from the block-averaging of spatial pairs of real parameters and global datasets. Results show that two regression curves are necessary to estimate Have and λf from the combination of population and nightlight, depending on the city's level of development. An index which can be used to decide which equation to use for a given city is the Gross Domestic Product (GDP). On the other hand, λp has less dependence on GDP but showed a negative relationship to vegetation fraction. 
Finally, a simplified but precise approximation of urban parameters through readily-available, high-resolution global datasets and our derived regressions can be utilized to estimate a global distribution of urban parameters for later incorporation into a weather model, thus allowing us to acquire a global understanding of urban climate (Global Urban Climatology). Acknowledgment: This research was supported by the Environment Research and Technology Development Fund (S-14) of the Ministry of the Environment, Japan.
Quantile Regression Models for Current Status Data
Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen
2016-01-01
Current status data arise frequently in demography, epidemiology, and econometrics, where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator, computed using the concave-convex procedure, is developed for parameter estimation, and its confidence intervals are constructed using a subsampling method. Asymptotic properties of the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging. PMID:27994307
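The quantile-regression building block of the model above can be sketched by minimising the check (pinball) loss. This hedged example uses fully observed synthetic failure times, not current status data, and is not the paper's M-estimator; it only shows what "the conditional quantile of failure time is a linear function of covariates" means.

```python
# Sketch: linear conditional-quantile model fitted by minimising the
# check (pinball) loss on fully observed synthetic failure times.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, tau = 500, 0.5                                   # median regression
x = rng.uniform(0, 2, n)
t = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)   # failure times

def check_loss(beta):
    """Pinball loss for the tau-th conditional quantile b0 + b1*x."""
    u = t - (beta[0] + beta[1] * x)
    return np.sum(u * (tau - (u < 0)))

fit = minimize(check_loss, x0=np.zeros(2), method="Nelder-Mead")
b0, b1 = fit.x
print(round(b0, 3), round(b1, 3))
```

With symmetric errors the median fit recovers the generating line; the paper's contribution is estimating such coefficients when `t` is only known to lie before or after an observation time.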
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and to compare it with existing methods based on the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and a smaller classification error than those with tuning parameters selected using the AIC, BIC, or EBIC. We illustrate the application of MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
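The CV-AUC selection principle can be sketched as follows. scikit-learn has no MCP penalty, so an L1-penalized (lasso) logistic model stands in for MCP here, and the data and penalty grid are illustrative; the principle, pick the penalty level that maximises cross-validated AUC, is the same.

```python
# Sketch: tuning-parameter selection by cross-validated AUC.
# L1 logistic regression stands in for MCP (not available in sklearn).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n, p = 200, 50                                      # sparse, moderately high-dim
X = rng.normal(size=(n, p))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=1.0, size=n) > 0).astype(int)

grid = [0.01, 0.1, 1.0, 10.0]                       # inverse penalty strengths
cv_auc = []
for C in grid:
    m = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    cv_auc.append(cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean())

best_C = grid[int(np.argmax(cv_auc))]
print(best_C, round(max(cv_auc), 3))
```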
NASA Astrophysics Data System (ADS)
Czernecki, Bartosz; Nowosad, Jakub; Jabłońska, Katarzyna
2018-04-01
Changes in the timing of plant phenological phases are important proxies in contemporary climate research. However, most of the commonly used traditional phenological observations do not give any coherent spatial information. While consistent spatial data can be obtained from airborne sensors and preprocessed gridded meteorological data, few studies robustly benefit from these data sources. Therefore, the main aim of this study is to create and evaluate different statistical models for reconstructing, predicting, and improving the quality of phenological phase monitoring with the use of satellite and meteorological products. A quality-controlled dataset of 13 BBCH plant phenophases in Poland was collected for the period 2007-2014. For each phenophase, statistical models were built using the most commonly applied regression-based machine learning techniques, such as multiple linear regression, the lasso, principal component regression, generalized boosted models, and random forests. The quality of the models was estimated using k-fold cross-validation. The obtained results showed varying potential for coupling meteorologically derived indices with remote sensing products in terms of phenological modeling; however, using both data sources improves model accuracy by 0.6 to 4.6 days in terms of RMSE. A robust prediction of early phenological phases is mostly related to meteorological indices, whereas for autumn phenophases there is a stronger information signal provided by satellite-derived vegetation metrics. Choosing a specific set of predictors and applying robust preprocessing procedures is more important for the final results than the selection of a particular statistical model. The average RMSE for the best models across all phenophases is 6.3 days, while individual RMSEs vary seasonally from 3.5 to 10 days. Models give a reliable proxy for ground observations, with RMSE below 5 days for early spring and late spring phenophases. 
For other phenophases, RMSEs are higher, rising to 9-10 days in the case of the earliest spring phenophases.
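The evaluation scheme above, a regression-based machine-learning model scored by k-fold cross-validated RMSE, can be sketched on synthetic data; a random forest is used as the example learner, and the predictor names are hypothetical stand-ins for meteorological and satellite indices.

```python
# Sketch: k-fold cross-validated RMSE for a regression-based ML model
# (random forest) on synthetic "index -> phenophase onset day" data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 4))                       # e.g. GDD, NDVI metrics
y = 100 + 8 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2.0, size=300)

rmses = []
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[tr], y[tr])
    rmses.append(np.sqrt(np.mean((m.predict(X[te]) - y[te]) ** 2)))
rmse = float(np.mean(rmses))
print(round(rmse, 2))
```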
An Examination of Master's Student Retention & Completion
ERIC Educational Resources Information Center
Barry, Melissa; Mathies, Charles
2011-01-01
This study was conducted at a research-extensive public university in the southeastern United States. It examined the retention and completion of master's degree students across numerous disciplines. Results were derived from descriptive statistics, t-tests, and a series of binary logistic regression models. The findings from binary…
Changes in Clavicle Length and Maturation in Americans: 1840-1980.
Langley, Natalie R; Cridlin, Sandra
2016-01-01
Secular changes refer to short-term biological changes ostensibly due to environmental factors. Two well-documented secular trends in many populations are earlier age of menarche and increasing stature. This study synthesizes data on maximum clavicle length and fusion of the medial epiphysis in 1840-1980 American birth cohorts to provide a comprehensive assessment of developmental and morphological change in the clavicle. Clavicles from the Hamann-Todd Human Osteological Collection (n = 354), McKern and Stewart Korean War males (n = 341), Forensic Anthropology Data Bank (n = 1,239), and the McCormick Clavicle Collection (n = 1,137) were used in the analysis. Transition analysis was used to evaluate fusion of the medial epiphysis (scored as unfused, fusing, or fused). Several statistical treatments were used to assess fluctuations in maximum clavicle length. First, Durbin-Watson tests were used to evaluate autocorrelation, and a local regression (LOESS) was used to identify visual shifts in the regression slope. Next, piecewise regression was used to fit linear regression models before and after the estimated breakpoints. Multiple starting parameters were tested in the range determined to contain the breakpoint, and the model with the smallest mean squared error was chosen as the best fit. The parameters from the best-fit models were then used to derive the piecewise models, which were compared with the initial simple linear regression models to determine which model provided the best fit for the secular change data. The epiphyseal union data indicate a decline in the age at onset of fusion since the early twentieth century. Fusion commences approximately four years earlier in mid- to late twentieth-century birth cohorts than in late nineteenth- and early twentieth-century birth cohorts. However, fusion is completed at roughly the same age across cohorts. 
The most significant decline in age at onset of epiphyseal union appears to have occurred since the mid-twentieth century. LOESS plots show a breakpoint in the clavicle length data around the mid-twentieth century in both sexes, and piecewise regression models indicate a significant decrease in clavicle length in the American population after 1940. The piecewise model provides a slightly better fit than the simple linear model. Since the linear model's standard error is not substantially different from the piecewise model's, an argument could be made for selecting the less complex linear model. However, we chose the piecewise model to capture changes in clavicle length that a single linear model smooths over. The decrease in maximum clavicle length is in line with a documented narrowing of the American skeletal form, as shown by analyses of cranial and facial breadth and bi-iliac breadth of the pelvis. Environmental influences on skeletal form include increases in body mass index, health improvements, improved socioeconomic status, and the elimination of infectious diseases. Secular changes in bony dimensions and skeletal maturation dictate that the medical and forensic standards used to deduce information about growth, health, and biological traits must be derived from modern populations.
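The piecewise (segmented) regression step described above can be sketched as a grid search over candidate breakpoints, fitting separate linear segments on each side and keeping the split with the smallest squared error. The birth-year and clavicle-length data below are synthetic, with a change point planted at 1940.

```python
# Sketch: grid-search piecewise regression. Synthetic data with a
# slope change planted at 1940 (not the study's measurements).
import numpy as np

rng = np.random.default_rng(7)
year = rng.uniform(1840, 1980, 300)
length = (150 + 0.02 * (year - 1840)
          - 0.15 * np.clip(year - 1940.0, 0.0, None)
          + rng.normal(scale=0.5, size=300))

def sse(bp):
    """Total squared error of two linear segments split at bp."""
    total = 0.0
    for mask in (year < bp, year >= bp):
        c = np.polyfit(year[mask], length[mask], 1)
        total += np.sum((np.polyval(c, year[mask]) - length[mask]) ** 2)
    return total

candidates = np.arange(1870, 1960, 5)
best_bp = min(candidates, key=sse)
print(best_bp)
```

In practice one would refine the breakpoint around the grid minimum and compare the piecewise fit's error against the single-line fit, as the study does.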
Scoring and staging systems using cox linear regression modeling and recursive partitioning.
Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H
2006-01-01
Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
[Exploring novel hyperspectral band and key index for leaf nitrogen accumulation in wheat].
Yao, Xia; Zhu, Yan; Feng, Wei; Tian, Yong-Chao; Cao, Wei-Xing
2009-08-01
The objectives of the present study were to explore new sensitive spectral bands and ratio spectral indices based on precise analysis of ground-based hyperspectral information, and then to develop a regression model for estimating leaf N accumulation per unit soil area (LNA) in winter wheat (Triticum aestivum L.). Three field experiments were conducted with different N rates and cultivar types in three consecutive growing seasons, and time-course measurements were taken of canopy hyperspectral reflectance and LNA under the various treatments. By adopting the method of reduced precise sampling, detailed ratio spectral indices (RSI) within the range of 350-2500 nm were constructed, and the quantitative relationships between LNA (g N m(-2)) and RSI (i, j) were analyzed. It was found that several key spectral bands and spectral indices were suitable for estimating LNA in wheat, and the spectral parameter RSI (990, 720) was the most reliable indicator of LNA in wheat. The regression model based on the best RSI was formulated as y = 5.095x - 6.040, with an R2 of 0.814. On testing of the derived equations with independent experiment data, the model on RSI (990, 720) had an R2 of 0.847 and an RRMSE of 24.7%. Thus, it is concluded that the present hyperspectral parameter RSI (990, 720) and the derived regression model can be reliably used for estimating LNA in winter wheat. These results provide feasible key bands and a technical basis for developing a portable instrument for monitoring wheat nitrogen status and for extracting useful spectral information from remote sensing images.
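The reported model can be applied directly: LNA = 5.095 × RSI(990, 720) − 6.040, where RSI(i, j) is the ratio of canopy reflectance at bands i and j (in nm). The reflectance values in the example below are hypothetical.

```python
# Worked example of the paper's model LNA = 5.095 * RSI(990, 720) - 6.040.
# The canopy reflectance values here are hypothetical.
def rsi(reflectance, i, j):
    """Ratio spectral index between bands i and j (nm)."""
    return reflectance[i] / reflectance[j]

def lna_estimate(reflectance):
    """Leaf N accumulation (g N m^-2) from the RSI(990, 720) model."""
    return 5.095 * rsi(reflectance, 990, 720) - 6.040

canopy = {990: 0.46, 720: 0.21}     # hypothetical band reflectances
print(round(lna_estimate(canopy), 3))
```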
Lim, Changwon
2015-03-30
Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. Copyright © 2014 John Wiley & Sons, Ltd.
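The shrinkage idea behind ridge estimation can be sketched in the linear case, where the estimator has the closed form (XᵀX + kI)⁻¹Xᵀy; the paper's robust ridge procedures for nonlinear dose-response models generalise this. The synthetic data below are nearly collinear, the situation in which ordinary least squares yields inflated standard errors.

```python
# Sketch: linear ridge estimator (X'X + kI)^{-1} X'y on nearly
# collinear synthetic data; k = 0 recovers OLS.
import numpy as np

rng = np.random.default_rng(8)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)        # nearly collinear predictor
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.5, size=n)

def ridge(X, y, k):
    """Ridge estimator; the penalty k shrinks ill-determined directions."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

b_ols = ridge(X, y, 0.0)
b_ridge = ridge(X, y, 1.0)
print(b_ols, b_ridge)
```

The ridge solution's norm is never larger than the OLS solution's, and the well-identified combination of coefficients (their sum here) is essentially unchanged, which is the stabilising behaviour the paper exploits.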
Pot, Gerda K; Stephen, Alison M; Dahm, Christina C; Key, Timothy J; Cairns, Benjamin J; Burley, Victoria J; Cade, Janet E; Greenwood, Darren C; Keogh, Ruth H; Bhaniani, Amit; McTaggart, Alison; Lentjes, Marleen AH; Mishra, Gita; Brunner, Eric J; Khaw, Kay Tee
2015-01-01
Background/Objectives: In spite of several studies relating dietary patterns to breast cancer risk, the evidence so far remains inconsistent. This study aimed to investigate associations of dietary patterns derived with three different methods with breast cancer risk. Subjects/Methods: The Mediterranean Diet Score (MDS), principal components analysis (PCA), and reduced rank regression (RRR) were used to derive dietary patterns in a case-control study of 610 breast cancer cases and 1891 matched controls within 4 UK cohort studies. Dietary intakes were collected prospectively using 4- to 7-day food diaries and the resulting food consumption data were grouped into 42 food groups. Conditional logistic regression models were used to estimate odds ratios (ORs) for associations between pattern scores and breast cancer risk, adjusting for relevant covariates. A separate model was fitted for post-menopausal women only. Results: The MDS was not associated with breast cancer risk (OR comparing 1st tertile with 3rd: 1.20 (95% CI 0.92; 1.56)), nor was the first PCA-derived dietary pattern, explaining 2.7% of the variation in diet and characterized by cheese, crisps and savoury snacks, legumes, nuts and seeds (OR 1.18 (95% CI 0.91; 1.53)). The first RRR-derived pattern, a ‘high-alcohol’ pattern, was associated with a higher risk of breast cancer (OR 1.27; 95% CI 1.00; 1.62), which was most pronounced in post-menopausal women (OR 1.46 (95% CI 1.08; 1.98)). Conclusions: A ‘high-alcohol’ dietary pattern derived with RRR was associated with an increased breast cancer risk; no evidence of associations of other dietary patterns with breast cancer risk was observed in this study. PMID:25052230
Enhancing the Bioconversion of Azelaic Acid to Its Derivatives by Response Surface Methodology.
Khairudin, Nurshafira; Basri, Mahiran; Fard Masoumi, Hamid Reza; Samson, Shazwani; Ashari, Siti Efliza
2018-02-13
Azelaic acid (AzA) and its derivatives have been known to be effective in the treatment of acne and various cutaneous hyperpigmentary disorders. The esterification of azelaic acid with lauryl alcohol (LA) to produce dilaurylazelate using immobilized lipase B from Candida antarctica (Novozym 435) is reported. Response surface methodology (RSM) was selected to optimize the reaction conditions. A well-fitting quadratic polynomial regression model for the acid conversion was established with regard to several parameters, including reaction time, temperature, enzyme amount, and substrate molar ratio. The regression equation obtained by the central composite design of RSM predicted that the optimal reaction conditions were a reaction time of 360 min, 0.14 g of enzyme, a reaction temperature of 46 °C, and a substrate molar ratio of 1:4.1. The results from the model were in good agreement with the experimental data and were within the experimental range (R² of 0.9732). The dilaurylazelate ester produced an inhibition zone of 9.0 ± 0.1 mm in diameter against Staphylococcus epidermidis S273. The normal fibroblast cell line (3T3) was used to assess the cytotoxicity of AzA and its derivative, the dilaurylazelate ester. The IC50 (50% inhibition of cell viability) value for AzA was 85.28 μg/mL, whereas that for the AzA derivative was more than 100 μg/mL. The 3T3 cells survived without any sign of toxicity from the AzA derivative; thus, it proved non-toxic in this MTT assay compared with AzA.
NASA Astrophysics Data System (ADS)
Liu, Ronghua; Sun, Qiaofeng; Hu, Tian; Li, Lian; Nie, Lei; Wang, Jiayue; Zhou, Wanhui; Zang, Hengchang
2018-03-01
As a powerful process analytical technology (PAT) tool, near infrared (NIR) spectroscopy has been widely used in real-time monitoring. In this study, NIR spectroscopy was applied to monitor multiple parameters of the traditional Chinese medicine (TCM) Shenzhiling oral liquid during the concentration process to guarantee the quality of products. Five lab-scale batches were employed to construct quantitative models for five chemical ingredients and one physical property (sample density) during the concentration process. Paeoniflorin, albiflorin, liquiritin, and sample density were modeled by partial least squares regression (PLSR), while the contents of glycyrrhizic acid and cinnamic acid were modeled by support vector machine regression (SVMR). Standard normal variate (SNV) and/or Savitzky-Golay (SG) smoothing with derivative methods were adopted for spectral pretreatment. Variable selection methods including the correlation coefficient (CC), competitive adaptive reweighted sampling (CARS), and interval partial least squares regression (iPLS) were performed to optimize the models. The results indicated that NIR spectroscopy is an effective tool for monitoring the concentration process of Shenzhiling oral liquid.
Reddy, M Srinivasa; Basha, Shaik; Joshi, H V; Sravan Kumar, V G; Jha, B; Ghosh, P K
2005-01-01
Alang-Sosiya is the largest ship-scrapping yard in the world, established in 1982. Every year an average of 171 ships, with a mean weight of 2.10 x 10^6 (+/- 7.82 x 10^5) light dead weight tonnage (LDT), are scrapped. Apart from scrapped metals, this yard generates a massive amount of combustible solid waste in the form of waste wood, plastic, insulation material, paper, glass wool, thermocol pieces (polyurethane foam material), sponge, oiled rope, cotton waste, rubber, etc. In this study, multiple regression analysis was used to develop predictive models for the energy content of combustible ship-scrapping solid wastes. The scope of work comprised qualitative and quantitative estimation of solid waste samples and a sequential selection procedure for isolating variables. Three regression models were developed to correlate the energy content (net calorific value (LHV)) with variables derived from material composition, proximate, and ultimate analyses. The performance of these models for this particular waste agrees well with the equations developed by other researchers (Dulong, Steuer, Scheurer-Kestner, and Bento) for estimating the energy content of municipal solid waste.
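The model-building step, multiple linear regression of net calorific value on composition variables, can be sketched with synthetic stand-in data; the component names and per-component heating values below are illustrative, not the study's measurements.

```python
# Sketch: multiple linear regression of calorific value on waste
# composition fractions. Data and coefficients are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(9)
n = 60
comp = rng.dirichlet([2.0, 2.0, 2.0, 2.0], size=n)  # wood/plastic/paper/other fractions
true_lhv = np.array([15.0, 35.0, 14.0, 5.0])        # MJ/kg per component (made up)
lhv = comp @ true_lhv + rng.normal(scale=0.5, size=n)

coef, *_ = np.linalg.lstsq(comp, lhv, rcond=None)
pred = comp @ coef
r2 = 1 - np.sum((lhv - pred) ** 2) / np.sum((lhv - lhv.mean()) ** 2)
print(coef.round(2), round(r2, 3))
```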
Uhrich, Mark A.; Kolasinac, Jasna; Booth, Pamela L.; Fountain, Robert L.; Spicer, Kurt R.; Mosbrucker, Adam R.
2014-01-01
Researchers at the U.S. Geological Survey, Cascades Volcano Observatory, investigated alternative methods for the traditional sample-based sediment record procedure in determining suspended-sediment concentration (SSC) and discharge. One such sediment-surrogate technique was developed using turbidity and discharge to estimate SSC for two gaging stations in the Toutle River Basin near Mount St. Helens, Washington. To provide context for the study, methods for collecting sediment data and monitoring turbidity are discussed. Statistical methods used include the development of ordinary least squares regression models for each gaging station. Issues of time-related autocorrelation also are evaluated. Addition of lagged explanatory variables was used to account for autocorrelation in the turbidity, discharge, and SSC data. Final regression model equations and plots are presented for the two gaging stations. The regression models support near-real-time estimates of SSC and improved suspended-sediment discharge records by incorporating continuous instream turbidity. Future use of such models may potentially lower the costs of sediment monitoring by reducing the time needed to collect and process samples and to derive a sediment-discharge record.
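Adding lagged explanatory variables to an OLS surrogate model, as described above, can be sketched on synthetic series; the variable names and coefficients are illustrative, not the gaging-station equations.

```python
# Sketch: OLS regression of SSC on turbidity and discharge with a
# one-step-lagged turbidity term to soak up serial correlation.
# Synthetic series, not the Toutle River data.
import numpy as np

rng = np.random.default_rng(10)
n = 400
turb = np.abs(np.cumsum(rng.normal(size=n))) + 1.0   # persistent turbidity series
q = np.abs(np.cumsum(rng.normal(size=n))) + 5.0      # discharge series
ssc = 2.0 * turb + 0.2 * q + rng.normal(scale=1.0, size=n)
ssc[1:] += 1.0 * turb[:-1]                           # dependence on lagged turbidity

# design matrix with current and one-step-lagged explanatory variables
X = np.column_stack([np.ones(n - 1), turb[1:], turb[:-1], q[1:]])
y = ssc[1:]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1 - np.sum((y - X @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
print(beta.round(2), round(r2, 3))
```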
NASA Technical Reports Server (NTRS)
Mcgwire, K.; Friedl, M.; Estes, J. E.
1993-01-01
This article describes research related to sampling techniques for establishing linear relations between land surface parameters and remotely-sensed data. Predictive relations are estimated between percentage tree cover in a savanna environment and a normalized difference vegetation index (NDVI) derived from the Thematic Mapper sensor. Spatial autocorrelation in original measurements and regression residuals is examined using semi-variogram analysis at several spatial resolutions. Sampling schemes are then tested to examine the effects of autocorrelation on predictive linear models in cases of small sample sizes. Regression models between image and ground data are affected by the spatial resolution of analysis. Reducing the influence of spatial autocorrelation by enforcing minimum distances between samples may also improve empirical models which relate ground parameters to satellite data.
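The semi-variogram analysis mentioned above rests on the empirical semivariance, gamma(h) = mean of 0.5*(z_i - z_j)^2 over point pairs separated by roughly h, which rises with distance when values are spatially autocorrelated. A hedged one-dimensional sketch on synthetic data:

```python
# Sketch: empirical semivariance on a synthetic 1-D transect of
# spatially autocorrelated values (a random walk).
import numpy as np

rng = np.random.default_rng(11)
pos = np.arange(200, dtype=float)                   # sample positions
z = np.cumsum(rng.normal(size=200))                 # autocorrelated field

def semivariance(pos, z, h, tol=0.5):
    """Mean of 0.5*(z_i - z_j)^2 over pairs at separation ~h."""
    d = np.abs(pos[:, None] - pos[None, :])
    mask = np.triu(np.abs(d - h) <= tol, k=1)       # each pair counted once
    i, j = np.nonzero(mask)
    return 0.5 * np.mean((z[i] - z[j]) ** 2)

g1, g10 = semivariance(pos, z, 1), semivariance(pos, z, 10)
print(round(g1, 3), round(g10, 3))
```

The increase of gamma with lag is what motivates enforcing minimum distances between samples: nearby samples carry redundant information.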
Delgado, Alfredo; Hays, Dirk B; Bruton, Richard K; Ceballos, Hernán; Novo, Alexandre; Boi, Enrico; Selvaraj, Michael Gomez
2017-01-01
Understanding root traits is a necessary research front for the selection of favorable genotypes or cultivation practices. Root and tuber crops, having most of their economic potential stored below ground, are favorable candidates for such studies. The ability to image and quantify subsurface root structure would allow breeders to classify root traits for rapid selection and allow agronomists to derive effective cultivation practices. In spite of the huge role of cassava (Manihot esculenta Crantz) in food security and industrial uses, little progress has been made in understanding the onset and rate of the root-bulking process and the factors that influence it. The objective of this research was to determine the capability of ground penetrating radar (GPR) to predict root-bulking rates through the detection of total root biomass during the growth cycle. Our research provides the first application of GPR for detecting below-ground biomass in cassava. Through an empirical study, linear regressions were derived to model cassava bulking rates. The linear equations derived suggest that GPR is a suitable measure of root biomass (r = .79). The regression analysis developed accounts for 63% of the variability in cassava biomass below ground. When modeling is performed at the variety level, the models for SM 1219-9 and TMS 60444 outperform the HMC-1 model (r² = .77, .63, and .51, respectively). Using current modeling methods, it is possible to predict below-ground biomass and estimate root-bulking rates for the selection of early root bulking in cassava. Results of this approach suggested that the general model over-predicts at early growth stages but becomes more precise in later root development.
Valentine, Nicole Britt; Bonsel, Gouke J.
2016-01-01
Background Intersectoral perspectives of health are present in the rhetoric of the sustainable development goals. Yet descriptions of systematic approaches for an intersectoral monitoring vision, joining determinants of health with barriers or facilitators to accessing healthcare services, are lacking. Objective To explore models of associations between health outcomes and health service coverage, and health determinants and health systems responsiveness, and thereby to contribute to monitoring, analysis, and assessment approaches informed by an intersectoral vision of health. Design The study is designed as a series of ecological, cross-country regression analyses, covering between 23 and 57 countries, with dependent health variables concentrated on the years 2002–2003. Countries cover a range of development contexts. Health outcome and health service coverage dependent variables were derived from World Health Organization (WHO) information sources. Predictor variables representing determinants are derived from the WHO and World Bank databases; variables used for health systems’ responsiveness are derived from the WHO World Health Survey. Responsiveness is a measure of the acceptability of health services to the population, complementing financial health protection. Results Health determinants’ indicators – access to improved drinking sources, accountability, and average years of schooling – were statistically significant in particular health outcome regressions. Statistically significant coefficients were more common for mortality rate regressions than for coverage rate regressions. Responsiveness was systematically associated with poorer health and health service coverage. With respect to levels of inequality in health, the indicator of responsiveness problems experienced by the unhealthy poor groups in the population was statistically significant for regressions on measles vaccination inequalities between rich and poor. 
For the broader determinants, the Gini mattered most for inequalities in child mortality; education mattered more for inequalities in births attended by skilled personnel. Conclusions This paper adds to the literature on comparative health systems research. National and international health monitoring frameworks need to incorporate indicators on trends in and impacts of other policy sectors on health. This will empower the health sector to carry out public health practices that promote health and health equity. PMID:26942516
Valentine, Nicole Britt; Bonsel, Gouke J
2016-01-01
Schuller, Alwin G; Barry, Evan R; Jones, Rhys D O; Henry, Ryan E; Frigault, Melanie M; Beran, Garry; Linsenmayer, David; Hattersley, Maureen; Smith, Aaron; Wilson, Joanne; Cairo, Stefano; Déas, Olivier; Nicolle, Delphine; Adam, Ammar; Zinda, Michael; Reimer, Corinne; Fawell, Stephen E; Clark, Edwin A; D'Cruz, Celina M
2015-06-15
Papillary renal cell carcinoma (PRCC) is the second most common cancer of the kidney and carries a poor prognosis for patients with nonlocalized disease. The HGF receptor MET plays a central role in PRCC and aberrations, either through mutation, copy number gain, or trisomy of chromosome 7 occurring in the majority of cases. The development of effective therapies in PRCC has been hampered in part by a lack of available preclinical models. We determined the pharmacodynamic and antitumor response of the selective MET inhibitor AZD6094 in two PRCC patient-derived xenograft (PDX) models. Two PRCC PDX models were identified and MET mutation status and copy number determined. Pharmacodynamic and antitumor activity of AZD6094 was tested using a dose response up to 25 mg/kg daily, representing clinically achievable exposures, and compared with the activity of the RCC standard-of-care sunitinib (in RCC43b) or the multikinase inhibitor crizotinib (in RCC47). AZD6094 treatment resulted in tumor regressions, whereas sunitinib or crizotinib resulted in unsustained growth inhibition. Pharmacodynamic analysis of tumors revealed that AZD6094 could robustly suppress pMET and the duration of target inhibition was dose related. AZD6094 inhibited multiple signaling nodes, including MAPK, PI3K, and EGFR. Finally, at doses that induced tumor regression, AZD6094 resulted in a dose- and time-dependent induction of cleaved PARP, a marker of cell death. Data presented provide the first report testing therapeutics in preclinical in vivo models of PRCC and support the clinical development of AZD6094 in this indication. ©2015 American Association for Cancer Research.
Spahr, Norman E.; Mueller, David K.; Wolock, David M.; Hitt, Kerie J.; Gronberg, JoAnn M.
2010-01-01
Data collected for the U.S. Geological Survey National Water-Quality Assessment program from 1992-2001 were used to investigate the relations between nutrient concentrations and nutrient sources, hydrology, and basin characteristics. Regression models were developed to estimate annual flow-weighted concentrations of total nitrogen and total phosphorus using explanatory variables derived from currently available national ancillary data. Different total-nitrogen regression models were used for agricultural (25 percent or more of basin area classified as agricultural land use) and nonagricultural basins. Atmospheric, fertilizer, and manure inputs of nitrogen, percent sand in soil, subsurface drainage, overland flow, mean annual precipitation, and percent undeveloped area were significant variables in the agricultural basin total nitrogen model. Significant explanatory variables in the nonagricultural total nitrogen model were total nonpoint-source nitrogen input (sum of nitrogen from manure, fertilizer, and atmospheric deposition), population density, mean annual runoff, and percent base flow. The concentrations of nutrients derived from regression (CONDOR) models were applied to drainage basins associated with the U.S. Environmental Protection Agency (USEPA) River Reach File (RF1) to predict flow-weighted mean annual total nitrogen concentrations for the conterminous United States. The majority of stream miles in the Nation have predicted concentrations less than 5 milligrams per liter. Concentrations greater than 5 milligrams per liter were predicted for a broad area extending from Ohio to eastern Nebraska, areas spatially associated with greater application of fertilizer and manure. Probabilities that mean annual total-nitrogen concentrations exceed the USEPA regional nutrient criteria were determined by incorporating model prediction uncertainty. 
In all nutrient regions where criteria have been established, there is at least a 50 percent probability of exceeding the criteria in more than half of the stream miles. Dividing calibration sites into agricultural and nonagricultural groups did not improve the explanatory capability for total phosphorus models. The group of explanatory variables that yielded the lowest model error for mean annual total phosphorus concentrations includes phosphorus input from manure, population density, amounts of range land and forest land, percent sand in soil, and percent base flow. However, the large unexplained variability and associated model error precluded the use of the total phosphorus model for nationwide extrapolations.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. The objective was to explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. 
Furthermore, the developed CLR models tended to achieve sensitivity more than 10% higher than the standard classification models, which translates to the correct labeling of an additional 400-500 readmissions for heart failure patients in the state of California over a year. Several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise awareness of the need to collect data on additional markers and to develop the necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
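The stratification-then-regression idea described in this abstract can be sketched in a few lines. The data below are synthetic stand-ins; the stratification rule, feature meanings, and coefficients are illustrative assumptions, not values from the HCUP study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for readmission data: x0 ~ number of chronic
# conditions, x1 ~ number of acute procedures (both illustrative).
n = 2000
X = rng.normal(size=(n, 2))
# The effect of x1 on readmission risk differs between the two strata.
stratum = (X[:, 0] > 0).astype(int)          # rule as if from a decision tree
logit = np.where(stratum == 1, 2.0 * X[:, 1], -1.0 * X[:, 1])
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Conditional logistic regression in the abstract's sense: one logistic
# model per stratum, mirroring the cohort stratification.
models = {}
for s in (0, 1):
    mask = stratum == s
    models[s] = LogisticRegression().fit(X[mask], y[mask])

# Predict by routing each record to its stratum's model.
pred = np.empty(n, dtype=int)
for s, m in models.items():
    mask = stratum == s
    pred[mask] = m.predict(X[mask])
accuracy = (pred == y).mean()
```

Routing each record to the model for its own stratum is what lets the per-stratum coefficients differ in sign and magnitude, which is the source of the accuracy gain a single pooled logistic regression cannot capture.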
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
Five models based on multiple regression were developed to estimate wheat yields for the five wheat-growing provinces of Argentina. Meteorological data sets were obtained for each province by averaging data for stations within each province. Predictor variables for the models were derived from monthly total precipitation, average monthly mean temperature, and average monthly maximum temperature. Buenos Aires was the only province for which a trend variable was included, because of an increasing trend in yield due to technology from 1950 to 1963.
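A minimal sketch of a yield model of this form, ordinary least squares on monthly weather aggregates plus a linear technology-trend term. All data and coefficients below are synthetic and illustrative, not the Argentine series:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-ins for province-averaged predictors: monthly
# precipitation, mean temperature, maximum temperature, plus a linear
# technology-trend term (as used for Buenos Aires).
years = np.arange(1950, 1984)
precip = rng.normal(80, 15, years.size)
tmean = rng.normal(15, 2, years.size)
tmax = rng.normal(24, 2, years.size)
trend = years - years[0]

# Synthetic yields with an upward technology trend built in.
yield_kg = (900 + 2.0 * precip - 10.0 * tmean + 12.0 * trend
            + rng.normal(0, 30, years.size))

# Design matrix with intercept; ordinary least squares fit.
Xd = np.column_stack([np.ones(years.size), precip, tmean, tmax, trend])
coef, *_ = np.linalg.lstsq(Xd, yield_kg, rcond=None)
fitted = Xd @ coef
r2 = 1 - ((yield_kg - fitted) ** 2).sum() / ((yield_kg - yield_kg.mean()) ** 2).sum()
```

The trend coefficient (last entry of `coef`) absorbs the technology signal so that the weather coefficients are not biased by it, which is the reason such a term is included only where a yield trend is actually observed.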
NASA Technical Reports Server (NTRS)
Callis, S. L.; Sakamoto, C.
1984-01-01
A model based on multiple regression was developed to estimate corn yields for the country of Argentina. A meteorological data set was obtained for the country by averaging data for stations within the corn-growing area. Predictor variables for the model were derived from monthly total precipitation, average monthly mean temperature, and average monthly maximum temperature. A trend variable was included for the years 1965 to 1980 since an increasing trend in yields due to technology was observed between these years.
Probabilistic Estimates of Global Mean Sea Level and its Underlying Processes
NASA Astrophysics Data System (ADS)
Hay, C.; Morrow, E.; Kopp, R. E.; Mitrovica, J. X.
2015-12-01
Local sea level can vary significantly from the global mean value due to a suite of processes that includes ongoing sea-level changes due to the last ice age, land water storage, ocean circulation changes, and non-uniform sea-level changes that arise when modern-day land ice rapidly melts. Understanding these sources of spatial and temporal variability is critical to estimating past and present sea-level change and projecting future sea-level rise. Using two probabilistic techniques, a multi-model Kalman smoother and Gaussian process regression, we have reanalyzed 20th century tide gauge observations to produce a new estimate of global mean sea level (GMSL). Our methods allow us to extract global information from the sparse tide gauge field by taking advantage of the physics-based and model-derived geometry of the contributing processes. Both methods provide constraints on the sea-level contribution of glacial isostatic adjustment (GIA). The Kalman smoother tests multiple discrete GIA models, probabilistically computing the most likely GIA model given the observations, while the Gaussian process regression characterizes the prior covariance structure of a suite of GIA models and then uses this structure to estimate the posterior distribution of local rates of GIA-induced sea-level change. We present the two methodologies, the model-derived geometries of the underlying processes, and our new probabilistic estimates of GMSL and GIA.
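The Gaussian process step can be illustrated with a toy version of the workflow: build a prior covariance from a suite of candidate process models, then condition on a few noisy point observations. Everything below is synthetic; the sine-curve "model suite" merely stands in for GIA predictions:

```python
import numpy as np

rng = np.random.default_rng(2)

# A suite of candidate process models (random smooth curves standing in
# for GIA predictions) defines the prior mean and covariance; sparse
# "tide gauge" observations then constrain the posterior by standard
# Gaussian process conditioning.
t = np.linspace(0, 1, 50)
suite = np.array([np.sin(2 * np.pi * (t + rng.uniform(0, 0.2))) * rng.uniform(0.5, 1.5)
                  for _ in range(200)])
prior_mean = suite.mean(axis=0)
K = np.cov(suite, rowvar=False) + 1e-6 * np.eye(t.size)   # prior covariance

truth = suite[0]                       # pretend one suite member is the truth
obs_idx = np.array([5, 20, 35, 45])    # sparse observation locations
noise = 0.05
y_obs = truth[obs_idx] + rng.normal(0, noise, obs_idx.size)

# GP conditioning: posterior mean at all points given noisy observations.
K_oo = K[np.ix_(obs_idx, obs_idx)] + noise**2 * np.eye(obs_idx.size)
K_ao = K[:, obs_idx]
post_mean = prior_mean + K_ao @ np.linalg.solve(K_oo, y_obs - prior_mean[obs_idx])

prior_err = np.abs(prior_mean - truth).mean()
post_err = np.abs(post_mean - truth).mean()
```

Because the covariance encodes how the candidate models co-vary in space, a handful of observations updates the estimate everywhere, which is exactly the mechanism that lets a sparse tide gauge field constrain a global field.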
Estimation of Chinese surface NO2 concentrations combining satellite data and Land Use Regression
NASA Astrophysics Data System (ADS)
Anand, J.; Monks, P.
2016-12-01
Monitoring surface-level air quality is often limited by in-situ instrument placement and issues arising from harmonisation over long timescales. Satellite instruments can offer a synoptic view of regional pollution sources, but in many cases only a total or tropospheric column can be measured. In this work a new technique of estimating surface NO2 combining both satellite and in-situ data is presented, in which a Land Use Regression (LUR) model is used to create high resolution pollution maps based on known predictor variables such as population density, road networks, and land cover. By employing a mixed effects approach, it is possible to take advantage of the spatiotemporal variability in the satellite-derived column densities to account for daily and regional variations in surface NO2 caused by factors such as temperature, elevation, and wind advection. In this work, surface NO2 maps are modelled over the North China Plain and Pearl River Delta during high-pollution episodes by combining in-situ measurements and tropospheric columns from the Ozone Monitoring Instrument (OMI). The modelled concentrations show good agreement with in-situ data and surface NO2 concentrations derived from the MACC-II global reanalysis.
Hirve, Siddhivinayak; Vounatsou, Penelope; Juvekar, Sanjay; Blomstedt, Yulia; Wall, Stig; Chatterji, Somnath; Ng, Nawi
2014-03-01
We compared prevalence estimates of self-rated health (SRH) derived indirectly using four different small area estimation methods for the Vadu (small) area from the national Study on Global AGEing (SAGE) survey with estimates derived directly from the Vadu SAGE survey. The indirect synthetic estimate for Vadu was 24%, whereas the model-based estimates were 45.6% and 45.7%, with smaller prediction errors and comparable to the direct survey estimate of 50%. The model-based techniques were better suited to estimate the prevalence of SRH than the indirect synthetic method. We conclude that a simplified mixed effects regression model can produce valid small area estimates of SRH. © 2013 Published by Elsevier Ltd.
Bhargava, Dinesh; Karthikeyan, C; Moorthy, N S H N; Trivedi, Piyush
2009-09-01
A QSAR study was carried out for a series of piperazinyl phenylalanine derivatives exhibiting VLA-4/VCAM-1 inhibitory activity to find out the structural features responsible for the biological activity. The QSAR study was carried out on V-life Molecular Design Suite software, and the best QSAR model, derived by the partial least square (forward) regression method, explained 85.67% of the variation in biological activity. The statistically significant model with high correlation coefficient (r2=0.85) was selected for further study, and the resulting validation parameters of the model, the cross-validated squared correlation coefficient (q2=0.76) and pred_r2 (0.42), show that the model has good predictive ability. The model showed that the parameters SaaNEindex, SsClcount, slogP, and 4PathCount are highly correlated with the VLA-4/VCAM-1 inhibitory activity of piperazinyl phenylalanine derivatives. The result of the study suggests that the chlorine atoms in the molecule and fourth order fragmentation patterns in the molecular skeleton favour the VLA-4/VCAM-1 inhibition shown by the title compounds, whereas lipophilicity and nitrogen bonded to aromatic bond are not conducive for VLA-4/VCAM-1 inhibitory activity.
Landslide susceptibility analysis with logistic regression model based on FCM sampling strategy
NASA Astrophysics Data System (ADS)
Wang, Liang-Jie; Sawada, Kazuhide; Moriguchi, Shuji
2013-08-01
Several mathematical models are used to predict the spatial distribution characteristics of landslides to mitigate damage caused by landslide disasters. Although some studies have achieved excellent results around the world, few studies take the inter-relationship of the selected points (training points) into account. In this paper, we present the Fuzzy c-means (FCM) algorithm as an optimal method for choosing the appropriate input landslide points as training data. Based on different combinations of the fuzzy exponent (m) and the number of clusters (c), five groups of sampling points were derived from formal seed cell points and applied to analyze the landslide susceptibility in Mizunami City, Gifu Prefecture, Japan. A logistic regression model is applied to create the models of the relationships between landslide-conditioning factors and landslide occurrence. The pre-existing landslide bodies and the area under the relative operating characteristic (ROC) curve were used to evaluate the performance of all the models with different m and c. The results revealed that Model no. 4 (m=1.9, c=4) and Model no. 5 (m=1.9, c=5) have significantly high classification accuracies, i.e., 90.0%. Moreover, over 30% of the landslide bodies were grouped under the very high susceptibility zone. In addition, Model no. 4 and Model no. 5 had higher area under the ROC curve (AUC) values, which were 0.78 and 0.79, respectively. Therefore, Model no. 4 and Model no. 5 offer better model results for landslide susceptibility mapping. Maps derived from Model no. 4 and Model no. 5 would offer the local authorities crucial information for city planning and development.
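The fuzzy c-means step used to select training points can be sketched directly in NumPy. The 2-D points below are synthetic stand-ins for seed-cell coordinates, with the fuzzy exponent set to m=1.9 as in the best-performing models of the study:

```python
import numpy as np

rng = np.random.default_rng(9)

# Two well-separated synthetic point clouds standing in for seed cells.
pts = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
m, c = 1.9, 2                            # fuzzy exponent and cluster count
U = rng.random((pts.shape[0], c))
U /= U.sum(axis=1, keepdims=True)        # fuzzy memberships, rows sum to 1

for _ in range(100):
    # Standard FCM alternation: centres from memberships, then
    # memberships from distances to the centres.
    W = U ** m
    centers = (W.T @ pts) / W.sum(axis=0)[:, None]
    d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = 1.0 / d ** (2 / (m - 1))
    U = inv / inv.sum(axis=1, keepdims=True)

labels = U.argmax(axis=1)
```

Points with high, unambiguous membership in one cluster can then be kept as training data for the logistic regression, which is the sampling idea the abstract describes.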
Barry T. Wilson; Joseph F. Knight; Ronald E. McRoberts
2018-01-01
Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bender, Edward T.
Purpose: To derive a radiobiological model that enables the estimation of brain necrosis and spinal cord myelopathy rates for a variety of fractionation schemes, and to compare repair effects between brain and spinal cord. Methods: Sigmoidal dose response relationships for brain radiation necrosis and spinal cord myelopathy are derived from clinical data using nonlinear regression. Three different repair models are considered and the repair halftimes are included as regression parameters. Results: For radiation necrosis, a repair halftime of 38.1 (range 6.9-76) h is found with monoexponential repair, while for spinal cord myelopathy, a repair halftime of 4.1 (range 0-8) h is found. The best-fit alpha/beta ratio is 0.96 (range 0.24-1.73). Conclusions: A radiobiological model that includes repair corrections can describe the clinical data for a variety of fraction sizes, fractionation schedules, and total doses. Modeling suggests a relatively long repair halftime for brain necrosis. This study suggests that the repair halftime for late radiation effects in the brain may be longer than is currently thought. If confirmed in future studies, this may lead to a re-evaluation of radiation fractionation schedules for some CNS diseases, particularly for those diseases where fractionated stereotactic radiation therapy is used.
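Fitting a sigmoidal dose-response curve by nonlinear regression, as described here, follows a standard pattern. The points below are hypothetical, not the clinical data used in the study:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

# Logistic sigmoid dose-response: d50 is the dose at 50% complication
# probability, k the steepness. Both values below are illustrative.
def sigmoid(d, d50, k):
    return 1.0 / (1.0 + np.exp(-k * (d - d50)))

dose = np.linspace(40, 90, 15)
p_true = sigmoid(dose, 65.0, 0.25)
p_obs = np.clip(p_true + rng.normal(0, 0.03, dose.size), 0, 1)

# Nonlinear least squares recovers the sigmoid parameters.
(d50_fit, k_fit), _ = curve_fit(sigmoid, dose, p_obs, p0=[60.0, 0.1])
```

In the paper's setting the repair halftime enters as an additional regression parameter that modifies the effective dose per scheme; the fitting machinery is the same.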
Conrad, Douglas J; Bailey, Barbara A; Hardie, Jon A; Bakke, Per S; Eagan, Tomas M L; Aarli, Bernt B
2017-01-01
Clinical phenotyping, therapeutic investigations as well as genomic, airway secretion metabolomic and metagenomic investigations can benefit from robust, nonlinear modeling of FEV1 in individual subjects. We demonstrate the utility of measuring FEV1 dynamics in representative cystic fibrosis (CF) and chronic obstructive pulmonary disease (COPD) populations. Individual FEV1 data from CF and COPD subjects were modeled by estimating median regression splines and their predicted first and second derivatives. Classes were created from variables that capture the dynamics of these curves in both cohorts. Nine FEV1 dynamic variables were identified from the splines and their predicted derivatives in individuals with CF (n = 177) and COPD (n = 374). Three FEV1 dynamic classes (i.e. stable, intermediate and hypervariable) were generated and described using these variables from both cohorts. In the CF cohort, the FEV1 hypervariable class (HV) was associated with a clinically unstable, female-dominated phenotype, while stable FEV1 class (S) individuals were highly associated with the male-dominated milder clinical phenotype. In the COPD cohort, associations were found between the FEV1 dynamic classes and the COPD GOLD grades, exacerbation frequency, and symptoms. Nonlinear modeling of FEV1 with splines provides new insights and is useful in characterizing CF and COPD clinical phenotypes.
NASA Astrophysics Data System (ADS)
Nijland, Wiebe; Nielsen, Scott E.; Coops, Nicholas C.; Wulder, Michael A.; Stenhouse, Gordon B.
2014-01-01
Food and habitat resources are critical components of wildlife management and conservation efforts. The grizzly bear (Ursus arctos) has diverse diets and habitat requirements particularly for understory plant species, which are impacted by human developments and forest management activities. We use light detection and ranging (LiDAR) data to predict the occurrence of 14 understory plant species relevant to bear forage and compare our predictions with more conventional climate- and land cover-based models. We use boosted regression trees to model each of the 14 understory species across 4435 km2 using occurrence (presence-absence) data from 1941 field plots. Three sets of models were fitted: climate only, climate and basic land and forest covers from Landsat 30-m imagery, and a climate- and LiDAR-derived model describing both the terrain and forest canopy. Resulting model accuracies varied widely among species. Overall, 8 of 14 species models were improved by including the LiDAR-derived variables. For climate-only models, mean annual precipitation and frost-free periods were the most important variables. With inclusion of LiDAR-derived attributes, depth-to-water table, terrain-intercepted annual radiation, and elevation were most often selected. This suggests that fine-scale terrain conditions affect the distribution of the studied species more than canopy conditions.
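A boosted-regression-tree occurrence model of this kind can be sketched with scikit-learn's gradient boosting. The predictors below (elevation, depth-to-water, radiation) are synthetic stand-ins for the LiDAR-derived variables named in the abstract, and the response is simulated presence-absence:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(4)

# Synthetic presence-absence data for one understory species, driven by
# illustrative terrain predictors (coefficients are made up).
n = 1500
elev = rng.uniform(500, 2000, n)    # elevation, m
dtw = rng.uniform(0, 10, n)         # depth-to-water index
rad = rng.uniform(0, 1, n)          # terrain-intercepted radiation, scaled
logit = 0.004 * (1200 - elev) + 0.6 * (5 - dtw) + 2.0 * rad
present = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Boosted regression trees: shallow trees combined additively.
X = np.column_stack([elev, dtw, rad])
brt = GradientBoostingClassifier(n_estimators=100, max_depth=2).fit(X, present)
train_acc = brt.score(X, present)
importances = brt.feature_importances_
```

The fitted ensemble's relative feature importances are what allow the kind of statement made in the abstract about which terrain variables were most often selected.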
NASA Astrophysics Data System (ADS)
Lin, M.; Yang, Z.; Park, H.; Qian, S.; Chen, J.; Fan, P.
2017-12-01
Impervious surface area (ISA) has become an important indicator for studying urban environments, but mapping ISA at the regional or global scale is still challenging due to the complexity of impervious surface features. The Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS) nighttime light (NTL) data and Moderate Resolution Imaging Spectroradiometer (MODIS) data are the major remote sensing data sources for regional ISA mapping. In many previous studies, a single regression relationship was established between fractional ISA and NTL, or various indices derived from NTL and the MODIS vegetation index (NDVI), for regional ISA mapping. However, due to the varying geographical, climatic, and socio-economic characteristics of different cities, the same regression relationship may vary significantly across cities in the same region in terms of both fitting performance (i.e., R2) and the rate of change (slope). In this study, we examined the regression relationship between fractional ISA and the Vegetation Adjusted Nighttime light Urban Index (VANUI) for 120 randomly selected cities around the world with a multilevel regression model. We found substantial variability in both the R2 (0.68±0.29) and slopes (0.64±0.40) among individual regressions, which suggests that multilevel/hierarchical models are needed to improve the accuracy of future regional ISA mapping. Further analysis showed that this variability is affected by climate conditions, socio-economic status, and urban spatial structures. However, all these effects are nonlinear rather than linear, and thus cannot be modeled explicitly in multilevel linear regression models.
Yadav, Dharmendra Kumar; Kalani, Komal; Singh, Abhishek K; Khan, Feroz; Srivastava, Santosh K; Pant, Aditya B
2014-01-01
In the present work, a QSAR model was derived by the multiple linear regression method for the prediction of anticancer activity of 18β-glycyrrhetinic acid derivatives against the human breast cancer cell line MCF-7. The QSAR model for anti-proliferative activity against MCF-7 showed high correlation (r(2)=0.90 and rCV(2)=0.83) and indicated that the chemical descriptors, namely dipole moment (debye), steric energy (kcal/mole), heat of formation (kcal/mole), ionization potential (eV), LogP, LUMO energy (eV) and shape index (basic kappa, order 3), correlate well with activity. The virtually predicted active derivatives were first semi-synthesized and characterized on the basis of their (1)H and (13)C NMR spectroscopic data and then were in-vitro tested against the MCF-7 cancer cell line. In particular, the octylamide derivative of glycyrrhetinic acid GA-12 has marked cytotoxic activity against MCF-7 similar to that of the standard anticancer drug paclitaxel. The biological assays of the active derivatives selected by virtual screening showed significant experimental activity.
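The r2/q2 validation pattern used here, a fitted statistic plus a leave-one-out cross-validated counterpart, is easy to reproduce. The descriptors and activities below are synthetic, not the 18β-glycyrrhetinic acid data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative QSAR-style data: a few hypothetical descriptors (think
# dipole moment, LogP, LUMO energy) against a synthetic activity.
n, p = 18, 3
X = rng.normal(size=(n, p))
activity = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.3, n)

# Multiple linear regression with intercept; fitted r^2.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, activity, rcond=None)
tss = ((activity - activity.mean()) ** 2).sum()
r2 = 1 - ((activity - A @ coef) ** 2).sum() / tss

# Leave-one-out cross-validated q^2: refit without each compound and
# predict it, accumulating the PRESS statistic.
press = 0.0
for i in range(n):
    keep = np.arange(n) != i
    c, *_ = np.linalg.lstsq(A[keep], activity[keep], rcond=None)
    press += (activity[i] - A[i] @ c) ** 2
q2 = 1 - press / tss
```

q2 is always at most r2 for a least-squares fit, so a large gap between them is the usual warning sign of overfitting in small QSAR datasets.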
2015-01-01
The ecological significance of fish and squid of the mesopelagic zone (200 m–1000 m) is evident by their pervasiveness in the diets of a broad spectrum of upper pelagic predators including other fishes and squids, seabirds and marine mammals. As diel vertical migrators, mesopelagic micronekton are recognized as an important trophic link between the deep scattering layer and upper surface waters, yet fundamental aspects of the life history and energetic contribution to the food web for most are undescribed. Here, we present newly derived regression equations for 32 species of mesopelagic fish and squid based on the relationship between body size and the size of hard parts typically used to identify prey species in predator diet studies. We describe the proximate composition and energy density of 31 species collected in the eastern Bering Sea during May 1999 and 2000. Energy values are categorized by body size as a proxy for relative age and can be cross-referenced with the derived regression equations. Data are tabularized to facilitate direct application to predator diet studies and food web models. PMID:26287534
Sinclair, Elizabeth H; Walker, William A; Thomason, James R
2015-01-01
NASA Astrophysics Data System (ADS)
Di Giacomo, Domenico; Bondár, István; Storchak, Dmitry A.; Engdahl, E. Robert; Bormann, Peter; Harris, James
2015-02-01
This paper outlines the re-computation and compilation of the magnitudes now contained in the final ISC-GEM Reference Global Instrumental Earthquake Catalogue (1900-2009). The catalogue is available via the ISC website (http://www.isc.ac.uk/iscgem/). The available re-computed MS and mb provided an ideal basis for deriving new conversion relationships to moment magnitude MW. Therefore, rather than using previously published regression models, we derived new empirical relationships using both generalized orthogonal linear and exponential non-linear models to obtain MW proxies from MS and mb. The new models were tested against true values of MW, and the newly derived exponential models were then preferred to the linear ones in computing MW proxies. For the final magnitude composition of the ISC-GEM catalogue, we preferred directly measured MW values as published by the Global CMT project for the period 1976-2009 (plus intermediate-depth earthquakes between 1962 and 1975). In addition, over 1000 publications have been examined to obtain direct seismic moment M0 and, therefore, also MW estimates for 967 large earthquakes during 1900-1978 (Lee and Engdahl, 2015) by various alternative methods to the current GCMT procedure. In all other instances we computed MW proxy values by converting our re-computed MS and mb values into MW, using the newly derived non-linear regression models. The final magnitude composition is an improvement in terms of magnitude homogeneity compared to previous catalogues. The magnitude completeness is not homogeneous over the 110 years covered by the ISC-GEM catalogue. Therefore, seismicity rate estimates may be strongly affected without a careful time window selection. In particular, the ISC-GEM catalogue appears to be complete down to MW 5.6 starting from 1964, whereas for the early instrumental period the completeness varies from ∼7.5 to 6.2. 
Further time and resources would be necessary to homogenize the magnitude of completeness over the entire catalogue length.
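Orthogonal regression differs from ordinary least squares by allowing errors in both variables, which is the situation for magnitude pairs where MS and MW are each uncertain. A minimal total-least-squares sketch on synthetic pairs (the 0.67/2.1 relation is an illustrative assumption, not the ISC-GEM regression):

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic magnitude pairs with noise on BOTH axes.
n = 300
ms_true = rng.uniform(5.0, 8.0, n)
mw_true = 0.67 * ms_true + 2.1          # assumed linear relation, illustrative
ms = ms_true + rng.normal(0, 0.15, n)
mw = mw_true + rng.normal(0, 0.15, n)

# Total least squares (orthogonal regression) via SVD of centred data:
# the singular vector of the smallest singular value is the normal of
# the best-fit line.
data = np.column_stack([ms - ms.mean(), mw - mw.mean()])
_, _, vt = np.linalg.svd(data, full_matrices=False)
a, b = vt[-1]
slope = -a / b
intercept = mw.mean() - slope * ms.mean()
```

Ordinary least squares of mw on ms would bias the slope toward zero when ms itself is noisy; the orthogonal fit avoids that attenuation, which is why it is the appropriate linear model for magnitude conversion.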
Raj, Retheep; Sivanandan, K S
2017-01-01
Estimation of elbow dynamics has been the object of numerous investigations. In this work a solution is proposed for estimating elbow movement velocity and elbow joint angle from Surface Electromyography (SEMG) signals. Here the Surface Electromyography signals are acquired from the biceps brachii muscle of the human arm. Two time-domain parameters, Integrated EMG (IEMG) and Zero Crossing (ZC), are extracted from the Surface Electromyography signal. The relationship of the time-domain parameters, IEMG and ZC, with elbow angular displacement and elbow angular velocity during extension and flexion of the elbow is studied. A multiple input-multiple output model is derived for identifying the kinematics of the elbow. A Nonlinear Auto Regressive with eXogenous inputs (NARX) structure based multiple layer perceptron neural network (MLPNN) model is proposed for the estimation of elbow joint angle and elbow angular velocity. The proposed NARX MLPNN model is trained using a Levenberg-Marquardt based algorithm. The proposed model estimates the elbow joint angle and elbow movement angular velocity with appreciable accuracy. The model is validated using the regression coefficient value (R). The average regression coefficient value (R) obtained for elbow angular displacement prediction is 0.9641 and for elbow angular velocity prediction is 0.9347. The NARX structure based MLPNN model can be used for the estimation of angular displacement and movement angular velocity of the elbow with good accuracy.
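The NARX idea, feeding lagged values of the output and of exogenous inputs into a neural network, can be sketched with a generic MLP regressor. The signals below are synthetic stand-ins for the IEMG/ZC features and the joint angle, and scikit-learn's MLP (not Levenberg-Marquardt training) is used for simplicity:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(8)

# Synthetic exogenous inputs (stand-ins for IEMG and ZC) and an output
# generated by a lagged dynamic system, mimicking a joint-angle signal.
T = 600
u1 = np.sin(np.linspace(0, 20, T)) + 0.05 * rng.normal(size=T)
u2 = np.cos(np.linspace(0, 20, T)) + 0.05 * rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.6 * y[t - 1] + 0.3 * u1[t - 1] - 0.2 * u2[t - 2]

# NARX feature construction: lagged outputs and lagged exogenous inputs.
lags = 2
X = np.array([[y[t - 1], y[t - 2], u1[t - 1], u1[t - 2], u2[t - 1], u2[t - 2]]
              for t in range(lags, T)])
target = y[lags:]

mlp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=3000, random_state=0)
mlp.fit(X, target)
r = np.corrcoef(mlp.predict(X), target)[0, 1]   # regression coefficient R
```

The correlation coefficient R between predictions and targets is the same validation figure reported in the abstract.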
Effects of temperature on embryonic development of lake herring (Coregonus artedii)
Colby, Peter J.; Brooke, L.T.
1973-01-01
Embryonic development of lake herring (Coregonus artedii) was observed in the laboratory at 13 constant temperatures from 0.0 to 12.1 C and in Pickerel Lake (Washtenaw County, Michigan) at natural temperature regimes. Rate of development during incubation was based on progression of the embryos through 20 identifiable stages. An equation was derived to predict development stage at constant temperatures, on the general assumption that development stage (DS) is a function of time (days, D) and temperature (T). The equation should also be useful in interpreting estimates from future regressions that include other environmental variables that affect egg development. A second regression model, derived primarily for fluctuating temperatures, related development rate for stage j (DRj), expressed as the reciprocal of time, to temperature (x). The generalized equation for a development stage is: DRj = a + bx + cx^2 + dx^3. In general, time required for embryos to reach each stage of development in Pickerel Lake agreed closely with the time predicted from this equation, derived from our laboratory observations. Hatching time was predicted within 1 day in 1969 and within 2 days in 1970. We used the equations derived with the second model to predict the effect of the superimposition of temperature increases of 1 and 2 C on the measured temperatures in Pickerel Lake. Conceivably, hatching dates could be affected sufficiently to jeopardize the first feeding of lake herring through loss of harmony between hatching date and seasonal food availability.
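The cubic development-rate model DRj = a + bx + cx^2 + dx^3 can be fitted by ordinary polynomial regression. A sketch with synthetic coefficients (not the values estimated for lake herring), including the conversion from rate back to days-to-stage:

```python
import numpy as np

# Synthetic development-rate data (1/days) versus temperature (deg C),
# generated from known cubic coefficients for illustration only.
a, b, c, d = 0.002, 0.004, 0.0008, -0.00003
temp = np.linspace(0.0, 12.0, 25)
dr = a + b * temp + c * temp**2 + d * temp**3

# Fit DRj = a + b*x + c*x^2 + d*x^3 (polyfit returns the highest degree first)
d_hat, c_hat, b_hat, a_hat = np.polyfit(temp, dr, 3)

# Rate is the reciprocal of time, so predicted days to reach the stage at
# a constant 8 C is 1 / DRj(8)
days_to_stage = 1.0 / (a_hat + b_hat * 8.0 + c_hat * 64.0 + d_hat * 512.0)
```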
NASA Astrophysics Data System (ADS)
Balidoy Baloloy, Alvin; Conferido Blanco, Ariel; Gumbao Candido, Christian; Labadisos Argamosa, Reginal Jay; Lovern Caboboy Dumalag, John Bart; Carandang Dimapilis, Lee, , Lady; Camero Paringit, Enrico
2018-04-01
Estimation of aboveground biomass (AGB) is essential in determining the environmental and economic values of mangrove forests. Biomass prediction models can be developed through integration of remote sensing, field data and statistical models. This study aims to assess and compare the biomass predictor potential of multispectral bands, vegetation indices and biophysical variables that can be derived from three optical satellite systems: Sentinel-2 with 10 m, 20 m and 60 m resolution; RapidEye with 5 m resolution; and PlanetScope with 3 m ground resolution. Field data for biomass were collected from a Rhizophoraceae-dominated mangrove forest in Masinloc, Zambales, Philippines where 30 test plots (1.2 ha) and 5 validation plots (0.2 ha) were established. Prior to the generation of indices, images from the three satellite systems were pre-processed using atmospheric correction tools in SNAP (Sentinel-2), ENVI (RapidEye) and python (PlanetScope). The major predictor bands tested are Blue, Green and Red, which are present in all three systems, and the Red-edge band from Sentinel-2 and RapidEye. The tested vegetation index predictors are Normalized Difference Vegetation Index (NDVI), Soil-adjusted Vegetation Index (SAVI), Green-NDVI (GNDVI), Simple Ratio (SR), and Red-edge Simple Ratio (SRre). The study generated prediction models through conventional linear regression and multivariate regression. Higher coefficient of determination (r2) values were obtained using multispectral band predictors for Sentinel-2 (r2 = 0.89) and PlanetScope (r2 = 0.80), and vegetation indices for RapidEye (r2 = 0.92). Multivariate Adaptive Regression Spline (MARS) models performed better than the linear regression models, with r2 ranging from 0.62 to 0.92. Based on the r2 and root-mean-square errors (RMSEs), the best biomass prediction model per satellite was chosen and maps were generated. 
The accuracy of the predicted biomass maps was high for both Sentinel-2 (r2 = 0.92) and RapidEye data (r2 = 0.91).
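As a sketch of the index-based prediction step, NDVI can be computed from NIR and Red reflectance and regressed against plot biomass. All values below are synthetic stand-ins, not the study's field data or fitted coefficients.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and Red reflectance."""
    return (nir - red) / (nir + red)

rng = np.random.default_rng(1)
red = rng.uniform(0.03, 0.08, 30)   # stand-ins for 30 field plots
nir = rng.uniform(0.30, 0.50, 30)
vi = ndvi(nir, red)

# Illustrative linear biomass model AGB = beta0 + beta1 * NDVI (t/ha);
# the coefficients are synthetic, not those estimated in the study.
agb = 40.0 + 120.0 * vi
beta1, beta0 = np.polyfit(vi, agb, 1)
```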
Assimilation of GOES-Derived Cloud Fields Into MM5
NASA Astrophysics Data System (ADS)
Biazar, A. P.; Doty, K. G.; McNider, R.
2007-12-01
The assimilation of GOES-derived cloud data into an atmospheric model (the Fifth-Generation Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model, or MM5) was performed in two steps. In the first step, multiple linear regression equations were derived from a control MM5 simulation to establish relationships for several dependent variables in model columns that had one or more layers of clouds. In the second step, the regression equations were applied during an MM5 simulation with assimilation in which the hourly GOES satellite data were used to determine the cloud locations and some of the cloud properties, but with all the other variables being determined by the model data. The satellite-derived fields used were shortwave cloud albedo and cloud top pressure. Ten multiple linear regression equations were developed for the following dependent variables: total cloud depth, number of cloud layers, depth of the layer that contains the maximum vertical velocity, the maximum vertical velocity, the height of the maximum vertical velocity, the estimated 1-h stable (i.e., grid scale) precipitation rate, the estimated 1-h convective precipitation rate, the height of the level with the maximum positive diabatic heating, the magnitude of the maximum positive diabatic heating, and the largest continuous layer of upward motion. The horizontal components of the divergent wind were adjusted to be consistent with the regression estimate of the maximum vertical velocity. The new total horizontal wind field with these new divergent components was then used to nudge an ongoing MM5 model simulation towards the target vertical velocity. Other adjustments included diabatic heating and moistening at specified levels. Where the model simulation had clouds when the satellite data indicated clear conditions, procedures were taken to remove or diminish the errant clouds. 
The results for the period of 0000 UTC 28 June - 0000 UTC 16 July 1999 for both a continental 32-km grid and an 8-km grid over the Southeastern United States indicate a significant improvement in the cloud bias statistics. The main improvement was the reduction of high bias values that indicated times and locations in the control run when there were model clouds but when the satellite indicated clear conditions. The importance of this technique is that it has been able to assimilate the observed clouds in the model in a dynamically sustainable manner. Acknowledgments. This work was partially funded by the following grants: a GEWEX grant from NASA, the Cooperative Agreement between the University of Alabama in Huntsville and the Minerals Management Service on Gulf of Mexico Issues, a NASA applications grant, and an NSF grant.
We examined the utility of nutrient criteria derived solely from total phosphorus (TP) concentrations in streams (regression models and percentile distributions) and evaluated their ecological relevance to diatom and algal biomass responses. We used a variety of statistics to cha...
ERIC Educational Resources Information Center
Ferrer-Esteban, Gerard
2016-01-01
This article analyzes whether school social segregation, derived from policies and practices of both between-school student allocation and within-school streaming, is related to the effectiveness of the Italian education system. Hierarchical regression models are used to set out territorially aggregated factors of social sorting influencing…
School Cost Functions: A Meta-Regression Analysis
ERIC Educational Resources Information Center
Colegrave, Andrew D.; Giles, Margaret J.
2008-01-01
The education cost literature includes econometric studies attempting to determine economies of scale, or estimate an optimal school or district size. Not only do their results differ, but the studies use dissimilar data, techniques, and models. To derive value from these studies requires that the estimates be made comparable. One method to do…
Predictors of Child Molestation: Adult Attachment, Cognitive Distortions, and Empathy
ERIC Educational Resources Information Center
Wood, Eric; Riggs, Shelley
2008-01-01
A conceptual model derived from attachment theory was tested by examining adult attachment style, cognitive distortions, and both general and victim empathy in a sample of 61 paroled child molesters and 51 community controls. Results of logistic multiple regression showed that attachment anxiety, cognitive distortions, high general empathy but low…
ERIC Educational Resources Information Center
Shieh, Gwowen
2006-01-01
This paper considers the problem of analysis of correlation coefficients from a multivariate normal population. A unified theorem is derived for the regression model with normally distributed explanatory variables and the general results are employed to provide useful expressions for the distributions of simple, multiple, and partial-multiple…
Knol, Mirjam J; van der Tweel, Ingeborg; Grobbee, Diederick E; Numans, Mattijs E; Geerlings, Mirjam I
2007-10-01
To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity. However, in logistic regression it refers to interaction as departure from multiplicativity. Rothman has argued that interaction estimated as departure from additivity better reflects biologic interaction. So far, literature on estimating interaction on an additive scale using logistic regression has focused only on dichotomous determinants. The objective of the present study was to provide the methods to estimate interaction between continuous determinants and to illustrate these methods with a clinical example. Methods and results: From the existing literature we derived the formulas to quantify interaction as departure from additivity between one continuous and one dichotomous determinant and between two continuous determinants using logistic regression. Bootstrapping was used to calculate the corresponding confidence intervals. To illustrate the theory with an empirical example, data from the Utrecht Health Project were used, with age and body mass index as risk factors for elevated diastolic blood pressure. The methods and formulas presented in this article are intended to assist epidemiologists to calculate interaction on an additive scale between two variables on a certain outcome. The proposed methods are included in a spreadsheet which is freely available at: http://www.juliuscenter.nl/additive-interaction.xls.
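The standard additive-scale interaction measure derived from logistic regression coefficients is the relative excess risk due to interaction (RERI), with odds ratios approximating risk ratios. A minimal sketch, where the increments dx1 and dx2 for the continuous determinants are illustrative choices:

```python
import math

def reri(b1, b2, b3, dx1=1.0, dx2=1.0):
    """Relative excess risk due to interaction from logistic regression
    coefficients, using odds ratios as risk-ratio approximations.
    b1, b2 are main-effect log-odds coefficients, b3 the product-term
    coefficient; dx1, dx2 are the chosen changes in the two determinants."""
    or11 = math.exp(b1 * dx1 + b2 * dx2 + b3 * dx1 * dx2)
    or10 = math.exp(b1 * dx1)
    or01 = math.exp(b2 * dx2)
    return or11 - or10 - or01 + 1.0

# With no product term (b3 = 0) there is no multiplicative interaction,
# yet RERI is generally non-zero: departure from additivity remains.
r = reri(math.log(2), math.log(1.5), 0.0)   # OR10 = 2, OR01 = 1.5
```

Here OR11 = 2 * 1.5 = 3, so RERI = 3 - 2 - 1.5 + 1 = 0.5. Confidence intervals would be obtained by bootstrapping, as the abstract describes.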
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ye, Sheng; Li, Hongyi; Huang, Maoyi
2014-07-21
Subsurface stormflow is an important component of the rainfall–runoff response, especially in steep terrain. Its contribution to total runoff is, however, poorly represented in the current generation of land surface models. The lack of physical basis of these common parameterizations precludes a priori estimation of the stormflow (i.e. without calibration), which is a major drawback for prediction in ungauged basins, or for use in global land surface models. This paper is aimed at deriving regionalized parameterizations of the storage–discharge relationship relating to subsurface stormflow from a top–down empirical data analysis of streamflow recession curves extracted from 50 eastern United States catchments. Detailed regression analyses were performed between parameters of the empirical storage–discharge relationships and the controlling climate, soil and topographic characteristics. The regression analyses performed on empirical recession curves at catchment scale indicated that the coefficient of the power-law form storage–discharge relationship is closely related to the catchment hydrologic characteristics, which is consistent with the hydraulic theory derived mainly at the hillslope scale. As for the exponent, besides the role of field scale soil hydraulic properties as suggested by hydraulic theory, it is found to be more strongly affected by climate (aridity) at the catchment scale. At a fundamental level these results point to the need for more detailed exploration of the co-dependence of soil, vegetation and topography with climate.
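A power-law storage–discharge relationship is typically estimated from recession limbs by regressing the log recession rate on log discharge, i.e. fitting -dQ/dt = a * Q^b. A sketch on a synthetic recession limb with assumed (not catchment-derived) parameters:

```python
import numpy as np

# Synthetic recession limb generated from -dQ/dt = a * Q**b
# (a and b are illustrative, not values from the study's catchments)
a_true, b_true = 0.05, 1.5
dt = 0.1
q = [10.0]
for _ in range(400):
    q.append(q[-1] - dt * a_true * q[-1] ** b_true)   # explicit Euler step
q = np.asarray(q)

# Estimate a and b by linear regression of log(-dQ/dt) on log(Q):
# log(-dQ/dt) = log(a) + b * log(Q)
dq = -(np.diff(q) / dt)                      # finite-difference recession rate
b_hat, log_a_hat = np.polyfit(np.log(q[:-1]), np.log(dq), 1)
```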
Alam, Sarfaraz; Khan, Feroz
2014-01-01
Due to the high cancer mortality rate in India, the identification of novel molecules is important in the development of novel and potent anticancer drugs. Xanthones are natural constituents of plants in the families Bonnetiaceae and Clusiaceae, and comprise oxygenated heterocycles with a variety of biological activities along with an anticancer effect. To explore the anticancer compounds from xanthone derivatives, a quantitative structure activity relationship (QSAR) model was developed by the multiple linear regression method. The structure–activity relationship represented by the QSAR model yielded a high activity–descriptors relationship accuracy (84%) referred by regression coefficient (r2=0.84) and a high activity prediction accuracy (82%). Five molecular descriptors – dielectric energy, group count (hydroxyl), LogP (the logarithm of the partition coefficient between n-octanol and water), shape index basic (order 3), and the solvent-accessible surface area – were significantly correlated with anticancer activity. Using this QSAR model, a set of virtually designed xanthone derivatives was screened. A molecular docking study was also carried out to predict the molecular interaction between proposed compounds and deoxyribonucleic acid (DNA) topoisomerase IIα. The pharmacokinetics parameters, such as absorption, distribution, metabolism, excretion, and toxicity, were also calculated, and later an appraisal of synthetic accessibility of organic compounds was carried out. The strategy used in this study may provide understanding in designing novel DNA topoisomerase IIα inhibitors, as well as for other cancer targets. PMID:24516330
Digital histology quantification of intra-hepatic fat in patients undergoing liver resection.
Parkin, E; O'Reilly, D A; Plumb, A A; Manoharan, P; Rao, M; Coe, P; Frystyk, J; Ammori, B; de Liguori Carino, N; Deshpande, R; Sherlock, D J; Renehan, A G
2015-08-01
High intra-hepatic fat (IHF) content is associated with insulin resistance, visceral adiposity, and increased morbidity and mortality following liver resection. However, in clinical practice, IHF is assessed indirectly by pre-operative imaging [for example, chemical-shift magnetic resonance (CS-MR)]. We used the opportunity in patients undergoing liver resection to quantify IHF by digital histology (D-IHF) and relate this to CT-derived anthropometrics, insulin-related serum biomarkers, and IHF estimated by CS-MR. A reproducible method for quantification of D-IHF using 7 histology slides (inter- and intra-rater concordance: 0.97 and 0.98) was developed. In 35 patients undergoing resection for colorectal cancer metastases, we measured: CT-derived subcutaneous and visceral adipose tissue volumes, Homeostasis Model Assessment of Insulin Resistance (HOMA-IR), fasting serum adiponectin, leptin and fetuin-A. We estimated relative IHF using CS-MR and developed prediction models for IHF using a factor-clustered approach. The multivariate linear regression models showed that D-IHF was best predicted by HOMA-IR (β per doubling: 2.410, 95% CI: 1.093, 5.313) and adiponectin (β per doubling: 0.197, 95% CI: 0.058, 0.667), but not by anthropometrics. MR-derived IHF correlated with D-IHF (rho: 0.626; p = 0.0001), but levels of agreement deviated in upper range values (CS-MR over-estimated IHF: regression versus zero, p = 0.009); this could be adjusted for by a correction factor (CF: 0.7816). Our findings show IHF is associated with measures of insulin resistance, but not measures of visceral adiposity. CS-MR over-estimated IHF in the upper range. Larger studies are indicated to test whether a correction of imaging-derived IHF estimates is valid. Copyright © 2015 Elsevier Ltd. All rights reserved.
Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective
Qian, Xiaoning; Dougherty, Edward R.
2017-01-01
The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class of joint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context of linear Gaussian models. PMID:28824268
Confidence limits for data mining models of options prices
NASA Astrophysics Data System (ADS)
Healy, J. V.; Dixon, M.; Read, B. J.; Cai, F. F.
2004-12-01
Non-parametric methods such as artificial neural nets can successfully model prices of financial options, outperforming the Black-Scholes analytic model (Eur. Phys. J. B 27 (2002) 219). However, the accuracy of such approaches is usually expressed only by a global fitting/error measure. This paper describes a robust method for determining prediction intervals for models derived by non-linear regression. We have demonstrated it by application to a standard synthetic example (29th Annual Conference of the IEEE Industrial Electronics Society, Special Session on Intelligent Systems, pp. 1926-1931). The method is used here to obtain prediction intervals for option prices using market data for LIFFE “ESX” FTSE 100 index options ( http://www.liffe.com/liffedata/contracts/month_onmonth.xls). We avoid special neural net architectures and use standard regression procedures to determine local error bars. The method is appropriate for target data with non-constant variance (or volatility).
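Local error bars for heteroscedastic targets can be sketched as a two-stage regression: fit the mean model, then model the squared residuals to obtain an input-dependent variance. The sketch below uses a cubic polynomial as a stand-in for the neural net, on synthetic data with non-constant variance; it illustrates the idea, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0.0, 1.0, 400)
sigma = 0.05 + 0.20 * x                  # non-constant variance (volatility)
y = np.sin(3 * x) + rng.normal(0.0, sigma)

# Stage 1: fit the mean model (a cubic stands in for the neural net)
mean_coef = np.polyfit(x, y, 3)
resid = y - np.polyval(mean_coef, x)

# Stage 2: regress the squared residuals on x to get local error bars
var_coef = np.polyfit(x, resid ** 2, 2)
local_sd = np.sqrt(np.clip(np.polyval(var_coef, x), 1e-12, None))

# Approximate 95% prediction interval using the local standard deviation
lower = np.polyval(mean_coef, x) - 1.96 * local_sd
upper = np.polyval(mean_coef, x) + 1.96 * local_sd
coverage = np.mean((y >= lower) & (y <= upper))
```

The interval widens with x, tracking the volatility, and the empirical coverage should sit near the nominal 95%.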
Outcome modelling strategies in epidemiology: traditional methods and basic alternatives
Greenland, Sander; Daniel, Rhian; Pearce, Neil
2016-01-01
Abstract Controlling for too many potential confounders can lead to or aggravate problems of data sparsity or multicollinearity, particularly when the number of covariates is large in relation to the study size. As a result, methods to reduce the number of modelled covariates are often deployed. We review several traditional modelling strategies, including stepwise regression and the ‘change-in-estimate’ (CIE) approach to deciding which potential confounders to include in an outcome-regression model for estimating effects of a targeted exposure. We discuss their shortcomings, and then provide some basic alternatives and refinements that do not require special macros or programming. Throughout, we assume the main goal is to derive the most accurate effect estimates obtainable from the data and commercial software. Allowing that most users must stay within standard software packages, this goal can be roughly approximated using basic methods to assess, and thereby minimize, mean squared error (MSE). PMID:27097747
Analysis of flight data from a High-Incidence Research Model by system identification methods
NASA Technical Reports Server (NTRS)
Batterson, James G.; Klein, Vladislav
1989-01-01
Data partitioning and modified stepwise regression were applied to recorded flight data from a Royal Aerospace Establishment high incidence research model. An aerodynamic model structure and corresponding stability and control derivatives were determined for angles of attack between 18 and 30 deg. Several nonlinearities in angles of attack and sideslip as well as a unique roll-dominated set of lateral modes were found. All flight estimated values were compared to available wind tunnel measurements.
NASA Astrophysics Data System (ADS)
Aygunes, Gunes
2017-07-01
The objective of this paper is to survey and determine the macroeconomic factors affecting the level of venture capital (VC) investments in a country. The literature relates the quality of venture capitalists to countries' venture capital investments. We investigate the relationship between venture capital investment and macroeconomic variables using statistical computation. Applying a logistic regression (logit) model, we derive correlations between venture capital investments and macroeconomic variables; the macroeconomic variables are correlated with each other in three groups. Venture capitalists can regard these correlations as an indicator. Finally, we give the correlation matrix of our results.
Eike, Liv-Marie; Mauseth, Brynjar; Camilio, Ketil André; Rekdal, Øystein; Sveinbjørnsson, Baldur
2016-01-01
In the present study we examined the ability of the amino acid derivative LTX-401 to induce cell death in cancer cell lines, as well as the capacity to induce regression in a murine melanoma model. Mode of action studies in vitro revealed lytic cell death and release of danger-associated molecular pattern molecules, preceded by massive cytoplasmic vacuolization and compromised lysosomes in treated cells. The use of a murine melanoma model demonstrated that the majority of animals treated with intratumoural injections of LTX-401 showed complete and long-lasting remission. Taken together, these results demonstrate the potential of LTX-401 as an immunotherapeutic agent for the treatment of solid tumors.
Shahlaei, M.; Saghaie, L.
2014-01-01
A quantitative structure–activity relationship (QSAR) study is suggested for the prediction of biological activity (pIC50) of 3, 4-dihydropyrido [3,2-d] pyrimidone derivatives as p38 inhibitors. Modeling of the biological activities of compounds of interest as a function of molecular structures was established by means of principal component analysis (PCA) and least square support vector machine (LS-SVM) methods. The results showed that the pIC50 values calculated by LS-SVM are in good agreement with the experimental data, and the performance of the LS-SVM regression model is superior to the PCA-based model. The developed LS-SVM model was applied for the prediction of the biological activities of pyrimidone derivatives, which were not in the modeling procedure. The resulting model showed high prediction ability, with a root mean square error of prediction of 0.460 for LS-SVM. The study provided a novel and effective approach for predicting biological activities of 3, 4-dihydropyrido [3,2-d] pyrimidone derivatives as p38 inhibitors and disclosed that LS-SVM can be used as a powerful chemometrics tool for QSAR studies. PMID:26339262
Mandel, Micha; Gauthier, Susan A; Guttmann, Charles R G; Weiner, Howard L; Betensky, Rebecca A
2007-12-01
The expanded disability status scale (EDSS) is an ordinal score that measures progression in multiple sclerosis (MS). Progression is defined as reaching EDSS of a certain level (absolute progression) or an increase of one EDSS point (relative progression). Survival methods for time to progression are not adequate for such data since they do not exploit the EDSS level at the end of follow-up. Instead, we suggest a Markov transitional model applicable for repeated categorical or ordinal data. This approach enables derivation of covariate-specific survival curves, obtained after estimation of the regression coefficients and manipulations of the resulting transition matrix. Large sample theory and resampling methods are employed to derive pointwise confidence intervals, which perform well in simulation. Methods for generating survival curves for time to EDSS of a certain level, time to increase of EDSS of at least one point, and time to two consecutive visits with EDSS greater than three are described explicitly. The regression models described are easily implemented using standard software packages. Survival curves are obtained from the regression results using packages that support simple matrix calculation. We present and demonstrate our method on data collected at the Partners MS center in Boston, MA. We apply our approach to progression defined by time to two consecutive visits with EDSS greater than three, and calculate crude (without covariates) and covariate-specific curves.
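Deriving a survival curve from an estimated transition matrix amounts to making the target EDSS state absorbing and iterating the chain. A sketch with an illustrative 4-state matrix (the probabilities are invented, not estimates from the MS data):

```python
import numpy as np

# Illustrative 4-state transition matrix (rows sum to 1); state 3 stands
# for EDSS above the progression threshold. Values are synthetic.
P = np.array([[0.80, 0.15, 0.04, 0.01],
              [0.10, 0.70, 0.15, 0.05],
              [0.02, 0.10, 0.70, 0.18],
              [0.00, 0.02, 0.10, 0.88]])

def survival_curve(P, start, target, n_visits):
    """P(target state not yet reached by visit t): make the target state
    absorbing, then read off the unabsorbed mass after each step."""
    Q = P.copy()
    Q[target] = 0.0
    Q[target, target] = 1.0           # absorb on first entry
    dist = np.zeros(len(P)); dist[start] = 1.0
    surv = []
    for _ in range(n_visits):
        dist = dist @ Q
        surv.append(1.0 - dist[target])
    return np.asarray(surv)

s = survival_curve(P, start=0, target=3, n_visits=20)
```

Covariate-specific curves follow by building P from the fitted regression coefficients at chosen covariate values and repeating the same manipulation.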
Application of near-infrared spectroscopy for the rapid quality assessment of Radix Paeoniae Rubra
NASA Astrophysics Data System (ADS)
Zhan, Hao; Fang, Jing; Tang, Liying; Yang, Hongjun; Li, Hua; Wang, Zhuju; Yang, Bin; Wu, Hongwei; Fu, Meihong
2017-08-01
Near-infrared (NIR) spectroscopy with multivariate analysis was used to quantify gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra, and the feasibility to classify the samples originating from different areas was investigated. A new high-performance liquid chromatography method was developed and validated to analyze gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra as the reference. Partial least squares (PLS), principal component regression (PCR), and stepwise multivariate linear regression (SMLR) were performed to calibrate the regression model. Different data pretreatments such as derivatives (1st and 2nd), multiplicative scatter correction, standard normal variate, Savitzky-Golay filter, and Norris derivative filter were applied to remove the systematic errors. The performance of the model was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and correlation coefficient (r). The results show that compared to PCR and SMLR, PLS had a lower RMSEC, RMSECV, and RMSEP and higher r for all the four analytes. PLS coupled with proper pretreatments showed good performance in both the fitting and predicting results. Furthermore, the original areas of Radix Paeoniae Rubra samples were partly distinguished by principal component analysis. This study shows that NIR with PLS is a reliable, inexpensive, and rapid tool for the quality assessment of Radix Paeoniae Rubra.
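Pretreatments such as standard normal variate and derivatives are simple row-wise operations on the spectral matrix. A sketch on synthetic spectra (not the Radix Paeoniae Rubra data), chaining SNV with a first derivative before the regression step:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)
    to remove multiplicative scatter effects before regression."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, keepdims=True)
    return (spectra - mu) / sd

def first_derivative(spectra):
    """Simple first-derivative pretreatment via finite differences
    (Savitzky-Golay smoothing would be used in practice)."""
    return np.diff(spectra, axis=1)

rng = np.random.default_rng(3)
raw = rng.uniform(0.2, 1.2, size=(10, 200))   # 10 synthetic NIR spectra
centered = snv(raw)
pre = first_derivative(centered)              # input to PLS/PCR/SMLR
```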
SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.
Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman
2017-03-01
We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).
Jackman, Patrick; Sun, Da-Wen; Elmasry, Gamal
2012-08-01
A new algorithm for the conversion of device dependent RGB colour data into device independent L*a*b* colour data without introducing noticeable error has been developed. By combining a linear colour space transform and advanced multiple regression methodologies it was possible to predict L*a*b* colour data with less than 2.2 colour units of error (CIE 1976). By transforming the red, green and blue colour components into new variables that better reflect the structure of the L*a*b* colour space, a low colour calibration error was immediately achieved (ΔE(CAL) = 14.1). Application of a range of regression models on the data further reduced the colour calibration error substantially (multilinear regression ΔE(CAL) = 5.4; response surface ΔE(CAL) = 2.9; PLSR ΔE(CAL) = 2.6; LASSO regression ΔE(CAL) = 2.1). Only the PLSR models deteriorated substantially under cross validation. The algorithm is adaptable and can be easily recalibrated to any working computer vision system. The algorithm was tested on a typical working laboratory computer vision system and delivered only a very marginal loss of colour information ΔE(CAL) = 2.35. Colour features derived on this system were able to safely discriminate between three classes of ham with 100% correct classification whereas colour features measured on a conventional colourimeter were not. Copyright © 2012 Elsevier Ltd. All rights reserved.
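The response-surface step can be sketched as augmenting R, G, B with quadratic and cross terms and solving a least-squares fit per L*a*b* channel. The colour data and the quadratic map below are synthetic; only the design-matrix construction and the CIE76 error reflect the abstract.

```python
import numpy as np

def response_surface(rgb):
    """Quadratic response-surface design matrix from R, G, B columns:
    intercept, linear terms, squares and cross terms."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.column_stack([np.ones(len(rgb)), r, g, b,
                            r * r, g * g, b * b, r * g, r * b, g * b])

rng = np.random.default_rng(4)
rgb = rng.uniform(0.0, 1.0, size=(200, 3))

# Synthetic L*a*b* targets generated from a known quadratic map, so the
# fit should recover it; real calibration data would leave residual error.
X = response_surface(rgb)
true_coef = rng.normal(size=(10, 3))
lab = X @ true_coef

coef, *_ = np.linalg.lstsq(X, lab, rcond=None)
delta_e = np.sqrt(((lab - X @ coef) ** 2).sum(axis=1))   # CIE76 colour error
```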
Van Boeckel, Thomas P; Thanapongtharm, Weerapong; Robinson, Timothy; Biradar, Chandrashekhar M; Xiao, Xiangming; Gilbert, Marius
2012-01-01
Since 1996 when Highly Pathogenic Avian Influenza type H5N1 first emerged in southern China, numerous studies sought risk factors and produced risk maps based on environmental and anthropogenic predictors. However little attention has been paid to the link between the level of intensification of poultry production and the risk of outbreak. This study revised H5N1 risk mapping in Central and Western Thailand during the second wave of the 2004 epidemic. Production structure was quantified using a disaggregation methodology based on the number of poultry per holding. Population densities of extensively- and intensively-raised ducks and chickens were derived both at the sub-district and at the village levels. LandSat images were used to derive another previously neglected potential predictor of HPAI H5N1 risk: the proportion of water in the landscape resulting from floods. We used Monte Carlo simulation of Boosted Regression Trees models of predictor variables to characterize the risk of HPAI H5N1. Maps of mean risk and uncertainty were derived both at the sub-district and the village levels. The overall accuracy of Boosted Regression Trees models was comparable to that of logistic regression approaches. The proportion of area flooded made the highest contribution to predicting the risk of outbreak, followed by the densities of intensively-raised ducks, extensively-raised ducks and human population. Our results showed that as little as 15% of flooded land in villages is sufficient to reach the maximum level of risk associated with this variable. The spatial pattern of predicted risk is similar to previous work: areas at risk are mainly located along the flood plain of the Chao Phraya river and to the south-east of Bangkok. Using high-resolution village-level poultry census data, rather than sub-district data, the spatial accuracy of predictions was enhanced to highlight local variations in risk. Such maps provide useful information to guide intervention.
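Boosted regression trees fit each new tree to the residual of the current ensemble. A minimal stump-based sketch that reproduces the flooded-area plateau described above (the ~15% saturation, the data, and the learning settings are all synthetic illustrations):

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_stump(x, residual):
    """Best single-split regression stump on one predictor,
    scored by sum of squared errors over candidate thresholds."""
    best = None
    for thr in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = residual[x <= thr], residual[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        fitted = np.where(x <= thr, left.mean(), right.mean())
        sse = ((residual - fitted) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]

# Synthetic outbreak-risk signal driven by proportion of flooded land,
# plateauing above ~15% as the abstract describes (values illustrative)
flood = rng.uniform(0.0, 0.6, 500)
risk = np.minimum(flood, 0.15) / 0.15 + rng.normal(0.0, 0.05, 500)

# Boosting loop: each stage fits a stump to the current residual
pred = np.zeros_like(risk)
lr = 0.5                                    # shrinkage (learning rate)
for _ in range(50):
    thr, lmean, rmean = fit_stump(flood, risk - pred)
    pred += lr * np.where(flood <= thr, lmean, rmean)

r2 = 1 - ((risk - pred) ** 2).sum() / ((risk - risk.mean()) ** 2).sum()
```

Monte Carlo risk maps as in the study would repeat such fits over resampled data and summarize the mean and spread of the predictions per location.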
Statistical Approaches for Spatiotemporal Prediction of Low Flows
NASA Astrophysics Data System (ADS)
Fangmann, A.; Haberlandt, U.
2017-12-01
An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion into model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchment areas. Four different modeling approaches are analyzed. All four build on multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three performs a spatiotemporal prediction of an index value; method four estimates L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms the successive prediction in time and space of method one. 
Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be problematic. Spatiotemporal prediction of L-moments appeared highly uncertain for higher-order moments resulting in unrealistic future low flow values. All in all, the results promote an inclusion of simple statistical methods in climate change impact assessment.
NASA Technical Reports Server (NTRS)
Leduc, S. (Principal Investigator)
1982-01-01
Models based on multiple regression were developed to estimate corn and soybean yield from weather data for agrophysical units (APU) in Iowa. The predictor variables are derived from monthly average temperature and monthly total precipitation data at meteorological stations in the cooperative network. The models are similar in form to the previous models developed for crop reporting districts (CRD). The trends and derived variables were the same, and the approach to select the significant predictors was similar to that used in developing the CRD models. The APUs were selected to be more homogeneous with respect to crop production than the CRDs. The APU models are quite similar to the CRD models, with similar explained variation and numbers of predictor variables. The APU models are to be independently evaluated and compared to the previously evaluated CRD models. That comparison should indicate the preferred model area for this application, i.e., APU or CRD.
On the degrees of freedom of reduced-rank estimators in multivariate regression
Mukherjee, A.; Chen, K.; Wang, N.; Zhu, J.
2015-01-01
Summary We study the effective degrees of freedom of a general class of reduced-rank estimators for multivariate regression in the framework of Stein's unbiased risk estimation. A finite-sample exact unbiased estimator is derived that admits a closed-form expression in terms of the thresholded singular values of the least-squares solution and hence is readily computable. The results continue to hold in the high-dimensional setting where both the predictor and the response dimensions may be larger than the sample size. The derived analytical form facilitates the investigation of theoretical properties and provides new insights into the empirical behaviour of the degrees of freedom. In particular, we examine the differences and connections between the proposed estimator and a commonly-used naive estimator. The use of the proposed estimator leads to efficient and accurate prediction risk estimation and model selection, as demonstrated by simulation studies and a data example. PMID:26702155
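A minimal numpy sketch of the setting: the least-squares solution, a rank-r truncation of its fitted values (one common reduced-rank construction; the paper treats a general class), and the naive degrees-of-freedom count r(p + q − r) that the paper's exact unbiased estimator refines using the thresholded singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q, r = 100, 6, 5, 2   # sample size, predictors, responses, target rank

X = rng.standard_normal((n, p))
B = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))  # true rank-r coefficient
Y = X @ B + 0.1 * rng.standard_normal((n, q))

# Least-squares solution, then a reduced-rank estimator obtained by
# truncating the SVD of the fitted values.
B_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)
U, s, Vt = np.linalg.svd(X @ B_ls, full_matrices=False)
fit_rr = (U[:, :r] * s[:r]) @ Vt[:r]

# The "naive" degrees-of-freedom count simply counts the free parameters
# of a rank-r coefficient matrix; the paper's unbiased estimator corrects
# this figure using the singular values s retained after thresholding.
df_naive = r * (p + q - r)
```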
Two-time correlation function of an open quantum system in contact with a Gaussian reservoir
NASA Astrophysics Data System (ADS)
Ban, Masashi; Kitajima, Sachiko; Shibata, Fumiaki
2018-05-01
An exact formula for the two-time correlation function is derived for an open quantum system which interacts with a Gaussian thermal reservoir. It is provided in terms of functional derivatives with respect to fictitious fields. A perturbative expansion and its diagrammatic representation are developed, where the small expansion parameter is related to a correlation time of the Gaussian thermal reservoir. The two-time correlation function of the lowest order is equivalent to that calculated by means of the quantum regression theorem. The result clearly shows that the violation of the quantum regression theorem is caused by the finiteness of the reservoir correlation time. By making use of an exactly solvable model consisting of a two-level system and a set of harmonic oscillators, it is shown that the two-time correlation function up to the first order is a good approximation to the exact one.
Vizcaino, Pilar; Lavalle, Carlo
2018-05-04
A new Land Use Regression model was built to develop pan-European 100 m resolution maps of NO2 concentrations. The model was built using NO2 concentrations from routine monitoring stations available in the Airbase database as dependent variable. Predictor variables included land use, road traffic proxies, population density, climatic and topographical variables, and distance to sea. In order to capture international and interregional disparities not accounted for with the mentioned predictor variables, additional proxies of NO2 concentrations, like levels of activity intensity and NOx emissions for specific sectors, were also included. The model was built using Random Forest techniques. Model performance was relatively good given the EU-wide scale (R² = 0.53). Output predictions of annual average concentrations of NO2 were in line with other existing models in terms of spatial distribution and values of concentration. The model was validated for year 2015, comparing model predictions derived from updated values of independent variables, with concentrations in monitoring stations for that year. The algorithm was then used to model future concentrations up to the year 2030, considering different emission scenarios as well as changes in land use, population distribution and economic factors assuming the most likely socio-economic trends. Levels of exposure were derived from maps of concentration. The model proved to be a useful tool for the ex-ante evaluation of specific air pollution mitigation measures, and more broadly, for impact assessment of EU policies on territorial development. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
Sengupta, Neil; Tapper, Elliot B
2017-05-01
There are limited data to predict which patients with lower gastrointestinal bleeding are at risk for adverse outcomes. We aimed to develop a clinical tool based on admission variables to predict 30-day mortality in lower gastrointestinal bleeding. We used a validated machine learning algorithm to identify adult patients hospitalized with lower gastrointestinal bleeding at an academic medical center between 2008 and 2015. The cohort was split randomly into derivation and validation cohorts. In the derivation cohort, we used multiple logistic regression on all candidate admission variables to create a prediction model for 30-day mortality, using area under the receiver operating characteristic curve and misclassification rate to estimate prediction accuracy. Regression coefficients were used to derive an integer score, and mortality risk associated with point totals was assessed. In the derivation cohort (n = 4044), 8 variables were most associated with 30-day mortality: age, dementia, metastatic cancer, chronic kidney disease, chronic pulmonary disease, anticoagulant use, admission hematocrit, and albumin. The model yielded a misclassification rate of 0.06 and area under the curve of 0.81. The integer score ranged from -10 to 26 in the derivation cohort, with a misclassification rate of 0.11 and area under the curve of 0.74. In the validation cohort (n = 2060), the score had an area under the curve of 0.72 with a misclassification rate of 0.12. After dividing the score into 4 quartiles of risk, 30-day mortality in the derivation and validation sets was 3.6% and 4.4% in quartile 1, 4.9% and 7.3% in quartile 2, 9.9% and 9.1% in quartile 3, and 24% and 26% in quartile 4, respectively. A clinical tool can be used to predict 30-day mortality in patients hospitalized with lower gastrointestinal bleeding. Copyright © 2017 Elsevier Inc. All rights reserved.
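The coefficient-to-integer-score step can be sketched as follows. The three admission variables and their effect sizes are invented for illustration, and the rounding recipe (scale by the smallest absolute coefficient, then round) is a generic construction for bedside scores, not necessarily the authors' exact derivation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2000
# Hypothetical admission variables (invented): age, anticoagulant use,
# and a low-albumin indicator.
age = rng.normal(70, 10, n)
anticoag = rng.integers(0, 2, n)
low_alb = rng.integers(0, 2, n)
logit = 0.05 * (age - 70) + 0.8 * anticoag + 1.2 * low_alb - 3.0
died = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([age, anticoag, low_alb])
model = LogisticRegression(max_iter=1000).fit(X, died)

# Convert regression coefficients to integer points by scaling against
# the smallest absolute coefficient and rounding.
coefs = model.coef_[0]
points = np.round(coefs / np.abs(coefs).min()).astype(int)
scores = X @ points  # a patient's total score
```

Binning the resulting scores (e.g. into quartiles) then gives the risk strata reported in the abstract.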
Characterizing multivariate decoding models based on correlated EEG spectral features
McFarland, Dennis J.
2013-01-01
Objective Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Methods Data from sensorimotor rhythm-based cursor control experiments were analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order, which produced predictors that varied in their degree of correlation (i.e., multicollinearity). Results The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Conclusions Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. Significance While multivariate decoding algorithms are very useful for prediction, their utility for interpretation may be limited when predictors are correlated. PMID:23466267
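The multicollinearity problem the abstract describes can be diagnosed with variance inflation factors (VIFs). The sketch below builds two nearly collinear predictors, loosely mimicking adjacent low-order AR spectral features, and computes each column's VIF from first principles; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
base = rng.standard_normal(n)

# Two nearly collinear predictors plus one independent predictor for contrast.
x1 = base + 0.05 * rng.standard_normal(n)
x2 = base + 0.05 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2) from
    regressing column j on the remaining columns (with intercept)."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1 / (1 - r2)

vifs = [vif(X, j) for j in range(3)]
```

Large VIFs on the collinear columns signal exactly the situation in which the multivariate weight pattern becomes hard to interpret, even when prediction is good.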
NASA Astrophysics Data System (ADS)
Lopez, Patricia; Verkade, Jan; Weerts, Albrecht; Solomatine, Dimitri
2014-05-01
Hydrological forecasting is subject to many sources of uncertainty, including those originating in initial state, boundary conditions, model structure and model parameters. Although uncertainty can be reduced, it can never be fully eliminated. Statistical post-processing techniques constitute an often used approach to estimate the hydrological predictive uncertainty, where a model of forecast error is built using a historical record of past forecasts and observations. The present study focuses on the use of the Quantile Regression (QR) technique as a hydrological post-processor. It estimates the predictive distribution of water levels using deterministic water level forecasts as predictors. This work aims to thoroughly verify uncertainty estimates using the implementation of QR that was applied in an operational setting in the UK National Flood Forecasting System, and to inter-compare forecast quality and skill in various configurations of QR. These configurations are (i) 'classical' QR, (ii) QR constrained by a requirement that quantiles do not cross, (iii) QR derived on time series that have been transformed into the Normal domain (Normal Quantile Transformation - NQT), and (iv) a piecewise linear derivation of QR models. The QR configurations are applied to fourteen hydrological stations on the Upper Severn River with differing catchment characteristics. Results of each QR configuration are conditionally verified for progressively higher flood levels, in terms of commonly used verification metrics and skill scores. These include the Brier probability score (BS), the continuous ranked probability score (CRPS) and corresponding skill scores as well as the Relative Operating Characteristic score (ROCS). Reliability diagrams are also presented and analysed. The results indicate that none of the four Quantile Regression configurations clearly outperforms the others.
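A sketch of the 'classical' QR configuration (i): a linear quantile model fitted by minimising the pinball (quantile) loss, mapping a deterministic forecast to a predictive interval. The data are synthetic and the numerical minimisation route is illustrative; production implementations typically solve the equivalent linear program.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 400
forecast = rng.uniform(0, 5, n)                 # deterministic water-level forecast
# Observations with error that grows with the forecast (heteroscedastic).
observed = forecast + rng.normal(0, 0.2 + 0.1 * forecast, n)

def fit_quantile(x, y, tau):
    """Linear quantile regression for quantile tau: minimise the pinball loss."""
    def pinball(beta):
        r = y - (beta[0] + beta[1] * x)
        return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))
    return minimize(pinball, x0=np.zeros(2), method="Nelder-Mead").x

lo = fit_quantile(forecast, observed, 0.10)
hi = fit_quantile(forecast, observed, 0.90)

# 10%-90% predictive interval for a given deterministic forecast value.
x_new = 3.0
interval = (lo[0] + lo[1] * x_new, hi[0] + hi[1] * x_new)
```

Configuration (ii) would additionally constrain the fitted quantile lines so that they cannot cross within the forecast range.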
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A
2013-05-01
This manuscript discusses the application and comparison of three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). These data were obtained from different chemometric methods applied to the high-performance liquid chromatography response data using the internal standard method. This was performed on a model drug, Acyclovir, which was analyzed in human plasma with the use of ganciclovir as internal standard. An in vivo study was also performed. Derivative treatment of chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-point sin x i polynomials (discrete Fourier functions). This work studies and also compares the application of the WR method and Theil's method, a nonparametric regression (NPR) method, with the least squares parametric regression (LSPR) method, which is considered the de facto standard method used for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use the WR method. WR was found to be superior to the method of LSPR as the former assumes that the y-direction error in the calibration curve will increase as x increases. Theil's NPR method was also found to be superior to the method of LSPR as the former assumes that errors could occur in both x- and y-directions and that might not be normally distributed. Most of the results showed a significant improvement in the precision and accuracy on applying WR and NPR methods relative to LSPR.
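The effect of weighting can be sketched on a small heteroscedastic calibration set. Weighted least squares with 1/x² weights (a common choice in chromatographic calibration, assumed here) counteracts the influence of the high concentrations that dominate ordinary least squares; the data and noise values are invented and fixed for reproducibility.

```python
import numpy as np

# Calibration data with noise growing with concentration, as is typical
# for chromatographic response ratios. True line: y = 0.5 x + 0.1.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])          # concentration
noise = np.array([0.02, -0.03, 0.1, -0.2, 0.8, -1.5])   # grows with x
y = 0.5 * x + 0.1 + noise                                # response ratio

def wls(x, y, w):
    """Weighted least squares (intercept, slope) via the normal equations."""
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

b_ols = wls(x, y, np.ones_like(x))   # ordinary LS: all points weighted equally
b_wls = wls(x, y, 1.0 / x**2)        # 1/x^2 weights down-weight high concentrations
```

On this set the weighted fit recovers both the intercept and the slope far more closely than the unweighted fit, which is pulled by the large residuals at high concentration.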
An administrative claims model for profiling hospital 30-day mortality rates for pneumonia patients.
Bratzler, Dale W; Normand, Sharon-Lise T; Wang, Yun; O'Donnell, Walter J; Metersky, Mark; Han, Lein F; Rapp, Michael T; Krumholz, Harlan M
2011-04-12
Outcome measures for patients hospitalized with pneumonia may complement process measures in characterizing quality of care. We sought to develop and validate a hierarchical regression model using Medicare claims data that produces hospital-level, risk-standardized 30-day mortality rates useful for public reporting for patients hospitalized with pneumonia. Retrospective study of fee-for-service Medicare beneficiaries age 66 years and older with a principal discharge diagnosis of pneumonia. Candidate risk-adjustment variables included patient demographics, administrative diagnosis codes from the index hospitalization, and all inpatient and outpatient encounters from the year before admission. The model derivation cohort included 224,608 pneumonia cases admitted to 4,664 hospitals in 2000, and validation cohorts included cases from each of years 1998-2003. We compared model-derived state-level standardized mortality estimates with medical record-derived state-level standardized mortality estimates using data from the Medicare National Pneumonia Project on 50,858 patients hospitalized from 1998-2001. The final model included 31 variables and had an area under the Receiver Operating Characteristic curve of 0.72. In each administrative claims validation cohort, model fit was similar to the derivation cohort. The distribution of standardized mortality rates among hospitals ranged from 13.0% to 23.7%, with 25(th), 50(th), and 75(th) percentiles of 16.5%, 17.4%, and 18.3%, respectively. Comparing model-derived risk-standardized state mortality rates with medical record-derived estimates, the correlation coefficient was 0.86 (Standard Error = 0.032). An administrative claims-based model for profiling hospitals for pneumonia mortality performs consistently over several years and produces hospital estimates close to those using a medical record model.
Baldi, F; Albuquerque, L G; Alencar, M M
2010-08-01
The objective of this work was to estimate covariance functions for direct and maternal genetic effects, animal and maternal permanent environmental effects, and subsequently, to derive relevant genetic parameters for growth traits in Canchim cattle. Data comprised 49,011 weight records on 2435 females from birth to adult age. The model of analysis included fixed effects of contemporary groups (year and month of birth and at weighing) and age of dam as a quadratic covariable. Mean trends were taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were allowed to vary and were modelled by a step function with 1, 4 or 11 classes based on animal's age. The model fitting four classes of residual variances was the best. A total of 12 random regression models from second to seventh order were used to model direct and maternal genetic effects, animal and maternal permanent environmental effects. The model with direct and maternal genetic effects, animal and maternal permanent environmental effects fitted by quartic, cubic, quintic and linear Legendre polynomials, respectively, was the most adequate to describe the covariance structure of the data. Estimates of direct and maternal heritability obtained by multi-trait (seven traits) and random regression models were very similar. Selection for higher weight at any age, especially after weaning, will produce an increase in mature cow weight. The possibility to modify the growth curve in Canchim cattle to obtain animals with rapid growth at early ages and moderate to low mature cow weight is limited.
Effects of wing modification on an aircraft's aerodynamic parameters as determined from flight data
NASA Technical Reports Server (NTRS)
Hess, R. A.
1986-01-01
A study of the effects of four wing-leading-edge modifications on a general aviation aircraft's stability and control parameters is presented. Flight data from the basic aircraft configuration and configurations with wing modifications are analyzed to determine each wing geometry's stability and control parameters. The parameter estimates and aerodynamic model forms are obtained using the stepwise regression and maximum likelihood techniques. The resulting parameter estimates and aerodynamic models are verified using vortex-lattice theory and by analysis of each model's ability to predict aircraft behavior. Comparisons of the stability and control derivative estimates from the basic wing and the four leading-edge modifications are accomplished so that the effects of each modification on aircraft stability and control derivatives can be determined.
Dudley, Robert W.; Hodgkins, Glenn A.; Dickinson, Jesse
2017-01-01
We present a logistic regression approach for forecasting the probability that future groundwater levels will decline below, or remain below, specific groundwater-level thresholds. We tested our approach on 102 groundwater wells in different climatic regions and aquifers of the United States that are part of the U.S. Geological Survey Groundwater Climate Response Network. We evaluated the importance of current groundwater levels, precipitation, streamflow, seasonal variability, Palmer Drought Severity Index, and atmosphere/ocean indices for developing the logistic regression equations. Several diagnostics of model fit were used to evaluate the regression equations, including testing of autocorrelation of residuals, goodness-of-fit metrics, and bootstrap validation testing. The probabilistic predictions were most successful at wells with high persistence (low month-to-month variability) in their groundwater records and at wells where the groundwater level remained below the defined low threshold for sustained periods (generally three months or longer). The model fit was weakest at wells with strong seasonal variability in levels and with shorter duration low-threshold events. We identified challenges in deriving probabilistic-forecasting models and possible approaches for addressing those challenges.
Vegetation Monitoring with Gaussian Processes and Latent Force Models
NASA Astrophysics Data System (ADS)
Camps-Valls, Gustau; Svendsen, Daniel; Martino, Luca; Campos, Manuel; Luengo, David
2017-04-01
Monitoring vegetation by biophysical parameter retrieval from Earth observation data is a challenging problem, where machine learning is currently a key player. Neural networks, kernel methods, and Gaussian Process (GP) regression have excelled in parameter retrieval tasks at both local and global scales. GP regression is based on solid Bayesian statistics, yields efficient and accurate parameter estimates, and provides interesting advantages over competing machine learning approaches such as confidence intervals. However, GP models are hampered by a lack of interpretability, which has prevented their widespread adoption by a larger community. In this presentation we will summarize some of our latest developments to address this issue. We will review the main characteristics of GPs and their advantages in standard vegetation monitoring applications. Then, three advanced GP models will be introduced. First, we will derive sensitivity maps for the GP predictive function that allow us to obtain feature ranking from the model and to assess the influence of examples in the solution. Second, we will introduce a Joint GP (JGP) model that combines in situ measurements and simulated radiative transfer data in a single GP model. The JGP regression provides more sensible confidence intervals for the predictions, respects the physics of the underlying processes, and allows for transferability across time and space. Finally, a latent force model (LFM) for GP modeling that encodes ordinary differential equations to blend data-driven modeling and physical models of the system is presented. The LFM performs multi-output regression, adapts to the signal characteristics, is able to cope with missing data in the time series, and provides explicit latent functions that allow system analysis and evaluation. Empirical evidence of the performance of these models will be presented through illustrative examples.
Vesicular stomatitis forecasting based on Google Trends
Lu, Yi; Zhou, GuangYa; Chen, Qin
2018-01-01
Background Vesicular stomatitis (VS) is an important viral disease of livestock. The main feature of VS is irregular blisters that occur on the lips, tongue, oral mucosa, hoof crown and nipple. Humans can also be infected with vesicular stomatitis and develop meningitis. This study analyses the 2014 American VS outbreaks in order to accurately predict vesicular stomatitis outbreak trends. Methods American VS outbreak data were collected from OIE. The data for VS keywords were obtained by inputting 24 disease-related keywords into Google Trends. After calculating the Pearson and Spearman correlation coefficients, it was found that there was a relationship between outbreaks and keywords derived from Google Trends. Finally, the prediction model was constructed based on qualitative classification and quantitative regression. Results For the regression model, the Pearson correlation coefficients between the predicted outbreaks and actual outbreaks are 0.953 and 0.948, respectively. For the qualitative classification model, we constructed five classification predictive models and chose the best classification predictive model as the result. The results showed that the SN (sensitivity), SP (specificity) and ACC (prediction accuracy) values of the best classification predictive model are 78.52%, 72.5% and 77.14%, respectively. Conclusion This study applied Google search data to construct a qualitative classification model and a quantitative regression model. The results show that the method is effective and that these two models yield accurate forecasts. PMID:29385198
Kendrick, Sarah K; Zheng, Qi; Garbett, Nichola C; Brock, Guy N
2017-01-01
Differential scanning calorimetry (DSC) is used to determine thermally-induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are considered functional data. In this paper we apply functional data analysis techniques to analyze DSC data from individuals from the Lupus Family Registry and Repository (LFRR). The aim was to assess the effect of lupus disease status as well as additional covariates on the thermogram profiles, and use functional data analysis methods to create models for classifying lupus vs. control patients on the basis of the thermogram curves. Thermograms were collected for 300 lupus patients and 300 controls without lupus who were matched with diseased individuals based on sex, race, and age. First, functional regression with a functional response (DSC) and categorical predictor (disease status) was used to determine how thermogram curve structure varied according to disease status and other covariates including sex, race, and year of birth. Next, functional logistic regression with disease status as the response and functional principal component analysis (FPCA) scores as the predictors was used to model the effect of thermogram structure on disease status prediction. The prediction accuracy for patients with Osteoarthritis and Rheumatoid Arthritis but without Lupus was also calculated to determine the ability of the classifier to differentiate between Lupus and other diseases. Data were divided 1000 times into separate 2/3 training and 1/3 test data for evaluation of predictions. Finally, derivatives of thermogram curves were included in the models to determine whether they aided in prediction of disease status. 
Functional regression with thermogram as a functional response and disease status as predictor showed a clear separation in thermogram curve structure between cases and controls. The logistic regression model with FPCA scores as the predictors gave the most accurate results with a mean 79.22% correct classification rate with a mean sensitivity = 79.70%, and specificity = 81.48%. The model correctly classified OA and RA patients without Lupus as controls at a rate of 75.92% on average with a mean sensitivity = 79.70% and specificity = 77.6%. Regression models including FPCA scores for derivative curves did not perform as well, nor did regression models including covariates. Changes in thermograms observed in the disease state likely reflect covalent modifications of plasma proteins or changes in large protein-protein interacting networks resulting in the stabilization of plasma proteins towards thermal denaturation. By relating functional principal components from thermograms to disease status, our Functional Principal Component Analysis model provides results that are more easily interpretable compared to prior studies. Further, the model could also potentially be coupled with other biomarkers to improve diagnostic classification for lupus.
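A sketch of the FPCA-plus-logistic-regression classifier on synthetic thermogram-like curves: when curves are densely sampled on a common temperature grid, FPCA reduces in practice to PCA on the discretised curve matrix, and the scores feed a logistic model. The peak locations, grid, and noise level are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
temps = np.linspace(45, 90, 100)   # temperature grid (deg C)

def thermogram(peak, n):
    """Synthetic heat-capacity curves: a Gaussian peak plus noise."""
    return (np.exp(-((temps - peak) ** 2) / 20)[None, :]
            + 0.05 * rng.standard_normal((n, len(temps))))

curves = np.vstack([thermogram(63, 150), thermogram(70, 150)])  # controls, cases
status = np.array([0] * 150 + [1] * 150)

# FPCA step: principal component scores of the discretised curves.
scores = PCA(n_components=4).fit_transform(curves)
# Logistic regression on the FPCA scores predicts disease status.
clf = LogisticRegression(max_iter=1000).fit(scores, status)
acc = clf.score(scores, status)
```

Repeating the fit over random 2/3 / 1/3 splits, as in the study, would give an honest estimate of the classification rate rather than the in-sample accuracy shown here.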
Allometric scaling of biceps strength before and after resistance training in men.
Zoeller, Robert F; Ryan, Eric D; Gordish-Dressman, Heather; Price, Thomas B; Seip, Richard L; Angelopoulos, Theodore J; Moyna, Niall M; Gordon, Paul M; Thompson, Paul D; Hoffman, Eric P
2007-06-01
The purposes of this study were 1) to derive allometric scaling models of isometric biceps muscle strength using pretraining body mass (BM) and muscle cross-sectional area (CSA) as scaling variables in adult males, 2) to test model appropriateness using regression diagnostics, and 3) to cross-validate the models before and after 12 wk of resistance training. A subset of FAMuSS (Functional SNP Associated with Muscle Size and Strength) study data (N=136) were randomly split into two groups (A and B). Allometric scaling models using pretraining BM and CSA were derived and tested for group A. The scaling exponents determined from these models were then applied to and tested on group B pretraining data. Finally, these scaling exponents were applied to and tested on group A and B posttraining data. BM and CSA models produced scaling exponents of 0.64 and 0.71, respectively. Regression diagnostics determined both models to be appropriate. Cross-validation of the models to group B showed that the BM model, but not the CSA model, was appropriate. Removal of the largest six subjects (CSA > 30 cm²) from group B resulted in an appropriate fit for the CSA model. Application of the models to group A posttraining data showed that both models were appropriate, but only the body mass model was successful for group B. These data suggest that the application of scaling exponents of 0.64 and 0.71, using BM and CSA, respectively, are appropriate for scaling isometric biceps strength in adult males. However, the scaling exponent using CSA may not be appropriate for individuals with biceps CSA > 30 cm². Finally, 12 wk of resistance training does not alter the relationship between BM, CSA, and muscular strength as assessed by allometric scaling.
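The allometric scaling step reduces to a log-log regression: under the model strength = a · BMᵇ, taking logarithms makes the exponent b the slope of a straight-line fit. The data below are synthetic, generated with the reported exponent b = 0.64 to show how it is recovered.

```python
import numpy as np

# Synthetic data following strength = a * BM^b exactly, with a = 2, b = 0.64.
body_mass = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
strength = 2.0 * body_mass ** 0.64

# log(strength) = log(a) + b * log(BM): the exponent is the slope.
b, log_a = np.polyfit(np.log(body_mass), np.log(strength), 1)
a = np.exp(log_a)

# Allometrically scaled strength (strength / BM^b) is then independent
# of body mass, which is the point of the scaling.
scaled = strength / body_mass ** b
```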
Omnibus Risk Assessment via Accelerated Failure Time Kernel Machine Modeling
Sinnott, Jennifer A.; Cai, Tianxi
2013-01-01
Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai et al., 2011). In this paper, we derive testing and prediction methods for KM regression under the accelerated failure time model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. PMID:24328713
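The kernel machine idea is easiest to see without censoring: represent the possibly nonlinear effect of the markers through a kernel and fit by regularized least squares. The sketch below is plain kernel ridge regression with an RBF kernel, a simplified stand-in for the KM framework; the survival-specific AFT estimation, resampling test, and Omnibus Test from the paper are not shown:

```python
import math

def rbf(u, v, gamma=1.0):
    """Radial basis function kernel, one choice of nonlinear kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def kernel_ridge_fit(X, y, kernel=rbf, lam=0.1):
    """Fit alpha = (K + lam*I)^-1 y; predictions are sums of kernel evaluations."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)
    return lambda x: sum(a * kernel(x, xi) for a, xi in zip(alpha, X))
```

With a small ridge penalty the fitted function nearly interpolates a nonlinear signal that a linear model would miss, which is the motivation for nonlinear kernels in the KM framework.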
Ejlerskov, Katrine T.; Jensen, Signe M.; Christensen, Line B.; Ritz, Christian; Michaelsen, Kim F.; Mølgaard, Christian
2014-01-01
For 3-year-old children, suitable methods to estimate body composition are sparse. We aimed to develop predictive equations for estimating fat-free mass (FFM) from bioelectrical impedance (BIA) and anthropometry, using dual-energy X-ray absorptiometry (DXA) as the reference method, with data from 99 healthy 3-year-old Danish children. Predictive equations were derived from two multiple linear regression models, a comprehensive model (height²/resistance (RI), six anthropometric measurements) and a simple model (RI, height, weight). Their uncertainty was quantified by means of a 10-fold cross-validation approach. The prediction error of FFM was 3.0% for both equations (root mean square error: 360 and 356 g, respectively). The derived equations produced BIA-based predictions of FFM and FM close to the DXA scan results. We suggest that the predictive equations can be applied in similar population samples aged 2–4 years. The derived equations may prove useful for studies linking body composition to early risk factors and early onset of obesity. PMID:24463487
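A sketch of the simple-model form (the impedance index RI = height²/resistance, plus height and weight) and of the percentage prediction-error metric quoted above. The regression coefficients below are placeholders for illustration only; the fitted values are not given in the abstract:

```python
def resistance_index(height_cm, resistance_ohm):
    """BIA impedance index RI = height^2 / R."""
    return height_cm ** 2 / resistance_ohm

# Placeholder coefficients for the simple model FFM ~ RI + height + weight.
# These are NOT the published values, just illustrative magnitudes.
B0, B_RI, B_H, B_W = -1.2, 0.45, 0.02, 0.18

def predict_ffm_kg(height_cm, resistance_ohm, weight_kg):
    """Simple-model FFM prediction from BIA and anthropometry."""
    ri = resistance_index(height_cm, resistance_ohm)
    return B0 + B_RI * ri + B_H * height_cm + B_W * weight_kg

def prediction_error_pct(predicted, observed):
    """RMSE expressed as a percentage of the mean observed value
    (the abstract reports 3.0% for both equations)."""
    n = len(predicted)
    rmse = (sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n) ** 0.5
    return 100.0 * rmse / (sum(observed) / n)
```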
Curriculum-Based Measurement of Oral Reading: Quality of Progress Monitoring Outcomes
ERIC Educational Resources Information Center
Christ, Theodore J.; Zopluoglu, Cengiz; Long, Jeffery D.; Monaghen, Barbara D.
2012-01-01
Curriculum-based measurement of oral reading (CBM-R) is frequently used to set student goals and monitor student progress. This study examined the quality of growth estimates derived from CBM-R progress monitoring data. The authors used a linear mixed effects regression (LMER) model to simulate progress monitoring data for multiple levels of…
Marginal regression approach for additive hazards models with clustered current status data.
Su, Pei-Fang; Chi, Yunchan
2014-01-15
Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics, and demographic and sociological studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of the survival function converges at a rate of n^(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive test statistics, whose distributions converge at a rate of n^(1/2), for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated. Copyright © 2013 John Wiley & Sons, Ltd.
Comparison of stream invertebrate response models for bioassessment metric
Waite, Ian R.; Kennen, Jonathan G.; May, Jason T.; Brown, Larry R.; Cuffney, Thomas F.; Jones, Kimberly A.; Orlando, James L.
2012-01-01
We aggregated invertebrate data from various sources to assemble data for modeling in two ecoregions in Oregon and one in California. Our goal was to compare the performance of models developed using multiple linear regression (MLR) techniques with models developed using three relatively new techniques: classification and regression trees (CART), random forest (RF), and boosted regression trees (BRT). We used tolerance of taxa based on richness (RICHTOL) and the ratio of observed to expected taxa (O/E) as response variables and land use/land cover as explanatory variables. Responses were generally linear; therefore, there was little improvement over the MLR models when using CART and RF. In general, the four modeling techniques (MLR, CART, RF, and BRT) consistently selected the same primary explanatory variables for each region. However, results from the BRT models showed significant improvement over the MLR models for each region, with increases in R² of 0.09 to 0.20. The O/E metric, derived from models specifically calibrated for Oregon, consistently had lower R² values than RICHTOL for the two regions tested. Modeled O/E R² values were between 0.06 and 0.10 lower for each of the four modeling methods applied in the Willamette Valley and between 0.19 and 0.36 lower for the Blue Mountains. As a result, BRT models may indeed represent a good alternative to MLR for modeling species distribution relative to environmental variables.
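Boosted regression trees of the kind compared above can be sketched as gradient boosting with depth-1 trees (stumps) under squared-error loss. The minimal single-feature version below illustrates the mechanism only; it is not the BRT configuration used in the study:

```python
def fit_stump(x, residual):
    """Best single split on one feature minimizing squared error."""
    best = None
    for s in sorted(set(x)):
        left = [r for xi, r in zip(x, residual) if xi <= s]
        right = [r for xi, r in zip(x, residual) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def boost(x, y, n_trees=50, lr=0.1):
    """Gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(x)
    stumps = []
    for _ in range(n_trees):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(st(xi) for st in stumps)
```

On a nonlinear (step-shaped) response the boosted ensemble captures the jump that a single linear fit would smear out, which is why BRT can outperform MLR when responses are not linear.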
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Previous studies have shown that probabilistic forecasting may be a useful method for predicting persistent contrail formation. A probabilistic forecast to accurately predict contrail formation over the contiguous United States (CONUS) is created using hourly meteorological analyses from the Advanced Regional Prediction System (ARPS) and the Rapid Update Cycle (RUC), together with GOES water vapor channel measurements and surface and satellite observations of contrails. Two groups of logistic models were created. The first group of models (SURFACE models) is based on surface-based contrail observations supplemented with satellite observations of contrail occurrence. The second group of models (OUTBREAK models) is derived from a selected subgroup of satellite-based observations of widespread persistent contrails. The mean accuracies for both the SURFACE and OUTBREAK models typically exceeded 75 percent when based on the RUC or ARPS analysis data, but decreased when the logistic models were derived from ARPS forecast data.
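A logistic model of contrail occurrence has the form p = 1/(1 + exp(-z)) with z linear in the meteorological predictors. A minimal sketch of fitting such a model by gradient descent and scoring its classification accuracy, on synthetic one-predictor data (not the SURFACE/OUTBREAK datasets):

```python
import math

def fit_logistic(X, y, lr=0.1, iters=5000):
    """Plain gradient-descent logistic regression; w[0] is the intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            pi = 1.0 / (1.0 + math.exp(-z))
            err = pi - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def accuracy(w, X, y, threshold=0.5):
    """Fraction of cases where the thresholded probability matches the label."""
    correct = 0
    for xi, yi in zip(X, y):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        p = 1.0 / (1.0 + math.exp(-z))
        correct += int((p >= threshold) == bool(yi))
    return correct / len(y)
```

The reported mean accuracies (above 75 percent) correspond to this kind of thresholded scoring against observed contrail occurrence.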
Seasonal forecasting of high wind speeds over Western Europe
NASA Astrophysics Data System (ADS)
Palutikof, J. P.; Holt, T.
2003-04-01
As financial losses associated with extreme weather events escalate, there is interest from end users in the forestry and insurance industries, for example, in the development of seasonal forecasting models with a long lead time. This study uses exceedences of the 90th, 95th, and 99th percentiles of daily maximum wind speed over the period 1958 to present to derive predictands of winter wind extremes. The source data are the 6-hourly NCEP Reanalysis gridded surface wind fields. Predictor variables include principal components of Atlantic sea surface temperature and several indices of climate variability, including the NAO and SOI. Lead times of up to a year are considered, in monthly increments. Three regression techniques are evaluated: multiple linear regression (MLR), principal component regression (PCR), and partial least squares regression (PLS). PCR and PLS proved considerably superior to MLR, with much lower standard errors. PLS was chosen to formulate the predictive model since it offers more flexibility in experimental design and gave slightly better results than PCR. The results indicate that winter windiness can be predicted with considerable skill one year ahead for much of coastal Europe, but that this skill deteriorates rapidly in the hinterland. The experiment succeeded in highlighting PLS as a very useful method for developing more precise forecasting models, and in identifying areas of high predictability.
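PLS handles the strong collinearity among such predictors (e.g. SST principal components and circulation indices) by regressing on latent scores rather than on the raw variables. A one-component PLS1 sketch in pure Python; the study's multi-component setup and skill measures are not reproduced:

```python
def pls1_one_component(X, y):
    """Single-component PLS1 (NIPALS) for a univariate response.
    Returns a prediction function for new predictor vectors."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    yc = [yi - ym for yi in y]
    # Weight vector proportional to X'y, then normalized.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(wj * wj for wj in w) ** 0.5
    w = [wj / norm for wj in w]
    # Latent scores and the regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    def predict(x):
        score = sum((xj - mj) * wj for xj, mj, wj in zip(x, xm, w))
        return ym + q * score
    return predict
```

With perfectly collinear predictors, where ordinary MLR is ill-conditioned, the single latent component still yields stable predictions.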
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim
2014-01-01
Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Mental Models of Software Forecasting
NASA Technical Reports Server (NTRS)
Hihn, J.; Griesel, A.; Bruno, K.; Fouser, T.; Tausworthe, R.
1993-01-01
The majority of software engineers resist the use of the currently available cost models. One problem is that the mathematical and statistical models that are currently available do not correspond with the mental models of the software engineers. In an earlier JPL-funded study (Hihn and Habib-agahi, 1991) it was found that software engineers prefer to use analogical or analogy-like techniques to derive size and cost estimates, whereas current CERs hide any analogy in the regression equations. In addition, the currently available models depend upon information which is not available during early planning, when the most important forecasts must be made.
Bayesian Regression of Thermodynamic Models of Redox Active Materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnston, Katherine
Finding a suitable functional redox material is a critical challenge to achieving scalable, economically viable technologies for storing concentrated solar energy in the form of a defected oxide. Demonstrating effectiveness for thermal storage or solar fuel is largely accomplished by using a thermodynamic model derived from experimental data. The purpose of this project is to test the accuracy of our regression model on representative data sets. Determining the accuracy of the model includes fitting the model parameters to the data, comparing models using different numbers of parameters, and analyzing the entropy and enthalpy calculated from the model. Three data sets were considered in this project: two demonstrating materials for solar fuels by water splitting and one of a material for thermal storage. Using Bayesian inference and Markov chain Monte Carlo (MCMC), parameter estimation was performed on the three data sets. Good results were achieved, except for some deviations at the edges of the data input ranges. The evidence values were then calculated in a variety of ways and used to compare models with different numbers of parameters. It was believed that at least one of the parameters was unnecessary; comparing evidence values demonstrated that the parameter was needed on one data set and not significantly helpful on another. The entropy was calculated by taking the derivative in one variable and integrating over another, and its uncertainty was calculated by evaluating the entropy over multiple MCMC samples. Afterwards, all the parts were written up as a tutorial for the Uncertainty Quantification Toolkit (UQTk).
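Parameter estimation by MCMC as described above can be sketched with a random-walk Metropolis sampler on a toy one-parameter model. Everything here is illustrative: the project's actual thermodynamic model, priors, and data are not reproduced:

```python
import math, random

def metropolis(log_post, theta0, n_samples=20000, step=0.3, seed=1):
    """Random-walk Metropolis sampler for a 1-D posterior."""
    random.seed(seed)
    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n_samples):
        prop = theta + random.gauss(0.0, step)
        lp_prop = log_post(prop)
        if math.log(random.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop   # accept the proposal
        samples.append(theta)
    return samples

# Toy model y = theta * x with unit-variance Gaussian noise, flat prior.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [1.1, 2.0, 3.2, 3.9, 5.1]

def log_post(theta):
    return -0.5 * sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

samples = metropolis(log_post, theta0=0.0)
burned = samples[5000:]                      # discard burn-in
post_mean = sum(burned) / len(burned)
```

Posterior summaries (here the mean) and derived-quantity uncertainties are computed over the retained samples, which mirrors how the project propagated entropy uncertainty through MCMC draws.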
Combustion performance and scale effect from N2O/HTPB hybrid rocket motor simulations
NASA Astrophysics Data System (ADS)
Shan, Fanli; Hou, Lingyun; Piao, Ying
2013-04-01
An HRM code for the simulation of N2O/HTPB hybrid rocket motor operation and scale effect analysis has been developed. This code can be used to calculate motor thrust and distributions of physical properties inside the combustion chamber and nozzle during the operational phase by solving the unsteady Navier-Stokes equations using a corrected compressible difference scheme and a two-step, five-species combustion model. A dynamic fuel surface regression technique and a two-step calculation method, together with gas-solid coupling, are applied in the calculation of fuel regression and the determination of the combustion chamber wall profile as the fuel regresses. Both the calculated motor thrust from start-up to shut-down mode and the combustion chamber wall profile after motor operation are in good agreement with experimental data. The fuel regression rate equation and the relation between fuel regression rate and axial distance have been derived. Analysis of the results suggests improvements in combustion performance for the current hybrid rocket motor design and explains scale effects in the variation of fuel regression rate with combustion chamber diameter.
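A dynamic fuel surface regression calculation can be sketched by integrating an empirical regression-rate law rdot = a·G^n, where G is the oxidizer mass flux through the port. The constants a and n below are illustrative placeholders, not the values derived in the paper:

```python
import math

def port_radius_history(r0_m, a, n, mdot_ox, dt=0.01, t_burn=10.0):
    """Euler integration of hybrid fuel surface regression:
    rdot = a * G^n, with oxidizer mass flux G = mdot_ox / (pi r^2).
    Returns a list of (time, radius) pairs."""
    r, t, hist = r0_m, 0.0, []
    while t < t_burn:
        G = mdot_ox / (math.pi * r * r)   # oxidizer mass flux, kg/(m^2 s)
        rdot = a * G ** n                 # surface regression rate, m/s
        r += rdot * dt
        t += dt
        hist.append((t, r))
    return hist
```

Because G falls as the port opens up, the regression rate decays over the burn, which is the mechanism behind the diameter-dependent scale effects discussed above.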
Geddes, C.A.; Brown, D.G.; Fagre, D.B.
2005-01-01
We derived and implemented two spatial models of May snow water equivalent (SWE) at Lee Ridge in Glacier National Park, Montana. We used the models to test the hypothesis that vegetation structure is a control on snow redistribution at the alpine treeline ecotone (ATE). The statistical models were derived using stepwise and "best" subsets regression techniques. The first model was derived from field measurements of SWE, topography, and vegetation taken at 27 sample points. The second model was derived using GIS-based measures of topography and vegetation. Both the field- (R² = 0.93) and GIS-based models (R² = 0.69) of May SWE included the following variables: site type (based on vegetation), elevation, maximum slope, and general slope aspect. Site type was identified as the most important predictor of SWE in both models, accounting for 74.0% and 29.5% of the variation, respectively. The GIS-based model was applied to create a predictive map of SWE across Lee Ridge, predicting little snow accumulation on the top of the ridge, where vegetation is scarce. The GIS model failed in large depressions, including ephemeral stream channels. The models supported the hypothesis that upright vegetation has a positive effect on accumulation of SWE above and beyond the effects of topography. Vegetation, therefore, creates a positive feedback in which it modifies its environment and could affect the ability of additional vegetation to become established.
2016-03-01
regression models that yield hedonic price indexes is closely related to standard techniques for developing cost estimating relationships (CERs) … (October 2014). … analysis) and derives a price index from the coefficients on variables reflecting the year of purchase. In CER development, the … index. The relevant cost metric in both cases is unit recurring flyaway (URF) costs. For the current project, we develop a "Baseline" CER model, taking
Novel applications of the temporal kernel method: Historical and future radiative forcing
NASA Astrophysics Data System (ADS)
Portmann, R. W.; Larson, E.; Solomon, S.; Murphy, D. M.
2017-12-01
We present a new estimate of the historical radiative forcing derived from the observed global mean surface temperature and a model-derived kernel function. Current estimates of historical radiative forcing are usually derived from climate models. Despite large variability in these models, the multi-model mean tends to do a reasonable job of representing the Earth system and climate. One method of diagnosing the transient radiative forcing in these models requires model output of the top-of-atmosphere (TOA) radiative imbalance and the global mean temperature anomaly. It is difficult to apply this method to historical observations due to the lack of TOA radiative measurements before CERES. We apply the temporal kernel method (TKM) of calculating radiative forcing to the historical global mean temperature anomaly. This novel approach is compared against the current regression-based methods using model outputs and shown to produce consistent forcing estimates, giving confidence in the forcing derived from the historical temperature record. The derived TKM radiative forcing provides an estimate of the forcing time series that the average climate model needs to produce the observed temperature record. This forcing time series is found to be in good overall agreement with previous estimates but includes significant differences that will be discussed. The historical anthropogenic aerosol forcing is estimated as a residual from the TKM and found to be consistent with earlier moderate forcing estimates. In addition, this method is applied to future temperature projections to estimate the radiative forcing required to achieve temperature goals such as those set in the Paris Agreement.
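If the temperature response is modeled as a discrete convolution of past forcing with a kernel, recovering forcing from temperature amounts to forward substitution on a lower-triangular system. A minimal sketch of that inversion, with a hypothetical kernel and series (not the actual TKM kernel):

```python
def invert_forcing(temps, kernel):
    """Solve T[i] = sum_{j<=i} kernel[i-j] * F[j] for the forcing F
    by forward substitution (discrete temporal-kernel deconvolution).
    Requires kernel[0] != 0."""
    F = []
    for i, Ti in enumerate(temps):
        acc = sum(kernel[i - j] * F[j] for j in range(i))
        F.append((Ti - acc) / kernel[0])
    return F
```

Running the forward convolution on a known forcing series and then inverting it recovers the original series exactly, which is the consistency check used below.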
NASA Astrophysics Data System (ADS)
Muller, Sybrand Jacobus; van Niekerk, Adriaan
2016-07-01
Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationships between the input features and electrical conductivity measurements were investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forests were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (R² < 0.4). Better results were achieved using the supervised classifiers, but the algorithms tended to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crop types at different growing stages, coupled with their individual tolerances to saline conditions.
Eike, Liv-Marie; Mauseth, Brynjar; Camilio, Ketil André; Rekdal, Øystein; Sveinbjørnsson, Baldur
2016-01-01
In the present study we examined the ability of the amino acid derivative LTX-401 to induce cell death in cancer cell lines, as well as the capacity to induce regression in a murine melanoma model. Mode of action studies in vitro revealed lytic cell death and release of danger-associated molecular pattern molecules, preceded by massive cytoplasmic vacuolization and compromised lysosomes in treated cells. The use of a murine melanoma model demonstrated that the majority of animals treated with intratumoural injections of LTX-401 showed complete and long-lasting remission. Taken together, these results demonstrate the potential of LTX-401 as an immunotherapeutic agent for the treatment of solid tumors. PMID:26881822
Three-parameter modeling of the soil sorption of acetanilide and triazine herbicide derivatives.
Freitas, Mirlaine R; Matias, Stella V B G; Macedo, Renato L G; Freitas, Matheus P; Venturin, Nelson
2014-02-01
Herbicides have widely variable toxicity and many of them are persistent soil contaminants. The acetanilide and triazine families of herbicides are in widespread use, but there is growing interest in developing new herbicides that are more effective and less environmentally hazardous. The environmental risk of new herbicides can be assessed by estimating their soil sorption (logKoc), which is usually correlated with the octanol/water partition coefficient (logKow). However, earlier findings have shown that this correlation is not valid for some acetanilide and triazine herbicides. Thus, easily accessible quantitative structure-property relationship models are required to predict the logKoc of analogues of these compounds. The octanol/water partition coefficient, molecular weight, and molecular volume were calculated and then regressed against logKoc for two series of acetanilide and triazine herbicides using multiple linear regression, resulting in predictive and validated models.
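The resulting three-parameter model has the form logKoc = c0 + c1·logKow + c2·MW + c3·V. A sketch with placeholder coefficients, since the fitted values are not given in the abstract:

```python
def predict_log_koc(log_kow, mol_weight, mol_volume,
                    coefs=(0.8, 0.35, 0.002, 0.001)):
    """Three-descriptor MLR form logKoc = c0 + c1*logKow + c2*MW + c3*V.
    The default coefficients are illustrative placeholders, NOT the
    values fitted in the paper."""
    c0, c1, c2, c3 = coefs
    return c0 + c1 * log_kow + c2 * mol_weight + c3 * mol_volume
```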
Yun, Ruijuan; Lin, Chung-Chih; Wu, Shuicai; Huang, Chu-Chung; Lin, Ching-Po; Chao, Yi-Ping
2013-01-01
In this study, we employed diffusion tensor imaging (DTI) to construct brain structural networks and then derived the connection matrices from 96 healthy elderly subjects. Correlation analysis between the graph-theoretic topological properties of these networks and the Cognitive Abilities Screening Instrument (CASI) score was performed to extract the significant network characteristics. These characteristics were then integrated into models estimated by various machine-learning algorithms to predict the subjects' cognitive performance. From the results, the linear regression model and the Gaussian process model presented better predictive ability, with lower mean absolute errors of 5.8120 and 6.25, respectively. Moreover, these extracted topological properties of the brain structural network derived from DTI could also be regarded as bio-signatures for further evaluation of brain degeneration in healthy aging and early diagnosis of mild cognitive impairment (MCI).
Modelling rainfall interception by forests: a new method for estimating the canopy storage capacity
NASA Astrophysics Data System (ADS)
Pereira, Fernando; Valente, Fernanda; Nóbrega, Cristina
2015-04-01
Evaporation of rainfall intercepted by forests is usually an important part of a catchment water balance. Recognizing the importance of interception loss, several models of the process have been developed. A key parameter of these models is the canopy storage capacity (S), commonly estimated by the so-called Leyton method. However, this method is somewhat subjective in the selection of the storms used to derive S, which is particularly critical when throughfall is highly variable in space. To overcome these problems, a new method for estimating S was proposed in 2009 by Pereira et al. (Agricultural and Forest Meteorology, 149: 680-688), which uses information from a larger number of storms, is less sensitive to throughfall spatial variability and is consistent with the formulation of the two most widely used rainfall interception models, the Gash analytical model and the Rutter model. However, this method has a drawback: it does not account for stemflow (Sf). To allow a wider use of this methodology, we propose now a revised version which makes the estimation of S independent of the importance of stemflow. For the application of this new version we only need to establish a linear regression of throughfall vs. gross rainfall using data from all storms large enough to saturate the canopy. Two of the parameters used by the Gash and the Rutter models, pd (the drainage partitioning coefficient) and S, are then derived from the regression coefficients: pd is estimated first, allowing the derivation of S; if Sf is not considered, S can be estimated by setting pd = 0. This new method was tested using data from a eucalyptus plantation, a maritime pine forest and a traditional olive grove, all located in Central Portugal. For both the eucalyptus and the pine forests pd and S estimated by this new approach were comparable to the values derived in previous studies using the standard procedures.
In the case of the traditional olive grove, the estimates obtained by this methodology for pd and S allowed interception loss to be modelled with a normalized averaged error less than 4%. Globally, these results confirm that the method is more robust and certainly less subjective, providing adequate estimates for pd and S which, in turn, are crucial for a good performance of the interception models.
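A sketch of the regression step with stemflow neglected (pd = 0): fit throughfall against gross rainfall over canopy-saturating storms and read the parameters off the regression coefficients. Taking S as the negative intercept is one simple reading of that step, used here for illustration rather than as the exact Pereira et al. formulation:

```python
def canopy_storage(gross_rain, throughfall):
    """Least-squares line Tf = m*Pg + c over canopy-saturating storms.
    Returns (m, c, S) where, with stemflow neglected (pd = 0), the
    canopy storage capacity S is taken as the negative intercept."""
    n = len(gross_rain)
    mp = sum(gross_rain) / n
    mt = sum(throughfall) / n
    sxx = sum((p - mp) ** 2 for p in gross_rain)
    sxy = sum((p - mp) * (t - mt)
              for p, t in zip(gross_rain, throughfall))
    m = sxy / sxx
    c = mt - m * mp
    return m, c, -c
```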
Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.
2017-01-01
Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for the development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. PMID:27436868
Development and validation of a mortality risk model for pediatric sepsis.
Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan
2017-05-01
Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial. We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities. According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively. The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients.
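The validation step above can be sketched in two parts: logit probabilities from a fitted logistic equation (coefficients below are placeholders, not the published ones) and the area under the ROC curve via the rank-sum (Mann-Whitney) identity:

```python
import math

def logit_probability(coefs, intercept, values):
    """Mortality probability from a logistic equation.
    The coefficients passed in are illustrative placeholders."""
    z = intercept + sum(b * v for b, v in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-z))

def auc(probs, labels):
    """Area under the ROC curve: fraction of (positive, negative)
    pairs ranked correctly, with ties counted as half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.854 / 0.844, as reported for the training and validation groups, means roughly 85% of deceased-vs-survived patient pairs were ranked correctly by the model.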
NASA Technical Reports Server (NTRS)
Whitmore, Stephen R.; Moes, Timothy R.
1991-01-01
The accuracy of a prototype nonintrusive airdata system, the high-angle-of-attack flush airdata sensing (HI-FADS) system, was demonstrated at angles of attack as great as 55 degrees during quasi-steady maneuvers in phase one of the F-18 High Alpha Research Vehicle flight test program. The system consists of a matrix of nine pressure ports arranged in annular rings on the aircraft nose and estimates the complete airdata set using flow modeling and nonlinear regression. Particular attention is paid to the effects of acoustical distortion within the individual pressure sensors of the HI-FADS pressure matrix. A dynamic model that quantifies and describes this acoustical distortion is developed and solved in closed form for frequency response.
NASA Astrophysics Data System (ADS)
El Masri, Bassil
2011-12-01
Modeling terrestrial ecosystem function and structure has been a subject of increasing interest because of the importance of the terrestrial carbon cycle in the global carbon budget and climate change. In this study, satellite data were used to estimate gross primary production (GPP) and evapotranspiration (ET) for two deciduous forests: Morgan Monroe State Forest (MMSF) in Indiana and Harvard Forest in Massachusetts. Above-ground biomass (AGB) was also estimated for the MMSF and the Howland Forest (mixed forest) in Maine. Surface reflectance and temperature, vegetation indices, soil moisture, tree height and canopy area derived from the Moderate Resolution Imaging Spectroradiometer (MODIS), the Advanced Microwave Scanning Radiometer (AMSR-E), LIDAR, and aerial imagery, respectively, were used for this purpose. These variables, along with others derived from remotely sensed data, were used as input variables to process-based models that estimated GPP and ET and to a regression model that estimated AGB. The process-based models were BIOME-BGC and the Penman-Monteith equation. Measured carbon and water fluxes obtained from eddy covariance flux towers were compared to the modeled GPP and ET. The data-driven methods produced good estimates of GPP and ET, with average root mean square errors (RMSE) of 0.17 molC/m2 and 0.40 mm/day, respectively, for the MMSF and Harvard Forest. In addition, allometric data for the MMSF were used to develop the regression model relating AGB to stem volume. The performance of the AGB regression model was compared to site measurements using remotely sensed data for the MMSF and the Howland Forest, where the model AGB RMSE ranged between 2.92 and 3.30 kg C/m2. Sensitivity analysis revealed that improvements in maintenance respiration estimation and remotely sensed maximum photosynthetic activity, as well as accurate estimates of canopy resistance, would result in improved GPP and ET predictions.
Moreover, AGB estimates were found to decrease as larger grid sizes were used in rasterizing LIDAR return points. The analysis suggests that this methodology could be used as an operational procedure for monitoring changes in terrestrial ecosystem function and structure brought about by environmental changes.
NASA Astrophysics Data System (ADS)
Yoshida, Kenichiro; Nishidate, Izumi; Ojima, Nobutoshi; Iwata, Kayoko
2014-01-01
To quantitatively evaluate skin chromophores over a wide region of curved skin surface, we propose an approach that suppresses the effect of shading-derived error in the reflectance on the estimation of chromophore concentrations, without sacrificing the accuracy of that estimation. In our method, we use multiple regression analysis, taking the absorbance spectrum as the response variable and the extinction coefficients of melanin, oxygenated hemoglobin, and deoxygenated hemoglobin as the predictor variables. The concentrations of melanin and total hemoglobin are determined from the multiple regression coefficients using compensation formulae (CF) based on diffuse reflectance spectra derived from a Monte Carlo simulation. To suppress the shading-derived error, we investigated three different combinations of multiple regression coefficients for the CF. In vivo measurements of forearm skin demonstrated that the proposed approach can reduce the estimation errors caused by shading-derived errors in the reflectance. With the best combination of multiple regression coefficients, the ratio of the estimation error to the chromophore concentration is about 10%. The proposed method does not require any measurements or assumptions about the shape of the subjects; this is an advantage over other studies related to the reduction of shading-derived errors.
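The regression step, with absorbance as the response and extinction coefficients as predictors, amounts to an ordinary least-squares solve across wavelengths. A minimal sketch with invented spectra (real extinction coefficients come from the literature, and the Monte-Carlo-derived compensation formulae are omitted here):

```python
import numpy as np

# Hypothetical extinction-coefficient spectra over 5 wavelengths
# (columns: melanin, oxygenated Hb, deoxygenated Hb) -- invented values.
E = np.array([[1.0, 0.2, 0.3],
              [0.8, 0.5, 0.2],
              [0.6, 0.9, 0.4],
              [0.4, 0.7, 0.8],
              [0.2, 0.3, 0.9]])
true_conc = np.array([0.5, 1.2, 0.8])      # melanin, HbO2, Hb
absorbance = E @ true_conc                  # synthetic absorbance spectrum

# Multiple regression: recover concentrations from the absorbance spectrum.
est_conc, *_ = np.linalg.lstsq(E, absorbance, rcond=None)
total_hb = est_conc[1] + est_conc[2]        # total hemoglobin
```

Because the synthetic absorbance is noise-free, the regression recovers the concentrations exactly; the paper's contribution is how to keep this step accurate when shading perturbs the measured reflectance.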
NASA Astrophysics Data System (ADS)
Chilenski, M. A.; Greenwald, M. J.; Hubbard, A. E.; Hughes, J. W.; Lee, J. P.; Marzouk, Y. M.; Rice, J. E.; White, A. E.
2017-12-01
It remains an open question to explain the dramatic change in intrinsic rotation induced by slight changes in electron density (White et al 2013 Phys. Plasmas 20 056106). One proposed explanation is that momentum transport is sensitive to the second derivatives of the temperature and density profiles (Lee et al 2015 Plasma Phys. Control. Fusion 57 125006), but it is widely considered to be impossible to measure these higher derivatives. In this paper, we show that it is possible to estimate second derivatives of electron density and temperature using a nonparametric regression technique known as Gaussian process regression. This technique avoids over-constraining the fit by not assuming an explicit functional form for the fitted curve. The uncertainties, obtained rigorously using Markov chain Monte Carlo sampling, are small enough that it is reasonable to explore hypotheses which depend on second derivatives. It is found that the differences in the second derivatives of n_e and T_e between the peaked and hollow rotation cases are rather small, suggesting that changes in the second derivatives are not likely to explain the experimental results.
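A minimal sketch of why Gaussian process regression gives access to second derivatives: the posterior mean is a weighted sum of kernel functions, so differentiating the kernel twice differentiates the fit analytically. This toy one-dimensional example assumes a squared-exponential kernel and invented data, not the tokamak profiles or the paper's MCMC uncertainty treatment:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel matrix k(a_i, b_j)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def rbf_d2(a, b, ell=1.0):
    """Second derivative of k(a_i, b_j) with respect to a_i."""
    r = a[:, None] - b[None, :]
    return (r ** 2 / ell ** 4 - 1.0 / ell ** 2) * np.exp(-0.5 * r ** 2 / ell ** 2)

# Toy 1-D "profile": y = x^2 sampled at 10 points (invented data).
x = np.linspace(0.0, 3.0, 10)
y = x ** 2
sigma2 = 1e-4                                   # assumed noise variance

alpha = np.linalg.solve(rbf(x, x) + sigma2 * np.eye(len(x)), y)

def posterior_mean(xs):
    return rbf(np.atleast_1d(np.asarray(xs, dtype=float)), x) @ alpha

def posterior_mean_d2(xs):
    """Analytic second derivative of the GP posterior mean."""
    return rbf_d2(np.atleast_1d(np.asarray(xs, dtype=float)), x) @ alpha
```

Because no polynomial form is ever assumed, the curvature estimate comes from the data through the kernel, which is the property the abstract relies on.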
Cashman, Kevin D.; Ritz, Christian; Kiely, Mairead
2017-01-01
Dietary Reference Values (DRVs) for vitamin D have a key role in the prevention of vitamin D deficiency. However, despite adopting similar risk assessment protocols, estimates from authoritative agencies over the last 6 years have been diverse. This may have arisen from differing approaches to data analysis. Modelling strategies for pooling of individual subject data from cognate vitamin D randomized controlled trials (RCTs) are likely to provide the most appropriate DRV estimates. Thus, the objective of the present work was to undertake the first-ever individual participant data (IPD)-level meta-regression, which is increasingly recognized as best practice, from seven winter-based RCTs (with 882 participants ranging in age from 4 to 90 years) of the vitamin D intake–serum 25-hydroxyvitamin D (25(OH)D) dose-response. Our IPD-derived estimates of vitamin D intakes required to maintain 97.5% of 25(OH)D concentrations >25, 30, and 50 nmol/L across the population are 10, 13, and 26 µg/day, respectively. In contrast, standard meta-regression analyses with aggregate data (as used by several agencies in recent years) from the same RCTs estimated that a vitamin D intake requirement of 14 µg/day would maintain 97.5% of 25(OH)D >50 nmol/L. These first IPD-derived estimates offer improved dietary recommendations for vitamin D because the underpinning modeling captures the between-person variability in response of serum 25(OH)D to vitamin D intake. PMID:28481259
De Luca, Andrea; Flandre, Philippe; Dunn, David; Zazzi, Maurizio; Wensing, Annemarie; Santoro, Maria Mercedes; Günthard, Huldrych F; Wittkop, Linda; Kordossis, Theodoros; Garcia, Federico; Castagna, Antonella; Cozzi-Lepri, Alessandro; Churchill, Duncan; De Wit, Stéphane; Brockmeyer, Norbert H; Imaz, Arkaitz; Mussini, Cristina; Obel, Niels; Perno, Carlo Federico; Roca, Bernardino; Reiss, Peter; Schülter, Eugen; Torti, Carlo; van Sighem, Ard; Zangerle, Robert; Descamps, Diane
2016-05-01
The objective of this study was to improve the prediction of the impact of HIV-1 protease mutations in different viral subtypes on virological response to darunavir. Darunavir-containing treatment change episodes (TCEs) in patients previously failing PIs were selected from large European databases. HIV-1 subtype B-infected patients were used as the derivation dataset and HIV-1 non-B-infected patients were used as the validation dataset. The adjusted association of each mutation with week 8 HIV RNA change from baseline was analysed by linear regression. A prediction model was derived based on best subset least squares estimation with mutational weights corresponding to regression coefficients. Virological outcome prediction accuracy was compared with that from existing genotypic resistance interpretation systems (GISs) (ANRS 2013, Rega 9.1.0 and HIVdb 7.0). TCEs were selected from 681 subtype B-infected and 199 non-B-infected adults. Accompanying drugs were NRTIs in 87%, NNRTIs in 27% and raltegravir or maraviroc or enfuvirtide in 53%. The prediction model included weighted protease mutations, HIV RNA, CD4 and activity of accompanying drugs. The model's association with week 8 HIV RNA change in the subtype B (derivation) set was R² = 0.47 [average squared error (ASE) = 0.67, P < 10⁻⁶]; in the non-B (validation) set, ASE was 0.91. Accuracy investigated by means of area under the receiver operating characteristic curves with a binary response (above the threshold value of HIV RNA reduction) showed that our final model outperformed models with existing interpretation systems in both training and validation sets. A model with a new darunavir-weighted mutation score outperformed existing GISs in both B and non-B subtypes in predicting virological response to darunavir. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. 
For Permissions, please e-mail: journals.permissions@oup.com.
ERIC Educational Resources Information Center
Wood, J. Luke; Harris, Frank, III
2015-01-01
The purpose of this study was to understand the relationship (if any) between college selection factors and persistence for Black and Latino males in the community college. Using data derived from the Educational Longitudinal Study, backwards stepwise logistic regression models were developed for both groups. Findings are contextualized in light…
1982-05-13
Size Of The Software. A favourite measure for software system size is lines of operational code, or deliverable code (operational code plus... regression models, these conversions are either derived from productivity measures using the "cost per instruction" type of equation or they are... appropriate to different development organisations, different project types, different sets of units for measuring e and s, and different items
ERIC Educational Resources Information Center
Van Norman, Ethan R.; Christ, Theodore J.; Zopluoglu, Cengiz
2013-01-01
This study examined the effect of baseline estimation on the quality of trend estimates derived from Curriculum Based Measurement of Oral Reading (CBM-R) progress monitoring data. The authors used a linear mixed effects regression (LMER) model to simulate progress monitoring data for schedules ranging from 6-20 weeks for datasets with high and low…
NASA Astrophysics Data System (ADS)
Hegazy, Maha A.; Lotfy, Hayam M.; Rezk, Mamdouh R.; Omran, Yasmin Rostom
2015-04-01
Smart and novel spectrophotometric and chemometric methods have been developed and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and dexamethasone sodium phosphate (DSP) in presence of interfering substances without prior separation. The first method depends upon derivative subtraction coupled with constant multiplication. The second one is ratio difference method at optimum wavelengths which were selected after applying derivative transformation method via multiplying by a decoding spectrum in order to cancel the contribution of non labeled interfering substances. The third method relies on partial least squares with regression model updating. They are so simple that they do not require any preliminary separation steps. Accuracy, precision and linearity ranges of these methods were determined. Moreover, specificity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied for analysis of both drugs in their pharmaceutical formulation. The obtained results have been statistically compared to that of an official spectrophotometric method to give a conclusion that there is no significant difference between the proposed methods and the official ones with respect to accuracy and precision.
MEASUREMENT OF WIND SPEED FROM COOLING LAKE THERMAL IMAGERY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garrett, A; Robert Kurzeja, R; Eliel Villa-Aleman, E
2009-01-20
The Savannah River National Laboratory (SRNL) collected thermal imagery and ground truth data at two commercial power plant cooling lakes to investigate the applicability of laboratory empirical correlations between surface heat flux and wind speed, and statistics derived from thermal imagery. SRNL demonstrated in a previous paper [1] that a linear relationship exists between the standard deviation of image temperature and surface heat flux. In this paper, SRNL shows that the skewness of the temperature distribution derived from cooling lake thermal images correlates with instantaneous wind speed measured at the same location. SRNL collected thermal imagery, surface meteorology and water temperatures from helicopters and boats at the Comanche Peak and H. B. Robinson nuclear power plant cooling lakes. SRNL found that decreasing skewness correlated with increasing wind speed, as was the case for the laboratory experiments. Simple linear and orthogonal regression models each explained about 50% of the variance in the skewness versus wind speed plots. A nonlinear (logistic) regression model produced a better fit to the data, apparently because thermal convection and the resulting skewness are related to wind speed in a highly nonlinear way in nearly calm and in windy conditions.
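The image statistic at the core of this result is the skewness of the pixel-temperature distribution. A minimal pure-Python sketch of computing it, with invented pixel values rather than the SRNL imagery (the regression against measured wind speed would follow):

```python
def skewness(values):
    """Sample skewness: E[(T - mean)^3] / std^3, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Invented pixel "temperatures": a calm surface shows a few hot convective
# cells (right-skewed), a windy surface is well mixed (near-symmetric).
calm  = [20.0, 20.1, 20.2, 20.1, 20.0, 23.5]
windy = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]
```

The abstract's finding corresponds to skewness decreasing as wind speed increases, which the two toy images reproduce qualitatively.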
Li, Ming Ze; Gao, Yuan Ke; Di, Xue Ying; Fan, Wen Yi
2016-03-01
The moisture content of forest surface soil is an important parameter in forest ecosystems. It is practically significant for forest ecosystem related research to use microwave remote sensing technology for rapid and accurate estimation of the moisture content of forest surface soil. With the aid of TDR-300 soil moisture content measuring instrument, the moisture contents of forest surface soils of 120 sample plots at Tahe Forestry Bureau of Daxing'anling region in Heilongjiang Province were measured. Taking the moisture content of forest surface soil as the dependent variable and the polarization decomposition parameters of C band Quad-pol SAR data as independent variables, two types of quantitative estimation models (multilinear regression model and BP-neural network model) for predicting moisture content of forest surface soils were developed. The spatial distribution of moisture content of forest surface soil on the regional scale was then derived with model inversion. Results showed that the model precision was 86.0% and 89.4% with RMSE of 3.0% and 2.7% for the multilinear regression model and the BP-neural network model, respectively. It indicated that the BP-neural network model had a better performance than the multilinear regression model in quantitative estimation of the moisture content of forest surface soil. The spatial distribution of forest surface soil moisture content in the study area was then obtained by using the BP neural network model simulation with the Quad-pol SAR data.
Appleton, J D; Cave, M R; Miles, J C H; Sumerling, T J
2011-03-01
Least squares (LS), Theil's (TS) and weighted total least squares (WTLS) regression analysis methods are used to develop empirical relationships between radium in the ground, radon in soil and radon in dwellings to assist in the post-closure assessment of indoor radon related to near-surface radioactive waste disposal at the Low Level Waste Repository in England. The data sets used are (i) estimated ²²⁶Ra in the < 2 mm fraction of topsoils (eRa226) derived from equivalent uranium (eU) from airborne gamma spectrometry data, (ii) eRa226 derived from measurements of uranium in soil geochemical samples, (iii) soil gas radon and (iv) indoor radon data. For models comparing indoor radon and (i) eRa226 derived from airborne eU data and (ii) soil gas radon data, some of the geological groupings have significant slopes. For these groupings there is reasonable agreement in slope and intercept between the three regression analysis methods (LS, TS and WTLS). Relationships between radon in dwellings and radium in the ground or radon in soil differ depending on the characteristics of the underlying geological units, with more permeable units having steeper slopes and higher indoor radon concentrations for a given radium or soil gas radon concentration in the ground. The regression models comparing indoor radon with soil gas radon have intercepts close to 5 Bq m⁻³ whilst the intercepts for those comparing indoor radon with eRa226 from airborne eU vary from about 20 Bq m⁻³ for a moderately permeable geological unit to about 40 Bq m⁻³ for highly permeable limestone, implying unrealistically high contributions to indoor radon from sources other than the ground. An intercept value of 5 Bq m⁻³ is assumed as an appropriate mean value for the UK for sources of indoor radon other than radon from the ground, based on examination of UK data. 
Comparison with published data used to derive an average indoor radon: soil ²²⁶Ra ratio shows that whereas the published data are generally clustered with no obvious correlation, the data from this study have substantially different relationships depending largely on the permeability of the underlying geology. Models for the relatively impermeable geological units plot parallel to the average indoor radon: soil ²²⁶Ra model but with lower indoor radon: soil ²²⁶Ra ratios, whilst the models for the permeable geological units plot parallel to the average indoor radon: soil ²²⁶Ra model but with higher than average indoor radon: soil ²²⁶Ra ratios. Copyright © 2010 Natural Environment Research Council. Published by Elsevier Ltd. All rights reserved.
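Of the three fitting methods compared, Theil's estimator is the least familiar: it takes the median of all pairwise slopes, which makes it robust to outlying indoor-radon measurements. A minimal sketch on toy data (not the repository datasets):

```python
import statistics

def theil_sen(xs, ys):
    """Theil's slope: median of slopes over all point pairs with distinct x;
    intercept from the median of the residuals y - slope * x."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs)) for j in range(i + 1, len(xs))
              if xs[j] != xs[i]]
    slope = statistics.median(slopes)
    intercept = statistics.median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

# Toy "soil-gas radon vs indoor radon" points on y = 2x + 5,
# plus one gross outlier that least squares would chase.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [7.0, 9.0, 11.0, 13.0, 15.0, 40.0]
slope, intercept = theil_sen(xs, ys)
```

The single outlier leaves the Theil slope and intercept at their clean-data values, which is why robust estimators are attractive alongside least squares for field data of this kind.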
Bjerklie, David M.; Dingman, S. Lawrence; Bolster, Carl H.
2005-01-01
A set of conceptually derived in‐bank river discharge–estimating equations (models), based on the Manning and Chezy equations, are calibrated and validated using a database of 1037 discharge measurements in 103 rivers in the United States and New Zealand. The models are compared to a multiple regression model derived from the same data. The comparison demonstrates that in natural rivers, using an exponent on the slope variable of 0.33 rather than the traditional value of 0.5 reduces the variance associated with estimating flow resistance. Mean model uncertainty, assuming a constant value for the conductance coefficient, is less than 5% for a large number of estimates, and 67% of the estimates would be accurate within 50%. The models have potential application where site‐specific flow resistance information is not available and can be the basis for (1) a general approach to estimating discharge from remotely sensed hydraulic data, (2) comparison to slope‐area discharge estimates, and (3) large‐scale river modeling.
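The key finding above, an exponent of 0.33 on slope rather than the traditional 0.5, can be illustrated with a Manning/Chezy-style power law. The conductance coefficient and the width and depth exponents here are placeholder assumptions, not the paper's calibrated values:

```python
def estimate_discharge(width_m, depth_m, slope, k=23.0, a=1.0, b=1.67, c=0.33):
    """Manning/Chezy-style power law: Q = k * W^a * D^b * S^c.

    The exponent c = 0.33 reflects the paper's finding for natural rivers;
    k, a and b are illustrative placeholders, not calibrated coefficients.
    """
    return k * width_m ** a * depth_m ** b * slope ** c

# Same hypothetical channel, traditional vs modified slope exponent.
q_traditional = estimate_discharge(50.0, 2.0, 1e-4, c=0.5)
q_modified = estimate_discharge(50.0, 2.0, 1e-4, c=0.33)
```

Because river slopes are far below 1, lowering the exponent raises the slope term and damps its influence on the estimate, which is one intuition for why the smaller exponent reduces the variance attributed to flow resistance.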
NASA Technical Reports Server (NTRS)
DeLoach, Richard
2012-01-01
This paper reviews the derivation of an equation for scaling response surface modeling experiments. The equation represents the smallest number of data points required to fit a linear regression polynomial so as to achieve certain specified model adequacy criteria. Specific criteria are proposed which simplify an otherwise rather complex equation, generating a practical rule of thumb for the minimum volume of data required to adequately fit a polynomial with a specified number of terms in the model. This equation and the simplified rule of thumb it produces can be applied to minimize the cost of wind tunnel testing.
Tiyip, Tashpolat; Ding, Jianli; Zhang, Dong; Liu, Wei; Wang, Fei; Tashpolat, Nigara
2017-01-01
Effective pretreatment of spectral reflectance is vital to model accuracy in soil parameter estimation. However, the classic integer derivative has some disadvantages, including spectral information loss and the introduction of high-frequency noise. In this paper, a fractional order derivative algorithm was applied to the pretreatment and partial least squares regression (PLSR) was used to assess the clay content of desert soils. Overall, 103 soil samples were collected from the Ebinur Lake basin in the Xinjiang Uighur Autonomous Region of China and used as data sets for calibration and validation. Following laboratory measurements of spectral reflectance and clay content, the raw spectral reflectance and absorbance data were treated using fractional derivative orders from 0.0 to 2.0 (order interval: 0.2). The ratio of performance to deviation (RPD), determination coefficients of calibration (Rc2), root mean square errors of calibration (RMSEC), determination coefficients of prediction (Rp2), and root mean square errors of prediction (RMSEP) were applied to assess the performance of the predictive models. The results showed that models built on fractional derivative orders performed better than those using the classic integer derivative. Comparison of the predictive effects of 22 models for estimating clay content, calibrated by PLSR, showed that the models based on the fractional derivative 1.8 order of spectral reflectance (Rc2 = 0.907, RMSEC = 0.425%, Rp2 = 0.916, RMSEP = 0.364%, and RPD = 2.484 ≥ 2.000) and absorbance (Rc2 = 0.888, RMSEC = 0.446%, Rp2 = 0.918, RMSEP = 0.383% and RPD = 2.511 ≥ 2.000) were most effective. Furthermore, they performed well in quantitative estimations of the clay content of soils in the study area. PMID:28934274
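The fractional order derivative used for pretreatment can be computed with the Grünwald-Letnikov recurrence. A minimal sketch on a toy spectrum, taking the step size as one spectral band (the paper's sampling details are not reproduced here):

```python
def fractional_derivative(signal, alpha):
    """Grunwald-Letnikov fractional derivative of order alpha, step h = 1:
    D^a f(x_k) ~ sum_j w_j * f(x_{k-j}),  w_0 = 1,  w_j = w_{j-1}*(j-1-alpha)/j.

    alpha = 0 returns the signal unchanged; alpha = 1 gives the first
    difference; non-integer orders interpolate between the two.
    """
    n = len(signal)
    w = [1.0]
    for j in range(1, n):
        w.append(w[-1] * (j - 1 - alpha) / j)
    return [sum(w[j] * signal[k - j] for j in range(k + 1)) for k in range(n)]

spectrum = [0.10, 0.12, 0.15, 0.20, 0.28]       # toy reflectance values
d18 = fractional_derivative(spectrum, 1.8)      # the paper's best order
```

In the full workflow, `d18`-style pretreated spectra for all samples would form the predictor matrix handed to PLSR.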
Proxies for soil organic carbon derived from remote sensing
NASA Astrophysics Data System (ADS)
Rasel, S. M. M.; Groen, T. A.; Hussin, Y. A.; Diti, I. J.
2017-07-01
The possibility of carbon storage in soils is of interest because soils contain more carbon than vegetation. Estimating soil carbon through remote-sensing-based techniques can be a cost-effective approach, but is limited by the available methods. This study aims to develop a model based on remotely sensed variables (elevation, forest type and above-ground biomass) to estimate soil carbon stocks. Field observations of soil organic carbon, species composition, and above-ground biomass were recorded in the subtropical forest of Chitwan, Nepal. These variables were also estimated using LiDAR data and a WorldView-2 image. Above-ground biomass was estimated from the LiDAR image using a novel approach in which the image was segmented to identify individual trees, for which estimates of DBH and height were made. Based on the AIC (Akaike Information Criterion), a regression model with above-ground biomass derived from LiDAR data and forest type derived from WorldView-2 imagery was selected to estimate soil organic carbon (SOC) stocks. The selected model had a coefficient of determination (R2) of 0.69. This shows the scope for estimating SOC from remote-sensing-derived variables in subtropical forests.
Esteki, M; Nouroozi, S; Shahsavari, Z
2016-02-01
To develop a simple and efficient spectrophotometric technique combined with chemometrics for the simultaneous determination of methyl paraben (MP) and hydroquinone (HQ) in cosmetic products, and specifically, to: (i) evaluate the potential use of the successive projections algorithm (SPA) on derivative spectrophotometric data in order to provide sufficient accuracy and model robustness and (ii) determine MP and HQ concentrations in cosmetics without tedious pre-treatments such as derivatization or extraction, which are time-consuming and require hazardous solvents. The absorption spectra were measured in the wavelength range of 200-350 nm. Prior to building the chemometric models, the original and first-derivative absorption spectra of binary mixtures were used as calibration matrices. Variables selected by the successive projections algorithm were used to obtain multiple linear regression (MLR) models based on a small subset of wavelengths. The number of wavelengths and the starting vector were optimized, and comparison of the root mean square errors of calibration (RMSEC) and cross-validation (RMSECV) was applied to select effective wavelengths with the least collinearity and redundancy. Principal component regression (PCR) and partial least squares (PLS) models were also developed for comparison. The concentrations in the calibration matrix ranged from 0.1 to 20 μg mL(-1) for MP, and from 0.1 to 25 μg mL(-1) for HQ. The constructed models were tested on an external validation data set and finally on cosmetic samples. The results indicated that successive projections algorithm-multiple linear regression (SPA-MLR), applied to the first-derivative spectra, achieved the optimal performance for the two compounds when compared with full-spectrum PCR and PLS. The root mean square errors of prediction (RMSEP) were 0.083 and 0.314 for MP and HQ, respectively. 
To verify the accuracy of the proposed method, a recovery study on real cosmetic samples was carried out with satisfactory results (84-112%). The proposed method, which is an environmentally friendly approach, using minimum amount of solvent, is a simple, fast and low-cost analysis method that can provide high accuracy and robust models. The suggested method does not need any complex extraction procedure which is time-consuming and requires hazardous solvents. © 2015 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
Granger causality--statistical analysis under a configural perspective.
von Eye, Alexander; Wiedermann, Wolfgang; Mun, Eun-Young
2014-03-01
The concept of Granger causality can be used to examine putative causal relations between two series of scores. Based on regression models, it is asked whether one series can be considered the cause for the second series. In this article, we propose extending the pool of methods available for testing hypotheses that are compatible with Granger causation by adopting a configural perspective. This perspective allows researchers to assume that effects exist for specific categories only or for specific sectors of the data space, but not for other categories or sectors. Configural Frequency Analysis (CFA) is proposed as the method of analysis from a configural perspective. CFA base models are derived for the exploratory analysis of Granger causation. These models are specified so that they parallel the regression models used for variable-oriented analysis of hypotheses of Granger causation. An example from the development of aggression in adolescence is used. The example shows that only one pattern of change in aggressive impulses over time Granger-causes change in physical aggression against peers.
Modeling the prediction of business intelligence system effectiveness.
Weng, Sung-Shun; Yang, Ming-Hsien; Koo, Tian-Lih; Hsiao, Pei-I
2016-01-01
Although business intelligence (BI) technologies are continually evolving, the capability to apply them has become an indispensable resource for enterprises operating in today's complex, uncertain and dynamic business environment. This study performed pioneering work by constructing models and rules for the prediction of business intelligence system effectiveness (BISE) in relation to the implementation of BI solutions. To improve BI implementation and ensure its success, enterprises need to manage the critical attributes that determine BISE and to develop prediction models with a set of rules for self-evaluating the effectiveness of their BI solutions. The main findings identified the critical prediction indicators of BISE that are important for forecasting BI performance, and highlighted five classification and prediction rules of BISE derived from decision tree structures, as well as a refined regression prediction model with four critical prediction indicators constructed by logistic regression analysis. Together, these enable enterprises to improve BISE while effectively managing BI solution implementation, and offer theory of interest to academics.
Lloyd-Jones, Luke R; Robinson, Matthew R; Yang, Jian; Visscher, Peter M
2018-04-01
Genome-wide association studies (GWAS) have identified thousands of loci that are robustly associated with complex diseases. The use of linear mixed model (LMM) methodology for GWAS is becoming more prevalent due to its ability to control for population structure and cryptic relatedness and to increase power. The odds ratio (OR) is a common measure of the association of a disease with an exposure (e.g., a genetic variant) and is readily available from logistic regression. However, when the LMM is applied to all-or-none traits it provides estimates of genetic effects on the observed 0-1 scale, a different scale to that in logistic regression. This limits the comparability of results across studies, for example in a meta-analysis, and makes the interpretation of the magnitude of an effect from an LMM GWAS difficult. In this study, we derived transformations from the genetic effects estimated under the LMM to the OR that rely only on summary statistics. To test the proposed transformations, we used real genotypes from two large, publicly available data sets to simulate all-or-none phenotypes for a set of scenarios that differ in underlying model, disease prevalence, and heritability. Furthermore, we applied these transformations to GWAS summary statistics for type 2 diabetes generated from 108,042 individuals in the UK Biobank. In both simulation and real-data application, we observed very high concordance between the transformed OR from the LMM and either the simulated truth or estimates from logistic regression. The transformations derived and validated in this study improve the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects. Copyright © 2018 by the Genetics Society of America.
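The direction of the scale change can be illustrated with the simplest possible version: an effect b on the observed 0-1 (risk-difference) scale converted to an odds ratio against a baseline risk. This deliberately simplified formula omits the LMM-specific corrections derived in the paper (which also use allele frequency and trait prevalence from summary statistics); it only shows why the two scales differ:

```python
def risk_difference_to_or(baseline_risk, b):
    """Convert an effect on the observed 0-1 scale (risk difference b per
    allele) into an odds ratio relative to the baseline risk.

    A toy illustration of the change of scale, not the paper's
    summary-statistic transformation.
    """
    p1 = baseline_risk + b
    odds0 = baseline_risk / (1.0 - baseline_risk)
    odds1 = p1 / (1.0 - p1)
    return odds1 / odds0

# A 2-percentage-point risk increase on a 10% baseline, hypothetical values.
odds_ratio = risk_difference_to_or(baseline_risk=0.10, b=0.02)
```

Note that the same risk difference b maps to different ORs at different baseline risks, which is exactly why a principled transformation is needed for meta-analysis.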
Modeled summer background concentration nutrients and ...
We used regression models to predict the background concentration of four water quality indicators: total nitrogen (N), total phosphorus (P), chloride, and total suspended solids (TSS), in the mid-continent (USA) great rivers: the Upper Mississippi, the Lower Missouri, and the Ohio. From best-model linear regressions of the water quality indicators against land use and other stressor variables, we determined the concentration of each indicator when the land use and stressor variables were all set to zero, i.e., the y-intercept. Except for total P on the Upper Mississippi River and chloride on the Ohio River, we were able to predict background concentration from significant regression models. In every model with more than one predictor variable, the model included at least one variable representing agricultural land use and one variable representing development. The predicted background concentration of total N was the same on the Upper Mississippi and Lower Missouri rivers (350 ug l-1), which was much lower than a published eutrophication threshold and percentile-based thresholds (25th percentile of concentration at all sites in the population) but was similar to a threshold derived from the response of sestonic chlorophyll a to great river total N concentration. The background concentration of total P on the Lower Missouri (53 ug l-1) was also lower than published and percentile-based thresholds. Background TSS concentration was higher on the Lower Missouri (30 mg l-1) than the other ri
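The background-estimation step amounts to reading the intercept off a fitted regression. A pure-Python sketch of one-predictor least squares on invented site data (total N versus percent agricultural land; the numbers are made up, though the intercept is chosen to echo the 350 ug l-1 figure above):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    slope = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
            (sum(x * x for x in xs) - n * xbar * xbar)
    return ybar - slope * xbar, slope           # (intercept, slope)

# Invented site data: % agricultural land vs total N (ug/l),
# constructed to lie exactly on 350 + 10 * x.
ag_pct  = [10.0, 25.0, 40.0, 55.0, 70.0]
total_n = [450.0, 600.0, 750.0, 900.0, 1050.0]
background, per_pct = fit_line(ag_pct, total_n)
```

With all stressor predictors at zero, the fitted value collapses to the intercept, which is the "background" concentration the study reports; the multi-predictor case works the same way with every predictor zeroed.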
NASA Astrophysics Data System (ADS)
Shi, Liangliang; Mao, Zhihua; Wang, Zheng
2018-02-01
Satellite imagery has played an important role in monitoring the water quality of lakes and coastal waters, but it has scarcely been applied to inland rivers. This paper presents a feasibility study of applying regression models to quantify and map the concentration of total suspended matter (CTSM) in inland rivers, which span large spatial scales and a high CTSM dynamic range, using high-resolution WorldView-2 satellite remote sensing data. An empirical approach quantifies CTSM through the integrated use of high-resolution WorldView-2 multispectral data and 21 in situ CTSM measurements. Radiometric, geometric, and atmospheric corrections were carried out in the image processing procedure to derive the surface reflectance used to correlate CTSM with the satellite data via single-variable and multivariable regression techniques. Results of the regression models show that the single near-infrared (NIR) band 8 of WorldView-2 has a relatively strong relationship with CTSM (R2 = 0.93). Different prediction models were developed from various combinations of WorldView-2 bands, and the Akaike Information Criterion was used to choose the best model. The model involving bands 1, 3, 5, and 8 of WorldView-2 performed best, with an R2 of 0.92 and an SEE of 53.30 g/m3. Spatial distribution maps were produced using the best multiple regression model. These results indicate that it is feasible to apply the empirical model to high-resolution satellite imagery to retrieve CTSM in inland rivers for routine water quality monitoring.
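The AIC-guided selection over band combinations can be sketched as follows. The reflectance and CTSM values below are synthetic placeholders (the study's 21 in situ measurements are not reproduced here), and the exhaustive search over small band subsets is one plausible reading of the selection procedure.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 21 in situ samples over 8 WorldView-2 bands;
# the true signal here loads mainly on bands 8 and 1 (indices 7 and 0).
n_samples, n_bands = 21, 8
X = rng.uniform(0.01, 0.30, size=(n_samples, n_bands))
ctsm = 50.0 + 400.0 * X[:, 7] - 120.0 * X[:, 0] + rng.normal(0, 5.0, n_samples)

def aic_of_subset(bands):
    """Fit OLS of CTSM on the chosen bands and return the AIC."""
    A = np.column_stack([np.ones(n_samples), X[:, list(bands)]])
    coef, *_ = np.linalg.lstsq(A, ctsm, rcond=None)
    rss = float(np.sum((ctsm - A @ coef) ** 2))
    k = A.shape[1] + 1  # regression coefficients plus error variance
    return n_samples * np.log(rss / n_samples) + 2 * k

# Exhaustive search over all band combinations of size 1-4, mimicking
# AIC-based comparison of candidate multiband regression models.
best = min(
    (combo for r in range(1, 5)
     for combo in itertools.combinations(range(n_bands), r)),
    key=aic_of_subset,
)
print("best band subset (0-indexed):", best)
```

With only 21 samples, the AIC penalty term matters: it guards against the larger band subsets overfitting the small calibration set.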
Characterizing multivariate decoding models based on correlated EEG spectral features.
McFarland, Dennis J
2013-07-01
Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Data from sensorimotor rhythm-based cursor control experiments were analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order, which produced predictors that varied in their degree of correlation (i.e., multicollinearity). The use of multivariate regression models resulted in much better prediction of target position than univariate regression models. However, with lower-order AR features, interpretation of the spectral patterns of the weights was difficult. This is likely due to the high degree of multicollinearity present with lower-order AR features. Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. While multivariate decoding algorithms are very useful for prediction, their utility for interpretation may be limited when predictors are correlated. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
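The core interpretability problem can be demonstrated with synthetic data (not the study's EEG recordings): when two features are nearly collinear, univariate slopes are stable while the multivariate weights split the shared variance in ways that are hard to interpret.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Two highly correlated features, as with spectral estimates from
# low-order AR models; purely synthetic stand-ins for EEG features.
f1 = rng.normal(size=n)
f2 = f1 + 0.05 * rng.normal(size=n)          # nearly collinear with f1
y = f1 + f2 + rng.normal(scale=0.5, size=n)  # target depends on both

X = np.column_stack([f1, f2])
w_multi, *_ = np.linalg.lstsq(X, y, rcond=None)

# Univariate fits give similar, stable weights for each feature...
w_uni = [float(np.dot(x, y) / np.dot(x, x)) for x in (f1, f2)]

# ...while the individual multivariate weights can apportion the shared
# variance unstably, even though their sum (and the prediction) is stable.
print("multivariate:", np.round(w_multi, 2), "univariate:", np.round(w_uni, 2))
```

This is why the abstract recommends comparing multivariate weight patterns against univariate statistics before interpreting them.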
Domain-Invariant Partial-Least-Squares Regression.
Nikzad-Langerodi, Ramin; Zellinger, Werner; Lughofer, Edwin; Saminger-Platz, Susanne
2018-05-11
Multivariate calibration models often fail to extrapolate beyond the calibration samples because of changes associated with the instrumental response, environmental condition, or sample matrix. Most of the current methods used to adapt a source calibration model to a target domain exclusively apply to calibration transfer between similar analytical devices, while generic methods for calibration-model adaptation are largely missing. To fill this gap, we here introduce domain-invariant partial-least-squares (di-PLS) regression, which extends ordinary PLS by a domain regularizer in order to align the source and target distributions in the latent-variable space. We show that a domain-invariant weight vector can be derived in closed form, which allows the integration of (partially) labeled data from the source and target domains as well as entirely unlabeled data from the latter. We test our approach on a simulated data set where the aim is to desensitize a source calibration model to an unknown interfering agent in the target domain (i.e., unsupervised model adaptation). In addition, we demonstrate unsupervised, semisupervised, and supervised model adaptation by di-PLS on two real-world near-infrared (NIR) spectroscopic data sets.
Warton, David I; Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.
PMID:28738071
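As a concrete illustration of PIT residuals for a discrete model, the sketch below computes randomized probability integral transform residuals for Poisson counts. The Poisson setting and all numerical values are illustrative assumptions, not the paper's multivariate abundance data; the key property shown is that, under a correctly specified model, the residuals are uniform on (0, 1), i.e. pivotal.

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_cdf_pmf(y, mu):
    """Return (F(y-1), f(y)) for a Poisson(mu) count y, computed iteratively."""
    pmf = np.exp(-mu)      # f(0)
    cdf_below = 0.0        # F(-1)
    for k in range(int(y)):
        cdf_below += pmf
        pmf *= mu / (k + 1)
    return cdf_below, pmf

def pit_residual(y, mu, rng):
    """Randomized PIT residual for a Poisson fit:
    u = F(y-1) + V * f(y), with V ~ Uniform(0, 1).
    Under the true model, u ~ Uniform(0, 1), so rows of such residuals can
    be resampled, preserving the marginal distribution of the data."""
    cdf_below, pmf = poisson_cdf_pmf(y, mu)
    return cdf_below + rng.uniform() * pmf

# Under the true model the PIT residuals are approximately uniform:
mu = 3.0
ys = rng.poisson(mu, size=5000)
u = np.array([pit_residual(y, mu, rng) for y in ys])
print(round(float(u.mean()), 2), round(float(u.var()), 3))
```

Randomization is what makes the transform work for discrete data, where the plain CDF transform would produce a lumpy, non-uniform distribution.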
Connecting clinical and actuarial prediction with rule-based methods.
Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H
2015-06-01
Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main-effects models usually employed in prediction studies, from data-analytic, decision-analytic, and practical perspectives. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main-effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved.
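A fast-and-frugal tree of the kind described can be sketched as a short sequence of one-cue checks. The cue names and cut-offs below are hypothetical placeholders for illustration only; the actual rules RuleFit derived from the Penninx et al. data are not given in the abstract.

```python
def predict_chronic_course(severity_score: float, duration_months: float) -> bool:
    """A fast-and-frugal tree in the spirit of the 2-rule model described in
    the abstract. Both cues and both thresholds are HYPOTHETICAL, not the
    fitted rules from the study.

    Cues are checked sequentially, so a prediction may require only the
    first cue, which is what makes such trees quick to apply in practice.
    """
    if severity_score >= 30:      # rule 1: high baseline severity -> chronic
        return True
    if duration_months >= 24:     # rule 2: long prior duration -> chronic
        return True
    return False                  # otherwise predict remission

print(predict_chronic_course(35, 3), predict_chronic_course(10, 6))
# → True False
```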
NASA Astrophysics Data System (ADS)
Bernales, A. M.; Antolihao, J. A.; Samonte, C.; Campomanes, F.; Rojas, R. J.; dela Serna, A. M.; Silapan, J.
2016-06-01
The threat of ailments related to urbanization, such as heat stress, is very prevalent. Much can be done to lessen the effect of urbanization on the surface temperature of an area, such as installing green roofs or planting trees, so land use matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use/land cover (LULC) and land surface temperature (LST). Quantifying this relationship as a mathematical model is important because it provides a way to predict LST from LULC alone. This study aims to examine the relationship between LST and LULC and to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image, and the LULC classification was derived from LiDAR and orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs, and these metrics were analysed in a statistical framework. Multiple linear regression was used to create models to predict LST for each class, and the spatial metric "Effective mesh size" was found to be a top predictor of LST in 6 out of 7 classes. The model can still be refined by adding a temporal aspect: analysing the LST of another farming period (for rural areas) and looking for common predictors between the LSTs of these two different farming periods.
Omnibus risk assessment via accelerated failure time kernel machine modeling.
Sinnott, Jennifer A; Cai, Tianxi
2013-12-01
Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai, Tonini, and Lin, 2011). In this article, we derive testing and prediction methods for KM regression under the accelerated failure time (AFT) model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. © 2013, The International Biometric Society.
Predicting musically induced emotions from physiological inputs: linear and neural network models.
Russo, Frank A; Vempala, Naresh N; Sandstrom, Gillian M
2013-01-01
Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of "felt" emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants-heart rate (HR), respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA) dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a non-linear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The non-linear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the non-linear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.
Incorporating Measurement Error from Modeled Air Pollution Exposures into Epidemiological Analyses.
Samoli, Evangelia; Butland, Barbara K
2017-12-01
Outdoor air pollution exposures used in epidemiological studies are commonly predicted from spatiotemporal models incorporating limited measurements, temporal factors, geographic information system variables, and/or satellite data. Measurement error in these exposure estimates leads to imprecise estimation of health effects and their standard errors. We reviewed methods for measurement error correction that have been applied in epidemiological studies that use model-derived air pollution data. We identified seven cohort studies and one panel study that have employed measurement error correction methods. These methods included regression calibration, risk set regression calibration, regression calibration with instrumental variables, the simulation extrapolation approach (SIMEX), and methods under the non-parametric or parameter bootstrap. Corrections resulted in small increases in the absolute magnitude of the health effect estimate and its standard error under most scenarios. Limited application of measurement error correction methods in air pollution studies may be attributed to the absence of exposure validation data and the methodological complexity of the proposed methods. Future epidemiological studies should consider in their design phase the requirements for the measurement error correction method to be later applied, while methodological advances are needed under the multi-pollutants setting.
NASA Astrophysics Data System (ADS)
Bhattarai, Nishan; Wagle, Pradeep; Gowda, Prasanna H.; Kakani, Vijaya G.
2017-11-01
The ability of remote sensing-based surface energy balance (SEB) models to track water stress in rain-fed switchgrass (Panicum virgatum L.) has not been explored yet. In this paper, the theoretical framework of the crop water stress index (CWSI; 0 = extremely wet or no water stress condition and 1 = extremely dry or no transpiration) was utilized to estimate CWSI in rain-fed switchgrass using Landsat-derived evapotranspiration (ET) from five remote sensing-based single-source SEB models, namely Surface Energy Balance Algorithm for Land (SEBAL), Mapping ET with Internalized Calibration (METRIC), Surface Energy Balance System (SEBS), Simplified Surface Energy Balance Index (S-SEBI), and Operational Simplified Surface Energy Balance (SSEBop). CWSI estimates from the five SEB models and a simple regression model that used normalized difference vegetation index (NDVI), near-surface temperature difference, and measured soil moisture (SM) as covariates were compared with those derived from eddy covariance measured ET (CWSIEC) for the 32 Landsat image acquisition dates during the 2011 (dry) and 2013 (wet) growing seasons. Results indicate that most SEB models can predict CWSI reasonably well. For example, the root mean square error (RMSE) ranged from 0.14 (SEBAL) to 0.29 (SSEBop) and the coefficient of determination (R2) ranged from 0.25 (SSEBop) to 0.72 (SEBAL), justifying the added complexity in CWSI modeling as compared to results from the simple regression model (R2 = 0.55, RMSE = 0.16). All SEB models underestimated CWSI in the dry year, but the estimates from SEBAL and S-SEBI were within 7% of the mean CWSIEC and explained over 60% of the variation in CWSIEC. In the wet year, S-SEBI mostly overestimated CWSI (by around 28%), while estimates from METRIC, SEBAL, SEBS, and SSEBop were within 8% of the mean CWSIEC. Overall, SEBAL was the most robust model under all conditions, followed by METRIC, whose performance was slightly worse than SEBAL in dry years and better in wet years. Underestimation of CWSI under extremely dry soil conditions and the substantial role of SM in the regression model suggest that integration of SM into SEB models could improve their performance under dry conditions. These insights will provide useful guidance on the broader applicability of SEB models for mapping water stress in switchgrass under varying geographical and meteorological conditions.
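When CWSI is derived from modeled ET, one common formulation (an assumption here; the paper's exact formulation may differ) is CWSI = 1 - ETa/ETp, which maps no stress to 0 and no transpiration to 1:

```python
def cwsi_from_et(et_actual: float, et_potential: float) -> float:
    """Crop water stress index from actual and potential evapotranspiration,
    using the common ET-ratio form CWSI = 1 - ETa/ETp. This specific form is
    an assumption for illustration, not necessarily the paper's definition.
    0 = no water stress, 1 = no transpiration; the result is clipped so the
    index stays in [0, 1] even if modeled ETa slightly exceeds ETp."""
    ratio = et_actual / et_potential
    return min(1.0, max(0.0, 1.0 - ratio))

# A dry-day example: actual ET well below potential ET (mm/day).
print(cwsi_from_et(2.0, 8.0))  # → 0.75
```

Under this form, any SEB model that overestimates ETa on dry days (a known failure mode under extreme soil dryness) will underestimate CWSI, matching the dry-year bias reported above.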
Lee, Sang Ho; Hayano, Koichi; Zhu, Andrew X.; Sahani, Dushyant V.; Yoshida, Hiroyuki
2015-01-01
Background To find prognostic biomarkers in pretreatment dynamic contrast-enhanced MRI (DCE-MRI) water-exchange-modified (WX) kinetic parameters for advanced hepatocellular carcinoma (HCC) treated with antiangiogenic monotherapy. Methods Twenty patients with advanced HCC underwent DCE-MRI and were subsequently treated with sunitinib. Pretreatment DCE-MRI data on advanced HCC were analyzed using five different WX kinetic models: the Tofts-Kety (WX-TK), extended TK (WX-ETK), two compartment exchange, adiabatic approximation to tissue homogeneity (WX-AATH), and distributed parameter (WX-DP) models. The total hepatic blood flow, arterial flow fraction (γ), arterial blood flow (BF A), portal blood flow, blood volume, mean transit time, permeability-surface area product, fractional interstitial volume (v I), extraction fraction, mean intracellular water molecule lifetime (τ C), and fractional intracellular volume (v C) were calculated. After receiver operating characteristic analysis with leave-one-out cross-validation, individual parameters for each model were assessed in terms of 1-year-survival (1YS) discrimination using Kaplan-Meier analysis, and association with overall survival (OS) using univariate Cox regression analysis with permutation testing. Results The WX-TK-model-derived γ (P = 0.022) and v I (P = 0.010), and WX-ETK-model-derived τ C (P = 0.023) and v C (P = 0.042) were statistically significant prognostic biomarkers for 1YS. Increase in the WX-DP-model-derived BF A (P = 0.025) and decrease in the WX-TK, WX-ETK, WX-AATH, and WX-DP-model-derived v C (P = 0.034, P = 0.038, P = 0.028, P = 0.041, respectively) were significantly associated with an increase in OS. Conclusions The WX-ETK-model-derived v C was an effective prognostic biomarker for advanced HCC treated with sunitinib. PMID:26366997
Zhao, Hui; Hua, Ye; Dai, Tu; He, Jian; Tang, Min; Fu, Xu; Mao, Liang; Jin, Huihan; Qiu, Yudong
2017-03-01
Microvascular invasion (MVI) in patients with hepatocellular carcinoma (HCC) cannot be accurately predicted preoperatively. This study aimed to establish a predictive scoring model of MVI in solitary HCC patients without macroscopic vascular invasion. A total of 309 consecutive HCC patients who underwent curative hepatectomy were divided into the derivation (n=206) and validation (n=103) cohorts. A predictive scoring model of MVI was established from the valuable predictors in the derivation cohort based on multivariate logistic regression analysis. The performance of the predictive model was evaluated in the derivation and validation cohorts. Preoperative imaging features on contrast-enhanced CT (CECT), such as intratumoral arteries, non-nodular type of HCC, and absence of a radiological tumor capsule, were independent predictors of MVI. The predictive scoring model was established according to the β coefficients of the 3 predictors. The area under the receiver operating characteristic curve (AUROC) of the predictive scoring model was 0.872 (95% CI, 0.817-0.928) and 0.856 (95% CI, 0.771-0.940) in the derivation and validation cohorts, respectively. The positive and negative predictive values were 76.5% and 88.0% in the derivation cohort and 74.4% and 88.3% in the validation cohort. The performance of the model was similar between patients with tumor size ≤5 cm and >5 cm in AUROC (P=0.910). The predictive scoring model based on intratumoral arteries, non-nodular type of HCC, and absence of a radiological tumor capsule on preoperative CECT is of great value in the prediction of MVI regardless of tumor size. Copyright © 2017 Elsevier B.V. All rights reserved.
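A β-coefficient scoring model of this kind can be sketched as follows. The coefficient values and intercept below are placeholders (the paper's fitted coefficients are not reproduced in the abstract): each imaging predictor contributes its logistic-regression coefficient to a linear score, which the logistic function maps to a predicted probability.

```python
import math

def mvi_risk_score(intratumoral_arteries: bool,
                   non_nodular_type: bool,
                   no_radiological_capsule: bool,
                   betas=(1.0, 1.0, 1.0),
                   intercept=-2.0) -> float:
    """Sketch of a beta-coefficient scoring model for microvascular invasion.
    The betas and intercept are HYPOTHETICAL placeholders, not the study's
    fitted values: each present predictor adds its coefficient to the linear
    score, and the score is converted to a probability via the logistic
    function."""
    predictors = (intratumoral_arteries, non_nodular_type, no_radiological_capsule)
    score = intercept + sum(b for b, present in zip(betas, predictors) if present)
    return 1.0 / (1.0 + math.exp(-score))

# All three imaging predictors present vs. none present.
print(round(mvi_risk_score(True, True, True), 3),
      round(mvi_risk_score(False, False, False), 3))
```

In practice the coefficients would come from the derivation-cohort logistic regression, and a probability threshold would be chosen to trade off the positive and negative predictive values reported above.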
The effects of short- and long-term air pollutants on plant phenology and leaf characteristics.
Jochner, Susanne; Markevych, Iana; Beck, Isabelle; Traidl-Hoffmann, Claudia; Heinrich, Joachim; Menzel, Annette
2015-11-01
Pollution adversely affects vegetation; however, its impact on phenology and leaf morphology is not satisfactorily understood yet. We analyzed associations between pollutants and phenological data of birch, hazel and horse chestnut in Munich (2010) along with the suitability of leaf morphological parameters of birch for monitoring air pollution using two datasets: cumulated atmospheric concentrations of nitrogen dioxide and ozone derived from passive sampling (short-term exposure) and pollutant information derived from Land Use Regression models (long-term exposure). Partial correlations and stepwise regressions revealed that increased ozone (birch, horse chestnut), NO2, NOx and PM levels (hazel) were significantly related to delays in phenology. Correlations were especially high when rural sites were excluded suggesting a better estimation of long-term within-city pollution. In situ measurements of foliar characteristics of birch were not suitable for bio-monitoring pollution. Inconsistencies between long- and short-term exposure effects suggest some caution when interpreting short-term data collected within field studies. Copyright © 2015 Elsevier Ltd. All rights reserved.
Score tests for independence in semiparametric competing risks models.
Saïd, Mériem; Ghazzali, Nadia; Rivest, Louis-Paul
2009-12-01
A popular model for competing risks postulates the existence of a latent unobserved failure time for each risk. Assuming that these underlying failure times are independent is attractive since it allows standard statistical tools for right-censored lifetime data to be used in the analysis. This paper proposes simple independence score tests for the validity of this assumption when the individual risks are modeled using semiparametric proportional hazards regressions. It assumes that covariates are available, making the model identifiable. The score tests are derived for alternatives that specify that copulas are responsible for a possible dependency between the competing risks. The test statistics are constructed by adding to the partial likelihoods for the individual risks an explanatory variable for the dependency between the risks. A variance estimator is derived by writing the score function and the Fisher information matrix for the marginal models as stochastic integrals. Pitman efficiencies are used to compare test statistics. A simulation study and a numerical example illustrate the methodology proposed in this paper.
Large signal-to-noise ratio quantification in MLE for ARARMAX models
NASA Astrophysics Data System (ADS)
Zou, Yiqun; Tang, Xiafei
2014-06-01
It has been shown that closed-loop linear system identification by indirect method can be generally transferred to open-loop ARARMAX (AutoRegressive AutoRegressive Moving Average with eXogenous input) estimation. For such models, the gradient-related optimisation with large enough signal-to-noise ratio (SNR) can avoid the potential local convergence in maximum likelihood estimation. To ease the application of this condition, the threshold SNR needs to be quantified. In this paper, we build the amplitude coefficient which is an equivalence to the SNR and prove the finiteness of the threshold amplitude coefficient within the stability region. The quantification of threshold is achieved by the minimisation of an elaborately designed multi-variable cost function which unifies all the restrictions on the amplitude coefficient. The corresponding algorithm based on two sets of physically realisable system input-output data details the minimisation and also points out how to use the gradient-related method to estimate ARARMAX parameters when local minimum is present as the SNR is small. Then, the algorithm is tested on a theoretical AutoRegressive Moving Average with eXogenous input model for the derivation of the threshold and a gas turbine engine real system for model identification, respectively. Finally, the graphical validation of threshold on a two-dimensional plot is discussed.
Cost-sensitive AdaBoost algorithm for ordinal regression based on extreme learning machine.
Riccardi, Annalisa; Fernández-Navarro, Francisco; Carloni, Sante
2014-10-01
In this paper, the well-known stagewise additive modeling using a multiclass exponential (SAMME) boosting algorithm is extended to address problems where there exists a natural order in the targets, using a cost-sensitive approach. The proposed ensemble model uses an extreme learning machine (ELM) model as a base classifier (with the Gaussian kernel and an additional regularization parameter). The closed form of the derived weighted least squares problem is provided, and it is employed to estimate analytically the parameters connecting the hidden layer to the output layer at each iteration of the boosting algorithm. Compared to the state-of-the-art boosting algorithms, in particular those using ELM as a base classifier, the suggested technique does not require the generation of a new training dataset at each iteration. The weighted least squares formulation of the problem is presented as an unbiased alternative to the existing ELM boosting techniques. Moreover, the addition of a cost model for weighting the patterns according to the order of the targets enables the classifier to tackle ordinal regression problems. The proposed method has been validated in an experimental study comparing it with existing ensemble methods and ELM techniques for ordinal regression, showing competitive results.
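The closed-form weighted, regularized least squares step for an ELM output layer can be sketched as beta = (H'WH + reg*I)^-1 H'WT, where H holds the hidden-layer outputs and the diagonal W carries the per-pattern boosting/cost weights. Random tanh hidden nodes stand in here for the paper's Gaussian kernel, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(7)

def elm_weighted_fit(X, T, sample_weights, n_hidden=50, reg=1e-2):
    """Closed-form weighted, regularized least squares for an ELM output
    layer: beta = (H' W H + reg*I)^-1 H' W T.

    Random tanh hidden nodes are used as a simple stand-in for the Gaussian
    kernel in the paper. Because the fit is a weighted solve, the boosting
    weights enter through W and no new training set is generated per
    iteration."""
    A = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ A + b)                        # hidden-layer outputs
    W = np.diag(sample_weights)
    beta = np.linalg.solve(H.T @ W @ H + reg * np.eye(n_hidden), H.T @ W @ T)
    return A, b, beta

# Toy regression-style targets with uniform initial boosting weights.
X = rng.normal(size=(200, 3))
T = (X[:, 0] + 0.1 * rng.normal(size=200)).reshape(-1, 1)
A, b, beta = elm_weighted_fit(X, T, np.ones(200))
pred = np.tanh(X @ A + b) @ beta
print("train MSE:", round(float(np.mean((pred - T) ** 2)), 3))
```

In a full cost-sensitive ordinal ensemble, `sample_weights` would be updated each boosting round according to the misclassification cost implied by the distance between predicted and true ordinal categories.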
NASA Astrophysics Data System (ADS)
Ishizaki, N. N.; Dairaku, K.; Ueno, G.
2016-12-01
We have developed a statistical downscaling method for estimating probabilistic climate projections using multiple general circulation models (GCMs) from CMIP5. A regression model was established so that the combination of GCM weights reflects the characteristics of the variation of observations at each grid point. Cross-validations were conducted to select GCMs and to evaluate the regression model while avoiding multicollinearity. Using a spatially high-resolution observation system, we produced statistically downscaled probabilistic climate projections with 20-km horizontal grid spacing. Root mean squared errors for monthly mean surface air temperature and precipitation estimated by the regression method were the smallest compared with the results derived from a simple ensemble mean of GCMs and from a cumulative-distribution-function-based bias correction method. Projected changes in mean temperature and precipitation were basically similar to those of the simple ensemble mean of GCMs. Mean precipitation was generally projected to increase, associated with increased temperature and the consequent increased moisture content of the air. Weakening of the winter monsoon may contribute to precipitation decreases in some areas. A temperature increase in excess of 4 K is expected in most areas of Japan by the end of the 21st century under the RCP8.5 scenario. The estimated probability of monthly precipitation exceeding 300 mm would increase on the Pacific side during the summer and on the Japan Sea side during the winter season. This probabilistic climate projection based on the statistical method can be expected to provide useful information for impact studies and risk assessments.
Dudley, Robert W.
2015-12-03
The largest average errors of prediction are associated with regression equations for the lowest streamflows derived for months during which the lowest streamflows of the year occur (such as the 5th and 1st monthly percentiles for August and September). The regression equations were derived on the basis of streamflow and basin characteristics data for unregulated, rural drainage basins without substantial streamflow or drainage modifications (for example, diversions and (or) regulation by dams or reservoirs, tile drainage, irrigation, channelization, and impervious paved surfaces); therefore, using the equations for regulated or urbanized basins with substantial streamflow or drainage modifications will yield results of unknown error. Input basin characteristics derived using techniques or datasets other than those documented in this report, or using values outside the ranges used to develop these regression equations, also will yield results of unknown error.
Evaluation of regression-based 3-D shoulder rhythms.
Xu, Xu; Dickerson, Clark R; Lin, Jia-Hua; McGorry, Raymond W
2016-08-01
The movements of the humerus, the clavicle, and the scapula are not completely independent. The coupled pattern of movement of these bones is called the shoulder rhythm. To date, multiple studies have provided regression-based 3-D shoulder rhythms, in which the orientations of the clavicle and the scapula are estimated from the orientation of the humerus. In this study, six existing regression-based shoulder rhythms were evaluated against an independent dataset in terms of their predictive ability. The dataset includes the measured orientations of the humerus, the clavicle, and the scapula of 14 participants over 118 different upper arm postures. The predicted orientations of the clavicle and the scapula were derived by applying those regression-based shoulder rhythms to the humerus orientation. The results indicated that none of the regression-based shoulder rhythms provides consistently more accurate results than the others. For all the joint angles and all the shoulder rhythms, the RMSE is greater than 5°. Among those shoulder rhythms, scapula lateral/medial rotation has the strongest correlation between the predicted and the measured angles, while the other thoracoclavicular and thoracoscapular bone orientation angles showed only weak to moderate correlation. Since the regression-based shoulder rhythm has been adopted in shoulder biomechanical models to estimate shoulder muscle activities and structural loads, further investigation is needed into how the prediction error from the shoulder rhythm affects the output of the biomechanical model. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Farmer, W. H.; Kiang, J. E.
2017-12-01
The development, deployment and maintenance of water resources management infrastructure and practices rely on hydrologic characterization, which requires an understanding of local hydrology. With regard to streamflow, this understanding is typically quantified with statistics derived from long-term streamgage records. However, a fundamental problem is how to characterize local hydrology without the luxury of streamgage records, a problem that complicates water resources management at ungaged locations and for long-term future projections. This problem has typically been addressed through the development of point estimators, such as regression equations, to estimate particular statistics. Physically-based precipitation-runoff models, which are capable of producing simulated hydrographs, offer an alternative to point estimators. The advantage of simulated hydrographs is that they can be used to compute any number of streamflow statistics from a single source (the simulated hydrograph) rather than relying on a diverse set of point estimators. However, the use of simulated hydrographs introduces a degree of model uncertainty that is propagated through to estimated streamflow statistics and may have drastic effects on management decisions. We compare the accuracy and precision of streamflow statistics (e.g. the mean annual streamflow, the annual maximum streamflow exceeded in 10% of years, and the minimum seven-day average streamflow exceeded in 90% of years, among others) derived from point estimators (e.g. regressions, kriging, machine learning) to that of statistics derived from simulated hydrographs across the continental United States. Initial results suggest that the error introduced through hydrograph simulation may substantially bias the resulting hydrologic characterization.
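The contrast above between point estimators and simulated hydrographs comes down to how the statistics are computed: given any daily hydrograph, each statistic named in the abstract can be derived directly. A minimal sketch in pure Python, with a synthetic lognormal flow generator standing in for a simulated record (the generator and its parameters are illustrative, not the study's data):

```python
import random
from statistics import mean

def rolling_min7(flows):
    # minimum 7-day moving-average flow within one year of daily flows
    avgs = [mean(flows[i:i + 7]) for i in range(len(flows) - 6)]
    return min(avgs)

def quantile(values, p):
    # simple empirical quantile with linear interpolation
    s = sorted(values)
    k = p * (len(s) - 1)
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (k - lo) * (s[hi] - s[lo])

random.seed(1)
# synthetic record: 30 "years" of 365 daily flows (m^3/s), lognormal for skewness
years = [[max(0.1, random.lognormvariate(2.0, 0.8)) for _ in range(365)]
         for _ in range(30)]

all_days = [q for yr in years for q in yr]
mean_annual = mean(all_days)                                    # mean annual streamflow
q10_annual_max = quantile([max(yr) for yr in years], 0.90)      # annual max exceeded in 10% of years
q90_min7 = quantile([rolling_min7(yr) for yr in years], 0.10)   # 7-day low flow exceeded in 90% of years
print(mean_annual, q10_annual_max, q90_min7)
```

A point estimator would instead predict each of these three numbers separately from basin characteristics; the hydrograph route computes them all from one simulated series, at the cost of propagating simulation error into every statistic.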
External validation of preexisting first trimester preeclampsia prediction models.
Allen, Rebecca E; Zamora, Javier; Arroyo-Manzano, David; Velauthar, Luxmilar; Allotey, John; Thangaratinam, Shakila; Aquilina, Joseph
2017-10-01
To validate the increasing number of prognostic models being developed for preeclampsia using our own prospective cohort. A systematic review of the literature that assessed biomarkers, uterine artery Doppler and maternal characteristics in the first trimester for the prediction of preeclampsia was performed, and models were selected based on predefined criteria. Validation was performed by applying the regression coefficients published in the different derivation studies to our cohort. We assessed the models' discrimination ability and calibration. Twenty models were identified for validation. The discrimination ability observed in the derivation studies (Area Under the Curve, AUC) ranged from 0.70 to 0.96; when these models were validated against the validation cohort, the AUCs varied considerably, ranging from 0.504 to 0.833. Comparing the AUCs obtained in the derivation studies with those in the validation cohort, we found statistically significant differences in several studies. There is currently no definitive prediction model with adequate discrimination for preeclampsia that performs as well when applied to a different population and can differentiate well between the highest and lowest risk groups within the tested population. The pre-existing large number of models limits the value of further model development, and future research should be focussed on further attempts to validate existing models and on assessing whether their implementation improves patient care. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.
Vereecken, H; Vanderborght, J; Kasteel, R; Spiteller, M; Schäffer, A; Close, M
2011-01-01
In this study, we analyzed sorption parameters for pesticides that were derived from batch and column or batch and field experiments. The batch experiments analyzed in this study were run with the same pesticide and soil as in the column and field experiments. We analyzed the relationship between the pore water velocity of the column and field experiments, solute residence times, and sorption parameters, such as the organic carbon normalized distribution coefficient (Koc) and the mass exchange coefficient in kinetic models, as well as the predictability of sorption parameters from basic soil properties. The batch/column analysis included 38 studies with a total of 139 observations. The batch/field analysis included five studies, resulting in a dataset of 24 observations. For the batch/column data, power law relationships between pore water velocity, residence time, and sorption constants were derived. The unexplained variability in these equations was reduced by taking into account the saturation status and the packing status (disturbed-undisturbed) of the soil sample. A new regression equation was derived that allows estimating the Koc values derived from column experiments using organic matter and bulk density, with an R2 of 0.56. Regression analysis of the batch/column data showed that the relationship between batch- and column-derived Koc values depends on the saturation status and packing of the soil column. Analysis of the batch/field data showed that as the batch-derived Koc value becomes larger, field-derived values tend to be lower than the corresponding batch-derived values, and vice versa. The present dataset also showed that the variability in the ratio of batch- to column-derived Koc values increases with increasing pore water velocity, with a maximum value approaching 3.5. American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America.
Stature estimation from the lengths of the growing foot-a study on North Indian adolescents.
Krishan, Kewal; Kanchan, Tanuj; Passi, Neelam; DiMaggio, John A
2012-12-01
Stature estimation is considered one of the basic parameters of the investigation process in unknown and commingled human remains in medico-legal casework. Race, age and sex are the other parameters which help in this process. Stature estimation is of the utmost importance as it completes the biological profile of a person along with the other three parameters of identification. The present research is intended to formulate standards for stature estimation from foot dimensions in adolescent males from North India and to study the pattern of foot growth during the growing years. 154 male adolescents from the northern part of India were included in the study. Besides stature, five anthropometric measurements were taken: the length of the foot from each toe (T1, T2, T3, T4, and T5, respectively) to pternion, measured on each foot. The data were analyzed statistically using Student's t-test, Pearson's correlation, and linear and multiple regression analysis for estimation of stature and growth of the foot during ages 13-18 years. Correlation coefficients between stature and all the foot measurements were found to be highly significant and positive. Linear regression models and multiple regression models (with age as a co-variable) were derived for estimation of stature from the different measurements of the foot. Multiple regression models (with age as a co-variable) estimate stature with greater accuracy than the linear regression models for the 13-18 years age group. The study shows the growth pattern of feet in North Indian adolescents and indicates that anthropometric measurements of the foot and its segments are valuable in estimation of stature in growing individuals of that population. Copyright © 2012 Elsevier Ltd. All rights reserved.
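The kind of model the study derives, stature regressed on a foot measurement with age as a co-variable, can be illustrated with an ordinary-least-squares fit on synthetic data. Every coefficient and measurement below is invented for the sketch; these are not the study's standards:

```python
import random

def ols(X, y):
    # ordinary least squares via normal equations (X includes an intercept column)
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] +
         [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):                      # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][k] - f * A[c][k] for k in range(p + 1)]
    beta = [0.0] * p
    for c in reversed(range(p)):            # back substitution
        beta[c] = (A[c][p] - sum(A[c][k] * beta[k] for k in range(c + 1, p))) / A[c][c]
    return beta

random.seed(0)
# synthetic adolescents: stature = a + b*foot_length + c*age + noise (illustrative)
rows = []
for _ in range(154):
    age = random.uniform(13, 18)
    foot = 20 + 0.6 * (age - 13) + random.gauss(0, 1.0)          # cm
    stature = 60 + 3.5 * foot + 1.2 * age + random.gauss(0, 3)   # cm
    rows.append((foot, age, stature))

X = [[1.0, f, a] for f, a, _ in rows]
y = [s for _, _, s in rows]
b0, b_foot, b_age = ols(X, y)
est = b0 + b_foot * 24.0 + b_age * 15.0   # predicted stature: foot 24 cm, age 15 y
print(b_foot, b_age, est)
```

Including age as a second regressor is what lets the model absorb the growth trend the paper describes, which is why the multiple models outperform the single-measurement ones in growing subjects.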
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
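The two linear fits contrasted above differ only in their loss: least squares penalizes all residuals quadratically, while SVR ignores residuals inside an epsilon tube. A didactic one-feature sketch (a crude subgradient-descent fit, not the authors' L1-SVR solver, which operated on many sequence-derived features):

```python
import random

def fit_svr_1d(xs, ys, eps=0.1, lam=1e-3, lr=0.01, iters=5000):
    # epsilon-insensitive linear regression fitted by subgradient descent:
    # residuals inside the tube [-eps, eps] contribute no gradient
    w = b = 0.0
    n = len(xs)
    for _ in range(iters):
        gw, gb = lam * w, 0.0
        for x, y in zip(xs, ys):
            r = (w * x + b) - y
            if r > eps:
                gw += x / n; gb += 1 / n
            elif r < -eps:
                gw -= x / n; gb -= 1 / n
        w -= lr * gw; b -= lr * gb
    return w, b

def fit_ls_1d(xs, ys):
    # closed-form simple least squares
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return w, my - w * mx

random.seed(2)
# synthetic "RSA" in [0, 1] driven by one hydrophobicity-like feature (illustrative)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [min(1.0, max(0.0, 0.5 + 0.3 * x + random.gauss(0, 0.08))) for x in xs]

w_svr, b_svr = fit_svr_1d(xs, ys)
w_ls, b_ls = fit_ls_1d(xs, ys)
print(w_svr, w_ls)
```

The tube width eps and the penalty weight play the role of the "error insensitivity" and "error penalization" metaparameters the abstract credits with improving accuracy for buried residues.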
Atlas of climate change effects in 150 bird species of the Eastern United States
Stephen Matthews; Raymond O' Connor; Louis R. Iverson; Anantha M. Prasad
2004-01-01
This atlas documents the current and potential future distribution of 150 common bird species in the Eastern United States. Distribution data for individual species were derived from the Breeding Bird Survey (BBS) from 1981 to 1990. Regression tree analysis was used to model the BBS...
Basal area increment and growth efficiency as functions of canopy dynamics and stem mechanics
Thomas J. Dean
2004-01-01
Crown and canopy structurecorrelate with growth efficiency and also determine stem size and taper as described by the uniform stress principle of stem formation. A regression model was derived from this principle that expresses basal area increment in terms of the amount and vertical distribution of leaf area and change in these variables during a growth period. This...
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations for the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
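The two-stage structure, fit the weight first, then hold it fixed while fitting the center-of-gravity coordinates, can be illustrated with a simplified balance model. The force and moment equations below are idealized stand-ins, not the paper's actual balance load equations:

```python
import math
import random

random.seed(3)
W_true, xcg_true, zcg_true = 50.0, 0.20, 0.05   # N, m (illustrative values)

# "wind-off" data points: model pitched to several attitudes
thetas = [math.radians(a) for a in range(-20, 21, 5)]
Fz = [-W_true + random.gauss(0, 0.2) for _ in thetas]                 # vertical force
My = [W_true * (xcg_true * math.cos(t) + zcg_true * math.sin(t))
      + random.gauss(0, 0.05) for t in thetas]                        # pitching moment

# Fit 1: least squares estimate of the weight (here just the mean of -Fz)
W_hat = -sum(Fz) / len(Fz)

# Fit 2: with W_hat fixed, My ~ W_hat*(xcg*cos t + zcg*sin t) is linear in (xcg, zcg);
# solve the 2x2 normal equations directly
a = [[W_hat * math.cos(t), W_hat * math.sin(t)] for t in thetas]
s11 = sum(r[0] * r[0] for r in a); s12 = sum(r[0] * r[1] for r in a)
s22 = sum(r[1] * r[1] for r in a)
b1 = sum(r[0] * m for r, m in zip(a, My)); b2 = sum(r[1] * m for r, m in zip(a, My))
det = s11 * s22 - s12 * s12
xcg_hat = (s22 * b1 - s12 * b2) / det
zcg_hat = (s11 * b2 - s12 * b1) / det
print(W_hat, xcg_hat, zcg_hat)
```

Feeding the first-stage weight estimate into the second-stage design matrix mirrors the paper's sequencing; spreading the wind-off attitudes over a range of pitch angles is what makes the two CG coordinates separately identifiable.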
Alexeeff, Stacey E; Carroll, Raymond J; Coull, Brent
2016-04-01
Spatial modeling of air pollution exposures is widespread in air pollution epidemiology research as a way to improve exposure assessment. However, there are key sources of exposure model uncertainty when air pollution is modeled, including estimation error and model misspecification. We examine the use of predicted air pollution levels in linear health effect models under a measurement error framework. For the prediction of air pollution exposures, we consider a universal Kriging framework, which may include land-use regression terms in the mean function and a spatial covariance structure for the residuals. We derive the bias induced by estimation error and by model misspecification in the exposure model, and we find that a misspecified exposure model can induce asymptotic bias in the effect estimate of air pollution on health. We propose a new spatial simulation extrapolation (SIMEX) procedure, and we demonstrate that the procedure has good performance in correcting this asymptotic bias. We illustrate spatial SIMEX in a study of air pollution and birthweight in Massachusetts. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
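The SIMEX idea, deliberately adding extra measurement error at increasing multiples and extrapolating the naive estimate back to the no-error level, can be sketched for a simple non-spatial linear health model (the paper's spatial SIMEX instead operates on kriging-based exposure predictions; everything below is a toy version with a known error variance):

```python
import random

def slope(xs, ys):
    # simple least squares slope
    n = len(xs); mx = sum(xs) / n; my = sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def quad_fit(zs, ms):
    # least squares quadratic a + b*z + c*z^2 via 3x3 normal equations (Cramer's rule)
    S = [sum(z ** k for z in zs) for k in range(5)]
    T = [sum(m * z ** k for z, m in zip(zs, ms)) for k in range(3)]
    A = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]
    def det3(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
                - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
                + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    D = det3(A)
    out = []
    for c in range(3):
        Ac = [row[:] for row in A]
        for r in range(3):
            Ac[r][c] = T[r]
        out.append(det3(Ac) / D)
    return out

random.seed(4)
n, beta_true, su = 2000, 1.0, 0.7                      # true slope; known error SD
x = [random.gauss(0, 1) for _ in range(n)]             # true exposure
w = [xi + random.gauss(0, su) for xi in x]             # error-prone measured exposure
y = [beta_true * xi + random.gauss(0, 0.5) for xi in x]  # health outcome

# SIMEX: add extra error at levels zeta, record the (attenuated) naive slope,
# then extrapolate the trend back to zeta = -1, the no-measurement-error case
zetas = [0.0, 0.5, 1.0, 1.5, 2.0]
means = []
for z in zetas:
    sims = [slope([wi + random.gauss(0, (z ** 0.5) * su) for wi in w], y)
            for _ in range(50)]
    means.append(sum(sims) / len(sims))

a, b, c = quad_fit(zetas, means)
naive, simex = means[0], a - b + c        # quadratic evaluated at zeta = -1
print(naive, simex)
```

The naive slope is attenuated toward zero by the measurement error; the extrapolated SIMEX estimate recovers most of the bias, which is the behavior the paper demonstrates for its spatial variant.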
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. 
© 2013, The International Biometric Society.
Modeling of spectral signatures of littoral waters
NASA Astrophysics Data System (ADS)
Haltrin, Vladimir I.
1997-12-01
The spectral values of remotely obtained radiance reflectance coefficient (RRC) are compared with the values of RRC computed from inherent optical properties measured during the shipborne experiment near the West Florida coast. The model calculations are based on the algorithm developed at the Naval Research Laboratory at Stennis Space Center and presented here. The algorithm is based on the radiation transfer theory and uses regression relationships derived from experimental data. Overall comparison of derived and measured RRCs shows that this algorithm is suitable for processing ground truth data for the purposes of remote data calibration. The second part of this work consists of the evaluation of the predictive visibility model (PVM). The simulated three-dimensional values of optical properties are compared with the measured ones. Preliminary results of comparison are encouraging and show that the PVM can qualitatively predict the evolution of inherent optical properties in littoral waters.
Wu, Xiangxiang; Zeng, Huahui; Zhu, Xin; Ma, Qiujuan; Hou, Yimin; Wu, Xuefen
2013-11-20
A series of pyrrolopyridinone derivatives acting as specific inhibitors of cell division cycle 7 (Cdc7) was examined, and the efficacy of these compounds was analyzed by QSAR and docking approaches to gain deeper insights into the interaction mechanism and ligand selectivity for Cdc7. By regression analysis, prediction models based on the Grid score and the Zou-GB/SA score were found, with good quality of fit (r2 = 0.748 and 0.951; cross-validated r2 = 0.712 and 0.839, respectively). The accuracy of the models was validated on a test set, and the deviation of the predicted values in the validation set using the Zou-GB/SA score was smaller than that using the Grid score, suggesting that the model based on the Zou-GB/SA score provides a more effective method for predicting the potencies of Cdc7 inhibitors. Copyright © 2013 Elsevier B.V. All rights reserved.
Safety evaluation model of urban cross-river tunnel based on driving simulation.
Ma, Yingqi; Lu, Linjun; Lu, Jian John
2017-09-01
Currently, Shanghai urban cross-river tunnels have three principal characteristics: increased traffic, a high accident rate and rapidly developing construction. Because of their complex geographic and hydrological characteristics, the alignment conditions in urban cross-river tunnels are more complicated than in highway tunnels, so a safety evaluation of urban cross-river tunnels is necessary to suggest follow-up construction and changes in operational management. A driving risk index (DRI) for urban cross-river tunnels was proposed in this study. An index system was also constructed, combining eight factors derived from the output of a driving simulator and covering three aspects of risk: following (rear-end) accidents, lateral accidents and driver workload. The analytic hierarchy process, expert scoring and normalization were applied to construct a mathematical model for the DRI. The driving simulator was used to simulate 12 Shanghai urban cross-river tunnels, and a relationship was obtained between the DRI for the tunnels and the corresponding accident rate (AR) via a regression analysis. The regression analysis results showed that the relationship between the DRI and the AR mapped to an exponential function with a high degree of fit. In the absence of detailed accident data, a safety evaluation model based on factors derived from a driving simulation can effectively assess the driving risk in urban cross-river tunnels, whether already constructed or still in design.
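Fitting an exponential DRI-AR relationship of the form AR = a·exp(b·DRI) reduces to ordinary linear regression after a log transform. A sketch on hypothetical tunnel data (the coefficients and the 12 simulated tunnels are invented; the study's fitted values are not reproduced here):

```python
import math
import random

random.seed(5)
# hypothetical (DRI, accident-rate) pairs for 12 tunnels; values are illustrative
a_true, b_true = 0.8, 0.35
dri = [random.uniform(2, 9) for _ in range(12)]
ar = [a_true * math.exp(b_true * d) * math.exp(random.gauss(0, 0.1)) for d in dri]

# fit AR = a * exp(b * DRI) by regressing ln(AR) on DRI
xs, ys = dri, [math.log(v) for v in ar]
n = len(xs); mx = sum(xs) / n; my = sum(ys) / n
b_hat = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a_hat = math.exp(my - b_hat * mx)

# goodness of fit on the log scale
ss_res = sum((y - (math.log(a_hat) + b_hat * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot
print(a_hat, b_hat, r2)
```

The "high degree of fit" reported in the abstract corresponds to an r2 like the one computed here, evaluated on the log-transformed relationship.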
Predictors of posttraumatic stress symptoms following childbirth
2014-01-01
Background Posttraumatic stress disorder (PTSD) following childbirth has gained growing attention in recent years. Although a number of predictors of PTSD following childbirth have been identified (e.g., history of sexual trauma, emergency caesarean section, low social support), only very few studies have tested predictors derived from current theoretical models of the disorder. This study first aimed to replicate the association of PTSD symptoms after childbirth with predictors identified in earlier research. Second, cognitive predictors derived from Ehlers and Clark’s (2000) model of PTSD were examined. Methods N = 224 women who had recently given birth completed an online survey. In addition to computing single correlations between PTSD symptom severities and variables of interest, a hierarchical multiple regression analysis predicted posttraumatic stress symptoms from (1) prenatal variables, (2) birth-related variables, (3) postnatal social support, and (4) cognitive variables. Results Wellbeing during pregnancy and age were the only prenatal variables contributing significantly to the explanation of PTSD symptoms in the first step of the regression analysis. In the second step, the birth-related variables peritraumatic emotions and wellbeing during childbed significantly increased the explained variance. Despite showing significant bivariate correlations, social support entered in the third step did not predict PTSD symptom severities over and above the variables included in the first two steps. However, with the exception of peritraumatic dissociation, all cognitive variables emerged as powerful predictors and increased the amount of variance explained from 43% to a total of 68%. Conclusions The findings suggest that the prediction of PTSD following childbirth can be improved by focusing on variables derived from a current theoretical model of the disorder. PMID:25026966
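Hierarchical (blockwise) regression of this kind reports the R² gained as each block of predictors enters the model. A compact sketch with one illustrative predictor per block (synthetic data; the real analysis used several variables per step and four blocks):

```python
import random

def r_squared(X, y):
    # R^2 of an OLS fit (X includes the intercept column); normal equations + elimination
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] +
         [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][k] - f * A[c][k] for k in range(p + 1)]
    beta = [0.0] * p
    for c in reversed(range(p)):
        beta[c] = (A[c][p] - sum(A[c][k] * beta[k] for k in range(c + 1, p))) / A[c][c]
    my = sum(y) / n
    ss_res = sum((y[i] - sum(X[i][k] * beta[k] for k in range(p))) ** 2 for i in range(n))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

random.seed(6)
n = 224
prenatal = [random.gauss(0, 1) for _ in range(n)]    # e.g. wellbeing in pregnancy
birth = [random.gauss(0, 1) for _ in range(n)]       # e.g. peritraumatic emotions
cognitive = [random.gauss(0, 1) for _ in range(n)]   # e.g. negative appraisals
ptsd = [0.3 * p_ + 0.5 * b_ + 0.8 * c_ + random.gauss(0, 0.7)
        for p_, b_, c_ in zip(prenatal, birth, cognitive)]

# hierarchical entry: each step adds a block; the R^2 increment is the block's contribution
X1 = [[1.0, p_] for p_ in prenatal]
X2 = [r + [b_] for r, b_ in zip(X1, birth)]
X3 = [r + [c_] for r, c_ in zip(X2, cognitive)]
r2_steps = [r_squared(X1, ptsd), r_squared(X2, ptsd), r_squared(X3, ptsd)]
print(r2_steps)
```

The jump in explained variance at the final step mirrors the paper's finding that the cognitive block raised R² from 43% to 68% over and above the earlier blocks.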
Wong, Man Sing; Peng, Fen; Zou, Bin; Shi, Wen Zhong; Wilson, Gaines J.
2016-01-01
Recent studies have suggested that some disadvantaged socio-demographic groups face serious environmental-related inequities in Hong Kong due to the rising ambient urban temperatures. Identifying heat-vulnerable groups and locating areas of Surface Urban Heat Island (SUHI) inequities is thus important for prioritizing interventions to mitigate death/illness rates from heat. This study addresses this problem by integrating methods of remote sensing retrieval, logistic regression modelling, and spatial autocorrelation. In this process, the SUHI effect was first estimated from the Land Surface Temperature (LST) derived from a Landsat image. With the scale assimilated to the SUHI and socio-demographic data, a logistic regression model was consequently adopted to ascertain their relationships based on Hong Kong Tertiary Planning Units (TPUs). Lastly, inequity “hotspots” were derived using spatial autocorrelation methods. Results show that disadvantaged socio-demographic groups were significantly more prone to be exposed to an intense SUHI effect: over half of 287 TPUs characterized by age groups of 60+ years, secondary and matriculation education attainment, widowed, divorced and separated, low and middle incomes, and certain occupation groups of workers, have significant Odds Ratios (ORs) larger than 1.2. It can be concluded that a clustering analysis stratified by age, income, educational attainment, marital status, and occupation is an effective way to detect the inequity hotspots of SUHI exposure. Additionally, inequities explored using income, marital status and occupation factors were more significant than the age and educational attainment in these areas. The derived maps and model can be further analyzed in urban/city planning, in order to mitigate the physical and social causes of the SUHI effect. PMID:26985899
Uhrich, Mark A.; Spicer, Kurt R.; Mosbrucker, Adam; Christianson, Tami
2015-01-01
Regression of in-stream turbidity with concurrent sample-based suspended-sediment concentration (SSC) has become an accepted method for producing unit-value time series of inferred SSC (Rasmussen et al., 2009). Turbidity-SSC regression models are increasingly used to generate suspended-sediment records for Pacific Northwest rivers (e.g., Curran et al., 2014; Schenk and Bragg, 2014; Uhrich and Bragg, 2003). Recent work developing turbidity-SSC models for the North Fork Toutle River in Southwest Washington (Uhrich et al., 2014), as well as other studies (Landers and Sturm, 2013, Merten et al., 2014), suggests that models derived from annual or greater datasets may not adequately reflect shorter term changes in turbidity-SSC relations, warranting closer inspection of such relations. In-stream turbidity measurements and suspended-sediment samples have been collected from the North Fork Toutle River since 2010. The study site, U.S. Geological Survey (USGS) streamgage 14240525 near Kid Valley, Washington, is 13 river km downstream of the debris avalanche emplaced by the 1980 eruption of Mount St. Helens (Lipman and Mullineaux, 1981), and 2 river km downstream of the large sediment retention structure (SRS) built from 1987–1989 to mitigate the associated sediment hazard. The debris avalanche extends roughly 25 km down valley from the edifice of the volcano and is the primary source of suspended sediment moving past the streamgage (NF Toutle-SRS). Other significant sources are debris flow events and sand deposits upstream of the SRS, which are periodically remobilized and transported downstream. Also, finer material often is derived from the clay-rich original debris avalanche deposit, while coarser material can derive from areas such as fluvially reworked terraces.
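Turbidity-SSC models of the kind cited (e.g., Rasmussen et al., 2009) are typically log-log regressions with a retransformation bias correction. A sketch on synthetic paired samples using Duan's smearing estimator (the coefficients are illustrative, not the North Fork Toutle River model):

```python
import math
import random

random.seed(7)
# synthetic paired samples: SSC (mg/L) follows a power law in turbidity (FNU)
a_true, b_true = 2.5, 1.1
turb = [random.uniform(5, 500) for _ in range(60)]
ssc = [a_true * t ** b_true * math.exp(random.gauss(0, 0.2)) for t in turb]

# log-log linear regression: ln(SSC) = ln(a) + b * ln(T)
xs = [math.log(t) for t in turb]
ys = [math.log(s) for s in ssc]
n = len(xs); mx = sum(xs) / n; my = sum(ys) / n
b_hat = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
ln_a = my - b_hat * mx

# Duan's smearing estimator corrects the bias from retransforming log predictions
resid = [y - (ln_a + b_hat * x) for x, y in zip(xs, ys)]
smear = sum(math.exp(e) for e in resid) / n

def predict_ssc(t_fnu):
    return smear * math.exp(ln_a) * t_fnu ** b_hat

print(b_hat, predict_ssc(100.0))
```

Refitting this regression over shorter windows, rather than annual or multi-year datasets, is exactly the adjustment the abstract argues may be needed when the turbidity-SSC relation shifts over time.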
Cozzi-Lepri, Alessandro; Prosperi, Mattia C F; Kjær, Jesper; Dunn, David; Paredes, Roger; Sabin, Caroline A; Lundgren, Jens D; Phillips, Andrew N; Pillay, Deenan
2011-01-01
It remains largely unanswered whether a score for a specific antiretroviral (e.g. lopinavir/r in this analysis) that improves on the prediction of viral load response given by existing expert-based interpretation systems (IS) can be derived by analyzing the correlation between genotypic data and virological response with statistical methods. We used the data of the patients from the UK Collaborative HIV Cohort (UK CHIC) Study for whom genotypic data were stored in the UK HIV Drug Resistance Database (UK HDRD) to construct a training/validation dataset of treatment change episodes (TCE). We used the average square error (ASE) on a 10-fold cross-validation and on a test dataset (the EuroSIDA TCE database) to compare the performance of a newly derived lopinavir/r score with that of the 3 most widely used expert-based interpretation rules (ANRS, HIVDB and Rega). Our analysis identified mutations V82A, I54V, K20I and I62V, which were associated with reduced viral response, and mutations I15V and V91S, which determined lopinavir/r hypersensitivity. All models performed equally well (ASE on the test set ranging between 1.1 and 1.3, p = 0.34). We fully explored the potential of linear regression to construct a simple predictive model for lopinavir/r-based TCE. Although the performance of our proposed score was similar to that of already existing IS, previously unrecognized lopinavir/r-associated mutations were identified. The analysis illustrates an approach to validating expert-based IS that could be used in the future for other antiretrovirals and in other settings outside HIV research.
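The 10-fold cross-validated average square error (ASE) used to compare scores can be sketched as follows. The mutation indicators and response are synthetic, and the "score" is simply a linear regression refit on each training split, not the published lopinavir/r score:

```python
import random

def ols_beta(X, y):
    # least squares via normal equations with Gaussian elimination (small p)
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] +
         [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][k] - f * A[c][k] for k in range(p + 1)]
    beta = [0.0] * p
    for c in reversed(range(p)):
        beta[c] = (A[c][p] - sum(A[c][k] * beta[k] for k in range(c + 1, p))) / A[c][c]
    return beta

random.seed(8)
n, p = 400, 4
# binary "mutation present" indicators plus intercept, and a continuous response
X = [[1.0] + [float(random.random() < 0.3) for _ in range(p)] for _ in range(n)]
true_w = [0.2, -0.6, -0.4, 0.3, -0.2]   # illustrative per-mutation effects
y = [sum(xi * wi for xi, wi in zip(row, true_w)) + random.gauss(0, 0.5) for row in X]

# 10-fold cross-validation: fit on 9 folds, accumulate squared errors on the held-out fold
idx = list(range(n)); random.shuffle(idx)
folds = [idx[k::10] for k in range(10)]
sq_errs = []
for fold in folds:
    hold = set(fold)
    Xtr = [X[i] for i in range(n) if i not in hold]
    ytr = [y[i] for i in range(n) if i not in hold]
    beta = ols_beta(Xtr, ytr)
    for i in fold:
        pred = sum(xi * bi for xi, bi in zip(X[i], beta))
        sq_errs.append((y[i] - pred) ** 2)
ase = sum(sq_errs) / len(sq_errs)
print(ase)
```

Comparing this ASE against the ASE obtained by scoring the same held-out episodes with a fixed expert rule is the comparison the study makes between its derived score and ANRS, HIVDB and Rega.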
Development and validation of the Surgical Outcome Risk Tool (SORT)
Protopapa, K L; Simpson, J C; Smith, N C E; Moonesinghe, S R
2014-01-01
Background Existing risk stratification tools have limitations and clinical experience suggests they are not used routinely. The aim of this study was to develop and validate a preoperative risk stratification tool to predict 30-day mortality after non-cardiac surgery in adults by analysis of data from the observational National Confidential Enquiry into Patient Outcome and Death (NCEPOD) Knowing the Risk study. Methods The data set was split into derivation and validation cohorts. Logistic regression was used to construct a model in the derivation cohort to create the Surgical Outcome Risk Tool (SORT), which was tested in the validation cohort. Results Prospective data for 19 097 cases in 326 hospitals were obtained from the NCEPOD study. Following exclusion of 2309, details of 16 788 patients were analysed (derivation cohort 11 219, validation cohort 5569). A model of 45 risk factors was refined on repeated regression analyses to develop a model comprising six variables: American Society of Anesthesiologists Physical Status (ASA-PS) grade, urgency of surgery (expedited, urgent, immediate), high-risk surgical specialty (gastrointestinal, thoracic, vascular), surgical severity (from minor to complex major), cancer and age 65 years or over. In the validation cohort, the SORT was well calibrated and demonstrated better discrimination than the ASA-PS and Surgical Risk Scale; areas under the receiver operating characteristic (ROC) curve were 0·91 (95 per cent c.i. 0·88 to 0·94), 0·87 (0·84 to 0·91) and 0·88 (0·84 to 0·92) respectively (P < 0·001). Conclusion The SORT allows rapid and simple data entry of six preoperative variables, and provides a percentage mortality risk for individuals undergoing surgery. PMID:25388883
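The derivation/validation workflow, fit a logistic model on one cohort and then measure discrimination on the other via the area under the ROC curve, can be sketched as follows. The two synthetic predictors stand in for the six SORT variables, and the gradient-descent logistic fit is used only to keep the sketch self-contained:

```python
import math
import random

def fit_logistic(X, y, lr=0.3, iters=1500):
    # logistic regression by batch gradient descent (X includes an intercept column)
    p, n = len(X[0]), len(X)
    w = [0.0] * p
    for _ in range(iters):
        g = [0.0] * p
        for row, yi in zip(X, y):
            z = sum(a * b for a, b in zip(row, w))
            pr = 1 / (1 + math.exp(-max(-30, min(30, z))))
            for k in range(p):
                g[k] += (pr - yi) * row[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, g)]
    return w

def auc(scores, labels):
    # area under the ROC curve via the rank-sum (Mann-Whitney) statistic
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

random.seed(9)
n = 1200
X = [[1.0, random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]  # e.g. age, severity
y = [1 if random.random() < 1 / (1 + math.exp(-(-3 + 1.5 * r[1] + 1.0 * r[2]))) else 0
     for r in X]

# split into derivation and validation cohorts, fit, then test discrimination
cut = (2 * n) // 3
w = fit_logistic(X[:cut], y[:cut])
scores = [sum(a * b for a, b in zip(row, w)) for row in X[cut:]]
auc_val = auc(scores, y[cut:])
print(auc_val)
```

Because the AUC depends only on the ranking of the linear predictor, it measures discrimination separately from calibration, which is why the study reports both.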
Predicting Energy Performance of a Net-Zero Energy Building: A Statistical Approach
Kneifel, Joshua; Webb, David
2016-01-01
Performance-based building requirements have become more prevalent because they give freedom in building design while still maintaining or exceeding the energy performance required by prescriptive-based requirements. In order to determine if building designs reach target energy efficiency improvements, it is necessary to estimate the energy performance of a building using predictive models and different weather conditions. Physics-based whole building energy simulation modeling is the most common approach. However, these physics-based models include underlying assumptions and require significant amounts of information in order to specify the input parameter values. An alternative approach to test the performance of a building is to develop a statistically derived predictive regression model using post-occupancy data that can accurately predict energy consumption and production based on a few common weather-based factors, thus requiring less information than simulation models. A regression model based on measured data should be able to predict energy performance of a building for a given day as long as the weather conditions are similar to those during the data collection time frame. This article uses data from the National Institute of Standards and Technology (NIST) Net-Zero Energy Residential Test Facility (NZERTF) to develop and validate a regression model to predict the energy performance of the NZERTF using two weather variables aggregated to the daily level, applies the model to estimate the energy performance of hypothetical NZERTFs located in different cities in the Mixed-Humid climate zone, and compares these estimates to the results from already existing EnergyPlus whole building energy simulations. This regression model exhibits agreement with EnergyPlus predictive trends in energy production and net consumption, but differs greatly in energy consumption. 
The model can be used as a framework for alternative and more complex models based on the experimental data collected from the NZERTF. PMID:27956756
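The two-weather-variable regression described above can be sketched in a few lines; the predictor names (heating degree-days and daily solar radiation) and all numbers here are illustrative assumptions, not the NZERTF data or the authors' fitted model.

```python
import numpy as np

# Synthetic daily data standing in for the two aggregated weather variables:
# heating degree-days and solar radiation (illustrative values only).
rng = np.random.default_rng(0)
hdd = rng.uniform(0, 30, 100)      # heating degree-days per day
solar = rng.uniform(0, 8, 100)     # kWh/m^2 per day
# Assumed "true" daily energy relationship plus measurement noise
energy = 5.0 + 1.2 * hdd - 0.8 * solar + rng.normal(0, 0.5, 100)

# Ordinary least squares with design matrix [1, hdd, solar]
X = np.column_stack([np.ones_like(hdd), hdd, solar])
beta, *_ = np.linalg.lstsq(X, energy, rcond=None)
pred = X @ beta   # daily energy predictions for the same weather inputs
```

A model of this form can then be driven with weather series from other cities, which is essentially how the abstract's hypothetical-city comparison works.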
Forecasting space weather: Can new econometric methods improve accuracy?
NASA Astrophysics Data System (ADS)
Reikard, Gordon
2011-06-01
Space weather forecasts are currently used in areas ranging from navigation and communication to electric power system operations. The relevant forecast horizons can range from as little as 24 h to several days. This paper analyzes the predictability of two major space weather measures using new time series methods, many of them derived from econometrics. The data sets are the Ap geomagnetic index and the solar radio flux at 10.7 cm. The methods tested include nonlinear regressions, neural networks, frequency domain algorithms, GARCH models (which utilize the residual variance), state transition models, and models that combine elements of several techniques. While combined models are complex, they can be programmed using modern statistical software. The data frequency is daily, and forecasting experiments are run over horizons ranging from 1 to 7 days. Two major conclusions stand out. First, the frequency domain method forecasts the Ap index more accurately than any time domain model, including both regressions and neural networks. This finding is very robust, and holds for all forecast horizons. Combining the frequency domain method with other techniques yields a further small improvement in accuracy. Second, the neural network forecasts the solar flux more accurately than any other method, although at short horizons (2 days or less) the regression and net yield similar results. The neural net does best when it includes measures of the long-term component in the data.
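As a minimal concrete example of the time-domain regressions tested above, an autoregressive model can be fit by least squares and iterated forward over a multi-day horizon; the series below is synthetic, not the Ap index or the solar flux.

```python
import numpy as np

# Synthetic daily series with known AR(2) structure (illustrative only).
rng = np.random.default_rng(1)
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t-1] + 0.2 * y[t-2] + rng.normal()

# Regress y[t] on its two lags to estimate the AR coefficients
X = np.column_stack([y[1:-1], y[:-2]])   # lag-1 and lag-2 values
phi, *_ = np.linalg.lstsq(X, y[2:], rcond=None)

# Iterated multi-step forecast (3 days ahead) from the last observations
hist = list(y[-2:])
for _ in range(3):
    hist.append(phi[0] * hist[-1] + phi[1] * hist[-2])
forecast = hist[2:]
```

Horizon-by-horizon accuracy comparisons, as run in the paper, would repeat this over many forecast origins and score each horizon separately.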
Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E
2017-06-14
Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.
Boshoff, Magdalena; De Jonge, Maarten; Scheifler, Renaud; Bervoets, Lieven
2014-09-15
The aim of this study was to derive regression-based soil-plant models to predict and compare metal(loid) (i.e. As, Cd, Cu, Pb and Zn) concentrations in plants (grass Agrostis sp./Poa sp. and nettle Urtica dioica L.) among sites with a wide range of metal pollution and a wide variation in soil properties. Regression models were based on the pseudo total (aqua-regia) and exchangeable (0.01 M CaCl2) soil metal concentrations. Plant metal concentrations were best explained by the pseudo total soil metal concentrations in combination with soil properties. The most important soil property influencing U. dioica metal concentrations was the clay content, while for grass, organic matter (OM) affected As concentrations and pH affected Cu and Zn. In this study multiple linear regression models proved functional in predicting metal accumulation in plants on a regional scale. With the proposed models based on the pseudo total metal concentration, the proportions of variation explained for As, Cd, Cu, Pb and Zn were 0.56, 0.47, 0.59, 0.61 and 0.30 in nettle and 0.46, 0.38, 0.27, 0.50 and 0.28 in grass. Copyright © 2014 Elsevier B.V. All rights reserved.
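A soil-to-plant regression of the kind derived in this study can be sketched as follows; the element (Cd), the chosen soil property (pH), and every coefficient are hypothetical stand-ins, not the paper's fitted values.

```python
import numpy as np

# Synthetic site data: log-transformed pseudo-total soil Cd plus soil pH
# (illustrative distributions, not the Belgian/French field data).
rng = np.random.default_rng(2)
n = 80
log_soil_cd = rng.normal(0.0, 1.0, n)
ph = rng.uniform(4.5, 7.5, n)
log_plant_cd = 1.0 + 0.7 * log_soil_cd - 0.3 * ph + rng.normal(0, 0.4, n)

# Multiple linear regression: log plant metal ~ log soil metal + pH
X = np.column_stack([np.ones(n), log_soil_cd, ph])
b, *_ = np.linalg.lstsq(X, log_plant_cd, rcond=None)
resid = log_plant_cd - X @ b
r2 = 1 - resid.var() / log_plant_cd.var()   # proportion of variation explained
```

The `r2` value here plays the role of the per-metal proportions of explained variation reported in the abstract.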
Imaging genetics approach to predict progression of Parkinson's diseases.
Mansu Kim; Seong-Jin Son; Hyunjin Park
2017-07-01
Imaging genetics is a tool to extract genetic variants associated with both clinical phenotypes and imaging information. The approach can extract additional genetic variants compared to conventional approaches to better investigate various diseased conditions. Here, we applied imaging genetics to study Parkinson's disease (PD). We aimed to extract significant features derived from imaging genetics and neuroimaging. We built a regression model based on extracted significant features combining genetics and neuroimaging to better predict clinical scores of PD progression (i.e. MDS-UPDRS). Our model yielded a high correlation (r = 0.697, p < 0.001) and a low root mean squared error (8.36) between predicted and actual MDS-UPDRS scores. Neuroimaging predictors (from 123I-ioflupane SPECT) of the regression model were computed using an independent component analysis approach. Genetic features were computed using an imaging genetics approach based on the identified neuroimaging features as intermediate phenotypes. Joint modeling of neuroimaging and genetics could provide complementary information and thus have the potential to provide further insight into the pathophysiology of PD. Our model included newly found neuroimaging features and genetic variants which need further investigation.
Atmospheric mold spore counts in relation to meteorological parameters
NASA Astrophysics Data System (ADS)
Katial, R. K.; Zhang, Yiming; Jones, Richard H.; Dyer, Philip D.
Fungal spore counts of Cladosporium, Alternaria, and Epicoccum were studied during 8 years in Denver, Colorado. Fungal spore counts were obtained daily during the pollinating season by a Rotorod sampler. Weather data were obtained from the National Climatic Data Center. Daily averages of temperature, relative humidity, daily precipitation, barometric pressure, and wind speed were studied. A time series analysis was performed on the data to mathematically model the spore counts in relation to weather parameters. Using SAS PROC ARIMA software, a regression analysis was performed, regressing the spore counts on the weather variables assuming an autoregressive moving average (ARMA) error structure. Cladosporium was found to be positively correlated (P<0.02) with average daily temperature and relative humidity, and negatively correlated with precipitation. Alternaria and Epicoccum did not show increased predictability with weather variables. A mathematical model was derived for Cladosporium spore counts using the annual seasonal cycle and significant weather variables. The model for Alternaria and Epicoccum incorporated the annual seasonal cycle. Fungal spore counts can be modeled by time series analysis and related to meteorological parameters controlling for seasonality; this modeling can provide estimates of exposure to fungal aeroallergens.
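A simplified stand-in for the PROC ARIMA fit, regression with autoregressive errors, is a two-stage Cochrane-Orcutt-style estimate; the AR(1) error structure and all values below are assumptions for illustration, not the Denver spore data.

```python
import numpy as np

# Synthetic daily counts driven by temperature with serially correlated errors
rng = np.random.default_rng(3)
n = 300
temp = 15 + 10 * np.sin(2 * np.pi * np.arange(n) / 365) + rng.normal(0, 2, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.5 * e[t-1] + rng.normal(0, 1)   # AR(1) error process
counts = 2.0 + 0.8 * temp + e

# Stage 1: ordinary least squares regression of counts on temperature
X = np.column_stack([np.ones(n), temp])
b_ols, *_ = np.linalg.lstsq(X, counts, rcond=None)
resid = counts - X @ b_ols

# Stage 2: estimate the AR(1) coefficient of the residuals
rho = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])

# Re-fit on quasi-differenced data to account for serial correlation
Xs = X[1:] - rho * X[:-1]
ys = counts[1:] - rho * counts[:-1]
b_gls, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
```

A full ARMA error structure, as in PROC ARIMA, generalizes this by also modeling moving-average terms in the residuals.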
NASA Astrophysics Data System (ADS)
Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia
2017-06-01
The GWOLR model combines geographically weighted regression (GWR) and ordinal logistic regression (OLR) models. Its parameter estimation employs maximum likelihood estimation. Such estimation, however, yields a difficult-to-solve system of nonlinear equations, so a numerical approximation approach is required. The iterative approximation approach generally uses the Newton-Raphson (NR) method. The NR method has a disadvantage: its Hessian matrix of second derivatives must be recomputed at every iteration, and the iteration does not always converge. To address this, the NR method is modified by substituting the Fisher information matrix for the Hessian, a variant termed Fisher scoring (FS). The present research seeks to determine GWOLR model parameter estimates using the Fisher scoring method and to apply the estimation to data on the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities give the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of sufferers, the IR category of DHF in both villages can be determined.
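For a plain (non-geographically-weighted) logistic regression, Fisher scoring reduces to iteratively reweighted least squares, since the Fisher information equals X'WX; the sketch below is a simplified illustration of that idea, not the GWOLR estimator itself.

```python
import numpy as np

# Synthetic binary data from a known logistic model (illustrative only)
rng = np.random.default_rng(4)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.5])
p = 1 / (1 + np.exp(-X @ true_beta))
y = (rng.uniform(size=n) < p).astype(float)

# Fisher scoring iterations: beta_new = beta + (X'WX)^-1 X'(y - mu)
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))
    W = mu * (1 - mu)                    # diagonal of the weight matrix
    score = X.T @ (y - mu)               # gradient of the log-likelihood
    fisher = X.T @ (X * W[:, None])      # Fisher information matrix
    step = np.linalg.solve(fisher, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-8:      # converged
        break
```

For the logistic family the expected and observed information coincide, which is why Fisher scoring is both stabler than and equivalent in form to Newton-Raphson here; for GWOLR the same update is applied per location with geographic weights.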
The validation of a human force model to predict dynamic forces resulting from multi-joint motions
NASA Technical Reports Server (NTRS)
Pandya, Abhilash K.; Maida, James C.; Aldridge, Ann M.; Hasson, Scott M.; Woolford, Barbara J.
1992-01-01
The development and validation of a dynamic strength model for humans is examined. This model is based on empirical data. The shoulder, elbow, and wrist joints were characterized in terms of maximum isolated torque as a function of position and velocity in all rotational planes. These data were reduced by a least-squares regression technique into a table of single-variable second-degree polynomial equations determining torque as a function of position and velocity. The isolated joint torque equations were then used to compute forces resulting from a composite motion, in this case a ratchet wrench push-and-pull operation. A comparison of the predicted results of the model with the actual measured values for the composite motion indicates that forces derived from a composite motion of joints (ratcheting) can be predicted from isolated joint measures. Calculated t values comparing model versus measured values for 14 subjects were well within statistically acceptable limits, and regression analysis revealed coefficients of variation between actual and measured values of 0.72 to 0.80.
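The joint-characterization step, reducing torque measurements to second-degree polynomials by least squares, can be illustrated as follows; the velocity-torque numbers are synthetic stand-ins, not the measured joint data.

```python
import numpy as np

# Synthetic joint measurements: torque vs. angular velocity (illustrative)
rng = np.random.default_rng(5)
velocity = np.linspace(-3, 3, 60)                 # rad/s
torque = 40 - 6 * velocity - 1.5 * velocity**2 + rng.normal(0, 1, 60)

# Least-squares fit of a single-variable second-degree polynomial:
# torque = c0 + c1*v + c2*v^2 (polyfit returns [c2, c1, c0])
coeffs = np.polyfit(velocity, torque, deg=2)
predicted = np.polyval(coeffs, velocity)
```

In the paper, one such table entry exists per joint and rotational plane, and composite-motion forces are computed by combining the per-joint polynomial predictions.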
Battaglin, William A.; Ulery, Randy L.; Winterstein, Thomas; Welborn, Toby
2003-01-01
In the State of Texas, surface water (streams, canals, and reservoirs) and ground water are used as sources of public water supply. Surface-water sources of public water supply are susceptible to contamination from point and nonpoint sources. To help protect sources of drinking water and to aid water managers in designing protective yet cost-effective and risk-mitigated monitoring strategies, the Texas Commission on Environmental Quality and the U.S. Geological Survey developed procedures to assess the susceptibility of public water-supply source waters in Texas to the occurrence of 227 contaminants. One component of the assessments is the determination of susceptibility of surface-water sources to nonpoint-source contamination. To accomplish this, water-quality data at 323 monitoring sites were matched with geographic information system-derived watershed- characteristic data for the watersheds upstream from the sites. Logistic regression models then were developed to estimate the probability that a particular contaminant will exceed a threshold concentration specified by the Texas Commission on Environmental Quality. Logistic regression models were developed for 63 of the 227 contaminants. Of the remaining contaminants, 106 were not modeled because monitoring data were available at less than 10 percent of the monitoring sites; 29 were not modeled because there were less than 15 percent detections of the contaminant in the monitoring data; 27 were not modeled because of the lack of any monitoring data; and 2 were not modeled because threshold values were not specified.
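The screening rules in the abstract (data at no fewer than 10 percent of monitoring sites, no fewer than 15 percent detections) amount to a simple filter applied before any logistic model is fit; the contaminant records below are invented for illustration.

```python
# Each record: (fraction of the 323 sites with monitoring data,
#              fraction of detections among the monitoring data).
# Names and numbers are hypothetical examples, not the Texas data set.
contaminants = {
    "atrazine": (0.80, 0.45),
    "simazine": (0.05, 0.60),   # excluded: data at <10% of sites
    "alachlor": (0.50, 0.10),   # excluded: <15% detections
}

# Keep only contaminants eligible for logistic regression modeling
modelable = [name for name, (site_frac, det_frac) in contaminants.items()
             if site_frac >= 0.10 and det_frac >= 0.15]
```

Under these rules only the first hypothetical contaminant would proceed to model development, mirroring how 63 of the 227 contaminants qualified in the study.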
NASA Astrophysics Data System (ADS)
Holburn, E. R.; Bledsoe, B. P.; Poff, N. L.; Cuhaciyan, C. O.
2005-05-01
Using over 300 R/EMAP sites in OR and WA, we examine the relative explanatory power of watershed, valley, and reach scale descriptors in modeling variation in benthic macroinvertebrate indices. Innovative metrics describing flow regime, geomorphic processes, and hydrologic-distance weighted watershed and valley characteristics are used in multiple regression and regression tree modeling to predict EPT richness, % EPT, EPT/C, and % Plecoptera. A nested design using seven ecoregions is employed to evaluate the influence of geographic scale and environmental heterogeneity on the explanatory power of individual and combined scales. Regression tree models are constructed to explain variability while identifying threshold responses and interactions. Cross-validated models demonstrate differences in the explanatory power associated with single-scale and multi-scale models as environmental heterogeneity is varied. Models explaining the greatest variability in biological indices result from multi-scale combinations of physical descriptors. Results also indicate that substantial variation in benthic macroinvertebrate response can be explained with process-based watershed and valley scale metrics derived exclusively from common geospatial data. This study outlines a general framework for identifying key processes driving macroinvertebrate assemblages across a range of scales and establishing the geographic extent at which various levels of physical description best explain biological variability. Such information can guide process-based stratification to avoid spurious comparison of dissimilar stream types in bioassessments and ensure that key environmental gradients are adequately represented in sampling designs.
Sturm, Marc; Quinten, Sascha; Huber, Christian G.; Kohlbacher, Oliver
2007-01-01
We propose a new model for predicting the retention time of oligonucleotides. The model is based on ν support vector regression using features derived from base sequence and predicted secondary structure of oligonucleotides. Because of the secondary structure information, the model is applicable even at relatively low temperatures where the secondary structure is not suppressed by thermal denaturing. This makes the prediction of oligonucleotide retention time for arbitrary temperatures possible, provided that the target temperature lies within the temperature range of the training data. We describe different possibilities of feature calculation from base sequence and secondary structure, present the results and compare our model to existing models. PMID:17567619
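One plausible reading of "features derived from base sequence" is base and dinucleotide composition; the scheme below is a hypothetical illustration of such feature calculation, not the authors' actual feature set, and it omits the secondary-structure features entirely.

```python
def sequence_features(seq):
    """Base and dinucleotide frequency features for a DNA oligonucleotide."""
    bases = "ACGT"
    counts = {b: seq.count(b) for b in bases}
    dinucs = {a + b: 0 for a in bases for b in bases}
    for i in range(len(seq) - 1):
        dinucs[seq[i:i+2]] += 1
    # Normalize to frequencies so oligos of different lengths are comparable
    n = len(seq)
    feats = [counts[b] / n for b in bases]
    feats += [dinucs[d] / (n - 1) for d in sorted(dinucs)]
    return feats

# Example oligo (made up): 4 base frequencies + 16 dinucleotide frequencies
features = sequence_features("ACGTACGTTT")
```

Vectors like this one would then be fed, together with structure-derived features, to the ν support vector regressor.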
Age estimation using pulp/tooth area ratio in maxillary canines-A digital image analysis.
Juneja, Manjushree; Devi, Yashoda B K; Rakesh, N; Juneja, Saurabh
2014-09-01
Determination of age of a subject is one of the most important aspects of medico-legal cases and anthropological research. Radiographs can be used to indirectly measure the rate of secondary dentine deposition which is depicted by reduction in the pulp area. In this study, 200 patients of Karnataka aged between 18-72 years were selected for the study. Panoramic radiographs were made and indirectly digitized. Radiographic images of maxillary canines (RIC) were processed using a computer-aided drafting program (ImageJ). The variables pulp/root length (p), pulp/tooth length (r), pulp/root width at enamel-cementum junction (ECJ) level (a), pulp/root width at mid-root level (c), pulp/root width at midpoint level between ECJ level and mid-root level (b) and pulp/tooth area ratio (AR) were recorded. All the morphological variables including gender were statistically analyzed to derive regression equation for estimation of age. It was observed that 2 variables 'AR' and 'b' contributed significantly to the fit and were included in the regression model, yielding the formula: Age = 87.305-480.455(AR)+48.108(b). Statistical analysis indicated that the regression equation with selected variables explained 96% of total variance with the median of the residuals of 0.1614 years and standard error of estimate of 3.0186 years. There is significant correlation between age and morphological variables 'AR' and 'b' and the derived population specific regression equation can be potentially used for estimation of chronological age of individuals of Karnataka origin.
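The published equation can be applied directly; the coefficients come from the abstract, while the input values below are invented for illustration, not measurements from the study.

```python
def estimate_age(ar, b):
    """Age estimate from the abstract's regression equation.

    ar -- pulp/tooth area ratio of the maxillary canine
    b  -- pulp/root width ratio midway between the ECJ and mid-root levels
    """
    return 87.305 - 480.455 * ar + 48.108 * b

# Hypothetical example measurements (not from the study sample)
age = estimate_age(ar=0.10, b=0.25)
```

As secondary dentine narrows the pulp, AR decreases with age, which is why its coefficient is large and negative.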
Comparison of Conventional and ANN Models for River Flow Forecasting
NASA Astrophysics Data System (ADS)
Jain, A.; Ganti, R.
2011-12-01
Hydrological models are useful in many water resources applications such as flood control, irrigation and drainage, hydro power generation, water supply, erosion and sediment control, etc. Estimates of runoff are needed in many water resources planning, design development, operation and maintenance activities. River flow is generally estimated using time series or rainfall-runoff models. Recently, soft artificial intelligence tools such as Artificial Neural Networks (ANNs) have become popular for research purposes but have not been extensively adopted in operational hydrological forecasts. There is a strong need to develop ANN models based on real catchment data and compare them with the conventional models. In this paper, a comparative study has been carried out for river flow forecasting using the conventional and ANN models. Among the conventional models, multiple linear and nonlinear regression models and time series models of the autoregressive (AR) type have been developed. A feed-forward neural network structure trained using the back-propagation algorithm, a gradient search method, was adopted. The daily river flow data derived from the Godavari Basin at Polavaram, Andhra Pradesh, India have been employed to develop all the models included here. Two inputs, flows at two past time steps (Q(t-1) and Q(t-2)), were selected using partial autocorrelation analysis for forecasting flow at time t, Q(t). A wide range of error statistics have been used to evaluate the performance of all the models developed in this study. It has been found that the regression and AR models performed comparably, and the ANN model performed the best amongst all the models investigated in this study. It is concluded that the ANN model should be adopted in real catchments for hydrological modeling and forecasting.
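The lag-selection step, partial autocorrelation analysis justifying Q(t-1) and Q(t-2) as inputs, can be sketched on a synthetic flow-like series; the AR(2) coefficients here are assumptions, not the Godavari data.

```python
import numpy as np

# Synthetic daily-flow-like series with two genuine lags (illustrative)
rng = np.random.default_rng(6)
n = 1000
q = np.zeros(n)
for t in range(2, n):
    q[t] = 0.7 * q[t-1] + 0.2 * q[t-2] + rng.normal()

def pacf_at_lag(x, k):
    """PACF at lag k: the last coefficient of an AR(k) least-squares fit."""
    X = np.column_stack([x[k - j - 1:len(x) - j - 1] for j in range(k)])
    beta, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
    return beta[-1]

# Significant PACF at lags 1 and 2, near zero at lag 3, supports two inputs
pacf = [pacf_at_lag(q, k) for k in (1, 2, 3)]
```

In practice one compares each PACF value against an approximate confidence band of about 2/sqrt(n) and keeps the lags that exceed it.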
Development of Ensemble Model Based Water Demand Forecasting Model
NASA Astrophysics Data System (ADS)
Kwon, Hyun-Han; So, Byung-Jin; Kim, Seong-Hyeon; Kim, Byung-Seop
2014-05-01
In recent years, the Smart Water Grid (SWG) concept has emerged globally and also gained significant recognition in South Korea. In particular, there has been growing interest in water demand forecasting and optimal pump operation, and this has led to various studies regarding energy saving and improvement of water supply reliability. Existing water demand forecasting models are categorized into two groups in view of modeling and predicting their behavior in time series. One is to consider embedded patterns such as seasonality, periodicity and trends, and the other is an autoregressive model that uses short-memory Markovian processes (Emmanuel et al., 2012). The main disadvantage of the abovementioned models is that predictability of water demands at about sub-daily scale is limited because the system is nonlinear. In this regard, this study aims to develop a nonlinear ensemble model for hourly water demand forecasting which allows us to estimate uncertainties across different model classes. The proposed model consists of two parts. One is a multi-model scheme based on a combination of independent prediction models. The other is a cross-validation scheme, the bagging approach introduced by Breiman (1996), used to derive weighting factors corresponding to individual models. The individual forecasting models used in this study are a linear regression model, polynomial regression, multivariate adaptive regression splines (MARS), and support vector machines (SVM). The concepts are demonstrated through application to observations from water plants at several locations in South Korea. Keywords: water demand, non-linear model, the ensemble forecasting model, uncertainty. Acknowledgements This subject is supported by the Korea Ministry of Environment as "Projects for Developing Eco-Innovation Technologies (GT-11-G-02-001-6)."
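The multi-model combination idea can be illustrated with a simple inverse-error weighting; note this is a stand-in, since the paper derives its weights via the bagging approach, and all numbers below are invented.

```python
import numpy as np

# Three individual models' hourly demand forecasts (hypothetical values)
forecasts = np.array([102.0, 98.0, 110.0])
# Their cross-validation RMSEs (hypothetical); smaller error -> larger weight
val_rmse = np.array([4.0, 2.0, 8.0])

# Normalize inverse errors so the weights sum to one
weights = (1 / val_rmse) / np.sum(1 / val_rmse)
ensemble = float(weights @ forecasts)   # weighted-average ensemble forecast
```

The bagging scheme in the paper estimates these weights by resampling the training data, which additionally yields a spread of forecasts usable as an uncertainty estimate.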
NASA Astrophysics Data System (ADS)
Nawar, Said; Buddenbaum, Henning; Hill, Joachim
2014-05-01
A rapid and inexpensive soil analytical technique is needed for soil quality assessment and accurate mapping. This study investigated a method for improved estimation of soil clay (SC) and organic matter (OM) using reflectance spectroscopy. Seventy soil samples were collected from the Sinai Peninsula in Egypt to estimate the soil clay and organic matter relative to the soil spectra. Soil samples were scanned with an Analytical Spectral Devices (ASD) spectrometer (350-2500 nm). Three spectral formats were used in the calibration models derived from the spectra and the soil properties: (1) original reflectance spectra (OR), (2) first-derivative spectra smoothed using the Savitzky-Golay technique (FD-SG) and (3) continuum-removed reflectance (CR). Partial least-squares regression (PLSR) models using the CR of the 400-2500 nm spectral region resulted in R2 = 0.76 and 0.57, and RPD = 2.1 and 1.5 for estimating SC and OM, respectively, indicating better performance than that obtained using OR and SG. The multivariate adaptive regression splines (MARS) calibration model with the CR spectra resulted in an improved performance (R2 = 0.89 and 0.83, RPD = 3.1 and 2.4) for estimating SC and OM, respectively. The results show that the MARS models have a great potential for estimating SC and OM compared with PLSR models. The results obtained in this study have potential value in the field of soil spectroscopy because they can be applied directly to the mapping of soil properties using remote sensing imagery in arid environment conditions. Key Words: soil clay, organic matter, PLSR, MARS, reflectance spectroscopy.
Yuan, Jintao; Yu, Shuling; Zhang, Ting; Yuan, Xuejie; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu
2016-06-01
Octanol/water (K(OW)) and octanol/air (K(OA)) partition coefficients are two important physicochemical properties of organic substances. In current practice, K(OW) and K(OA) values of some polychlorinated biphenyls (PCBs) are measured using the generator column method. Quantitative structure-property relationship (QSPR) models can serve as a valuable alternative method of replacing or reducing experimental steps in the determination of K(OW) and K(OA). In this paper, two different methods, i.e., multiple linear regression based on dragon descriptors and hologram quantitative structure-activity relationship, were used to predict generator-column-derived log K(OW) and log K(OA) values of PCBs. The predictive ability of the developed models was validated using a test set, and the performances of all generated models were compared with those of three previously reported models. All results indicated that the proposed models were robust and satisfactory and can thus be used as alternative models for the rapid assessment of the K(OW) and K(OA) of PCBs. Copyright © 2016 Elsevier Inc. All rights reserved.
QSPR using MOLGEN-QSPR: the challenge of fluoroalkane boiling points.
Rücker, Christoph; Meringer, Markus; Kerber, Adalbert
2005-01-01
By means of the new software MOLGEN-QSPR, a multilinear regression model for the boiling points of lower fluoroalkanes is established. The model is based exclusively on simple descriptors derived directly from molecular structure and nevertheless describes a broader set of data more precisely than previous attempts that used either more demanding (quantum chemical) descriptors or more demanding (nonlinear) statistical methods such as neural networks. The model's internal consistency was confirmed by leave-one-out cross-validation. The model was used to predict all unknown boiling points of fluorobutanes, and the quality of predictions was estimated by means of comparison with boiling point predictions for fluoropentanes.
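The leave-one-out cross-validation used to confirm the model's internal consistency can be sketched generically; the descriptors and response below are synthetic, not the fluoroalkane boiling-point data or MOLGEN-QSPR descriptors.

```python
import numpy as np

# Synthetic compounds: intercept plus two structure-derived descriptors
rng = np.random.default_rng(7)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([10.0, 5.0, -3.0]) + rng.normal(0, 1, n)

# Leave-one-out: refit the multilinear model n times, each time
# predicting the held-out compound
press = 0.0   # predictive residual sum of squares
for i in range(n):
    mask = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    press += (y[i] - X[i] @ b) ** 2

q2 = 1 - press / np.sum((y - y.mean()) ** 2)   # cross-validated R^2
```

A `q2` close to the ordinary R^2 indicates the fit is not driven by individual compounds, which is the consistency check the abstract refers to.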
Riahi, Siavash; Hadiloo, Farshad; Milani, Seyed Mohammad R; Davarkhah, Nazila; Ganjali, Mohammad R; Norouzi, Parviz; Seyfi, Payam
2011-05-01
The predictive accuracy of different chemometric methods was compared when applied to ordinary UV spectra and first-order derivative spectra. Principal component regression (PCR) and partial least squares with one dependent variable (PLS1) and two dependent variables (PLS2) were applied to spectral data of a pharmaceutical formulation containing pseudoephedrine (PDP) and guaifenesin (GFN). The ability of derivative spectra to resolve the overlapping spectra of chlorpheniramine maleate was evaluated when multivariate methods were adopted for the analysis of two-component mixtures without any chemical pretreatment. The chemometric models were tested on an external validation dataset and finally applied to the analysis of pharmaceuticals. Significant advantages were found in the analysis of real samples when calibration models from derivative spectra were used. It should also be mentioned that the proposed method is simple and rapid, requires no preliminary separation steps, and can be used easily for the analysis of these compounds, especially in quality control laboratories. Copyright © 2011 John Wiley & Sons, Ltd.
HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION
Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong
2015-01-01
In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645
Spacebased Estimation of Moisture Transport in Marine Atmosphere Using Support Vector Regression
NASA Technical Reports Server (NTRS)
Xie, Xiaosu; Liu, W. Timothy; Tang, Benyang
2007-01-01
An improved algorithm is developed based on support vector regression (SVR) to estimate horizontal water vapor transport integrated through the depth of the atmosphere (Θ) over the global ocean from observations of the surface wind-stress vector by QuikSCAT, cloud drift wind vectors derived from the Multi-angle Imaging SpectroRadiometer (MISR) and geostationary satellites, and precipitable water from the Special Sensor Microwave/Imager (SSM/I). The statistical relation is established between the input parameters (the surface wind stress, the 850 mb wind, the precipitable water, time, and location) and the target data (Θ calculated from rawinsondes and reanalysis of a numerical weather prediction model). The results are validated with independent daily rawinsonde observations, monthly mean reanalysis data, and through regional water balance. This study clearly demonstrates the improvement of Θ derived from satellite data using SVR over previous data sets based on linear regression and neural networks. The SVR methodology reduces both mean bias and standard deviation compared with rawinsonde observations. It agrees better with observations from synoptic to seasonal time scales, and compares more favorably with the reanalysis data on seasonal variations. Only the SVR result can achieve the water balance over South America. The rationale for the advantage of the SVR method and the impact of adding the upper-level wind are also discussed.
NASA Technical Reports Server (NTRS)
Deadmore, D. L.
1984-01-01
The effects of Cr, Al, Ti, Mo, Ta, Nb, and W content on the hot corrosion of nickel-base alloys were investigated. The alloys were tested in a Mach 0.3 flame with 0.5 ppmw sodium at a temperature of 900 C. One nondestructive and three destructive tests were conducted. The best corrosion resistance was achieved when the Cr content was 12 wt %. However, some lower-Cr-content alloys (10 wt %) exhibited reasonable resistance provided that the Al content was 2.5 wt % and the Ti content was Aa wt %. The effect of W, Ta, Mo, and Nb contents on the hot-corrosion resistance varied depending on the Al and Ti contents. Several commercial alloy compositions were also tested and the corrosion attack was measured. Predicted attack was calculated for these alloys from derived regression equations and was in reasonable agreement with that experimentally measured. The regression equations were derived from measurements made on alloys in a one-quarter replicate of a 2^7 statistical design alloy composition experiment. These regression equations represent a simple linear model and are only a very preliminary analysis of the data needed to provide insights into the experimental method.
Kamphuis, C; Frank, E; Burke, J K; Verkerk, G A; Jago, J G
2013-01-01
The hypothesis was that sensors currently available on farm that monitor behavioral and physiological characteristics have potential for the detection of lameness in dairy cows. This was tested by applying additive logistic regression to variables derived from sensor data. Data were collected between November 2010 and June 2012 on 5 commercial pasture-based dairy farms. Sensor data from weigh scales (liveweight), pedometers (activity), and milk meters (milking order, unadjusted and adjusted milk yield in the first 2 min of milking, total milk yield, and milking duration) were collected at every milking from 4,904 cows. Lameness events were recorded by farmers who were trained in detecting lameness before the study commenced. A total of 318 lameness events affecting 292 cows were available for statistical analyses. For each lameness event, the lame cow's sensor data for a time period of 14 d before observation date were randomly matched by farm and date to 10 healthy cows (i.e., cows that were not lame and had no other health event recorded for the matched time period). Sensor data relating to the 14-d time periods were used for developing univariable (using one source of sensor data) and multivariable (using multiple sources of sensor data) models. Model development involved the use of additive logistic regression by applying the LogitBoost algorithm with a regression tree as base learner. The model's output was a probability estimate for lameness, given the sensor data collected during the 14-d time period. Models were validated using leave-one-farm-out cross-validation and, as a result of this validation, each cow in the data set (318 lame and 3,180 nonlame cows) received a probability estimate for lameness. Based on the area under the curve (AUC), results indicated that univariable models had low predictive potential, with the highest AUC values found for liveweight (AUC=0.66), activity (AUC=0.60), and milking order (AUC=0.65). 
Combining these 3 sensors improved AUC to 0.74. Detection performance of this combined model varied between farms but it consistently and significantly outperformed univariable models across farms at a fixed specificity of 80%. Still, detection performance was not high enough to be implemented in practice on large, pasture-based dairy farms. Future research may improve performance by developing variables based on sensor data of liveweight, activity, and milking order, but that better describe changes in sensor data patterns when cows go lame. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
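The additive logistic regression described above (LogitBoost with regression-tree base learners) can be sketched with scikit-learn, which has no LogitBoost implementation but whose gradient boosting under log-loss fits the same family of stagewise additive tree models. Everything in this sketch is synthetic: the farm labels, the three features standing in for liveweight, activity, and milking order, and the coefficients generating the lameness labels; leave-one-group-out validation stands in for the paper's leave-one-farm-out scheme.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n = 1000
farm = rng.integers(0, 5, n)      # hypothetical farm labels for 5 farms
X = rng.normal(size=(n, 3))       # stand-ins for liveweight, activity, milking-order variables
# synthetic lameness labels: probability rises as the first two features drop
p = 1.0 / (1.0 + np.exp(1.5 + 0.8 * X[:, 0] + 0.6 * X[:, 1]))
y = rng.binomial(1, p)

# stagewise additive model of shallow regression trees under log-loss
# (scikit-learn's closest analogue to LogitBoost with tree base learners)
model = GradientBoostingClassifier(n_estimators=100, max_depth=2, learning_rate=0.1)

aucs = []  # leave-one-farm-out cross-validation, as in the study design
for train, test in LeaveOneGroupOut().split(X, y, groups=farm):
    model.fit(X[train], y[train])
    aucs.append(roc_auc_score(y[test], model.predict_proba(X[test])[:, 1]))
print(f"leave-one-farm-out mean AUC: {np.mean(aucs):.2f}")
```

With real sensor data the features would be 14-day summaries of the kind the abstract describes; here the AUC merely confirms the pipeline behaves as expected.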
NASA Astrophysics Data System (ADS)
Ilie, Iulia; Dittrich, Peter; Carvalhais, Nuno; Jung, Martin; Heinemeyer, Andreas; Migliavacca, Mirco; Morison, James I. L.; Sippel, Sebastian; Subke, Jens-Arne; Wilkinson, Matthew; Mahecha, Miguel D.
2017-09-01
Accurate model representation of land-atmosphere carbon fluxes is essential for climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented by a steadily evolving body of mechanistic theory, provides the main basis for developing such models. The strongly increasing availability of measurements may facilitate new ways of identifying suitable model structures using machine learning. Here, we explore the potential of gene expression programming (GEP) to derive relevant model formulations based solely on the signals present in data by automatically applying various mathematical transformations to potential predictors and repeatedly evolving the resulting model structures. In contrast to most other machine learning regression techniques, the GEP approach generates readable
models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (random forests, support vector machines, artificial neural networks, and kernel ridge regressions). Based on real observations we explore the responses of the different components of terrestrial respiration at an oak forest in south-eastern England. We find that the GEP-retrieved models are often better in prediction than some established respiration models. Based on their structures, we find previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics. We noticed that the GEP models are only partly portable across respiration components, the identification of a general
terrestrial respiration model possibly prevented by equifinality issues. Overall, GEP is a promising tool for uncovering new model structures for terrestrial ecology in the data-rich era, complementing more traditional modelling approaches.
Zhu, Shanyou; Zhang, Hailong; Liu, Ronggao; Cao, Yun; Zhang, Guixin
2014-01-01
Sampling designs are commonly used to estimate deforestation over large areas, but comparisons between different sampling strategies are required. Using PRODES deforestation data as a reference, deforestation in the state of Mato Grosso in Brazil from 2005 to 2006 is evaluated using Landsat imagery and a nearly synchronous MODIS dataset. The MODIS-derived deforestation is used to assist in sampling and extrapolation. Three sampling designs are compared according to the estimated deforestation of the entire study area based on simple extrapolation and linear regression models. The results show that stratified sampling for strata construction and sample allocation using the MODIS-derived deforestation hotspots provided more precise estimations than simple random and systematic sampling. Moreover, the relationship between the MODIS-derived and TM-derived deforestation provides a precise estimate of the total deforestation area as well as the distribution of deforestation in each block. PMID:25258742
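The stratified-versus-simple-random comparison above can be illustrated with a small simulation. The block counts, hotspot scores, and deforestation values below are synthetic stand-ins for the MODIS- and TM-derived quantities; the point is only that stratifying on a variable correlated with the outcome reduces the variance of the areal total estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n_blocks, n_sample = 2000, 200
# hypothetical MODIS-derived hotspot score per block, correlated with true loss
hotspot = rng.gamma(2.0, 1.0, n_blocks)
deforest = np.maximum(0.0, 5.0 * hotspot + rng.normal(0.0, 3.0, n_blocks))  # "TM-derived" truth
true_total = deforest.sum()

def simple_random_estimate():
    idx = rng.choice(n_blocks, n_sample, replace=False)
    return n_blocks * deforest[idx].mean()

def stratified_estimate(n_strata=4):
    # strata built from hotspot quantiles; equal sample allocation per stratum
    edges = np.quantile(hotspot, np.linspace(0.0, 1.0, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, hotspot, side="right") - 1, 0, n_strata - 1)
    total = 0.0
    for s in range(n_strata):
        members = np.flatnonzero(strata == s)
        idx = rng.choice(members, n_sample // n_strata, replace=False)
        total += len(members) * deforest[idx].mean()
    return total

srs = np.array([simple_random_estimate() for _ in range(500)])
strat = np.array([stratified_estimate() for _ in range(500)])
srs_rmse = np.sqrt(np.mean((srs - true_total) ** 2)) / true_total
strat_rmse = np.sqrt(np.mean((strat - true_total) ** 2)) / true_total
print(f"relative RMSE  simple random: {srs_rmse:.3f}  stratified: {strat_rmse:.3f}")
```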
Pigment epithelium-derived factor as a multifunctional regulator of wound healing
Wietecha, Mateusz S.; Król, Mateusz J.; Michalczyk, Elizabeth R.; Chen, Lin; Gettins, Peter G.
2015-01-01
During dermal wound repair, hypoxia-driven proliferation results in dense but highly permeable, disorganized microvascular networks, similar to those in solid tumors. Concurrently, activated dermal fibroblasts generate an angiopermissive, provisional extracellular matrix (ECM). Unlike cancers, wounds naturally resolve via blood vessel regression and ECM maturation, which are essential for reestablishing tissue homeostasis. Mechanisms guiding wound resolution are poorly understood; one candidate regulator is pigment epithelium-derived factor (PEDF), a secreted glycoprotein. PEDF is a potent antiangiogenic in models of pathological angiogenesis and a promising cancer and cardiovascular disease therapeutic, but little is known about its physiological function. To examine the roles of PEDF in physiological wound repair, we used a reproducible model of excisional skin wound healing in BALB/c mice. We show that PEDF is abundant in unwounded and healing skin, is produced primarily by dermal fibroblasts, binds to resident microvascular endothelial cells, and accumulates in dermal ECM and epidermis. PEDF transcript and protein levels were low during the inflammatory and proliferative phases of healing but increased in quantity and colocalization with microvasculature during wound resolution. Local antibody inhibition of endogenous PEDF delayed vessel regression and collagen maturation during the remodeling phase. Treatment of wounds with intradermal injections of exogenous, recombinant PEDF inhibited nascent angiogenesis by repressing endothelial proliferation, promoted vascular integrity and function, and increased collagen maturity. These results demonstrate that PEDF contributes to the resolution of healing wounds by causing regression of immature blood vessels and stimulating maturation of the vascular microenvironment, thus promoting a return to tissue homeostasis after injury. PMID:26163443
NASA Astrophysics Data System (ADS)
Bo, Z.; Chen, J. H.
2010-02-01
The dimensional analysis technique is used to formulate a correlation between ozone generation rate and various parameters that are important in the design and operation of positive wire-to-plate corona discharges in indoor air. The dimensionless relation is determined by linear regression analysis based on the results from 36 laboratory-scale experiments. The derived equation is validated against experimental data and a numerical model published in the literature. Applications of the derived equation are illustrated through an example: selecting an appropriate set of operating conditions in the design and operation of a photocopier so as to comply with federal regulations on ozone emission. Finally, a new current-voltage characteristic equation is proposed for positive wire-to-plate corona discharges based on the derived dimensionless equation.
A Subsonic Aircraft Design Optimization With Neural Network and Regression Approximators
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.; Haller, William J.
2004-01-01
The Flight-Optimization-System (FLOPS) code encountered difficulty in analyzing a subsonic aircraft. This limitation made the design optimization problematic. The deficiencies have been alleviated through use of neural network and regression approximations. The insight gained from using the approximators is discussed in this paper. The FLOPS code is reviewed. Analysis models are developed and validated for each approximator. The regression method appears to hug the data points, while the neural network approximation follows a mean path. For an analysis cycle, the approximate model required milliseconds of central processing unit (CPU) time versus seconds by the FLOPS code. Performance of the approximators was satisfactory for aircraft analysis. A design optimization capability has been created by coupling the derived analyzers to the optimization test bed CometBoards. The approximators were efficient reanalysis tools in the aircraft design optimization. Instability encountered in the FLOPS analyzer was eliminated. The convergence characteristics were improved for the design optimization. The CPU time required to calculate the optimum solution, measured in hours with the FLOPS code, was reduced to minutes with the neural network approximation and to seconds with the regression method. Generation of the approximators required the manipulation of a very large quantity of data. Design sensitivity with respect to the bounds of aircraft constraints is easily generated.
Stature estimation equations for South Asian skeletons based on DXA scans of contemporary adults.
Pomeroy, Emma; Mushrif-Tripathy, Veena; Wells, Jonathan C K; Kulkarni, Bharati; Kinra, Sanjay; Stock, Jay T
2018-05-03
Stature estimation from the skeleton is a classic anthropological problem, and recent years have seen the proliferation of population-specific regression equations. Many rely on the anatomical reconstruction of stature from archaeological skeletons to derive regression equations based on long bone lengths, but this requires a collection with very good preservation. In some regions, for example, South Asia, typical environmental conditions preclude the sufficient preservation of skeletal remains. Large-scale epidemiological studies that include medical imaging of the skeleton by techniques such as dual-energy X-ray absorptiometry (DXA) offer new potential datasets for developing such equations. We derived estimation equations based on known height and bone lengths measured from DXA scans from the Andhra Pradesh Children and Parents Study (Hyderabad, India). Given debates on the most appropriate regression model to use, multiple methods were compared, and the performance of the equations was tested on a published skeletal dataset of individuals with known stature. The equations have standard errors of estimates and prediction errors similar to those derived using anatomical reconstruction or from cadaveric datasets. As measured by the number of significant differences between true and estimated stature, and the prediction errors, the new equations perform as well as, and generally better than, published equations commonly used on South Asian skeletons or based on Indian cadaveric datasets. This study demonstrates the utility of DXA scans as a data source for developing stature estimation equations and offers a new set of equations for use with South Asian datasets. © 2018 Wiley Periodicals, Inc.
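A stature-estimation regression of this kind reduces to ordinary least squares of stature on bone length, reported with its standard error of estimate (SEE). A minimal sketch, assuming synthetic femur lengths and an invented relation, not the coefficients derived in the study:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
femur = rng.normal(45.0, 2.5, n)                        # DXA femur lengths (cm), synthetic
stature = 60.0 + 2.3 * femur + rng.normal(0.0, 3.0, n)  # hypothetical true relation (cm)

# ordinary least squares: stature = b0 + b1 * femur
A = np.column_stack([np.ones(n), femur])
coef, *_ = np.linalg.lstsq(A, stature, rcond=None)
resid = stature - A @ coef
see = np.sqrt(resid @ resid / (n - 2))  # standard error of estimate (SEE)
print(f"stature = {coef[0]:.1f} + {coef[1]:.2f} x femur length, SEE = {see:.1f} cm")
```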
A Fresh Start for Flood Estimation in Ungauged Basins
NASA Astrophysics Data System (ADS)
Woods, R. A.
2017-12-01
The two standard methods for flood estimation in ungauged basins, regression-based statistical models and rainfall-runoff models using a design rainfall event, have survived relatively unchanged as the methods of choice for more than 40 years. Their technical implementation has developed greatly, but the models' representation of hydrological processes has not, despite a large volume of hydrological research. I suggest it is time to introduce more hydrology into flood estimation. The reliability of the current methods can be unsatisfactory. For example, despite the UK's relatively straightforward hydrology, regression estimates of the index flood are uncertain by +/- a factor of two (for a 95% confidence interval), an impractically large uncertainty for design. The standard error of rainfall-runoff model estimates is not usually known, but available assessments indicate poorer reliability than statistical methods. There is a practical need for improved reliability in flood estimation. Two promising candidates to supersede the existing methods are (i) continuous simulation by rainfall-runoff modelling and (ii) event-based derived distribution methods. The main challenge with continuous simulation methods in ungauged basins is to specify the model structure and parameter values when calibration data are not available. This has been an active area of research for more than a decade, and this activity is likely to continue. The major challenges for the derived distribution method in ungauged catchments include not only the correct specification of model structure and parameter values, but also antecedent conditions (e.g. seasonal soil water balance). However, a much smaller community of researchers is active in developing or applying the derived distribution approach, and as a result progress has been slower.
A change is needed: surely we have learned enough about hydrology in the last 40 years to make a practical advance on our methods for flood estimation! A shift to new methods for flood estimation will not be taken lightly by practitioners. However, the standard for change is clear: can we develop new methods that give significant improvements in reliability over the existing methods, which are demonstrably unsatisfactory?
Marshall, Michael T.; Thenkabail, Prasad S.
2014-01-01
New satellite missions are expected to record high spectral resolution information globally and consistently for the first time, so it is important to identify modeling techniques that take advantage of these new data. In this paper, we estimate biomass for four major crops using ground-based hyperspectral narrowbands. The spectra and their derivatives are evaluated using three modeling techniques: two-band hyperspectral vegetation indices (HVIs), multiple band-HVIs (MB-HVIs) developed from Sequential Search Methods (SSM), and MB-HVIs developed from Principal Component Regression. Overall, the two-band HVIs and MB-HVIs developed from SSMs using first derivative transformed spectra in the visible blue and green and NIR explained more biomass variability and had lower error than the other approaches or transformations; however a better search criterion needs to be developed in order to reflect the true ability of the two-band HVI approach. Short-Wave Infrared 1 (1000 to 1700 nm) proved less effective, but still important in the final models.
Lopopolo, Alessandro; Frank, Stefan L; van den Bosch, Antal; Willems, Roel M
2017-01-01
Language comprehension involves the simultaneous processing of information at the phonological, syntactic, and lexical level. We track these three distinct streams of information in the brain by using stochastic measures derived from computational language models to detect neural correlates of phoneme, part-of-speech, and word processing in an fMRI experiment. Probabilistic language models have proven to be useful tools for studying how language is processed as a sequence of symbols unfolding in time. Conditional probabilities between sequences of words are at the basis of probabilistic measures such as surprisal and perplexity which have been successfully used as predictors of several behavioural and neural correlates of sentence processing. Here we computed perplexity from sequences of words and their parts of speech, and their phonemic transcriptions. Brain activity time-locked to each word is regressed on the three model-derived measures. We observe that the brain keeps track of the statistical structure of lexical, syntactic and phonological information in distinct areas.
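The surprisal and perplexity measures used as regressors above can be illustrated with a toy bigram model. The corpus and add-alpha smoothing constant here are invented; the study's actual language models were trained on far larger corpora.

```python
import math
from collections import Counter

# toy corpus; the study's language models were trained on far larger corpora
corpus = "the dog runs . the cat runs . the dog sleeps .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def surprisal(prev, word, alpha=0.1):
    """Add-alpha smoothed bigram surprisal, -log2 P(word | prev)."""
    v = len(unigrams)  # vocabulary size
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * v)
    return -math.log2(p)

for w in ("dog", "cat"):
    print(f"surprisal of '{w}' after 'the': {surprisal('the', w):.2f} bits")

# perplexity of a sequence is 2 ** (mean per-word surprisal)
seq = ["the", "dog", "runs"]
mean_s = sum(surprisal(a, b) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)
print(f"perplexity: {2 ** mean_s:.2f}")
```

"dog" is more frequent after "the" in this corpus, so its surprisal is lower than that of "cat", which is the kind of word-by-word statistic regressed against brain activity in the study.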
Monthly streamflow forecasting based on hidden Markov model and Gaussian Mixture Regression
NASA Astrophysics Data System (ADS)
Liu, Yongqi; Ye, Lei; Qin, Hui; Hong, Xiaofeng; Ye, Jiajun; Yin, Xingli
2018-06-01
Reliable streamflow forecasts can be highly valuable for water resources planning and management. In this study, we combined a hidden Markov model (HMM) and Gaussian Mixture Regression (GMR) for probabilistic monthly streamflow forecasting. The HMM is initialized using a kernelized K-medoids clustering method, and the Baum-Welch algorithm is then executed to learn the model parameters. GMR derives a conditional probability distribution for the predictand given covariate information, including the antecedent flow at a local station and two surrounding stations. The performance of HMM-GMR was verified based on the mean square error and continuous ranked probability score skill scores. The reliability of the forecasts was assessed by examining the uniformity of the probability integral transform values. The results show that HMM-GMR obtained reasonably high skill scores and the uncertainty spread was appropriate. Different HMM states were assumed to be different climate conditions, which would lead to different types of observed values. We demonstrated that the HMM-GMR approach can handle multimodal and heteroscedastic data.
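The GMR step, deriving the conditional distribution of the predictand given covariates from a joint Gaussian mixture, can be sketched as follows. The data are synthetic, a single covariate stands in for the antecedent flows, and the HMM layer is omitted; only the mixture-conditioning arithmetic is shown.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# synthetic covariate/predictand pair standing in for antecedent and forecast flows
x = rng.uniform(-2.0, 2.0, size=(500, 1))
y = np.sin(2.0 * x[:, 0]) + 0.1 * rng.normal(size=500)
gmm = GaussianMixture(n_components=5, random_state=0).fit(np.column_stack([x, y]))

def gmr_mean(x_new):
    """Conditional mean E[y | x] derived from the joint Gaussian mixture."""
    w, cond = [], []
    for pi, m, c in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        # component responsibility, proportional to pi_k * N(x; mu_x, S_xx)
        w.append(pi * multivariate_normal.pdf(x_new, mean=m[:1], cov=c[:1, :1]))
        # per-component conditional mean: mu_y + S_yx S_xx^{-1} (x - mu_x)
        cond.append(m[1] + c[1, :1] @ np.linalg.solve(c[:1, :1], x_new - m[:1]))
    w = np.array(w) / np.sum(w)
    return float(w @ np.array(cond).ravel())

print(gmr_mean(np.array([0.8])))  # should lie near sin(1.6), i.e. close to 1.0
```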
An Example-Based Brain MRI Simulation Framework.
He, Qing; Roy, Snehashis; Jog, Amod; Pham, Dzung L
2015-02-21
The simulation of magnetic resonance (MR) images plays an important role in the validation of image analysis algorithms such as image segmentation, due to lack of sufficient ground truth in real MR images. Previous work on MRI simulation has focused on explicitly modeling the MR image formation process. However, because of the overwhelming complexity of MR acquisition these simulations must involve simplifications and approximations that can result in visually unrealistic simulated images. In this work, we describe an example-based simulation framework, which uses an "atlas" consisting of an MR image and its anatomical models derived from the hard segmentation. The relationships between the MR image intensities and its anatomical models are learned using a patch-based regression that implicitly models the physics of the MR image formation. Given the anatomical models of a new brain, a new MR image can be simulated using the learned regression. This approach has been extended to also simulate intensity inhomogeneity artifacts based on the statistical model of training data. Results show that the example based MRI simulation method is capable of simulating different image contrasts and is robust to different choices of atlas. The simulated images resemble real MR images more than simulations produced by a physics-based model.
Empirical and semi-analytical models for predicting peak outflows caused by embankment dam failures
NASA Astrophysics Data System (ADS)
Wang, Bo; Chen, Yunliang; Wu, Chao; Peng, Yong; Song, Jiajun; Liu, Wenjun; Liu, Xin
2018-07-01
Prediction of the peak discharge of floods has attracted great attention from researchers and engineers. In the present study, nine typical nonlinear mathematical models are established based on a database of 40 historical dam failures. The first eight models, developed through a series of regression analyses, are purely empirical, while the last is a semi-analytical approach derived from an analytical solution for dam-break floods in a trapezoidal channel. Water depth above breach invert (Hw), volume of water stored above breach invert (Vw), embankment length (El), and average embankment width (Ew) are used as independent variables to develop empirical formulas for estimating the peak outflow from breached embankment dams. The multiple regression analysis indicates that a function using the former two variables (i.e., Hw and Vw) produces considerably more accurate results than one using the latter two (i.e., El and Ew). The semi-analytical approach works best in terms of both prediction accuracy and uncertainty, and the established empirical models produce reasonable results, except for the model using only El. Moreover, the present models have been compared with other models available in the literature for estimating peak discharge.
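Empirical formulas of the form Qp = a · Hw^b · Vw^c are linear in log space, so the regression step reduces to ordinary least squares on the log-transformed variables. In this sketch the exponents, noise level, and dataset are all invented for illustration, not taken from the paper; the fit simply recovers the assumed exponents.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40  # same size as the historical dam-failure database
Hw = rng.uniform(5.0, 50.0, n)         # water depth above breach invert (m), synthetic
Vw = 10.0 ** rng.uniform(5.0, 9.0, n)  # storage above breach invert (m^3), synthetic
# assumed "true" power law with multiplicative noise (exponents are invented)
Qp = 0.2 * Hw**1.2 * Vw**0.4 * np.exp(rng.normal(0.0, 0.1, n))

# fit log Qp = log a + b log Hw + c log Vw by ordinary least squares
A = np.column_stack([np.ones(n), np.log(Hw), np.log(Vw)])
coef, *_ = np.linalg.lstsq(A, np.log(Qp), rcond=None)
a, b, c = np.exp(coef[0]), coef[1], coef[2]
print(f"fitted Qp ~ {a:.2f} * Hw^{b:.2f} * Vw^{c:.2f}")
```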
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.
Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.
ERIC Educational Resources Information Center
Haberman, Shelby J.
2009-01-01
A regression procedure is developed to link simultaneously a very large number of item response theory (IRT) parameter estimates obtained from a large number of test forms, where each form has been separately calibrated and where forms can be linked on a pairwise basis by means of common items. An application is made to forms in which a…
Pettorruso, Mauro; De Berardis, Domenico; Varasano, Paola Annunziata; Lucidi Pressanti, Gabriella; De Remigis, Valeria; Valchera, Alessandro; Ricci, Valerio; Di Nicola, Marco; Janiri, Luigi; Biggio, Giovanni; Di Giannantonio, Massimo
2016-01-01
Background: Agomelatine modulates brain-derived neurotrophic factor expression via its interaction with melatonergic and serotonergic receptors and has shown promising results in terms of brain-derived neurotrophic factor increase in animal models. Methods: Twenty-seven patients were started on agomelatine (25mg/d). Venous blood was collected and brain-derived neurotrophic factor serum levels were measured at baseline and after 2 and 8 weeks along with a clinical assessment, including Hamilton Depression Rating Scale and Snaith-Hamilton Pleasure Scale. Results: Brain-derived neurotrophic factor serum concentration increased after agomelatine treatment. Responders showed a significant increase in brain-derived neurotrophic factor levels after 2 weeks of agomelatine treatment; no difference was observed in nonresponders. Linear regression analysis showed that more prominent brain-derived neurotrophic factor level variation was associated with lower baseline BDNF levels and greater anhedonic features at baseline. Conclusions: Patients affected by depressive disorders showed an increase of brain-derived neurotrophic factor serum concentration after a 2-week treatment with agomelatine. The increase of brain-derived neurotrophic factor levels was found to be greater in patients with lower brain-derived neurotrophic factor levels and marked anhedonia at baseline. PMID:26775293
Ding, Changfeng; Li, Xiaogang; Zhang, Taolin; Ma, Yibing; Wang, Xingxiang
2014-10-01
Soil environmental quality standards for heavy metals in farmland should be established considering both their effects on crop yield and their accumulation in the edible part. A greenhouse experiment was conducted to investigate the effects of chromium (Cr) on biomass production and Cr accumulation in carrot plants grown in a wide range of soils. The results revealed that carrot yield significantly decreased in 18 of the 20 soils when Cr was added at the level of China's soil environmental quality standard. The Cr content of carrots grown in the five soils with pH > 8.0 exceeded the maximum allowable level (0.5 mg kg⁻¹) according to the Chinese General Standard for Contaminants in Foods. The relationship between carrot Cr concentration and soil pH was well fitted (R² = 0.70, P < 0.0001) by a linear-linear segmented regression model. The addition of Cr to soil affected carrot yield before it affected food quality. The major soil factors controlling Cr phytotoxicity were identified, and prediction models were developed, using path analysis and stepwise multiple linear regression analysis. Soil Cr thresholds for phytotoxicity that also ensure food safety were then derived on the basis of a 10% yield reduction. Copyright © 2014 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Luo, Shezhou; Wang, Cheng; Xi, Xiaohuan; Pan, Feifei; Qian, Mingjie; Peng, Dailiang; Nie, Sheng; Qin, Haiming; Lin, Yi
2017-06-01
Wetland biomass is essential for monitoring the stability and productivity of wetland ecosystems. Conventional field methods to measure or estimate wetland biomass are accurate and reliable, but expensive, time consuming and labor intensive. This research explored the potential for estimating wetland reed biomass using a combination of airborne discrete-return Light Detection and Ranging (LiDAR) and hyperspectral data. To derive the optimal predictor variables of reed biomass, a range of LiDAR and hyperspectral metrics at different spatial scales were regressed against the field-observed biomasses. The results showed that the LiDAR-derived H_p99 (99th percentile of the LiDAR height) and the hyperspectral-calculated modified soil-adjusted vegetation index (MSAVI) were the best metrics for estimating reed biomass in single-variable regression models. Although the LiDAR data yielded higher estimation accuracy than the hyperspectral data, the combination of LiDAR and hyperspectral data produced a more accurate prediction model for reed biomass (R² = 0.648, RMSE = 167.546 g/m², RMSEr = 20.71%) than LiDAR data alone. Thus, combining LiDAR data with hyperspectral data has great potential for improving the accuracy of aboveground biomass estimation.
Logistic regression model for diagnosis of transition zone prostate cancer on multi-parametric MRI.
Dikaios, Nikolaos; Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki; Abd-Alazeez, Mohamed; Kirkham, Alex; Allen, Clare; Ahmed, Hashim; Emberton, Mark; Freeman, Alex; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit
2015-02-01
We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. • MRI helps find prostate cancer in the anterior of the gland • Logistic regression models based on mp-MRI can classify prostate cancer • Computers can help confirm cancer in areas doctors are uncertain about.
Collinearity and Causal Diagrams: A Lesson on the Importance of Model Specification.
Schisterman, Enrique F; Perkins, Neil J; Mumford, Sunni L; Ahrens, Katherine A; Mitchell, Emily M
2017-01-01
Correlated data are ubiquitous in epidemiologic research, particularly in nutritional and environmental epidemiology where mixtures of factors are often studied. Our objectives are to demonstrate how highly correlated data arise in epidemiologic research and provide guidance, using a directed acyclic graph approach, on how to proceed analytically when faced with highly correlated data. We identified three fundamental structural scenarios in which high correlation between a given variable and the exposure can arise: intermediates, confounders, and colliders. For each of these scenarios, we evaluated the consequences of increasing correlation between the given variable and the exposure on the bias and variance for the total effect of the exposure on the outcome using unadjusted and adjusted models. We derived closed-form solutions for continuous outcomes using linear regression and empirically present our findings for binary outcomes using logistic regression. For models properly specified, total effect estimates remained unbiased even when there was almost perfect correlation between the exposure and a given intermediate, confounder, or collider. In general, as the correlation increased, the variance of the parameter estimate for the exposure in the adjusted models increased, while in the unadjusted models, the variance increased to a lesser extent or decreased. Our findings highlight the importance of considering the causal framework under study when specifying regression models. Strategies that do not take into consideration the causal structure may lead to biased effect estimation for the original question of interest, even under high correlation.
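The central finding above, that adjusting for a highly correlated confounder leaves the total effect unbiased while inflating its variance, can be reproduced in a short simulation: a confounder C causes both X and Y, and adjusted and unadjusted estimates of the effect of X are compared as corr(X, C) grows. All coefficients here are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(rho, n=500, reps=2000):
    """Confounder C causes X and Y; the true total effect of X on Y is 1."""
    adj, unadj = [], []
    for _ in range(reps):
        c = rng.normal(size=n)
        x = rho * c + np.sqrt(1.0 - rho**2) * rng.normal(size=n)  # corr(X, C) = rho
        y = 1.0 * x + 1.0 * c + rng.normal(size=n)
        Xa = np.column_stack([np.ones(n), x, c])   # adjusted model: Y ~ X + C
        adj.append(np.linalg.lstsq(Xa, y, rcond=None)[0][1])
        Xu = np.column_stack([np.ones(n), x])      # unadjusted model: Y ~ X
        unadj.append(np.linalg.lstsq(Xu, y, rcond=None)[0][1])
    return np.mean(adj), np.var(adj), np.mean(unadj)

results = {rho: simulate(rho) for rho in (0.3, 0.9)}
for rho, (m_adj, v_adj, m_unadj) in results.items():
    print(f"rho={rho}: adjusted mean={m_adj:.2f} (var={v_adj:.4f}), unadjusted mean={m_unadj:.2f}")
```

The adjusted estimate stays centered on 1 at both correlations, but its variance grows roughly as 1/(1 - rho²); the unadjusted estimate is biased upward by the confounding path, and increasingly so as rho rises.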
QSAR modeling of flotation collectors using principal components extracted from topological indices.
Natarajan, R; Nirdosh, Inderjit; Basak, Subhash C; Mills, Denise R
2002-01-01
Several topological indices were calculated for substituted-cupferrons that were tested as collectors for the froth flotation of uranium. The principal component analysis (PCA) was used for data reduction. Seven principal components (PC) were found to account for 98.6% of the variance among the computed indices. The principal components thus extracted were used in stepwise regression analyses to construct regression models for the prediction of separation efficiencies (Es) of the collectors. A two-parameter model with a correlation coefficient of 0.889 and a three-parameter model with a correlation coefficient of 0.913 were formed. PCs were found to be better than partition coefficient to form regression equations, and inclusion of an electronic parameter such as Hammett sigma or quantum mechanically derived electronic charges on the chelating atoms did not improve the correlation coefficient significantly. The method was extended to model the separation efficiencies of mercaptobenzothiazoles (MBT) and aminothiophenols (ATP) used in the flotation of lead and zinc ores, respectively. Five principal components were found to explain 99% of the data variability in each series. A three-parameter equation with correlation coefficient of 0.985 and a two-parameter equation with correlation coefficient of 0.926 were obtained for MBT and ATP, respectively. The amenability of separation efficiencies of chelating collectors to QSAR modeling using PCs based on topological indices might lead to the selection of collectors for synthesis and testing from a virtual database.
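The PCA-then-regression workflow above can be sketched as follows. The descriptor matrix and separation efficiencies are synthetic, with a low-rank latent structure standing in for the correlated topological indices; only the two-stage structure (data reduction, then regression on leading components) mirrors the study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n_compounds, n_indices = 30, 20
# synthetic correlated "topological indices" generated from a low-rank latent structure
latent = rng.normal(size=(n_compounds, 3))
X = latent @ rng.normal(size=(3, n_indices)) + 0.05 * rng.normal(size=(n_compounds, n_indices))
# separation efficiency driven by the same latent factors (coefficients invented)
y = latent @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n_compounds)

pca = PCA(n_components=7).fit(X)   # seven PCs, as retained in the cupferron study
scores = pca.transform(X)
print(f"variance explained by 7 PCs: {pca.explained_variance_ratio_.sum():.3f}")

model = LinearRegression().fit(scores[:, :3], y)  # regression on the leading PCs
r = np.corrcoef(model.predict(scores[:, :3]), y)[0, 1]
print(f"correlation coefficient: {r:.3f}")
```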
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadat Hayatshahi, Sayyed Hamed; Abdolmaleki, Parviz; Safarian, Shahrokh
2005-12-16
Logistic regression and artificial neural networks have been developed as two non-linear models to establish quantitative structure-activity relationships between structural descriptors and biochemical activity of adenosine-based competitive inhibitors toward adenosine deaminase. The training set included 24 compounds with known Ki values. The models were trained to solve two-class problems. Unlike the previous work in which multiple linear regression was used, the highest positive charge on the molecules was recognized to be in close relation with their inhibition activity, while the electric charge on atom N1 of adenosine was found to be a poor descriptor. Consequently, the previously developed equation was improved and the newly formed one could predict the class of 91.66% of compounds correctly. Also, optimized 2-3-1 and 3-4-1 neural networks could increase this rate to 95.83%.
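A two-class logistic regression of the kind described can be sketched as follows; the 24 "compounds" and their two descriptors are synthetic stand-ins (e.g. highest positive charge and a second descriptor), and plain gradient ascent replaces whatever fitting routine the authors used:

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical training set: 24 inhibitors, 2 descriptors, binary activity class
X = rng.normal(size=(24, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# logistic regression fitted by gradient ascent on the log-likelihood
Xb = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
w = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xb @ w))        # predicted class probabilities
    w += 0.1 * Xb.T @ (y - p) / len(y)       # gradient step

pred = (1.0 / (1.0 + np.exp(-Xb @ w)) > 0.5).astype(float)
accuracy = (pred == y).mean()                # fraction classified correctly
```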
Kasaie, Parastu; Mathema, Barun; Kelton, W David; Azman, Andrew S; Pennington, Jeff; Dowdy, David W
2015-01-01
In any setting, a proportion of incident active tuberculosis (TB) reflects recent transmission ("recent transmission proportion"), whereas the remainder represents reactivation. Appropriately estimating the recent transmission proportion has important implications for local TB control, but existing approaches have known biases, especially where data are incomplete. We constructed a stochastic individual-based model of a TB epidemic and designed a set of simulations (derivation set) to develop two regression-based tools for estimating the recent transmission proportion from five inputs: underlying TB incidence, sampling coverage, study duration, clustered proportion of observed cases, and proportion of observed clusters in the sample. We tested these tools on a set of unrelated simulations (validation set), and compared their performance against that of the traditional 'n-1' approach. In the validation set, the regression tools reduced the absolute estimation bias (difference between estimated and true recent transmission proportion) in the 'n-1' technique by a median [interquartile range] of 60% [9%, 82%] and 69% [30%, 87%]. The bias in the 'n-1' model was highly sensitive to underlying levels of study coverage and duration, and substantially underestimated the recent transmission proportion in settings of incomplete data coverage. By contrast, the regression models' performance was more consistent across different epidemiological settings and study characteristics. We provide one of these regression models as a user-friendly, web-based tool. Novel tools can improve our ability to estimate the recent TB transmission proportion from data that are observable (or estimable) by public health practitioners with limited available molecular data.
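The traditional 'n-1' approach that the regression tools are compared against can be computed directly from observed cluster sizes; the genotype cluster sizes below are invented for illustration:

```python
# the 'n-1' estimator: each observed cluster of size n is assumed to contain one
# reactivation ("index") case plus n-1 recently transmitted cases
def n_minus_one_proportion(cluster_sizes):
    total = sum(cluster_sizes)
    recent = sum(size - 1 for size in cluster_sizes if size > 1)
    return recent / total

# example: clusters of 4, 2 and 2 genotype-matched cases plus 12 unique isolates
estimate = n_minus_one_proportion([4, 2, 2] + [1] * 12)   # (3 + 1 + 1) / 20 = 0.25
```

As the abstract notes, this estimator degrades when sampling coverage is incomplete: unsampled cases break clusters apart, deflating the clustered counts that the formula relies on.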
Saub, R; Locker, D; Allison, P
2008-09-01
To compare two methods of developing short forms of the Malaysian Oral Health Impact Profile (OHIP-M) measure. Cross-sectional data obtained using the long form of the OHIP-M were used to produce two types of OHIP-M short forms, derived using two different methods, namely the regression and item-frequency methods. The short version derived using the regression method is known as Reg-SOHIP(M) and that derived using the frequency method is known as Freq-SOHIP(M). Both short forms contained 14 items. These two forms were then compared in terms of their content, scores, reliability, validity and the ability to distinguish between groups. Out of 14 items, only four were in common. The form derived from the frequency method contained more high-prevalence items and higher scores than the form derived from the regression method. Both methods produced a reliable and valid measure. However, the frequency method produced a measure that was slightly better at distinguishing between groups. Regardless of the method used to produce the measures, both forms performed equally well when tested for their cross-sectional psychometric properties.
Development of a funding, cost, and spending model for satellite projects
NASA Technical Reports Server (NTRS)
Johnson, Jesse P.
1989-01-01
The need for a predictive budget/funding model is obvious. The current models used by the Resource Analysis Office (RAO) are used to predict the total costs of satellite projects. An effort to extend the modeling capabilities from total budget analysis to total budget and budget outlays over time analysis was conducted. A statistically based and data-driven methodology was used to derive and develop the model. The budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model which would describe the historical spending patterns. This raw data consisted of dollars spent in that specific year and their 1989 dollar equivalent. This data was converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost and the conditional statistics on project length and project cost. The modeling approach used is derived from the theory of embedded statistics, which states that properly analyzed data will produce the underlying generating function. The process of funding large-scale projects over extended periods of time is described by Life Cycle Cost Models (LCCM). The data was analyzed to find a model in the generic form of a LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. In order to use this model it is necessary to transform the problem from a dollar/time space to a percentage of total budget/time space. This transformation is equivalent to moving to a probability space. By using the basic rules of probability, the validity of both the optimization and the regression steps is ensured. This statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.
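A Weibull spending profile of this kind can be fitted after the transformation to percentage-of-budget space; the linearisation below is a simplified stand-in for the nonlinear optimisation and regression used in the study, and the project length and parameter values are hypothetical:

```python
import numpy as np

# cumulative fraction of total budget spent by year t modelled as a Weibull CDF:
#   F(t) = 1 - exp(-(t/scale)**shape)
# taking log(-log(1 - F)) linearises the model, so ordinary least squares recovers
# the parameters (a simple alternative to nonlinear fitting)
t = np.arange(1.0, 9.0)                                   # hypothetical 8-year project
true_shape, true_scale = 2.0, 4.0
F = 1 - np.exp(-(t / true_scale) ** true_shape)           # noise-free spending profile

yy = np.log(-np.log(1 - F))                               # linearised response
A = np.column_stack([np.ones_like(t), np.log(t)])
(b0, b1), *_ = np.linalg.lstsq(A, yy, rcond=None)
shape, scale = b1, np.exp(-b0 / b1)

# inverting the fitted schedule: time at which 50% of the budget is spent
t_half = scale * np.log(2) ** (1 / shape)
```

Inverting the fitted CDF, as in the last line, is what turns the spending model into a schedule relating money spent to percentage of project completion.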
The effects of a confidant and a peer group on the well-being of single elders.
Gupta, V; Korte, C
1994-01-01
A study of 100 elderly people was carried out to compare the predictions of well-being derived from the confidant model with those derived from the Weiss model. The confidant model predicts that the most important feature of a person's social network for the well-being of that person is whether or not the person has a confidant. The Weiss model states that different persons are needed to fulfill the different needs of the person and in particular that a confidant is important to the need for intimacy and emotional security while a peer group of social friends is needed to fulfill sociability and identity needs. The two models were evaluated by comparing the relative influence of the confidant variable with the peer group variable on subjects' well-being. Regression analysis was carried out on the well-being measure using as predictor variables the confidant variable, peer group variable, age, health, and financial status. The confidant and peer group variables were of equal importance to well-being, thus confirming the Weiss model.
Raleigh, Veena; Sizmur, Steve; Tian, Yang; Thompson, James
2015-04-01
To examine the impact of patient-mix on National Health Service (NHS) acute hospital trust scores in two national NHS patient surveys. Secondary analysis of 2012 patient survey data for 57,915 adult inpatients at 142 NHS acute hospital trusts and 45,263 adult emergency department attendees at 146 NHS acute hospital trusts in England. Changes in trust scores for selected questions, ranks, inter-trust variance and score-based performance bands were examined using three methods: no adjustment for case-mix; the current standardization method with weighting for age, sex and, for inpatients only, admission method; and a regression model adjusting in addition for ethnicity, presence of a long-term condition, proxy response (inpatients only) and previous emergency attendances (emergency department survey only). For both surveys, all the variables examined were associated with patients' responses and affected inter-trust variance in scores, although the direction and strength of impact differed between variables. Inter-trust variance was generally greatest for the unadjusted scores and lowest for scores derived from the full regression model. Although trust scores derived from the three methods were highly correlated (Kendall's tau coefficients 0.70-0.94), up to 14% of trusts had discordant ranks when the standardization and regression methods were compared. Depending on the survey and question, up to 14 trusts changed performance bands when the regression model with its fuller case-mix adjustment was used rather than the current standardization method. More comprehensive case-mix adjustment of patient survey data than the current limited adjustment reduces performance variation between NHS acute hospital trusts and alters the comparative performance bands of some trusts.
Given the use of these data for high-impact purposes such as performance assessment, regulation, commissioning, quality improvement and patient choice, a review of the long-standing method for analysing patient survey data would be timely, and could improve rigour and comparability across the NHS. Performance comparisons need to be perceived as fair and scientifically robust to maintain confidence in publicly reported data, and to support their use by both the public and the NHS.
Comparison of two landslide susceptibility assessments in the Champagne-Ardenne region (France)
NASA Astrophysics Data System (ADS)
Den Eeckhaut, M. Van; Marre, A.; Poesen, J.
2010-02-01
The vineyards of the Montagne de Reims are mostly planted on steep south-oriented cuesta fronts receiving a maximum of sun radiation. Due to the location of the vineyards on steep hillslopes, the viticultural activity is threatened by slope failures. This study attempts to better understand the spatial patterns of landslide susceptibility in the Champagne-Ardenne region by comparing a heuristic (qualitative) and a statistical (quantitative) model in a 1120 km² study area. The heuristic landslide susceptibility model was adopted from the Bureau de Recherches Géologiques et Minières, the GEGEAA - Reims University and the Comité Interprofessionnel du Vin de Champagne. In this model, expert knowledge of the region was used to assign weights to all slope classes and lithologies present in the area, but the final susceptibility map was never evaluated with the location of mapped landslides. For the statistical landslide susceptibility assessment, logistic regression was applied to a dataset of 291 'old' (Holocene) landslides. The robustness of the logistic regression model was evaluated and ROC curves were used for model calibration and validation. With regard to the variables assumed to be important environmental factors controlling landslides, the two models are in agreement. They both indicate that present and future landslides are mainly controlled by slope gradient and lithology. However, the comparison of the two landslide susceptibility maps through (1) an evaluation with the location of mapped 'old' landslides and through (2) a temporal validation with spatial data of 'recent' (1960-1999; n = 48) and 'very recent' (2000-2008; n = 46) landslides showed a better prediction capacity for the statistical model produced in this study compared to the heuristic model. In total, the statistically-derived landslide susceptibility map succeeded in correctly classifying 81.0% of the 'old' and 91.6% of the 'recent' and 'very recent' landslides. 
On the susceptibility map derived from the heuristic model, on the other hand, only 54.6% of the 'old' and 64.0% of the 'recent' and 'very recent' landslides were correctly classified as unstable. Hence, the landslide susceptibility map obtained from logistic regression is a better tool for regional landslide susceptibility analysis in the study area of the Montagne de Reims. The accurate classification of zones with very high and high susceptibility allows delineating zones where viticulturists should be informed and where implementation of precaution measures is needed to secure slope stability.
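Validation of this kind rests on the ROC curve; a minimal sketch of computing the area under it from susceptibility scores follows (the scores and class sizes are synthetic, not the study's data):

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation; assumes untied scores."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n1, n0 = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

# hypothetical susceptibility scores for 200 stable (0) and 100 landslide (1) cells
rng = np.random.default_rng(8)
y = np.r_[np.zeros(200, dtype=int), np.ones(100, dtype=int)]
s = np.r_[rng.normal(0.0, 1.0, 200), rng.normal(1.2, 1.0, 100)]
auc = roc_auc(y, s)
```

An AUC of 0.5 means the scores rank landslide and stable cells no better than chance, while 1.0 means perfect separation, which is why ROC curves are the standard calibration and validation device for logistic-regression susceptibility models.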
NASA Astrophysics Data System (ADS)
Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.
2012-03-01
This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 ± 0.035 to 0.840 ± 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 ± 0.029 to 0.859 ± 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADS, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R² = 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since probability of malignancy cannot be measured directly, high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.
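The comparison between the two classifiers can be sketched on synthetic data; the feature shifts below are invented, training and evaluation are on the same sample for brevity (the study used leave-one-out cross validation), and both models are fitted with simple hand-rolled routines:

```python
import numpy as np

rng = np.random.default_rng(4)
# hypothetical lesion data: 200 masses, 3 features, binary malignancy label;
# class-1 (malignant) feature means are shifted upward
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 3)) + y[:, None] * np.array([1.0, 0.8, 0.0])

def nb_prob(Xnew, Xtr, ytr):
    """Gaussian Naive Bayes posterior P(malignant | features)."""
    logp = np.zeros((len(Xnew), 2))
    for c in (0, 1):
        Z = Xtr[ytr == c]
        mu, var = Z.mean(0), Z.var(0) + 1e-6
        logp[:, c] = (np.log((ytr == c).mean())
                      - 0.5 * (np.log(2 * np.pi * var) + (Xnew - mu) ** 2 / var).sum(1))
    p = np.exp(logp - logp.max(1, keepdims=True))       # stable softmax over classes
    return (p / p.sum(1, keepdims=True))[:, 1]

def lr_prob(Xnew, Xtr, ytr, iters=3000, step=0.1):
    """Logistic regression fitted by gradient ascent."""
    A = np.column_stack([np.ones(len(Xtr)), Xtr])
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        w += step * A.T @ (ytr - 1 / (1 + np.exp(-A @ w))) / len(ytr)
    B = np.column_stack([np.ones(len(Xnew)), Xnew])
    return 1 / (1 + np.exp(-B @ w))

p_nb = nb_prob(X, X, y)
p_lr = lr_prob(X, X, y)
r2 = np.corrcoef(p_nb, p_lr)[0, 1] ** 2   # agreement between the two models
```

On data like these, where the class-conditional feature distributions really are Gaussian and independent, the two models' probability outputs agree closely, which is the kind of cross-model agreement the abstract appeals to.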
Pattern Recognition Analysis of Age-Related Retinal Ganglion Cell Signatures in the Human Eye
Yoshioka, Nayuta; Zangerl, Barbara; Nivison-Smith, Lisa; Khuu, Sieu K.; Jones, Bryan W.; Pfeiffer, Rebecca L.; Marc, Robert E.; Kalloniatis, Michael
2017-01-01
Purpose: To characterize macular ganglion cell layer (GCL) changes with age and provide a framework to assess changes in ocular disease. This study used data clustering to analyze macular GCL patterns from optical coherence tomography (OCT) in a large cohort of subjects without ocular disease. Methods: Single eyes of 201 patients evaluated at the Centre for Eye Health (Sydney, Australia) were retrospectively enrolled (age range, 20–85); 8 × 8 grid locations obtained from Spectralis OCT macular scans were analyzed with unsupervised classification into statistically separable classes sharing common GCL thickness and change with age. The resulting classes and gridwise data were fitted with linear and segmented linear regression curves. Additionally, normalized data were analyzed to determine regression as a percentage. Accuracy of each model was examined through comparison of predicted 50-year-old equivalent macular GCL thickness for the entire cohort to a true 50-year-old reference cohort. Results: Pattern recognition clustered GCL thickness across the macula into five to eight spatially concentric classes. An F-test demonstrated segmented linear regression to be the most appropriate model for macular GCL change. The pattern recognition–derived and normalized model revealed less difference between the predicted macular GCL thickness and the reference cohort (average ± SD 0.19 ± 0.92 and −0.30 ± 0.61 μm) than a gridwise model (average ± SD 0.62 ± 1.43 μm). Conclusions: Pattern recognition successfully identified statistically separable macular areas that undergo a segmented linear reduction with age. This regression model better predicted macular GCL thickness. The various unique spatial patterns revealed by pattern recognition combined with core GCL thickness data provide a framework to analyze GCL loss in ocular disease. PMID:28632847
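Segmented (piecewise) linear regression of thickness on age can be sketched with a hinge model and a grid search over candidate breakpoints; the thickness values, breakpoint and slope below are invented, not the study's estimates:

```python
import numpy as np

rng = np.random.default_rng(5)
# hypothetical GCL thickness vs age: constant until a breakpoint, then linear decline
age = np.linspace(20, 85, 130)
thick = 35.0 - 0.15 * np.clip(age - 50.0, 0, None) + 0.3 * rng.normal(size=age.size)

def fit_segmented(x, y, candidates):
    """Hinge model y = a + b*max(x - bp, 0); breakpoint chosen by grid search."""
    best = None
    for bp in candidates:
        A = np.column_stack([np.ones_like(x), np.clip(x - bp, 0, None)])
        coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
        sse = float(((A @ coefs - y) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, bp, coefs)
    return best[1], best[2]

bp, (level, slope) = fit_segmented(age, thick, np.arange(30.0, 70.0, 0.5))
```

Because the model is linear once the breakpoint is fixed, each candidate fit is an ordinary least-squares problem, and an F-test against a plain linear fit (as in the study) decides whether the extra breakpoint is warranted.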
Proton magnetic resonance spectroscopy for assessment of human body composition.
Kamba, M; Kimura, K; Koda, M; Ogawa, T
2001-02-01
The usefulness of magnetic resonance spectroscopy (MRS)-based techniques for assessment of human body composition has not been established. We compared a proton MRS-based technique with the total body water (TBW) method to determine the usefulness of the former technique for assessment of human body composition. Proton magnetic resonance spectra of the chest to abdomen, abdomen to pelvis, and pelvis to thigh regions were obtained from 16 volunteers by using single, free induction decay measurement with a clinical magnetic resonance system operating at 1.5 T. The MRS-derived metabolite ratio was determined as the ratio of fat methyl and methylene proton resonance to water proton resonance. The peak areas for the chest to abdomen and the pelvis to thigh regions were normalized to an external reference (approximately 2200 g benzene) and a weighted average of the MRS-derived metabolite ratios for the 2 positions was calculated. TBW for each subject was determined by the deuterium oxide dilution technique. The MRS-derived metabolite ratios were significantly correlated with the ratio of body fat to lean body mass estimated by TBW. The MRS-derived metabolite ratio for the abdomen to pelvis region correlated best with the ratio of body fat to lean body mass on simple regression analyses (r = 0.918). The MRS-derived metabolite ratio for the abdomen to pelvis region and that for the pelvis to thigh region were selected for a multivariate regression model (R = 0.947, adjusted R² = 0.881). This MRS-based technique is sufficiently accurate for assessment of human body composition.
Ross, Elsie Gyang; Shah, Nigam H; Dalman, Ronald L; Nead, Kevin T; Cooke, John P; Leeper, Nicholas J
2016-11-01
A key aspect of the precision medicine effort is the development of informatics tools that can analyze and interpret "big data" sets in an automated and adaptive fashion while providing accurate and actionable clinical information. The aims of this study were to develop machine learning algorithms for the identification of disease and the prognostication of mortality risk and to determine whether such models perform better than classical statistical analyses. Focusing on peripheral artery disease (PAD), patient data were derived from a prospective, observational study of 1755 patients who presented for elective coronary angiography. We employed multiple supervised machine learning algorithms and used diverse clinical, demographic, imaging, and genomic information in a hypothesis-free manner to build models that could identify patients with PAD and predict future mortality. Comparison was made to standard stepwise logistic regression models. Our machine-learned models outperformed stepwise logistic regression models both for the identification of patients with PAD (area under the curve, 0.87 vs 0.76, respectively; P = .03) and for the prediction of future mortality (area under the curve, 0.76 vs 0.65, respectively; P = .10). Both machine-learned models were markedly better calibrated than the stepwise logistic regression models, thus providing more accurate disease and mortality risk estimates. Machine learning approaches can produce more accurate disease classification and prediction models. These tools may prove clinically useful for the automated identification of patients with highly morbid diseases for which aggressive risk factor management can improve outcomes. Copyright © 2016 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
Residential magnetic fields predicted from wiring configurations: I. Exposure model.
Bowman, J D; Thomas, D C; Jiang, L; Jiang, F; Peters, J M
1999-10-01
A physically based model for residential magnetic fields from electric transmission and distribution wiring was developed to reanalyze the Los Angeles study of childhood leukemia by London et al. For this exposure model, magnetic field measurements were fitted to a function of wire configuration attributes that was derived from a multipole expansion of the Law of Biot and Savart. The model parameters were determined by nonlinear regression techniques, using wiring data, distances, and the geometric mean of the ELF magnetic field magnitude from 24-h bedroom measurements taken at 288 homes during the epidemiologic study. The best fit to the measurement data was obtained with separate models for the two major utilities serving Los Angeles County. This model's predictions produced a correlation of 0.40 with the measured fields, an improvement on the 0.27 correlation obtained with the Wertheimer-Leeper (WL) wire code. For the leukemia risk analysis in a companion paper, the regression model predicts exposures to the 24-h geometric mean of the ELF magnetic fields in Los Angeles homes where only wiring data and distances have been obtained. Since these input parameters for the exposure model usually do not change for many years, the predicted magnetic fields will be stable over long time periods, just like the WL code. If the geometric mean is not the exposure metric associated with cancer, this regression technique could be used to estimate long-term exposures to temporal variability metrics and other characteristics of the ELF magnetic field which may be cancer risk factors.
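A regression of measured fields on terms of a multipole expansion can be sketched as follows; because the two leading decay terms enter linearly, ordinary least squares suffices in this toy version (the coefficients, distances and noise level are hypothetical, and the model actually fitted in the study was nonlinear):

```python
import numpy as np

rng = np.random.default_rng(9)
# hypothetical calibration data: geometric-mean field B (mG) vs distance r (m) to
# the wiring, using the two leading decay terms of a multipole expansion:
#   B ~ a/r + b/r**2
r = np.linspace(10.0, 100.0, 40)
true_a, true_b = 5.0, 120.0
B = (true_a / r + true_b / r**2) * np.exp(0.05 * rng.normal(size=r.size))

# the model is linear in (a, b) once the basis [1/r, 1/r**2] is formed
A = np.column_stack([1.0 / r, 1.0 / r**2])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, B, rcond=None)
```

Once the coefficients are estimated from homes with measurements, the same expression predicts exposure in homes where only wiring attributes and distances are known, which is how the exposure model in the abstract is applied.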
Determination of precipitation profiles from airborne passive microwave radiometric measurements
NASA Technical Reports Server (NTRS)
Kummerow, Christian; Hakkarinen, Ida M.; Pierce, Harold F.; Weinman, James A.
1991-01-01
This study presents the first quantitative retrievals of vertical profiles of precipitation derived from multispectral passive microwave radiometry. Measurements of microwave brightness temperature (Tb) obtained by a NASA high-altitude research aircraft are related to profiles of rainfall rate through a multichannel piecewise-linear statistical regression procedure. Statistics for Tb are obtained from a set of cloud radiative models representing a wide variety of convective, stratiform, and anvil structures. The retrieval scheme itself determines which cloud model best fits the observed meteorological conditions. Retrieved rainfall rate profiles are converted to equivalent radar reflectivity for comparison with observed reflectivities from a ground-based research radar. Results for two case studies, a stratiform rain situation and an intense convective thunderstorm, show that the radiometrically derived profiles capture the major features of the observed vertical structure of hydrometeor density.
Golmohammadi, Hassan
2009-11-30
A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of molecule (mu), maximum antibonding contribution of a molecular orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results obtained showed the ability of the developed artificial neural network to predict the partition coefficients of organic compounds. Also, the results revealed the superiority of ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.
Urban change analysis and future growth of Istanbul.
Akın, Anıl; Sunar, Filiz; Berberoğlu, Süha
2015-08-01
This study is aimed at analyzing urban change within Istanbul and assessing the city's future growth potential for the year 2040 using an appropriate modeling approach. Urban growth is a major driving force of land-use change, and spatial and temporal components of urbanization can be identified through accurate spatial modeling. In this context, widely used urban modeling approaches, such as the Markov chain and logistic regression based on cellular automata (CA), were used to simulate urban growth within Istanbul. The distance from each pixel to the urban and road classes, elevation, and slope, together with municipality and land use maps (as an excluded layer), were identified as factors. Calibration data were obtained from remotely sensed data recorded in 1972, 1986, and 2013. Validation was performed by overlaying the simulated and actual 2013 urban maps, and a kappa index of agreement was derived. The results indicate that urban expansion will influence mainly forest areas during the time period of 2013-2040. The urban expansion was predicted as 429 and 327 km² with the Markov chain and logistic regression models, respectively.
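A one-step Markov-chain projection of land-use class areas can be sketched as follows; the transition matrix and 2013 areas are invented for illustration, not calibrated values from the study:

```python
import numpy as np

# hypothetical Markov transition probabilities for one 27-year step (2013 -> 2040);
# states: urban, forest, other (each row sums to 1)
P = np.array([[0.98, 0.00, 0.02],
              [0.06, 0.90, 0.04],
              [0.10, 0.02, 0.88]])

area_2013 = np.array([500.0, 900.0, 600.0])   # km² per class, made-up figures
area_2040 = area_2013 @ P                     # projected class areas
urban_growth = area_2040[0] - area_2013[0]    # predicted urban expansion in km²
```

Because each row of the transition matrix sums to one, total area is conserved; what the CA component adds on top of this aggregate projection is the spatial allocation of the transitions, driven by the distance, elevation and slope factors.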
Soo-Hoo, Sarah; Nemeth, Samantha; Baser, Onur; Argenziano, Michael; Kurlansky, Paul
2018-01-01
To explore the impact of racial and ethnic diversity on the performance of cardiac surgical risk models, the Chinese SinoSCORE was compared with the Society of Thoracic Surgeons (STS) risk model in a diverse American population. The SinoSCORE risk model was applied to 13 969 consecutive coronary artery bypass surgery patients from twelve American institutions. SinoSCORE risk factors were entered into a logistic regression to create a 'derived' SinoSCORE whose performance was compared with that of the STS risk model. Observed mortality was 1.51% (66% of that predicted by STS model). The SinoSCORE 'low-risk' group had a mortality of 0.15%±0.04%, while the medium-risk and high-risk groups had mortalities of 0.35%±0.06% and 2.13%±0.14%, respectively. The derived SinoSCORE model had a relatively good discrimination (area under the curve (AUC)=0.785) compared with that of the STS risk score (AUC=0.811; P=0.18 comparing the two). However, specific factors that were significant in the original SinoSCORE but that lacked significance in our derived model included body mass index, preoperative atrial fibrillation and chronic obstructive pulmonary disease. SinoSCORE demonstrated limited discrimination when applied to an American population. The derived SinoSCORE had a discrimination comparable with that of the STS, suggesting underlying similarities of physiological substrate undergoing surgery. However, differential influence of various risk factors suggests that there may be varying degrees of importance and interactions between risk factors. Clinicians should exercise caution when applying risk models across varying populations due to potential differences that racial, ethnic and geographic factors may play in cardiac disease and surgical outcomes.
Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen
2016-02-01
Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation-partial least squares regression (PLSR) method effectively solves the information loss problem of correlation-multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R(2) = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions.
Bilgili, Mehmet; Sahin, Besir; Sangun, Levent
2013-01-01
The aim of this study is to estimate the soil temperatures of a target station using only the soil temperatures of neighboring stations without any consideration of the other variables or parameters related to soil properties. For this aim, the soil temperatures were measured at depths of 5, 10, 20, 50, and 100 cm below the earth surface at eight measuring stations in Turkey. Firstly, the multiple nonlinear regression analysis was performed with the "Enter" method to determine the relationship between the values of target station and neighboring stations. Then, the stepwise regression analysis was applied to determine the best independent variables. Finally, an artificial neural network (ANN) model was developed to estimate the soil temperature of a target station. According to the derived results for the training data set, the mean absolute percentage error and correlation coefficient ranged from 1.45% to 3.11% and from 0.9979 to 0.9986, respectively, while corresponding ranges of 1.685-3.65% and 0.9988-0.9991, respectively, were obtained based on the testing data set. The obtained results show that the developed ANN model provides a simple and accurate prediction to determine the soil temperature. In addition, the missing data at the target station could be determined with a high degree of accuracy.
Feng, Yao-Ze; Elmasry, Gamal; Sun, Da-Wen; Scannell, Amalia G M; Walsh, Des; Morcy, Noha
2013-06-01
Bacterial pathogens are the main culprits for outbreaks of food-borne illnesses. This study aimed to use the hyperspectral imaging technique as a non-destructive tool for quantitative and direct determination of Enterobacteriaceae loads on chicken fillets. Partial least squares regression (PLSR) models were established and the best model using full wavelengths was obtained in the spectral range 930-1450 nm with coefficients of determination R² ≥ 0.82 and root mean squared errors (RMSEs) ≤ 0.47 log₁₀ CFU g⁻¹. In further development of simplified models, second derivative spectra and weighted PLS regression coefficients (BW) were utilised to select important wavelengths. However, the three wavelengths (930, 1121 and 1345 nm) selected from BW were competent and more preferred for predicting Enterobacteriaceae loads with R² of 0.89, 0.86 and 0.87 and RMSEs of 0.33, 0.40 and 0.45 log₁₀ CFU g⁻¹ for calibration, cross-validation and prediction, respectively. Besides, the constructed prediction map provided the distribution of Enterobacteriaceae bacteria on chicken fillets, which cannot be achieved by conventional methods. It was demonstrated that hyperspectral imaging is a potential tool for determining food sanitation and detecting bacterial pathogens on food matrix without using complicated laboratory regimes. Copyright © 2012 Elsevier Ltd. All rights reserved.
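Wavelength selection from regression coefficients can be sketched with a minimal NIPALS implementation of PLS1; the spectra below are random stand-ins with three informative "bands", and selecting by absolute coefficient size is analogous to, not identical to, the BW weighting used in the study:

```python
import numpy as np

def pls1(X, y, ncomp):
    """Minimal NIPALS PLS1 on centred data; returns regression coefficients."""
    Xw, yw = X.copy(), y.astype(float).copy()
    W, P, q = [], [], []
    for _ in range(ncomp):
        w = Xw.T @ yw
        w /= np.linalg.norm(w)          # weight vector
        t = Xw @ w                      # score vector
        tt = t @ t
        p = Xw.T @ t / tt               # X loading
        qk = yw @ t / tt                # y loading
        Xw -= np.outer(t, p)            # deflate X and y
        yw -= qk * t
        W.append(w); P.append(p); q.append(qk)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))

rng = np.random.default_rng(7)
n, nbands = 60, 30                       # 60 samples x 30 "wavelengths"
X = rng.normal(size=(n, nbands))
beta = np.zeros(nbands)
beta[[3, 11, 21]] = [1.0, -0.8, 0.6]     # only three informative bands
y = X @ beta + 0.1 * rng.normal(size=n)

Xc, yc = X - X.mean(0), y - y.mean()
b = pls1(Xc, yc, ncomp=5)
top3 = np.argsort(np.abs(b))[-3:]        # band selection by |coefficient|
```

Building a reduced model on only the selected bands, as the study does with its three wavelengths, keeps prediction fast enough for pixel-wise mapping across a whole hyperspectral image.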
Vegetation Continuous Fields--Transitioning from MODIS to VIIRS
NASA Astrophysics Data System (ADS)
DiMiceli, C.; Townshend, J. R.; Sohlberg, R. A.; Kim, D. H.; Kelly, M.
2015-12-01
Measurements of fractional vegetation cover are critical for accurate and consistent monitoring of global deforestation rates. They also provide important parameters for land surface, climate and carbon models and vital background data for research into fire, hydrological and ecosystem processes. MODIS Vegetation Continuous Fields (VCF) products provide four complementary layers of fractional cover: tree cover, non-tree vegetation, bare ground, and surface water. MODIS VCF products are currently produced globally and annually at 250m resolution for 2000 to the present. Additionally, annual VCF products at 1/20° resolution derived from AVHRR and MODIS Long-Term Data Records are in development to provide Earth System Data Records of fractional vegetation cover for 1982 to the present. In order to provide continuity of these valuable products, we are extending the VCF algorithms to create Suomi NPP/VIIRS VCF products. This presentation will highlight the first VIIRS fractional cover product: global percent tree cover at 1 km resolution. To create this product, phenological and physiological metrics were derived from each complete year of VIIRS 8-day surface reflectance products. A supervised regression tree method was applied to the metrics, using training derived from Landsat data supplemented by high-resolution data from Ikonos, RapidEye and QuickBird. The regression tree model was then applied globally to produce fractional tree cover. In our presentation we will detail our methods for creating the VIIRS VCF product. We will compare the new VIIRS VCF product to our current MODIS VCF products and demonstrate continuity between instruments. Finally, we will outline future VIIRS VCF development plans.
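The core of a supervised regression tree is the split search; the single-node sketch below uses an invented phenological metric and cover values, not the VCF training data:

```python
import numpy as np

def best_split(x, y):
    """One node of a regression tree: threshold minimising summed squared error."""
    best_sse, best_thr = np.inf, None
    for thr in np.unique(x)[:-1]:
        left, right = y[x <= thr], y[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, thr
    return best_thr

rng = np.random.default_rng(10)
# hypothetical training pixels: one phenological metric vs percent tree cover,
# with a sharp transition at metric = 0.6
metric = rng.uniform(0.0, 1.0, 200)
cover = np.where(metric > 0.6, 70.0, 20.0) + rng.normal(0.0, 3.0, 200)
thr = best_split(metric, cover)          # recovered split, close to 0.6
```

A full regression tree applies this search recursively over all input metrics, which is what lets the VCF algorithm map Landsat-derived training values onto global VIIRS metrics.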
A nonparametric multiple imputation approach for missing categorical data.
Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh
2017-06-06
Incomplete categorical variables with more than two categories are common in public health data. However, most existing missing-data methods do not use the information in nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing-at-random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value and the non-missing values. The distance function is based on a predictive score derived from two working models: one fits a multinomial logistic regression to predict the missing categorical outcome (the outcome model), and the other fits a logistic regression to predict the missingness probabilities (the missingness model). A weighting scheme accommodates the contributions of the two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme, even under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions of the two working models and achieve optimal results for the proposed method.
We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
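The donor-selection mechanics described above can be sketched in a few lines. This is a simplified illustration, assuming the predictive scores from the two working models have already been computed; the weighting, distance and donor-set details are stripped down and are not the authors' exact algorithm.

```python
import numpy as np

def nn_impute(score_outcome, score_missing, observed, miss_idx, w=0.5, k=5, seed=0):
    """Nearest-neighbour donor imputation for a categorical outcome.

    score_outcome / score_missing: predictive scores for every unit from the
    two working models (outcome model and missingness model).
    observed: dict mapping unit index -> observed category for non-missing units.
    """
    rng = np.random.default_rng(seed)
    # Weighted predictive score balancing the two working models
    score = w * np.asarray(score_outcome) + (1 - w) * np.asarray(score_missing)
    obs_idx = np.array(sorted(observed))
    imputed = {}
    for i in miss_idx:
        dist = np.abs(score[obs_idx] - score[i])        # distance on the score scale
        donors = obs_idx[np.argsort(dist)[:k]]          # k closest non-missing units
        imputed[i] = observed[int(rng.choice(donors))]  # random draw from donor set
    return imputed

# Toy data: units 0-4 observed; unit 5 is missing and closest in score to the "B" units
observed = {0: "B", 1: "B", 2: "B", 3: "A", 4: "A"}
scores = np.array([0.10, 0.12, 0.11, 0.90, 0.95, 0.13])
result = nn_impute(scores, scores, observed, miss_idx=[5], k=3)
```

Because the three nearest donors of unit 5 all carry category "B", the imputed value is "B" regardless of the random draw; repeating the draw across multiple imputations is what propagates imputation uncertainty.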
NASA Astrophysics Data System (ADS)
Saad, Ahmed S.; Hamdy, Abdallah M.; Salama, Fathy M.; Abdelkawy, Mohamed
2016-10-01
Effect of data manipulation in preprocessing step proceeding construction of chemometric models was assessed. The same set of UV spectral data was used for construction of PLS and PCR models directly and after mathematically manipulation as per well known first and second derivatives of the absorption spectra, ratio spectra and first and second derivatives of the ratio spectra spectrophotometric methods, meanwhile the optimal working wavelength ranges were carefully selected for each model and the models were constructed. Unexpectedly, number of latent variables used for models' construction varied among the different methods. The prediction power of the different models was compared using a validation set of 8 mixtures prepared as per the multilevel multifactor design and results were statistically compared using two-way ANOVA test. Root mean squares error of prediction (RMSEP) was used for further comparison of the predictability among different constructed models. Although no significant difference was found between results obtained using Partial Least Squares (PLS) and Principal Component Regression (PCR) models, however, discrepancies among results was found to be attributed to the variation in the discrimination power of adopted spectrophotometric methods on spectral data.
Li, Bin; Shin, Hyunjin; Gulbekyan, Georgy; Pustovalova, Olga; Nikolsky, Yuri; Hope, Andrew; Bessarabova, Marina; Schu, Matthew; Kolpakova-Hart, Elona; Merberg, David; Dorner, Andrew; Trepicchio, William L.
2015-01-01
Development of drug responsive biomarkers from pre-clinical data is a critical step in drug discovery, as it enables patient stratification in clinical trial design. Such translational biomarkers can be validated in early clinical trial phases and utilized as a patient inclusion parameter in later stage trials. Here we present a study on building accurate and selective drug sensitivity models for Erlotinib or Sorafenib from pre-clinical in vitro data, followed by validation of individual models on corresponding treatment arms from patient data generated in the BATTLE clinical trial. A Partial Least Squares Regression (PLSR) based modeling framework was designed and implemented, using a special splitting strategy and canonical pathways to capture robust information for model building. Erlotinib and Sorafenib predictive models could be used to identify a sub-group of patients that respond better to the corresponding treatment, and these models are specific to the corresponding drugs. The model derived signature genes reflect each drug’s known mechanism of action. Also, the models predict each drug’s potential cancer indications consistent with clinical trial results from a selection of globally normalized GEO expression datasets. PMID:26107615
Subramanyam, Rajeev; Yeramaneni, Samrat; Hossain, Mohamed Monir; Anneken, Amy M; Varughese, Anna M
2016-05-01
Perioperative respiratory adverse events (PRAEs) are the most common cause of serious adverse events in children receiving anesthesia. The primary aim of this study was to develop and validate a risk prediction tool for the occurrence of PRAE from the onset of anesthesia induction until discharge from the postanesthesia care unit in children younger than 18 years undergoing elective ambulatory anesthesia for surgery and radiology; the incidence of PRAE was also studied. We analyzed data from 19,059 patients in our department's quality improvement database. The predictor variables were age, sex, ASA physical status, morbid obesity, preexisting pulmonary disorder, preexisting neurologic disorder, and location of ambulatory anesthesia (surgery or radiology). Composite PRAE was defined as the presence of any 1 of the following events: intraoperative bronchospasm, intraoperative laryngospasm, postoperative apnea, postoperative laryngospasm, postoperative bronchospasm, or postoperative prolonged oxygen requirement. The risk prediction tool for PRAE was developed and validated using logistic regression, with a split-sampling technique dividing the database into 2 independent cohorts based on the year in which the patient received ambulatory anesthesia for surgery or radiology. A risk score was developed based on the regression coefficients from the validation tool. The performance of the risk prediction tool was assessed using tests of discrimination and calibration. The overall incidence of composite PRAE was 2.8%. The derivation cohort included 8904 patients, and the validation cohort included 10,155 patients. The risk of PRAE was 3.9% in the development cohort and 1.8% in the validation cohort.
Age ≤ 3 years (versus >3 years), ASA physical status II or III (versus ASA physical status I), morbid obesity, preexisting pulmonary disorder, and surgery (versus radiology) significantly predicted the occurrence of PRAE in a multivariable logistic regression model. A risk score in the range of 0 to 3 was assigned to each significant variable in the logistic regression model, and the final score for all risk factors ranged from 0 to 11. A cutoff score of 4 was derived from a receiver operating characteristic curve to define the high-risk category. The model C-statistic and the corresponding SE for the derivation and validation cohorts were 0.64 ± 0.01 and 0.63 ± 0.02, respectively. The sensitivity and SE of the risk prediction tool for identifying children at risk of PRAE were 77.6 ± 0.02 in the derivation cohort and 76.2 ± 0.03 in the validation cohort. The risk tool developed and validated from our study cohort identified 5 risk factors for PRAE: age ≤ 3 years (versus >3 years), ASA physical status II or III (versus ASA physical status I), morbid obesity, preexisting pulmonary disorder, and surgery (versus radiology). This tool can be used to provide an individual risk score for each patient to predict the risk of PRAE in the preoperative period.
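Turning logistic regression coefficients into an integer risk score of the kind described above is mechanical: scale each coefficient by the smallest one, round to an integer, and sum the points for the factors a patient has. The coefficient values below are invented placeholders, not the study's fitted model; only the construction is illustrated.

```python
# Hypothetical coefficients for the five significant predictors (illustrative only)
coefs = {"age_le_3": 0.55, "asa_II_or_III": 0.35, "morbid_obesity": 0.70,
         "pulmonary_disorder": 1.05, "surgery_vs_radiology": 0.38}

smallest = min(coefs.values())
# Points per predictor: coefficient relative to the smallest one, rounded
points = {name: round(beta / smallest) for name, beta in coefs.items()}

def prae_risk(patient, cutoff=4):
    """Return (total score, high-risk flag) for a dict of present risk factors."""
    score = sum(points[name] for name, present in patient.items() if present)
    return score, score >= cutoff

score, high = prae_risk({"age_le_3": True, "pulmonary_disorder": True,
                         "asa_II_or_III": False})
```

A cutoff on the summed score (4 in the study) then trades sensitivity against specificity, which is why it is chosen from the receiver operating characteristic curve.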
NASA Astrophysics Data System (ADS)
Yu, H.; Gu, H.
2017-12-01
A novel multivariate seismic formation pressure prediction methodology is presented, which incorporates high-resolution seismic velocity data from prestack AVO inversion, and petrophysical data (porosity and shale volume) derived from poststack seismic motion inversion. In contrast to traditional seismic formation prediction methods, the proposed methodology is based on a multivariate pressure prediction model and utilizes a trace-by-trace multivariate regression analysis on seismic-derived petrophysical properties to calibrate model parameters in order to make accurate predictions with higher resolution in both vertical and lateral directions. With prestack time migration velocity as initial velocity model, an AVO inversion was first applied to prestack dataset to obtain high-resolution seismic velocity with higher frequency that is to be used as the velocity input for seismic pressure prediction, and the density dataset to calculate accurate Overburden Pressure (OBP). Seismic Motion Inversion (SMI) is an inversion technique based on Markov Chain Monte Carlo simulation. Both structural variability and similarity of seismic waveform are used to incorporate well log data to characterize the variability of the property to be obtained. In this research, porosity and shale volume are first interpreted on well logs, and then combined with poststack seismic data using SMI to build porosity and shale volume datasets for seismic pressure prediction. A multivariate effective stress model is used to convert velocity, porosity and shale volume datasets to effective stress. After a thorough study of the regional stratigraphic and sedimentary characteristics, a regional normally compacted interval model is built, and then the coefficients in the multivariate prediction model are determined in a trace-by-trace multivariate regression analysis on the petrophysical data. 
The coefficients are used to convert the velocity, porosity and shale volume datasets to effective stress and then to calculate formation pressure with the OBP. Application of the proposed methodology to a research area in the East China Sea has shown that the method can bridge the gap between seismic and well-log pressure prediction and give predicted pressures close to the pressure measurements from well testing.
NASA Astrophysics Data System (ADS)
Libonati, R.; Dacamara, C. C.; Setzer, A. W.; Morelli, F.
2014-12-01
A procedure is presented that allows using information from the MODerate resolution Imaging Spectroradiometer (MODIS) sensor to improve the quality of monthly burned area estimates over Brazil. The method integrates MODIS-derived information from two sources: the NASA MCD64A1 Direct Broadcast Monthly Burned Area Product and INPE's Monthly Burned Area MODIS product (AQM-MODIS). The latter product relies on an algorithm that was specifically designed for ecosystems in Brazil, taking advantage of the ability of MIR reflectances to discriminate burned areas. Information from both MODIS products is incorporated by means of a linear regression model where an optimal estimate of the burned area is obtained as a linear combination of burned area estimates from MCD64A1 and AQM-MODIS. The linear regression model is calibrated using as optimal estimates values of burned area derived from Landsat TM during 2005 and 2006 over Jalapão, a region of Cerrado covering an area of 187 x 187 km2. The obtained coefficients for MCD64A1 and AQM-MODIS were 0.51 and 0.35, respectively, and the root mean square error was 7.6 km2. Robustness of the model was checked by calibrating the model separately for 2005 and 2006 and cross-validating with 2006 and 2005; coefficients for 2005 (2006) were 0.46 (0.54) for MCD64A1 and 0.35 (0.35) for AQM-MODIS, and the corresponding root mean square errors for 2006 (2005) were 7.8 (7.4) km2. The linear model was then applied to Brazil as a whole as well as to the six main Brazilian biomes, namely Cerrado, Amazônia, Caatinga, Pantanal, Mata Atlântica and Pampa. As expected, the interannual variability based on the proposed synergistic use of MCD64A1, AQM-MODIS and Landsat TM data for the period 2005-2010 presents marked differences from the corresponding amounts derived from MCD64A1 alone.
For instance, over the considered period, the values (in thousands of km2) from the proposed approach (from MCD64A1) are 399 (142), 232 (62), 559 (259), 274 (73), 219 (31) and 415 (251), respectively. Values obtained with the proposed approach may be viewed as an improved alternative to the currently available products over Brazil.
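The calibration step above is a plain least-squares fit of the reference burned area on the two product estimates. The sketch below re-creates that step on synthetic numbers (the Landsat "truth" is simulated from made-up coefficients, so the recovered values are not the published 0.51/0.35).

```python
import numpy as np

rng = np.random.default_rng(1)
mcd64a1 = rng.uniform(0.0, 100.0, 40)      # burned area estimates, product 1 (km2)
aqm = rng.uniform(0.0, 100.0, 40)          # burned area estimates, product 2 (km2)
# Simulated Landsat-derived reference areas (illustrative coefficients + noise)
landsat = 0.5 * mcd64a1 + 0.35 * aqm + rng.normal(scale=2.0, size=40)

# Ordinary least squares with an intercept column
A = np.column_stack([mcd64a1, aqm, np.ones_like(aqm)])
coef, *_ = np.linalg.lstsq(A, landsat, rcond=None)
rmse = float(np.sqrt(np.mean((A @ coef - landsat) ** 2)))
```

Once calibrated, the fitted coefficients are simply applied to the two products everywhere to produce the blended estimate, as done for the six biomes.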
Miyake, Kentaro; Murakami, Takashi; Kiyuna, Tasuku; Igarashi, Kentaro; Kawaguchi, Kei; Li, Yunfeng; Singh, Arun S; Dry, Sarah M; Eckardt, Mark A; Hiroshima, Yukihiko; Momiyama, Masashi; Matsuyama, Ryusei; Chishima, Takashi; Endo, Itaru; Eilber, Fritz C; Hoffman, Robert M
2018-01-01
Ewing's sarcoma is a recalcitrant tumor greatly in need of more effective therapy. The aim of this study was to determine the efficacy of eribulin in a doxorubicin (DOX)-resistant Ewing's sarcoma patient-derived orthotopic xenograft (PDOX) model. The Ewing's sarcoma PDOX model was previously established in the right chest wall of nude mice from tumor resected from the patient's right chest wall. In the previous study, the Ewing's sarcoma PDOX was resistant to DOX and sensitive to palbociclib and linsitinib. In the present study, the PDOX models were randomized into three groups when the tumor volume reached 60 mm3: G1, untreated control (n = 6); G2, DOX-treated (n = 6; intraperitoneal (i.p.) injection, weekly, for 2 weeks); G3, eribulin-treated (n = 6; intravenous (i.v.) injection, weekly, for 2 weeks). All mice were sacrificed on day 15. Changes in body weight and tumor volume were assessed twice per week. Tumor weight was measured after sacrifice. DOX did not suppress tumor growth compared to the control group (P = 0.589), consistent with the previous results in the patient and the PDOX. Eribulin significantly regressed tumors compared to G1 and G2 (P = 0.006 and P = 0.017, respectively). No significant difference in body weight was observed among the groups. Our results demonstrate that eribulin is a promising novel therapeutic agent for Ewing's sarcoma. © 2017 Wiley Periodicals, Inc.
Utility of a New Model to Diagnose an Alcohol Basis for Steatohepatitis
Dunn, Winston; Angulo, Paul; Sanderson, Schuyler; Jamil, Laith H.; Stadheim, Linda; Rosen, Charles; Malinchoc, Michael; Kamath, Patrick S.; Shah, Vijay
2007-01-01
Background and Aims Distinguishing an alcohol basis from a nonalcoholic basis for the clinical and histological spectrum of steatohepatitic liver disease is difficult owing to the unreliability of alcohol consumption history. Unfortunately, various biomarkers have had limited utility in distinguishing alcoholic liver disease (ALD) from nonalcoholic fatty liver disease (NAFLD). Thus, the aim of our study was to create and validate a model to diagnose ALD in patients with steatohepatitis. Methods A cross-sectional cohort study was performed at the Mayo Clinic, Rochester, Minnesota to create a model using multivariable logistic regression analysis. This model was validated in three independent data sets comprising patients with varying severity of steatohepatitis spanning over 10 years. Results Logistic regression identified mean corpuscular volume, AST/ALT ratio, body mass index, and gender as the most important variables separating patients with ALD from those with NAFLD. These variables were used to generate the ALD/NAFLD Index (ANI), with ANI greater than 0 incrementally favoring ALD and ANI less than 0 incrementally favoring a diagnosis of NAFLD, thus making ALD unlikely. ANI had a c-statistic of 0.989 in the derivation sample, and 0.974, 0.989 and 0.767 in the three validation samples. ANI performance characteristics were significantly better than several conventional and recently proposed biomarkers used to differentiate ALD from NAFLD, including the histopathological marker Protein Tyrosine Phosphatase 1b, AST/ALT ratio, gamma-glutamyl transferase and Carbohydrate Deficient Transferrin. Conclusion ANI, derived from easily available objective variables, accurately differentiates ALD from NAFLD in hospitalized, ambulatory and pre-transplant patients and compares favorably to other traditional and proposed biomarkers. PMID:17030176
A Novel Continuous Blood Pressure Estimation Approach Based on Data Mining Techniques.
Miao, Fen; Fu, Nan; Zhang, Yuan-Ting; Ding, Xiao-Rong; Hong, Xi; He, Qingyun; Li, Ye
2017-11-01
Continuous blood pressure (BP) estimation using pulse transit time (PTT) is a promising method for unobtrusive BP measurement. However, the accuracy of this approach must be improved for it to be viable for a wide range of applications. This study proposes a novel continuous BP estimation approach that combines data mining techniques with a traditional mechanism-driven model. First, 14 features derived from simultaneous electrocardiogram and photoplethysmogram signals were extracted for beat-to-beat BP estimation. A genetic algorithm-based feature selection method was then used to select BP indicators for each subject. Multivariate linear regression and support vector regression were employed to develop the BP model. The accuracy and robustness of the proposed approach were validated for static, dynamic, and follow-up performance. Experimental results based on 73 subjects showed that the proposed approach exhibited excellent accuracy in static BP estimation, with a correlation coefficient and mean error of 0.852 and -0.001 ± 3.102 mmHg for systolic BP, and 0.790 and -0.004 ± 2.199 mmHg for diastolic BP. Similar performance was observed for dynamic BP estimation. The robustness results indicated that the estimation accuracy was somewhat lower one day after model construction but was relatively stable from one day to six months after construction. The proposed approach is superior to the state-of-the-art PTT-based model, with an approximately 2-mmHg reduction in the standard deviation at different time intervals, thus providing potentially novel insights for cuffless BP estimation.
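A toy version of the mechanism-driven part can make the idea concrete: PTT-based models commonly assume BP varies with 1/PTT², and a per-subject linear calibration maps that feature to reference cuff readings. The PTT and BP values below are synthetic, and this single-feature fit is a deliberate simplification of the paper's 14-feature data-mining models.

```python
import numpy as np

ptt_ms = np.array([180.0, 200.0, 220.0, 240.0, 260.0])   # pulse transit times (ms)
sbp_ref = np.array([135.0, 124.0, 116.0, 110.0, 105.0])  # reference systolic BP (mmHg)

feature = 1.0 / ptt_ms ** 2              # mechanism-motivated feature (BP ~ 1/PTT^2)
a, b = np.polyfit(feature, sbp_ref, 1)   # per-subject linear calibration

def estimate_sbp(ptt):
    """Estimate systolic BP (mmHg) from a PTT value (ms) via the calibrated model."""
    return a / ptt ** 2 + b
```

In the paper's framework, features like this one are combined with additional signal-derived features through multivariate linear regression or support vector regression rather than used alone.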
Wilke, Marko
2018-02-01
This dataset contains the regression parameters derived by analyzing segmented brain MRI images (gray matter and white matter) from a large population of healthy subjects, using a multivariate adaptive regression splines approach. A total of 1919 MRI datasets ranging in age from 1-75 years from four publicly available datasets (NIH, C-MIND, fCONN, and IXI) were segmented using the CAT12 segmentation framework, writing out gray matter and white matter images normalized using an affine-only spatial normalization approach. These images were then subjected to a six-step DARTEL procedure, employing an iterative non-linear registration approach and yielding increasingly crisp intermediate images. The resulting six datasets per tissue class were then analyzed using multivariate adaptive regression splines, using the CerebroMatic toolbox. This approach allows for flexibly modelling smoothly varying trajectories while taking into account demographic (age, gender) as well as technical (field strength, data quality) predictors. The resulting regression parameters described here can be used to generate matched DARTEL or SHOOT templates for a given population under study, from infancy to old age. The dataset and the algorithm used to generate it are publicly available at https://irc.cchmc.org/software/cerebromatic.php.
Validity of Treadmill-Derived Critical Speed on Predicting 5000-Meter Track-Running Performance.
Nimmerichter, Alfred; Novak, Nina; Triska, Christoph; Prinz, Bernhard; Breese, Brynmor C
2017-03-01
Nimmerichter, A, Novak, N, Triska, C, Prinz, B, and Breese, BC. Validity of treadmill-derived critical speed on predicting 5,000-meter track-running performance. J Strength Cond Res 31(3): 706-714, 2017-To evaluate 3 models of critical speed (CS) for the prediction of 5,000-m running performance, 16 trained athletes completed an incremental test on a treadmill to determine maximal aerobic speed (MAS) and 3 randomly ordered runs to exhaustion at the Δ70% intensity and at 110% and 98% of MAS. Critical speed and the distance covered above CS (D') were calculated using the hyperbolic speed-time (HYP), the linear distance-time (LIN), and the linear speed inverse-time (INV) models. Five-thousand-meter performance was determined on a 400-m running track. Individual predictions of 5,000-m running time (t = [5,000 - D']/CS) and speed (s = D'/t + CS) were calculated across the 3 models, in addition to multiple regression analyses. Prediction accuracy was assessed with the standard error of estimate (SEE) from linear regression analysis and the mean difference expressed in units of measurement and as a coefficient of variation (%). Five-thousand-meter running performance (speed: 4.29 ± 0.39 m·s(-1); time: 1,176 ± 117 seconds) was significantly better than the predictions from all 3 models (p < 0.0001). The mean difference was 65-105 seconds (5.7-9.4%) for time and -0.22 to -0.34 m·s(-1) (-5.0 to -7.5%) for speed. Predictions from multiple regression analyses with CS and D' as predictor variables were not significantly different from actual running performance (-1.0 to 1.1%). The SEE across all models and predictions was approximately 65 seconds or 0.20 m·s(-1) and is therefore considered moderate. The results of this study have shown the importance of aerobic and anaerobic energy system contributions to predicting 5,000-m running performance. Using estimates of CS and D' is valuable for predicting performance over race distances of 5,000 m.
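The linear distance-time (LIN) model can be sketched directly: regress distance on time across exhaustive trials, read CS off the slope and D' off the intercept, then invert the paper's prediction formula t = (5,000 - D')/CS. The trial times and distances below are invented for illustration, not the study's data.

```python
import numpy as np

# Illustrative time-to-exhaustion trials: (time s, distance m) pairs
t = np.array([150.0, 420.0, 780.0])
d = np.array([900.0, 2050.0, 3550.0])

# Linear distance-time model: distance = CS * time + D'
slope, intercept = np.polyfit(t, d, 1)
cs, d_prime = float(slope), float(intercept)

# Predicted 5,000-m time from the abstract's formula t = (5,000 - D')/CS
predicted_time_s = (5000.0 - d_prime) / cs
```

The hyperbolic speed-time and speed inverse-time models are algebraic rearrangements of the same two-parameter relationship, so they yield the same kind of CS and D' estimates from the same trials.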
High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics
Carvalho, Carlos M.; Chang, Jeffrey; Lucas, Joseph E.; Nevins, Joseph R.; Wang, Quanli; West, Mike
2010-01-01
We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology. PMID:21218139
Paschalidou, Anastasia K; Karakitsios, Spyridon; Kleanthous, Savvas; Kassomenos, Pavlos A
2011-02-01
In the present work, two types of artificial neural network (NN) models using the multilayer perceptron (MLP) and the radial basis function (RBF) techniques, as well as a model based on principal component regression analysis (PCRA), are employed to forecast hourly PM(10) concentrations in four urban areas (Larnaca, Limassol, Nicosia and Paphos) in Cyprus. The model development is based on a variety of meteorological and pollutant parameters corresponding to the 2-year period between July 2006 and June 2008, and the model evaluation is achieved through the use of a series of well-established evaluation instruments and methodologies. The evaluation reveals that the MLP NN models display the best forecasting performance, with R(2) values ranging between 0.65 and 0.76, whereas the RBF NNs and the PCRA models reveal a rather weak performance, with R(2) values between 0.37-0.43 and 0.33-0.38, respectively. The derived MLP models are also used to forecast Saharan dust episodes with remarkable success (probability of detection ranging between 0.68 and 0.71). On the whole, the analysis shows that the models introduced here could provide local authorities with reliable and precise predictions and alarms about air quality if used on an operational basis.
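Principal component regression of the kind behind the PCRA model is compact enough to sketch: center the predictors, project onto the leading principal components, and regress the response on the component scores. The data below are synthetic stand-ins for the meteorological and pollutant predictors, not the Cyprus measurements.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))                          # hypothetical predictors
y = X @ np.ones(6) + rng.normal(scale=0.5, size=200)   # hourly PM10 proxy

def pcr_r2(X, y, k):
    """Principal component regression with k components; returns in-sample R^2."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = np.column_stack([Xc @ Vt[:k].T, np.ones(len(y))])  # PC scores + intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

r2_three = pcr_r2(X, y, 3)   # truncated component basis
r2_full = pcr_r2(X, y, 6)    # all components, equivalent to ordinary least squares
```

Truncating the component basis trades some in-sample fit for stability, which is the usual rationale for PCR over plain least squares when predictors are collinear.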
Simulation of Structural Transformations in Heating of Alloy Steel
NASA Astrophysics Data System (ADS)
Kurkin, A. S.; Makarov, E. L.; Kurkin, A. B.; Rubtsov, D. E.; Rubtsov, M. E.
2017-07-01
A mathematical model for computer simulation of structural transformations in an alloy steel under the conditions of the thermal cycle of multipass welding is presented. The austenitic transformation under heating and the processes of decomposition of bainite and martensite under repeated heating are considered. A method for determining the necessary temperature-time parameters of the model from the chemical composition of the steel is described. Published data are processed and the results used to derive regression models of the temperature ranges and parameters of transformation kinetics of alloy steels. The method developed is used in computer simulation of the process of multipass welding of pipes by the finite-element method.
A clinical decision rule to prioritize polysomnography in patients with suspected sleep apnea.
Rodsutti, Julvit; Hensley, Michael; Thakkinstian, Ammarin; D'Este, Catherine; Attia, John
2004-06-15
The aim was to derive and validate a clinical decision rule to help prioritize patients on waiting lists for polysomnography. Prospective data were collected on consecutive patients referred to a sleep center, the Newcastle Sleep Disorders Centre, University of Newcastle, NSW, Australia; consecutive adult patients scheduled for initial diagnostic polysomnography were included. Eight hundred and thirty-seven patients were used for derivation of the decision rule. An apnea-hypopnea index of at least 5 was used as the cutoff point to diagnose sleep apnea. Fifteen clinical features were included in analyses using logistic regression to construct a model from the derivation data set. Only 5 variables--age, sex, body mass index, snoring, and stopping breathing during sleep--were significantly associated with sleep apnea. A scoring scheme based on regression coefficients was developed, and the total score was trichotomized into low-, moderate-, and high-risk groups with prevalences of sleep apnea of 8%, 51%, and 82%, respectively. Color-coded tables were developed for ease of use. The clinical decision rule was validated on a separate set of 243 patients. Receiver operating characteristic analysis confirmed that the decision rule performed well, with the area under the curve being similar for both the derivation and validation sets: 0.81 and 0.79, P = .612. We conclude that this decision rule was able to accurately classify the risk of sleep apnea and will be useful for prioritizing patients with suspected sleep apnea who are on waiting lists for polysomnography.
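The area under the curve used to validate such a rule has a simple rank identity: it is the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as half. A minimal, generic implementation:

```python
import numpy as np

def c_statistic(scores, labels):
    """AUC via the rank identity: P(case score > control score) + 0.5 * P(tie)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    cases, controls = s[y], s[~y]
    # Compare every case-control pair of scores
    wins = (cases[:, None] > controls[None, :]).sum()
    ties = (cases[:, None] == controls[None, :]).sum()
    return (wins + 0.5 * ties) / (len(cases) * len(controls))
```

For example, a rule whose scores perfectly separate cases from controls gives 1.0, while one whose scores are uninformative hovers near 0.5; values around 0.8, as reported above, indicate good discrimination.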
Chen, Chen; Xie, Yuanchang
2016-06-01
Annual Average Daily Traffic (AADT) is often considered as a main covariate for predicting crash frequencies at urban and suburban intersections. A linear functional form is typically assumed for the Safety Performance Function (SPF) to describe the relationship between the natural logarithm of expected crash frequency and covariates derived from AADTs. Such a linearity assumption has been questioned by many researchers. This study applies Generalized Additive Models (GAMs) and Piecewise Linear Negative Binomial (PLNB) regression models to fit intersection crash data. Various covariates derived from minor-and major-approach AADTs are considered. Three different dependent variables are modeled, which are total multiple-vehicle crashes, rear-end crashes, and angle crashes. The modeling results suggest that a nonlinear functional form may be more appropriate. Also, the results show that it is important to take into consideration the joint safety effects of multiple covariates. Additionally, it is found that the ratio of minor to major-approach AADT has a varying impact on intersection safety and deserves further investigations. Copyright © 2016 Elsevier Ltd. All rights reserved.
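One standard way to relax the linearity assumption, in the spirit of the piecewise linear models above, is a hinge (broken-stick) basis for ln(AADT): the fitted slope is allowed to change at chosen knots. The knot locations and grid below are arbitrary illustrations, not values from the study.

```python
import numpy as np

def hinge_basis(x, knots):
    """Piecewise-linear design matrix: the variable itself plus a max(0, x - k)
    column per knot, so the regression slope can change at each knot."""
    return np.column_stack([x] + [np.maximum(0.0, x - k) for k in knots])

ln_aadt = np.linspace(6.0, 11.0, 101)        # hypothetical ln(AADT) grid
B = hinge_basis(ln_aadt, knots=[8.0, 9.5])   # slope allowed to change at 8.0 and 9.5
```

Feeding such a basis into a negative binomial regression yields a piecewise linear safety performance function, while a GAM instead estimates the shape of the AADT effect with smooth spline terms.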
Parameters of Models of Structural Transformations in Alloy Steel Under Welding Thermal Cycle
NASA Astrophysics Data System (ADS)
Kurkin, A. S.; Makarov, E. L.; Kurkin, A. B.; Rubtsov, D. E.; Rubtsov, M. E.
2017-05-01
A mathematical model of structural transformations in an alloy steel under the thermal cycle of multipass welding is suggested for computer implementation. The minimum necessary set of parameters for describing the transformations under heating and cooling is determined. Ferritic-pearlitic, bainitic and martensitic transformations under cooling of a steel are considered. A method for deriving the necessary temperature and time parameters of the model from the chemical composition of the steel is described. Published data are used to derive regression models of the temperature ranges and parameters of transformation kinetics in alloy steels. It is shown that the disadvantages of the active visual methods of analysis of the final phase composition of steels are responsible for inaccuracy and mismatch of published data. The hardness of a specimen, which correlates with some other mechanical properties of the material, is chosen as the most objective and reproducible criterion of the final phase composition. The models developed are checked by a comparative analysis of computational results and experimental data on the hardness of 140 alloy steels after cooling at various rates.
NASA Astrophysics Data System (ADS)
Madani, Nima; Kimball, John S.; Running, Steven W.
2017-11-01
In the light use efficiency (LUE) approach of estimating the gross primary productivity (GPP), plant productivity is linearly related to absorbed photosynthetically active radiation, assuming that plants absorb and convert solar energy into biomass at a maximum LUE (LUEmax) rate, which is assumed to vary conservatively within a given biome type. However, it has been shown that photosynthetic efficiency can vary within biomes. In this study, we used 149 global CO2 flux towers to derive the optimum LUE (LUEopt) under prevailing climate conditions for each tower location, stratified according to model training and test sites. Unlike LUEmax, LUEopt varies according to heterogeneous landscape characteristics and species traits. The LUEopt data showed large spatial variability within and between biome types, so that a simple biome classification explained only 29% of LUEopt variability over 95 global tower training sites. The use of explanatory variables in a mixed effect regression model explained 62.2% of the spatial variability in tower LUEopt data. The resulting regression model was used for global extrapolation of the LUEopt data and GPP estimation. The GPP estimated using the new LUEopt map showed significant improvement relative to global tower data, including a 15% R2 increase and 34% root-mean-square error reduction relative to baseline GPP calculations derived from biome-specific LUEmax constants. The new global LUEopt map is expected to improve the performance of LUE-based GPP algorithms for better assessment and monitoring of global terrestrial productivity and carbon dynamics.
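The 29% figure quoted above is the variance explained when each tower's LUEopt is predicted by its biome-class mean; that quantity can be computed as follows (the data here are made up for illustration):

```python
import numpy as np

# R^2 of a predictor that assigns every observation its group (biome) mean.
def r2_group_means(groups, y):
    y = np.asarray(y, dtype=float)
    yhat = np.array([y[groups == g].mean() for g in groups])
    ss_res = float(((y - yhat) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot
```

A low value of this R^2 is exactly the motivation for moving from a biome lookup table to a regression on continuous explanatory variables.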
Acute toxicity prediction to threatened and endangered ...
Willming, Morgan M; Lilavois, Crystal R; Barron, Mace G; Raimondo, Sandy
2016-10-04
Evaluating contaminant sensitivity of threatened and endangered (listed) species and protectiveness of chemical regulations often depends on toxicity data for commonly tested surrogate species. The U.S. EPA's Internet application Web-ICE is a suite of Interspecies Correlation Estimation (ICE) models that can extrapolate species sensitivity to listed taxa using least-squares regressions of the sensitivity of a surrogate species and a predicted taxon (species, genus, or family). Web-ICE was expanded with new models that can predict toxicity to over 250 listed species. A case study was used to assess protectiveness of genus and family model estimates derived from either geometric mean or minimum taxa toxicity values for listed species. Models developed from the most sensitive value for each chemical were generally protective of the most sensitive species within predicted taxa, including listed species, and were more protective than geometric means models. ICE model estimates were compared to HC5 values derived from Species Sensitivity Distributions for the case study chemicals to assess protectiveness of the two approaches. ICE models provide robust toxicity predictions and can generate protective toxicity estimates for assessing contaminant risk to listed species.
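The core of an ICE model as described above is a least-squares regression relating the toxicity of a predicted taxon to that of a surrogate species across shared chemicals; a minimal sketch, assuming the usual log10 scale (the data below are hypothetical):

```python
import numpy as np

# Sketch of an interspecies correlation estimation (ICE) regression:
# log10(predicted-taxon toxicity) ~ log10(surrogate toxicity).
def fit_ice(surrogate_tox, predicted_tox):
    """Return (slope, intercept) of the log10-log10 least-squares line."""
    x = np.log10(np.asarray(surrogate_tox, dtype=float))
    y = np.log10(np.asarray(predicted_tox, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

def estimate_tox(model, surrogate_value):
    """Back-transform the regression prediction to toxicity units."""
    slope, intercept = model
    return 10.0 ** (intercept + slope * np.log10(surrogate_value))
```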
Huang, Mengmeng; Wei, Yan; Wang, Jun; Zhang, Yu
2016-01-01
We used the support vector regression (SVR) approach to predict and unravel the reduction/promotion effect of characteristic flavonoids on acrylamide formation in a low-moisture Maillard reaction system. Results demonstrated reduction/promotion effects by flavonoids at addition levels of 1–10000 μmol/L. The maximal inhibition rates (51.7%, 68.8% and 26.1%) and promotion rates (57.7%, 178.8% and 27.5%) caused by flavones, flavonols and isoflavones were observed at addition levels of 100 μmol/L and 10000 μmol/L, respectively. The reduction/promotion effects were closely related to the change of trolox equivalent antioxidant capacity (ΔTEAC) and were well predicted by triple ΔTEAC measurements via SVR models (R: 0.633–0.900). Flavonols exhibited stronger effects on acrylamide formation than flavones and isoflavones as well as their O-glycoside derivatives, which may be attributed to the number and position of phenolic and 3-enolic hydroxyls. The reduction/promotion effects were also well predicted by using optimized quantitative structure-activity relationship (QSAR) descriptors and SVR models (R: 0.926–0.994). Compared to artificial neural network and multi-linear regression models, SVR models exhibited better fitting performance for both TEAC-dependent and QSAR descriptor-dependent prediction. These observations demonstrate that SVR models are competent predictive tools and can inform the future use of natural antioxidants for decreasing acrylamide formation. PMID:27586851
Predicting the demand of physician workforce: an international model based on "crowd behaviors".
Tsai, Tsuen-Chiuan; Eliasziw, Misha; Chen, Der-Fang
2012-03-26
Appropriateness of physician workforce greatly influences the quality of healthcare. When facing the crisis of physician shortages, the correction of manpower always takes an extended time period, and both the public and health personnel suffer. To calculate an appropriate Physician Density (PD) for a specific country, this study was designed to create a PD prediction model based on health-related data from many countries. Twelve factors that could possibly impact the demand for physicians were chosen, and data on these factors from 130 countries (out of 195 reviewed) were extracted. Multiple stepwise linear regression was used to derive the PD prediction model, and a split-sample cross-validation procedure was performed to evaluate the generalizability of the results. Using data from 130 countries, with consideration of the correlation between variables and prevention of multi-collinearity, seven of the 12 predictor variables were selected for entry into the stepwise regression procedure. The final model was: PD = (5.014 - 0.128 × proportion under age 15 years + 0.034 × life expectancy)², with an R2 of 80.4%. Using the prediction equation, 70 countries had PDs with "negative discrepancy", while 58 had PDs with "positive discrepancy". This study provided a regression-based PD model to calculate a "norm" PD for a specific country. A large PD discrepancy in a country indicates the need to examine physicians' workloads and well-being, the effectiveness/efficiency of medical care, the promotion of population health, and team resource management.
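The final model above can be applied directly; the sketch below implements the equation as printed (inputs are the percentage of the population under age 15 and life expectancy in years, with the resulting density in whatever units the paper defines):

```python
# The published prediction equation, implemented as printed:
# PD = (5.014 - 0.128 * pct_under_15 + 0.034 * life_expectancy)^2
def predict_physician_density(pct_under_15, life_expectancy):
    root = 5.014 - 0.128 * pct_under_15 + 0.034 * life_expectancy
    return root ** 2
```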
Hébert, J R; Peterson, K E; Hurley, T G; Stoddard, A M; Cohen, N; Field, A E; Sorensen, G
2001-08-01
To evaluate the effect of social desirability trait, the tendency to respond in a manner consistent with societal expectations, on self-reported fruit, vegetable, and macronutrient intake. A 61-item food frequency questionnaire (FFQ), 7-item fruit and vegetable screener, and a single question on combined fruit and vegetable intake were completed by 132 female employees at five health centers in eastern Massachusetts. Intake of fruit and vegetables derived from all three methods and macronutrients from the FFQ were fit as dependent variables in multiple linear regression models (overall and by race/ethnicity and education); independent variables included 3-day mean intakes derived from 24-hour recalls (24HR) and score on the 33-point Marlowe-Crowne Social Desirability scale (the regression coefficient for which reflects its effect on estimates of dietary intake based on the comparison method relative to 24HR). Results are based on the 93 women with complete data and FFQ-derived caloric intake between 450 and 4500 kcal/day. In women with college education, FFQ-derived estimates of total caloric intake were associated with under-reporting by social desirability trait (e.g., the regression coefficient for total caloric intake was -23.6 kcal/day/point in that group versus 36.1 kcal/day/point in women with education less than college) (difference = 59.7 kcal/day/point, 95% confidence interval (CI) = 13.2, 106.2). Except for the single question, on which women with college education tended to under-report (difference = 0.103 servings/day/point, 95% CI = 0.003, 0.203), there was no association of social desirability trait with self-reported fruit and vegetable intake. The effect of social desirability trait on FFQ reports of macronutrient intake appeared to differ by education, but not by ethnicity or race. The results of this study may have important implications for epidemiologic studies of diet and health in women.
Yavari, Reza; McEntee, Erin; McEntee, Michael; Brines, Michael
2011-01-01
The current world-wide epidemic of obesity has stimulated interest in developing simple screening methods to identify individuals with undiagnosed diabetes mellitus type 2 (DM2) or metabolic syndrome (MS). Prior work utilizing body composition obtained by sophisticated technology has shown that the ratio of abdominal fat to total fat is a good predictor for DM2 or MS. The goals of this study were to determine how well simple anthropometric variables predict the fat mass distribution as determined by dual energy x-ray absorptiometry (DXA), and whether these are useful to screen for DM2 or MS within a population. To accomplish this, the body composition of 341 females spanning a wide range of body mass indices and with a 23% prevalence of DM2 and MS was determined using DXA. Stepwise linear regression models incorporating age, weight, height, waistline, and hipline predicted DXA body composition (i.e., fat mass, trunk fat, fat free mass, and total mass) with good accuracy. Using body composition as independent variables, nominal logistic regression was then performed to estimate the probability of DM2. The results show good discrimination, with the receiver operating characteristic (ROC) curve having an area under the curve (AUC) of 0.78. The anthropometrically-derived body composition equations derived from the full DXA study group were then applied to a group of 1153 female patients selected from a general endocrinology practice. Similar to the smaller study group, the ROC from logistic regression using body composition had an AUC of 0.81 for the detection of DM2. These results are superior to screening based on questionnaires and compare favorably with published data derived from invasive testing, e.g., hemoglobin A1c. This anthropometric approach offers promise for the development of simple, inexpensive, non-invasive screening to identify individuals with metabolic dysfunction within large populations. PMID:21915276
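The AUC values of 0.78 and 0.81 quoted above can be computed from predicted probabilities and observed outcomes via the Mann-Whitney formulation of the ROC area; a minimal sketch:

```python
import numpy as np

# ROC AUC as the probability that a randomly chosen positive case scores
# higher than a randomly chosen negative case (ties count one half).
def roc_auc(scores, labels):
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```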
Wang, Jingzhe; Ding, Jianli; Abulimiti, Aerzuna; Cai, Lianghong
2018-01-01
Soil salinization is one of the most common forms of land degradation. The detection and assessment of soil salinity is critical for the prevention of environmental deterioration especially in arid and semi-arid areas. This study introduced the fractional derivative in the pretreatment of visible and near infrared (VIS–NIR) spectroscopy. The soil samples (n = 400) collected from the Ebinur Lake Wetland, Xinjiang Uyghur Autonomous Region (XUAR), China, were used as the dataset. After measuring the spectral reflectance and salinity in the laboratory, the raw spectral reflectance was preprocessed by means of the absorbance and the fractional derivative order in the range of 0.0–2.0 order with an interval of 0.1. Two different modeling methods, namely, partial least squares regression (PLSR) and random forest (RF) with preprocessed reflectance were used for quantifying soil salinity. The results showed that more spectral characteristics were refined for the spectrum reflectance treated via fractional derivative. The validation accuracies showed that RF models performed better than those of PLSR. The most effective model was established based on RF with the 1.5 order derivative of absorbance with the optimal values of R2 (0.93), RMSE (4.57 dS m−1), and RPD (2.78 ≥ 2.50). The developed RF model was stable and accurate in the application of spectral reflectance for determining the soil salinity of the Ebinur Lake wetland. The pretreatment of fractional derivative could be useful for monitoring multiple soil parameters with higher accuracy, which could effectively help to analyze the soil salinity. PMID:29736341
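The abstract does not specify the discrete implementation of the fractional derivative; one common choice is the Grünwald-Letnikov definition, sketched below under that assumption (order alpha in the 0.0-2.0 range as in the study; at alpha = 0 it returns the signal unchanged, at alpha = 1 the first difference):

```python
import numpy as np

def gl_fractional_derivative(signal, alpha, h=1.0):
    """Grünwald-Letnikov fractional derivative of order alpha on a uniform
    grid with spacing h. The binomial weights w_k = (-1)^k * C(alpha, k)
    are built with the recursion w_k = w_{k-1} * (k - 1 - alpha) / k."""
    s = np.asarray(signal, dtype=float)
    n = len(s)
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    out = np.empty(n)
    for i in range(n):
        # weighted sum over s[i], s[i-1], ..., s[0]
        out[i] = np.dot(w[: i + 1], s[i::-1]) / h ** alpha
    return out
```

Sweeping alpha over 0.0-2.0 in 0.1 steps, as the study does, is then a loop over this function applied to each preprocessed spectrum.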
Detecting understory plant invasion in urban forests using LiDAR
NASA Astrophysics Data System (ADS)
Singh, Kunwar K.; Davis, Amy J.; Meentemeyer, Ross K.
2015-06-01
Light detection and ranging (LiDAR) data are increasingly used to measure structural characteristics of urban forests but are rarely used to detect the growing problem of exotic understory plant invaders. We explored the merits of using LiDAR-derived metrics alone and through integration with spectral data to detect the spatial distribution of the exotic understory plant Ligustrum sinense, a rapidly spreading invader in the urbanizing region of Charlotte, North Carolina, USA. We analyzed regional-scale L. sinense occurrence data collected over the course of three years with LiDAR-derived metrics of forest structure, categorized into the following groups: overstory, understory, topography, and overall vegetation characteristics, together with optical IKONOS spectral features. Using random forest (RF) and logistic regression (LR) classifiers, we assessed the relative contributions of LiDAR- and IKONOS-derived variables to the detection of L. sinense. We compared the top performing models developed for a smaller, nested experimental extent using RF and LR classifiers, and used the best overall model to produce a predictive map of the spatial distribution of L. sinense across our county-wide study extent. RF classification of LiDAR-derived topography metrics produced the highest mapping accuracy estimates, outperforming IKONOS data by 17.5% and the integration of LiDAR and IKONOS data by 5.3%. The top performing model from the RF classifier produced the highest kappa of 64.8%, improving on the parsimonious LR model kappa by 31.1% with a moderate gain of 6.2% over the county extent model. Our results demonstrate the superiority of LiDAR-derived metrics over spectral data and the fusion of LiDAR and spectral data for accurately mapping the spatial distribution of the forest understory invader L. sinense.
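The kappa statistic used above to compare the RF and LR maps corrects observed classification agreement for the agreement expected by chance; a minimal sketch:

```python
import numpy as np

# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance).
def cohens_kappa(y_true, y_pred):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    p_obs = float((y_true == y_pred).mean())
    p_exp = sum(float((y_true == c).mean()) * float((y_pred == c).mean())
                for c in classes)
    return (p_obs - p_exp) / (1.0 - p_exp)
```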
NASA Astrophysics Data System (ADS)
Afolagboye, Lekan Olatayo; Talabi, Abel Ojo; Oyelami, Charles Adebayo
2017-05-01
This study assessed the possibility of using index tests to determine the mechanical properties of crushed aggregates. The aggregates used in this study were derived from major Precambrian basement rocks in Ado-Ekiti, Nigeria. Regression analyses were performed to determine the empirical relations that mechanical properties of the aggregates may have with the point load strength (IS(50)), Schmidt rebound hammer value (SHR) and unconfined compressive strength (UCS) of the rocks. For all the data, strong correlation coefficients were found between IS(50), SHR, UCS, and mechanical properties of the aggregates. The regression analysis conducted on the different rocks separately showed that the correlation coefficients obtained between IS(50), SHR, UCS and mechanical properties of the aggregates were stronger than those of the grouped rocks. The T-test and F-test showed that the derived models were valid. This study has shown that the mechanical properties of the aggregates can be estimated from IS(50), SHR and UCS, but the influence of rock type on the relationships should be taken into consideration.
Ehring, Thomas; Ehlers, Anke; Glucksman, Edward
2008-01-01
The study investigated the power of theoretically derived cognitive variables to predict posttraumatic stress disorder (PTSD), travel phobia, and depression following injury in a motor vehicle accident (MVA). MVA survivors (N = 147) were assessed at the emergency department on the day of their accident and 2 weeks, 1 month, 3 months, and 6 months later. Diagnoses were established with the Structured Clinical Interview for DSM–IV. Predictors included initial symptom severities; variables established as predictors of PTSD in E. J. Ozer, S. R. Best, T. L. Lipsey, and D. S. Weiss's (2003) meta-analysis; and variables derived from cognitive models of PTSD, phobia, and depression. Results of nonparametric multiple regression analyses showed that the cognitive variables predicted subsequent PTSD and depression severities over and above what could be predicted from initial symptom levels. They also showed greater predictive power than the established predictors, although the latter showed similar effect sizes as in the meta-analysis. In addition, the predictors derived from cognitive models of PTSD and depression were disorder-specific. The results support the role of cognitive factors in the maintenance of emotional disorders following trauma. PMID:18377119
Jóźwiak, Michał; Stępień, Karolina; Wrzosek, Małgorzata; Olejarz, Wioletta; Kubiak-Tomaszewska, Grażyna; Filipowska, Anna; Filipowski, Wojciech; Struga, Marta
2018-04-03
Thirty new derivatives of palmitic acid were efficiently synthesized. The obtained compounds can be divided into three groups bearing thiosemicarbazide (compounds 1-10), 1,2,4-triazole (compounds 1a-10a) and 1,3,4-thiadiazole (compounds 1b-10b) moieties. ¹H-NMR, ¹³C-NMR and MS methods were used to confirm the structures of the derivatives. All obtained compounds were tested in vitro against a number of microorganisms, including Gram-positive cocci, Gram-negative rods and Candida albicans. Compounds 4, 5, 6 and 8 showed significant inhibition against C. albicans. The range of MIC values was 1.56-50 μg/mL. A halogen atom, especially at the 3rd position of the phenyl group, was significantly important for antifungal activity. The biological activity against Candida albicans and selected molecular descriptors were used as the basis for QSAR models determined by means of multiple linear regression. The models were validated by means of Leave-One-Out Cross Validation. The obtained QSAR models were characterized by high determination coefficients and good predictive power.
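The Leave-One-Out Cross Validation used above can be sketched as a PRESS (predicted residual sum of squares) computation for a multiple linear regression: each observation is held out in turn, the model is refit, and the held-out prediction error is accumulated (data shapes below are hypothetical, OLS fit via numpy):

```python
import numpy as np

# Leave-one-out PRESS for an OLS multiple linear regression y ~ 1 + X.
def loocv_press(X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i        # drop observation i
        beta, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += float((y[i] - A[i] @ beta) ** 2)
    return press
```

A cross-validated Q² can then be formed as 1 - PRESS / SS_tot.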
Estimation of average annual streamflows and power potentials for Alaska and Hawaii
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdin, Kristine L.
2004-05-01
This paper describes the work done to develop average annual streamflow estimates and power potential for the states of Alaska and Hawaii. The Elevation Derivatives for National Applications (EDNA) database was used, along with climatic datasets, to develop flow and power estimates for every stream reach in the EDNA database. Estimates of average annual streamflows were derived using state-specific regression equations, which were functions of average annual precipitation, precipitation intensity, drainage area, and other elevation-derived parameters. Power potential was calculated through the use of the average annual streamflow and the hydraulic head of each reach, which is calculated from the EDNA digital elevation model. In all, estimates of streamflow and power potential were calculated for over 170,000 stream segments in the Alaskan and Hawaiian datasets.
Handling nonnormality and variance heterogeneity for quantitative sublethal toxicity tests.
Ritz, Christian; Van der Vliet, Leana
2009-09-01
The advantages of using regression-based techniques to derive endpoints from environmental toxicity data are clear, and slowly, this superior analytical technique is gaining acceptance. As use of regression-based analysis becomes more widespread, some of the associated nuances and potential problems come into sharper focus. Looking at data sets that cover a broad spectrum of standard test species, we noticed that some model fits to data failed to meet two key assumptions (variance homogeneity and normality) that are necessary for correct statistical analysis via regression-based techniques. Failure to meet these assumptions often is caused by reduced variance at the concentrations showing severe adverse effects. Although commonly used with linear regression analysis, transformation of the response variable alone is not appropriate when fitting data using nonlinear regression techniques. Through analysis of sample data sets, including Lemna minor, Eisenia andrei (terrestrial earthworm), and algae, we show that both the so-called Box-Cox transformation and use of the Poisson distribution can help to correct variance heterogeneity and nonnormality and so allow nonlinear regression analysis to be implemented. Both the Box-Cox transformation and the Poisson distribution can be readily implemented into existing protocols for statistical analysis. By correcting for nonnormality and variance heterogeneity, these two statistical tools can be used to encourage the transition to regression-based analysis and the depreciation of less-desirable and less-flexible analytical techniques, such as linear interpolation.
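For reference, the Box-Cox family mentioned above has the standard form (y^lam - 1)/lam for lam != 0, with log(y) as the lam = 0 limit (y must be strictly positive); a minimal sketch:

```python
import numpy as np

# Standard one-parameter Box-Cox transformation of a positive response.
def box_cox(y, lam):
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(y)          # continuous limit as lam -> 0
    return (y ** lam - 1.0) / lam
```

In practice lam is chosen by maximum likelihood (e.g., scipy.stats.boxcox does this), then the regression is fit on the transformed response.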
Estimation of crown closure from AVIRIS data using regression analysis
NASA Technical Reports Server (NTRS)
Staenz, K.; Williams, D. J.; Truchon, M.; Fritz, R.
1993-01-01
Crown closure is one of the input parameters used for forest growth and yield modelling. Preliminary work by Staenz et al. indicates that imaging spectrometer data acquired with sensors such as the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) have some potential for estimating crown closure on a stand level. The objectives of this paper are: (1) to establish a relationship between AVIRIS data and the crown closure derived from aerial photography of a forested test site within the Interior Douglas Fir biogeoclimatic zone in British Columbia, Canada; (2) to investigate the impact of atmospheric effects and the forest background on the correlation between AVIRIS data and crown closure estimates; and (3) to improve this relationship using multiple regression analysis.
Genetic background in partitioning of metabolizable energy efficiency in dairy cows.
Mehtiö, T; Negussie, E; Mäntysaari, P; Mäntysaari, E A; Lidauer, M H
2018-05-01
The main objective of this study was to assess the genetic differences in metabolizable energy efficiency and efficiency in partitioning metabolizable energy in different pathways: maintenance, milk production, and growth in primiparous dairy cows. Repeatability models for residual energy intake (REI) and metabolizable energy intake (MEI) were compared and the genetic and permanent environmental variations in MEI were partitioned into its energy sinks using random regression models. We proposed 2 new feed efficiency traits: metabolizable energy efficiency (MEE), which is formed by modeling MEI fitting regressions on energy sinks [metabolic body weight (BW^0.75), energy-corrected milk, body weight gain, and body weight loss] directly; and partial MEE (pMEE), where the model for MEE is extended with regressions on energy sinks nested within additive genetic and permanent environmental effects. The data used were collected from Luke's experimental farms Rehtijärvi and Minkiö between 1998 and 2014. There were altogether 12,350 weekly MEI records on 495 primiparous Nordic Red dairy cows from wk 2 to 40 of lactation. Heritability estimates for REI and MEE were moderate, 0.33 and 0.26, respectively. The estimate of the residual variance was smaller for MEE than for REI, indicating that analyzing weekly MEI observations simultaneously with energy sinks is preferable. Model validation based on Akaike's information criterion showed that pMEE models fitted the data even better and also resulted in smaller residual variance estimates. However, models that included random regression on BW^0.75 converged slowly. The resulting genetic standard deviation estimate from the pMEE coefficient for milk production was 0.75 MJ of MEI/kg of energy-corrected milk. The derived partial heritabilities for energy efficiency in maintenance, milk production, and growth were 0.02, 0.06, and 0.04, respectively, indicating that some genetic variation may exist in the efficiency of using metabolizable energy for different pathways in dairy cows. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Kendall, G M; Wakeford, R; Athanson, M; Vincent, T J; Carter, E J; McColl, N P; Little, M P
2016-03-01
Gamma radiation from natural sources (including directly ionising cosmic rays) is an important component of background radiation. In the present paper, indoor measurements of naturally occurring gamma rays that were undertaken as part of the UK Childhood Cancer Study are summarised, and it is shown that these are broadly compatible with an earlier UK National Survey. The distribution of indoor gamma-ray dose rates in Great Britain is approximately normal with mean 96 nGy/h and standard deviation 23 nGy/h. Directly ionising cosmic rays contribute about one-third of the total. The expanded dataset allows a more detailed description than previously of indoor gamma-ray exposures and in particular their geographical variation. Various strategies for predicting indoor natural background gamma-ray dose rates were explored. In the first of these, a geostatistical model was fitted, which assumes an underlying geologically determined spatial variation, superimposed on which is a Gaussian stochastic process with Matérn correlation structure that models the observed tendency of dose rates in neighbouring houses to correlate. In the second approach, a number of dose-rate interpolation measures were first derived, based on averages over geologically or administratively defined areas or using distance-weighted averages of measurements at nearest-neighbour points. Linear regression was then used to derive an optimal linear combination of these interpolation measures. The predictive performances of the two models were compared via cross-validation, using a randomly selected 70 % of the data to fit the models and the remaining 30 % to test them. The mean square error (MSE) of the linear-regression model was lower than that of the Gaussian-Matérn model (MSE 378 and 411, respectively). The predictive performance of the two candidate models was also evaluated via simulation; the OLS model performs significantly better than the Gaussian-Matérn model.
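The 70/30 split-sample validation described above can be sketched for an ordinary least-squares model (the geostatistical Gaussian-Matérn fit is beyond a short example; the split fraction and seed below are illustrative):

```python
import numpy as np

def ols_fit(X, y):
    """OLS fit of y ~ 1 + X; returns coefficient vector."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def ols_predict(beta, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta

def holdout_mse(X, y, train_frac=0.7, seed=0):
    """Fit on a random train fraction, report MSE on the held-out rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = idx[:n_train], idx[n_train:]
    beta = ols_fit(X[train], y[train])
    resid = y[test] - ols_predict(beta, X[test])
    return float(np.mean(resid ** 2))
```

Comparing candidate models is then a matter of computing this held-out MSE for each, exactly as the 378 vs. 411 comparison above does.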
Nguyen, X Cuong; Chang, S Woong; Nguyen, Thi Loan; Ngo, H Hao; Kumar, Gopalakrishnan; Banu, J Rajesh; Vu, M Cuong; Le, H Sinh; Nguyen, D Duc
2018-09-15
A pilot-scale hybrid constructed wetland with vertical flow and horizontal flow in series was constructed and used to investigate organic material and nutrient removal rate constants for wastewater treatment and to establish a practical predictive model. For this purpose, the performance of multiple parameters was statistically evaluated during the process and predictive models were suggested. The measurement of the kinetic rate constant was based on the first-order derivation and the Monod kinetic derivation (Monod), each paired with a plug flow reactor (PFR) and a continuously stirred tank reactor (CSTR). Both the Lindeman, Merenda, and Gold (LMG) analysis and the Bayesian model averaging (BMA) method were employed for identifying the relative importance of variables and their optimal multiple regression (MR). The results showed that the first-order-PFR (M2) model did not fit the data (P > 0.05 and R2 < 0.5), whereas the first-order-CSTR (M1) model for the chemical oxygen demand (CODCr) and the Monod-CSTR (M3) model for CODCr and ammonium nitrogen (NH4-N) showed a high correlation with the experimental data (R2 > 0.5). The pollutant removal rate in the case of M1 was 0.19 m/d (CODCr), and those for M3 were 25.2 g/m2·d for CODCr and 2.63 g/m2·d for NH4-N. By applying a multi-variable linear regression method, optimal empirical models were established for predicting the final effluent concentrations of five-day biochemical oxygen demand (BOD5) and NH4-N. In general, the hydraulic loading rate was an important variable with a high relative importance, appearing in all the optimal predictive models. Copyright © 2018 Elsevier Ltd. All rights reserved.
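The two reactor idealizations paired with first-order kinetics above imply different effluent relations for the same rate constant k and retention time tau; written out as a textbook sketch (not the paper's fitted models):

```python
import math

def pfr_effluent(c_in, k, tau):
    """Ideal plug flow reactor with first-order decay:
    C_out = C_in * exp(-k * tau)."""
    return c_in * math.exp(-k * tau)

def cstr_effluent(c_in, k, tau):
    """Ideal continuously stirred tank reactor with first-order decay:
    C_out = C_in / (1 + k * tau)."""
    return c_in / (1.0 + k * tau)
```

For the same k and tau the PFR always predicts the lower effluent concentration, which is why fitting the same data to both idealizations yields different rate constants.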
Jardínez, Christiaan; Vela, Alberto; Cruz-Borbolla, Julián; Alvarez-Mendez, Rodrigo J; Alvarado-Rodríguez, José G
2016-12-01
The relationship between the chemical structure and biological activity (log IC50) of 40 derivatives of 1,4-dihydropyridines (DHPs) was studied using density functional theory (DFT) and multiple linear regression analysis methods. With the aim of improving the quantitative structure-activity relationship (QSAR) model, the reduced density gradient s(r) of the optimized equilibrium geometries was used as a descriptor to include weak non-covalent interactions. The QSAR model highlights the correlation of log IC50 with the highest occupied molecular orbital energy (EHOMO), molecular volume (V), partition coefficient (log P), non-covalent interactions NCI(H4-G), and the dual descriptor [Δf(r)]. The model yielded values of R² = 79.57 and Q² = 69.67 that were validated with four internal validation checks, DK = 0.076, DQ = -0.006, RP = 0.056, and RN = 0.000, and the external validation Q²boot = 64.26. The QSAR model found can be used to estimate biological activity with high reliability in new compounds based on a DHP series. Graphical abstract: The good correlation of log IC50 with the NCI(H4-G) estimated by the reduced density gradient approach of the DHP derivatives.
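The fit/validation pair reported above (R² on the fit, Q² from internal cross-validation) can be sketched as follows; the descriptor matrix is synthetic stand-in data, not the paper's DHP descriptors.

```python
# Hedged sketch of an MLR/QSAR fit: R^2 on the training fit plus
# leave-one-out Q^2 as an internal validation statistic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
n = 40                                  # 40 DHP derivatives in the study
X = rng.normal(size=(n, 5))             # stand-ins for E_HOMO, V, logP, NCI, dual descriptor
y = X @ np.array([0.8, -0.5, 0.3, 0.6, -0.2]) + rng.normal(0, 0.3, n)

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)                  # fitted R^2
y_loo = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
q2 = 1 - np.sum((y - y_loo) ** 2) / np.sum((y - y.mean()) ** 2)
```

Q² is computed from predictions on held-out compounds, so it is always somewhat below the fitted R²; a large gap between the two signals overfitting.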
NASA Technical Reports Server (NTRS)
Batterson, J. G.
1986-01-01
The successful parametric modeling of the aerodynamics for an airplane operating at high angles of attack or sideslip is performed in two phases. First the aerodynamic model structure must be determined and second the associated aerodynamic parameters (stability and control derivatives) must be estimated for that model. The purpose of this paper is to document two versions of a stepwise regression computer program which were developed for the determination of airplane aerodynamic model structure and to provide two examples of their use on computer generated data. References are provided for the application of the programs to real flight data. The two computer programs that are the subject of this report, STEP and STEPSPL, are written in FORTRAN IV (ANSI 1966) compatible with a CDC FTN4 compiler. Both programs are adaptations of a standard forward stepwise regression algorithm. The purpose of the adaptation is to facilitate the selection of an adequate mathematical model of the aerodynamic force and moment coefficients of an airplane from flight test data. The major difference between STEP and STEPSPL is in the basis for the model. The basis for the model in STEP is the standard polynomial Taylor's series expansion of the aerodynamic function about some steady-state trim condition. Program STEPSPL utilizes a set of spline basis functions.
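The core of the forward stepwise algorithm that STEP and STEPSPL adapt can be sketched compactly (a Python stand-in for the FORTRAN original, not its actual code): at each step, add the candidate regressor that most reduces the residual sum of squares.

```python
# Sketch of forward stepwise selection: greedily add the regressor giving
# the largest reduction in residual sum of squares (RSS).
import numpy as np

def forward_stepwise(X, y, max_terms):
    """Return column indices of X selected greedily by RSS."""
    n, p = X.shape
    selected = []
    for _ in range(max_terms):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))                     # 6 candidate model terms
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(0, 0.1, 200)
order = forward_stepwise(X, y, max_terms=2)       # recovers the two true terms
```

In the real programs the candidate terms would be polynomial (STEP) or spline (STEPSPL) basis functions of the flight variables, and entry would be gated by an F-statistic rather than a fixed term count.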
Zeng, Fangfang; Li, Zhongtao; Yu, Xiaoling; Zhou, Linuo
2013-01-01
Background This study aimed to develop the artificial neural network (ANN) and multivariable logistic regression (LR) analyses for prediction modeling of cardiovascular autonomic (CA) dysfunction in the general population, and compare the prediction models using the two approaches. Methods and Materials We analyzed a previous dataset based on a Chinese population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis, and were tested in the validation set. Performances of these prediction models were then compared. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver-operating curve was 0.758 (95% CI 0.724–0.793) for LR and 0.762 (95% CI 0.732–0.793) for ANN analysis, but a noninferiority result was found (P<0.001). Similar results were found in comparisons of sensitivity, specificity, and predictive values in the prediction models between the LR and ANN analyses. Conclusion The prediction models for CA dysfunction were developed using ANN and LR. ANN and LR are two effective tools for developing prediction models based on our dataset. PMID:23940593
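The LR-versus-ANN comparison can be sketched on synthetic data; the feature generator, network size, and split below are assumptions for illustration, not the study's cohort or architecture.

```python
# Sketch: derive LR and ANN classifiers on an exploratory set and compare
# their AUCs on a held-out validation set (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2092, n_features=14, n_informative=8,
                           random_state=0)       # 2,092 subjects, 14 risk factors
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)

auc_lr = roc_auc_score(y_va, lr.predict_proba(X_va)[:, 1])
auc_ann = roc_auc_score(y_va, ann.predict_proba(X_va)[:, 1])
```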
Forecasting volatility with neural regression: a contribution to model adequacy.
Refenes, A N; Holt, W T
2001-01-01
Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy with the results confirming the presence of nonlinear relationships in implied volatility innovations.
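For reference, the classical Durbin-Watson statistic that the paper generalizes can be computed directly from a residual series; the neural-regression generalization itself is not reproduced here.

```python
# Classical Durbin-Watson statistic: DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 indicate no first-order autocorrelation in the residuals.
import numpy as np

def durbin_watson(residuals):
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(3)
dw_white = durbin_watson(rng.normal(size=5000))   # uncorrelated series -> near 2
```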
van Werkhoven, C H; van der Tempel, J; Jajou, R; Thijsen, S F T; Diepersloot, R J A; Bonten, M J M; Postma, D F; Oosterheert, J J
2015-08-01
To develop and validate a prediction model for Clostridium difficile infection (CDI) in hospitalized patients treated with systemic antibiotics, we performed a case-cohort study in a tertiary (derivation) and secondary care hospital (validation). Cases had a positive Clostridium test and were treated with systemic antibiotics before suspicion of CDI. Controls were randomly selected from hospitalized patients treated with systemic antibiotics. Potential predictors were selected from the literature. Logistic regression was used to derive the model. Discrimination and calibration of the model were tested in internal and external validation. A total of 180 cases and 330 controls were included for derivation. Age >65 years, recent hospitalization, CDI history, malignancy, chronic renal failure, use of immunosuppressants, receipt of antibiotics before admission, nonsurgical admission, admission to the intensive care unit, gastric tube feeding, treatment with cephalosporins and presence of an underlying infection were independent predictors of CDI. The area under the receiver operating characteristic curve of the model in the derivation cohort was 0.84 (95% confidence interval 0.80-0.87), and was reduced to 0.81 after internal validation. In external validation, consisting of 97 cases and 417 controls, the model area under the curve was 0.81 (95% confidence interval 0.77-0.85) and model calibration was adequate (Brier score 0.004). A simplified risk score was derived. Using a cutoff of 7 points, the positive predictive value, sensitivity and specificity were 1.0%, 72% and 73%, respectively. In conclusion, a risk prediction model was developed and validated, with good discrimination and calibration, that can be used to target preventive interventions in patients with increased risk of CDI. Copyright © 2015 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
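Evaluating a points-based risk score at a cutoff, as done above with the >= 7-point threshold, reduces to a 2x2 table; the scores and labels in this sketch are invented, not the study's data.

```python
# Sketch: sensitivity, specificity and positive predictive value (PPV) for a
# rule that predicts CDI when the risk score reaches the cutoff.
import numpy as np

def score_metrics(scores, labels, cutoff):
    """Return (sensitivity, specificity, PPV) for score >= cutoff."""
    scores = np.asarray(scores)
    labels = np.asarray(labels, dtype=bool)
    pred = scores >= cutoff
    tp = np.sum(pred & labels)
    tn = np.sum(~pred & ~labels)
    fp = np.sum(pred & ~labels)
    fn = np.sum(~pred & labels)
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)

sens, spec, ppv = score_metrics([9, 8, 3, 6, 10, 2, 7, 1],
                                [1, 1, 0, 1, 0, 0, 0, 0], cutoff=7)
```

The study's PPV of only 1.0 % alongside 72 % sensitivity reflects the low baseline prevalence of CDI in the case-cohort design, not a poorly discriminating score.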
NASA Astrophysics Data System (ADS)
Shahtahmassebi, Amir Reza; Song, Jie; Zheng, Qing; Blackburn, George Alan; Wang, Ke; Huang, Ling Yan; Pan, Yi; Moore, Nathan; Shahtahmassebi, Golnaz; Sadrabadi Haghighi, Reza; Deng, Jing Song
2016-04-01
A substantial body of literature has accumulated on the topic of using remotely sensed data to map impervious surfaces which are widely recognized as an important indicator of urbanization. However, the remote sensing of impervious surface growth has not been successfully addressed. This study proposes a new framework for deriving and summarizing urban expansion and re-densification using time series of impervious surface fractions (ISFs) derived from remotely sensed imagery. This approach integrates multiple endmember spectral mixture analysis (MESMA), analysis of regression residuals, spatial statistics (Getis-Ord) and urban growth theories; hence, the framework is abbreviated as MRGU. The performance of MRGU was compared with commonly used change detection techniques in order to evaluate the effectiveness of the approach. The results suggested that the ISF regression residuals were optimal for detecting impervious surface changes while Getis-Ord was effective for mapping hotspot regions in the regression residuals image. Moreover, the MRGU outputs agreed with the mechanisms proposed in several existing urban growth theories, but importantly the outputs enable the refinement of such models by explicitly accounting for the spatial distribution of both expansion and re-densification mechanisms. Based on Landsat data, the MRGU is somewhat restricted in its ability to measure re-densification in the urban core but this may be improved through the use of higher spatial resolution satellite imagery. The paper ends with an assessment of the present gaps in remote sensing of impervious surface growth and suggests some solutions. The application of impervious surface fractions in urban change detection is a stimulating new research idea that is driving future work on new models and algorithms.
Lee, Jason; Morishima, Toshitaka; Kunisawa, Susumu; Sasaki, Noriko; Otsubo, Tetsuya; Ikai, Hiroshi; Imanaka, Yuichi
2013-01-01
Stroke and other cerebrovascular diseases are a major cause of death and disability. Predicting in-hospital mortality in ischaemic stroke patients can help to identify high-risk patients and guide treatment approaches. Chart reviews provide important clinical information for mortality prediction, but are laborious and limiting in sample sizes. Administrative data allow for large-scale multi-institutional analyses but lack the necessary clinical information for outcome research. However, administrative claims data in Japan has seen the recent inclusion of patient consciousness and disability information, which may allow more accurate mortality prediction using administrative data alone. The aim of this study was to derive and validate models to predict in-hospital mortality in patients admitted for ischaemic stroke using administrative data. The sample consisted of 21,445 patients from 176 Japanese hospitals, who were randomly divided into derivation and validation subgroups. Multivariable logistic regression models were developed using 7- and 30-day and overall in-hospital mortality as dependent variables. Independent variables included patient age, sex, comorbidities upon admission, Japan Coma Scale (JCS) score, Barthel Index score, modified Rankin Scale (mRS) score, and admissions after hours and on weekends/public holidays. Models were developed in the derivation subgroup, and coefficients from these models were applied to the validation subgroup. Predictive ability was analysed using C-statistics; calibration was evaluated with Hosmer-Lemeshow χ² tests. All three models showed predictive abilities similar or surpassing that of chart review-based models. The C-statistics were highest in the 7-day in-hospital mortality prediction model, at 0.906 and 0.901 in the derivation and validation subgroups, respectively.
For the 30-day in-hospital mortality prediction models, the C-statistics for the derivation and validation subgroups were 0.893 and 0.872, respectively; in overall in-hospital mortality prediction these values were 0.883 and 0.876. In this study, we have derived and validated in-hospital mortality prediction models for three different time spans using a large population of ischaemic stroke patients in a multi-institutional analysis. The recent inclusion of JCS, Barthel Index, and mRS scores in Japanese administrative data has allowed the prediction of in-hospital mortality with accuracy comparable to that of chart review analyses. The models developed using administrative data had consistently high predictive abilities for all models in both the derivation and validation subgroups. These results have implications in the role of administrative data in future mortality prediction analyses. Copyright © 2013 S. Karger AG, Basel.
[Gaussian process regression and its application in near-infrared spectroscopy analysis].
Feng, Ai-Ming; Fang, Li-Min; Lin, Min
2011-06-01
Gaussian process (GP) is applied in the present paper as a chemometric method to explore the complicated relationship between the near infrared (NIR) spectra and ingredients. After the outliers were detected by the Monte Carlo cross validation (MCCV) method and removed from the dataset, different preprocessing methods, such as multiplicative scatter correction (MSC), smoothing and derivatives, were tried for the best performance of the models. Furthermore, uninformative variable elimination (UVE) was introduced as a variable selection technique and the characteristic wavelengths obtained were further employed as input for modeling. A public dataset with 80 NIR spectra of corn was introduced as an example for evaluating the new algorithm. The optimal models for oil, starch and protein were obtained by the GP regression method. The performance of the final models was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP) and correlation coefficient (r). The models give good calibration ability with r values above 0.99 and the prediction ability is also satisfactory with r values higher than 0.96. The overall results demonstrate that the GP algorithm is an effective chemometric method and is promising for NIR analysis.
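The GP-regression step can be sketched with a generic library implementation (a scikit-learn stand-in, not the paper's own code or preprocessing chain); the "spectra" below are synthetic, with 80 samples as in the corn dataset.

```python
# Sketch: fit a GP regressor to synthetic spectral data and report
# RMSEC (calibration) and RMSEP (prediction) style errors.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 20))           # 80 samples, 20 selected "wavelengths"
y = X @ rng.normal(size=20) * 0.1 + rng.normal(0, 0.05, 80)

X_cal, X_pre, y_cal, y_pre = train_test_split(X, y, test_size=0.25,
                                              random_state=4)
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                              normalize_y=True).fit(X_cal, y_cal)
rmsec = np.sqrt(np.mean((y_cal - gp.predict(X_cal)) ** 2))
rmsep = np.sqrt(np.mean((y_pre - gp.predict(X_pre)) ** 2))
```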
Holan, S.H.; Davis, G.M.; Wildhaber, M.L.; DeLonay, A.J.; Papoulias, D.M.
2009-01-01
The timing of spawning in fish is tightly linked to environmental factors; however, these factors are not very well understood for many species. Specifically, little information is available to guide recruitment efforts for endangered species such as the sturgeon. Therefore, we propose a Bayesian hierarchical model for predicting the success of spawning of the shovelnose sturgeon which uses both biological and behavioural (longitudinal) data. In particular, we use data that were produced from a tracking study that was conducted in the Lower Missouri River. The data that were produced from this study consist of biological variables associated with readiness to spawn along with longitudinal behavioural data collected by using telemetry and archival data storage tags. These high frequency data are complex both biologically and in the underlying behavioural process. To accommodate such complexity we developed a hierarchical linear regression model that uses an eigenvalue predictor, derived from the transition probability matrix of a two-state Markov switching model with generalized auto-regressive conditional heteroscedastic dynamics. Finally, to minimize the computational burden that is associated with estimation of this model, a parallel computing approach is proposed. © Journal compilation 2009 Royal Statistical Society.
CATEGORICAL REGRESSION ANALYSIS OF ACUTE INHALATION TOXICITY DATA FOR HYDROGEN SULFIDE
Categorical regression is one of the tools offered by the U.S. EPA for derivation of acute reference exposures (AREs), which are dose-response assessments for acute exposures to inhaled chemicals. Categorical regression is used as a meta-analytical technique to calculate probabi...
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
NASA Astrophysics Data System (ADS)
Gorbunov, Michael E.; Kirchengast, Gottfried
2018-01-01
A new reference occultation processing system (rOPS) will include a Global Navigation Satellite System (GNSS) radio occultation (RO) retrieval chain with integrated uncertainty propagation. In this paper, we focus on wave-optics bending angle (BA) retrieval in the lower troposphere and introduce (1) an empirically estimated boundary layer bias (BLB) model then employed to reduce the systematic uncertainty of excess phases and bending angles in about the lowest 2 km of the troposphere and (2) the estimation of (residual) systematic uncertainties and their propagation together with random uncertainties from excess phase to bending angle profiles. Our BLB model describes the estimated bias of the excess phase transferred from the estimated bias of the bending angle, for which the model is built, informed by analyzing refractivity fluctuation statistics shown to induce such biases. The model is derived from regression analysis using a large ensemble of Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) RO observations and concurrent European Centre for Medium-Range Weather Forecasts (ECMWF) analysis fields. It is formulated in terms of predictors and adaptive functions (powers and cross products of predictors), where we use six main predictors derived from observations: impact altitude, latitude, bending angle and its standard deviation, canonical transform (CT) amplitude, and its fluctuation index. Based on an ensemble of test days, independent of the days of data used for the regression analysis to establish the BLB model, we find the model very effective for bias reduction and capable of reducing bending angle and corresponding refractivity biases by about a factor of 5. The estimated residual systematic uncertainty, after the BLB profile subtraction, is lower bounded by the uncertainty from the (indirect) use of ECMWF analysis fields but is significantly lower than the systematic uncertainty without BLB correction. 
The systematic and random uncertainties are propagated from excess phase to bending angle profiles, using a perturbation approach and the wave-optical method recently introduced by Gorbunov and Kirchengast (2015), starting with estimated excess phase uncertainties. The results are encouraging and this uncertainty propagation approach combined with BLB correction enables a robust reduction and quantification of the uncertainties of excess phases and bending angles in the lower troposphere.
The process and utility of classification and regression tree methodology in nursing research
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-01-01
Aim This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Background Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Design Discussion paper. Data sources English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984–2013. Discussion Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Implications for Nursing Research Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Conclusion Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. PMID:24237048
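A minimal, illustrative example of growing and reading a small classification tree of the kind discussed above (scikit-learn on a bundled dataset; purely a sketch, not tied to any nursing study):

```python
# Sketch: fit a shallow classification tree and render its inverted-tree
# structure as plain-text rules.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree)        # the tree diagram, as indented text rules
acc = tree.score(X, y)           # resubstitution accuracy of the shallow tree
```

Limiting depth (here to 2) is one simple guard against the overfitting and validity concerns the paper raises; in practice trees are usually pruned or cross-validated.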
NASA Astrophysics Data System (ADS)
Aulenbach, B. T.; Burns, D. A.; Shanley, J. B.; Yanai, R. D.; Bae, K.; Wild, A.; Yang, Y.; Dong, Y.
2013-12-01
There are many sources of uncertainty in estimates of streamwater solute flux. Flux is the product of discharge and concentration (summed over time), each of which has measurement uncertainty of its own. Discharge can be measured almost continuously, but concentrations are usually determined from discrete samples, which increases uncertainty dependent on sampling frequency and how concentrations are assigned for the periods between samples. Gaps between samples can be estimated by linear interpolation or by models that use the relations between concentration and continuously measured or known variables such as discharge, season, temperature, and time. For this project, developed in cooperation with QUEST (Quantifying Uncertainty in Ecosystem Studies), we evaluated uncertainty for three flux estimation methods and three different sampling frequencies (monthly, weekly, and weekly plus event). The constituents investigated were dissolved NO3, Si, SO4, and dissolved organic carbon (DOC), solutes whose concentration dynamics exhibit strongly contrasting behavior. The evaluation was completed for a 10-year period at five small, forested watersheds in Georgia, New Hampshire, New York, Puerto Rico, and Vermont. Concentration regression models were developed for each solute at each of the three sampling frequencies for all five watersheds. Fluxes were then calculated using (1) a linear interpolation approach, (2) a regression-model method, and (3) the composite method - which combines the regression-model method for estimating concentrations and the linear interpolation method for correcting model residuals to the observed sample concentrations. We considered the best estimates of flux to be derived using the composite method at the highest sampling frequencies.
We also evaluated the importance of sampling frequency and estimation method on flux estimate uncertainty; flux uncertainty was dependent on the variability characteristics of each solute and varied for different reporting periods (e.g. 10-year study period vs. annual vs. monthly). The usefulness of the two regression-model-based flux estimation approaches was dependent upon the amount of variance in concentrations the regression models could explain. Our results can guide the development of optimal sampling strategies by weighing sampling frequency against improvements in the uncertainty of stream flux estimates for solutes with particular characteristics of variability. The appropriate flux estimation method is dependent on a combination of sampling frequency and the strength of concentration regression models. Sites: Biscuit Brook (Frost Valley, NY), Hubbard Brook Experimental Forest and LTER (West Thornton, NH), Luquillo Experimental Forest and LTER (Luquillo, Puerto Rico), Panola Mountain (Stockbridge, GA), Sleepers River Research Watershed (Danville, VT)
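The composite method described above can be sketched numerically under assumed, simplified forms: concentrations come from a regression model on discharge, and the model residuals at sampled times are linearly interpolated and added back before summing flux.

```python
# Sketch of the composite flux method on a synthetic daily record.
import numpy as np

t = np.arange(100.0)                          # daily time steps
q = 10 + 3 * np.sin(t / 10)                   # discharge
c_true = 2 + 0.05 * q + 0.3 * np.sin(t / 7)   # "true" concentration

sample_idx = np.arange(0, 100, 7)             # weekly grab samples
c_model = 2 + 0.05 * q                        # regression model: c ~ f(q) only
resid = c_true[sample_idx] - c_model[sample_idx]
# Composite: model estimate plus interpolated residual correction.
c_composite = c_model + np.interp(t, t[sample_idx], resid)

flux_true = np.sum(q * c_true)
err_model = abs(np.sum(q * c_model) - flux_true)
err_comp = abs(np.sum(q * c_composite) - flux_true)
```

By construction the composite series passes exactly through the observed sample concentrations, and the residual correction recovers the variation the regression model missed.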
On the design of classifiers for crop inventories
NASA Technical Reports Server (NTRS)
Heydorn, R. P.; Takacs, H. C.
1986-01-01
Crop proportion estimators that use classifications of satellite data to correct, in an additive way, a given estimate acquired from ground observations are discussed. A linear version of these estimators is optimal, in terms of minimum variance, when the regression of the ground observations onto the satellite observations is linear. When this regression is not linear, but the reverse regression (satellite observations onto ground observations) is linear, the estimator is suboptimal but still has certain appealing variance properties. In this paper expressions are derived for those regressions which relate the intercepts and slopes to conditional classification probabilities. These expressions are then used to discuss the question of classifier designs that can lead to low-variance crop proportion estimates. Variance expressions for these estimates in terms of classifier omission and commission errors are also derived.
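A numerical sketch of the additive-correction idea (a standard regression estimator; all numbers invented): the ground-based mean is corrected using the difference between the classifier's mean proportion over the whole scene and over the ground-visited segments.

```python
# Sketch: regression estimator p_hat = y_bar + b * (X_bar - x_bar), where
# y_bar is the ground mean on sampled segments, x_bar the classifier mean on
# those segments, and X_bar the classifier mean over all segments.
import numpy as np

rng = np.random.default_rng(5)
n_all, n_sample = 1000, 50
p_true = rng.uniform(0.2, 0.6, n_all)               # true proportion per segment
p_classified = p_true + rng.normal(0, 0.05, n_all)  # noisy classifier output

idx = rng.choice(n_all, n_sample, replace=False)    # ground-visited segments
y_bar = p_true[idx].mean()
x_bar = p_classified[idx].mean()
X_bar = p_classified.mean()

b = np.polyfit(p_classified[idx], p_true[idx], 1)[0]  # slope fitted on the sample
p_hat = y_bar + b * (X_bar - x_bar)                   # corrected scene estimate
```

The variance of p_hat shrinks as the classifier correlates more strongly with the ground truth, which is why the abstract ties classifier design (omission/commission errors) to estimator variance.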
Tzeng, Jung-Ying; Zhang, Daowen; Pongpanich, Monnat; Smith, Chris; McCarthy, Mark I.; Sale, Michèle M.; Worrall, Bradford B.; Hsu, Fang-Chi; Thomas, Duncan C.; Sullivan, Patrick F.
2011-01-01
Genomic association analyses of complex traits demand statistical tools that are capable of detecting small effects of common and rare variants and modeling complex interaction effects and yet are computationally feasible. In this work, we introduce a similarity-based regression method for assessing the main genetic and interaction effects of a group of markers on quantitative traits. The method uses genetic similarity to aggregate information from multiple polymorphic sites and integrates adaptive weights that depend on allele frequencies to accommodate common and uncommon variants. Collapsing information at the similarity level instead of the genotype level avoids canceling signals that have the opposite etiological effects and is applicable to any class of genetic variants without the need for dichotomizing the allele types. To assess gene-trait associations, we regress trait similarities for pairs of unrelated individuals on their genetic similarities and assess association by using a score test whose limiting distribution is derived in this work. The proposed regression framework allows for covariates, has the capacity to model both main and interaction effects, can be applied to a mixture of different polymorphism types, and is computationally efficient. These features make it an ideal tool for evaluating associations between phenotype and marker sets defined by linkage disequilibrium (LD) blocks, genes, or pathways in whole-genome analysis. PMID:21835306
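The core similarity-regression idea can be sketched on synthetic genotypes (this is only the regress-similarity-on-similarity step, not the authors' adaptive weighting or score test): pairwise trait similarity is regressed on pairwise genetic similarity, and a positive slope indicates association.

```python
# Sketch: trait similarity regressed on genetic similarity for all
# unordered pairs of unrelated individuals.
import numpy as np

rng = np.random.default_rng(6)
n, m = 120, 10
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # allele counts, m markers
beta = np.zeros(m)
beta[:3] = 0.8                                        # three causal variants
y = G @ beta + rng.normal(0, 1, n)

Gc = (G - G.mean(0)) / (G.std(0) + 1e-12)
K = Gc @ Gc.T / m                                     # genetic similarity matrix
yc = (y - y.mean()) / y.std()
S = np.outer(yc, yc)                                  # pairwise trait similarity

iu = np.triu_indices(n, k=1)                          # unordered pairs only
slope = np.polyfit(K[iu], S[iu], 1)[0]                # positive under association
```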