Building Regression Models: The Importance of Graphics.
ERIC Educational Resources Information Center
Dunn, Richard
1989-01-01
Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
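As a hedged illustration of the least-squares mechanics this abstract reviews (the data and variable names below are invented for the example), the slope and intercept of a simple linear regression can be computed in closed form:

```python
import numpy as np

# Hypothetical data: a single predictor x and outcome y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Method of least squares: slope = Sxy / Sxx, intercept = ybar - slope * xbar.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

residuals = y - (intercept + slope * x)
print(f"y = {intercept:.3f} + {slope:.3f} x, residual SS = {np.sum(residuals**2):.3f}")
```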
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
Estimating linear temporal trends from aggregated environmental monitoring data
Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.
2017-01-01
Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs are often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variation to be separated. We used simulated time series to compare linear trend estimates from three state-space models, a simple linear regression model, and an auto-regressive model. We also compared the performance of these five models in estimating trends from a long-term monitoring program, specifically trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the models considered because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job of estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. Overall, a simple linear regression performed better than more complex autoregressive and state-space models when used to analyze aggregated environmental monitoring data.
Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.
2009-01-01
In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
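A minimal sketch of the model-selection workflow described above, assuming invented calibration data and using the residual standard error of the log-transformed fit as a stand-in for the report's MSPE criterion (the threshold value is likewise invented):

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; returns coefficients and residual standard error."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    se = np.sqrt(resid @ resid / (len(y) - X1.shape[1]))
    return beta, se

# Hypothetical calibration samples (log-transformed, as is common for SSC).
log_turb = np.log10(np.array([12, 25, 40, 80, 150, 300, 600.0]))
log_flow = np.log10(np.array([5, 9, 14, 30, 55, 120, 260.0]))
log_ssc = np.log10(np.array([20, 38, 70, 130, 260, 520, 990.0]))

beta_s, se_s = fit_ols(log_turb, log_ssc)  # simple turbidity-only model
beta_m, se_m = fit_ols(np.column_stack([log_turb, log_flow]), log_ssc)  # multiple model

SE_CRITERION = 0.1  # invented threshold; the report's actual MSPE criterion differs
use_multiple = se_s > SE_CRITERION and se_m < se_s
print("use turbidity-streamflow model" if use_multiple else "use turbidity-only model")
```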
ERIC Educational Resources Information Center
Lee, Wan-Fung; Bulcock, Jeffrey Wilson
The purposes of this study are: (1) to demonstrate the superiority of simple ridge regression over ordinary least squares regression through theoretical argument and empirical example; (2) to modify ridge regression through use of the variance normalization criterion; and (3) to demonstrate the superiority of simple ridge regression based on the…
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Linear regression metamodeling as a tool to summarize and present simulation model results.
Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M
2013-10-01
Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
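A rough sketch of the metamodeling idea under stated assumptions: the simulator, parameter distributions, and names below are invented stand-ins, but regressing simulated outcomes on standardized inputs recovers a base-case estimate (the intercept) and comparable sensitivity coefficients, as the abstract describes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # PSA cohorts

# Hypothetical model inputs (e.g., cure probability, cost multiplier) and outcome.
p_cure = rng.beta(20, 80, n)
cost_mult = rng.lognormal(0.0, 0.2, n)
net_benefit = 5000 * p_cure - 1200 * cost_mult + rng.normal(0, 50, n)  # stand-in simulator

# Standardize inputs so the intercept estimates the base-case outcome
# and the coefficients are comparable measures of parameter importance.
Z = np.column_stack([(v - v.mean()) / v.std() for v in (p_cure, cost_mult)])
X = np.column_stack([np.ones(n), Z])
coef, *_ = np.linalg.lstsq(X, net_benefit, rcond=None)
print(f"base case ~ {coef[0]:.1f}; sensitivities: p_cure {coef[1]:.1f}, cost {coef[2]:.1f}")
```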
Ridge Regression for Interactive Models.
ERIC Educational Resources Information Center
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages.
Choi, Youn-Kyung; Kim, Jinmi; Yamaguchi, Tetsutaro; Maki, Koutaro; Ko, Ching-Chang; Kim, Yong-Il
2016-01-01
This study aimed to determine the correlation between volumetric parameters derived from cone beam computed tomography images of the second, third, and fourth cervical vertebrae and skeletal maturation stages, and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained estimates of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebral bodies of 102 Japanese patients (54 women and 48 men, 5-18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square used the fourth-cervical-vertebra volume as the independent variable, with a variance inflation factor less than ten. The explanatory power was 81.76%. Volumetric parameters of cervical vertebrae obtained using cone beam computed tomography are useful in regression models. The derived regression model has potential for clinical application, as it enables a simple and quantitative analysis to evaluate skeletal maturation level.
Simple linear and multivariate regression models.
Rodríguez del Águila, M M; Benítez-Parejo, N
2011-01-01
In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.
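The article's examples use the R program; as a rough Python analogue (all data below are simulated for illustration), the same kind of fit plus basic checks of the applicability assumptions might look like:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
age = rng.uniform(5, 15, 60)
weight = rng.uniform(15, 60, 60)
fev = 0.2 * age + 0.01 * weight + rng.normal(0, 0.3, 60)  # invented response

# Multiple linear regression of the response on two predictors.
X = sm.add_constant(np.column_stack([age, weight]))
fit = sm.OLS(fev, X).fit()
print(fit.params, fit.rsquared)

# Basic applicability checks: normality and homoscedasticity of residuals.
print("Shapiro-Wilk p =", stats.shapiro(fit.resid).pvalue)
print("|resid| vs fitted corr =", np.corrcoef(np.abs(fit.resid), fit.fittedvalues)[0, 1])
```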
Determination of cellulose I crystallinity by FT-Raman spectroscopy
Umesh P. Agarwal; Richard S. Reiner; Sally A. Ralph
2009-01-01
Two new methods based on FT-Raman spectroscopy, one simple, based on band intensity ratio, and the other, using a partial least-squares (PLS) regression model, are proposed to determine cellulose I crystallinity. In the simple method, crystallinity in semicrystalline cellulose I samples was determined based on univariate regression that was first developed using the...
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
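A hedged sketch of the equivalent two-sample idea for logistic regression (the function name and the centering shortcut are mine; the paper additionally holds the overall expected number of events fixed):

```python
import numpy as np
from scipy.stats import norm

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def approx_power_logistic(beta, sd_x, p_bar, n_total, alpha=0.05):
    # Equivalent two-sample problem: two equal groups whose log-odds differ
    # by beta * 2 * sd_x. Centering the gap on logit(p_bar) is a shortcut.
    delta = beta * 2.0 * sd_x
    l0 = np.log(p_bar / (1.0 - p_bar))
    p1, p2 = expit(l0 - delta / 2.0), expit(l0 + delta / 2.0)
    n = n_total / 2.0
    se = np.sqrt(p1 * (1.0 - p1) / n + p2 * (1.0 - p2) / n)
    return norm.cdf(abs(p2 - p1) / se - norm.ppf(1.0 - alpha / 2.0))

print(f"approximate power: {approx_power_logistic(0.4, 1.0, 0.3, 200):.2f}")
```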
A Practical Model for Forecasting New Freshman Enrollment during the Application Period.
ERIC Educational Resources Information Center
Paulsen, Michael B.
1989-01-01
A simple and effective model for forecasting freshman enrollment during the application period is presented step by step. The model requires minimal and readily available information, uses a simple linear regression analysis on a personal computer, and provides updated monthly forecasts. (MSE)
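A minimal sketch of such a forecasting model, with invented historical application counts, regressing final enrollment on applications received by a fixed point in the cycle:

```python
import numpy as np

# Hypothetical history: applications received by end of March vs. final fall enrollment.
apps_march = np.array([1850, 1920, 2010, 1760, 2105, 1990.0])
enrolled = np.array([610, 640, 665, 585, 700, 655.0])

dx = apps_march - apps_march.mean()
slope = np.sum(dx * (enrolled - enrolled.mean())) / np.sum(dx ** 2)
intercept = enrolled.mean() - slope * apps_march.mean()

# Updated forecast once this year's March application count is known.
print(f"forecast: {intercept + slope * 1940:.0f} freshmen")
```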
Modelling Nitrogen Oxides in Los Angeles Using a Hybrid Dispersion/Land Use Regression Model
NASA Astrophysics Data System (ADS)
Wilton, Darren C.
The goal of this dissertation is to develop models capable of predicting long-term annual average NOx concentrations in urban areas. Predictions from simple meteorological dispersion models and seasonal proxies for NO2 oxidation were included as covariates in a land use regression (LUR) model for NOx in Los Angeles, CA. The NOx measurements were obtained from a comprehensive measurement campaign that is part of the Multi-Ethnic Study of Atherosclerosis Air Pollution Study (MESA Air). Simple land use regression models were initially developed using a suite of GIS-derived land use variables computed for various buffer sizes (R²=0.15). Caline3, a simple steady-state Gaussian line source model, was then incorporated into the land-use regression framework. The addition of this spatio-temporally varying Caline3 covariate improved the simple LUR model predictions. The extent of improvement was much more pronounced for models based solely on the summer measurements (simple LUR: R²=0.45; Caline3/LUR: R²=0.70) than it was for models based on all seasons (R²=0.20). We then used a Lagrangian dispersion model to convert static land use covariates for population density and commercial/industrial area into spatially and temporally varying covariates. The inclusion of these covariates resulted in significant improvement in model prediction (R²=0.57). In addition to the dispersion model covariates described above, a two-week average value of daily peak-hour ozone was included as a surrogate for the oxidation of NO2 during the different sampling periods. This additional covariate further improved overall model performance for all models. The best model by 10-fold cross-validation (R²=0.73) contained the Caline3 prediction, a static covariate for length of A3 roads within 50 meters, the Calpuff-adjusted covariates derived from both population density and industrial/commercial land area, and the ozone covariate. This model was tested against annual average NOx concentrations from an independent data set from the EPA's Air Quality System (AQS) and MESA Air fixed site monitors, and performed very well (R²=0.82).
NASA Astrophysics Data System (ADS)
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these "outliers" in the data set causes many problems for research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measures: the proportion of detected outliers, and the masking and swamping rates.
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
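One of the multivariate concepts named above, multicollinearity, is commonly screened with variance inflation factors; a self-contained sketch (invented data) computes them from first principles:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x p), computed by
    regressing each predictor on the others: VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)
print([f"{v:.1f}" for v in vif(np.column_stack([x1, x2, x3]))])
```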
Suppression Situations in Multiple Linear Regression
ERIC Educational Resources Information Center
Shieh, Gwowen
2006-01-01
This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows better handling of uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.
The Variance Normalization Method of Ridge Regression Analysis.
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
The testing of contemporary sociological theory often calls for the application of structural-equation models to data which are inherently collinear. It is shown that simple ridge regression, which is commonly used for controlling the instability of ordinary least squares regression estimates in ill-conditioned data sets, is not a legitimate…
NASA Astrophysics Data System (ADS)
Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.
2017-01-01
One of the most notable features of sign-based statistical procedures is the opportunity to build an exact test for simple hypothesis testing of parameters in a regression model. In this article, we extend the sign-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypotheses not only about regression parameters, but about noise parameters as well.
Assistive Technologies for Second-Year Statistics Students Who Are Blind
ERIC Educational Resources Information Center
Erhardt, Robert J.; Shuman, Michael P.
2015-01-01
At Wake Forest University, a student who is blind enrolled in a second course in statistics. The course covered simple and multiple regression, model diagnostics, model selection, data visualization, and elementary logistic regression. These topics required that the student both interpret and produce three sets of materials: mathematical writing,…
Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
An empirical model for estimating annual consumption by freshwater fish populations
Liao, H.; Pierce, C.L.; Larscheid, J.G.
2005-01-01
Population consumption is an important process linking predator populations to their prey resources. Simple tools are needed to enable fisheries managers to estimate population consumption. We assembled 74 individual estimates of annual consumption by freshwater fish populations and their mean annual population size, 41 of which also included estimates of mean annual biomass. The data set included 14 freshwater fish species from 10 different bodies of water. From this data set we developed two simple linear regression models predicting annual population consumption. Log-transformed population size explained 94% of the variation in log-transformed annual population consumption. Log-transformed biomass explained 98% of the variation in log-transformed annual population consumption. We quantified the accuracy of our regressions and three alternative consumption models as the mean percent difference from observed (bioenergetics-derived) estimates in a test data set. Predictions from our population-size regression matched observed consumption estimates poorly (mean percent difference = 222%). Predictions from our biomass regression matched observed consumption reasonably well (mean percent difference = 24%). The biomass regression was superior to an alternative model, similar in complexity, and comparable to two alternative models that were more complex and difficult to apply. Our biomass regression model, log10(consumption) = 0.5442 + 0.9962 × log10(biomass), will be a useful tool for fishery managers, enabling them to make reasonably accurate annual population consumption predictions from mean annual biomass estimates. © Copyright by the American Fisheries Society 2005.
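Applying the paper's fitted biomass regression is a one-line computation; the sketch below uses the published coefficients, with units following the original study:

```python
import math

def annual_consumption(biomass):
    """Annual population consumption from mean annual biomass, using the
    paper's fitted regression: log10(C) = 0.5442 + 0.9962 * log10(B).
    Units follow the original study's biomass units."""
    return 10 ** (0.5442 + 0.9962 * math.log10(biomass))

print(f"{annual_consumption(250.0):.0f}")  # consumption for an invented biomass of 250
```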
Modelling of capital asset pricing by considering the lagged effects
NASA Astrophysics Data System (ADS)
Sukono; Hidayat, Y.; Bon, A. Talib bin; Supian, S.
2017-01-01
In this paper the problem of modelling the Capital Asset Pricing Model (CAPM) with lagged effects is discussed. It is assumed that asset returns are influenced by the market return and the return on risk-free assets. The relationship between asset returns, the market return, and the return on risk-free assets is analysed using a CAPM regression equation and a distributed-lag CAPM regression equation. Building on the distributed-lag CAPM regression equation, this paper also develops a Koyck-transformed CAPM regression equation. The development shows that the Koyck-transformed CAPM regression equation has the advantage of simplicity, as it requires only three parameters, compared with the distributed-lag CAPM regression equation.
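A hedged sketch of a Koyck-type estimation (simulated returns; all coefficient values are invented): the transformation replaces an infinite geometric lag with a regression on the current market return and the lagged asset return, so only three parameters are estimated:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200
market = rng.normal(0.005, 0.02, T)  # hypothetical excess market returns
asset = np.zeros(T)
for t in range(1, T):  # simulate a geometric-lag response to the market
    asset[t] = 0.001 + 0.6 * market[t] + 0.4 * asset[t - 1] + rng.normal(0, 0.005)

# Koyck transformation: regress r_t on (1, market_t, r_{t-1}); three
# parameters capture the whole geometric lag structure.
X = np.column_stack([np.ones(T - 1), market[1:], asset[:-1]])
beta, *_ = np.linalg.lstsq(X, asset[1:], rcond=None)
const, short_run_beta, lam = beta
print(f"lambda = {lam:.2f}, long-run beta = {short_run_beta / (1 - lam):.2f}")
```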
Testing the Hypothesis of a Homoscedastic Error Term in Simple, Nonparametric Regression
ERIC Educational Resources Information Center
Wilcox, Rand R.
2006-01-01
Consider the nonparametric regression model Y = m(X) + τ(X)ε, where X and ε are independent random variables, ε has a median of zero and variance σ², τ is some unknown function used to model heteroscedasticity, and m(X) is an unknown function reflecting some conditional measure of location associated…
Techniques for estimating flood-peak discharges of rural, unregulated streams in Ohio
Koltun, G.F.
2003-01-01
Regional equations for estimating 2-, 5-, 10-, 25-, 50-, 100-, and 500-year flood-peak discharges at ungaged sites on rural, unregulated streams in Ohio were developed by means of ordinary and generalized least-squares (GLS) regression techniques. One-variable, simple equations and three-variable, full-model equations were developed on the basis of selected basin characteristics and flood-frequency estimates determined for 305 streamflow-gaging stations in Ohio and adjacent states. The average standard errors of prediction ranged from about 39 to 49 percent for the simple equations, and from about 34 to 41 percent for the full-model equations. Flood-frequency estimates determined by means of log-Pearson Type III analyses are reported along with weighted flood-frequency estimates, computed as a function of the log-Pearson Type III estimates and the regression estimates. Values of explanatory variables used in the regression models were determined from digital spatial data sets by means of a geographic information system (GIS), with the exception of drainage area, which was determined by digitizing the area within basin boundaries manually delineated on topographic maps. Use of GIS-based explanatory variables represents a major departure in methodology from that described in previous reports on estimating flood-frequency characteristics of Ohio streams. Examples are presented illustrating application of the regression equations to ungaged sites on ungaged and gaged streams. A method is provided to adjust regression estimates for ungaged sites by use of weighted and regression estimates for a gaged site on the same stream. A region-of-influence method, which employs a computer program to estimate flood-frequency characteristics for ungaged sites based on data from gaged sites with similar characteristics, was also tested and compared to the GLS full-model equations. For all recurrence intervals, the GLS full-model equations had superior prediction accuracy relative to the simple equations and therefore are recommended for use.
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative structure-activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio)thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R²CV = 0.8160; SPRESS = 0.5680) proved to be very accurate in both the training and prediction stages.
Bayesian isotonic density regression
Wang, Lianming; Dunson, David B.
2011-01-01
Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distributions, and are well supported in many applications. A rich variety of Bayesian methods have been proposed for density regression, but it is not clear whether such priors have full support so that any true data-generating model can be accurately approximated. This article develops a new class of density regression models that incorporate stochastic-ordering constraints which are natural when a response tends to increase or decrease monotonely with a predictor. Theory is developed showing large support. Methods are developed for hypothesis testing, with posterior computation relying on a simple Gibbs sampler. Frequentist properties are illustrated in a simulation study, and an epidemiology application is considered. PMID:22822259
Simple to complex modeling of breathing volume using a motion sensor.
John, Dinesh; Staudenmayer, John; Freedson, Patty
2013-06-01
To compare simple and complex modeling techniques to estimate categories of low, medium, and high ventilation (VE) from ActiGraph™ activity counts. Vertical-axis ActiGraph™ GT1M activity counts, oxygen consumption, and VE were measured during treadmill walking and running, sports, household chores, and labor-intensive employment activities. Categories of low (<19.3 l/min), medium (19.3 to 35.4 l/min), and high (>35.4 l/min) VE were derived from activity intensity classifications (light <2.9 METs, moderate 3.0 to 5.9 METs, and vigorous >6.0 METs). We examined the accuracy of two simple techniques (multiple regression and activity-count cut-point analyses) and one complex technique (random forest) in predicting VE from activity counts. Prediction accuracy of the complex random forest technique was marginally better than the simple multiple regression method. Both techniques accurately predicted VE categories almost 80% of the time. The multiple regression and random forest techniques were more accurate (85 to 88%) in predicting medium VE. Both techniques predicted high VE (70 to 73%) with greater accuracy than low VE (57 to 60%). ActiGraph™ cut-points for low, medium, and high VE were <1381, 1381 to 3660, and >3660 cpm. There were minor differences in prediction accuracy between the multiple regression and the random forest technique. This study provides methods to objectively estimate VE categories using activity monitors that can easily be deployed in the field. Objective estimates of VE should provide a better understanding of the dose-response relationship between internal exposure to pollutants and disease. Copyright © 2013 Elsevier B.V. All rights reserved.
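The reported cut-points translate directly into a classifier; a minimal sketch using the thresholds quoted above:

```python
def ve_category(counts_per_min):
    """Classify ventilation from ActiGraph counts using the cut-points
    reported above (<1381 low, 1381 to 3660 medium, >3660 high cpm)."""
    if counts_per_min < 1381:
        return "low"
    return "medium" if counts_per_min <= 3660 else "high"

print([ve_category(c) for c in (800, 2500, 5000)])  # invented example counts
```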
Hewitt, Angela L; Popa, Laurentiu S; Pasalar, Siavash; Hendrix, Claudia M; Ebner, Timothy J
2011-11-01
Encoding of movement kinematics in Purkinje cell simple spike discharge has important implications for hypotheses of cerebellar cortical function. Several outstanding questions remain regarding representation of these kinematic signals. It is uncertain whether kinematic encoding occurs in unpredictable, feedback-dependent tasks or kinematic signals are conserved across tasks. Additionally, there is a need to understand the signals encoded in the instantaneous discharge of single cells without averaging across trials or time. To address these questions, this study recorded Purkinje cell firing in monkeys trained to perform a manual random tracking task in addition to circular tracking and center-out reach. Random tracking provides for extensive coverage of kinematic workspaces. Direction and speed errors are significantly greater during random than circular tracking. Cross-correlation analyses comparing hand and target velocity profiles show that hand velocity lags target velocity during random tracking. Correlations between simple spike firing from 120 Purkinje cells and hand position, velocity, and speed were evaluated with linear regression models including a time constant, τ, as a measure of the firing lead/lag relative to the kinematic parameters. Across the population, velocity accounts for the majority of simple spike firing variability (63 ± 30% of adjusted R²), followed by position (28 ± 24% of adjusted R²) and speed (11 ± 19% of adjusted R²). Simple spike firing often leads hand kinematics. Comparison of regression models based on averaged vs. nonaveraged firing and kinematics reveals lower adjusted R² values for nonaveraged data; however, regression coefficients and τ values are highly similar. Finally, for most cells, model coefficients generated from random tracking accurately estimate simple spike firing in either circular tracking or center-out reach. These findings imply that the cerebellum controls movement kinematics, consistent with a forward internal model that predicts upcoming limb kinematics.
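A toy sketch of the lead/lag analysis described above (the signal, the lead, and all constants are invented): scan candidate lags and keep the one maximizing the regression R²:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2000
# Smooth "hand velocity" signal (moving average of noise).
velocity = np.convolve(rng.normal(size=T + 200), np.ones(50) / 50, mode="same")[:T]

LEAD = 40  # firing leads kinematics by 40 samples in this invented example
firing = 20 + 8.0 * np.roll(velocity, -LEAD) + rng.normal(0, 1.0, T)

def r2_at_lag(lag):
    """R^2 of firing regressed on velocity shifted by `lag` samples."""
    y, x = firing[:T - lag], velocity[lag:]
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

best = max(range(100), key=r2_at_lag)
print(f"best lead tau ~ {best} samples")
```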
Sun, Jianguo; Feng, Yanqin; Zhao, Hui
2015-01-01
Interval-censored failure time data occur in many fields including epidemiological and medical studies as well as financial and sociological studies, and many authors have investigated their analysis (Sun, The statistical analysis of interval-censored failure time data, 2006; Zhang, Stat Modeling 9:321-343, 2009). In particular, a number of procedures have been developed for regression analysis of interval-censored data arising from the proportional hazards model (Finkelstein, Biometrics 42:845-854, 1986; Huang, Ann Stat 24:540-568, 1996; Pan, Biometrics 56:199-203, 2000). For most of these procedures, however, one drawback is that they involve estimation of both regression parameters and baseline cumulative hazard function. In this paper, we propose two simple estimation approaches that do not need estimation of the baseline cumulative hazard function. The asymptotic properties of the resulting estimates are given, and an extensive simulation study is conducted and indicates that they work well for practical situations.
NASA Astrophysics Data System (ADS)
Yilmaz, Işık
2009-06-01
The purpose of this study is to compare the landslide susceptibility mapping methods of frequency ratio (FR), logistic regression and artificial neural networks (ANN) applied in Kat County (Tokat, Turkey). A digital elevation model (DEM) was first constructed using GIS software. Landslide-related factors such as geology, faults, drainage system, topographical elevation, slope angle, slope aspect, topographic wetness index (TWI) and stream power index (SPI) were used in the landslide susceptibility analyses. Landslide susceptibility maps were produced from the frequency ratio, logistic regression and neural network models, and they were then compared by means of their validations. The accuracies of the susceptibility maps for all three models were assessed by comparing them with the known landslide locations. Respective area under curve (AUC) values of 0.826, 0.842 and 0.852 for frequency ratio, logistic regression and artificial neural networks showed that the map obtained from the ANN model is the most accurate, although the accuracies of all three models can be considered relatively similar. The results obtained in this study also showed that the frequency ratio model can be used as a simple tool in the assessment of landslide susceptibility when sufficient data are available. Input, calculation and output processes are simple and readily understood in the frequency ratio model, whereas logistic regression and neural networks require the conversion of data to ASCII or other formats, and it is also very hard to process large amounts of data in the statistical package.
Forecasting USAF JP-8 Fuel Needs
2009-03-01
…with more simple and easy-to-implement methods, versus complex ones. When we consider long-term 5-year forecasts, our multiple regression model outperforms ANN modeling within the specified…
Rosenblum, Michael; van der Laan, Mark J.
2010-01-01
Models, such as logistic regression and Poisson regression models, are often used to estimate treatment effects in randomized trials. These models leverage information in variables collected before randomization, in order to obtain more precise estimates of treatment effects. However, there is the danger that model misspecification will lead to bias. We show that certain easy to compute, model-based estimators are asymptotically unbiased even when the working model used is arbitrarily misspecified. Furthermore, these estimators are locally efficient. As a special case of our main result, we consider a simple Poisson working model containing only main terms; in this case, we prove the maximum likelihood estimate of the coefficient corresponding to the treatment variable is an asymptotically unbiased estimator of the marginal log rate ratio, even when the working model is arbitrarily misspecified. This is the log-linear analog of ANCOVA for linear models. Our results demonstrate one application of targeted maximum likelihood estimation. PMID:20628636
NASA Astrophysics Data System (ADS)
Aye, S. A.; Heyns, P. S.
2017-02-01
This paper proposes an optimal Gaussian process regression (GPR) for the prediction of remaining useful life (RUL) of slow speed bearings based on a novel degradation assessment index obtained from an acoustic emission signal. The optimal GPR is obtained from an integration or combination of existing simple mean and covariance functions in order to capture the observed trend of the bearing degradation as well as the irregularities in the data. The resulting integrated GPR model provides an excellent fit to the data and improves over the simple GPR models that are based on simple mean and covariance functions. In addition, it achieves a low percentage error in predicting the remaining useful life of slow speed bearings. These findings are robust under varying operating conditions such as loading and speed, and can be applied to nonlinear and nonstationary machine response signals, making them useful for effective preventive machine maintenance.
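The paper builds its own composite mean and covariance functions; as a loose analogue (invented degradation data), scikit-learn's GPR lets simple kernels be summed, with a linear DotProduct term standing in for the trend and an RBF term for the irregularities:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct, WhiteKernel

rng = np.random.default_rng(5)
t = np.linspace(0, 10, 80)[:, None]  # operating time (hypothetical units)
health = 0.5 * t.ravel() + 0.4 * np.sin(3 * t.ravel()) + rng.normal(0, 0.1, 80)

# Combine simple kernels: a linear term for the degradation trend,
# an RBF term for local irregularities, and a white-noise term.
kernel = DotProduct() + RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, health)

mean, std = gpr.predict(np.array([[12.0]]), return_std=True)
print(f"predicted degradation at t=12: {mean[0]:.2f} +/- {2 * std[0]:.2f}")
```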
Regression dilution bias: tools for correction methods and sample size calculation.
Berglund, Lars
2012-08-01
Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
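A minimal sketch of the correction described above (data simulated; the function is an assumption of mine, not the article's software): estimate the reliability ratio from a repeated measurement and divide the observed slope by it:

```python
import numpy as np

def corrected_slope(x_main, y_main, x_rep1, x_rep2):
    """Correct a simple-regression slope for regression dilution.
    x_main, y_main: main-study risk factor and outcome.
    x_rep1, x_rep2: repeated measurements from a reliability study.
    The observed slope is divided by the reliability ratio
    lambda = var_between / (var_between + var_error)."""
    sxx = np.sum((x_main - x_main.mean()) ** 2)
    slope_obs = np.sum((x_main - x_main.mean()) * (y_main - y_main.mean())) / sxx
    var_error = np.var(x_rep1 - x_rep2, ddof=1) / 2.0  # within-person error variance
    var_total = np.var(np.concatenate([x_rep1, x_rep2]), ddof=1)
    lam = (var_total - var_error) / var_total  # reliability ratio
    return slope_obs / lam

rng = np.random.default_rng(6)
truth = rng.normal(5, 1, 300)
x = truth + rng.normal(0, 0.7, 300)  # error-prone measurement of the risk factor
y = 2.0 * truth + rng.normal(0, 1, 300)
r1 = truth[:80] + rng.normal(0, 0.7, 80)  # reliability-study repeats
r2 = truth[:80] + rng.normal(0, 0.7, 80)
print(f"corrected slope ~ {corrected_slope(x, y, r1, r2):.2f}")  # near the true 2.0
```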
RRegrs: an R package for computer-aided model selection with multiple regression models.
Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L
2015-01-01
Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best-model criteria, as they all affect the accuracy and efficiency of the produced predictive models, thereby raising model reproducibility and comparison issues. Cheminformatics and bioinformatics make extensive use of predictive modelling and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tested several simple and complex regression models and validation schemes, produced unified reports, and offered the option to be integrated into more extensive studies. Additionally, such a methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated, fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxides descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance, as well as its adaptability in terms of parameter optimization, could make RRegrs a popular framework to assist the initial exploration of predictive models and, with that, the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; it is a fully validated procedure with application to QSAR modelling.
Howley, Donna; Howley, Peter; Oxenham, Marc F
2018-06-01
Stature and a further 8 anthropometric dimensions were recorded from the arms and hands of a sample of 96 staff and students from the Australian National University and The University of Newcastle, Australia. These dimensions were used to create simple and multiple logistic regression models for sex estimation and simple and multiple linear regression equations for stature estimation of a contemporary Australian population. Overall sex classification accuracies using the models created were comparable to similar studies. The stature estimation models achieved standard errors of estimates (SEE) which were comparable to and in many cases lower than those achieved in similar research. Generic, non-sex-specific models achieved similar SEEs and R² values to the sex-specific models, indicating stature may be accurately estimated when sex is unknown. Copyright © 2018 Elsevier B.V. All rights reserved.
Regression-based model of skin diffuse reflectance for skin color analysis
NASA Astrophysics Data System (ADS)
Tsumura, Norimichi; Kawazoe, Daisuke; Nakaguchi, Toshiya; Ojima, Nobutoshi; Miyake, Yoichi
2008-11-01
A simple regression-based model of skin diffuse reflectance is developed based on reflectance samples calculated by Monte Carlo simulation of light transport in a two-layered skin model. The reflectance model covers the values of spectral reflectance in the visible spectrum for Japanese women. The modified Lambert-Beer law holds in the proposed model with a modified mean free path length in non-linear density space. The average RMS and maximum errors of the proposed model were 1.1 and 3.1%, respectively, over the above range.
Meteorological adjustment of yearly mean values for air pollutant concentration comparison
NASA Technical Reports Server (NTRS)
Sidik, S. M.; Neustadter, H. E.
1976-01-01
Using multiple linear regression analysis, models which estimate mean concentrations of Total Suspended Particulate (TSP), sulfur dioxide, and nitrogen dioxide as a function of several meteorologic variables, two rough economic indicators, and a simple trend in time are studied. Meteorologic data were obtained and do not include inversion heights. The goodness of fit of the estimated models is partially reflected by the squared coefficient of multiple correlation which indicates that, at the various sampling stations, the models accounted for about 23 to 47 percent of the total variance of the observed TSP concentrations. If the resulting model equations are used in place of simple overall means of the observed concentrations, there is about a 20 percent improvement in either: (1) predicting mean concentrations for specified meteorological conditions; or (2) adjusting successive yearly averages to allow for comparisons devoid of meteorological effects. An application to source identification is presented using regression coefficients of wind velocity predictor variables.
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation that down-weights suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term; that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
An overview of longitudinal data analysis methods for neurological research.
Locascio, Joseph J; Atri, Alireza
2011-01-01
The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, aimed to be a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analyzing a single summary subject-level number that indexes changes for each subject and (3) a general linear model approach with a fixed subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-effects regression models, combining random and fixed effects, for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measures analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models.
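A brief sketch of the advocated mixed-effects approach (simulated longitudinal scores; all column names are invented), using statsmodels' MixedLM with a random intercept per subject:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_subj, n_visits = 30, 5
subj = np.repeat(np.arange(n_subj), n_visits)
time = np.tile(np.arange(n_visits), n_subj)
intercepts = rng.normal(50, 5, n_subj)  # subject-specific baselines
score = intercepts[subj] - 1.5 * time + rng.normal(0, 2, subj.size)

df = pd.DataFrame({"score": score, "time": time, "subj": subj})
# Random-intercept mixed model: fixed effect for time, random effect per subject.
fit = sm.MixedLM.from_formula("score ~ time", groups="subj", data=df).fit()
print(fit.summary())
```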
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Troutman, Brent M.
1982-01-01
Errors in runoff prediction caused by input data errors are analyzed by treating precipitation-runoff models as regression (conditional expectation) models. Independent variables of the regression consist of precipitation and other input measurements; the dependent variable is runoff. In models using erroneous input data, prediction errors are inflated and estimates of expected storm runoff for given observed input variables are biased. This bias in expected runoff estimation results in biased parameter estimates if these parameter estimates are obtained by a least squares fit of predicted to observed runoff values. The problems of error inflation and bias are examined in detail for a simple linear regression of runoff on rainfall and for a nonlinear U.S. Geological Survey precipitation-runoff model. Some implications for flood frequency analysis are considered. A case study using a set of data from Turtle Creek near Dallas, Texas illustrates the problems of model input errors.
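The input-error bias described above is the classical attenuation effect, which is easy to demonstrate by simulation. A minimal sketch with all values assumed: regressing runoff on error-contaminated rainfall shrinks the fitted slope by roughly var(x) / (var(x) + var(error)).

```python
# Minimal simulation of attenuation bias: measurement error in the input
# variable biases the fitted slope of a simple linear regression toward zero.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_rain = rng.normal(50, 10, n)            # true precipitation input
runoff = 2.0 * true_rain + rng.normal(0, 5, n)
obs_rain = true_rain + rng.normal(0, 10, n)  # input measured with error

slope_true = np.polyfit(true_rain, runoff, 1)[0]
slope_obs = np.polyfit(obs_rain, runoff, 1)[0]
# Expected attenuation factor: var(x) / (var(x) + var(err)) = 100/200 = 0.5
print(slope_true, slope_obs)   # ~2.0 vs ~1.0
```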
Simplified large African carnivore density estimators from track indices.
Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J
2016-01-01
The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. A conventional linear model with an intercept does not necessarily pass through zero, but it may fit the data better than a linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We fitted simple linear regressions with an intercept and through the origin, and used the confidence interval for β in the linear model y = αx + β, the Standard Error of Estimate, the Mean Square Residual and the Akaike Information Criterion to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant (P > 0.05). The other four models with intercept and the six models through the origin were all significant (P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for β, and the null hypothesis that β = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. The Akaike Information Criterion showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km² or higher. To improve the current models, we need independent data to validate the models, and data to test for a non-linear relationship between track indices and true density at low densities.
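A sketch of this model comparison on synthetic track-index data (the coefficient 3.26 is taken from the abstract; noise levels and sample size are assumptions), fitting both forms with statsmodels and comparing AIC and the intercept's confidence interval:

```python
# Compare y = a*x (through the origin) against y = a*x + b (with intercept)
# on synthetic carnivore track-index data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
density = rng.uniform(0.3, 3.0, 40)                 # carnivores / 100 km^2
tracks = 3.26 * density + rng.normal(0, 0.5, 40)    # observed track density

m_origin = sm.OLS(tracks, density).fit()             # no intercept term
m_inter = sm.OLS(tracks, sm.add_constant(density)).fit()
print(m_origin.aic, m_inter.aic)   # lower AIC favors through-origin here
print(m_inter.conf_int()[0])       # CI for the intercept; contains 0
```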
Casero-Alonso, V; López-Fidalgo, J; Torsney, B
2017-01-01
Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.
Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won
2016-07-01
In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and a new index (lactate concentration/perfusion)). In that study, machine learning methods for multicategory classification were applied to a rat model of acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage. However, multicategory classification is much more difficult and complicated than binary classification. Here we introduce a simple approach for classifying ATLS hypovolemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of the support vector regression and MLR models with relative values in predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy (80.8%) obtained in our previous study by direct multicategory classification using the one-versus-one support vector machine model on the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide the basis for a future clinical decision support system for ATLS classification. The perfusion index and the new index were more appropriate as relative changes than as absolute values.
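The regression-then-classify idea can be sketched as follows, assuming the standard ATLS class boundaries of <15%, 15-30%, 30-40% and >40% estimated blood loss; the data here are synthetic stand-ins, not the rat dataset:

```python
# Predict blood loss in percent with a regression model, then bin the
# prediction into the four ATLS hypovolemic shock classes.
import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))              # stand-in for relative vital signs
blood_loss = 25 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 3, 200)

def atls_class(loss_pct):
    return np.digitize(loss_pct, [15, 30, 40]) + 1   # classes 1..4

svr_pred = SVR().fit(X, blood_loss).predict(X)
mlr_pred = LinearRegression().fit(X, blood_loss).predict(X)
print(np.mean(atls_class(svr_pred) == atls_class(blood_loss)),
      np.mean(atls_class(mlr_pred) == atls_class(blood_loss)))
```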
Fischer, Thomas; Fischer, Susanne; Himmel, Wolfgang; Kochen, Michael M; Hummers-Pradier, Eva
2008-01-01
The influence of patient characteristics on family practitioners' (FPs') diagnostic decision making has mainly been investigated using indirect methods such as vignettes or questionnaires. Direct observation-borrowed from social and cultural anthropology-may be an alternative method for describing FPs' real-life behavior and may help in gaining insight into how FPs diagnose respiratory tract infections, which are frequent in primary care. To clarify FPs' diagnostic processes when treating patients suffering from symptoms of respiratory tract infection. This direct observation study was performed in 30 family practices using a checklist for patient complaints, history taking, physical examination, and diagnoses. The influence of patients' symptoms and complaints on the FPs' physical examination and diagnosis was calculated by logistic regression analyses. Dummy variables based on combinations of symptoms and complaints were constructed and tested against saturated (full) and backward regression models. In total, 273 patients (median age 37 years, 51% women) were included. The median number of symptoms described was 4 per patient, and most information was provided at the patients' own initiative. Multiple logistic regression analysis showed a strong association between patients' complaints and the physical examination. Frequent diagnoses were upper respiratory tract infection (URTI)/common cold (43%), bronchitis (26%), sinusitis (12%), and tonsillitis (11%). There were no significant statistical differences between "simple heuristic" models and saturated regression models in the diagnoses of bronchitis, sinusitis, and tonsillitis, indicating that simple heuristics are probably used by the FPs, whereas "URTI/common cold" was better explained by the full model. FPs tended to make their diagnosis based on a few patient symptoms and a limited physical examination. Simple heuristic models were almost as powerful in explaining most diagnoses as saturated models. Direct observation allowed for the study of decision making under real conditions, yielding both quantitative data and "qualitative" information about the FPs' performance. It is important for investigators to be aware of the specific disadvantages of the method (e.g., a possible observer effect).
A consistent framework for Horton regression statistics that leads to a modified Hack's law
Furey, P.R.; Troutman, B.M.
2008-01-01
A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for the Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.
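For reference, the classical Hack's law that the modified expression generalizes relates mainstream length to drainage area through a power law (the exponent is typically near 0.6); exactly how ω enters the modified form is specified in the paper itself.

```latex
% Classical Hack's law: mainstream length as a power of drainage area.
% The modified form above additionally lets L depend on Strahler order \omega.
L = c\,A^{h}, \qquad h \approx 0.6
```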
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
NASA Astrophysics Data System (ADS)
Shi, Jinfei; Zhu, Songqing; Chen, Ruwen
2017-12-01
An order selection method based on multiple stepwise regressions is proposed for the General Expression of Nonlinear Autoregressive (GNAR) model, which converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. The result is set as the initial model, and the nonlinear terms are then introduced gradually. Statistics are chosen to assess how both the newly introduced and the originally included variables improve the model characteristics, and these statistics determine which model variables to retain or eliminate. The optimal model is then obtained through goodness-of-fit measurement or significance testing. Results from simulations and experiments on classic time-series data show that the proposed method is simple, reliable and applicable to practical engineering.
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material and is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein, the support vector regression gave the best result, with an RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, even before compound synthesis.
Simple and multiple linear regression: sample size considerations.
Hanley, James A
2016-11-01
The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
Regression Models for Identifying Noise Sources in Magnetic Resonance Images
Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.
2009-01-01
Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478
Inverse models: A necessary next step in ground-water modeling
Poeter, E.P.; Hill, M.C.
1997-01-01
Inverse models using, for example, nonlinear least-squares regression, provide capabilities that help modelers take full advantage of the insight available from ground-water models. However, lack of information about the requirements and benefits of inverse models is an obstacle to their widespread use. This paper presents a simple ground-water flow problem to illustrate the requirements and benefits of the nonlinear least-squares regression method of inverse modeling and discusses how these attributes apply to field problems. The benefits of inverse modeling include: (1) expedited determination of best fit parameter values; (2) quantification of the (a) quality of calibration, (b) data shortcomings and needs, and (c) confidence limits on parameter estimates and predictions; and (3) identification of issues that are easily overlooked during nonautomated calibration.
Various approaches and tools exist to estimate local and regional PM2.5 impacts from a single emissions source, ranging from simple screening techniques to Gaussian based dispersion models and complex grid-based Eulerian photochemical transport models. These approache...
Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston
2016-10-28
Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
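The hybrid-Y idea can be sketched with scikit-learn, assuming a design with one continuous factor (dose) and one categorical factor (strain); the coding below, which stacks a continuous column with dummy columns, illustrates the principle rather than the paper's exact matrix:

```python
# PLS with a hybrid target matrix Y: one continuous design column plus
# dummy columns for class membership, reflecting a two-factor experiment.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
n, p = 60, 100
dose = np.tile([0.0, 0.5, 1.0], n // 3)             # continuous factor
strain = np.repeat([0, 1], n // 2)                  # categorical factor
X = rng.normal(size=(n, p))
X[:, 0] += 2 * dose                                 # metabolite tied to dose
X[:, 1] += 2 * strain                               # metabolite tied to strain

Y = np.column_stack([dose, strain == 0, strain == 1]).astype(float)
pls = PLSRegression(n_components=3).fit(X, Y)
print(pls.x_loadings_.shape)   # loadings interpretable per design factor
```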
Computer Simulation of Human Service Program Evaluations.
ERIC Educational Resources Information Center
Trochim, William M. K.; Davis, James E.
1985-01-01
Describes uses of computer simulations for the context of human service program evaluation. Presents simple mathematical models for most commonly used human service outcome evaluation designs (pretest-posttest randomized experiment, pretest-posttest nonequivalent groups design, and regression-discontinuity design). Translates models into single…
Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro
2016-01-01
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on the correct comparison of the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results differ for three of the five state-of-the-art simple datasets, and the selection of the best model according to our proposal is statistically significant and relevant. It is important to use a statistical approach to indicate whether the differences are statistically significant when using this kind of algorithm. Furthermore, our results with three real complex datasets identify different best models than the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as in other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.
Predicting acute pain after cesarean delivery using three simple questions.
Pan, Peter H; Tonidandel, Ashley M; Aschenbrenner, Carol A; Houle, Timothy T; Harris, Lynne C; Eisenach, James C
2013-05-01
Interindividual variability in postoperative pain presents a clinical challenge. Preoperative quantitative sensory testing is useful but time consuming in predicting postoperative pain intensity. The current study was conducted to develop and validate a predictive model of acute postcesarean pain using a simple three-item preoperative questionnaire. A total of 200 women scheduled for elective cesarean delivery under subarachnoid anesthesia were enrolled (192 subjects analyzed). Patients were asked to rate the intensity of loudness of audio tones, their level of anxiety and anticipated pain, and analgesic need from surgery. Postoperatively, patients reported the intensity of evoked pain. Regression analysis was performed to generate a predictive model for pain from these measures. A validation cohort of 151 women was enrolled to test the reliability of the model (131 subjects analyzed). Responses from each of the three preoperative questions correlated moderately with 24-h evoked pain intensity (r = 0.24-0.33, P < 0.001). Audio tone rating added uniquely, but minimally, to the model and was not included in the predictive model. The multiple regression analysis yielded a statistically significant model (R = 0.20, P < 0.001), and the validation cohort reliably showed a very similar regression line (R = 0.18). In predicting the upper 20th percentile of evoked pain scores, the optimal cut point was 46.9 (z = 0.24), such that the sensitivity of 0.68 and specificity of 0.67 were as balanced as possible. This simple three-item questionnaire is useful to help predict postcesarean evoked pain intensity, and could be applied in further research and clinical practice to tailor analgesic therapy to those who need it most.
Analyzing industrial energy use through ordinary least squares regression models
NASA Astrophysics Data System (ADS)
Golden, Allyson Katherine
Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and production behavior, and identify opportunities for energy and cost savings. This thesis study also utilizes change-point and degree-day baseline energy models to disaggregate facility annual energy consumption into separate industrial end-user categories. The baseline energy model provides a suitable and economical alternative to sub-metering individual manufacturing equipment. One case study describes the conjoined use of baseline energy models and facility information gathered during a one-day onsite visit to perform an end-point energy analysis of an injection molding facility conducted by the Alabama Industrial Assessment Center. Applying baseline regression model results to the end-point energy analysis allowed the AIAC to better approximate the annual energy consumption of the facility's HVAC system.
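A minimal sketch of the change-point (inverse) modeling described above, with assumed parameter names and synthetic data: energy use is constant below a balance temperature and rises linearly above it.

```python
# Three-parameter cooling-type change-point model: E = base below the
# balance temperature Tb, rising linearly with temperature above it.
import numpy as np
from scipy.optimize import curve_fit

def change_point(T, base, slope, Tb):
    return base + slope * np.maximum(T - Tb, 0.0)

rng = np.random.default_rng(5)
T = rng.uniform(0, 35, 120)                          # monthly mean temp, C
E = change_point(T, 500, 20, 18) + rng.normal(0, 25, 120)   # energy use

params, _ = curve_fit(change_point, T, E, p0=[400, 10, 15])
print(params)    # ~[500, 20, 18]
```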
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical modeling vs traditional linear regression in investigating the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were first analyzed using simple linear regression and then considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single-vessel involvement, as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Ameye, Lieveke; Fischerova, Daniela; Epstein, Elisabeth; Melis, Gian Benedetto; Guerriero, Stefano; Van Holsbeke, Caroline; Savelli, Luca; Fruscio, Robert; Lissoni, Andrea Alberto; Testa, Antonia Carla; Veldman, Joan; Vergote, Ignace; Van Huffel, Sabine; Bourne, Tom; Valentin, Lil
2010-01-01
Objectives: To prospectively assess the diagnostic performance of simple ultrasound rules to predict benignity/malignancy in an adnexal mass and to test the performance of the risk of malignancy index, two logistic regression models, and subjective assessment of ultrasonic findings by an experienced ultrasound examiner in adnexal masses for which the simple rules yield an inconclusive result. Design: Prospective temporal and external validation of simple ultrasound rules to distinguish benign from malignant adnexal masses. The rules comprised five ultrasonic features (including shape, size, solidity, and results of colour Doppler examination) to predict a malignant tumour (M features) and five to predict a benign tumour (B features). If one or more M features were present in the absence of a B feature, the mass was classified as malignant. If one or more B features were present in the absence of an M feature, it was classified as benign. If both M features and B features were present, or if none of the features was present, the simple rules were inconclusive. Setting: 19 ultrasound centres in eight countries. Participants: 1938 women with an adnexal mass examined with ultrasound by the principal investigator at each centre with a standardised research protocol. Reference standard: Histological classification of the excised adnexal mass as benign or malignant. Main outcome measures: Diagnostic sensitivity and specificity. Results: Of the 1938 patients with an adnexal mass, 1396 (72%) had benign tumours, 373 (19.2%) had primary invasive tumours, 111 (5.7%) had borderline malignant tumours, and 58 (3%) had metastatic tumours in the ovary. The simple rules yielded a conclusive result in 1501 (77%) masses, for which they resulted in a sensitivity of 92% (95% confidence interval 89% to 94%) and a specificity of 96% (94% to 97%). The corresponding sensitivity and specificity of subjective assessment were 91% (88% to 94%) and 96% (94% to 97%). In the 357 masses for which the simple rules yielded an inconclusive result and with available results of CA-125 measurements, the sensitivities were 89% (83% to 93%) for subjective assessment, 50% (42% to 58%) for the risk of malignancy index, 89% (83% to 93%) for logistic regression model 1, and 82% (75% to 87%) for logistic regression model 2; the corresponding specificities were 78% (72% to 83%), 84% (78% to 88%), 44% (38% to 51%), and 48% (42% to 55%). Use of the simple rules as a triage test and subjective assessment for those masses for which the simple rules yielded an inconclusive result gave a sensitivity of 91% (88% to 93%) and a specificity of 93% (91% to 94%), compared with a sensitivity of 90% (88% to 93%) and a specificity of 93% (91% to 94%) when subjective assessment was used in all masses. Conclusions: The use of the simple rules has the potential to improve the management of women with adnexal masses. In adnexal masses for which the rules yielded an inconclusive result, subjective assessment of ultrasonic findings by an experienced ultrasound examiner was the most accurate diagnostic test; the risk of malignancy index and the two regression models were not useful. PMID:21156740
Levine, Matthew E; Albers, David J; Hripcsak, George
2016-01-01
Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.
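A sketch of such a lagged regression, with assumed variable names and a two-lag depth: the design matrix combines an autoregressive term for the lab value with lagged drug-order indicators.

```python
# Lagged regression of a lab value on past drug orders, with an
# autoregressive term for the lab itself (synthetic daily data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 300
drug = rng.binomial(1, 0.2, n).astype(float)         # daily drug orders
lab = np.zeros(n)
for t in range(1, n):                                # lab responds with lag
    lab[t] = 0.6 * lab[t - 1] - 0.8 * drug[t - 1] + rng.normal(0, 0.3)

df = pd.DataFrame({"lab": lab, "drug": drug})
X = pd.concat([df["lab"].shift(1), df["drug"].shift(1), df["drug"].shift(2)],
              axis=1, keys=["lab_l1", "drug_l1", "drug_l2"]).dropna()
y = df["lab"].loc[X.index]
print(sm.OLS(y, sm.add_constant(X)).fit().params)
```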
Batterham, Philip J; Bunce, David; Mackinnon, Andrew J; Christensen, Helen
2014-01-01
Very few studies have examined the association between intra-individual reaction time variability and subsequent mortality. Furthermore, the ability of simple measures of variability to predict mortality has not been compared with more complex measures. In a prospective cohort study, 896 community-based Australian adults aged 70+ were interviewed up to four times from 1990 to 2002, with vital status assessed until June 2007. From this cohort, 770-790 participants were included in Cox proportional hazards regression models of survival. Vital status and time in study were used to conduct survival analyses. The mean reaction time and three measures of intra-individual reaction time variability were calculated separately across 20 trials of simple and choice reaction time tasks. Models were adjusted for a range of demographic, physical health and mental health measures. Greater intra-individual simple reaction time variability, as assessed by the raw standard deviation (raw SD), coefficient of variation (CV) or the intra-individual standard deviation (ISD), was strongly associated with an increased hazard of all-cause mortality in adjusted Cox regression models. The mean reaction time had no significant association with mortality. Intra-individual variability in simple reaction time appears to have a robust association with mortality over 17 years. Health professionals such as neuropsychologists may benefit in their detection of neuropathology by supplementing neuropsychiatric testing with the straightforward process of testing simple reaction time and calculating the raw SD or CV.
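The simple variability measures named above are straightforward to compute; a sketch for one subject's trials follows (synthetic data; the ISD, which purges systematic trends across subjects before taking the standard deviation, is more involved and omitted here):

```python
# Raw SD and coefficient of variation for one subject's reaction times.
import numpy as np

rt = np.random.default_rng(7).normal(350, 40, 20)  # 20 simple-RT trials, ms
raw_sd = rt.std(ddof=1)
cv = raw_sd / rt.mean()          # SD scaled by the subject's mean RT
print(raw_sd, cv)
```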
A Comparison between Multiple Regression Models and CUN-BAE Equation to Predict Body Fat in Adults
Fuster-Parra, Pilar; Bennasar-Veny, Miquel; Tauler, Pedro; Yañez, Aina; López-González, Angel A.; Aguiló, Antoni
2015-01-01
Background: Because the accurate measure of body fat (BF) is difficult, several prediction equations have been proposed. The aim of this study was to compare different multiple regression models to predict BF, including the recently reported CUN-BAE equation. Methods: Multiple regression models using body mass index (BMI) and body adiposity index (BAI) as predictors of BF were compared. These models were also compared with the CUN-BAE equation. For all the analyses, a sample including all the participants and another including only the overweight and obese subjects were considered. The BF reference measure was made using Bioelectrical Impedance Analysis. Results: The simplest models, including only BMI or BAI as independent variables, showed that BAI is a better predictor of BF. However, adding the variable sex to both models made BMI a better predictor than BAI. For both the whole group of participants and the group of overweight and obese participants, using simple models (BMI, age and sex as variables) allowed obtaining similar correlations with BF as when the more complex CUN-BAE was used (ρ = 0.87 vs. ρ = 0.86 for the whole sample and ρ = 0.88 vs. ρ = 0.89 for overweight and obese subjects, the second value in each pair being the one for CUN-BAE). Conclusions: There are models simpler than the CUN-BAE equation that fit BF as well as CUN-BAE does; therefore, it could be considered that CUN-BAE overfits. Using a simple linear regression model, the BAI, as the only variable, predicts BF better than BMI. However, when the sex variable is introduced, BMI becomes the indicator of choice to predict BF. PMID:25821960
Forecasting paratransit services demand : review and recommendations.
DOT National Transportation Integrated Search
2013-06-01
Travel demand forecasting tools for Florida's paratransit services are outdated, utilizing old national trip generation rate generalities and simple linear regression models. In its guidance for the development of mandated Transportation Disadv...
Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard
2016-10-01
In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x_i are equal is strong, and may fail to account for overdispersion, i.e., variability of the rate parameter such that the variance exceeds the mean. Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion, including quasi-likelihood, robust standard error estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of significant inherent overdispersion (p-value < 0.001). However, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2, versus 21.3 for the non-flexible piecewise exponential models). We showed that there were no major differences between methods. However, using flexible piecewise regression modelling, with either quasi-likelihood or robust standard errors, was the best approach, as it deals with both overdispersion due to model misspecification and true (inherent) overdispersion.
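A sketch of an overdispersion check and two of the corrections mentioned above (quasi-likelihood scaling and negative binomial regression), using statsmodels on synthetic counts; this is illustrative, not the authors' relative-survival code:

```python
# Fit a Poisson GLM, estimate dispersion from the Pearson chi-square, then
# correct via quasi-likelihood scaling or a negative binomial family.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
X = sm.add_constant(rng.normal(size=(500, 2)))
mu = np.exp(0.5 + 0.8 * X[:, 1])
y = rng.negative_binomial(n=2, p=2 / (2 + mu))       # overdispersed counts

pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
dispersion = pois.pearson_chi2 / pois.df_resid        # >> 1 here
quasi = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
negbin = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
print(dispersion, quasi.bse, negbin.bse)   # corrected SEs are wider
```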
Penalized regression procedures for variable selection in the potential outcomes framework
Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.
2015-01-01
A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185
1990-09-01
without the help from the DSXR staff. William Lyons, Charles Ramsey, and Martin Meeks went above and beyond to help complete this research. Special...develop a valid forecasting model that is significantly more accurate than the one presently used by DSXR and suggested the development and testing of a...method, Strom tested DSXR's iterative linear regression forecasting technique by examining P1 in the simple regression equation to determine whether
Age estimation standards for a Western Australian population using the coronal pulp cavity index.
Karkhanis, Shalmira; Mack, Peter; Franklin, Daniel
2013-09-10
Age estimation is a vital aspect in creating a biological profile and aids investigators by narrowing down potentially matching identities from the available pool. In addition to routine casework, in the present global political scenario, age estimation in living individuals is required in cases of refugees, asylum seekers and human trafficking, and to ascertain the age of criminal responsibility. Thus, robust methods that are simple, non-invasive and ethically viable are required. The aim of the present study is, therefore, to test the reliability and applicability of the coronal pulp cavity index method for the purpose of developing age estimation standards for an adult Western Australian population. A total of 450 orthopantomograms (220 females and 230 males) of Australian individuals were analyzed. Crown and coronal pulp chamber heights were measured in the mandibular left and right premolars and the first and second molars. These measurements were then used to calculate the tooth coronal index. Data were analyzed using paired-sample t-tests to assess bilateral asymmetry, followed by simple linear and multiple regressions to develop age estimation models. The most accurate age estimation based on a simple linear regression model was with the mandibular right first molar (SEE ±8.271 years). Multiple regression models improved age prediction accuracy considerably, and the most accurate model was with the bilateral first and second molars (SEE ±6.692 years). This study represents the first investigation of this method in a Western Australian population, and our results indicate that the method is suitable for forensic application. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo
2016-08-31
Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.
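A sketch of the tree-versus-logistic comparison on synthetic stand-in data (CHAID, SimpleCart and J48 are not available in scikit-learn, so a CART-style tree serves as a proxy):

```python
# Fit a classification tree and a logistic regression on synthetic data
# standing in for the tube-placement cohort, then compare held-out AUCs.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
X = rng.normal(size=(939, 6))
p = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
y = rng.binomial(1, p)                     # 1 = successful placement

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)
lr = LogisticRegression().fit(X_tr, y_tr)
print(roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1]),
      roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1]))
```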
Miller, Justin B; Axelrod, Bradley N; Schutte, Christian
2012-01-01
The recent release of the Wechsler Memory Scale Fourth Edition contains many improvements from a theoretical and administration perspective, including demographic corrections using the Advanced Clinical Solutions. Although the administration time has been reduced from previous versions, a shortened version may be desirable in certain situations given practical time limitations in clinical practice. The current study evaluated two- and three-subtest estimations of demographically corrected Immediate and Delayed Memory index scores using both simple arithmetic prorating and regression models. All estimated values were significantly associated with observed index scores. Use of Lin's Concordance Correlation Coefficient as a measure of agreement showed a high degree of precision and virtually zero bias in the models, although the regression models showed a stronger association than prorated models. Regression-based models proved to be more accurate than prorated estimates with less dispersion around observed values, particularly when using three subtest regression models. Overall, the present research shows strong support for estimating demographically corrected index scores on the WMS-IV in clinical practice with an adequate performance using arithmetically prorated models and a stronger performance using regression models to predict index scores.
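Lin's concordance correlation coefficient used above can be computed from first principles; a minimal sketch with made-up index scores:

```python
# Lin's CCC: agreement between two measurements, penalizing both poor
# correlation (precision) and shifts in mean or scale (accuracy).
import numpy as np

def lins_ccc(x, y):
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population variances
    sxy = ((x - mx) * (y - my)).mean()        # covariance
    return 2 * sxy / (vx + vy + (mx - my) ** 2)

obs = np.array([95., 102., 88., 110., 99., 105.])   # observed index scores
est = np.array([96., 100., 90., 108., 101., 104.])  # estimated index scores
print(lins_ccc(est, obs))   # close to 1 => precise and unbiased
```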
Detection of epistatic effects with logic regression and a classical linear regression model.
Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata
2014-02-01
To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, methods such as Cockerham's model are usually applied. Within this framework, interactions are understood as the part of the joint effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (such as disease) is caused by Boolean combinations of genotypes of several QTLs, the Cockerham approach is often not capable of identifying them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though a larger number of models has to be considered with the logic regression approach (requiring a more stringent multiple testing correction), the efficient representation of higher-order logic interactions in logic regression models leads to a significant increase in power to detect such interactions compared with Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with a simulation study and real data analysis.
Guenole, Nigel; Brown, Anna
2014-01-01
We report a Monte Carlo study examining the effects of two strategies for handling measurement non-invariance – modeling and ignoring non-invariant items – on structural regression coefficients between latent variables measured with item response theory models for categorical indicators. These strategies were examined across four levels and three types of non-invariance – non-invariant loadings, non-invariant thresholds, and combined non-invariance on loadings and thresholds – in simple, partial, mediated and moderated regression models where the non-invariant latent variable occupied predictor, mediator, and criterion positions in the structural regression models. When non-invariance is ignored in the latent predictor, the focal group regression parameters are biased in the opposite direction to the difference in loadings and thresholds relative to the referent group (i.e., lower loadings and thresholds for the focal group lead to overestimated regression parameters). With criterion non-invariance, the focal group regression parameters are biased in the same direction as the difference in loadings and thresholds relative to the referent group. While unacceptable levels of parameter bias were confined to the focal group, bias occurred at considerably lower levels of ignored non-invariance than was previously recognized in referent and focal groups. PMID:25278911
Morse Code, Scrabble, and the Alphabet
ERIC Educational Resources Information Center
Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss
2004-01-01
In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…
Das, Rudra Narayan; Roy, Kunal; Popelier, Paul L A
2015-11-01
The present study explores the chemical attributes of diverse ionic liquids responsible for their cytotoxicity in a rat leukemia cell line (IPC-81) by developing predictive classification as well as regression-based mathematical models. Simple and interpretable descriptors derived from a two-dimensional representation of the chemical structures along with quantum topological molecular similarity indices have been used for model development, employing unambiguous modeling strategies that strictly obey the guidelines of the Organization for Economic Co-operation and Development (OECD) for quantitative structure-activity relationship (QSAR) analysis. The structure-toxicity relationships that emerged from both classification and regression-based models were in accordance with the findings of some previous studies. The models suggested that the cytotoxicity of ionic liquids is dependent on the cationic surfactant action, long alkyl side chains, cationic lipophilicity as well as aromaticity, the presence of a dialkylamino substituent at the 4-position of the pyridinium nucleus and a bulky anionic moiety. The models have been transparently presented in the form of equations, thus allowing their easy transferability in accordance with the OECD guidelines. The models have also been subjected to rigorous validation tests proving their predictive potential and can hence be used for designing novel and "greener" ionic liquids. The major strength of the present study lies in the use of a diverse and large dataset, use of simple reproducible descriptors and compliance with the OECD norms. Copyright © 2015 Elsevier Ltd. All rights reserved.
Akimoto, Yuki; Yugi, Katsuyuki; Uda, Shinsuke; Kudo, Takamasa; Komori, Yasunori; Kubota, Hiroyuki; Kuroda, Shinya
2013-01-01
Cells use common signaling molecules for the selective control of downstream gene expression and cell-fate decisions. The relationship between signaling molecules and downstream gene expression and cellular phenotypes is a multiple-input and multiple-output (MIMO) system and is difficult to understand due to its complexity. For example, it has been reported that, in PC12 cells, different types of growth factors activate MAP kinases (MAPKs) including ERK, JNK, and p38, and CREB, for selective protein expression of immediate early genes (IEGs) such as c-FOS, c-JUN, EGR1, JUNB, and FOSB, leading to cell differentiation, proliferation and cell death; however, how multiple inputs such as MAPKs and CREB regulate multiple outputs such as expression of the IEGs and cellular phenotypes remains unclear. To address this issue, we employed a statistical method called partial least squares (PLS) regression, which involves a reduction of the dimensionality of the inputs and outputs into latent variables and a linear regression between these latent variables. We measured 1,200 data points for MAPKs and CREB as the inputs and 1,900 data points for IEGs and cellular phenotypes as the outputs, and we constructed the PLS model from these data. The PLS model highlighted the complexity of the MIMO system and growth factor-specific input-output relationships of cell-fate decisions in PC12 cells. Furthermore, to reduce the complexity, we applied a backward elimination method to the PLS regression, in which 60 input variables were reduced to 5 variables, including the phosphorylation of ERK at 10 min, CREB at 5 min and 60 min, AKT at 5 min and JNK at 30 min. The simple PLS model with only 5 input variables demonstrated a predictive ability comparable to that of the full PLS model. The 5 input variables effectively extracted the growth factor-specific simple relationships within the MIMO system in cell-fate decisions in PC12 cells.
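A minimal sketch of the PLS idea described here, using scikit-learn's PLSRegression on synthetic stand-ins for the signaling inputs and gene-expression outputs (the dimensions and data are invented, not the paper's measurements):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 60 input variables (e.g., kinase time courses)
# and 5 outputs (e.g., immediate early gene expression levels)
X = rng.normal(size=(100, 60))
B = rng.normal(size=(60, 5)) * (rng.random((60, 5)) < 0.1)  # sparse true effects
Y = X @ B + rng.normal(scale=0.5, size=(100, 5))

# PLS compresses inputs and outputs into a few latent variables,
# then regresses between the latent variables
pls = PLSRegression(n_components=3)
pls.fit(X, Y)
print("R^2 of the 3-component PLS model:", pls.score(X, Y))
```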
NASA Astrophysics Data System (ADS)
Shao, G.; Gallion, J.; Fei, S.
2016-12-01
Sound forest aboveground biomass estimation is required to monitor diverse forest ecosystems and their impacts on the changing climate. Lidar-based regression models have provided promising biomass estimates in most forest ecosystems. However, considerable uncertainty in biomass estimates has been reported in temperate hardwood and hardwood-dominated mixed forests. Varied site productivity in temperate hardwood forests diversifies height and diameter growth rates, which significantly reduces the correlation between tree height and diameter at breast height (DBH) in mature and complex forests. It is therefore difficult to use height-based lidar metrics to predict DBH-based field-measured biomass through a simple regression model that ignores variation in site productivity. In this study, we established a multi-dimensional nonlinear regression model incorporating lidar metrics and site productivity classes derived from soil features. In the regression model, lidar metrics provided horizontal and vertical structural information, and productivity classes differentiated good and poor forest sites. The selection and combination of lidar metrics are discussed. Multiple regression models were employed and compared, uncertainty analysis was applied to the best-fit model, and the effects of site productivity on the lidar-based biomass model were addressed.
Estimating Infiltration Rates for a Loessal Silt Loam Using Soil Properties
M. Dean Knighton
1978-01-01
Soil properties were related to infiltration rates as measured by single-ring steady-head infiltrometers. The properties showing strong simple correlations were identified. Regression models were developed to estimate infiltration rate from several soil properties. The best model gave fair agreement with measured rates at another location.
NASA Astrophysics Data System (ADS)
Ishizaki, N. N.; Dairaku, K.; Ueno, G.
2016-12-01
We have developed a statistical downscaling method for estimating probabilistic climate projections using multiple CMIP5 general circulation models (GCMs). A regression model was established so that the combination of GCM weights reflects the characteristics of the variation of observations at each grid point. Cross-validations were conducted to select GCMs and to evaluate the regression model while avoiding multicollinearity. Using a spatially high-resolution observation system, we produced statistically downscaled probabilistic climate projections with 20-km horizontal grid spacing. Root mean squared errors for monthly mean surface air temperature and precipitation estimated by the regression method were the smallest compared with the results derived from a simple ensemble mean of GCMs and from a cumulative-distribution-function-based bias correction method. Projected changes in mean temperature and precipitation were basically similar to those of the simple ensemble mean of GCMs. Mean precipitation was generally projected to increase, in association with increased temperature and the consequent increase in atmospheric moisture content. Weakening of the winter monsoon may contribute to decreased precipitation in some areas. A temperature increase in excess of 4 K is expected in most areas of Japan by the end of the 21st century under the RCP8.5 scenario. The estimated probability of monthly precipitation exceeding 300 mm would increase on the Pacific side during the summer and on the Japan Sea side during the winter season. This probabilistic climate projection based on the statistical method can be expected to provide useful information for impact studies and risk assessments.
Adjusted variable plots for Cox's proportional hazards regression model.
Hall, C B; Zeger, S L; Bandeen-Roche, K J
1996-01-01
Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each plot, the regression coefficient and standard error from a Cox proportional hazards regression are obtained by a simple linear regression through the origin fitted to the coordinates of the plotted points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.
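The Cox extension builds on the ordinary linear-regression property that an adjusted (added-) variable plot recovers the multiple-regression coefficient as the slope of a through-the-origin fit. A minimal sketch of that underlying property, on simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])

def ols(A, b):
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Adjusted-variable plot for x2: residualize y and x2 on the other covariates
X_rest = X[:, :2]                       # intercept and x1
ry = y - X_rest @ ols(X_rest, y)        # y adjusted for the other covariates
rx = x2 - X_rest @ ols(X_rest, x2)      # x2 adjusted for the other covariates

# Slope of a through-the-origin regression of ry on rx ...
slope = np.sum(rx * ry) / np.sum(rx * rx)
# ... equals the multiple-regression coefficient of x2
print(slope, ols(X, y)[2])              # the two numbers agree
```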
NASA Astrophysics Data System (ADS)
Soja, G.; Soja, A.-M.
This study tested the usefulness of extremely simple meteorological models for the prediction of ozone indices. The models were developed with the input parameters of daily maximum temperature and sunshine duration and are based on a data collection period of three years. For a rural environment in eastern Austria, the meteorological and ozone data of three summer periods have been used to develop functions to describe three ozone exposure indices (daily maximum, 7 h mean 9.00-16.00 h, accumulated ozone dose AOT40). Data sets for other years or stations not included in the development of the models were used as test data to validate the performance of the models. Generally, optimized regression models performed better than simplest linear models, especially in the case of AOT40. For the description of the summer period from May to September, the mean absolute daily differences between observed and calculated indices were 8±6 ppb for the maximum half hour mean value, 6±5 ppb for the 7 h mean and 41±40 ppb h for the AOT40. When the parameters were further optimized to describe individual months separately, the mean absolute residuals decreased by ⩽10%. Neural network models did not always perform better than the regression models. This is attributed to the low number of inputs in this comparison and to the simple architecture of these models (2-2-1). Further factorial analyses of those days when the residuals were higher than the mean plus one standard deviation should reveal possible reasons why the models did not perform well on certain days. It was observed that overestimations by the models mainly occurred on days with partly overcast, hazy or very windy conditions. Underestimations more frequently occurred on weekdays than on weekends. It is suggested that the application of this kind of meteorological model will be more successful in topographically homogeneous regions and in rural environments with relatively constant rates of emission and long-range transport of ozone precursors. Under conditions too demanding for advanced physico/chemical models, the presented models may offer useful alternatives to derive ecologically relevant ozone indices directly from meteorological parameters.
Analytics For Distracted Driver Behavior Modeling in Dilemma Zone
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jan-Mou; Malikopoulos, Andreas; Thakur, Gautam
2014-01-01
In this paper, we present the results obtained and insights gained through the analysis of TRB contest data. We used exploratory analysis, regression, and clustering models to gain insights into driver behavior in a dilemma zone while driving under distraction. While simple exploratory analysis showed distinguishing driver behavior patterns among different population groups in the dilemma zone, regression analysis showed statistically significant relationships between groups of variables. In addition to analyzing the contest data, we have also looked into the possible impact of distracted driving on fuel economy.
Predicting outcome in severe traumatic brain injury using a simple prognostic model.
Sobuwa, Simpiwe; Hartzenberg, Henry Benjamin; Geduld, Heike; Uys, Corrie
2014-06-17
Several studies have made it possible to predict outcome in severe traumatic brain injury (TBI), making such prediction beneficial as an aid for clinical decision-making in the emergency setting. However, reliable predictive models are lacking for resource-limited prehospital settings such as those in developing countries like South Africa. The aim was to develop a simple predictive model for severe TBI using clinical variables in a South African prehospital setting. All consecutive patients admitted to two level-one centres in Cape Town, South Africa, for severe TBI were included. A binary logistic regression model was used, which included three predictor variables: oxygen saturation (SpO₂), Glasgow Coma Scale (GCS) and pupil reactivity. The Glasgow Outcome Scale was used to assess outcome on hospital discharge. A total of 74.4% of the outcomes were correctly predicted by the logistic regression model. The model identified SpO₂ (p=0.019), GCS (p=0.001) and pupil reactivity (p=0.002) as independently significant predictors of outcome in severe TBI. Odds ratios of a good outcome were 3.148 (SpO₂ ≥ 90%), 5.108 (GCS 6 - 8) and 4.405 (pupils bilaterally reactive). This model is potentially useful for effective prediction of outcome in severe TBI.
Application of Support Vector Machine to Forex Monitoring
NASA Astrophysics Data System (ADS)
Kamruzzaman, Joarder; Sarker, Ruhul A.
Previous studies have demonstrated superior performance of artificial neural network (ANN) based forex forecasting models over traditional regression models. This paper applies support vector machines to build a forecasting model from the historical data using six simple technical indicators and presents a comparison with an ANN based model trained by scaled conjugate gradient (SCG) learning algorithm. The models are evaluated and compared on the basis of five commonly used performance metrics that measure closeness of prediction as well as correctness in directional change. Forecasting results of six different currencies against Australian dollar reveal superior performance of SVM model using simple linear kernel over ANN-SCG model in terms of all the evaluation metrics. The effect of SVM parameter selection on prediction performance is also investigated and analyzed.
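A hedged sketch of the comparison's SVM side, using scikit-learn's SVR with a linear kernel on synthetic stand-ins for the six technical indicators (the indicator construction and data are invented; the paper's actual features follow its own definitions):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)

# Hypothetical stand-ins for six technical indicators computed from past rates
X = rng.normal(size=(300, 6))
w = np.array([0.8, -0.5, 0.3, 0.0, 0.2, -0.1])
y = X @ w + rng.normal(scale=0.1, size=300)     # next-period rate change

train, test = slice(0, 250), slice(250, 300)
model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0, epsilon=0.01))
model.fit(X[train], y[train])

pred = model.predict(X[test])
# Directional accuracy: how often the sign of the change is predicted correctly
print("directional accuracy:", np.mean(np.sign(pred) == np.sign(y[test])))
```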
Metz, Torri D; Stoddard, Gregory J; Henry, Erick; Jackson, Marc; Holmgren, Calla; Esplin, Sean
2013-09-01
To create a simple tool for predicting the likelihood of successful trial of labor after cesarean delivery (TOLAC) during the pregnancy after a primary cesarean delivery, using variables available at the time of admission, data for all deliveries at 14 regional hospitals over an 8-year period were reviewed. Women with one cesarean delivery and one subsequent delivery were included. Variables associated with successful vaginal birth after cesarean (VBAC) were identified using multivariable logistic regression. Points were assigned to these characteristics, with weighting based on the coefficients in the regression model, to calculate an integer VBAC score. The VBAC score was correlated with the TOLAC success rate and was externally validated in an independent cohort using a logistic regression model. A total of 5,445 women met inclusion criteria. Of those women, 1,170 (21.5%) underwent TOLAC. Of the women who underwent trial of labor, 938 (80%) had a successful VBAC. A VBAC score was generated based on the Bishop score (cervical examination) at the time of admission, with points added for history of vaginal birth, age younger than 35 years, absence of a recurrent indication, and body mass index less than 30. Women with a VBAC score less than 10 had a likelihood of TOLAC success less than 50%. Women with a VBAC score more than 16 had a TOLAC success rate more than 85%. The model performed well in an independent cohort, with an area under the curve of 0.80 (95% confidence interval 0.76-0.84). Prediction of TOLAC success at the time of admission is highly dependent on the initial cervical examination. This simple VBAC score can be utilized when counseling women considering TOLAC (level of evidence: II).
Seasonal ENSO forecasting: Where does a simple model stand amongst other operational ENSO models?
NASA Astrophysics Data System (ADS)
Halide, Halmar
2017-01-01
We apply a simple linear multiple regression model called IndOzy for predicting ENSO at up to 7 seasonal lead times. The model uses five predictors, the past seasonal Niño 3.4 ENSO indices derived from chaos theory, and is rolling-validated to give a one-step-ahead forecast. The model skill was evaluated against data from the May-June-July (MJJ) 2003 season to the November-December-January (NDJ) 2015/2016 season. Three skill measures (Pearson correlation, RMSE, and Euclidean distance) were used for forecast verification. The skill of this simple model was then compared with those of combined statistical and dynamical models compiled at the IRI (International Research Institute) website. It was found that the simple model produced useful ENSO predictions only up to 3 seasonal leads, while the IRI statistical and dynamical models remained useful up to 4 and 6 seasonal leads, respectively. Even with its short-range seasonal prediction skill, however, the simple model still has the potential to provide ENSO-derived tailored products such as probabilistic measures of precipitation and air temperature; both meteorological conditions affect the occurrence of wild-land fire hot-spots in Sumatera and Kalimantan. It is suggested that, to improve its long-range skill, the simple IndOzy model should incorporate a nonlinear model such as an artificial neural network technique.
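A minimal sketch of the lagged-regression idea behind such a model, assuming five lagged values of the index as predictors and a rolling one-step-ahead forecast; the series is synthetic and the function name is hypothetical:

```python
import numpy as np

def lagged_forecast(nino34, n_lags=5, lead=1):
    """One-step-ahead forecast from lagged seasonal Nino 3.4 indices.

    A sketch of the multiple-regression idea only; the real IndOzy model's
    predictor construction follows the paper, not this simplification.
    """
    X = np.column_stack([nino34[i:len(nino34) - n_lags - lead + 1 + i]
                         for i in range(n_lags)])
    X = np.column_stack([np.ones(len(X)), X])
    y = nino34[n_lags + lead - 1:]
    coef, *_ = np.linalg.lstsq(X[:-1], y[:-1], rcond=None)  # hold out last point
    return X[-1] @ coef, y[-1]                              # forecast vs. observed

rng = np.random.default_rng(3)
series = np.sin(np.linspace(0, 20, 120)) + 0.2 * rng.normal(size=120)
print(lagged_forecast(series))
```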
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
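A minimal fixed-effects-only sketch of a fixed-knot regression spline via the truncated power basis (the mixed-model machinery of the paper is omitted; the data and knot placement are invented):

```python
import numpy as np

def spline_basis(t, knots, order=2):
    """Truncated power basis for a fixed-knot regression spline.

    Columns: 1, t, ..., t^order, then (t - k)_+^order for each knot,
    which enforces continuity and smoothness up to order-1 at the knots.
    """
    cols = [t ** p for p in range(order + 1)]
    cols += [np.clip(t - k, 0, None) ** order for k in knots]
    return np.column_stack(cols)

# Hypothetical longitudinal-style data: quadratic early decline, then a plateau
rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0, 10, 150))
y = np.where(t < 4, 5 - 0.3 * (t - 4) ** 2, 5) + rng.normal(scale=0.3, size=150)

X = spline_basis(t, knots=[4.0])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted spline coefficients:", np.round(beta, 2))
```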
USDA-ARS?s Scientific Manuscript database
High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect s...
NASA Astrophysics Data System (ADS)
Song, Seok-Jeong; Kim, Tae-Il; Kim, Youngmi; Nam, Hyoungsik
2018-05-01
Recently, a simple, sensitive, and low-cost fluorescent indicator has been proposed to determine water contents in organic solvents, drugs, and foodstuffs. A change in water content leads to a change in the indicator's fluorescence color under ultra-violet (UV) light. Whereas the water content values could be estimated from the spectrum obtained by a bulky and expensive spectrometer in the previous research, this paper demonstrates a simple and low-cost camera-based water content measurement scheme with the same fluorescent water indicator. Water content is calculated over the range of 0-30% by quadratic polynomial regression models with color information extracted from the captured images of samples. In particular, several color spaces such as RGB, xyY, L∗a∗b∗, u‧v‧, HSV, and YCBCR have been investigated to establish the optimal color-information features over both linear and nonlinear RGB data given by a camera before and after gamma correction. In the end, a 2nd-order polynomial regression model with HSV features in the linear domain achieves the minimum mean squared error of 1.06% under a 3-fold cross-validation method. Additionally, the resultant water content estimation model is implemented and evaluated on an off-the-shelf Android-based smartphone.
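A minimal sketch of the calibration step, assuming a single hue feature extracted from the images; the calibration values below are invented, not the paper's data:

```python
import numpy as np

# Hypothetical calibration data: hue extracted from sample images (HSV space)
# against known water contents in the 0-30% range
hue = np.array([0.10, 0.14, 0.19, 0.25, 0.32, 0.38, 0.45])
water = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0])

# 2nd-order polynomial regression, as in the paper's best-performing model
c2, c1, c0 = np.polyfit(hue, water, deg=2)
predict = lambda h: c2 * h ** 2 + c1 * h + c0

print(f"water content at hue 0.28: {predict(0.28):.1f}%")
```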
Mager, P P; Rothe, H
1990-10-01
Multicollinearity of physicochemical descriptors leads to serious consequences in quantitative structure-activity relationship (QSAR) analysis, such as incorrect estimators and test statistics for the regression coefficients of the ordinary least-squares (OLS) model usually applied to QSARs. Besides diagnosing simple collinearity, principal component regression analysis (PCRA) also allows the diagnosis of various types of multicollinearity. Only if the absolute values of the PCRA estimators are order statistics that decrease monotonically can the effects of multicollinearity be circumvented. Otherwise, obscure phenomena may be observed, such as good data recognition but low predictive power of a QSAR model.
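A minimal sketch of principal component regression on deliberately collinear descriptors, illustrating how regressing on the leading component scores sidesteps the collinearity (data and dimensions are invented):

```python
import numpy as np

def pcr(X, y, n_components):
    """Principal component regression: regress y on the leading PC scores.

    With correlated descriptors, dropping small-variance components
    stabilizes the coefficients relative to ordinary least squares.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # projections on leading PCs
    gamma, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    beta = Vt[:n_components].T @ gamma           # back to descriptor space
    return beta

rng = np.random.default_rng(5)
z = rng.normal(size=(80, 2))
# Two nearly identical descriptors plus one independent descriptor
X = np.column_stack([z[:, 0], z[:, 0] + 0.01 * rng.normal(size=80), z[:, 1]])
y = X @ np.array([1.0, 1.0, -0.5]) + rng.normal(scale=0.1, size=80)
print(pcr(X, y, n_components=2))
```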
Indiana chronic disease management program risk stratification analysis.
Li, Jingjin; Holmes, Ann M; Rosenman, Marc B; Katz, Barry P; Downs, Stephen M; Murray, Michael D; Ackermann, Ronald T; Inui, Thomas S
2005-10-01
The objective of this study was to compare the ability of risk stratification models derived from administrative data to classify groups of patients for enrollment in a tailored chronic disease management program. This study included 19,548 Medicaid patients with chronic heart failure or diabetes in the Indiana Medicaid data warehouse during 2001 and 2002. To predict costs (total claims paid) in FY 2002, we considered candidate predictor variables available in FY 2001, including patient characteristics, the number and type of prescription medications, laboratory tests, pharmacy charges, and utilization of primary, specialty, inpatient, emergency department, nursing home, and home health care. We built prospective models to identify patients with different levels of expenditure. Model fit was assessed using R statistics, whereas discrimination was assessed using the weighted kappa statistic, predictive ratios, and the area under the receiver operating characteristic curve. We found that a simple least-squares regression model, in which logged total charges in FY 2002 were regressed on the log of total charges in FY 2001, the number of prescriptions filled in FY 2001, and the FY 2001 eligibility category, performed as well as more complex models. This simple 3-parameter model had an R of 0.30 and, in terms of classification efficiency, had a sensitivity of 0.57, a specificity of 0.90, an area under the receiver operating characteristic curve of 0.80, and a weighted kappa statistic of 0.51. This simple model based on readily available administrative data stratified Medicaid members according to predicted future utilization as well as more complicated models did.
Black, L E; Brion, G M; Freitas, S J
2007-06-01
Predicting the presence of enteric viruses in surface waters is a complex modeling problem. Multiple water quality parameters that indicate the presence of human fecal material, the load of fecal material, and the amount of time fecal material has been in the environment are needed. This paper presents the results of a multiyear study of raw-water quality at the inlet of a potable-water plant that related 17 physical, chemical, and biological indices to the presence of enteric viruses as indicated by cytopathic changes in cell cultures. It was found that several simple, multivariate logistic regression models that could reliably identify observations of the presence or absence of total culturable virus could be fitted. The best models developed combined a fecal age indicator (the atypical coliform [AC]/total coliform [TC] ratio), the detectable presence of a human-associated sterol (epicoprostanol) to indicate the fecal source, and one of several fecal load indicators (the levels of Giardia species cysts, coliform bacteria, and coprostanol). The best fit to the data was found when the AC/TC ratio, the presence of epicoprostanol, and the density of fecal coliform bacteria were input into a simple, multivariate logistic regression equation, resulting in 84.5% and 78.6% accuracies for the identification of the presence and absence of total culturable virus, respectively. The AC/TC ratio was the most influential input variable in all of the models generated, but producing the best prediction required additional input related to the fecal source and the fecal load. The potential for replacing microbial indicators of fecal load with levels of coprostanol was proposed and evaluated by multivariate logistic regression modeling for the presence and absence of virus.
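A hedged sketch of a multivariate logistic regression of virus presence on indicators of fecal age, source, and load, with accuracy reported separately for presence and absence as in the paper; the variable names and data are invented stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 200

# Hypothetical stand-ins for the paper's inputs: AC/TC ratio (fecal age),
# detectable epicoprostanol (fecal source), log fecal coliform density (load)
ac_tc = rng.uniform(0, 1, n)
epicop = rng.integers(0, 2, n)
log_fc = rng.normal(2.0, 0.5, n)
logit = -4 + 3 * ac_tc + 1.5 * epicop + 1.0 * (log_fc - 2)
virus = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([ac_tc, epicop, log_fc])
model = LogisticRegression().fit(X, virus)
pred = model.predict(X)

# Accuracy for presence and absence separately, as reported in the paper
print("presence accuracy:", np.mean(pred[virus] == 1))
print("absence accuracy:", np.mean(pred[~virus] == 0))
```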
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and an increase in their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; then the dependent variables are regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the use of the RPCR and RSIMPLS methods on an econometric data set, comparing the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
Asumadu-Sarkodie, Samuel; Owusu, Phebe Asantewaa
2017-03-01
In this study, the impact of energy, agriculture, macroeconomic and human-induced indicators on environmental pollution from 1971 to 2011 is investigated using the statistically inspired modification of partial least squares (SIMPLS) regression model. There was evidence of a linear relationship between energy, agriculture, macroeconomic and human-induced indicators and carbon dioxide emissions. Evidence from the SIMPLS regression shows that a 1% increase in the crop production index will reduce carbon dioxide emissions by 0.71%. A 1% increase in economic growth will reduce carbon dioxide emissions by 0.46%, which means that an increase in Ghana's economic growth may lead to a reduction in environmental pollution. A 1% increase in electricity production from hydroelectric sources will reduce carbon dioxide emissions by 0.30%; thus, increasing renewable energy sources in Ghana's energy portfolio will help mitigate carbon dioxide emissions. Increasing enteric emissions by 1% will increase carbon dioxide emissions by 4.22%, and a 1% increase in the nitrogen content of manure management will increase carbon dioxide emissions by 6.69%. The SIMPLS regression forecast exhibited a 5% mean absolute percentage error (MAPE) in predicting carbon dioxide emissions.
Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul
2015-11-04
Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
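The paper's models are Bayesian (with code adaptable to WinBUGS provided); as a frequentist sketch of the same adjustment, the likelihood below mixes sensitivity and specificity into the probability of a positive test. All names and data here are invented:

```python
import numpy as np
from scipy.optimize import minimize

def fit_adjusted_logistic(X, y_test, sens, spec):
    """Maximum-likelihood version of the adjustment idea:
    P(test positive) = Se*p + (1 - Sp)*(1 - p),
    where p is the true-infection probability from a logistic model.

    A sketch only; it illustrates how a likelihood can account for
    imperfect tests, not the paper's actual Bayesian models.
    """
    X1 = np.column_stack([np.ones(len(X)), X])

    def negloglik(beta):
        p = 1 / (1 + np.exp(-X1 @ beta))
        q = sens * p + (1 - spec) * (1 - p)      # prob. of a positive test
        q = np.clip(q, 1e-10, 1 - 1e-10)
        return -np.sum(y_test * np.log(q) + (1 - y_test) * np.log(1 - q))

    return minimize(negloglik, np.zeros(X1.shape[1])).x

rng = np.random.default_rng(7)
x = rng.normal(size=500)
true_inf = rng.random(500) < 1 / (1 + np.exp(-(-1 + 1.5 * x)))
# Imperfect test: 80% sensitivity, 95% specificity
y_test = np.where(true_inf, rng.random(500) < 0.8, rng.random(500) < 0.05)

print(fit_adjusted_logistic(x[:, None], y_test.astype(float), 0.8, 0.95))
```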
Kwan, Johnny S H; Kung, Annie W C; Sham, Pak C
2011-09-01
Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate simple linear regression. The first is based on Galton's famous data set on heredity. We use the R command lm and obtain coefficient estimates, the standard error of the error term, R2, and residuals… In the second example, devoted to data on the vapor tension of mercury, we fit a simple linear regression, predict values, and look ahead to multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at ENPC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple Regression
ERIC Educational Resources Information Center
Morse, Brendan J.; Johanson, George A.; Griffeth, Rodger W.
2012-01-01
Recent simulation research has demonstrated that using simple raw scores to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response…
Helping Students Assess the Relative Importance of Different Intermolecular Interactions
ERIC Educational Resources Information Center
Jasien, Paul G.
2008-01-01
A semi-quantitative model has been developed to estimate the relative effects of dispersion, dipole-dipole interactions, and H-bonding on the normal boiling points (Tb) for a subset of simple organic systems. The model is based upon a statistical analysis using multiple linear regression on a series of straight-chain organic…
Sridhar, Upasana Manimegalai; Govindarajan, Anand; Rhinehart, R Russell
2016-01-01
This work reveals the applicability of a relatively new optimization technique, Leapfrogging, to both nonlinear regression modeling and a methodology for nonlinear model-predictive control. Both are relatively simple, yet effective. Application to a nonlinear, pilot-scale, shell-and-tube heat exchanger demonstrates the practicability of the techniques.
NASA Technical Reports Server (NTRS)
Carter, Gregory A.; Spiering, Bruce A.
2000-01-01
The present study utilized regression analysis to identify (1) wavebands and band ratios within the 400-850 nm range that could be used to estimate total chlorophyll concentration with minimal error, and (2) simple regression models that were most effective in estimating chlorophyll concentration. Chlorophyll concentrations were measured for two broadleaved species, a broadleaved vine, a needle-leaved conifer, and a representative of the grass family. Overall, reflectance, transmittance, and absorptance corresponded most precisely with chlorophyll concentration at wavelengths near 700 nm, although regressions were strong in the 550-625 nm range as well.
NASA Astrophysics Data System (ADS)
de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.
2018-04-01
A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for the building of PLS models; for the first time, this strategy was applied to building multivariate regression models. Reference analyses and spectral data were obtained from the diluted extracts. The properties determined were total and monomeric anthocyanins, total polyphenols and antioxidant capacity by the ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and a genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 models were more predictive than PLS-2 regression. PLS-OPS and PLS-GA models presented excellent prediction results, with correlation coefficients higher than 0.98. The best models, however, were obtained using PLS with variable selection by the OPS algorithm, and the models based on NIR spectra were considered the most predictive for all properties. These models thus provide a simple, rapid and accurate method for determining the antioxidant properties of red cabbage extract and its suitability for use in the food industry.
NASA Astrophysics Data System (ADS)
Wübbeler, Gerd; Bodnar, Olha; Elster, Clemens
2018-02-01
Weighted least-squares estimation is commonly applied in metrology to fit models to measurements that are accompanied with quoted uncertainties. The weights are chosen in dependence on the quoted uncertainties. However, when data and model are inconsistent in view of the quoted uncertainties, this procedure does not yield adequate results. When it can be assumed that all uncertainties ought to be rescaled by a common factor, weighted least-squares estimation may still be used, provided that a simple correction of the uncertainty obtained for the estimated model is applied. We show that these uncertainties and credible intervals are robust, as they do not rely on the assumption of a Gaussian distribution of the data. Hence, common software for weighted least-squares estimation may still safely be employed in such a case, followed by a simple modification of the uncertainties obtained by that software. We also provide means of checking the assumptions of such an approach. The Bayesian regression procedure is applied to analyze the CODATA values for the Planck constant published over the past decades in terms of three different models: a constant model, a straight line model and a spline model. Our results indicate that the CODATA values may not have yet stabilized.
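A minimal sketch of the correction described here, assuming a straight-line model and a common rescaling of the quoted uncertainties by the square root of chi-squared per degree of freedom (the Birge ratio); the data are invented:

```python
import numpy as np

def wls_with_rescaling(x, y, u):
    """Weighted least-squares straight-line fit with the simple correction:
    if the data are inconsistent with the quoted uncertainties u, rescale
    the parameter uncertainties by sqrt(chi^2 / dof).
    """
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(1 / u ** 2)
    cov = np.linalg.inv(X.T @ W @ X)
    beta = cov @ X.T @ W @ y

    resid = y - X @ beta
    chi2 = np.sum((resid / u) ** 2)
    dof = len(y) - 2
    scale = max(1.0, chi2 / dof)           # rescale only when inconsistent
    return beta, np.sqrt(np.diag(cov) * scale)

# Hypothetical series of measured values with quoted uncertainties
x = np.arange(6, dtype=float)
y = np.array([10.0, 10.4, 9.5, 10.9, 9.2, 10.8])
u = np.full(6, 0.2)                        # too small for this scatter
print(wls_with_rescaling(x, y, u))
```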
NASA Astrophysics Data System (ADS)
Reis, D. S.; Stedinger, J. R.; Martins, E. S.
2005-10-01
This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.
Simple models for estimating local removals of timber in the northeast
David N. Larsen; David A. Gansner
1975-01-01
Provides a practical method of estimating subregional removals of timber and demonstrates its application to a typical problem. Stepwise multiple regression analysis is used to develop equations for estimating removals of softwood, hardwood, and all timber from selected characteristics of socioeconomic structure.
Teaching the Concept of Breakdown Point in Simple Linear Regression.
ERIC Educational Resources Information Center
Chan, Wai-Sum
2001-01-01
Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…
Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.
Mi, Gu; Di, Yanming; Schafer, Daniel W
2015-01-01
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
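A minimal sketch of fitting an NB regression with a known dispersion parameter, of the kind whose adequacy the proposed tests assess, using statsmodels (the two-group design and data are invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 200

# Hypothetical RNA-Seq-like counts: a single two-group comparison
group = np.repeat([0, 1], n // 2)
mu = np.exp(2.0 + 0.7 * group)               # true mean expression rates
alpha = 0.5                                  # NB dispersion parameter
# numpy's (n, p) parameterization gives mean mu, variance mu*(1 + alpha*mu)
counts = rng.negative_binomial(n=1 / alpha, p=1 / (1 + alpha * mu))

X = sm.add_constant(group.astype(float))
fit = sm.GLM(counts, X, family=sm.families.NegativeBinomial(alpha=alpha)).fit()

print("log fold-change estimate:", fit.params[1])
# Deviance per residual df as a crude goodness-of-fit check
print("deviance / df_resid:", fit.deviance / fit.df_resid)
```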
Application of thermal model for pan evaporation to the hydrology of a defined medium, the sponge
NASA Technical Reports Server (NTRS)
Trenchard, M. H.; Artley, J. A. (Principal Investigator)
1981-01-01
A technique is presented which estimates pan evaporation from the commonly observed values of daily maximum and minimum air temperatures. These two variables are transformed to saturation vapor pressure equivalents which are used in a simple linear regression model. The model provides reasonably accurate estimates of pan evaporation rates over a large geographic area. The derived evaporation algorithm is combined with precipitation to obtain a simple moisture variable. A hypothetical medium with a capacity of 8 inches of water is initialized at 4 inches. The medium behaves like a sponge: it absorbs all incident precipitation, with runoff or drainage occurring only after it is saturated. Water is lost from this simple system through evaporation just as from a Class A pan, but at a rate proportional to its degree of saturation. The content of the sponge is a moisture index calculated from only the maximum and minimum temperatures and precipitation.
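A minimal sketch of the transformation-plus-regression step, assuming a Magnus-type approximation for saturation vapor pressure and using the difference of the SVP equivalents as the single regressor (the paper's exact functional form may differ; the data are invented):

```python
import numpy as np

def sat_vapor_pressure(temp_c):
    """Magnus-type approximation for saturation vapor pressure (hPa)."""
    return 6.112 * np.exp(17.67 * temp_c / (temp_c + 243.5))

# Hypothetical daily max/min temperatures (deg C) and measured pan evaporation
tmax = np.array([28.0, 31.0, 25.0, 33.0, 29.0, 35.0])
tmin = np.array([15.0, 18.0, 12.0, 20.0, 16.0, 22.0])
pan_evap = np.array([5.1, 6.2, 3.9, 7.0, 5.5, 7.9])   # mm/day

# Transform temperatures to SVP equivalents, then fit the simple linear model
svp_diff = sat_vapor_pressure(tmax) - sat_vapor_pressure(tmin)
slope, intercept = np.polyfit(svp_diff, pan_evap, deg=1)
print(f"E_pan ~ {intercept:.2f} + {slope:.3f} * (es(Tmax) - es(Tmin))")
```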
Comparison of methods for the analysis of relatively simple mediation models.
Rijnhart, Judith J M; Twisk, Jos W R; Chinapaw, Mai J M; de Boer, Michiel R; Heymans, Martijn W
2017-09-01
Statistical mediation analysis is an often-used method in trials to unravel the pathways underlying the effect of an intervention on a particular outcome variable. Throughout the years, several methods have been proposed, such as ordinary least squares (OLS) regression, structural equation modeling (SEM), and the potential outcomes framework. Most applied researchers do not know that these methods are mathematically equivalent when applied to mediation models with a continuous mediator and outcome variable. Therefore, the aim of this paper was to demonstrate the similarities between OLS regression, SEM, and the potential outcomes framework in three mediation models: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction. We performed a secondary data analysis of a randomized controlled trial that included 546 schoolchildren. In our data example, the mediator and outcome variable were both continuous. We compared the estimates of the total, direct and indirect effects, the proportion mediated, and 95% confidence intervals (CIs) for the indirect effect across OLS regression, SEM, and the potential outcomes framework. OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates in the crude mediation model, the confounder-adjusted mediation model, and the mediation model with an interaction term for exposure-mediator interaction. Since OLS regression, SEM, and the potential outcomes framework yield the same results in three mediation models with a continuous mediator and outcome variable, researchers can continue using the method that is most convenient to them.
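A minimal sketch of the crude mediation model in its OLS form, estimating the indirect effect as the product of coefficients from two regressions; the data here are simulated, not the trial's:

```python
import numpy as np

def ols_coef(X, y):
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

rng = np.random.default_rng(9)
n = 546                                      # same size as the trial; data simulated
x = rng.integers(0, 2, n).astype(float)      # intervention arm
m = 0.5 * x + rng.normal(size=n)             # continuous mediator
y = 0.3 * x + 0.8 * m + rng.normal(size=n)   # continuous outcome

a = ols_coef(x[:, None], m)[1]               # exposure -> mediator
bc = ols_coef(np.column_stack([x, m]), y)    # exposure + mediator -> outcome
direct, b = bc[1], bc[2]

indirect = a * b                             # product-of-coefficients estimator
total = ols_coef(x[:, None], y)[1]
print(f"total={total:.3f}, direct={direct:.3f}, indirect={indirect:.3f}")
print(f"proportion mediated: {indirect / total:.2f}")
```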
Smooth Scalar-on-Image Regression via Spatial Bayesian Variable Selection
Goldsmith, Jeff; Huang, Lei; Crainiceanu, Ciprian M.
2013-01-01
We develop scalar-on-image regression models when images are registered multidimensional manifolds. We propose a fast and scalable Bayes inferential procedure to estimate the image coefficient. The central idea is the combination of an Ising prior distribution, which controls a latent binary indicator map, and an intrinsic Gaussian Markov random field, which controls the smoothness of the nonzero coefficients. The model is fit using a single-site Gibbs sampler, which allows fitting within minutes for hundreds of subjects with predictor images containing thousands of locations. The code is simple and is provided in less than one page in the Appendix. We apply this method to a neuroimaging study where cognitive outcomes are regressed on measures of white matter microstructure at every voxel of the corpus callosum for hundreds of subjects. PMID:24729670
A quantile regression model for failure-time data with time-dependent covariates
Gorfine, Malka; Goldberg, Yair; Ritov, Ya’acov
2017-01-01
Since survival data occur over time, important covariates that we wish to consider often also change over time. Such covariates are referred to as time-dependent covariates. Quantile regression offers flexible modeling of survival data by allowing the covariates to vary with quantiles. This article provides a novel quantile regression model accommodating time-dependent covariates for analyzing survival data subject to right censoring. Our simple estimation technique assumes the existence of instrumental variables. In addition, we present a doubly-robust estimator in the sense of Robins and Rotnitzky (1992, Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell, N. P., Dietz, K. and Farewell, V. T. (editors), AIDS Epidemiology. Boston: Birkhäuser, pp. 297–331). The asymptotic properties of the estimators are rigorously studied. Finite-sample properties are demonstrated by a simulation study. The utility of the proposed methodology is demonstrated using the Stanford heart transplant dataset. PMID:27485534
Graphical Tools for Linear Structural Equation Modeling
2014-06-01
Kenny and Milan (2011) write, "Identification is perhaps the most difficult concept for SEM researchers to understand," and list drawbacks of relying on typical SEM software to determine model identifiability, including sensitivity to poor starting values. Related identification criteria include the well-known recursive and null rules (Bollen, 1989) and the regression rule (Kenny and Milan, 2011), alongside a simple criterion for identifying individual parameters.
The measurement of linear frequency drift in oscillators
NASA Astrophysics Data System (ADS)
Barnes, J. A.
1985-04-01
A linear drift in frequency is an important element in most stochastic models of oscillator performance. Quartz crystal oscillators often have drifts in excess of a part in ten to the tenth power per day. Even commercial cesium beam devices often show drifts of a few parts in ten to the thirteenth per year. There are many ways to estimate the drift rate from data samples (e.g., regress the phase on a quadratic; regress the frequency on a linear function; compute the simple mean of the first difference of frequency; use Kalman filters with a drift term as one element in the state vector; and others). Although most of these estimators are unbiased, they vary in efficiency (i.e., in confidence intervals). Further, the estimation of confidence intervals using the standard analysis of variance (typically associated with the specific estimating technique) can give amazingly optimistic results. The source of these problems is not an error in, say, the regression techniques, but rather the problems arise from correlations within the residuals. That is, the oscillator model is often not consistent with constraints on the analysis technique or, in other words, some specific analysis techniques are often inappropriate for the task at hand. The appropriateness of a specific analysis technique is critically dependent on the oscillator model and can often be checked with a simple whiteness test on the residuals.
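A minimal sketch of two of the estimators listed here, applied to simulated phase data with white phase noise (under correlated residuals, the abstract's caveat about optimistic confidence intervals applies):

```python
import numpy as np

rng = np.random.default_rng(10)
tau = 1.0                                   # sampling interval, days
t = np.arange(200) * tau

# Hypothetical phase data: drift D appears as the quadratic term D*t^2/2
D_true = 1e-10                              # fractional frequency per day
phase = 0.5 * D_true * t ** 2 + 1e-9 * rng.normal(size=t.size)

# Estimator 1: regress the phase on a quadratic
D_quad = 2 * np.polyfit(t, phase, deg=2)[0]

# Estimator 2: mean of the second difference of phase
# (i.e., the mean of the first difference of frequency)
D_diff = np.mean(np.diff(phase, n=2)) / tau ** 2

print(f"quadratic fit: {D_quad:.2e}, second-difference mean: {D_diff:.2e}")
```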
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
NASA Astrophysics Data System (ADS)
Wiley, E. O.
2010-07-01
The relative motion of visual double stars can be investigated using least-squares regression techniques and readily accessible programs such as Microsoft Excel and a calculator. Under most geometries, optical pairs differ from physical pairs both in their simple scatter plots and in their regression models. A step-by-step protocol for estimating the rectilinear elements of an optical pair is presented, and the characteristics of physical pairs under these techniques are discussed.
ERIC Educational Resources Information Center
Nelson, Dean
2009-01-01
Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…
Daradkeh, T K; Karim, L
1994-01-01
To investigate the predictors of employment status of patients with a DSM-III-R diagnosis, 55 patients were selected by a simple random technique from the main psychiatric clinic in Al Ain, United Arab Emirates. Structured and formal assessments were carried out to extract the potential predictors of the outcome of schizophrenia. A logistic regression model revealed that being married, absence of schizoid personality, being free of or having only minimal symptoms of the illness, later age of onset, and higher educational attainment were the most significant predictors of employment outcome. The implications of the results of this study are discussed in the text.
NASA Astrophysics Data System (ADS)
Dalkilic, Turkan Erbay; Apaydin, Aysen
2009-11-01
In a regression analysis, it is assumed that the observations come from a single class in a data cluster and that the simple functional relationship between the dependent and independent variables can be expressed using the general model Y=f(X)+ε. However, a data cluster may consist of a combination of observations that have different distributions derived from different clusters. When a regression model must be estimated for fuzzy inputs derived from different distributions, the model has been termed the 'switching regression model' and it is expressed with . Here li indicates the class number of each independent variable and p is the number of independent variables [J.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics 23 (3) (1993) 665-685; M. Michel, Fuzzy clustering and switching regression models using ambiguity and distance rejects, Fuzzy Sets and Systems 122 (2001) 363-399; E.Q. Richard, A new approach to estimating switching regressions, Journal of the American Statistical Association 67 (338) (1972) 306-310]. In this study, adaptive networks have been used to construct a model formed by combining the models obtained. There are methods that suggest the class numbers of the independent variables heuristically; alternatively, this study aims to define the optimal class number of the independent variables using a suggested validity criterion for fuzzy clustering. For the case in which the independent variables have an exponential distribution, an algorithm is suggested for defining the unknown parameters of the switching regression model and for obtaining the estimated values after an optimal membership function suitable for the exponential distribution has been obtained.
A surrogate model for thermal characteristics of stratospheric airship
NASA Astrophysics Data System (ADS)
Zhao, Da; Liu, Dongxu; Zhu, Ming
2018-06-01
A simple and accurate surrogate model is greatly needed to reduce the complexity of analyzing the thermal characteristics of a stratospheric airship. In this paper, a surrogate model based on Least Squares Support Vector Regression (LSSVR) is proposed. The Gravitational Search Algorithm (GSA) is used to optimize the hyperparameters. A novel framework consisting of a preprocessing classifier and two regression models is designed to train the surrogate model. Various temperature datasets for the airship envelope and the internal gas are obtained from a three-dimensional transient model of the thermal characteristics. Using these thermal datasets, two-factor and multi-factor surrogate models are trained and several comparison simulations are conducted. Results illustrate that the surrogate models based on LSSVR-GSA have good fitting and generalization abilities. The preprocessing classification strategy proposed in this paper plays a significant role in improving the accuracy of the surrogate model.
NASA Astrophysics Data System (ADS)
Bradshaw, Tyler; Fu, Rau; Bowen, Stephen; Zhu, Jun; Forrest, Lisa; Jeraj, Robert
2015-07-01
Dose painting relies on the ability of functional imaging to identify resistant tumor subvolumes to be targeted for additional boosting. This work assessed the ability of FDG, FLT, and Cu-ATSM PET imaging to predict the locations of residual FDG PET in canine tumors following radiotherapy. Nineteen canines with spontaneous sinonasal tumors underwent PET/CT imaging with radiotracers FDG, FLT, and Cu-ATSM prior to hypofractionated radiotherapy. Therapy consisted of 10 fractions of 4.2 Gy to the sinonasal cavity with or without an integrated boost of 0.8 Gy to the GTV. Patients had an additional FLT PET/CT scan after fraction 2, a Cu-ATSM PET/CT scan after fraction 3, and follow-up FDG PET/CT scans after radiotherapy. Following image registration, simple and multiple linear and logistic voxel regressions were performed to assess how well pre- and mid-treatment PET imaging predicted post-treatment FDG uptake. R2 and pseudo R2 were used to assess the goodness of fits. For simple linear regression models, regression coefficients for all pre- and mid-treatment PET images were significantly positive across the population (P < 0.05). However, there was large variability among patients in goodness of fits: R2 ranged from 0.00 to 0.85, with a median of 0.12. Results for logistic regression models were similar. Multiple linear regression models resulted in better fits (median R2 = 0.31), but there was still large variability between patients in R2. The R2 from regression models for different predictor variables were highly correlated across patients (R ≈ 0.8), indicating tumors that were poorly predicted with one tracer were also poorly predicted by other tracers. In conclusion, the high inter-patient variability in goodness of fits indicates that PET was able to predict locations of residual tumor in some patients, but not others. This suggests not all patients would be good candidates for dose painting based on a single biological target.
Solar energy distribution over Egypt using cloudiness from Meteosat photos
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mosalam Shaltout, M.A.; Hassen, A.H.
1990-01-01
In Egypt, there are 10 ground stations for measuring global solar radiation and five stations for measuring diffuse solar radiation. Every day at noon, the Meteorological Authority in Cairo receives three photographs of cloudiness over Egypt from the Meteosat satellite: one in the visible band and two in the infra-red bands (10.5-12.5 μm and 5.7-7.1 μm). The monthly average cloudiness for 24 sites over Egypt was measured and calculated from Meteosat observations during the period 1985-1986. Correlation analysis between the cloudiness observed by Meteosat and the global solar radiation measured at the ground stations was carried out. It is found that the correlation coefficients are about 0.90 for the simple linear regression and increase for the second- and third-degree regressions. Likewise, the correlation coefficients for cloudiness with diffuse solar radiation are about 0.80 for the simple linear regression and increase for the second- and third-degree regressions. Models and empirical relations for estimating global and diffuse solar radiation from Meteosat cloudiness data over Egypt are deduced and tested, and seasonal maps of global and diffuse radiation over Egypt are produced.
Nonlinear Constitutive Modeling of Piezoelectric Ceramics
NASA Astrophysics Data System (ADS)
Xu, Jia; Li, Chao; Wang, Haibo; Zhu, Zhiwen
2017-12-01
Nonlinear constitutive modeling of piezoelectric ceramics is discussed in this paper. A Van der Pol term is introduced to describe the simple hysteretic curve, and improved nonlinear difference terms are used to capture the hysteresis phenomena of piezoelectric ceramics. The model's fit to experimental data is verified by the partial least-squares regression method. The results show that this method describes the measured curves well and are helpful for the constitutive modeling of piezoelectric ceramics.
Analysis of precision and accuracy in a simple model of machine learning
NASA Astrophysics Data System (ADS)
Lee, Julian
2017-12-01
Machine learning is a procedure in which a model of the world is constructed from a training set of examples. It is important that the model capture the relevant features of the training set and, at the same time, make correct predictions for examples not included in the training set. I consider polynomial regression, the simplest method of learning, and analyze its accuracy and precision at different levels of model complexity.
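The trade-off the abstract describes can be reproduced in a few lines: as the polynomial degree grows, training error shrinks while test error eventually worsens. This is an illustrative sketch on simulated data, not the paper's analysis.

```python
# Bias-variance illustration: train and test MSE of polynomial regression
# at increasing model complexity, on data simulated from a sine curve.
import numpy as np

rng = np.random.default_rng(2)
x_train = np.sort(rng.uniform(-1, 1, 30))
x_test = np.sort(rng.uniform(-1, 1, 30))
f = lambda x: np.sin(3 * x)
y_train = f(x_train) + rng.normal(0, 0.2, 30)
y_test = f(x_test) + rng.normal(0, 0.2, 30)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {mse_train:.3f}, test MSE = {mse_test:.3f}")
```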
Estimating Causal Effects with Ancestral Graph Markov Models
Malinsky, Daniel; Spirtes, Peter
2017-01-01
We present an algorithm for estimating bounds on causal effects from observational data which combines graphical model search with simple linear regression. We assume that the underlying system can be represented by a linear structural equation model with no feedback, and we allow for the possibility of latent variables. Under assumptions standard in the causal search literature, we use conditional independence constraints to search for an equivalence class of ancestral graphs. Then, for each model in the equivalence class, we perform the appropriate regression (using causal structure information to determine which covariates to include in the regression) to estimate a set of possible causal effects. Our approach is based on the “IDA” procedure of Maathuis et al. (2009), which assumes that all relevant variables have been measured (i.e., no unmeasured confounders). We generalize their work by relaxing this assumption, which is often violated in applied contexts. We validate the performance of our algorithm on simulated data and demonstrate improved precision over IDA when latent variables are present. PMID:28217244
NASA Astrophysics Data System (ADS)
Goswami, M.; O'Connor, K. M.; Shamseldin, A. Y.
The "Galway Real-Time River Flow Forecasting System" (GFFS) is a software pack- age developed at the Department of Engineering Hydrology, of the National University of Ireland, Galway, Ireland. It is based on a selection of lumped black-box and con- ceptual rainfall-runoff models, all developed in Galway, consisting primarily of both the non-parametric (NP) and parametric (P) forms of two black-box-type rainfall- runoff models, namely, the Simple Linear Model (SLM-NP and SLM-P) and the seasonally-based Linear Perturbation Model (LPM-NP and LPM-P), together with the non-parametric wetness-index-based Linearly Varying Gain Factor Model (LVGFM), the black-box Artificial Neural Network (ANN) Model, and the conceptual Soil Mois- ture Accounting and Routing (SMAR) Model. Comprised of the above suite of mod- els, the system enables the user to calibrate each model individually, initially without updating, and it is capable also of producing combined (i.e. consensus) forecasts us- ing the Simple Average Method (SAM), the Weighted Average Method (WAM), or the Artificial Neural Network Method (NNM). The updating of each model output is achieved using one of four different techniques, namely, simple Auto-Regressive (AR) updating, Linear Transfer Function (LTF) updating, Artificial Neural Network updating (NNU), and updating by the Non-linear Auto-Regressive Exogenous-input method (NARXM). The models exhibit a considerable range of variation in degree of complexity of structure, with corresponding degrees of complication in objective func- tion evaluation. Operating in continuous river-flow simulation and updating modes, these models and techniques have been applied to two Irish catchments, namely, the Fergus and the Brosna. A number of performance evaluation criteria have been used to comparatively assess the model discharge forecast efficiency.
Marcus V. Warwell; Gerald E. Rehfeldt; Nicholas L. Crookston
2006-01-01
The Random Forests multiple regression tree was used to develop an empirically-based bioclimate model for the distribution of Pinus albicaulis (whitebark pine) in western North America, latitudes 31° to 51° N and longitudes 102° to 125° W. Independent variables included 35 simple expressions of temperature and precipitation and their interactions....
1991-09-01
However, there is no guarantee that this would work; for instance, if the data were generated by an ARCH model (Tong, 1990, pp. 116-117) then a simple... Hill, R., Griffiths, W., Lutkepohl, H., and Lee, T., Introduction to the Theory and Practice of Econometrics, 2nd ed., Wiley, 1985. Kendall, M., Stuart
Krider, Lori A.; Magner, Joseph A.; Perry, Jim; Vondracek, Bruce C.; Ferrington, Leonard C.
2013-01-01
Carbonate-sandstone geology in southeastern Minnesota creates a heterogeneous landscape of springs, seeps, and sinkholes that supply groundwater into streams. Air temperatures are effective predictors of water temperature in surface-water dominated streams. However, no published work investigates the relationship between air and water temperatures in groundwater-fed streams (GWFS) across watersheds. We used simple linear regressions to examine weekly air-water temperature relationships for 40 GWFS in southeastern Minnesota. A 40-stream, composite linear regression model has a slope of 0.38, an intercept of 6.63, and R2 of 0.83. The regression models for GWFS have lower slopes and higher intercepts in comparison to surface-water dominated streams. Regression models for streams with high R2 values offer promise for use as predictive tools for future climate conditions. Climate change is expected to alter the thermal regime of groundwater-fed systems, but will do so at a slower rate than surface-water dominated systems. A regression model of intercept vs. slope can be used to identify streams for which water temperatures are more meteorologically than groundwater controlled, and thus more vulnerable to climate change. Such relationships can be used to guide restoration vs. management strategies to protect trout streams.
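As a hedged illustration of the weekly air-water regression described above, the sketch below fits a simple linear regression to simulated weekly temperatures; the slope and intercept used to generate the data are seeded with the composite values reported (0.38 and 6.63), so the fit should recover them approximately.

```python
# Sketch: weekly water temperature regressed on weekly air temperature
# for one simulated groundwater-fed stream.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
air_weekly = rng.uniform(-5, 30, 52)                             # deg C, weekly means
water_weekly = 6.63 + 0.38 * air_weekly + rng.normal(0, 1.0, 52)

fit = stats.linregress(air_weekly, water_weekly)
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}, R^2 = {fit.rvalue**2:.2f}")
```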
Modeling thermal sensation in a Mediterranean climate—a comparison of linear and ordinal models
NASA Astrophysics Data System (ADS)
Pantavou, Katerina; Lykoudis, Spyridon
2014-08-01
A simple thermo-physiological model of outdoor thermal sensation, adjusted with psychological factors, is developed with the aim of predicting thermal sensation in Mediterranean climates. Microclimatic measurements, together with interviews on personal and psychological conditions, were carried out in a square, a street canyon, and a coastal location of the greater urban area of Athens, Greece. Multiple linear and ordinal regression were applied to estimate thermal sensation, making allowance for either all the recorded parameters or specific, empirically selected subsets, producing so-called extensive and empirical models, respectively. Meteorological, thermo-physiological, and overall models - the last considering psychological factors as well - were developed. Predictions were improved when personal and psychological factors were taken into account, as compared to meteorological models. The model based on ordinal regression reproduced extreme values of the thermal sensation vote more adequately than the linear regression one, while the empirical model produced satisfactory results in relation to the extensive model. The effects of adaptation and expectation on the thermal sensation vote were introduced in the models by means of exposure time, season, and preference related to air temperature and irradiation. The assessment of thermal sensation could be a useful criterion in decision making regarding public health, outdoor space planning, and tourism.
Testing a single regression coefficient in high dimensional linear models
Lan, Wei; Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling
2017-01-01
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively. PMID:28663668
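A rough sketch of the CPS idea as described in the abstract, on simulated high-dimensional data: retain the predictors most correlated with the target covariate, fit ordinary least squares, and read off the test for the target coefficient. The screening size n_keep is an assumed tuning choice, not a value from the paper.

```python
# Correlated Predictors Screening (CPS) sketch: control for the predictors
# most correlated with the target covariate, then test its OLS coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, p, n_keep = 100, 500, 10               # n << p; n_keep is an assumed choice
X = rng.normal(size=(n, p))
y = 0.8 * X[:, 0] + rng.normal(size=n)    # column 0 is the target covariate

target = X[:, 0]
others = np.delete(np.arange(p), 0)
corr = np.abs([np.corrcoef(target, X[:, j])[0, 1] for j in others])
controls = others[np.argsort(corr)[-n_keep:]]          # most correlated predictors

design = sm.add_constant(X[:, np.r_[0, controls]])
fit = sm.OLS(y, design).fit()
# fit.params[1] is the target coefficient; its test approximates the z-test
print(f"target coefficient = {fit.params[1]:.3f}, p-value = {fit.pvalues[1]:.3g}")
```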
A Semiparametric Change-Point Regression Model for Longitudinal Observations.
Xing, Haipeng; Ying, Zhiliang
2012-12-01
Many longitudinal studies involve relating an outcome process to a set of possibly time-varying covariates, giving rise to the usual regression models for longitudinal data. When the purpose of the study is to investigate the covariate effects when the experimental environment undergoes abrupt changes, or to locate the periods with different levels of covariate effects, a simple and easy-to-interpret approach is to introduce change-points in the regression coefficients. In this connection, we propose a semiparametric change-point regression model in which the error process (stochastic component) is nonparametric, the baseline mean function (functional part) is completely unspecified, the observation times are allowed to be subject-specific, and the number, locations, and magnitudes of the change-points are unknown and need to be estimated. We further develop an estimation procedure that combines recent advances in semiparametric analysis based on counting process arguments with multiple change-point inference, and discuss its large sample properties, including consistency and asymptotic normality, under suitable regularity conditions. Simulation results show that the proposed methods work well under a variety of scenarios. An application to a real data set is also given.
Evaluation of the CEAS model for barley yields in North Dakota and Minnesota
NASA Technical Reports Server (NTRS)
Barnett, T. L. (Principal Investigator)
1981-01-01
The CEAS yield model is based upon multiple regression analysis at the CRD and state levels. For the historical time series, yield is regressed on a set of variables derived from monthly mean temperature and monthly precipitation. Technological trend is represented by piecewise linear and/or quadratic functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test (1970-79) demonstrated that biases are small and that performance, as indicated by the root mean square errors, is acceptable for the intended application; however, model response for individual years, particularly unusual years, is not very reliable and shows some large errors. The model is objective, adequate, timely, simple, and not costly. It considers scientific knowledge on a broad scale but not in detail, and does not provide a good current measure of modeled yield reliability.
An evaluation of bias in propensity score-adjusted non-linear regression models.
Wan, Fei; Mitra, Nandita
2018-03-01
Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
Di Donato, Violante; Kontopantelis, Evangelos; Aletti, Giovanni; Casorelli, Assunta; Piacenti, Ilaria; Bogani, Giorgio; Lecce, Francesca; Benedetti Panici, Pierluigi
2017-06-01
Primary cytoreductive surgery (PDS) followed by platinum-based chemotherapy is the cornerstone of treatment and the absence of residual tumor after PDS is universally considered the most important prognostic factor. The aim of the present analysis was to evaluate trend and predictors of 30-day mortality in patients undergoing primary cytoreduction for ovarian cancer. Literature was searched for records reporting 30-day mortality after PDS. All cohorts were rated for quality. Simple and multiple Poisson regression models were used to quantify the association between 30-day mortality and the following: overall or severe complications, proportion of patients with stage IV disease, median age, year of publication, and weighted surgical complexity index. Using the multiple regression model, we calculated the risk of perioperative mortality at different levels for statistically significant covariates of interest. Simple regression identified median age and proportion of patients with stage IV disease as statistically significant predictors of 30-day mortality. When included in the multiple Poisson regression model, both remained statistically significant, with an incidence rate ratio of 1.087 for median age and 1.017 for stage IV disease. Disease stage was a strong predictor, with the risk estimated to increase from 2.8% (95% confidence interval 2.02-3.66) for stage III to 16.1% (95% confidence interval 6.18-25.93) for stage IV, for a cohort with a median age of 65 years. Metaregression demonstrated that increased age and advanced clinical stage were independently associated with an increased risk of mortality, and the combined effects of both factors greatly increased the risk.
Saunders, Christina T; Blume, Jeffrey D
2017-10-26
Mediation analysis explores the degree to which an exposure's effect on an outcome is diverted through a mediating variable. We describe a classical regression framework for conducting mediation analyses in which estimates of causal mediation effects and their variance are obtained from the fit of a single regression model. The vector of changes in exposure pathway coefficients, which we named the essential mediation components (EMCs), is used to estimate standard causal mediation effects. Because these effects are often simple functions of the EMCs, an analytical expression for their model-based variance follows directly. Given this formula, it is instructive to revisit the performance of routinely used variance approximations (e.g., delta method and resampling methods). Requiring the fit of only one model reduces the computation time required for complex mediation analyses and permits the use of a rich suite of regression tools that are not easily implemented on a system of three equations, as would be required in the Baron-Kenny framework. Using data from the BRAIN-ICU study, we provide examples to illustrate the advantages of this framework and compare it with the existing approaches. © The Author 2017. Published by Oxford University Press.
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
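The sparse model described above can be approximated with an off-the-shelf L1-penalized logistic regression; this sketch uses simulated features rather than the actual position-specific nucleotide compositions, and the penalty strength C is an assumed value.

```python
# Sparse (L1-penalized) logistic regression sketch for feature selection,
# in the spirit of the siRNA potency model: only a few weights stay nonzero.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, p = 200, 84                            # 84 composition features, as in the abstract
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:5] = 1.5                          # only a few influential positions (simulated)
y = (X @ true_w + rng.normal(size=n) > 0).astype(int)

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X, y)
print("nonzero weights:", int(np.sum(model.coef_ != 0)), "of", p)
```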
Shillcutt, Samuel D; LeFevre, Amnesty E; Fischer-Walker, Christa L; Taneja, Sunita; Black, Robert E; Mazumder, Sarmila
2017-01-01
This study evaluates the cost-effectiveness of the DAZT program for scaling up treatment of acute child diarrhea in Gujarat India using a net-benefit regression framework. Costs were calculated from societal and caregivers' perspectives and effectiveness was assessed in terms of coverage of zinc and both zinc and Oral Rehydration Salt. Regression models were tested in simple linear regression, with a specified set of covariates, and with a specified set of covariates and interaction terms using linear regression with endogenous treatment effects was used as the reference case. The DAZT program was cost-effective with over 95% certainty above $5.50 and $7.50 per appropriately treated child in the unadjusted and adjusted models respectively, with specifications including interaction terms being cost-effective with 85-97% certainty. Findings from this study should be combined with other evidence when considering decisions to scale up programs such as the DAZT program to promote the use of ORS and zinc to treat child diarrhea.
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.
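A minimal sketch of this kind of comparison (logistic regression versus a simple Gaussian naive Bayes classifier, scored by the area under the ROC curve) on simulated questionnaire-like data; it mirrors the study design above only loosely and omits the missing-data and categorization manipulations.

```python
# Compare logistic regression and a simple Bayes classifier by ROC AUC
# on simulated five-scale predictor data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
X = rng.normal(size=(1230, 5))            # five scales, sample size as in the abstract
y = (X.sum(axis=1) + rng.normal(size=1230) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for name, clf in [("logistic", LogisticRegression()), ("naive Bayes", GaussianNB())]:
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```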
NASA Astrophysics Data System (ADS)
Li, Wang; Niu, Zheng; Gao, Shuai; Wang, Cheng
2014-11-01
Light Detection and Ranging (LiDAR) and Synthetic Aperture Radar (SAR) are two competitive active remote sensing techniques for forest above ground biomass estimation, which is important for forest management and global climate change studies. This study aims to further explore their capabilities in temperate forest above ground biomass (AGB) estimation by emphasizing the spatial auto-correlation of variables obtained from these two remote sensing tools, an aspect usually overlooked in remote sensing applications to vegetation studies. Remote sensing variables, including airborne LiDAR metrics, backscattering coefficients for different SAR polarizations, and their ratio variables for Radarsat-2 imagery, were calculated. First, simple linear regression (SLR) models were established between the field-estimated above ground biomass and the remote sensing variables. Pearson's correlation coefficient (R2) was used to find which LiDAR metric showed the most significant correlation with the regression residuals and could be selected as a co-variable in regression co-kriging (RCoKrig). Second, regression co-kriging was conducted by choosing the regression residuals as the dependent variable and the LiDAR metric (Hmean) with the highest R2 as the co-variable. Third, above ground biomass over the study area was estimated using the SLR model and the RCoKrig model, respectively. The results for these two models were validated using the same ground points. Results showed that both methods achieved satisfactory prediction accuracy, while regression co-kriging showed the lower estimation error. This demonstrates that the regression co-kriging model is feasible and effective in mapping the spatial pattern of AGB in temperate forest using Radarsat-2 data calibrated by airborne LiDAR metrics.
On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction.
Crop, F; Van Rompaye, B; Paelinck, L; Vakaet, L; Thierens, H; De Wagter, C
2008-07-21
The purpose of this study was both to put forward a statistically correct model for film calibration and to optimize this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve, with dose as a function of OD (inverse regression), or sometimes OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because the heteroscedasticity of the data is not taken into account. This could lead to erroneous results originating from the calibration process itself and thus to a lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method to create calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to a higher accuracy for film dosimetry.
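A hedged sketch of the two calibration strategies on simulated heteroscedastic film data; the dose-OD relation, variance model, and coefficients are assumptions for illustration, not the paper's values.

```python
# (a) OLS "inverse regression": regress dose directly on OD.
# (b) WLS "inverse prediction": regress OD on dose with variance weights,
#     then invert the fitted line to predict dose from a new OD reading.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
dose = np.repeat(np.arange(0, 401, 50), 5).astype(float)        # cGy, replicated levels
sigma = 0.002 + 1e-5 * dose                                     # assumed noise model
od = 0.002 * dose + 0.1 + rng.normal(0.0, sigma)                # heteroscedastic OD

ols = sm.OLS(dose, sm.add_constant(od)).fit()                   # (a) dose ~ OD
wls = sm.WLS(od, sm.add_constant(dose), weights=1.0 / sigma**2).fit()  # (b) OD ~ dose
b0, b1 = wls.params

od_new = 0.7
a0, a1 = ols.params
print("OLS inverse regression dose:", a0 + a1 * od_new)
print("WLS inverse prediction dose:", (od_new - b0) / b1)
```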
Using the PLUM procedure of SPSS to fit unequal variance and generalized signal detection models.
DeCarlo, Lawrence T
2003-02-01
The recent addition of a procedure in SPSS for the analysis of ordinal regression models offers a simple means for researchers to fit the unequal variance normal signal detection model and other extended signal detection models. The present article shows how to implement the analysis and how to interpret the SPSS output. Examples of fitting the unequal variance normal model and other generalized signal detection models are given. The approach offers a convenient means for applying signal detection theory to a variety of research.
NASA Technical Reports Server (NTRS)
Parker, Peter A.; Geoffrey, Vining G.; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.
2010-01-01
The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
Lawrence R. Gering; Dennis M. May
1995-01-01
A set of simple linear regression models for predicting diameter at breast height (dbh) from crown diameter, and a set of similar models for predicting crown diameter from dbh, were developed for four species groups in Harding County, TN. Data were obtained from 557 trees measured during the 1989 USDA Southern Forest Experiment Station survey of the forests of Tennessee,...
Popa, Laurentiu S.; Hewitt, Angela L.; Ebner, Timothy J.
2012-01-01
The cerebellum has been implicated in processing motor errors required for online control of movement and motor learning. The dominant view is that Purkinje cell complex spike discharge signals motor errors. This study investigated whether errors are encoded in the simple spike discharge of Purkinje cells in monkeys trained to manually track a pseudo-randomly moving target. Four task error signals were evaluated based on cursor movement relative to target movement. Linear regression analyses based on firing residuals ensured that the modulation with a specific error parameter was independent of the other error parameters and kinematics. The results demonstrate that simple spike firing in lobules IV–VI is significantly correlated with position, distance and directional errors. Independent of the error signals, the same Purkinje cells encode kinematics. The strongest error modulation occurs at feedback timing. However, in 72% of cells at least one of the R2 temporal profiles resulting from regressing firing with individual errors exhibit two peak R2 values. For these bimodal profiles, the first peak is at a negative τ (lead) and a second peak at a positive τ (lag), implying that Purkinje cells encode both prediction and feedback about an error. For the majority of the bimodal profiles, the signs of the regression coefficients or preferred directions reverse at the times of the peaks. The sign reversal results in opposing simple spike modulation for the predictive and feedback components. Dual error representations may provide the signals needed to generate sensory prediction errors used to update a forward internal model. PMID:23115173
Tani, Yuji; Ogasawara, Katsuhiko
2012-01-01
This study aimed to contribute to the management of a healthcare organization by providing management information through time-series analysis of business data accumulated in the hospital information system, which have not been utilized thus far. We examined the performance of a prediction method based on the auto-regressive integrated moving-average (ARIMA) model, using business data obtained from the Radiology Department. We built the model from the number of radiological examinations over the past 9 years, predicted the number of examinations for the most recent year, and then compared the forecast values with the actual values. We established that the prediction method was simple and cost-effective to implement using free software, and that a parsimonious model could be built after pre-processing the data to remove trend components. The difference between predicted and actual values was about 10%; however, understanding the chronological change was more important than the individual time-series values. Furthermore, our method is highly versatile and adaptable to general time-series data, so different healthcare organizations can use it for the analysis and forecasting of their own business data.
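A small sketch of this workflow using the statsmodels ARIMA implementation (the authors used other free software; the order parameters and the simulated monthly series are assumptions): fit on the first 9 years, forecast the final year, and compare.

```python
# ARIMA sketch: fit monthly counts for 9 years, forecast the 10th,
# and report the mean absolute percentage error of the forecast.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
months = pd.date_range("2001-01", periods=120, freq="MS")
trend = np.linspace(800, 1000, 120)
season = 50 * np.sin(2 * np.pi * np.arange(120) / 12)
counts = pd.Series(trend + season + rng.normal(0, 20, 120), index=months)

train, test = counts[:108], counts[108:]           # first 9 years vs last year
fit = ARIMA(train, order=(1, 1, 1), seasonal_order=(1, 0, 0, 12)).fit()
forecast = fit.forecast(steps=12)
mape = np.mean(np.abs((test - forecast) / test)) * 100
print(f"mean absolute percentage error: {mape:.1f}%")
```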
Wolters, Mark A; Dean, C B
2017-01-01
Remote sensing images from Earth-orbiting satellites are a potentially rich data source for monitoring and cataloguing atmospheric health hazards that cover large geographic regions. A method is proposed for classifying such images into hazard and nonhazard regions using the autologistic regression model, which may be viewed as a spatial extension of logistic regression. The method includes a novel and simple approach to parameter estimation that makes it well suited to handling the large and high-dimensional datasets arising from satellite-borne instruments. The methodology is demonstrated on both simulated images and a real application to the identification of forest fire smoke.
USDA-ARS?s Scientific Manuscript database
Net Primary Production (NPP), the difference between CO2 fixed by photosynthesis and CO2 lost to autotrophic respiration, is one of the most important components of the carbon cycle. Our goal was to develop a simple regression model to estimate global NPP using climate and land cover data. Approxima...
ERIC Educational Resources Information Center
Shieh, Gwowen
2006-01-01
This paper considers the problem of analysis of correlation coefficients from a multivariate normal population. A unified theorem is derived for the regression model with normally distributed explanatory variables and the general results are employed to provide useful expressions for the distributions of simple, multiple, and partial-multiple…
M.A. Mohamed; H.C. Coppel; J.D. Podgwaite; W.D. Rollinson
1983-01-01
Disease-free larvae of Neodiprion sertifer (Geoffroy) treated with its nucleopolyhedrosis virus in the field and under laboratory conditions showed a high correlation between virus accumulation and body weight. Simple linear regression models were found to fit viral accumulation versus body weight under either circumstance.
Models for forecasting hospital bed requirements in the acute sector.
Farmer, R D; Emami, J
1990-01-01
STUDY OBJECTIVE--The aim was to evaluate the current approach to forecasting hospital bed requirements. DESIGN--The study was a time series and regression analysis. The time series for mean duration of stay for general surgery in the age group 15-44 years (1969-1982) was used in the evaluation of different methods of forecasting future values of mean duration of stay and its subsequent use in the formation of hospital bed requirements. RESULTS--It has been suggested that the simple trend fitting approach suffers from model specification error and imposes unjustified restrictions on the data. Time series approach (Box-Jenkins method) was shown to be a more appropriate way of modelling the data. CONCLUSION--The simple trend fitting approach is inferior to the time series approach in modelling hospital bed requirements. PMID:2277253
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p, small n' setting is over-fitting: just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive, and have proved useful in empirical studies. Recently, Wu proposed penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the L1-penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
A Powerful Test for Comparing Multiple Regression Functions.
Maity, Arnab
2012-09-01
In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models Y_ij = θ_j(Z_ij) + σ_j(Z_ij)ε_ij, based on empirical distributions of the errors in each population j = 1, ..., J. In this paper, we propose a test for equality of the θ_j(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test to other nonparametric regression setups, e.g., nonparametric logistic regression, where the log-likelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of acceptable, statistically significant, robust, and predictive models for the classification of chemicals according to their aquatic toxic modes of action. The essentials of a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e., compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were examined through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of the compounds are also discussed.
Optimizing separate phase light hydrocarbon recovery from contaminated unconfined aquifers
NASA Astrophysics Data System (ADS)
Cooper, Grant S.; Peralta, Richard C.; Kaluarachchi, Jagath J.
A modeling approach is presented that optimizes separate phase recovery of light non-aqueous phase liquids (LNAPL) for a single dual-extraction well in a homogeneous, isotropic unconfined aquifer. A simulation/regression/optimization (S/R/O) model is developed to predict, analyze, and optimize the oil recovery process. The approach combines detailed simulation, nonlinear regression, and optimization. The S/R/O model utilizes nonlinear regression equations describing system response to time-varying water pumping and oil skimming. Regression equations are developed for residual oil volume and free oil volume. The S/R/O model determines optimized time-varying (stepwise) pumping rates which minimize residual oil volume and maximize free oil recovery while causing free oil volume to decrease a specified amount. This S/R/O modeling approach implicitly immobilizes the free product plume by reversing the water table gradient while achieving containment. Application to a simple representative problem illustrates the S/R/O model utility for problem analysis and remediation design. When compared with the best steady pumping strategies, the optimal stepwise pumping strategy improves free oil recovery by 11.5% and reduces the amount of residual oil left in the system due to pumping by 15%. The S/R/O model approach offers promise for enhancing the design of free phase LNAPL recovery systems and to help in making cost-effective operation and management decisions for hydrogeologists, engineers, and regulators.
THE DISTRIBUTION OF COOK’S D STATISTIC
Muller, Keith E.; Mok, Mario Chen
2013-01-01
Cook (1977) proposed a diagnostic to quantify the impact of deleting an observation on the estimated regression coefficients of a General Linear Univariate Model (GLUM). Simulations of models with Gaussian response and predictors demonstrate that his suggestion of comparing the diagnostic to the median of the F for overall regression captures an erratically varying proportion of the values. We describe the exact distribution of Cook’s statistic for a GLUM with Gaussian predictors and response. We also present computational forms, simple approximations, and asymptotic results. A simulation supports the accuracy of the results. The methods allow accurate evaluation of a single value or the maximum value from a regression analysis. The approximations work well for a single value, but less well for the maximum. In contrast, the cut-point suggested by Cook provides widely varying tail probabilities. As with all diagnostics, the data analyst must use scientific judgment in deciding how to treat highlighted observations. PMID:24363487
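For readers who want to apply the diagnostic, this sketch computes Cook's D for a simulated Gaussian regression and compares it against the cut-point Cook suggested, the median of the F distribution for the overall regression; the data and coefficients are illustrative.

```python
# Cook's D for each observation of a fitted OLS model, flagged against
# the median of F(p+1, n-p-1), Cook's conventional cut-point.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(9)
n, p = 50, 3
X = sm.add_constant(rng.normal(size=(n, p)))          # Gaussian predictors
y = X @ np.array([1.0, 0.5, -0.5, 0.2]) + rng.normal(size=n)

fit = sm.OLS(y, X).fit()
cooks_d = fit.get_influence().cooks_distance[0]
cutoff = stats.f.median(p + 1, n - p - 1)
print("observations above Cook's cut-point:", np.where(cooks_d > cutoff)[0])
```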
Bayesian median regression for temporal gene expression data
NASA Astrophysics Data System (ADS)
Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.
2007-09-01
Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T2-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
NASA Astrophysics Data System (ADS)
Schaeben, Helmut; Semmler, Georg
2016-09-01
The objective of prospectivity modeling is prediction of the conditional probability of the presence (T = 1) or absence (T = 0) of a target T given favorable or prohibitive predictors B, or construction of a two-class {0,1} classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling due to its apparent simplicity. However, the numerical simplicity is deceiving, as it is implied by the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view on prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers; from the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence, even though relaxation of modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" this approach. Using the same fabricated dataset, it is shown that BoostWofE cannot generally compensate for lacking conditional independence, whatever the consecutive processing order of predictors. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate for violations of joint conditional independence if the predictors are indicators.
Sun, Jin; Rutkoski, Jessica E; Poland, Jesse A; Crossa, José; Jannink, Jean-Luc; Sorrells, Mark E
2017-07-01
High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in multivariate pedigree and genomic prediction models would be desirable to improve indirect selection for grain yield. In this study, we evaluated three statistical models, simple repeatability (SR), multitrait (MT), and random regression (RR), for the longitudinal data of secondary traits and compared the impact of the proposed models for secondary traits on their predictive abilities for grain yield. Grain yield and secondary traits, canopy temperature (CT) and normalized difference vegetation index (NDVI), were collected in five diverse environments for 557 wheat lines with available pedigree and genomic information. A two-stage analysis was applied for pedigree and genomic selection (GS). First, secondary traits were fitted by SR, MT, or RR models, separately, within each environment. Then, best linear unbiased predictions (BLUPs) of secondary traits from the above models were used in the multivariate prediction models to compare predictive abilities for grain yield. Predictive ability was substantially improved, by 70% on average, in multivariate pedigree and genomic models when secondary traits were included in both training and test populations. Additionally, (i) predictive abilities varied only slightly among the MT, RR, and SR models in this data set, (ii) results indicated that including BLUPs of secondary traits from the MT model was best under severe drought, and (iii) the RR model was slightly better than the SR and MT models under a drought environment. Copyright © 2017 Crop Science Society of America.
Reliability Analysis of the Gradual Degradation of Semiconductor Devices.
1983-07-20
under the heading of linear models or linear statistical models.3,4 We have not used this material in this report. Assuming catastrophic failure when... assuming a catastrophic model. In this treatment we first modify our system loss formula and then proceed to the actual analysis. II. ANALYSIS OF... [table of unit failure times: 1, T1; 2, T2; ...; n, Tn] ...and are easily analyzed by simple linear regression. Since we have assumed a log normal/Arrhenius activation
Comparative analysis of used car price evaluation models
NASA Astrophysics Data System (ADS)
Chen, Chuancan; Hao, Lulu; Xu, Cong
2017-05-01
An accurate used car price evaluation is a catalyst for the healthy development of the used car market. Data mining has been applied to predict used car prices in several articles; however, little has been studied on comparing different algorithms for used car price estimation. This paper collects more than 100,000 used car dealing records throughout China for an empirical analysis and a thorough comparison of two algorithms: linear regression and random forest. The two algorithms are used to predict used car price in three different models: a model for a certain car make, a model for a certain car series, and a universal model. Results show that random forest has a stable but not ideal effect in the price evaluation model for a certain car make, but it shows a great advantage in the universal model compared with linear regression. This indicates that random forest is an optimal algorithm when handling complex models with a large number of variables and samples, yet it shows no obvious advantage when coping with simple models with fewer variables.
Pistonesi, Marcelo F; Di Nezio, María S; Centurión, María E; Lista, Adriana G; Fragoso, Wallace D; Pontes, Márcio J C; Araújo, Mário C U; Band, Beatriz S Fernández
2010-12-15
In this study, a novel, simple, and efficient spectrofluorimetric method to determine directly and simultaneously five phenolic compounds (hydroquinone, resorcinol, phenol, m-cresol and p-cresol) in air samples is presented. For this purpose, variable selection by the successive projections algorithm (SPA) is used in order to obtain simple multiple linear regression (MLR) models based on a small subset of wavelengths. For comparison, partial least square (PLS) regression is also employed in full-spectrum. The concentrations of the calibration matrix ranged from 0.02 to 0.2 mg L(-1) for hydroquinone, from 0.05 to 0.6 mg L(-1) for resorcinol, and from 0.05 to 0.4 mg L(-1) for phenol, m-cresol and p-cresol; incidentally, such ranges are in accordance with the Argentinean environmental legislation. To verify the accuracy of the proposed method a recovery study on real air samples of smoking environment was carried out with satisfactory results (94-104%). The advantage of the proposed method is that it requires only spectrofluorimetric measurements of samples and chemometric modeling for simultaneous determination of five phenols. With it, air is simply sampled and no pre-treatment sample is needed (i.e., separation steps and derivatization reagents are avoided) that means a great saving of time. Copyright © 2010 Elsevier B.V. All rights reserved.
Changes in Clavicle Length and Maturation in Americans: 1840-1980.
Langley, Natalie R; Cridlin, Sandra
2016-01-01
Secular changes refer to short-term biological changes ostensibly due to environmental factors. Two well-documented secular trends in many populations are earlier age of menarche and increasing stature. This study synthesizes data on maximum clavicle length and fusion of the medial epiphysis in 1840-1980 American birth cohorts to provide a comprehensive assessment of developmental and morphological change in the clavicle. Clavicles from the Hamann-Todd Human Osteological Collection (n = 354), McKern and Stewart Korean War males (n = 341), Forensic Anthropology Data Bank (n = 1,239), and the McCormick Clavicle Collection (n = 1,137) were used in the analysis. Transition analysis was used to evaluate fusion of the medial epiphysis (scored as unfused, fusing, or fused). Several statistical treatments were used to assess fluctuations in maximum clavicle length. First, Durbin-Watson tests were used to evaluate autocorrelation, and a local regression (LOESS) was used to identify visual shifts in the regression slope. Next, piecewise regression was used to fit linear regression models before and after the estimated breakpoints. Multiple starting parameters were tested in the range determined to contain the breakpoint, and the model with the smallest mean squared error was chosen as the best fit. The parameters from the best-fit models were then used to derive the piecewise models, which were compared with the initial simple linear regression models to determine which model provided the best fit for the secular change data. The epiphyseal union data indicate a decline in the age at onset of fusion since the early twentieth century. Fusion commences approximately four years earlier in mid- to late twentieth-century birth cohorts than in late nineteenth- and early twentieth-century birth cohorts. However, fusion is completed at roughly the same age across cohorts. The most significant decline in age at onset of epiphyseal union appears to have occurred since the mid-twentieth century. LOESS plots show a breakpoint in the clavicle length data around the mid-twentieth century in both sexes, and piecewise regression models indicate a significant decrease in clavicle length in the American population after 1940. The piecewise model provides a slightly better fit than the simple linear model. Since the model standard error is not substantially different from the piecewise model, an argument could be made to select the less complex linear model. However, we chose the piecewise model to detect changes in clavicle length that are overfitted with a linear model. The decrease in maximum clavicle length is in line with a documented narrowing of the American skeletal form, as shown by analyses of cranial and facial breadth and bi-iliac breadth of the pelvis. Environmental influences on skeletal form include increases in body mass index, health improvements, improved socioeconomic status, and elimination of infectious diseases. Secular changes in bony dimensions and skeletal maturation stipulate that medical and forensic standards used to deduce information about growth, health, and biological traits must be derived from modern populations.
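The piecewise fit used above can be sketched as a breakpoint search over candidate years, keeping the two-segment model with the smallest mean squared error; cohort data here are simulated around an assumed 1940 breakpoint, matching the reported result only by construction.

```python
# Piecewise ("broken-stick") regression sketch: scan candidate breakpoint
# years and keep the continuous two-segment fit with the smallest MSE.
import numpy as np

rng = np.random.default_rng(10)
year = np.arange(1840, 1981).astype(float)
length = np.where(year < 1940,
                  140 + 0.02 * (year - 1840),
                  142.0 - 0.05 * (year - 1940)) + rng.normal(0, 0.8, year.size)

def two_segment_mse(bp):
    # continuous two-segment model via a hinge term max(year - bp, 0)
    X = np.column_stack([np.ones_like(year), year, np.maximum(year - bp, 0)])
    beta, *_ = np.linalg.lstsq(X, length, rcond=None)
    return np.mean((X @ beta - length) ** 2)

candidates = np.arange(1880, 1971)
best_bp = candidates[np.argmin([two_segment_mse(bp) for bp in candidates])]
print("estimated breakpoint year:", best_bp)
```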
NASA Astrophysics Data System (ADS)
Balidoy Baloloy, Alvin; Conferido Blanco, Ariel; Gumbao Candido, Christian; Labadisos Argamosa, Reginal Jay; Lovern Caboboy Dumalag, John Bart; Carandang Dimapilis, Lady Lee; Camero Paringit, Enrico
2018-04-01
Aboveground biomass (AGB) estimation is essential in determining the environmental and economic values of mangrove forests. Biomass prediction models can be developed through integration of remote sensing, field data and statistical models. This study aims to assess and compare the biomass predictor potential of multispectral bands, vegetation indices and biophysical variables that can be derived from three optical satellite systems: Sentinel-2 with 10 m, 20 m, and 60 m resolution; RapidEye with 5 m resolution; and PlanetScope with 3 m ground resolution. Field data for biomass were collected from a Rhizophoraceae-dominated mangrove forest in Masinloc, Zambales, Philippines, where 30 test plots (1.2 ha) and 5 validation plots (0.2 ha) were established. Prior to the generation of indices, images from the three satellite systems were pre-processed using atmospheric correction tools in SNAP (Sentinel-2), ENVI (RapidEye) and python (PlanetScope). The major predictor bands tested are Blue, Green and Red, which are present in the three systems, and the Red-edge band from Sentinel-2 and RapidEye. The tested vegetation index predictors are Normalized Differenced Vegetation Index (NDVI), Soil-adjusted Vegetation Index (SAVI), Green-NDVI (GNDVI), Simple Ratio (SR), and Red-edge Simple Ratio (SRre). The study generated prediction models through conventional linear regression and multivariate regression. Higher coefficient of determination (r2) values were obtained using multispectral band predictors for Sentinel-2 (r2 = 0.89) and PlanetScope (r2 = 0.80), and vegetation indices for RapidEye (r2 = 0.92). Multivariate Adaptive Regression Spline (MARS) models performed better than the linear regression models, with r2 ranging from 0.62 to 0.92. Based on the r2 and root-mean-square errors (RMSEs), the best biomass prediction model per satellite was chosen and maps were generated. The accuracy of the predicted biomass maps was high for both Sentinel-2 (r2 = 0.92) and RapidEye data (r2 = 0.91).
Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models
Anderson, Ryan; Clegg, Samuel M.; Frydenvang, Jens; Wiens, Roger C.; McLennan, Scott M.; Morris, Richard V.; Ehlmann, Bethany L.; Dyar, M. Darby
2017-01-01
Accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “sub-model” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. The sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
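The sub-model idea can be sketched compactly. In the toy version below (not the ChemCam implementation; the composition ranges, blending ramp and stand-in data are illustrative assumptions), a full-range PLS model picks the composition regime and two range-restricted PLS sub-models are blended accordingly.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))     # stand-in for LIBS spectra
y = rng.uniform(0, 100, 200)       # stand-in for element concentration (wt%)

full = PLSRegression(n_components=5).fit(X, y)                   # full-range model
low = PLSRegression(n_components=5).fit(X[y < 60], y[y < 60])    # "low" sub-model
high = PLSRegression(n_components=5).fit(X[y > 40], y[y > 40])   # "high" sub-model

def blended_predict(X_new, lo=40.0, hi=60.0):
    ref = full.predict(X_new).ravel()           # full model picks the regime
    w = np.clip((ref - lo) / (hi - lo), 0, 1)   # linear ramp across the overlap
    return (1 - w) * low.predict(X_new).ravel() + w * high.predict(X_new).ravel()

print(blended_predict(X[:5]))
```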
Scoring and staging systems using cox linear regression modeling and recursive partitioning.
Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H
2006-01-01
Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
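For readers unfamiliar with the building blocks, a minimal Cox-model scoring step might look like the sketch below, using the lifelines library. The column names, toy data, penalizer and three-stage quantile cut are hypothetical; the paper's local-Cox recursive partitioning algorithm is not reproduced here.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time": [5, 8, 12, 3, 9, 15, 7, 11],       # follow-up time (toy values)
    "event": [1, 0, 1, 1, 0, 1, 1, 0],          # 1 = event observed
    "albumin": [3.1, 4.0, 2.8, 3.5, 3.9, 2.5, 3.0, 4.2],
    "bilirubin": [1.2, 0.8, 2.5, 1.9, 0.7, 3.1, 1.5, 0.6],
})
cph = CoxPHFitter(penalizer=0.1)                # small penalty for a tiny sample
cph.fit(df, duration_col="time", event_col="event")

# Cut the risk score (partial hazard) into a target number of stages.
scores = np.ravel(cph.predict_partial_hazard(df))
stages = pd.qcut(scores, q=3, labels=["stage I", "stage II", "stage III"])
print(stages)
```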
Developing a Model for Forecasting Road Traffic Accident (RTA) Fatalities in Yemen
NASA Astrophysics Data System (ADS)
Karim, Fareed M. A.; Abdo Saleh, Ali; Taijoobux, Aref; Ševrović, Marko
2017-12-01
The aim of this paper is to develop a model for forecasting RTA fatalities in Yemen. Yearly fatalities were modeled as the dependent variable, while the candidate independent variables included population, number of vehicles, GNP, GDP and real GDP per capita. It was determined that all these variables are highly correlated with fatalities (correlation coefficient r ≈ 0.9); in order to avoid multicollinearity in the model, the single variable with the highest r value was selected (real GDP per capita). A simple regression model was developed; the model fit was very good (R2 = 0.916); however, the residuals were serially correlated. The Prais-Winsten procedure was used to overcome this violation of the regression assumptions. Data for the 20-year period 1991-2010 were analyzed to build the model; the model was validated using data for the years 2011-2013. The historical fit for the period 1991-2011 was very good, and the validation for 2011-2013 proved accurate.
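A sketch of the serial-correlation correction: statsmodels' GLSAR performs an iterative feasible-GLS fit with AR(1) errors, a close relative of the Prais-Winsten procedure used in the paper. The series below are invented placeholders, not the Yemen data.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
gdp_pc = np.linspace(400, 900, 20) + rng.normal(0, 20, 20)       # invented predictor
fatalities = 0.05 * gdp_pc + 50 + np.cumsum(rng.normal(0, 3, 20))  # AR-like errors

X = sm.add_constant(gdp_pc)
ols = sm.OLS(fatalities, X).fit()
print("Durbin-Watson:", durbin_watson(ols.resid))   # flags serial correlation

gls = sm.GLSAR(fatalities, X, rho=1)                # AR(1) error structure
res = gls.iterative_fit(maxiter=10)                 # Cochrane-Orcutt-style iteration
print(res.params, gls.rho)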
Protocol for monitoring standing crop in grasslands using visual obstruction
Lakhdar Benkobi; Daniel W. Uresk; Greg Schenbeck; Rudy M. King
2000-01-01
Assessment of standing crop on grasslands using a visual obstruction technique provides valuable information to help plan livestock grazing management and indicate the status of wildlife habitat. The objectives of this study were to: (1) develop a simple regression model using easily measured visual obstruction to estimate standing crop on sandy lowland range sites in...
Entrainment regimes and flame characteristics of wildland fires
Ralph M. Nelson; Bret W. Butler; David R. Weise
2012-01-01
This paper reports results from a study of the flame characteristics of 22 wind-aided pine litter fires in a laboratory wind tunnel and 32 field fires in southern rough and litter-grass fuels. Flame characteristic and fire behaviour data from these fires, simple theoretical flame models and regression techniques are used to determine whether the data support the...
Mark Chopping; Anne Nolin; Gretchen G. Moisen; John V. Martonchik; Michael Bull
2009-01-01
In this study retrievals of forest canopy height were obtained through adjustment of a simple geometric-optical (GO) model against red band surface bidirectional reflectance estimates from NASA's Multiangle Imaging SpectroRadiometer (MISR), mapped to a 250 m grid. The soil-understory background contribution was partly isolated prior to inversion using regression...
The impact of potential political security level on international tourism
Young-Rae Kim; Chang Huh; Seung Hyun Kim
2002-01-01
The purpose of this study was to investigate the impact of potential political security in an effort to fill in two foregoing research gaps in international tourism. To investigate the relationship between political security and international tourism, a simple regression model was employed. Secondary data were collected from a variety of sources, such as international...
Design of a global soil moisture initialization procedure for the simple biosphere model
NASA Technical Reports Server (NTRS)
Liston, G. E.; Sud, Y. C.; Walker, G. K.
1993-01-01
Global soil moisture and land-surface evapotranspiration fields are computed using an analysis scheme based on the Simple Biosphere (SiB) soil-vegetation-atmosphere interaction model. The scheme is driven with observed precipitation and potential evapotranspiration, where the potential evapotranspiration is computed following the surface air temperature-potential evapotranspiration regression of Thornthwaite (1948). The observed surface air temperature is corrected to reflect potential (zero soil moisture stress) conditions by letting the ratio of actual transpiration to potential transpiration be a function of normalized difference vegetation index (NDVI). Soil moisture, evapotranspiration, and runoff data are generated on a daily basis for a 10-year period, January 1979 through December 1988, using observed precipitation gridded at a 4 deg by 5 deg resolution.
NASA Technical Reports Server (NTRS)
Hohenemser, K. H.; Crews, S. T.
1972-01-01
A two-bladed 16-inch hingeless rotor model was built and tested outside and inside a 24 by 24 inch wind tunnel test section at collective pitch settings up to 5 deg and rotor advance ratios up to 0.4. The rotor model has a simple eccentric mechanism to provide progressing or regressing cyclic pitch excitation. The flapping responses were compared to analytically determined responses, which included flap-bending elasticity but excluded rotor wake effects. Substantial systematic deviations of the measured responses from the computed responses were found, which were interpreted as effects of the interaction of the blades with a rotating asymmetrical wake.
A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection
Sabourin, Jeremy A; Valdar, William; Nobel, Andrew B
2015-01-01
We describe a simple, computationally efficient, permutation-based procedure for selecting the penalty parameter in LASSO penalized regression. The procedure, permutation selection, is intended for applications where variable selection is the primary focus, and can be applied in a variety of structural settings, including that of generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of real biomedical data sets in which permutation selection is compared with selection based on the following: cross-validation (CV), the Bayesian information criterion (BIC), Scaled Sparse Linear Regression, and a selection method based on recently developed testing procedures for the LASSO. PMID:26243050
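Our reading of the procedure can be sketched as follows (the quantile choice and data are illustrative, not the authors' settings): permute the response, record the smallest penalty that zeroes out every LASSO coefficient, and use a quantile of those values as the penalty for the real fit.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p = 100, 50
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)   # one true signal variable

def entry_alpha(X, y):
    """Smallest sklearn-Lasso alpha keeping all coefficients at zero."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.max(np.abs(Xc.T @ yc)) / len(y)

# Under permutation there is no signal; any selection would be spurious.
alphas = [entry_alpha(X, rng.permutation(y)) for _ in range(100)]
alpha_perm = np.quantile(alphas, 0.75)   # quantile choice is an assumption

fit = Lasso(alpha=alpha_perm).fit(X, y)
print("selected variables:", np.flatnonzero(fit.coef_))
```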
Non-ignorable missingness in logistic regression.
Wang, Joanna J J; Bartlett, Mark; Ryan, Louise
2017-08-30
Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.
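A small simulation makes the core point concrete: when the probability of nonresponse depends on the outcome itself, a complete-case logistic fit is biased (here chiefly in the intercept). All numbers below are illustrative, not drawn from the 45 and Up Study.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.5 + 1.0 * x)))   # true model: logit(p) = -0.5 + x
y = rng.binomial(1, p)

full = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

# Non-ignorable mechanism: cases with y = 1 are more likely to be missing.
p_obs = np.where(y == 1, 0.5, 0.9)
obs = rng.binomial(1, p_obs).astype(bool)
cc = sm.Logit(y[obs], sm.add_constant(x[obs])).fit(disp=0)

print("full-data estimates:   ", full.params)
print("complete-case estimates:", cc.params)   # intercept shifted downward
```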
Statistical Approaches for Spatiotemporal Prediction of Low Flows
NASA Astrophysics Data System (ADS)
Fangmann, A.; Haberlandt, U.
2017-12-01
An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion in model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchment areas. Four different modeling approaches are analyzed. The basis for all of them is multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three is subject to a spatiotemporal prediction of an index value, method four to estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms the successive prediction in time and space of method one. Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be problematic. Spatiotemporal prediction of L-moments appeared highly uncertain for higher-order moments, resulting in unrealistic future low flow values. All in all, the results support the inclusion of simple statistical methods in climate change impact assessment.
Quantum State Tomography via Linear Regression Estimation
Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan
2013-01-01
A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d⁴), where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519
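A one-qubit toy version of the LRE idea (the paper treats general d-dimensional systems; shot counts and the true state below are invented): the density matrix is linear in the Bloch vector, so empirical means of Pauli measurements yield a least-squares estimate directly.

```python
import numpy as np

rng = np.random.default_rng(6)
r_true = np.array([0.3, -0.4, 0.5])   # unknown Bloch vector to reconstruct
shots = 2000

# Measuring sigma_x, sigma_y, sigma_z: outcomes +/-1 with mean r_k.
r_hat = np.array([
    rng.choice([1, -1], size=shots, p=[(1 + rk) / 2, (1 - rk) / 2]).mean()
    for rk in r_true
])

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]])
sy = np.array([[0, -1j], [1j, 0]])
sz = np.diag([1.0, -1.0])
rho_hat = 0.5 * (I2 + r_hat[0] * sx + r_hat[1] * sy + r_hat[2] * sz)
print(np.round(rho_hat, 3))
```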
A Data-driven Approach for Forecasting Next-day River Discharge
NASA Astrophysics Data System (ADS)
Sharif, H. O.; Billah, K. S.
2017-12-01
This study focuses on evaluating the performance of the Soil and Water Assessment Tool (SWAT) eco-hydrological model, a simple Auto-Regressive with eXogenous input (ARX) model, and a Gene Expression Programming (GEP)-based model in one-day-ahead forecasting of the discharge of a subtropical basin (the upper Kentucky River Basin). The three models were calibrated with daily flow at a US Geological Survey (USGS) stream gauging station not affected by flow regulation for the period 2002-2005. The calibrated models were then validated at the same gauging station, as well as at another USGS gauge 88 km downstream, for the period 2008-2010. The results suggest that the simple models outperform a sophisticated hydrological model, with GEP having the advantage of being able to generate functional relationships that allow scientific investigation of the complex nonlinear interrelationships among input variables. Unlike SWAT, GEP and, to some extent, ARX are less sensitive to the length of the calibration time series and do not require a spin-up period.
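An ARX model of the kind evaluated here reduces to ordinary least squares on lagged inputs. The sketch below uses an invented rainfall-discharge series and a single lag of each input; the study's actual lag structure and inputs may differ.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 400
rain = rng.gamma(2.0, 2.0, T)            # invented exogenous input
q = np.zeros(T)
for t in range(1, T):                    # synthetic "true" discharge process
    q[t] = 0.8 * q[t - 1] + 0.5 * rain[t - 1] + rng.normal(0, 0.3)

# ARX(1) design matrix [1, Q_t, rain_t] -> target Q_{t+1}
X = np.column_stack([np.ones(T - 1), q[:-1], rain[:-1]])
coef, *_ = np.linalg.lstsq(X, q[1:], rcond=None)
forecast = X[-1] @ coef                  # one-day-ahead forecast
print(coef, forecast)
```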
Bayesian function-on-function regression for multilevel functional data.
Meyer, Mark J; Coull, Brent A; Versace, Francesco; Cinciripini, Paul; Morris, Jeffrey S
2015-09-01
Medical and public health research increasingly involves the collection of complex and high dimensional data. In particular, functional data, where the unit of observation is a curve or set of curves finely sampled over a grid, are frequently obtained. Moreover, researchers often sample multiple curves per person, resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data on a fine grid, presenting a simple model as well as a more extensive mixed model framework, and introducing various functional Bayesian inferential procedures that account for multiple testing. We examine these models via simulation and a data analysis with data from a study that used event-related potentials to examine how the brain processes various types of images. © 2015, The International Biometric Society.
Hybrid Rocket Performance Prediction with Coupling Method of CFD and Thermal Conduction Calculation
NASA Astrophysics Data System (ADS)
Funami, Yuki; Shimada, Toru
The final purpose of this study is to develop a design tool for hybrid rocket engines. This tool is a computer code which will be used to investigate rocket performance characteristics and unsteady phenomena lasting through the burning time, such as fuel regression or combustion oscillation. When describing the phenomena inside the combustion chamber, namely boundary-layer combustion, rigorous models are difficult to use because their computational cost would be too high; simple models are therefore required for this calculation. In this study, quasi-one-dimensional compressible Euler equations for flowfields inside the chamber and the equation for thermal conduction inside the solid fuel are numerically solved. The energy balance equation at the solid fuel surface is solved to estimate the fuel regression rate. The heat-feedback model is Karabeyoglu's model, which depends on total mass flux. The combustion model is either a global single-step reaction model with 4 chemical species or a chemical equilibrium model with 9 chemical species. As a first step, steady-state solutions are reported.
Gifford, Katherine A; Phillips, Jeffrey S; Samuels, Lauren R; Lane, Elizabeth M; Bell, Susan P; Liu, Dandan; Hohman, Timothy J; Romano, Raymond R; Fritzsche, Laura R; Lu, Zengqi; Jefferson, Angela L
2015-07-01
A symptom of mild cognitive impairment (MCI) and Alzheimer's disease (AD) is a flat learning profile. Learning slope calculation methods vary, and the optimal method for capturing neuroanatomical changes associated with MCI and early AD pathology is unclear. This study cross-sectionally compared four different learning slope measures from the Rey Auditory Verbal Learning Test (simple slope, regression-based slope, two-slope method, peak slope) to structural neuroimaging markers of early AD neurodegeneration (hippocampal volume; cortical thickness in parahippocampal gyrus, precuneus, and lateral prefrontal cortex) across the cognitive aging spectrum [normal control (NC) (n=198; age=76±5), MCI (n=370; age=75±7), and AD (n=171; age=76±7)] in ADNI. Within each diagnostic group, general linear models related the slope methods individually to neuroimaging variables, adjusting for age, sex, education, and APOE4 status. Among MCI, better learning performance on the simple slope, regression-based slope, and late slope (Trials 2-5) from the two-slope method related to larger parahippocampal thickness (all p-values<.01) and hippocampal volume (p<.01). Better regression-based slope (p<.01) and late slope (p<.01) were related to larger ventrolateral prefrontal cortex in MCI. No significant associations emerged between any slope and neuroimaging variables for NC (p-values ≥.05) or AD (p-values ≥.02). Better learning performance was thus related to larger medial temporal lobe structures (i.e., hippocampal volume, parahippocampal gyrus thickness) and ventrolateral prefrontal cortex in MCI only. Regression-based and late slope were most highly correlated with neuroimaging markers and explained more variance above and beyond other common memory indices, such as total learning. Simple slope may offer an acceptable alternative given its ease of calculation.
NASA Technical Reports Server (NTRS)
Wu, Man Li C.; Schubert, Siegfried; Lin, Ching I.; Stajner, Ivanka; Einaudi, Franco (Technical Monitor)
2000-01-01
A method is developed for validating model-based estimates of atmospheric moisture and ground temperature using satellite data. The approach relates errors in estimates of clear-sky longwave fluxes at the top of the Earth-atmosphere system to errors in geophysical parameters. The fluxes include clear-sky outgoing longwave radiation (CLR) and radiative flux in the window region between 8 and 12 microns (RadWn). The approach capitalizes on the availability of satellite estimates of CLR and RadWn and other auxiliary satellite data, and multiple global four-dimensional data assimilation (4-DDA) products. The basic methodology employs off-line forward radiative transfer calculations to generate synthetic clear-sky longwave fluxes from two different 4-DDA data sets. Simple linear regression is used to relate the clear-sky longwave flux discrepancies to discrepancies in ground temperature (ΔTg) and broad-layer integrated atmospheric precipitable water (Δpw). The slopes of the regression lines define sensitivity parameters which can be exploited to help interpret mismatches between satellite observations and model-based estimates of clear-sky longwave fluxes. For illustration we analyze the discrepancies in the clear-sky longwave fluxes between an early implementation of the Goddard Earth Observing System Data Assimilation System (GEOS2) and a recent operational version of the European Centre for Medium-Range Weather Forecasts data assimilation system. The analysis of the synthetic clear-sky flux data shows that simple linear regression employing ΔTg and broad-layer Δpw provides a good approximation to the full radiative transfer calculations, typically explaining more than 90% of the 6-hourly variance in the flux differences. These simple regression relations can be inverted to "retrieve" the errors in the geophysical parameters. Uncertainties (normalized by standard deviation) in the monthly mean retrieved parameters range from 7% for ΔTg to approximately 20% for the lower tropospheric moisture between 500 hPa and the surface. The regression relationships developed from the synthetic flux data, together with CLR and RadWn observed with the Clouds and the Earth's Radiant Energy System instrument, are used to assess the quality of the GEOS2 Tg and pw. Results showed that the GEOS2 Tg is too cold over land, and pw in upper layers is too high over the tropical oceans and too low in the lower atmosphere.
Sowande, O S; Oyewale, B F; Iyasere, O S
2010-06-01
The relationships between live weight and eight body measurements of West African Dwarf (WAD) goats were studied using 211 animals under farm conditions. The animals were categorized based on age and sex. Data obtained on height at withers (HW), heart girth (HG), body length (BL), head length (HL), and length of hindquarter (LHQ) were fitted into simple linear, allometric, and multiple-regression models to predict live weight from the body measurements according to age group and sex. Results showed that live weight, HG, BL, LHQ, HL, and HW increased with the age of the animals. In the multiple-regression model, HG and HL best fit the model for goat kids; HG, HW, and HL for goats aged 13-24 months; while HG, LHQ, HW, and HL best fit the model for goats aged 25-36 months. Coefficients of determination (R²) for the linear and allometric models for predicting the live weight of WAD goats increased with age for all the body measurements, with HG being the most satisfactory single measurement for predicting live weight. Sex had a significant influence on the models, with R² values consistently higher in females except for the LHQ and HW models.
Length bias correction in gene ontology enrichment analysis using logistic regression.
Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H
2012-01-01
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
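The adjustment amounts to a one-term change in a standard enrichment test. The sketch below (simulated genes; the orientation of the regression follows our reading of the abstract) adds log transcript length as a covariate so the GO-membership term measures enrichment conditional on length.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n_genes = 2000
log_len = rng.normal(8, 1, n_genes)          # log transcript length
in_go = rng.binomial(1, 0.1, n_genes)        # membership in one GO category

# Length bias: longer transcripts are more likely to be flagged DE.
p_de = 1 / (1 + np.exp(-(-3 + 0.3 * log_len)))
de = rng.binomial(1, p_de)

X = sm.add_constant(np.column_stack([in_go, log_len]))
fit = sm.Logit(de, X).fit(disp=0)
# The in_go coefficient now measures enrichment conditional on length;
# here it should be near zero, since the simulation has no true enrichment.
print(fit.params, fit.pvalues)
```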
Lee, Seokho; Shin, Hyejin; Lee, Sang Han
2016-12-01
Alzheimer's disease (AD) is usually diagnosed by clinicians through cognitive and functional performance tests, with a potential risk of misdiagnosis. Since the progression of AD is known to cause structural changes in the corpus callosum (CC), CC thickness can be used as a functional covariate in the AD classification problem for a diagnosis. However, misclassified class labels negatively impact classification performance. Motivated by AD-CC association studies, we propose a logistic regression for functional data classification that is robust to misdiagnosis or label noise. Specifically, our logistic regression model is constructed by adding individual intercepts to the functional logistic regression model. This approach makes it possible to indicate which observations are possibly mislabeled and also leads to a robust and efficient classifier. An effective MM algorithm provides simple closed-form update formulas. We test our method using synthetic datasets to demonstrate its superiority over an existing method, and apply it to differentiating patients with AD from healthy normals based on CC thickness from MRI. © 2016, The International Biometric Society.
Potential pitfalls when denoising resting state fMRI data using nuisance regression.
Bright, Molly G; Tench, Christopher R; Murphy, Kevin
2017-07-01
In resting state fMRI, it is necessary to remove signal variance associated with noise sources, leaving cleaned fMRI time-series that more accurately reflect the underlying intrinsic brain fluctuations of interest. This is commonly achieved through nuisance regression, in which the fit is calculated of a noise model of head motion and physiological processes to the fMRI data in a General Linear Model, and the "cleaned" residuals of this fit are used in further analysis. We examine the statistical assumptions and requirements of the General Linear Model, and whether these are met during nuisance regression of resting state fMRI data. Using toy examples and real data we show how pre-whitening, temporal filtering and temporal shifting of regressors impact model fit. Based on our own observations, existing literature, and statistical theory, we make the following recommendations when employing nuisance regression: pre-whitening should be applied to achieve valid statistical inference of the noise model fit parameters; temporal filtering should be incorporated into the noise model to best account for changes in degrees of freedom; temporal shifting of regressors, although merited, should be achieved via optimisation and validation of a single temporal shift. We encourage all readers to make simple, practical changes to their fMRI denoising pipeline, and to regularly assess the appropriateness of the noise model used. By negotiating the potential pitfalls described in this paper, and by clearly reporting the details of nuisance regression in future manuscripts, we hope that the field will achieve more accurate and precise noise models for cleaning the resting state fMRI time-series. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
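Stripped of the pre-whitening and filtering steps that the paper argues should be included, nuisance regression is a GLM fit followed by taking residuals. A minimal sketch with simulated regressors (all values invented):

```python
import numpy as np

rng = np.random.default_rng(9)
T = 200
motion = rng.normal(size=(T, 6))            # six head-motion regressors
drift = np.linspace(0, 1, T)                # slow scanner drift
ts = motion @ rng.normal(size=6) + 0.5 * drift + rng.normal(0, 1, T)  # one voxel

X = np.column_stack([np.ones(T), motion, drift])   # noise model design matrix
beta, *_ = np.linalg.lstsq(X, ts, rcond=None)      # GLM fit of the noise model
cleaned = ts - X @ beta                            # "cleaned" residuals
print(cleaned.std())
```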
NASA Astrophysics Data System (ADS)
Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe
2018-01-01
In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small-scale, short-term coastal morphodynamics, given its capability to treat a wide database of known information non-linearly. Simple linear and multilinear regression models were also applied, to achieve a balance between the computational load and the reliability of estimations across the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical error analysis, which revealed a slightly better estimation by the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjectures about the increase in uncertainty with the extension of the extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located near Torre Colimena in the Apulia region, southern Italy.
Spatial interpolation schemes of daily precipitation for hydrologic modeling
Hwang, Y.; Clark, M.R.; Rajagopalan, B.; Leavesley, G.
2012-01-01
Distributed hydrologic models typically require spatial estimates of precipitation interpolated from sparsely located observational points to the specific grid points. We compare and contrast the performance of regression-based statistical methods for the spatial estimation of precipitation in two hydrologically different basins and confirm that widely used regression-based estimation schemes fail to describe the realistic spatial variability of the daily precipitation field. The methods assessed are: (1) inverse distance weighted average; (2) multiple linear regression (MLR); (3) climatological MLR; and (4) locally weighted polynomial regression (LWP). In order to improve the performance of the interpolations, the authors propose a two-step regression technique for effective daily precipitation estimation. In this simple two-step estimation process, precipitation occurrence is first generated via a logistic regression model before the amount of precipitation is estimated separately on wet days. This process generated the precipitation occurrence, amount, and spatial correlation effectively. A distributed hydrologic model (PRMS) was used for the impact analysis in daily time step simulation. Multiple simulations suggested noticeable differences between the input alternatives generated by the three different interpolation schemes. Differences are shown in overall simulation error against the observations, degree of explained variability, and seasonal volumes. Simulated streamflows also showed different characteristics in mean, maximum, minimum, and peak flows. Given the same parameter optimization technique, LWP input showed the least streamflow error in the Alapaha basin and CMLR input showed the least error (still very close to LWP) in the Animas basin. All of the two-step interpolation inputs resulted in lower streamflow error compared to the directly interpolated inputs. © 2011 Springer-Verlag.
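The proposed two-step scheme can be sketched in a few lines: a logistic regression decides wet versus dry, and a linear regression fitted on wet days only supplies the amount. The single predictor, data and thresholds below are stand-ins, not the study's basins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(10)
n = 500
elev = rng.uniform(0, 3000, n)                                   # invented predictor
wet = rng.binomial(1, 1 / (1 + np.exp(-(elev / 1000 - 1.5))))    # occurrence
amount = np.where(wet == 1, 2 + 0.004 * elev + rng.normal(0, 1, n), 0.0)

X = elev.reshape(-1, 1)
occ = LogisticRegression().fit(X, wet)                 # step 1: occurrence
amt = LinearRegression().fit(X[wet == 1], amount[wet == 1])  # step 2: wet-day amount

def estimate(elev_new):
    X_new = np.asarray(elev_new, dtype=float).reshape(-1, 1)
    p_wet = occ.predict_proba(X_new)[:, 1]
    return np.where(p_wet > 0.5, amt.predict(X_new), 0.0)

print(estimate([500, 2500]))
```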
Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.
ERIC Educational Resources Information Center
Waugh, C. Keith
This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…
Information-Decay Pursuit of Dynamic Parameters in Student Models
1994-04-01
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
2016-02-01
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient-based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured results of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and the time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20, with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
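The CUSUM step can be illustrated schematically: each result contributes its predicted error probability minus a reference value, and an alarm fires when the running sum crosses a threshold. The reference and threshold below are placeholders, not the paper's tuned values.

```python
import numpy as np

def cusum_on_error_probs(error_probs, k=0.05, h=1.0):
    """One-sided CUSUM over logistic-regression error probabilities."""
    s = 0.0
    for t, p in enumerate(error_probs):
        s = max(0.0, s + (p - k))   # accumulate evidence above reference k
        if s > h:
            return t                 # alarm: probable systematic error
    return None

# Simulated stream: baseline error probability 0.03, shift to 0.12 at t = 50.
probs = np.concatenate([np.full(50, 0.03), np.full(30, 0.12)])
print("alarm at index:", cusum_on_error_probs(probs))
```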
Neurophysiological correlates of depressive symptoms in young adults: A quantitative EEG study.
Lee, Poh Foong; Kan, Donica Pei Xin; Croarkin, Paul; Phang, Cheng Kar; Doruk, Deniz
2018-01-01
There is an unmet need for practical and reliable biomarkers for mood disorders in young adults. Identifying the brain activity associated with the early signs of depressive disorders could have important diagnostic and therapeutic implications. In this study we sought to investigate the EEG characteristics of young adults with newly identified depressive symptoms. Based on the initial screening, a total of 100 participants (n = 50 euthymic, n = 50 depressive) underwent 32-channel EEG acquisition. Simple logistic regression and the C-statistic were used to explore whether EEG power could discriminate between the groups. The strongest EEG predictors of mood were then examined using multivariate logistic regression models. Simple logistic regression analysis with subsequent C-statistics revealed that only high-alpha and beta power originating from the left central cortex (C3) have a reliable discriminative value (ROC curve >0.7 (70%)) for differentiating the depressive group from the euthymic group. Multivariate regression analysis showed that the single most significant predictor of group (depressive vs. euthymic) is the high-alpha power over C3 (p = 0.03). The present findings suggest that EEG is a useful tool in the identification of neurophysiological correlates of depressive symptoms in young adults with no previous psychiatric history. Our results could guide future studies investigating the early neurophysiological changes and surrogate outcomes in depression. Copyright © 2017 Elsevier Ltd. All rights reserved.
Sample size determination for logistic regression on a logit-normal distribution.
Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance
2017-06-01
Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination (R²) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for R² for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM10) levels. These levels are often increased over urban areas, becoming one of the main air pollution concerns. This is the case in the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7 years of historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative (log[NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity) and qualitative (working days/weekends, season (winter/summer), hour (from 00 to 23 UTC) and precipitation/no precipitation). Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms selected following Sawa's Bayesian Information Criterion (INT-BIC). Each type of model is calculated over two different periods: a training dataset (6 years) and a testing dataset (1 year). The results show that the INT-BIC-based model (R² = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to state-of-the-art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80-0.98 for seasonal means) with respect to shorter periods.
New methodology for modeling annual-aircraft emissions at airports
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woodmansey, B.G.; Patterson, J.G.
An as-accurate-as-possible estimation of total aircraft emissions is an essential component of any environmental-impact assessment done for proposed expansions at major airports. To determine the amount of emissions generated by aircraft using present models, it is necessary to know the emission characteristics of all engines on all planes using the airport. However, the published data base does not cover all engine types and, therefore, a new methodology is needed to assist in estimating annual emissions from aircraft at airports. Linear regression equations relating quantity of emissions to aircraft weight using a known fleet mix are developed in this paper. Total annual emissions of CO, NOx, NMHC, SOx, CO2, and N2O are tabulated for Toronto's international airport for 1990. The regression equations are statistically significant for all emissions except for NMHC from large jets and NOx and NMHC from piston-engine aircraft. This regression model is a relatively simple, fast, and inexpensive method of obtaining an annual emission inventory for an airport.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among hospital practitioners. Our objective was to explore the use of conditional logistic regression (CLR) to increase prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree methods to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression to each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of CLR and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity, by more than 10%, than the standard classification models, which can be translated to the correct labeling of an additional 400-500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location at discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Finally, this paper is expected to raise awareness of collecting data on additional markers and developing the necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
Regression Analysis by Example. 5th Edition
ERIC Educational Resources Information Center
Chatterjee, Samprit; Hadi, Ali S.
2012-01-01
Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
Crawford, John R; Garthwaite, Paul H; Denham, Annie K; Chelune, Gordon J
2012-12-01
Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because (a) not all psychologists are aware that regression equations can be built not only from raw data but also using only basic summary data for a sample, and (b) the computations involved are tedious and prone to error. In an attempt to overcome these barriers, Crawford and Garthwaite (2007) provided methods to build and apply simple linear regression models using summary statistics as data. In the present study, we extend this work to set out the steps required to build multiple regression models from sample summary statistics and the further steps required to compute the associated statistics for drawing inferences concerning an individual case. We also develop, describe, and make available a computer program that implements these methods. Although there are caveats associated with the use of the methods, these need to be balanced against pragmatic considerations and against the alternative of either entirely ignoring a pertinent data set or using it informally to provide a clinical "guesstimate." Upgraded versions of earlier programs for regression in the single case are also provided; these add the point and interval estimates of effect size developed in the present article.
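The key fact enabling this is that a simple regression line is fully determined by summary statistics: with the means, SDs, correlation and n, the slope is b = r·sy/sx and the intercept follows. A minimal sketch with invented summary data:

```python
import math

# Hypothetical published summary statistics for a normative sample.
n, mx, sx, my, sy, r = 40, 100.0, 15.0, 52.0, 9.0, 0.62

b = r * sy / sx                 # slope from summary data only
a = my - b * mx                 # intercept
# Standard deviation of residuals (standard error of estimate).
s_yx = sy * math.sqrt((1 - r**2) * (n - 1) / (n - 2))

x_case = 120.0                  # an individual case's predictor score
predicted = a + b * x_case
print(f"predicted = {predicted:.1f}, SE of estimate = {s_yx:.2f}")
```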
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
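A condensed sketch of the modelling strategy follows, with simulated children rather than the Peruvian cohort: a cubic B-spline basis for age inside a linear mixed-effects model with random intercepts and slopes per child. The continuous AR(1) error term used in the paper is omitted because statsmodels' MixedLM does not fit it directly.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
rows = []
for child in range(50):
    u0, u1 = rng.normal(0, 2), rng.normal(0, 0.5)   # child-specific effects
    for age in np.linspace(0, 4, 9):
        height = 50 + 15 * np.sqrt(age + 0.1) + u0 + u1 * age + rng.normal(0, 0.8)
        rows.append((child, age, height))
df = pd.DataFrame(rows, columns=["id", "age", "height"])

# Cubic B-spline basis for age; random intercept and slope per child.
model = smf.mixedlm("height ~ bs(age, df=4, degree=3)", df,
                    groups="id", re_formula="~age")
fit = model.fit()
print(fit.summary())
```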
Structured functional additive regression in reproducing kernel Hilbert spaces.
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2014-06-01
Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application.
Hoch, Jeffrey S; Dewa, Carolyn S
2014-04-01
Economic evaluations commonly accompany trials of new treatments or interventions; however, regression methods and their corresponding advantages for the analysis of cost-effectiveness data are not well known. To illustrate regression-based economic evaluation, we present a case study investigating the cost-effectiveness of a collaborative mental health care program for people receiving short-term disability benefits for psychiatric disorders. We implement net benefit regression to illustrate its strengths and limitations. Net benefit regression offers a simple option for cost-effectiveness analyses of person-level data. By placing economic evaluation in a regression framework, regression-based techniques can facilitate the analysis and provide simple solutions to commonly encountered challenges. Economic evaluations of person-level data (eg, from a clinical trial) should use net benefit regression to facilitate analysis and enhance results.
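In its simplest form, net benefit regression computes nb_i = lambda * effect_i - cost_i per person and regresses it on the treatment indicator; the treatment coefficient estimates the incremental net benefit. The sketch below uses simulated trial data and an illustrative willingness-to-pay value, not the study's program data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 300
treat = rng.binomial(1, 0.5, n)                         # randomized arm
cost = 5000 + 1500 * treat + rng.normal(0, 800, n)      # invented costs
effect = 0.6 + 0.1 * treat + rng.normal(0, 0.15, n)     # invented QALYs

lam = 50000                          # willingness to pay per unit of effect
nb = lam * effect - cost             # person-level net benefit
fit = sm.OLS(nb, sm.add_constant(treat)).fit()
print("incremental net benefit:", fit.params[1])
print("95% CI:", fit.conf_int()[1])
```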
Hyperopic photorefractive keratectomy and central islands
NASA Astrophysics Data System (ADS)
Gobbi, Pier Giorgio; Carones, Francesco; Morico, Alessandro; Vigo, Luca; Brancato, Rosario
1998-06-01
We have evaluated the refractive evolution in patients treated with hyperopic PRK to assess the extent of the initial overcorrection and the time constant of regression. To this end, the time history of the refractive error (i.e. the difference between achieved and intended refractive correction) has been fitted by means of an exponential statistical model, giving information characterizing the surgical procedure with a direct clinical meaning. Both hyperopic and myopic PRK procedures have been analyzed by this method. The analysis of the fitting model parameters shows that hyperopic PRK patients exhibit a definitely higher initial overcorrection than myopic ones, and a regression time constant which is much longer. A common mechanism is proposed to be responsible for the refractive outcomes in hyperopic treatments and in myopic patients exhibiting significant central islands. The interpretation is in terms of superhydration of the central cornea, and is based on a simple physical model evaluating the amount of centripetal compression in the apical cornea.
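The exponential model described here amounts to a three-parameter nonlinear fit: an initial overcorrection decaying toward a plateau with some time constant. A minimal sketch with invented follow-up data:

```python
import numpy as np
from scipy.optimize import curve_fit

def regression_curve(t, a, tau, c):
    """Refractive error: overcorrection a decaying with time constant tau."""
    return a * np.exp(-t / tau) + c

months = np.array([0.25, 1, 3, 6, 9, 12])               # follow-up times
error_d = np.array([1.8, 1.5, 0.9, 0.5, 0.35, 0.3])     # diopters, illustrative

(a, tau, c), _ = curve_fit(regression_curve, months, error_d, p0=(2.0, 3.0, 0.2))
print(f"overcorrection = {a:.2f} D, time constant = {tau:.1f} months")
```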
Regression analysis of sparse asynchronous longitudinal data.
Cao, Hongyuan; Zeng, Donglin; Fine, Jason P
2015-09-01
We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time point, with asynchronous data, the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time invariant or time-dependent coefficients under smoothness assumptions for the covariate processes which are similar to those for synchronous data. For models with either time invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies evidence that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last value carried forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus.
Practical Session: Multiple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Three exercises are proposed to illustrate simple linear regression. The first investigates the influence of several factors on atmospheric pollution; it was proposed by D. Chessel and A.B. Dufour at Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data from 20 U.S. cities. Exercise 2 is an introduction to model selection, whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 were proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).
NASA Astrophysics Data System (ADS)
Camera, Corrado; Bruggeman, Adriana; Hadjinicolaou, Panos; Pashiardis, Stelios; Lange, Manfred A.
2014-01-01
High-resolution gridded daily data sets are essential for natural resource management and for analyses of climate change and its effects. This study evaluates the performance of 15 simple or complex interpolation techniques in reproducing daily precipitation at a resolution of 1 km2 over topographically complex areas. Methods are tested for two different observation densities and for different rainfall amounts. We used rainfall data recorded at 74 and 145 observational stations, respectively, spread over the 5760 km2 of the Republic of Cyprus, in the Eastern Mediterranean. Regression analyses utilizing geographical copredictors and neighboring-station interpolation techniques were evaluated both in isolation and combined. Linear multiple regression (LMR) and geographically weighted regression (GWR) methods, including a step-wise selection of covariables, were tested, along with inverse distance weighting (IDW), kriging, and 3D thin plate splines (TPS). The relative rank of the different techniques changes with station density and rainfall amount. Our results indicate that TPS performs well for low station density and large-scale events, and also when coupled with regression models, but performs poorly for high station density; the opposite is observed for IDW. Simple IDW performs best for local events, while a combination of step-wise GWR and IDW proves to be the best method for large-scale events and high station density. This study indicates that the use of step-wise regression with a variable set of geographic parameters can improve the interpolation of large-scale events because it facilitates the representation of local climate dynamics.
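Of the neighboring-station techniques compared, inverse distance weighting is the simplest; a minimal sketch, assuming the common power-2 weighting, looks like this:

```python
# Minimal inverse distance weighting (IDW) sketch: each grid point gets a
# weighted average of station values, with weights falling off as 1/d^p.
# The power p = 2 and all coordinates/values here are assumptions.
import numpy as np

def idw(xy_obs, z_obs, xy_query, p=2.0, eps=1e-12):
    d = np.linalg.norm(xy_obs[None, :, :] - xy_query[:, None, :], axis=2)
    w = 1.0 / (d ** p + eps)               # weights fall off with distance
    return (w * z_obs).sum(axis=1) / w.sum(axis=1)

stations = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # station coords, km
rain_mm  = np.array([12.0, 3.0, 7.0])                        # daily precipitation
grid     = np.array([[2.0, 2.0], [8.0, 1.0]])                # query points
print(idw(stations, rain_mm, grid))        # interpolated daily precipitation
```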
Predicting lettuce canopy photosynthesis with statistical and neural network models
NASA Technical Reports Server (NTRS)
Frick, J.; Precetti, C.; Mitchell, C. A.
1998-01-01
An artificial neural network (NN) and a statistical regression model were developed to predict canopy photosynthetic rates (Pn) for 'Waldman's Green' leaf lettuce (Lactuca sativa L.). All data used to develop and test the models were collected for crop stands grown hydroponically and under controlled-environment conditions. In the NN and regression models, canopy Pn was predicted as a function of three independent variables: shoot-zone CO2 concentration (600 to 1500 micromoles mol-1), photosynthetic photon flux (PPF) (600 to 1100 micromoles m-2 s-1), and canopy age (10 to 20 days after planting). The models were used to determine the combinations of CO2 and PPF setpoints required each day to maintain maximum canopy Pn. The statistical model (a third-order polynomial) predicted Pn more accurately than the simple NN (a three-layer, fully connected net). Over an 11-day validation period, the average percent difference between predicted and actual Pn was 12.3% and 24.6% for the statistical and NN models, respectively. Both models lost considerable accuracy when used to make relatively long-range Pn predictions (6 or more days into the future).
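A third-order polynomial regression in the three listed predictors can be set up as below; the response surface and coefficients are synthetic stand-ins, since the paper's fitted model is not given here.

```python
# Sketch of a third-order polynomial regression like the statistical model
# described above (synthetic data; the paper's coefficients are not given).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n = 300
co2 = rng.uniform(600, 1500, n)     # shoot-zone CO2, micromoles mol-1
ppf = rng.uniform(600, 1100, n)     # PPF, micromoles m-2 s-1
age = rng.uniform(10, 20, n)        # canopy age, days after planting
X = np.column_stack([co2, ppf, age])
# Hypothetical canopy Pn response used only to exercise the model:
pn = 5 + 0.004 * co2 + 0.006 * ppf - 0.01 * (age - 15) ** 2 + rng.normal(0, 0.3, n)

model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X, pn)
print(model.predict([[1000.0, 900.0, 14.0]]))  # predicted Pn at chosen setpoints
```

Scanning the fitted surface over the CO2 and PPF ranges for each canopy age is one way to pick the daily setpoints that maximize predicted Pn.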
Using Time-Series Regression to Predict Academic Library Circulations.
ERIC Educational Resources Information Center
Brooks, Terrence A.
1984-01-01
Four methods were used to forecast monthly circulation totals in 15 midwestern academic libraries: dummy time-series regression, lagged time-series regression, simple average (straight-line forecasting), monthly average (naive forecasting). In tests of forecasting accuracy, dummy regression method and monthly mean method exhibited smallest average…
Eric J. Gustafson; Volker C. Radeloff; Robert Potts
2005-01-01
Natural resource amenities may be an attractor as people decide where they will live and invest in property. In the American Midwest these amenities range from lakes to forests to pastoral landscapes, depending on the ecological province. We used simple linear regression models to test the hypotheses that physiographic, land cover (composition and spatial pattern),...
Survival analysis: Part I — analysis of time-to-event
2018-01-01
Length of time is a variable often encountered during data analysis. Survival analysis provides simple, intuitive results concerning time-to-event for events of interest, which are not confined to death. This review introduces methods of analyzing time-to-event. The Kaplan-Meier survival analysis, log-rank test, and Cox proportional hazards regression modeling method are described with examples of hypothetical data. PMID:29768911
No evidence of reaction time slowing in autism spectrum disorder.
Ferraro, F Richard
2016-01-01
A total of 32 studies comprising 238 simple reaction time and choice reaction time conditions were examined in individuals with autism spectrum disorder (n = 964) and controls (n = 1032). A Brinley plot/multiple regression analysis was performed on mean reaction times, regressing autism spectrum disorder performance onto control performance as a way to examine any generalized simple reaction time/choice reaction time slowing exhibited by the autism spectrum disorder group. The resulting regression equation was Y (autism spectrum disorder) = 0.99 × (control) + 87.93, which accounted for 92.3% of the variance. These results suggest that there is little if any simple reaction time/choice reaction time slowing in this sample of individuals with autism spectrum disorder in comparison with controls. While many cognitive and information processing domains are compromised in autism spectrum disorder, it appears that simple reaction time/choice reaction time performance remains relatively unaffected. © The Author(s) 2014.
NASA Astrophysics Data System (ADS)
Kang, Pilsang; Koo, Changhoi; Roh, Hokyu
2017-11-01
Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
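To make the distinction concrete, the sketch below contrasts the classical estimator (fit the readings on the standards, then invert the line) with the inverse estimator (fit the standards on the readings directly) on synthetic calibration data; the proposed reversed inverse regression itself is not reproduced here.

```python
# Sketch contrasting "classical" and "inverse" calibration estimators
# discussed above (synthetic data and coefficients; illustrative only).
import numpy as np

rng = np.random.default_rng(3)
true_x = np.linspace(0, 10, 50)                   # reference standard values
y = 2.0 + 3.0 * true_x + rng.normal(0, 0.5, 50)   # instrument readings

# Classical: regress y on x, then invert the fitted line to estimate x from y.
b1, b0 = np.polyfit(true_x, y, 1)
y_new = 17.0
x_classical = (y_new - b0) / b1

# Inverse: regress x on y directly and plug in the new reading.
c1, c0 = np.polyfit(y, true_x, 1)
x_inverse = c0 + c1 * y_new
print(x_classical, x_inverse)   # close here, but the two differ in general
```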
Fei, Y; Hu, J; Li, W-Q; Wang, W; Zong, G-Q
2017-03-01
Essentials Predicting the occurrence of portosplenomesenteric vein thrombosis (PSMVT) is difficult. We studied 72 patients with acute pancreatitis. Artificial neural networks modeling was more accurate than logistic regression in predicting PSMVT. Additional predictive factors may be incorporated into artificial neural networks. Objective To construct and validate artificial neural networks (ANNs) for predicting the occurrence of portosplenomesenteric venous thrombosis (PSMVT) and compare the predictive ability of the ANNs with that of logistic regression. Methods The ANNs and logistic regression models were constructed using simple clinical and laboratory data from 72 acute pancreatitis (AP) patients. The ANNs and logistic models were first trained on 48 randomly chosen patients and validated on the remaining 24 patients. The accuracy and performance characteristics were compared between the two approaches using SPSS 17.0 software. Results The training set and validation set did not differ on any of the 11 variables. After training, the back-propagation network training error converged to 1 × 10^-20, and the network retained excellent pattern recognition ability. When the ANNs model was applied to the validation set, it showed a sensitivity of 80%, specificity of 85.7%, a positive predictive value of 77.6% and a negative predictive value of 90.7%. The accuracy was 83.3%. Differences were found between ANNs modeling and logistic regression modeling in these parameters (10.0% [95% CI, -14.3 to 34.3%], 14.3% [95% CI, -8.6 to 37.2%], 15.7% [95% CI, -9.9 to 41.3%], 11.8% [95% CI, -8.2 to 31.8%], and 22.6% [95% CI, -1.9 to 47.1%], respectively). When ANNs modeling was used to identify PSMVT, the area under the receiver operating characteristic curve was 0.849 (95% CI, 0.807-0.901), demonstrating better overall properties than logistic regression modeling (AUC = 0.716; 95% CI, 0.679-0.761). Conclusions ANNs modeling was a more accurate tool than logistic regression for predicting the occurrence of PSMVT following AP. More clinical factors or biomarkers may be incorporated into ANNs modeling to improve its predictive ability. © 2016 International Society on Thrombosis and Haemostasis.
Lavesson, Tony; Amer-Wåhlin, Isis; Hansson, Stefan; Ley, David; Marsál, Karel; Olofsson, Per
2010-06-01
To evaluate new technical equipment for continuous recording of human fetal scalp temperature in labor. Experimental animal study. Two temperature sensors were placed subcutaneously and intracranially on the forehead of 10 fetal lambs and connected to a temperature monitoring system. The system records temperatures simultaneously on-line and stores data to be analyzed off-line. Throughout the experiment, the fetus was oxygenated via the umbilical cord circulation. Asphyxia was induced by intermittent cord compression and assessed by pH in jugular vein blood. The intracranial (ICT) and subcutaneous (SCT) temperatures were compared with simple and polynomial regression analyses. Absolute and delta ICT and SCT changes were the outcome measures. ICT and SCT were both successfully recorded in all 10 cases. With increasing acidosis, the temperatures decreased. The correlation coefficient between ICT and SCT ranged from 0.76 to 0.97 (median 0.88) by simple linear regression and from 0.80 to 0.99 (median 0.89) by second-degree polynomial regression. After an initial system stabilization period of 10 minutes, the delta temperature values (ICT minus SCT) were less than 1.5 degrees C throughout the experiment in all but one case. The fetal forehead SCT mirrored the ICT closely, with the ICT being higher.
A Predictive Model for Readmissions Among Medicare Patients in a California Hospital.
Duncan, Ian; Huynh, Nhan
2017-11-17
Predictive models for hospital readmission rates are in high demand because of the Centers for Medicare & Medicaid Services (CMS) Hospital Readmission Reduction Program (HRRP). The LACE index is one of the most popular predictive tools among hospitals in the United States. The LACE index is a simple tool with 4 parameters: Length of stay, Acuity of admission, Comorbidity, and Emergency visits in the previous 6 months. The authors applied logistic regression to develop a predictive model for a medium-sized not-for-profit community hospital in California using patient-level data with more specific patient information (including 13 explanatory variables). Specifically, the logistic regression is applied to 2 populations: a general population including all patients and the specific group of patients targeted by the CMS penalty (characterized as ages 65 or older with select conditions). The 2 resulting logistic regression models have a higher sensitivity rate compared to the sensitivity of the LACE index. The C statistic values of the model applied to both populations demonstrate moderate levels of predictive power. The authors also build an economic model to demonstrate the potential financial impact of the use of the model for targeting high-risk patients in a sample hospital and demonstrate that, on balance, whether the hospital gains or loses from reducing readmissions depends on its margin and the extent of its readmission penalties.
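For reference, a plain-Python sketch of the LACE computation follows, using the point assignments commonly published for the score; treat the exact point values as assumptions to verify against the original LACE literature.

```python
# Sketch of the LACE index computation (point values as commonly published
# for the van Walraven LACE score; verify against the original before use).
def lace_score(los_days: int, emergent: bool, charlson: int, ed_visits: int) -> int:
    # L: length of stay points
    if los_days < 1:
        l = 0
    elif los_days <= 3:
        l = los_days
    elif los_days <= 6:
        l = 4
    elif los_days <= 13:
        l = 5
    else:
        l = 7
    a = 3 if emergent else 0          # A: acuity of admission
    c = charlson if charlson <= 3 else 5  # C: Charlson comorbidity points
    e = min(ed_visits, 4)             # E: ED visits in previous 6 months
    return l + a + c + e

# A 5-day emergent stay, Charlson 2, one prior ED visit -> 4 + 3 + 2 + 1 = 10
print(lace_score(5, True, 2, 1))
```

The study's own models replace this fixed scoring with logistic regressions on 13 patient-level variables, which is why they can reach higher sensitivity.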
A new statistical approach to climate change detection and attribution
NASA Astrophysics Data System (ADS)
Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe
2017-01-01
We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the "models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90% confidence range), with a very limited contribution from natural forcings (-0.01 ± 0.02 K).
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
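A minimal scipy illustration of the two coefficients on a monotone but nonlinear relationship:

```python
# Quick illustration of the two correlation coefficients reviewed above,
# using synthetic data with a monotone but strongly nonlinear relationship.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(4)
x = rng.uniform(0, 5, 40)
y = np.exp(x) + rng.normal(0, 5, 40)   # monotone, strongly nonlinear

r, p_r = pearsonr(x, y)        # linear association
rho, p_rho = spearmanr(x, y)   # rank (monotone) association
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
# Spearman's rho is near 1 here, while Pearson's r is noticeably lower,
# which is why the choice of coefficient matters for nonlinear data.
```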
Linear Regression with a Randomly Censored Covariate: Application to an Alzheimer's Study.
Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A
2017-01-01
The association between maternal age of onset of dementia and amyloid deposition (measured by in vivo positron emission tomography (PET) imaging) in cognitively normal older offspring is of interest. In a regression model for amyloid, special methods are required due to the random right censoring of the covariate of maternal age of onset of dementia. Prior literature has proposed methods to address the problem of censoring due to assay limit of detection, but not random censoring. We propose imputation methods and a survival regression method that do not require parametric assumptions about the distribution of the censored covariate. Existing imputation methods address missing covariates, but not right censored covariates. In simulation studies, we compare these methods to the simple, but inefficient complete case analysis, and to thresholding approaches. We apply the methods to the Alzheimer's study.
NASA Astrophysics Data System (ADS)
Bhattarai, Nishan; Wagle, Pradeep; Gowda, Prasanna H.; Kakani, Vijaya G.
2017-11-01
The ability of remote sensing-based surface energy balance (SEB) models to track water stress in rain-fed switchgrass (Panicum virgatum L.) has not yet been explored. In this paper, the theoretical framework of the crop water stress index (CWSI; 0 = extremely wet or no water stress, and 1 = extremely dry or no transpiration) was utilized to estimate CWSI in rain-fed switchgrass using Landsat-derived evapotranspiration (ET) from five remote sensing-based single-source SEB models, namely the Surface Energy Balance Algorithm for Land (SEBAL), Mapping ET with Internalized Calibration (METRIC), the Surface Energy Balance System (SEBS), the Simplified Surface Energy Balance Index (S-SEBI), and the Operational Simplified Surface Energy Balance (SSEBop). CWSI estimates from the five SEB models and from a simple regression model that used the normalized difference vegetation index (NDVI), near-surface temperature difference, and measured soil moisture (SM) as covariates were compared with those derived from eddy covariance-measured ET (CWSIEC) for the 32 Landsat image acquisition dates during the 2011 (dry) and 2013 (wet) growing seasons. Results indicate that most SEB models can predict CWSI reasonably well. For example, the root mean square error (RMSE) ranged from 0.14 (SEBAL) to 0.29 (SSEBop) and the coefficient of determination (R2) ranged from 0.25 (SSEBop) to 0.72 (SEBAL), justifying the added complexity in CWSI modeling as compared to the simple regression model (R2 = 0.55, RMSE = 0.16). All SEB models underestimated CWSI in the dry year, but the estimates from SEBAL and S-SEBI were within 7% of the mean CWSIEC and explained over 60% of the variation in CWSIEC. In the wet year, S-SEBI mostly overestimated CWSI (by around 28%), while estimates from METRIC, SEBAL, SEBS, and SSEBop were within 8% of the mean CWSIEC. Overall, SEBAL was the most robust model under all conditions, followed by METRIC, whose performance was slightly worse than SEBAL's in the dry year and slightly better in the wet year. Underestimation of CWSI under extremely dry soil conditions and the substantial role of SM in the regression model suggest that integrating SM into SEB models could improve their performance under dry conditions. These insights will provide useful guidance on the broader applicability of SEB models for mapping water stress in switchgrass under varying geographical and meteorological conditions.
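The CWSI framework referenced above is commonly written as one minus the ratio of actual to potential ET; a minimal sketch under that assumption, with invented ET values:

```python
# Minimal sketch of the CWSI definition used above: 0 = no water stress,
# 1 = no transpiration. ET values below are illustrative, not from the study.
import numpy as np

def cwsi(et_actual, et_potential):
    """Crop water stress index from actual and potential ET (same units)."""
    return np.clip(1.0 - np.asarray(et_actual) / np.asarray(et_potential), 0.0, 1.0)

et_model = np.array([2.1, 3.5, 4.8])   # e.g., SEB-model ET, mm/day
et_ref   = np.array([6.0, 5.5, 5.0])   # potential (unstressed) ET, mm/day
print(cwsi(et_model, et_ref))          # higher values indicate drier conditions
```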
Structured functional additive regression in reproducing kernel Hilbert spaces
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2013-01-01
Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of a data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting the nonlinear additive components has been less studied. In this work, we propose a new regularization framework for structure estimation in the context of reproducing kernel Hilbert spaces. The proposed approach takes advantage of functional principal components, which greatly facilitates implementation and theoretical analysis. Selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application. PMID:25013362
Connections between survey calibration estimators and semiparametric models for incomplete data
Lumley, Thomas; Shaw, Pamela A.; Dai, James Y.
2012-01-01
Survey calibration (or generalized raking) estimators are a standard approach to the use of auxiliary information in survey sampling, improving on the simple Horvitz–Thompson estimator. In this paper we relate the survey calibration estimators to the semiparametric incomplete-data estimators of Robins and coworkers, and to adjustment for baseline variables in a randomized trial. The development based on calibration estimators explains the ‘estimated weights’ paradox and provides useful heuristics for constructing practical estimators. We present some examples of using calibration to gain precision without making additional modelling assumptions in a variety of regression models. PMID:23833390
Development of a Risk Prediction Model and Clinical Risk Score for Isolated Tricuspid Valve Surgery.
LaPar, Damien J; Likosky, Donald S; Zhang, Min; Theurer, Patty; Fonner, C Edwin; Kern, John A; Bolling, Stephen F; Drake, Daniel H; Speir, Alan M; Rich, Jeffrey B; Kron, Irving L; Prager, Richard L; Ailawadi, Gorav
2018-02-01
While tricuspid valve (TV) operations remain associated with high mortality (∼8-10%), no robust prediction models exist to support clinical decision-making. We developed a preoperative clinical risk model with an easily calculable clinical risk score (CRS) to predict mortality and major morbidity after isolated TV surgery. Multi-state Society of Thoracic Surgeons database records were evaluated for 2,050 isolated TV repair and replacement operations for any etiology performed at 50 hospitals (2002-2014). Parsimonious preoperative risk prediction models were developed using multi-level mixed effects regression to estimate mortality and composite major morbidity risk. Model results were utilized to establish a novel CRS for patients undergoing TV operations. Models were evaluated for discrimination and calibration. Operative mortality and composite major morbidity rates were 9% and 42%, respectively. Final regression models performed well (both P<0.001, AUC = 0.74 and 0.76) and included preoperative factors: age, gender, stroke, hemodialysis, ejection fraction, lung disease, NYHA class, reoperation and urgent or emergency status (all P<0.05). A simple CRS from 0-10+ was highly associated (P<0.001) with incremental increases in predicted mortality and major morbidity. Predicted mortality risk ranged from 2%-34% across CRS categories, while predicted major morbidity risk ranged from 13%-71%. Mortality and major morbidity after isolated TV surgery can be predicted using preoperative patient data from the STS Adult Cardiac Database. A simple clinical risk score predicts mortality and major morbidity after isolated TV surgery. This score may facilitate perioperative counseling and identification of suitable patients for TV surgery. Copyright © 2018 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.
Farsa, Oldřich
2013-01-01
The log BB parameter is the logarithm of the ratio of a compound's equilibrium concentrations in the brain tissue versus the blood plasma. This parameter is a useful descriptor in assessing the ability of a compound to permeate the blood-brain barrier. The aim of this study was to develop a Hansch-type linear regression QSAR model that correlates the parameter log BB with the retention time of drugs and other organic compounds on a reversed-phase HPLC column containing an embedded amide moiety. The retention time was expressed by the capacity factor log k'. The second aim was to estimate the brain absorption of 2-(azacycloalkyl)acetamidophenoxyacetic acids, which are analogues of piracetam, nefiracetam, and meclofenoxate. Notably, these acids may be novel nootropics. Two simple regression models relating log BB and log k' were developed from an assay performed using a reversed-phase HPLC column that contained an embedded amide moiety. Both the quadratic and linear models yielded statistical parameters comparable to previously published models of log BB dependence on various structural characteristics. The models predict that four members of the substituted phenoxyacetic acid series have a strong chance of permeating the barrier and being absorbed in the brain. The results of this study show that a reversed-phase HPLC system containing an embedded amide moiety is a functional in vitro surrogate of the blood-brain barrier. These results suggest that racetam-type nootropic drugs containing a carboxylic moiety could be more poorly absorbed than analogues devoid of the carboxyl group, especially if the compounds penetrate the barrier by a simple diffusion mechanism.
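A minimal sketch of the kind of simple log BB versus log k' regression described, with invented data points standing in for the measured compounds:

```python
# Hedged sketch of the linear QSAR relation described above,
# log BB = a * log k' + b. All data points and the resulting
# coefficients are invented for illustration, not from the study.
import numpy as np

log_k  = np.array([-0.40, -0.10, 0.15, 0.40, 0.70, 1.00])    # HPLC capacity factors
log_bb = np.array([-1.10, -0.70, -0.45, -0.10, 0.20, 0.55])  # measured log BB

a, b = np.polyfit(log_k, log_bb, 1)          # least-squares line
print(f"log BB = {a:.2f} * log k' + {b:.2f}")
print("predicted log BB:", a * 0.5 + b)      # new compound with log k' = 0.5
```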
Prediction of the Main Engine Power of a New Container Ship at the Preliminary Design Stage
NASA Astrophysics Data System (ADS)
Cepowski, Tomasz
2017-06-01
The paper presents mathematical relationships for forecasting the estimated main engine power of new container ships, based on data for vessels built in 2005-2015. The presented approximations estimate the engine power from the length between perpendiculars and the number of containers the ship will carry. The approximations were developed using simple linear regression and multivariate linear regression analysis. The presented relations have practical application in estimating the container ship engine power needed in preliminary parametric design of the ship. The analysis shows that multiple linear regression predicts the main engine power of a container ship more accurately than simple linear regression.
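The two approximations can be sketched as below; the ship data and fitted coefficients are invented for illustration, since the paper's regression formulas are not reproduced here.

```python
# Sketch of the two approximations described above: simple regression on
# length between perpendiculars (Lpp) alone versus multiple regression on
# Lpp and container capacity (TEU). All numbers are hypothetical.
import numpy as np

lpp  = np.array([210.0, 250.0, 290.0, 320.0, 350.0])   # m
teu  = np.array([2500, 4800, 8000, 10500, 14000])      # containers
p_kw = np.array([21000, 32000, 45000, 55000, 68000])   # main engine power, kW

# Simple linear regression on Lpp alone:
s1, s0 = np.polyfit(lpp, p_kw, 1)

# Multiple linear regression on Lpp and TEU:
A = np.column_stack([np.ones_like(lpp), lpp, teu])
coef, *_ = np.linalg.lstsq(A, p_kw, rcond=None)

new = np.array([1.0, 300.0, 9000.0])                   # Lpp = 300 m, 9000 TEU
print("simple:", s1 * 300.0 + s0, "multiple:", new @ coef)
```

Adding TEU as a second predictor captures the fact that ships of similar length can differ substantially in capacity, which is the intuition behind the paper's accuracy comparison.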
Conditional Monte Carlo randomization tests for regression models.
Parhat, Parwen; Rosenberger, William F; Diao, Guoqing
2014-08-15
We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.
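The design-based idea can be sketched for the simplest case, complete randomization with a difference-in-means statistic; in the paper's setting the statistic would instead be built from regression residuals, and the re-randomization step would follow the actual design (permuted blocks, biased coin, etc.).

```python
# Minimal Monte Carlo randomization test sketch under complete randomization.
# Data and effect size are invented; `y` stands in for the model-based
# statistic (e.g., residuals from a generalized linear model).
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(0, 1, 60)
y[:30] += 0.8                                   # first 30 subjects treated
t = np.array([1] * 30 + [0] * 30)

obs = y[t == 1].mean() - y[t == 0].mean()       # observed test statistic
null = []
for _ in range(10_000):                         # re-randomize under the design
    perm = rng.permutation(t)
    null.append(y[perm == 1].mean() - y[perm == 0].mean())
p_value = (np.abs(null) >= abs(obs)).mean()     # two-sided Monte Carlo p-value
print(p_value)
```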
QSPR using MOLGEN-QSPR: the challenge of fluoroalkane boiling points.
Rücker, Christoph; Meringer, Markus; Kerber, Adalbert
2005-01-01
By means of the new software MOLGEN-QSPR, a multilinear regression model for the boiling points of lower fluoroalkanes is established. The model is based exclusively on simple descriptors derived directly from molecular structure and nevertheless describes a broader set of data more precisely than previous attempts that used either more demanding (quantum chemical) descriptors or more demanding (nonlinear) statistical methods such as neural networks. The model's internal consistency was confirmed by leave-one-out cross-validation. The model was used to predict all unknown boiling points of fluorobutanes, and the quality of predictions was estimated by means of comparison with boiling point predictions for fluoropentanes.
Linear models for calculating digestible energy for sheep diets.
Fonnesbeck, P V; Christiansen, M L; Harris, L E
1981-05-01
Equations for estimating the digestible energy (DE) content of sheep diets were generated from the chemical contents and a factorial description of diets fed to lambs in digestion trials. The diet factors were two forages (alfalfa and grass hay), harvested at three stages of maturity (late vegetative, early bloom and full bloom), fed in two ingredient combinations (all hay or a 50:50 hay and corn grain mixture) and prepared by two forage texture processes (coarsely chopped or finely chopped and pelleted). The 2 x 3 x 2 x 2 factorial arrangement produced 24 diet treatments. These were replicated twice, for a total of 48 lamb digestion trials. In model 1 regression equations, DE was calculated directly from the chemical composition of the diet. In model 2, regression equations predicted the percentage of digested nutrient from the chemical contents of the diet, and then DE of the diet was calculated as the sum of the gross energy of the digested organic components. Expanded forms of model 1 and model 2 were also developed that included diet factors as qualitative indicator variables to adjust the regression constant and regression coefficients for the diet description. The expanded forms of the equations accounted for significantly more variation in DE than did the simple models and more accurately estimated DE of the diet. Information provided by the diet description proved as useful as chemical analyses for the prediction of digestibility of nutrients. The statistics indicate that, with model 1, neutral detergent fiber and plant cell wall analyses provided as much information for the estimation of DE as did model 2 with the combined information from crude protein, available carbohydrate, total lipid, cellulose and hemicellulose. Regression equations are presented for estimating DE from the most commonly analyzed organic components, including linear and curvilinear variables and diet factors that significantly reduce the standard error of the estimate. To estimate DE of a diet, the user selects the equation that uses the available chemical analysis information and diet description most effectively.
A Study on the Potential Applications of Satellite Data in Air Quality Monitoring and Forecasting
NASA Technical Reports Server (NTRS)
Li, Can; Hsu, N. Christina; Tsay, Si-Chee
2011-01-01
In this study we explore the potential applications of MODIS (Moderate Resolution Imaging Spectroradiometer)-like satellite sensors in air quality research for some Asian regions. The MODIS aerosol optical thickness (AOT), NCEP global reanalysis meteorological data, and daily surface PM10 concentrations over China and Thailand from 2001 to 2009 were analyzed using simple and multiple regression models. The AOT-PM10 correlation demonstrates substantial seasonal and regional differences, likely reflecting variations in aerosol composition and atmospheric conditions. Meteorological factors, particularly relative humidity, were found to influence the AOT-PM10 relationship. Their inclusion in regression models leads to more accurate assessment of PM10 from spaceborne observations. We further introduce a simple method for employing the satellite data to empirically forecast surface particulate pollution. In general, AOT from the previous day (day 0) is used as a predictor variable, along with the forecasted meteorology for the following day (day 1), to predict the PM10 level for day 1. The contribution of regional transport is represented by backward trajectories combined with AOT. This method was evaluated through PM10 hindcasts for 2008-2009, using observations from 2005 to 2007 as a training data set to obtain model coefficients. For five big Chinese cities, over 50% of the hindcasts have a percentage error of 30% or less. Similar performance was achieved for cities in northern Thailand. The MODIS AOT data are responsible for at least part of the demonstrated forecasting skill. This method can be easily adapted for other regions, but is probably most useful for those having sparse ground monitoring networks or no access to sophisticated deterministic models. We also highlight several existing issues, including some inherent to a regression-based approach, as exemplified by a case study for Beijing. Further studies will be necessary before satellite data can see more extensive applications in operational air quality monitoring and forecasting.
Estimating maize production in Kenya using NDVI: Some statistical considerations
Lewis, J.E.; Rowland, James; Nadeau , A.
1998-01-01
A regression model approach using a normalized difference vegetation index (NDVI) has the potential for estimating crop production in East Africa. However, before production estimation can become a reality, the underlying model assumptions and statistical nature of the sample data (NDVI and crop production) must be examined rigorously. Annual maize production statistics from 1982-90 for 36 agricultural districts within Kenya were used as the dependent variable; median area NDVI (independent variable) values for each agricultural district and year were extracted from the annual maximum NDVI data set. The input data and the statistical association of NDVI with maize production for Kenya were tested systematically for the following items: (1) homogeneity of the data when pooling the sample, (2) gross data errors and influence points, (3) serial (time) correlation, (4) spatial autocorrelation and (5) stability of the regression coefficients. The results of using a simple regression model with NDVI as the only independent variable are encouraging (r = 0.75, p < 0.05) and illustrate that NDVI can be a responsive indicator of maize production, especially in areas of high NDVI spatial variability, which coincide with areas of production variability in Kenya.
Stamey, Timothy C.
1998-01-01
Simple and reliable methods for estimating hourly streamflow are needed for the calibration and verification of a Chattahoochee River basin model between Buford Dam and Franklin, Ga. The river basin model is being developed by Georgia Department of Natural Resources, Environmental Protection Division, as part of their Chattahoochee River Modeling Project. Concurrent streamflow data collected at 19 continuous-record, and 31 partial-record streamflow stations, were used in ordinary least-squares linear regression analyses to define estimating equations, and in verifying drainage-area prorations. The resulting regression or drainage-area ratio estimating equations were used to compute hourly streamflow at the partial-record stations. The coefficients of determination (r-squared values) for the regression estimating equations ranged from 0.90 to 0.99. Observed and estimated hourly and daily streamflow data were computed for May 1, 1995, through October 31, 1995. Comparisons of observed and estimated daily streamflow data for 12 continuous-record tributary stations, that had available streamflow data for all or part of the period from May 1, 1995, to October 31, 1995, indicate that the mean error of estimate for the daily streamflow was about 25 percent.
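The two estimation approaches named here, an ordinary least-squares estimating equation fitted to concurrent flows and a drainage-area proration, can be sketched as follows (all numbers invented):

```python
# Sketch of the two estimation approaches above: an OLS estimating equation
# fitted to concurrent flows at an index station, and a drainage-area ratio.
# All flows and areas are hypothetical, not from the study.
import numpy as np

# Concurrent hourly flows (cfs) at a continuous-record index station and a
# partial-record station, used to fit Q_partial = b0 + b1 * Q_index:
q_index   = np.array([120.0, 250.0, 400.0, 650.0, 900.0])
q_partial = np.array([40.0, 78.0, 130.0, 205.0, 290.0])
b1, b0 = np.polyfit(q_index, q_partial, 1)

# Alternative: prorate by drainage area where the ratio method is verified.
area_partial, area_index = 85.0, 260.0        # square miles, hypothetical
q_new_index = 500.0                           # new hourly flow at index station
print("regression:", b0 + b1 * q_new_index)
print("area ratio:", q_new_index * area_partial / area_index)
```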
NASA Astrophysics Data System (ADS)
Xin, Pei; Wang, Shen S. J.; Shen, Chengji; Zhang, Zeyu; Lu, Chunhui; Li, Ling
2018-03-01
Shallow groundwater interacts strongly with surface water across a quarter of the global land area, significantly affecting terrestrial eco-hydrology and biogeochemistry. We examined groundwater behavior subjected to unimodal impulse and irregular surface water fluctuations, combining physical experiments, numerical simulations, and functional data analysis. Both the experiments and numerical simulations demonstrated a damped and delayed response of the groundwater table to surface water fluctuations. To quantify this hysteretic shallow groundwater behavior, we developed a regression model with Gamma distribution functions adopted to account for the dependence of groundwater behavior on antecedent surface water conditions. The regression model fits and predicts well the groundwater table oscillations resulting from the propagation of irregular surface water fluctuations in both laboratory and large-scale aquifers. The coefficients of the Gamma distribution function vary spatially, reflecting the hysteresis effect associated with increased amplitude damping and delay as the fluctuation propagates. The regression model, in a relatively simple functional form, has demonstrated its capacity to reproduce the high-order nonlinear effects that underpin surface water and groundwater interactions. The finding has important implications for understanding and predicting shallow groundwater behavior and associated biogeochemical processes, and will contribute broadly to studies of groundwater-dependent ecology and biogeochemistry.
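One way to picture such a model is as a convolution of the surface water record with a Gamma-shaped weighting kernel that produces the damped, delayed response; the sketch below uses invented shape and scale parameters, whereas the study fits these per location.

```python
# Hedged sketch: groundwater head modeled as a weighted sum of antecedent
# surface water levels, with Gamma-distribution weights. Kernel parameters
# and the synthetic surface record are illustrative assumptions.
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(6)
hours = np.arange(0, 500)
surface = np.sin(hours / 20.0) + 0.3 * rng.normal(size=hours.size)

lags = np.arange(0, 72)                    # antecedent window, hours
w = gamma.pdf(lags, a=3.0, scale=4.0)      # damped, delayed response kernel
w /= w.sum()                               # normalize the weights

# Predicted groundwater table: convolution of surface levels with the kernel.
ground = np.convolve(surface, w, mode="full")[: hours.size]
print(ground[100:105])   # damped and delayed relative to `surface`
```

Letting the kernel's shape and scale vary spatially reproduces the increasing damping and delay as fluctuations propagate inland, which is the hysteresis effect the study quantifies.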
Guo, Changning; Doub, William H; Kauffman, John F
2010-08-01
Monte Carlo simulations were applied to investigate the propagation of uncertainty in both input variables and response measurements on model prediction for nasal spray product performance design of experiment (DOE) models in the first part of this study, with an initial assumption that the models perfectly represent the relationship between input variables and the measured responses. In this article, we discard the initial assumption, and extended the Monte Carlo simulation study to examine the influence of both input variable variation and product performance measurement variation on the uncertainty in DOE model coefficients. The Monte Carlo simulations presented in this article illustrate the importance of careful error propagation during product performance modeling. Our results show that the error estimates based on Monte Carlo simulation result in smaller model coefficient standard deviations than those from regression methods. This suggests that the estimated standard deviations from regression may overestimate the uncertainties in the model coefficients. Monte Carlo simulations provide a simple software solution to understand the propagation of uncertainty in complex DOE models so that design space can be specified with statistically meaningful confidence levels. (c) 2010 Wiley-Liss, Inc. and the American Pharmacists Association
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models, and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another, even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages (SAS GLIMMIX Laplace and SuperMix Gaussian quadrature) perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
Ground temperature measurement by PRT-5 for the MAPS experiment
NASA Technical Reports Server (NTRS)
Gupta, S. K.; Tiwari, S. N.
1978-01-01
A simple algorithm and computer program were developed for determining the actual surface temperature from the effective brightness temperature as measured remotely by a radiation thermometer called PRT-5. This procedure allows the computation of atmospheric correction to the effective brightness temperature without performing detailed radiative transfer calculations. Model radiative transfer calculations were performed to compute atmospheric corrections for several values of the surface and atmospheric parameters individually and in combination. Polynomial regressions were performed between the magnitudes or deviations of these parameters and the corresponding computed corrections to establish simple analytical relations between them. Analytical relations were also developed to represent combined correction for simultaneous variation of parameters in terms of their individual corrections.
Campos-Filho, N; Franco, E L
1989-02-01
A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.
NASA Astrophysics Data System (ADS)
Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza
2014-10-01
The aim of this research work is to build a regression model of particulate matter up to 10 micrometers in size (PM10) in the Oviedo urban area (Northern Spain) at local scale, using the multivariate adaptive regression splines (MARS) technique. MARS is a nonparametric regression algorithm with the ability to approximate the relationship between inputs and outputs and to express that relationship mathematically. In this context, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, experimental data on nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and used to create a highly nonlinear model of PM10 in the Oviedo urban nucleus based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence of PM10 on the other pollutants in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establish limit values for the main atmospheric pollutants in order to protect human health. The MARS regression model captures the main ideas of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. The main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. On the basis of these numerical calculations using the MARS technique, the conclusions of this research work are presented.
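MARS models are built from hinge ("hockey stick") basis functions; the sketch below shows the elementary pieces (the knot placement here is arbitrary, and a real MARS fit selects knots and products of hinges adaptively):

```python
# Illustration of the hinge basis functions from which a MARS model is
# built; products of such hinges give the nonlinear interaction terms.
# The knot value and the example inputs are arbitrary assumptions.
import numpy as np

def hinge(x, knot, direction=+1):
    """max(0, +/-(x - knot)): the elementary MARS basis function."""
    return np.maximum(0.0, direction * (x - knot))

no_x = np.linspace(0, 200, 5)                  # e.g., NOx concentration values
basis = np.column_stack([
    np.ones_like(no_x),                        # intercept column
    hinge(no_x, 80, +1),                       # active above the knot at 80
    hinge(no_x, 80, -1),                       # active below the knot at 80
])
print(basis)   # a fitted MARS model is a linear combination of such columns
```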
Earnst, K S; Marson, D C; Harrell, L E
2000-08-01
To investigate measures of patient cognitive abilities as predictors of physician judgments of medical treatment consent capacity (competency) in patients with Alzheimer's disease (AD). Predictor models of legal standards (LS) and personal competency judgments were developed for each study physician using independent neuropsychological test measures and logistic regression analyses. A university medical center. Five physicians with experience assessing the competency of AD patients were recruited to make competency judgments of videotaped vignettes from 10 older controls and 21 patients with AD (10 with mild and 11 with moderate dementia). The 31 patient and control videotapes of performance on a measure of treatment consent capacity (Capacity to Consent to Treatment Instrument) (CCTI) were rated by the five physicians. The CCTI consists of two clinical vignettes (A-neoplasm and B-cardiac) that test competency under five LS. Each study physician viewed each vignette videotape individually, made judgments of competent or incompetent under each of the LS, and then made his/her own personal competency judgment. Physicians were blinded to participant diagnosis and neuropsychological test performance. Stepwise logistic regression was conducted to identify cognitive predictors of each physician's LS and personal competency judgments for Vignette A using the full sample (n = 31). Classification logistic regression analysis was used to determine how well these cognitive predictor models classified each physician's competency judgments for Vignette A. These classification models were then cross-validated using the physicians' Vignette B judgments. Cognitive predictor models for Vignette A competency judgments differed across individual physicians, and were related to the difficulty of the LS and to incompetency outcome rates across LS for AD patients. Measures of semantic knowledge and receptive language predicted judgments under the less difficult LS of evidencing a treatment choice (LS1) and making the reasonable treatment choice (LS2). Measures of semantic knowledge, short-term verbal recall, and simple reasoning ability predicted judgments under the more difficult and clinically relevant LS of appreciating consequences of a treatment choice (LS3), providing rational reasons for a treatment choice (LS4), and understanding the treatment situation and choices (LS5). Cognitive models for physicians' personal competency judgments were virtually identical to their respective models for LS5 judgments. For AD patients, short-term memory predictors were associated with high incompetency outcome rates (over 70%), a simple reasoning measure was associated with moderately high incompetency outcome rates (60-70%), and a semantic knowledge measure was associated with lower incompetency outcome rates (30-60%). Overall, single predictor models were relatively robust, correctly classifying an average of 83% of physician judgments for Vignette A and 80% of judgments for Vignette B. Multiple cognitive functions predicted physicians' LS and personal competency judgments. Declines in semantic knowledge, short-term verbal recall, and simple reasoning ability predicted physicians' judgments on the three most difficult and clinically most relevant LS (LS3-LS5), as well as their personal competency judgments. Our findings suggest that clinical assessment of competency should include evaluation of semantic knowledge, verbal recall, and simple reasoning abilities.
Equivalent circuit models for interpreting impedance perturbation spectroscopy data
NASA Astrophysics Data System (ADS)
Smith, R. Lowell
2004-07-01
As in-situ structural integrity monitoring disciplines mature, there is a growing need to process sensor/actuator data efficiently in real time. Although smaller, faster embedded processors will contribute to this, it is also important to develop straightforward, robust methods to reduce the overall computational burden for practical applications of interest. This paper addresses the use of equivalent circuit modeling techniques for inferring structure attributes monitored using impedance perturbation spectroscopy. In pioneering work about ten years ago significant progress was associated with the development of simple impedance models derived from the piezoelectric equations. Using mathematical modeling tools currently available from research in ultrasonics and impedance spectroscopy is expected to provide additional synergistic benefits. For purposes of structural health monitoring the objective is to use impedance spectroscopy data to infer the physical condition of structures to which small piezoelectric actuators are bonded. Features of interest include stiffness changes, mass loading, and damping or mechanical losses. Equivalent circuit models are typically simple enough to facilitate the development of practical analytical models of the actuator-structure interaction. This type of parametric structure model allows raw impedance/admittance data to be interpreted optimally using standard multiple, nonlinear regression analysis. One potential long-term outcome is the possibility of cataloging measured viscoelastic properties of the mechanical subsystems of interest as simple lists of attributes and their statistical uncertainties, whose evolution can be followed in time. Equivalent circuit models are well suited for addressing calibration and self-consistency issues such as temperature corrections, Poisson mode coupling, and distributed relaxation processes.
Mental chronometry with simple linear regression.
Chen, J Y
1997-10-01
Typically, mental chronometry is performed by introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility, although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that, according to the dominant theories, involve a selective effect, and with a task (word superiority effect) that involves a global effect. The results indicate that (1) the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities, and (2) the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These results ratify the regression approach as a useful method for doing mental chronometry.
A case-mix classification system for explaining healthcare costs using administrative data in Italy.
Corti, Maria Chiara; Avossa, Francesco; Schievano, Elena; Gallina, Pietro; Ferroni, Eliana; Alba, Natalia; Dotto, Matilde; Basso, Cristina; Netti, Silvia Tiozzo; Fedeli, Ugo; Mantoan, Domenico
2018-03-04
The Italian National Health Service (NHS) provides universal coverage to all citizens, granting primary and hospital care with a copayment system for outpatient and drug services. Financing of Local Health Trusts (LHTs) is based on a capitation system adjusted only for age, gender and area of residence. We applied a risk-adjustment system (the Johns Hopkins Adjusted Clinical Groups System, ACG® System) in order to explain health care costs using routinely collected administrative data in the Veneto Region (North-eastern Italy). All residents in the Veneto Region were included in the study. The ACG system was applied to classify the regional population based on the following information sources for the year 2015: hospital discharges, emergency room visits, the chronic disease registry for copayment exemptions, ambulatory visits, medications, the home care database, and drug prescriptions. Simple linear regressions were used to compare an age-gender model with models incorporating more comprehensive risk measures aimed at predicting health care costs. A simple age-gender model explained only 8% of the variance of 2015 total costs. Adding diagnosis-related variables provided a 23% increase, while pharmacy-based variables provided an additional 17% increase in explained variance. The adjusted R-squared of the comprehensive model was 6 times that of the simple age-gender model. The ACG System provides substantial improvement in predicting health care costs when compared to simple age-gender adjustments. Aging itself is not the main determinant of the increase of health care costs, which is better explained by the accumulation of chronic conditions and the resulting multimorbidity. Copyright © 2018. Published by Elsevier B.V.
Dinç, Erdal; Ozdemir, Abdil
2005-01-01
A multivariate chromatographic calibration technique was developed for the quantitative analysis of binary mixtures of enalapril maleate (EA) and hydrochlorothiazide (HCT) in tablets in the presence of losartan potassium (LST). The mathematical algorithm of the multivariate chromatographic calibration technique is based on linear regression equations constructed from the relationship between concentration and peak area at a set of five wavelengths. The algorithm of this calibration model, which has a simple mathematical content, is briefly described. The approach is a powerful mathematical tool for optimum chromatographic multivariate calibration and for eliminating fluctuations arising from instrumental and experimental conditions. The multivariate chromatographic calibration reduces the multivariate linear regression functions to a univariate data set. The model was validated by analyzing various synthetic binary mixtures and by using the standard addition technique. The developed calibration technique was applied to the analysis of real pharmaceutical tablets containing EA and HCT. The results were compared with those obtained by a classical HPLC method, and the proposed multivariate chromatographic calibration was observed to give better results than classical HPLC.
Unsteady hovering wake parameters identified from dynamic model tests, part 1
NASA Technical Reports Server (NTRS)
Hohenemser, K. H.; Crews, S. T.
1977-01-01
The development of a 4-bladed model rotor is reported that can be excited with a simple eccentric mechanism in progressing and regressing modes with either harmonic or transient inputs. Parameter identification methods were applied to the problem of extracting parameters for linear perturbation models, including rotor dynamic inflow effects, from the measured blade flapping responses to transient pitch stirring excitations. These perturbation models were then used to predict blade flapping response to other pitch stirring transient inputs, and rotor wake and blade flapping responses to harmonic inputs. The viability and utility of using parameter identification methods for extracting the perturbation models from transients are demonstrated through these combined analytical and experimental studies.
Villarrasa-Sapiña, Israel; Álvarez-Pitti, Julio; Cabeza-Ruiz, Ruth; Redón, Pau; Lurbe, Empar; García-Massó, Xavier
2018-02-01
Excess body weight during childhood causes reduced motor functionality and problems in postural control, a negative influence that has been reported in the literature. Nevertheless, no information regarding the effect of body composition on the postural control of overweight and obese children is available. The objective of this study was therefore to establish these relationships. A cross-sectional design was used to establish relationships between body composition and postural control variables obtained in bipedal eyes-open and eyes-closed conditions in twenty-two children. Centre of pressure signals were analysed in the temporal and frequency domains. Pearson correlations were applied to establish relationships between variables. Principal component analysis was applied to the body composition variables to avoid potential multicollinearity in the regression models. These principal components were used to perform a multiple linear regression analysis, from which regression models were obtained to predict postural control. Height and leg mass were the body composition variables that showed the highest correlation with postural control. Multiple regression models were also obtained, and several of these models showed a higher correlation coefficient in predicting postural control than simple correlations. These models revealed that leg and trunk mass were good predictors of postural control. More equations were found in the eyes-open than in the eyes-closed condition. Body weight and height are negatively correlated with postural control. However, leg and trunk mass are better postural control predictors than arm or body mass. Finally, body composition variables are more useful in predicting postural control when the eyes are open. Copyright © 2017 Elsevier Ltd. All rights reserved.
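A sketch of the abstract's two-step strategy on synthetic data: compress correlated body composition variables with principal component analysis, then regress a postural control outcome on the components to sidestep multicollinearity. The sample size matches the study's 22 children; all variables and effect sizes are invented.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 22
height = rng.normal(140, 10, n)
leg_mass = 0.2 * height + rng.normal(0, 2, n)    # correlated with height
trunk_mass = 0.3 * height + rng.normal(0, 3, n)  # correlated with height
X = StandardScaler().fit_transform(np.column_stack([height, leg_mass, trunk_mass]))
# hypothetical centre-of-pressure outcome driven by leg and trunk mass
cop_area = 5 + 0.8 * leg_mass + 0.5 * trunk_mass + rng.normal(0, 2, n)

pcs = PCA(n_components=2).fit_transform(X)       # decorrelated predictors
model = LinearRegression().fit(pcs, cop_area)
print("R2 on principal components:", model.score(pcs, cop_area))
```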
NASA Astrophysics Data System (ADS)
Novitski, Linda Nicole
Accurate and cost-effective assessment of water quality is necessary for proper management and restoration of inland water bodies susceptible to algal bloom conditions. Landsat and MODIS satellite images were used to create chlorophyll and Secchi depth predictive models for algal assessment of the Great Lakes and other lakes of the United States. Boosted regression tree (BRT) models using satellite imagery are both easy to use and can have high predictive performance. BRT models inferred chlorophyll and Secchi depth more accurately than linear regression models for all study locations. Inferred chlorophyll of inner Saginaw Bay was subsequently used in ecological models to help understand the ecological drivers of algal blooms in this ecosystem. For small lakes (non-Great Lakes), the best national Landsat model for ln-transformed chlorophyll was the BRT model, with a cross-validation R2 of 0.44 and a 0.76 ln-transformed μg/L RMSE. The best national Landsat model for Secchi depth was also a BRT model, with an adjusted R2 of 0.52 and a 0.80 m RMSE. We assessed the applicability of the national chlorophyll model for ecological analysis by comparing the total phosphorus-chlorophyll relationship with chlorophyll determined from sampling or remote sensing: the relationship had an adjusted R2 = 0.58 and 1.02 ln-transformed μg/L RMSE with sampled chlorophyll, versus an adjusted R2 = 0.56 and 1.04 ln-transformed μg/L RMSE with chlorophyll determined by the boosted regression tree remote sensing model. For Great Lakes models, the MODIS BRT model predicted chlorophyll most accurately of the three BRT models and compared well to other models in the literature. BRT models for Landsat ETM+ and TM predicted chlorophyll more accurately than the MSS model, and all Landsat models had favorable results when compared to the literature. BRT chlorophyll predictive models are useful in helping to understand historical, long-term chlorophyll trends and to inform us of how climate change may alter ecosystems in the future. In inner Saginaw Bay, annual average and upper-quartile Landsat-derived chlorophyll decreased from 7.44 to 6.62 and from 8.38 to 7.38 μg/L between 1973-1982, and the annual upper quartile of 8-day phosphorus loads increased from 5.29 to 6.79 kg between 1973-2012. Simple linear and multiple regression models and Wilcoxon rank test results for MODIS and Landsat-derived chlorophyll indicate that distance from the Saginaw River mouth influences chlorophyll concentration in Saginaw Bay, with Landsat-derived surface water temperature and phosphorus loads contributing to a lesser extent. Mixed-effect models for MODIS and Landsat-derived chlorophyll were related to chlorophyll better than simple linear or multiple regressions, with random effects of pixel and sample date contributing substantially to predictive power (NSE = 0.35-0.70), though phosphorus loads, distance to the Saginaw River mouth, and water temperature were significant fixed effects in most models. Water quality changes in Saginaw Bay between 1972-2012 were influenced by phosphorus loading and distance to the Saginaw River's mouth. Landsat and MODIS imagery are complementary platforms because of the long history of Landsat operation and the finer spectral resolution and image frequency of MODIS. Remote sensing water quality assessment tools can be valuable for limnological study, ecological assessment, and water resource management.
Linking brain-wide multivoxel activation patterns to behaviour: Examples from language and math.
Raizada, Rajeev D S; Tsao, Feng-Ming; Liu, Huei-Mei; Holloway, Ian D; Ansari, Daniel; Kuhl, Patricia K
2010-05-15
A key goal of cognitive neuroscience is to find simple and direct connections between brain and behaviour. However, fMRI analysis typically involves choices between many possible options, with each choice potentially biasing any brain-behaviour correlations that emerge. Standard methods of fMRI analysis assess each voxel individually, but then face the problem of selection bias when combining those voxels into a region-of-interest, or ROI. Multivariate pattern-based fMRI analysis methods use classifiers to analyse multiple voxels together, but can also introduce selection bias via data-reduction steps such as feature selection of voxels, pre-selecting activated regions, or principal components analysis. We show here that strong brain-behaviour links can be revealed without any voxel selection or data reduction, using just plain linear regression as a classifier applied to the whole brain at once, i.e. treating each entire brain volume as a single multi-voxel pattern. The brain-behaviour correlations emerged despite the fact that the classifier was not provided with any information at all about subjects' behaviour, but instead was given only the neural data and its condition-labels. Surprisingly, more powerful classifiers such as a linear SVM and regularised logistic regression produce very similar results. We discuss some possible reasons why the very simple brain-wide linear regression model is able to find correlations with behaviour that are as strong as those obtained on the one hand from a specific ROI and on the other hand from more complex classifiers. In a manner which is unencumbered by arbitrary choices, our approach offers a method for investigating connections between brain and behaviour which is simple, rigorous and direct. Copyright (c) 2010 Elsevier Inc. All rights reserved.
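A sketch of the whole-brain "plain linear regression as a classifier" idea on synthetic data: every volume is one flattened multi-voxel vector, condition labels are coded ±1, and the sign of the least-squares prediction classifies held-out volumes with no voxel selection or data reduction. Dimensions and noise levels are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test, n_voxels = 40, 10, 5000
signal = rng.normal(0, 1, n_voxels)                   # hidden condition pattern
y_train = rng.choice([-1.0, 1.0], n_train)
y_test = rng.choice([-1.0, 1.0], n_test)
X_train = np.outer(y_train, signal) + rng.normal(0, 20, (n_train, n_voxels))
X_test = np.outer(y_test, signal) + rng.normal(0, 20, (n_test, n_voxels))

# least-squares fit over all voxels at once; no feature selection anywhere
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
accuracy = np.mean(np.sign(X_test @ w) == y_test)
print("held-out classification accuracy:", accuracy)
```

With more voxels than volumes, lstsq returns the minimum-norm solution, which is what makes a plain linear regression usable on an entire brain volume at once.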
Doran, Kara S.; Howd, Peter A.; Sallenger, Asbury H.
2016-01-04
Recent studies, and most of their predecessors, use tide gage data to quantify sea-level (SL) acceleration, ASL(t). In the current study, three techniques were used to calculate acceleration from tide gage data; of those examined, the two techniques based on sliding a regression window through the time series proved more robust than the technique that fits a single quadratic form to the entire time series, particularly if there is temporal variation in the magnitude of the acceleration. The single-fit quadratic regression method has been the most commonly used technique for determining acceleration in tide gage data. The inability of the single-fit method to account for time-varying acceleration may explain some of the inconsistent findings between investigators. Properly quantifying ASL(t) from field measurements is of particular importance in evaluating numerical models of past, present, and future SLR resulting from anticipated climate change.
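A sketch of the sliding regression window approach on a synthetic sea-level series: a quadratic is fitted within each window, and twice the quadratic coefficient estimates the local acceleration, so time variation is not averaged away as it is in a single whole-series fit. Window width and units are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(1900, 2011, dtype=float)
# synthetic series: steady rise plus acceleration that switches on after 1960
sl = 1.5 * (t - 1900) + 0.01 * (t - 1960) ** 2 * (t > 1960) + rng.normal(0, 5, t.size)

half = 15  # window half-width in years (assumed)
for center in range(1930, 2000, 10):
    m = (t >= center - half) & (t <= center + half)
    c2, c1, c0 = np.polyfit(t[m] - center, sl[m], 2)   # local quadratic fit
    print(f"{center}: acceleration ~ {2 * c2:.3f} mm/yr^2")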
Evaluating and Improving the SAMA (Segmentation Analysis and Market Assessment) Recruiting Model
2015-06-01
The scatterplot in Figure 8 shows a strong linear relationship between the calculated SAMA potential and the actual 2014 contracting achievement, with an R-squared value of 0.871 from a simple linear regression.
Regression analysis of sparse asynchronous longitudinal data
Cao, Hongyuan; Zeng, Donglin; Fine, Jason P.
2015-01-01
Summary: We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time points, with asynchronous data the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time-invariant or time-dependent coefficients, under smoothness assumptions for the covariate processes similar to those for synchronous data. For models with either time-invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies show that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last-value-carried-forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus. PMID:26568699
NASA Astrophysics Data System (ADS)
Anderson, R. B.; Clegg, S. M.; Frydenvang, J.
2015-12-01
One of the primary challenges faced by the ChemCam instrument on the Curiosity Mars rover is developing a regression model that can accurately predict the composition of the wide range of target types encountered (basalts, calcium sulfate, feldspar, oxides, etc.). The original calibration used 69 rock standards to train a partial least squares (PLS) model for each major element. By expanding the suite of calibration samples to >400 targets spanning a wider range of compositions, the accuracy of the model was improved, but some targets with "extreme" compositions (e.g. pure minerals) were still poorly predicted. We have therefore developed a simple method, referred to as "submodel PLS", to improve the performance of PLS across a wide range of target compositions. In addition to generating a "full" (0-100 wt.%) PLS model for the element of interest, we also generate several overlapping submodels (e.g. for SiO2, we generate "low" (0-50 wt.%), "mid" (30-70 wt.%), and "high" (60-100 wt.%) models). The submodels are generally more accurate than the "full" model for samples within their range because they are able to adjust for matrix effects that are specific to that range. To predict the composition of an unknown target, we first predict the composition with the submodels and the "full" model. Then, based on the predicted composition from the "full" model, the appropriate submodel prediction can be used (e.g. if the full model predicts a low composition, use the "low" model result, which is likely to be more accurate). For samples with "full" predictions that occur in a region of overlap between submodels, the submodel predictions are "blended" using a simple linear weighted sum. The submodel PLS method shows improvements in most of the major elements predicted by ChemCam and reduces the occurrence of negative predictions for low wt.% targets. Submodel PLS is currently being used in conjunction with ICA regression for the major element compositions of ChemCam data.
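A sketch of the submodel routing and blending logic described above, using scikit-learn's PLSRegression on synthetic spectra. The SiO2-style ranges are taken from the abstract, while the specific blending weights in the overlap zones are an assumption, since the abstract only specifies "a simple linear weighted sum".

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
X = rng.normal(0, 1, (300, 50))                                  # fake spectra
y = np.clip(50 + 25 * X[:, 0] + rng.normal(0, 3, 300), 0, 100)   # fake wt.% SiO2

full = PLSRegression(n_components=5).fit(X, y)
subs = {(0, 50): PLSRegression(n_components=5),
        (30, 70): PLSRegression(n_components=5),
        (60, 100): PLSRegression(n_components=5)}
for (lo, hi), m in subs.items():
    mask = (y >= lo) & (y <= hi)
    m.fit(X[mask], y[mask])                                      # train on its range only

def predict(x):
    x = x.reshape(1, -1)
    ref = full.predict(x).item()               # route on the full-model estimate
    hits = [(lo, hi, m.predict(x).item())
            for (lo, hi), m in subs.items() if lo <= ref <= hi]
    if not hits:
        return ref                             # outside all submodel ranges
    if len(hits) == 1:
        return hits[0][2]
    # overlap zone: linear weighted sum, weighting by proximity to range centres
    w = np.array([1.0 / (abs(ref - (lo + hi) / 2) + 1e-6) for lo, hi, _ in hits])
    return float(np.dot(w / w.sum(), [p for _, _, p in hits]))

print(predict(X[0]), "vs true", y[0])
```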
Overton, Edgar Turner; Kauwe, John S K; Paul, Robert; Tashima, Karen; Tate, David F; Patel, Pragna; Carpenter, Charles C J; Patty, David; Brooks, John T; Clifford, David B
2011-11-01
HIV-associated neurocognitive disorders remain prevalent but challenging to diagnose, particularly among non-demented individuals. To determine whether a brief computerized battery correlates with formal neurocognitive testing, we identified 46 HIV-infected persons who had undergone both formal neurocognitive testing and a brief computerized battery. Simple detection tests correlated best with formal neuropsychological testing. In a multivariable regression model, 53% of the variance in the composite Global Deficit Score was accounted for by elements from the brief computerized tool (P < 0.01). These data confirm previous correlation data with the computerized battery. Using the five significant parameters from the regression model in a receiver operating characteristic curve, 90% of persons were accurately classified as being cognitively impaired or not. The test battery requires additional evaluation, specifically for identifying persons with mild impairment, a state in which interventions may be effective.
Simple agrometeorological models for estimating Guineagrass yield in Southeast Brazil.
Pezzopane, José Ricardo Macedo; da Cruz, Pedro Gomes; Santos, Patricia Menezes; Bosi, Cristiam; de Araujo, Leandro Coelho
2014-09-01
The objective of this work was to develop and evaluate agrometeorological models to simulate the production of Guineagrass. For this purpose, we used forage yield from 54 growing periods between December 2004-January 2007 and April 2010-March 2012 in irrigated and non-irrigated pastures in São Carlos, São Paulo state, Brazil (latitude 21°57'42″ S, longitude 47°50'28″ W, altitude 860 m). Initially we performed linear regressions between the agrometeorological variables and the average dry matter accumulation rate under irrigated conditions. Then we determined the effect of soil water availability on relative forage yield considering irrigated and non-irrigated pastures, by means of segmented linear regression between water balance and relative production variables (dry matter accumulation rates with and without irrigation). The models generated were evaluated with independent data from 21 growing periods without irrigation in the same location: eight growing periods in 2000 and 13 growing periods between December 2004-January 2007 and April 2010-March 2012. The results obtained show the satisfactory predictive capacity of the agrometeorological models under irrigated conditions based on univariate regression (mean temperature, minimum temperature, potential evapotranspiration or degree-days) or multivariate regression. The response of irrigation on production was well correlated with the climatological water balance variables (ratio between actual and potential evapotranspiration, or between actual and maximum soil water storage). The models that performed best for estimating Guineagrass yield without irrigation were based on minimum temperature corrected by relative soil water storage, determined by the ratio between the actual soil water storage and the soil water holding capacity.
Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Manh, Cuong Do
2015-01-01
The Mekong Delta is highly vulnerable to climate change and is a dengue-endemic area of Vietnam. This study aims to examine the association between climate factors and dengue incidence and to identify the best climate prediction model for dengue incidence in Can Tho city, in the Mekong Delta area of Vietnam. We used three different regression models, namely a standard multiple regression model (SMR), a seasonal autoregressive integrated moving average model (SARIMA), and a Poisson distributed lag model (PDLM), to examine the association between climate factors and dengue incidence over the period 2003-2010. We validated the models by forecasting dengue cases for the period January-December 2011 using the mean absolute percentage error (MAPE). Receiver operating characteristic curves were used to analyze the sensitivity of the forecast of a dengue outbreak. The results indicate that temperature and relative humidity, but not cumulative rainfall, are significantly associated with changes in dengue incidence consistently across the model methods used. The Poisson distributed lag model performs best for predicting dengue incidence over 6-, 9-, and 12-month periods and for diagnosing an outbreak, whereas the SARIMA model performs better for a 3-month period. The simple or standard multiple regression produced highly imprecise predictions of dengue incidence. We recommend a follow-up study to validate the model on a larger scale in the Mekong Delta region and to analyze the possibility of incorporating a climate-based dengue early warning method into the national dengue surveillance system. Copyright © 2014 Elsevier B.V. All rights reserved.
Goldie, James; Alexander, Lisa; Lewis, Sophie C; Sherwood, Steven
2017-08-01
To find appropriate regression model specifications for counts of the daily hospital admissions of a Sydney cohort and determine which human heat stress indices best improve the models' fit. We built parent models of eight daily counts of admission records using weather station observations, census population estimates and public holiday data. We added heat stress indices; models with lower Akaike Information Criterion scores were judged a better fit. Five of the eight parent models demonstrated adequate fit. Daily maximum Simplified Wet Bulb Globe Temperature (sWBGT) consistently improved fit more than most other indices; temperature and heatwave indices also modelled some health outcomes well. Humidity and heat-humidity indices better fit counts of patients who died following admission. Maximum sWBGT is an ideal measure of heat stress for these types of Sydney hospital admissions. Simple temperature indices are a good fallback where a narrower range of conditions is investigated. Implications for public health: This study confirms the importance of selecting appropriate heat stress indices for modelling. Epidemiologists projecting Sydney hospital admissions should use maximum sWBGT as a common measure of heat stress. Health organisations interested in short-range forecasting may prefer simple temperature indices. © 2017 The Authors.
On the impact of relatedness on SNP association analysis.
Gross, Arnd; Tönjes, Anke; Scholz, Markus
2017-12-06
When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error, and the power of the test is also affected in a more complicated manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. We derive explicit and easy-to-apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation, as shown for the example family structures. Variance inflation is smallest for the HapMap trios, followed by a synthetic family study corresponding to the trio data but with a larger sample size than HapMap. The next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but a sample size similar to that of the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation, while the opposite holds for larger significance levels. When genomic control is applied, type I error is preserved while power decreases rapidly with increasing variance inflation. Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex, since power can be increased or reduced depending on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness. Although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction and simple linear regression analysis is still appropriate.
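The paper derives closed-form inflation formulae; the simulation below only illustrates the phenomenon on synthetic sib pairs, where siblings share parental alleles and a polygenic effect, so the empirical variance of the ordinary least squares SNP effect exceeds the nominal iid-based variance. All parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n_pairs, h2, reps = 0.3, 250, 0.8, 2000
idx = np.arange(n_pairs)
betas = []
for _ in range(reps):
    pat = rng.binomial(1, p, (n_pairs, 2))   # father's two alleles per pair
    mat = rng.binomial(1, p, (n_pairs, 2))   # mother's two alleles per pair
    sib1 = pat[idx, rng.integers(0, 2, n_pairs)] + mat[idx, rng.integers(0, 2, n_pairs)]
    sib2 = pat[idx, rng.integers(0, 2, n_pairs)] + mat[idx, rng.integers(0, 2, n_pairs)]
    g = np.empty(n_pairs * 2)
    g[0::2], g[1::2] = sib1, sib2            # genotypes correlated ~0.5 within pairs
    shared = np.repeat(rng.normal(0, np.sqrt(h2 / 2), n_pairs), 2)  # sib-shared polygenic part
    y = shared + rng.normal(0, np.sqrt(1 - h2 / 2), n_pairs * 2)    # null phenotype, var 1
    x = g - g.mean()
    betas.append(np.dot(x, y) / np.dot(x, x))                       # OLS slope
betas = np.array(betas)
nominal = 1.0 / ((2 * n_pairs) * (2 * p * (1 - p)))  # iid approx: sigma^2 / sum(x^2)
print("empirical var / nominal var ~", betas.var() / nominal)       # > 1 under relatedness
```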
Predicting heavy metal concentrations in soils and plants using field spectrophotometry
NASA Astrophysics Data System (ADS)
Muradyan, V.; Tepanosyan, G.; Asmaryan, Sh.; Sahakyan, L.; Saghatelyan, A.; Warner, T. A.
2017-09-01
The aim of this study is to predict heavy metal (HM) concentrations in soils and plants using field remote sensing methods. The studied sites were the industrial town of Kajaran and the city of Yerevan. The research also included sampling of soils and of leaves of two tree species exposed to different pollution levels, and determination of HM contents under lab conditions. The obtained spectral values were then collated with the HM contents of Kajaran soils and of the tree leaves sampled in Yerevan, and statistical analysis was done. Consequently, Zn and Pb have a negative correlation coefficient (p < 0.01) in the 2498 nm spectral range for soils. Pb has a significantly higher correlation at the red edge for plants. Regression models and an artificial neural network (ANN) for HM prediction were developed. Good results were obtained for the best stress-sensitive spectral band with ANN (R2 0.9, RPD 2.0), Simple Linear Regression (SLR) and Partial Least Squares Regression (PLSR) (R2 0.7, RPD 1.4) models. A Multiple Linear Regression (MLR) model was not applicable for predicting Pb and Zn concentrations in soils in this research. Almost all full-spectrum PLS models provide good calibration and validation results (RPD > 1.4). Full-spectrum ANN models are characterized by excellent calibration R2, rRMSE and RPD (0.9, 0.1 and > 2.5, respectively). For prediction of Pb and Ni contents in plants, SLR and PLS models were used; the two provide almost the same results. Our findings indicate that it is possible to make a coarse direct estimation of HM content in soils and plants using rapid and economical reflectance spectroscopy.
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-01-01
Aims: A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods: We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing (NGS) assays. NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results: Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions: The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
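A minimal sketch of the two complementary analyses on synthetic paired measurements: Bland-Altman bias with limits of agreement, and Deming regression (errors in both methods), whose intercept and slope quantify constant and proportional error. The closed-form Deming solution assumes a known error-variance ratio lam.

```python
import numpy as np

def bland_altman(a, b):
    diff = a - b
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)
    return bias, (bias - loa, bias + loa)   # bias and 95% limits of agreement

def deming(x, y, lam=1.0):
    # lam = ratio of the two methods' error variances (assumed equal here)
    mx, my = x.mean(), y.mean()
    sxx = ((x - mx) ** 2).mean()
    syy = ((y - my) ** 2).mean()
    sxy = ((x - mx) * (y - my)).mean()
    slope = (syy - lam * sxx + np.sqrt((syy - lam * sxx) ** 2
             + 4 * lam * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope           # intercept (constant), slope (proportional)

rng = np.random.default_rng(7)
truth = rng.uniform(5, 50, 60)                   # e.g. variant allele fractions
new = 1.05 * truth + 0.5 + rng.normal(0, 1, 60)  # new assay: built-in errors
ref = truth + rng.normal(0, 1, 60)               # reference assay
print("Bland-Altman bias, LoA:", bland_altman(new, ref))
print("Deming intercept, slope:", deming(ref, new))
```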
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
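A sketch contrasting linear L1-loss support vector regression with ordinary least squares on a synthetic real-valued target in [0, 1], standing in for relative solvent accessibility; the features are random stand-ins, not the sequence-derived encodings used in the paper.

```python
import numpy as np
from sklearn.svm import LinearSVR
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
X = rng.normal(0, 1, (1000, 20))
rsa = np.clip(0.3 + 0.05 * X[:, :5].sum(axis=1) + rng.normal(0, 0.1, 1000), 0, 1)

# epsilon sets the error-insensitive tube, C the error penalization,
# the metaparameters the abstract highlights as worth tuning
svr = LinearSVR(epsilon=0.05, C=1.0, max_iter=10000).fit(X, rsa)
ls = LinearRegression().fit(X, rsa)
print("SVR R2:", svr.score(X, rsa), "| LS R2:", ls.score(X, rsa))
```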
A simple approach to lifetime learning in genetic programming-based symbolic regression.
Azad, Raja Muhammad Atif; Ryan, Conor
2014-01-01
Genetic programming (GP) coarsely models natural evolution to evolve computer programs. Unlike in nature, where individuals can often improve their fitness through lifetime experience, the fitness of GP individuals generally does not change during their lifetime, and there is usually no opportunity to pass on acquired knowledge. This paper introduces the Chameleon system to address this discrepancy and augment GP with lifetime learning by adding a simple local search that operates by tuning the internal nodes of individuals. Although not the first attempt to combine local search with GP, its simplicity means that it is easy to understand and cheap to implement. A simple cache is added which leverages the local search to reduce the tuning cost to a small fraction of the expected cost, and we provide a theoretical upper limit on the maximum tuning expense given the average tree size of the population and show that this limit grows very conservatively as the average tree size of the population increases. We show that Chameleon uses available genetic material more efficiently by exploring more actively than with standard GP, and demonstrate that not only does Chameleon outperform standard GP (on both training and test data) over a number of symbolic regression type problems, it does so by producing smaller individuals and it works harmoniously with two other well-known extensions to GP, namely, linear scaling and a diversity-promoting tournament selection method.
Research on Fault Rate Prediction Method of T/R Component
NASA Astrophysics Data System (ADS)
Hou, Xiaodong; Yang, Jiangping; Bi, Zengjun; Zhang, Yu
2017-07-01
T/R components are an important part of a large phased-array radar antenna array; because of their large numbers and high fault rate, fault prediction for them is of great significance. To address the problems of the traditional grey model GM(1,1) in practical operation, this paper establishes a discrete grey model based on the original model, introduces an optimization factor to optimize the background value, and adds a linear form to the prediction model, yielding an improved discrete grey model with linear regression. Finally, an example is simulated and compared with other models. The results show that the method proposed in this paper has higher accuracy, is simple to solve, and has a wider scope of application.
NASA Astrophysics Data System (ADS)
Hajigeorgiou, Photos G.
2016-12-01
An analytical model for the diatomic potential energy function that was recently tested as a universal function (Hajigeorgiou, 2010) has been further modified and tested as a suitable model for direct-potential-fit analysis. Applications are presented for the ground electronic states of three diatomic molecules: oxygen, carbon monoxide, and hydrogen fluoride. The adjustable parameters of the extended Lennard-Jones potential model are determined through nonlinear regression by fits to calculated rovibrational energy term values or experimental spectroscopic line positions. The model is shown to lead to reliable, compact and simple representations for the potential energy functions of these systems and could therefore be classified as a suitable and attractive model for direct-potential-fit analysis.
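The extended Lennard-Jones form itself is not reproduced here; as a stand-in, the sketch below shows the direct-potential-fit mechanics, nonlinear least squares on sampled potential values, using a Morse potential and scipy.optimize.curve_fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def morse(r, De, a, re):
    # Morse potential: a simple stand-in for the paper's extended
    # Lennard-Jones model, chosen only to illustrate the fitting mechanics
    return De * (1.0 - np.exp(-a * (r - re))) ** 2

rng = np.random.default_rng(9)
r = np.linspace(0.8, 3.0, 80)
v = morse(r, 5.0, 2.0, 1.1) + rng.normal(0, 0.02, r.size)  # synthetic "data"

popt, pcov = curve_fit(morse, r, v, p0=[4.0, 1.5, 1.0])    # nonlinear least squares
print("fitted De, a, re:", popt)
```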
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed at assessing the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by echocardiography and compared with 35 euthyroid patients (E-group) and 25 healthy people (C-group). To identify the factors causing pulmonary hypertension, the statistical method of comparing arithmetic means was used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by a linear or non-linear function. By applying the linear regression method, described by a first-degree equation, the line of regression (linear model) was determined; by applying the non-linear regression method, described by a second-degree equation, a parabola-type curve of regression (non-linear or polynomial model) was determined. We compared and validated these two models by calculating the coefficient of determination (criterion 1), comparing residuals (criterion 2), applying the AIC criterion (criterion 3) and using the F-test (criterion 4). Of the H-group, 47% have pulmonary hypertension that is completely reversible upon reaching euthyroidism. The factors causing pulmonary hypertension were identified: previously known factors (level of free thyroxine, pulmonary vascular resistance, cardiac output) and new factors identified in this study (pretreatment period, age, systolic blood pressure). According to the four criteria and to clinical judgment, we consider the polynomial model (graphically, parabola-type) better than the linear one. The better model showing the functional relation between pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a second-degree polynomial equation, whose graphical representation is a parabola.
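A sketch of the linear-versus-polynomial comparison on synthetic data, using three of the four criteria named above: residual fit, AIC, and the nested-model F-test for the added quadratic term. The data and effect sizes are invented stand-ins for PAPs versus a causal factor.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
x = rng.uniform(0, 10, 53)
y = 30 + 2 * x + 0.4 * x ** 2 + rng.normal(0, 4, 53)   # curved truth

def fit(deg):
    coef = np.polyfit(x, y, deg)
    rss = np.sum((y - np.polyval(coef, x)) ** 2)
    k = deg + 1                                        # number of coefficients
    aic = x.size * np.log(rss / x.size) + 2 * k        # Gaussian-likelihood AIC
    return rss, k, aic

rss1, k1, aic1 = fit(1)   # linear (first-degree) model
rss2, k2, aic2 = fit(2)   # polynomial (second-degree) model
f = ((rss1 - rss2) / (k2 - k1)) / (rss2 / (x.size - k2))
p = stats.f.sf(f, k2 - k1, x.size - k2)
print(f"AIC linear {aic1:.1f} vs quadratic {aic2:.1f}; F-test p = {p:.4f}")
```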
Farsa, Oldřich
2013-01-01
The log BB parameter is the logarithm of the ratio of a compound’s equilibrium concentrations in the brain tissue versus the blood plasma. This parameter is a useful descriptor in assessing the ability of a compound to permeate the blood-brain barrier. The aim of this study was to develop a Hansch-type linear regression QSAR model that correlates the parameter log BB and the retention time of drugs and other organic compounds on a reversed-phase HPLC containing an embedded amide moiety. The retention time was expressed by the capacity factor log k′. The second aim was to estimate the brain’s absorption of 2-(azacycloalkyl)acetamidophenoxyacetic acids, which are analogues of piracetam, nefiracetam, and meclofenoxate. Notably, these acids may be novel nootropics. Two simple regression models that relate log BB and log k′ were developed from an assay performed using a reversed-phase HPLC that contained an embedded amide moiety. Both the quadratic and linear models yielded statistical parameters comparable to previously published models of log BB dependence on various structural characteristics. The models predict that four members of the substituted phenoxyacetic acid series have a strong chance of permeating the barrier and being absorbed in the brain. The results of this study show that a reversed-phase HPLC system containing an embedded amide moiety is a functional in vitro surrogate of the blood-brain barrier. These results suggest that racetam-type nootropic drugs containing a carboxylic moiety could be more poorly absorbed than analogues devoid of the carboxyl group, especially if the compounds penetrate the barrier by a simple diffusion mechanism. PMID:23641330
Vignali, Edda; Macchia, Enrico; Cetani, Filomena; Reggiardo, Giorgio; Cianferotti, Luisella; Saponaro, Federica; Marcocci, Claudio
2017-01-01
Sun exposure is the main determinant of vitamin D production. The aim of this study was to develop an algorithm to assess individual vitamin D status, independently of serum 25(OH)D measurement, using a simple questionnaire mostly relying upon sunlight exposure, which might help select subjects requiring serum 25(OH)D measurement. Six hundred and twenty adult subjects living in a mountain village in Southern Italy, located at 954 m above sea level and at a latitude of 40°50'11.76″ N, were asked to fill in the questionnaire in two different periods of the year: August 2010 and March 2011. Seven predictors were considered: month of investigation, age, sex, BMI, average daily sunlight exposure, beach holidays in the past 12 months, and frequency of going outdoors. The statistical model assumes four classes of serum 25(OH)D concentrations: ≤10, 10-19.9, 20-29.9, and ≥30 ng/ml. The algorithm was developed using a two-step procedure. In Step 1, the linear regression equation was defined in 385 randomly selected subjects. In Step 2, the predictive ability of the regression model was tested in the remaining 235 subjects. Seasonality, daily sunlight exposure and beach holidays in the past 12 months accounted for 27.9, 13.5, and 6.4% of the explained variance in predicting vitamin D status, respectively. The algorithm performed extremely well: 212 of 235 (90.2%) subjects were assigned to the correct vitamin D status. In conclusion, our pilot study demonstrates that an algorithm to estimate vitamin D status can be developed using a simple questionnaire based on sunlight exposure.
Álvarez, Daniel; Alonso-Álvarez, María L.; Gutiérrez-Tobal, Gonzalo C.; Crespo, Andrea; Kheirandish-Gozal, Leila; Hornero, Roberto; Gozal, David; Terán-Santos, Joaquín; Del Campo, Félix
2017-01-01
Study Objectives: Nocturnal oximetry has become known as a simple, readily available, and potentially useful diagnostic tool for childhood obstructive sleep apnea (OSA). However, at-home respiratory polygraphy (HRP) remains the preferred alternative to polysomnography (PSG) in unattended settings. The aim of this study was twofold: (1) to design and assess a novel methodology for pediatric OSA screening based on automated analysis of at-home oxyhemoglobin saturation (SpO2), and (2) to compare its diagnostic performance with HRP. Methods: SpO2 recordings were parameterized by means of time, frequency, and conventional oximetric measures. Logistic regression models were optimized using genetic algorithms (GAs) for three cutoffs for OSA: 1, 3, and 5 events/h. The diagnostic performance of the logistic regression models, the manual obstructive apnea-hypopnea index (OAHI) from HRP, and the conventional oxygen desaturation index ≥ 3% (ODI3) were assessed. Results: For a cutoff of 1 event/h, the optimal logistic regression model significantly outperformed both the conventional ODI3 and the HRP-derived OAHI: 85.5% accuracy (HRP 74.6%; ODI3 65.9%) and 0.97 area under the receiver operating characteristic curve (AUC) (HRP 0.78; ODI3 0.75) were reached. For a cutoff of 3 events/h, the logistic regression model achieved 83.4% accuracy (HRP 85.0%; ODI3 74.5%) and 0.96 AUC (HRP 0.93; ODI3 0.85), whereas using a cutoff of 5 events/h, oximetry reached 82.8% accuracy (HRP 85.1%; ODI3 76.7%) and 0.97 AUC (HRP 0.95; ODI3 0.84). Conclusions: Automated analysis of at-home SpO2 recordings provides accurate detection of children with high pretest probability of OSA. Thus, unsupervised nocturnal oximetry may enable a simple and effective alternative to HRP and PSG in unattended settings. Citation: Álvarez D, Alonso-Álvarez ML, Gutiérrez-Tobal GC, Crespo A, Kheirandish-Gozal L, Hornero R, Gozal D, Terán-Santos J, Del Campo F. Automated screening of children with obstructive sleep apnea using nocturnal oximetry: an alternative to respiratory polygraphy in unattended settings. J Clin Sleep Med. 2017;13(5):693–702. PMID:28356177
LogCauchy, log-sech and lognormal distributions of species abundances in forest communities
Yin, Z.-Y.; Peng, S.-L.; Ren, H.; Guo, Q.; Chen, Z.-H.
2005-01-01
Species-abundance (SA) pattern is one of the most fundamental aspects of biological community structure, providing important information regarding species richness, species-area relations and succession. To better describe the SA distribution (SAD) in a community, based on the widely used lognormal (LN) distribution model with exp(-x^2) roll-off on Preston's octave scale, this study proposed two additional models, logCauchy (LC) and log-sech (LS), with roll-offs of simple x^(-2) and e^(-x), respectively. The estimation of the theoretical total number of species in the whole community, S*, including very rare species not yet collected in the sample, was derived from the left-truncation of each distribution. We fitted these three models by Levenberg-Marquardt nonlinear regression and measured the model fit to the data using the coefficient of determination of the regression, t-tests of the parameters and the Kolmogorov-Smirnov (KS) test of the distributions. Examining the SA data from six forest communities (five in the lower subtropics and one in the tropics), we found that: (1) on a log scale, all three models, which are bell-shaped and left-truncated, statistically adequately fitted the observed SADs, and the LC and LS did better than the LN; (2) for each model and each community, the S* values estimated by the integral and summation methods were almost equal, allowing us to estimate S* using a simple integral formula and to estimate its asymptotic confidence intervals by regression of a transformed model containing it; (3) following the order of LC, LS, and LN, the fitted distributions became lower in the peak, less concave in the side, and shorter in the tail; overall the LC tended to overestimate, the LN tended to underestimate, while the LS was intermediate but slightly tended to underestimate, the observed SADs (particularly the number of common species in the right tail); (4) the six communities had some similar structural properties, such as following similar distribution models, having a common modal octave and a similar proportion of common species. We suggest that what follows the LN distribution should follow (or better follow) the LC and LS, and that the LC, LS and LN distributions represent a "sequential distribution set" in which one can find a best fit to the observed SAD. © 2004 Elsevier B.V. All rights reserved.
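A sketch of fitting the three bell-shaped octave-scale models with Levenberg-Marquardt nonlinear regression, as the study does: lognormal with exp(-x^2) roll-off, logCauchy with x^(-2) roll-off, and log-sech with e^(-x) roll-off. The octave counts below are synthetic, not the forest data.

```python
import numpy as np
from scipy.optimize import curve_fit

def ln_model(R, S0, R0, s):  # lognormal on octave scale
    return S0 * np.exp(-((R - R0) / s) ** 2)

def lc_model(R, S0, R0, g):  # logCauchy: x^(-2) roll-off
    return S0 / (1.0 + ((R - R0) / g) ** 2)

def ls_model(R, S0, R0, t):  # log-sech: e^(-x) roll-off in the tails
    return S0 / np.cosh((R - R0) / t)

rng = np.random.default_rng(11)
R = np.arange(0, 12, dtype=float)                       # Preston octaves
obs = lc_model(R, 30, 5, 2) + rng.normal(0, 1, R.size)  # synthetic species counts

for name, f in [("LN", ln_model), ("LC", lc_model), ("LS", ls_model)]:
    popt, _ = curve_fit(f, R, obs, p0=[25, 5, 2], method="lm")  # Levenberg-Marquardt
    rss = np.sum((obs - f(R, *popt)) ** 2)
    print(name, "RSS:", round(rss, 2))
```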
Support vector regression methodology for estimating global solar radiation in Algeria
NASA Astrophysics Data System (ADS)
Guermoui, Mawloud; Rabehi, Abdelaziz; Gairaa, Kacem; Benkaciali, Said
2018-01-01
Accurate estimation of Daily Global Solar Radiation (DGSR) has been a major goal for solar energy applications. In this paper we show the possibility of developing a simple model based on Support Vector Regression (SVM-R), which could be used to estimate DGSR on the horizontal surface in Algeria based only on the sunshine ratio as input. The SVM model has been developed and tested using a data set recorded over three years (2005-2007). The data were collected at the Applied Research Unit for Renewable Energies (URAER) in Ghardaïa city. The data collected in 2005-2006 were used to train the model, while the 2007 data were used to test its performance. The measured and estimated values of DGSR were compared statistically during the testing phase using the Root Mean Square Error (RMSE), the relative Root Mean Square Error (rRMSE), and the correlation coefficient (r2), which amount to 1.59 MJ/m2, 8.46% and 97.4%, respectively. The obtained results show that the SVM-R is highly qualified for DGSR estimation using only the sunshine ratio.
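A sketch of the SVM-R setup on synthetic data: support vector regression with the sunshine ratio as the only input, trained on two years and tested on a third, evaluated by RMSE and r2. The kernel and hyperparameters are assumptions, not the paper's tuned values.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(12)
s = rng.uniform(0.2, 1.0, 900)               # sunshine ratio, training years
g = 8 + 22 * s + rng.normal(0, 1.2, 900)     # synthetic DGSR in MJ/m2
s_test = rng.uniform(0.2, 1.0, 300)          # held-out test year
g_test = 8 + 22 * s_test + rng.normal(0, 1.2, 300)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(s.reshape(-1, 1), g)
pred = model.predict(s_test.reshape(-1, 1))
rmse = np.sqrt(mean_squared_error(g_test, pred))
print(f"RMSE = {rmse:.2f} MJ/m2, r2 = {r2_score(g_test, pred):.3f}")
```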
Eutrophication risk assessment in coastal embayments using simple statistical models.
Arhonditsis, G; Eleftheriadou, M; Karydis, M; Tsirtsis, G
2003-09-01
A statistical methodology is proposed for assessing the risk of eutrophication in marine coastal embayments. The procedure followed was the development of regression models relating the levels of chlorophyll a (Chl) to the concentration of the limiting nutrient--usually nitrogen--and the renewal rate of the systems. The method was applied in the Gulf of Gera, Island of Lesvos, Aegean Sea, and a surrogate for renewal rate was created using the Canberra metric as a measure of the resemblance between the Gulf and the oligotrophic waters of the open sea in terms of their physical, chemical and biological properties. The Chl-total dissolved nitrogen-renewal rate regression model was the most significant, accounting for 60% of the variation observed in Chl. Predicted distributions of Chl for various combinations of the independent variables, based on Bayesian analysis of the models, enabled comparison of the outcomes of specific scenarios of interest as well as further analysis of the system dynamics. The present statistical approach can be used as a methodological tool for testing the resilience of coastal ecosystems under alternative managerial schemes and levels of exogenous nutrient loading.
Temporal trends in sperm count: a systematic review and meta-regression analysis.
Levine, Hagai; Jørgensen, Niels; Martino-Andrade, Anderson; Mendiola, Jaime; Weksler-Derri, Dan; Mindlis, Irina; Pinotti, Rachel; Swan, Shanna H
2017-11-01
Reported declines in sperm counts remain controversial today and recent trends are unknown. A definitive meta-analysis is critical given the predictive value of sperm count for fertility, morbidity and mortality. To provide a systematic review and meta-regression analysis of recent trends in sperm counts as measured by sperm concentration (SC) and total sperm count (TSC), and their modification by fertility and geographic group. PubMed/MEDLINE and EMBASE were searched for English language studies of human SC published in 1981-2013. Following a predefined protocol, 7518 abstracts were screened and 2510 full articles reporting primary data on SC were reviewed. A total of 244 estimates of SC and TSC from 185 studies of 42 935 men who provided semen samples in 1973-2011 were extracted for meta-regression analysis, as well as information on years of sample collection and covariates [fertility group ('Unselected by fertility' versus 'Fertile'), geographic group ('Western', including North America, Europe, Australia and New Zealand, versus 'Other', including South America, Asia and Africa), age, ejaculation abstinence time, semen collection method, method of measuring SC and semen volume, exclusion criteria and indicators of completeness of covariate data]. The slopes of SC and TSC were estimated as functions of sample collection year using both simple linear regression and weighted meta-regression models and the latter were adjusted for pre-determined covariates and modification by fertility and geographic group. Assumptions were examined using multiple sensitivity analyses and nonlinear models. SC declined significantly between 1973 and 2011 (slope in unadjusted simple regression models -0.70 million/ml/year; 95% CI: -0.72 to -0.69; P < 0.001; slope in adjusted meta-regression models = -0.64; -1.06 to -0.22; P = 0.003). The slopes in the meta-regression model were modified by fertility (P for interaction = 0.064) and geographic group (P for interaction = 0.027). There was a significant decline in SC between 1973 and 2011 among Unselected Western (-1.38; -2.02 to -0.74; P < 0.001) and among Fertile Western (-0.68; -1.31 to -0.05; P = 0.033), while no significant trends were seen among Unselected Other and Fertile Other. Among Unselected Western studies, the mean SC declined, on average, 1.4% per year with an overall decline of 52.4% between 1973 and 2011. Trends for TSC and SC were similar, with a steep decline among Unselected Western (-5.33 million/year, -7.56 to -3.11; P < 0.001), corresponding to an average decline in mean TSC of 1.6% per year and overall decline of 59.3%. Results changed minimally in multiple sensitivity analyses, and there was no statistical support for the use of a nonlinear model. In a model restricted to data post-1995, the slope both for SC and TSC among Unselected Western was similar to that for the entire period (-2.06 million/ml, -3.38 to -0.74; P = 0.004 and -8.12 million, -13.73 to -2.51, P = 0.006, respectively). This comprehensive meta-regression analysis reports a significant decline in sperm counts (as measured by SC and TSC) between 1973 and 2011, driven by a 50-60% decline among men unselected by fertility from North America, Europe, Australia and New Zealand. Because of the significant public health implications of these results, research on the causes of this continuing decline is urgently needed. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved.
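A sketch of the weighted meta-regression step on fabricated study-level data: regress mean sperm concentration on sample-collection year with inverse-variance weights via statsmodels WLS. The true slope in the simulation is set near the reported unadjusted estimate purely for illustration; none of these values are the review's dataset.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
year = rng.uniform(1973, 2011, 120)                 # study collection years
se = rng.uniform(2, 10, 120)                        # per-study standard errors
sc = 99 - 0.7 * (year - 1973) + rng.normal(0, se)   # study mean SC, million/ml

X = sm.add_constant(year)
fit = sm.WLS(sc, X, weights=1.0 / se ** 2).fit()    # inverse-variance weighting
print("slope (million/ml/year):", fit.params[1], "95% CI:", fit.conf_int()[1])
```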
Logistic regression of family data from retrospective study designs.
Whittemore, Alice S; Halpern, Jerry
2003-11-01
We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to consistently estimate the parameters beta_RE in the random effects model and the parameters beta_M in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate of beta_RE and a consistent estimate of its covariance matrix. Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate of beta_M, and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates. Copyright 2003 Wiley-Liss, Inc.
Ciura, Krzesimir; Belka, Mariusz; Kawczak, Piotr; Bączek, Tomasz; Markuszewski, Michał J; Nowakowska, Joanna
2017-09-05
The objective of this paper is to build a QSRR/QSAR model for predicting blood-brain barrier (BBB) permeability. The obtained models are based on salting-out thin layer chromatography (SOTLC) constants and calculated molecular descriptors. Among chromatographic methods, SOTLC was chosen since its mobile phases are free of organic solvent. As a consequence, they are less toxic and have a lower environmental impact compared to classical reversed-phase liquid chromatography (RPLC). During the study, three stationary phases were examined: silica gel, cellulose plates and neutral aluminum oxide. The model set of solutes presents a wide range of log BB values, containing compounds which cross the BBB readily and molecules poorly distributed to the brain, including drugs acting on the nervous system as well as peripherally acting drugs. Additionally, a comparison of three regression models was performed: multiple linear regression (MLR), partial least-squares (PLS) and orthogonal partial least squares (OPLS). The designed QSRR/QSAR models could be useful for predicting the BBB permeability of newly synthesized compounds in the drug development pipeline and are attractive alternatives to time-consuming and demanding direct methods for log BB measurement. The study also showed that among several regression techniques, significant differences can be obtained in model performance, measured by R2 and Q2; hence it is strongly suggested to evaluate all available options, such as MLR, PLS and OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.
Snow, David P.
2016-01-01
This study investigates infants’ transition from nonverbal to verbal communication using evidence from regression patterns. As an example of regressions, prelinguistic infants learning American Sign Language (ASL) use pointing gestures to communicate. At the onset of single signs, however, these gestures disappear. Petitto (1987) attributed the regression to the children’s discovery that pointing has two functions, namely, deixis and linguistic pronouns. The 1:2 relation (1 form, 2 functions) violates the simple 1:1 pattern that infants are believed to expect. This kind of conflict, Petitto argued, explains the regression. Based on the additional observation that the regression coincided with the boundary between prelinguistic and linguistic communication, Petitto concluded that the prelinguistic and linguistic periods are autonomous. The purpose of the present study was to evaluate the 1:1 model and to determine whether it explains a previously reported regression of intonation in English. Background research showed that gestures and intonation have different forms but the same pragmatic meanings, a 2:1 form-function pattern that plausibly precipitates the regression. The hypothesis of the study was that gestures and intonation are closely related. Moreover, because gestures and intonation change in the opposite direction, the negative correlation between them indicates a robust inverse relationship. To test this prediction, speech samples of 29 infants (8 to 16 months) were analyzed acoustically and compared to parent-report data on several verbal and gestural scales. In support of the hypothesis, gestures alone were inversely correlated with intonation. In addition, the regression model explains nonlinearities stemming from different form-function configurations. However, the results failed to support the claim that regressions linked to early words or signs reflect autonomy. The discussion ends with a focus on the special role of intonation in children’s transition from “prelinguistic” communication to language. PMID:28729753
Evaluation of spatial, radiometric and spectral Thematic Mapper performance for coastal studies
NASA Technical Reports Server (NTRS)
Klemas, V. (Principal Investigator)
1984-01-01
The effect different wetland plant canopies have upon observed reflectance in Thematic Mapper bands is studied. The three major vegetation canopy types (broadleaf, gramineous and leafless) produce unique spectral responses for a similar quantity of live biomass. The spectral biomass estimate of a broadleaf canopy is most similar to the harvest biomass estimate when a broadleaf canopy radiance model is used. All major wetland vegetation species can be identified through TM imagery. Simple regression models are developed equating the vegetation index and the infrared index with biomass. The spectral radiance index largely agreed with harvest biomass estimates.
Simple Patchy-Based Simulators Used to Explore Pondscape Systematic Dynamics
Fang, Wei-Ta; Chou, Jui-Yu; Lu, Shiau-Yun
2014-01-01
Thousands of farm ponds have disappeared from the tableland in Taoyuan County, Taiwan, since the 1920s. Within a 757 km² area of Taoyuan between 1926 and 1960, 1,895 ponds (37%) disappeared, 2,667 (52%) remained, and only 537 (11%) new ponds were created. In this study, a geographic information system (GIS) and a logistic stepwise regression model were used to detect pond-loss rates and to understand the driving forces behind pondscape changes. The logistic stepwise regression model was used to develop a series of relationships between pondscapes affected by intrinsic driving forces (patch size, perimeter, and patch shape) and external driving forces (distance from the edge of the ponds to the edges of roads, rivers, and canals). The authors concluded that the loss of ponds was driven by intrinsic factors such as pond perimeter: a large perimeter increases the chances of pond loss, but also increases the possibility of creating new ponds. However, a large perimeter is closely associated with circular shapes (lower values of the mean pond-patch fractal dimension [MPFD]), which characterize the majority of newly created ponds. The method used in this study might be helpful to those seeking to protect this unique landscape by enabling the monitoring of patch-loss problems with simple patchy-based simulators. PMID:24466281
Ono, Daiki; Bamba, Takeshi; Oku, Yuichi; Yonetani, Tsutomu; Fukusaki, Eiichiro
2011-09-01
In this study, we constructed prediction models by metabolic fingerprinting of fresh green tea leaves using Fourier transform near-infrared (FT-NIR) spectroscopy and partial least squares (PLS) regression analysis to objectively optimize the steaming process conditions in green tea manufacture. The steaming process is the most important step for manufacturing high-quality green tea products. However, the parameter setting of the steamer is currently determined subjectively by the manufacturer. Therefore, a simple and robust system that can be used to objectively set the steaming process parameters is necessary. We focused on FT-NIR spectroscopy because of its simple operation, quick measurement, and low running costs. After removal of noise in the spectral data by principal component analysis (PCA), PLS regression analysis was performed using spectral information as independent variables, and the steaming parameters set by experienced manufacturers as dependent variables. The prediction models were successfully constructed with satisfactory accuracy. Moreover, the results of the demonstration experiment suggested that the green tea steaming process parameters could be predicted on a larger manufacturing scale. This technique will contribute to improvement of the quality and productivity of green tea because it can objectively optimize the complicated green tea steaming process and will be suitable for practical use in green tea manufacture. Copyright © 2011 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
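A minimal sketch of the PCA-denoising-then-PLS pipeline this abstract describes, assuming synthetic stand-ins for the spectra and steaming parameters (sample counts, wavelength grid, and component numbers are illustrative, not the study's settings):

```python
# Sketch of the FT-NIR workflow above: PCA noise removal, then PLS
# regression from spectra to a steaming parameter. Data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))            # 60 leaf samples x 500 NIR wavelengths
coef = np.zeros(500); coef[100:110] = 0.5 # a few informative bands (assumed)
y = X @ coef + rng.normal(0, 0.5, 60)     # steaming parameter (synthetic)

# Noise removal via PCA: keep leading components, reconstruct the spectra.
pca = PCA(n_components=10).fit(X)
X_denoised = pca.inverse_transform(pca.transform(X))

# PLS regression from denoised spectra to the steaming parameter.
pls = PLSRegression(n_components=5).fit(X_denoised, y)
print("CV R^2:", cross_val_score(pls, X_denoised, y, cv=5).mean())
```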
Cuttability Assessment of Selected Rocks Through Different Brittleness Values
NASA Astrophysics Data System (ADS)
Dursun, Arif Emre; Gokay, M. Kemal
2016-04-01
Prediction of cuttability is a critical issue for successful execution of tunnel or mining excavation projects. Rock cuttability is also used to determine specific energy, which is defined as the work done by the cutting force to excavate a unit volume of yield. Specific energy is a meaningful inverse measure of cutting efficiency, since it simply states how much energy must be expended to excavate a unit volume of rock. Brittleness is a fundamental rock property applied in drilling and rock excavation, and it is one of the most crucial rock features for rock excavation. For this reason, determining the relations between cuttability and brittleness will help rock engineers. This study aims to estimate the specific energy from different brittleness values of rocks by means of simple and multiple regression analyses. In this study, rock cutting, rock property, and brittleness index tests were carried out on 24 different rock samples with different strength values, including marble, travertine, and tuff, collected from sites around Konya Province, Turkey. Four previously used brittleness concepts were evaluated in this study, denoted as B1 (the ratio of compressive to tensile strength), B2 (the ratio of the difference between compressive and tensile strength to their sum), B3 (the area under the stress-strain line in relation to compressive and tensile strength), and B9 = S20, the percentage of fines (<11.2 mm) formed in an impact test for the Norwegian University of Science and Technology (NTNU) model, as well as B9p (B9 as predicted from uniaxial compressive, Brazilian tensile, and point load strengths of rocks using multiple regression analysis). The results suggest that the proposed simple regression-based prediction models including B3, B9, and B9p outperform the models including B1 and B2 and can be used for more accurate and reliable estimation of specific energy.
Schleicher, Rosemary L; Sternberg, Maya R; Pfeiffer, Christine M
2013-06-01
Sociodemographic and lifestyle factors exert important influences on nutritional status; however, information on their association with biomarkers of fat-soluble nutrients is limited, particularly in a representative sample of adults. Serum or plasma concentrations of vitamin A, vitamin E, carotenes, xanthophylls, 25-hydroxyvitamin D [25(OH)D], SFAs, MUFAs, PUFAs, and total fatty acids (tFAs) were measured in adults (aged ≥ 20 y) during all or part of NHANES 2003-2006. Simple and multiple linear regression models were used to assess 5 sociodemographic variables (age, sex, race-ethnicity, education, and income) and 5 lifestyle behaviors (smoking, alcohol consumption, BMI, physical activity, and supplement use) and their relation to biomarker concentrations. Adjustment for total serum cholesterol and lipid-altering drug use was added to the full regression model. Adjustment for latitude and season was added to the full model for 25(OH)D. Based on simple linear regression, race-ethnicity, BMI, and supplement use were significantly related to all fat-soluble biomarkers. Sociodemographic variables as a group explained 5-17% of biomarker variability, whereas together, sociodemographic and lifestyle variables explained 22-23% [25(OH)D, vitamin E, xanthophylls], 17% (vitamin A), 15% (MUFAs), 10-11% (SFAs, carotenes, tFAs), and 6% (PUFAs) of biomarker variability. Although lipid adjustment explained additional variability for all biomarkers except for 25(OH)D, it appeared to be largely independent of sociodemographic and lifestyle variables. After adjusting for sociodemographic, lifestyle, and lipid-related variables, major differences in biomarkers were associated with race-ethnicity (from -44 to 57%), smoking (up to -25%), supplement use (up to 21%), and BMI (up to -15%). Latitude and season attenuated some race-ethnicity differences. Of the sociodemographic and lifestyle variables examined, with or without lipid adjustment, most fat-soluble nutrient biomarkers were significantly associated with race-ethnicity.
Brady, Christopher John; Mudie, Lucy Iluka; Wang, Xueyang; Guallar, Eliseo; Friedman, David Steven
2017-06-20
Diabetic retinopathy (DR) is a leading cause of vision loss in working age individuals worldwide. While screening is effective and cost effective, it remains underutilized, and novel methods are needed to increase detection of DR. This clinical validation study compared diagnostic gradings of retinal fundus photographs provided by volunteers on the Amazon Mechanical Turk (AMT) crowdsourcing marketplace with expert-provided gold-standard grading and explored whether determination of the consensus of crowdsourced classifications could be improved beyond a simple majority vote (MV) using regression methods. The aim of our study was to determine whether regression methods could be used to improve the consensus grading of data collected by crowdsourcing. A total of 1200 retinal images of individuals with diabetes mellitus from the Messidor public dataset were posted to AMT. Eligible crowdsourcing workers had at least 500 previously approved tasks with an approval rating of 99% across their prior submitted work. A total of 10 workers were recruited to classify each image as normal or abnormal. If half or more workers judged the image to be abnormal, the MV consensus grade was recorded as abnormal. Rasch analysis was then used to calculate worker ability scores in a random 50% training set, which were then used as weights in a regression model in the remaining 50% test set to determine if a more accurate consensus could be devised. Outcomes of interest were the percent correctly classified images, sensitivity, specificity, and area under the receiver operating characteristic (AUROC) for the consensus grade as compared with the expert grading provided with the dataset. Using MV grading, the consensus was correct in 75.5% of images (906/1200), with 75.5% sensitivity, 75.5% specificity, and an AUROC of 0.75 (95% CI 0.73-0.78). A logistic regression model using Rasch-weighted individual scores generated an AUROC of 0.91 (95% CI 0.88-0.93) compared with 0.89 (95% CI 0.86-0.92) for a model using unweighted scores (chi-square P value<.001). Setting a diagnostic cut-point to optimize sensitivity at 90%, 77.5% (465/600) were graded correctly, with 90.3% sensitivity, 68.5% specificity, and an AUROC of 0.79 (95% CI 0.76-0.83). Crowdsourced interpretations of retinal images provide rapid and accurate results as compared with a gold-standard grading. Creating a logistic regression model using Rasch analysis to weight crowdsourced classifications by worker ability improves accuracy of aggregated grades as compared with simple majority vote. ©Christopher John Brady, Lucy Iluka Mudie, Xueyang Wang, Eliseo Guallar, David Steven Friedman. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 20.06.2017.
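The study weights workers by Rasch ability scores; as a hedged illustration of the same idea, the sketch below fits a plain logistic regression on the ten per-image worker votes, so each worker's fitted coefficient plays the role of a learned ability weight. All data are simulated; the `ability` values and the 50/50 split are assumptions, not the study's values.

```python
# Sketch: aggregating 10 crowdsourced votes per image with a logistic model
# instead of a simple majority vote. The Rasch step in the study is replaced
# here by direct logistic weighting of worker votes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n_images, n_workers = 1200, 10
truth = rng.integers(0, 2, n_images)             # expert gold-standard grade
ability = rng.uniform(0.6, 0.95, n_workers)      # per-worker accuracy (assumed)
votes = np.where(rng.random((n_images, n_workers)) < ability,
                 truth[:, None], 1 - truth[:, None])   # normal/abnormal calls

train, test = slice(0, 600), slice(600, None)    # 50% training, 50% test

majority = (votes[test].mean(axis=1) >= 0.5).astype(float)   # MV baseline
model = LogisticRegression().fit(votes[train], truth[train])
weighted = model.predict_proba(votes[test])[:, 1]            # weighted consensus

print("MV AUROC:      ", roc_auc_score(truth[test], majority))
print("Weighted AUROC:", roc_auc_score(truth[test], weighted))
```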
A Primer on Logistic Regression.
ERIC Educational Resources Information Center
Woldbeck, Tanya
This paper introduces logistic regression as a viable alternative when the researcher is faced with variables that are not continuous. If one is to use simple regression, the dependent variable must be measured on a continuous scale. In the behavioral sciences, it may not always be appropriate or possible to have a measured dependent variable on a…
Uechi, Ken; Asakura, Keiko; Ri, Yui; Masayasu, Shizuko; Sasaki, Satoshi
2016-02-01
Several estimation methods for 24-h sodium excretion using spot urine samples have been reported, but accurate estimation at the individual level remains difficult. We aimed to clarify the most accurate method of estimating 24-h sodium excretion with different numbers of available spot urine samples. A total of 370 participants from throughout Japan collected multiple 24-h urine and spot urine samples independently. Participants were allocated randomly into a development and a validation dataset. Two estimation methods were established in the development dataset using the two 24-h sodium excretion samples as reference: the 'simple mean method', estimated by multiplying the sodium-creatinine ratio by predicted 24-h creatinine excretion, and the 'regression method', which employed linear regression analysis. The accuracy of the two methods was examined by comparing the estimated means and concordance correlation coefficients (CCC) in the validation dataset. Mean sodium excretion by the simple mean method with three spot urine samples was closest to that by 24-h collection (difference: -1.62 mmol/day). CCC with the simple mean method increased with an increased number of spot urine samples at 0.20, 0.31, and 0.42 using one, two, and three samples, respectively. This method with three spot urine samples yielded a higher CCC than the regression method (0.40). When only one spot urine sample was available for each study participant, CCC was higher with the regression method (0.36). The simple mean method with three spot urine samples yielded the most accurate estimates of sodium excretion. When only one spot urine sample was available, the regression method was preferable.
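Agreement above is judged by Lin's concordance correlation coefficient; the sketch below computes the CCC and the 'simple mean method' estimate (mean spot Na/Cr ratio times predicted 24-h creatinine) on synthetic inputs. The unit handling and creatinine prediction equation are not taken from the paper.

```python
# Lin's concordance correlation coefficient (CCC), the agreement metric
# used above, plus the 'simple mean method' estimate. Inputs are synthetic.
import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient between two estimates."""
    sxy = np.cov(x, y, bias=True)[0, 1]
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

rng = np.random.default_rng(2)
na_cr_ratio = rng.uniform(10, 30, size=(370, 3))   # Na/Cr ratio, three spot samples
pred_cr = rng.uniform(0.8, 1.6, size=370)          # predicted 24-h creatinine
true_24h = rng.uniform(100, 250, size=370)         # measured 24-h Na excretion

# Simple mean method with three spot samples: mean ratio x predicted creatinine.
est = na_cr_ratio.mean(axis=1) * pred_cr           # units illustrative only
print("CCC:", ccc(est, true_24h))
```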
Spertus, Jacob V; Normand, Sharon-Lise T
2018-04-23
High-dimensional data provide many potential confounders that may bolster the plausibility of the ignorability assumption in causal inference problems. Propensity score methods are powerful causal inference tools, which are popular in health care research and are particularly useful for high-dimensional data. Recent interest has surrounded a Bayesian treatment of propensity scores in order to flexibly model the treatment assignment mechanism and summarize posterior quantities while incorporating variance from the treatment model. We discuss methods for Bayesian propensity score analysis of binary treatments, focusing on modern methods for high-dimensional Bayesian regression and the propagation of uncertainty. We introduce a novel and simple estimator for the average treatment effect that capitalizes on conjugacy of the beta and binomial distributions. Through simulations, we show the utility of horseshoe priors and Bayesian additive regression trees paired with our new estimator, while demonstrating the importance of including variance from the treatment regression model. An application to cardiac stent data with almost 500 confounders and 9000 patients illustrates approaches and facilitates comparison with existing alternatives. As measured by a falsifiability endpoint, we improved confounder adjustment compared with past observational research of the same problem. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
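The new estimator is said to exploit conjugacy of the beta and binomial distributions. The sketch below illustrates only that conjugate update (posterior draws for treated and control event probabilities and their difference); it is not the authors' average-treatment-effect estimator, and all counts are invented.

```python
# Beta-binomial conjugacy, the ingredient the estimator above relies on:
# with a Beta(a, b) prior on an event probability and k events in n trials,
# the posterior is Beta(a + k, b + n - k). Mechanics only, not the
# authors' propensity-score-based estimator.
import numpy as np

rng = np.random.default_rng(3)
a, b = 1.0, 1.0                    # uniform Beta prior
k_treat, n_treat = 45, 300         # events among treated (invented counts)
k_ctrl, n_ctrl = 60, 300           # events among controls (invented counts)

p_treat = rng.beta(a + k_treat, b + n_treat - k_treat, size=10_000)
p_ctrl = rng.beta(a + k_ctrl, b + n_ctrl - k_ctrl, size=10_000)
risk_diff = p_treat - p_ctrl       # posterior draws of the risk difference

print("Posterior mean risk difference:", risk_diff.mean())
print("95% credible interval:", np.percentile(risk_diff, [2.5, 97.5]))
```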
Bilgili, Mehmet; Sahin, Besir; Sangun, Levent
2013-01-01
The aim of this study is to estimate the soil temperatures of a target station using only the soil temperatures of neighboring stations without any consideration of the other variables or parameters related to soil properties. For this aim, the soil temperatures were measured at depths of 5, 10, 20, 50, and 100 cm below the earth surface at eight measuring stations in Turkey. Firstly, the multiple nonlinear regression analysis was performed with the "Enter" method to determine the relationship between the values of target station and neighboring stations. Then, the stepwise regression analysis was applied to determine the best independent variables. Finally, an artificial neural network (ANN) model was developed to estimate the soil temperature of a target station. According to the derived results for the training data set, the mean absolute percentage error and correlation coefficient ranged from 1.45% to 3.11% and from 0.9979 to 0.9986, respectively, while corresponding ranges of 1.685-3.65% and 0.9988-0.9991, respectively, were obtained based on the testing data set. The obtained results show that the developed ANN model provides a simple and accurate prediction to determine the soil temperature. In addition, the missing data at the target station could be determined within a high degree of accuracy.
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate, as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), the multilayer perceptron, and kernel ridge regression (KRR). They are investigated offline with electromyographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data, providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve performance similar to KRR at much lower computational cost. In particular ME, a physiologically inspired extension of linear regression, represents a promising candidate for the next generation of prosthetic devices.
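A hedged sketch of the central offline comparison: ordinary linear regression versus kernel ridge regression on a synthetic, mildly nonlinear mapping from eight "electrode" features to a two-DoF target. Feature construction, channel count, and hyperparameters are assumptions, not the study's.

```python
# Linear regression vs. kernel ridge regression (KRR) on a synthetic
# 2-DoF regression problem mimicking the offline comparison above.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 8))                    # 8 electrode channels (assumed)
W = rng.normal(size=(8, 2))
Y = np.tanh(X @ W) + 0.1 * rng.normal(size=(2000, 2))   # mildly nonlinear 2-DoF target

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

lin = LinearRegression().fit(X_tr, Y_tr)
krr = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X_tr, Y_tr)
print("Linear R^2:", lin.score(X_te, Y_te))
print("KRR R^2:   ", krr.score(X_te, Y_te))
```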
NASA Astrophysics Data System (ADS)
Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.
2015-03-01
During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
NASA Astrophysics Data System (ADS)
Nieto, Paulino José García; García-Gonzalo, Esperanza; Vilán, José Antonio Vilán; Robleda, Abraham Segade
2015-12-01
The main aim of this research work is to build a new practical hybrid regression model to predict milling tool wear in a regular cut as well as entry cut and exit cut of a milling tool. The model was based on Particle Swarm Optimization (PSO) in combination with support vector machines (SVMs). This optimization mechanism involved kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, a PSO-SVM-based model, which is based on statistical learning theory, was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of the experiment, depth of cut, feed, type of material, etc. To accomplish the objective of this study, the experimental dataset represents experiments from runs on a milling machine under various operating conditions. In this way, data sampled by three different types of sensors (acoustic emission sensor, vibration sensor and current sensor) were acquired at several positions. A second aim is to determine the factors with the greatest bearing on the milling tool flank wear with a view to proposing improvements to the milling machine. Firstly, this hybrid PSO-SVM-based regression model captures the main insight of statistical learning theory in order to obtain a good prediction of the dependence between the flank wear (output variable) and the input variables (time, depth of cut, feed, etc.). Indeed, regression with optimal hyperparameters was performed and a determination coefficient of 0.95 was obtained. The agreement of this model with experimental data confirmed its good performance. Secondly, the main advantages of this PSO-SVM-based model are its capacity to produce a simple, easy-to-interpret model, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, the main conclusions of this study are presented.
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-02-01
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), using R² as the primary metric of assay agreement. However, the use of R² alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
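A minimal sketch of the Bland-Altman statistics discussed above, computed on synthetic paired assay values: the mean difference estimates constant error, the limits of agreement bound disagreement, and regressing differences on means flags proportional error.

```python
# Bland-Altman statistics for comparing two quantitative assays
# (e.g. a validated method vs. an NGS assay). Values are synthetic,
# simulated with both a constant offset and a proportional error.
import numpy as np

rng = np.random.default_rng(5)
ref = rng.uniform(5, 50, 100)                       # validated-method values
new = 1.05 * ref + 0.8 + rng.normal(0, 1.5, 100)    # new assay with both error types

diff, mean = new - ref, (new + ref) / 2
bias = diff.mean()                                  # constant error
sd = diff.std(ddof=1)
loa = (bias - 1.96 * sd, bias + 1.96 * sd)          # limits of agreement
slope = np.polyfit(mean, diff, 1)[0]                # nonzero slope => proportional error

print(f"bias={bias:.2f}, LoA=({loa[0]:.2f}, {loa[1]:.2f}), proportional slope={slope:.3f}")
```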
Aragón, Alfredo S; Kalberg, Wendy O; Buckley, David; Barela-Scott, Lindsey M; Tabachnick, Barbara G; May, Philip A
2008-12-01
Although a large body of literature exists on cognitive functioning in alcohol-exposed children, it is unclear if there is a signature neuropsychological profile in children with Fetal Alcohol Spectrum Disorders (FASD). This study assesses cognitive functioning in children with FASD from several American Indian reservations in the Northern Plains States, and it applies a hierarchical model of simple versus complex information processing to further examine cognitive function. We hypothesized that complex tests would discriminate between children with FASD and culturally similar controls, while children with FASD would perform similar to controls on relatively simple tests. Our sample includes 32 control children and 24 children with a form of FASD [fetal alcohol syndrome (FAS) = 10, partial fetal alcohol syndrome (PFAS) = 14]. The test battery measures general cognitive ability, verbal fluency, executive functioning, memory, and fine-motor skills. Many of the neuropsychological tests produced results consistent with a hierarchical model of simple versus complex processing. The complexity of the tests was determined "a priori" based on the number of cognitive processes involved in them. Multidimensional scaling was used to statistically analyze the accuracy of classifying the neurocognitive tests into a simple versus complex dichotomy. Hierarchical logistic regression models were then used to define the contribution made by complex versus simple tests in predicting the significant differences between children with FASD and controls. Complex test items discriminated better than simple test items. The tests that conformed well to the model were the Verbal Fluency, Progressive Planning Test (PPT), the Lhermitte memory tasks, and the Grooved Pegboard Test (GPT). The FASD-grouped children, when compared with controls, demonstrated impaired performance on letter fluency, while their performance was similar on category fluency. On the more complex PPT trials (problems 5 to 8), as well as the Lhermitte logical tasks, the FASD group performed the worst. The differential performance between children with FASD and controls was evident across various neuropsychological measures. The children with FASD performed significantly more poorly on the complex tasks than did the controls. The identification of a neurobehavioral profile in children with prenatal alcohol exposure will help clinicians identify and diagnose children with FASD.
Komaroff, Marina
2016-01-01
Objective. The aim of this study is to investigate if weight fluctuation is an independent risk factor for postmenopausal breast cancer (PBC) among women who gained weight in adult years. Methods. NHANES I Epidemiologic Follow-Up Study (NHEFS) database was used in the study. Women that were cancers-free at enrollment and diagnosed for the first time with breast cancer at age 50 or greater were considered cases. Controls were chosen from the subset of cancers-free women and matched to cases by years of follow-up and status of body mass index (BMI) at 25 years of age. Weight fluctuation was measured by the root-mean-square-error (RMSE) from a simple linear regression model for each woman with their body mass index (BMI) regressed on age (started at 25 years) while women with the positive slope from this regression were defined as weight gainers. Data were analyzed using conditional logistic regression models. Results. A total of 158 women were included into the study. The conditional logistic regression adjusted for weight gain demonstrated positive association between weight fluctuation in adult years and postmenopausal breast cancers (odds ratio/OR = 1.67; 95% confidence interval/CI: 1.06–2.66). Conclusions. The data suggested that long-term weight fluctuation was significant risk factor for PBC among women who gained weight in adult years. This finding underscores the importance of maintaining lost weight and avoiding weight fluctuation. PMID:26953120
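A minimal sketch of the per-woman fluctuation measure described above: BMI regressed on age with a simple linear model, the slope sign defining weight gainers and the residual RMSE quantifying fluctuation. The BMI trajectory below is synthetic.

```python
# Per-subject measure used above: regress BMI on age for one woman;
# the slope sign classifies weight gainers and the residual RMSE
# quantifies weight fluctuation. Data below are synthetic.
import numpy as np

rng = np.random.default_rng(6)
ages = np.arange(25, 66, 5)                        # ages at which BMI was recorded
bmi = 24 + 0.05 * (ages - 25) + rng.normal(0, 0.8, ages.size)   # one woman's BMI

slope, intercept = np.polyfit(ages, bmi, 1)
resid = bmi - (slope * ages + intercept)
rmse = np.sqrt(np.mean(resid ** 2))                # weight-fluctuation measure

print(f"slope={slope:.3f} ({'gainer' if slope > 0 else 'non-gainer'}), RMSE={rmse:.3f}")
```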
Deriving the Regression Equation without Using Calculus
ERIC Educational Resources Information Center
Gordon, Sheldon P.; Gordon, Florence S.
2004-01-01
Probably the one "new" mathematical topic that is most responsible for modernizing courses in college algebra and precalculus over the last few years is the idea of fitting a function to a set of data in the sense of a least squares fit. Whether it be simple linear regression or nonlinear regression, this topic opens the door to applying the…
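For reference, the closed-form least-squares estimates that such a non-calculus derivation arrives at, for the fitted line y = a + bx:

```latex
% Standard least-squares estimates for the line y = a + bx:
\[
b \;=\; \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
a \;=\; \bar{y} - b\,\bar{x}.
\]
```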
Wang, Yubo; Veluvolu, Kalyana C
2017-06-14
It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the use of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model as a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance, and the results show that the proposed method, Sparse-BMFLC, has a high mean energy ratio (0.9976) and outperforms existing methods such as the short-time Fourier transform (STFT), continuous wavelet transform (CWT) and the BMFLC Kalman smoother. Furthermore, the proposed method provides an overall 6.22% improvement in reconstruction error.
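A hedged sketch of the sparse-regression reformulation: a band-limited Fourier dictionary whose coefficients are fitted with the Lasso, a standard convex sparse estimator (the paper's specific solver and band settings are not reproduced; the test signal and frequency grid below are invented).

```python
# Sparse-regression view of BMFLC: build a band-limited Fourier dictionary
# and fit its coefficients with the Lasso, a convex sparse estimator.
import numpy as np
from sklearn.linear_model import Lasso

fs, T = 250.0, 4.0                                 # sampling rate (Hz), duration (s)
t = np.arange(0, T, 1 / fs)
signal = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 11 * t)

freqs = np.arange(3, 15, 0.5)                      # band-limited frequency grid
X = np.hstack([np.sin(2 * np.pi * freqs * t[:, None]),
               np.cos(2 * np.pi * freqs * t[:, None])])  # Fourier dictionary

lasso = Lasso(alpha=0.01, fit_intercept=False).fit(X, signal)
active = freqs[np.abs(lasso.coef_[:freqs.size]) > 1e-3]  # sparse sine components
print("Active sine frequencies (Hz):", active)
print("Reconstruction R^2:", lasso.score(X, signal))
```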
Modeling of bromate formation by ozonation of surface waters in drinking water treatment.
Legube, Bernard; Parinet, Bernard; Gelinet, Karine; Berne, Florence; Croue, Jean-Philippe
2004-04-01
The main objective of this paper is to develop statistically and chemically rational models for bromate formation by ozonation of clarified surface waters. The results presented here show that bromate formation by ozonation of natural waters in drinking water treatment is directly proportional to the "Ct" value ("Ctau" in this study). Moreover, this proportionality strongly depends on many parameters: increases in pH, temperature and bromide level lead to an increase of bromate formation, while ammonia and dissolved organic carbon concentrations cause a reverse effect. Taking into account the limitations of theoretical modeling, we proposed to predict bromate formation by stochastic simulations (multi-linear regression and artificial neural network methods) from 40 experiments (BrO(3)(-) vs. "Ctau") carried out with three sand-filtered waters sampled at three different waterworks. With seven selected variables we used a simple architecture of neural networks, optimized by "neural connection" of SPSS Inc./Recognition Inc. Bromate modeling by artificial neural networks gives better results than multi-linear regression. The artificial neural network model allowed us to classify variables in decreasing order of influence (for the cases studied, within our variable scale): "Ctau", [N-NH(4)(+)], [Br(-)], pH, temperature, DOC, alkalinity.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, J.; Moon, T.J.; Howell, J.R.
This paper presents an analysis of the heat transfer occurring during an in-situ curing process for which infrared energy is provided on the surface of polymer composite during winding. The material system is Hercules prepreg AS4/3501-6. Thermoset composites have an exothermic chemical reaction during the curing process. An Eulerian thermochemical model is developed for the heat transfer analysis of helical winding. The model incorporates heat generation due to the chemical reaction. Several assumptions are made leading to a two-dimensional, thermochemical model. For simplicity, 360° heating around the mandrel is considered. In order to generate the appropriate process windows, the developed heat transfer model is combined with a simple winding time model. The process windows allow for a proper selection of process variables such as infrared energy input and winding velocity to give a desired end-product state. Steady-state temperatures are found for each combination of the process variables. A regression analysis is carried out to relate the process variables to the resulting steady-state temperatures. Using regression equations, process windows for a wide range of cylinder diameters are found. A general procedure to find process windows for Hercules AS4/3501-6 prepreg tape is coded in a FORTRAN program.
Jarnevich, Catherine S.; Talbert, Marian; Morisette, Jeffrey T.; Aldridge, Cameron L.; Brown, Cynthia; Kumar, Sunil; Manier, Daniel; Talbert, Colin; Holcombe, Tracy R.
2017-01-01
Evaluating the conditions where a species can persist is an important question in ecology both to understand tolerances of organisms and to predict distributions across landscapes. Presence data combined with background or pseudo-absence locations are commonly used with species distribution modeling to develop these relationships. However, there is not a standard method to generate background or pseudo-absence locations, and method choice affects model outcomes. We evaluated combinations of both model algorithms (simple and complex generalized linear models, multivariate adaptive regression splines, Maxent, boosted regression trees, and random forest) and background methods (random, minimum convex polygon, and continuous and binary kernel density estimator (KDE)) to assess the sensitivity of model outcomes to choices made. We evaluated six questions related to model results, including five beyond the common comparison of model accuracy assessment metrics (biological interpretability of response curves, cross-validation robustness, independent data accuracy and robustness, and prediction consistency). For our case study with cheatgrass in the western US, random forest was least sensitive to background choice and the binary KDE method was least sensitive to model algorithm choice. While this outcome may not hold for other locations or species, the methods we used can be implemented to help determine appropriate methodologies for particular research questions.
NASA Technical Reports Server (NTRS)
Tomberlin, T. J.
1985-01-01
Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
NASA Astrophysics Data System (ADS)
Suhandy, D.; Yulia, M.; Ogawa, Y.; Kondo, N.
2018-05-01
In the present research, near-infrared (NIR) spectroscopy in tandem with full-spectrum partial least squares (FS-PLS) regression was evaluated for quantifying the degree of adulteration in civet coffee. A total of 126 ground roasted coffee samples with degrees of adulteration of 0-51% were prepared. Spectral data were acquired using a NIR spectrometer equipped with an integrating sphere for diffuse reflectance measurement in the range of 1300-2500 nm. The samples were divided into two groups: a calibration sample set (84 samples) and a prediction sample set (42 samples). The calibration model was developed on original spectra using FS-PLS regression with the full cross-validation method, and exhibited determination coefficients of R²=0.96 for calibration and R²=0.92 for validation. The prediction resulted in a low root mean square error of prediction (RMSEP, 4.67%) and a high ratio of prediction to deviation (RPD, 3.75). In conclusion, the degree of adulteration in civet coffee has been quantified successfully using NIR spectroscopy and FS-PLS regression in a non-destructive, economical, precise, and highly sensitive method with very simple sample preparation.
Herrero, A M; de la Hoz, L; Ordóñez, J A; Herranz, B; Romero de Ávila, M D; Cambero, M I
2008-11-01
The possibilities of using breaking strength (BS) and energy to fracture (EF) for monitoring textural properties of some cooked meat sausages (chopped, mortadella and galantines) were studied. Texture profile analysis (TPA), folding test and physico-chemical measurements were also performed. Principal component analysis enabled these meat products to be grouped into three textural profiles which showed significant (p<0.05) differences mainly for BS, hardness, adhesiveness and cohesiveness. Multivariate analysis indicated that BS, EF and TPA parameters were correlated (p<0.05) for every individual meat product (chopped, mortadella and galantines) and all products together. On the basis of these results, TPA parameters could be used to construct regression models to predict BS. The resulting regression model for all cooked meat products was BS = -0.160 + 6.600×cohesiveness - 1.255×adhesiveness + 0.048×hardness - 506.31×springiness (R²=0.745, p<0.00005). Simple linear regression analysis showed significant coefficients of determination for BS versus folding test grade (FG) (R²=0.586, p<0.0001) and for EF versus FG (R²=0.564, p<0.0001).
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case
NASA Astrophysics Data System (ADS)
Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann
2017-04-01
Short-term ocean analyses for sea surface temperature (SST) in the Mediterranean Sea can be improved by a statistical post-processing technique called super-ensemble. This technique consists of a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable of preventing overfitting problems, although the best performances are achieved when correlation is added to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our outcomes show that super-ensemble performance depends on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: a 15-day training period, an overconfident MMSE dataset (a subset with the higher-quality ensemble members), and the least squares algorithm filtered a posteriori.
Hosking, Diane E; Nettelbeck, Ted; Wilson, Carlene; Danthiir, Vanessa
2014-07-28
Dietary intake is a modifiable exposure that may have an impact on cognitive outcomes in older age. The long-term aetiology of cognitive decline and dementia, however, suggests that the relevance of dietary intake extends across the lifetime. In the present study, we tested whether retrospective dietary patterns from the life periods of childhood, early adulthood, adulthood and middle age predicted cognitive performance in a cognitively healthy sample of 352 older Australian adults >65 years. Participants completed the Lifetime Diet Questionnaire and a battery of cognitive tests designed to comprehensively assess multiple cognitive domains. In separate regression models, lifetime dietary patterns were the predictors of cognitive factor scores representing ten constructs derived by confirmatory factor analysis of the cognitive test battery. All regression models were progressively adjusted for the potential confounders of current diet, age, sex, years of education, English as native language, smoking history, income level, apoE ɛ4 status, physical activity, other past dietary patterns and health-related variables. In the adjusted models, lifetime dietary patterns predicted cognitive performance in this sample of older adults. In models additionally adjusted for intake from the other life periods and mechanistic health-related variables, dietary patterns from the childhood period alone reached significance. Higher consumption of the 'coffee and high-sugar, high-fat extras' pattern predicted poorer performance on simple/choice reaction time, working memory, retrieval fluency, short-term memory and reasoning. The 'vegetable and non-processed' pattern negatively predicted simple/choice reaction time, and the 'traditional Australian' pattern positively predicted perceptual speed and retrieval fluency. Identifying early-life dietary antecedents of older-age cognitive performance contributes to formulating strategies for delaying or preventing cognitive decline.
NASA Astrophysics Data System (ADS)
Huntington, T. G.; Shanley, J. B.
2015-12-01
Relations between constituent concentrations and stream discharge (C/Q relations) are fundamental to the estimation of fluxes or loads in biogeochemical studies. C/Q relations are useful for understanding nutrient, trace element, and contaminant behavior in response to storm- and snowmelt-related changes in discharge. The shape and seasonal variation of C/Q relations provide information about the availability, mobilization, and release of solutes to streams. The properties of C/Q relations can provide clues about flowpaths, antecedent moisture conditions, and solute availability. Changes in C/Q relations over time for certain constituents, like dissolved organic carbon (DOC), may be indicative of changes in supply that may have resulted from changes in climate, vegetation, or land use and land cover. The focus of this presentation is a simple method for detecting change in C/Q relations using the LOADEST regression model. The LOADEST model fits a seasonally variable C/Q relation to discrete water quality data. For a continuously gauged stream or river, a relatively long record of C/Q data can be partitioned into distinct periods and regression models can be determined for each period. By running each model with the same discharge record and subsequently plotting each flux time series, differences between models can be visualized graphically. Plotting differences between periods (models) illustrates at what times of year the differences are largest. Running each model with a range of discharges for each day of the year provides additional insight into whether the changes in C/Q relations are evident at all levels of discharge or only at specific levels. The DOC record (1991 to 2014) from a research watershed at Sleepers River in Vermont was used in this analysis. The analysis showed that there have been increases in DOC concentration for certain seasons and rates of discharge.
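A simplified, LOADEST-style illustration of the change-detection method described here, assuming a basic five-parameter form (intercept, ln Q, time trend, annual sine/cosine) rather than LOADEST's full model; fitting it to two sub-periods and running both fits on a common discharge record exposes shifts in the C/Q relation.

```python
# Simplified LOADEST-style rating curve: regress ln(C) on ln(Q), a time
# trend, and annual seasonal terms; fit two sub-periods and difference
# the predictions on a common discharge record. Data are synthetic.
import numpy as np

rng = np.random.default_rng(7)
t = np.linspace(0, 10, 1200)                        # decimal years of record
lnQ = rng.normal(size=t.size)                       # log discharge (synthetic)
lnC = (0.4 * lnQ + 0.02 * t + 0.3 * np.sin(2 * np.pi * t)
       + 0.1 * np.cos(2 * np.pi * t) + rng.normal(0, 0.2, t.size))

def design(t, lnQ):
    return np.column_stack([np.ones_like(t), lnQ, t,
                            np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])

early, late = t < 5, t >= 5                         # two periods of the record
beta_early, *_ = np.linalg.lstsq(design(t[early], lnQ[early]), lnC[early], rcond=None)
beta_late, *_ = np.linalg.lstsq(design(t[late], lnQ[late]), lnC[late], rcond=None)

# Running both period models on a common discharge record; the difference
# in predicted ln(C) shows when the C/Q relation has shifted.
shift = design(t, lnQ) @ (beta_late - beta_early)
print("Mean predicted ln(C) change between periods:", shift.mean())
```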
Power and sample size for multivariate logistic modeling of unmatched case-control studies.
Gail, Mitchell H; Haneuse, Sebastien
2017-01-01
Sample size calculations are needed to design and assess the feasibility of case-control studies. Although such calculations are readily available for simple case-control designs and univariate analyses, there is limited theory and software for multivariate unconditional logistic analysis of case-control data. Here we outline the theory needed to detect scalar exposure effects or scalar interactions while controlling for other covariates in logistic regression. Both analytical and simulation methods are presented, together with links to the corresponding software.
Moralidis, Efstratios; Spyridonidis, Tryfon; Arsos, Georgios; Skeberis, Vassilios; Anagnostopoulos, Constantinos; Gavrielidis, Stavros
2010-01-01
This study aimed to determine systolic dysfunction and estimate resting left ventricular ejection fraction (LVEF) from information collected during routine evaluation of patients with suspected or known coronary heart disease. This approach was then compared to gated single photon emission tomography (SPET). Patients having undergone stress (201)Tl myocardial perfusion imaging followed by equilibrium radionuclide angiography (ERNA) were separated into derivation (n=954) and validation (n=309) groups. Logistic regression analysis was used to develop scoring systems, containing clinical, electrocardiographic (ECG) and scintigraphic data, for the discrimination of an ERNA-LVEF<0.50. Linear regression analysis provided equations predicting ERNA-LVEF from those scores. In 373 patients LVEF was also assessed with (201)Tl gated SPET. Our results showed that an ECG-scintigraphic scoring system was the best simple predictor of an ERNA-LVEF<0.50 in comparison to other models including ECG, clinical and scintigraphic variables in both the derivation and validation subpopulations. A simple linear equation was also derived for the assessment of resting LVEF from the ECG-scintigraphic model. Equilibrium radionuclide angiography LVEF had a good correlation with the ECG-scintigraphic model LVEF (r=0.716, P=0.000), (201)Tl gated SPET LVEF (r=0.711, P=0.000) and the average LVEF from those assessments (r=0.796, P=0.000). The Bland-Altman statistic (mean±2SD) provided values of 0.001±0.176, 0.071±0.196 and 0.040±0.152, respectively. The average LVEF was a better discriminator of systolic dysfunction than gated SPET-LVEF in receiver operating characteristic (ROC) analysis and identified more patients (89%) with a ≤10% difference from ERNA-LVEF than gated SPET (65%, P=0.000). In conclusion, resting left ventricular systolic dysfunction can be determined effectively from simple resting ECG and stress myocardial perfusion imaging variables. This model provides reliable LVEF estimations, comparable to those from (201)Tl gated SPET, and can enhance the clinical performance of the latter.
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data-driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression, resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of the models are in accordance with existing medical understanding of pediatric readmission. The best performing models have similar performance, reaching AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of the Lasso model is 0.35 bits higher compared to the Tree-Lasso model. We propose a method for building predictive models applicable for the detection of readmission risk based on electronic health records. Integration of domain knowledge (in the form of the ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso logistic regression) resulted in an increase of interpretability of the resulting model. The models are interpreted for the readmission prediction problem in the general pediatric population in California, as well as several important subpopulations, and the interpretations of the models comply with existing medical understanding of pediatric readmission. Finally, a quantitative assessment of the interpretability of the models is given, that goes beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.
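Tree-Lasso's hierarchy-aware penalty is not available in standard libraries, so the sketch below shows only the traditional Lasso logistic baseline it was compared against, on simulated sparse diagnosis-indicator data; the sparsity of the fitted coefficients is what keeps the model short enough to interpret.

```python
# Traditional Lasso (L1) logistic regression on sparse diagnosis
# indicators, the baseline discussed above. Data are simulated; the
# Tree-Lasso group penalty over the ICD hierarchy is not reproduced.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
X = sparse_random(5000, 300, density=0.02, random_state=8, format="csr")  # code indicators
w = np.zeros(300); w[:10] = rng.normal(0, 2, 10)   # only 10 truly predictive codes
p = 1 / (1 + np.exp(-(X @ w - 1.0)))
y = (rng.random(5000) < p).astype(int)             # readmission outcome

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("Nonzero coefficients:", np.count_nonzero(clf.coef_))  # short, interpretable list
```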
Solar Modulation of Inner Trapped Belt Radiation Flux as a Function of Atmospheric Density
NASA Technical Reports Server (NTRS)
Lodhi, M. A. K.
2005-01-01
No simple algorithm seems to exist for calculating proton fluxes and lifetimes in the Earth's inner, trapped radiation belt throughout the solar cycle. Most models of the inner trapped belt in use depend upon AP8 which only describes the radiation environment at solar maximum and solar minimum in Cycle 20. One exception is NOAAPRO which incorporates flight data from the TIROS/NOAA polar orbiting spacecraft. The present study discloses yet another, simple formulation for approximating proton fluxes at any time in a given solar cycle, in particular between solar maximum and solar minimum. It is derived from AP8 using a regression algorithm technique from nuclear physics. From flux and its time integral fluence, one can then approximate dose rate and its time integral dose.
Brügemann, K; Gernand, E; von Borstel, U U; König, S
2011-08-01
Data used in the present study included 1,095,980 first-lactation test-day records for protein yield of 154,880 Holstein cows housed on 196 large-scale dairy farms in Germany. Data were recorded between 2002 and 2009 and merged with meteorological data from public weather stations. The maximum distance between each farm and its corresponding weather station was 50 km. Hourly temperature-humidity indexes (THI) were calculated using the mean of hourly measurements of dry bulb temperature and relative humidity. On the phenotypic scale, an increase in THI was generally associated with a decrease in daily protein yield. For genetic analyses, a random regression model was applied using time-dependent (d in milk, DIM) and THI-dependent covariates. Additive genetic and permanent environmental effects were fitted with this random regression model and Legendre polynomials of order 3 for DIM and THI. In addition, the fixed curve was modeled with Legendre polynomials of order 3. Heterogeneous residuals were fitted by dividing DIM into 5 classes, and by dividing THI into 4 classes, resulting in 20 different classes. Additive genetic variances for daily protein yield decreased with increasing degrees of heat stress and were lowest at the beginning of lactation and at extreme THI. Due to higher additive genetic variances, slightly higher permanent environment variances, and similar residual variances, heritabilities were highest for low THI in combination with DIM at the end of lactation. Genetic correlations among individual values for THI were generally >0.90. These trends from the complex random regression model were verified by applying relatively simple bivariate animal models for protein yield measured in 2 THI environments; that is, defining a THI value of 60 as a threshold. These high correlations indicate the absence of any substantial genotype × environment interaction for protein yield. However, heritabilities and additive genetic variances from the random regression model tended to be slightly higher in the THI range corresponding to cows' comfort zone. Selecting such superior environments for progeny testing can contribute to an accurate genetic differentiation among selection candidates. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Yulia, Meinilwita
2017-01-01
Asian palm civet coffee, or kopi luwak (Indonesian for coffee and palm civet), is well known as the world's priciest and rarest coffee. To protect the authenticity of luwak coffee and protect consumers from adulteration, it is very important to develop a robust and simple method for determining the adulteration of luwak coffee. In this research, the use of UV-Visible spectra combined with PLSR was evaluated to establish rapid and simple methods for quantification of adulteration in luwak-arabica coffee blends. Several preprocessing methods were tested, and the results show that most of the preprocessed spectra were effective in improving the quality of the calibration models, with the best PLS calibration model obtained for Savitzky-Golay smoothed spectra, which had the lowest RMSECV (0.039) and highest RPDcal value (4.64). Using this PLS model, a prediction for quantification of luwak content was calculated and resulted in satisfactory prediction performance with both high RPDp and RER values. PMID:28913348
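A minimal sketch of the winning preprocessing step, assuming invented spectra and illustrative Savitzky-Golay settings (window length and polynomial order are not the study's):

```python
# Savitzky-Golay smoothing applied to UV-Visible spectra before PLS
# calibration, mirroring the best pipeline above. Spectra are synthetic:
# a luwak-dependent absorption band plus noise.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(9)
luwak_pct = rng.uniform(0, 100, 100)             # luwak content of each blend
wl = np.arange(301)                              # wavelength index (synthetic grid)
peak = np.exp(-((wl - 150) ** 2) / 200.0)        # band assumed tied to luwak content
spectra = rng.normal(0, 0.05, (100, 301)) + luwak_pct[:, None] * 0.01 * peak

smoothed = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)
pls = PLSRegression(n_components=6).fit(smoothed, luwak_pct)
print("Calibration R^2:", pls.score(smoothed, luwak_pct))
```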
Zhai, Hong Lin; Zhai, Yue Yuan; Li, Pei Zhen; Tian, Yue Li
2013-01-21
A very simple approach to quantitative analysis is proposed based on the technology of digital image processing using three-dimensional (3D) spectra obtained by high-performance liquid chromatography coupled with a diode array detector (HPLC-DAD). As region-based shape features of a grayscale image, Zernike moments with inherent invariance properties were employed to establish the linear quantitative models. This approach was applied to the quantitative analysis of three compounds in mixed samples using 3D HPLC-DAD spectra, and three linear models were obtained, respectively. The correlation coefficients (R²) for training and test sets were more than 0.999, and the statistical parameters and strict validation supported the reliability of the established models. The analytical results suggest that the Zernike moments selected by stepwise regression can be used in the quantitative analysis of target compounds. Our study provides a new idea for quantitative analysis using 3D spectra, which can be extended to the analysis of other 3D spectra obtained by different methods or instruments.
Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P
2014-06-26
To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
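Both relative-risk models compared above can be fitted with statsmodels: a Poisson GLM with a log link and sandwich ("robust") standard errors, and a binomial GLM with a log link. The data below are simulated with a true risk ratio of 2; in both models exp(coef) estimates the risk ratio.

```python
# Robust Poisson vs. log-binomial regression for a common binary outcome.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.integers(0, 2, 2000)                     # binary exposure
p = np.where(x == 1, 0.30, 0.15)                 # true risk ratio = 2.0
y = (rng.random(2000) < p).astype(float)         # common binary outcome
X = sm.add_constant(x.astype(float))

# "Robust" (modified) Poisson: log link with sandwich standard errors.
robust_poisson = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
# Log-binomial: binomial family with a log link (can fail to converge
# on some datasets, as the abstract notes).
log_binomial = sm.GLM(y, X, family=sm.families.Binomial(
    link=sm.families.links.Log())).fit()

print("Robust Poisson RR:", np.exp(robust_poisson.params[1]))
print("Log-binomial RR:  ", np.exp(log_binomial.params[1]))
```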
Assessment of ecologic regression in the study of lung cancer and indoor radon.
Stidley, C A; Samet, J M
1994-02-01
Ecologic regression studies conducted to assess the cancer risk of indoor radon to the general population are subject to methodological limitations, and they have given seemingly contradictory results. The authors use simulations to examine the effects of two major methodological problems that affect these studies: measurement error and misspecification of the risk model. In a simulation study of the effect of measurement error caused by the sampling process used to estimate radon exposure for a geographic unit, both the effect of radon and the standard error of the effect estimate were underestimated, with greater bias for smaller sample sizes. In another simulation study, which addressed the consequences of uncontrolled confounding by cigarette smoking, even small negative correlations between county geometric mean annual radon exposure and the proportion of smokers resulted in negative average estimates of the radon effect. A third study considered consequences of using simple linear ecologic models when the true underlying model relation between lung cancer and radon exposure is nonlinear. These examples quantify potential biases and demonstrate the limitations of estimating risks from ecologic studies of lung cancer and indoor radon.
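The attenuation from sampling-based exposure error in the first simulation can be illustrated in a few lines; everything below is invented for demonstration and is not the authors' simulation design:

import numpy as np

rng = np.random.default_rng(3)
n_counties, beta = 200, 0.5
true_radon = rng.lognormal(0.0, 0.5, n_counties)        # county mean exposure
rate = 1.0 + beta * true_radon + rng.normal(0, 0.3, n_counties)

for n_homes in (5, 25, 100):
    # County exposure estimated as the mean of n_homes noisy home measurements.
    measured = true_radon + rng.normal(0, 1.0, (n_homes, n_counties)).mean(axis=0)
    slope = np.polyfit(measured, rate, 1)[0]
    print(f"n_homes={n_homes:3d}: estimated effect = {slope:.3f} (true {beta})")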
Development of statistical linear regression model for metals from transportation land uses.
Maniquiz, Marla C; Lee, Soyoung; Lee, Eunju; Kim, Lee-Hyung
2009-01-01
Transportation land uses with impervious surfaces, such as highways, parking lots, roads, and bridges, are recognized as highly polluted non-point sources (NPSs) in urban areas. Pollutants from urban transportation accumulate on paved surfaces during dry periods and are washed off during storms. In Korea, the identification and monitoring of NPSs still represent a great challenge. Since 2004, the Ministry of Environment (MOE) has supported several research and monitoring efforts to develop stormwater management policies and treatment systems for future implementation. Data from 131 storm events at eleven sites between May 2004 and September 2008 were analyzed to identify correlations between particulates and metals, and to develop simple linear regression (SLR) models to estimate event mean concentrations (EMCs). Results indicate that there was no significant relationship between metal and TSS EMCs. Although the SLR estimation models did not provide useful estimates, they are valuable indicators of the high uncertainty inherent in NPS pollution. Therefore, long-term monitoring employing proper methods and careful statistical analysis of the data should be undertaken to reduce these uncertainties.
Determination of riverbank erosion probability using Locally Weighted Logistic Regression
NASA Astrophysics Data System (ADS)
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
2015-04-01
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of vulnerable locations. An alternative to the usual hydrodynamic models for predicting vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion, so that erosion can be described by a regression model whose independent variables are considered to affect the erosion process. Because the impact of such variables may vary spatially, a non-stationary regression model is preferred over a stationary equivalent, and Locally Weighted Regression (LWR) is proposed as a suitable choice. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The method can be extended to predict the binary presence or absence of erosion by combining it with logistic regression, which models the probabilities of the possible outcomes as a logistic function of the independent variables and predicts success or failure of a binary variable (e.g. erosion presence or absence) for any value of the predictors. The combined approach is referred to as Locally Weighted Logistic Regression (LWLR): LWR assigns spatial weights to the local independent variables entering the logistic fit. The erosion occurrence probability can be calculated in conjunction with the model deviance for the independent variables tested. The most straightforward measure of goodness of fit is the G statistic, a simple and effective way to evaluate the efficiency of the logistic regression model and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two types of spatial dependence function, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied, as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist in managing erosion and flooding events; a minimal sketch of the approach appears after the acknowledgements below. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers).
The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
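A rough sketch, not the authors' implementation, of the LWLR idea: at each prediction site a logistic model is fitted with tricubic weights that decay with distance to the site (statsmodels assumed; all data are stand-ins):

import numpy as np
import statsmodels.api as sm

def tricube(d, dmax):
    # Tricubic spatial dependence function: weight 1 at the site, 0 beyond dmax.
    u = np.clip(d / dmax, 0.0, 1.0)
    return (1.0 - u**3) ** 3

def lwlr_probability(coords, X, y, site, x_site, dmax=0.8):
    # Weighted logistic fit centred on the prediction site; a small floor on
    # the weights keeps the fit numerically stable.
    w = np.maximum(tricube(np.linalg.norm(coords - site, axis=1), dmax), 1e-6)
    fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial(),
                 var_weights=w).fit()
    return fit.predict(np.r_[1.0, x_site].reshape(1, -1))[0]

rng = np.random.default_rng(4)
coords = rng.random((40, 2))   # stand-in site coordinates
X = rng.random((40, 2))        # stand-ins for [bank slope, cross-section width]
y = rng.binomial(1, 0.5, 40)   # stand-in erosion presence/absence
print(lwlr_probability(coords, X, y, site=np.array([0.5, 0.5]),
                       x_site=np.array([0.3, 0.7])))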
Wiley, J.B.; Atkins, John T.; Tasker, Gary D.
2000-01-01
Multiple and simple least-squares regression models for the log10-transformed 100-year discharge with independent variables describing the basin characteristics (log10-transformed and untransformed) for 267 streamflow-gaging stations were evaluated, and the regression residuals were plotted as areal distributions that defined three regions of the State, designated East, North, and South. Exploratory data analysis procedures identified 31 gaging stations at which discharges are different than would be expected for West Virginia. Regional equations for the 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year peak discharges were determined by generalized least-squares regression using data from 236 gaging stations. Log10-transformed drainage area was the most significant independent variable for all regions. Equations developed in this study are applicable only to rural, unregulated streams within the boundaries of West Virginia. The accuracy of the estimating equations is quantified by the average prediction error (from 27.7 to 44.7 percent) and equivalent years of record (from 1.6 to 20.0 years).
Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.
Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J
2015-02-01
The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR to several data mining algorithms for predicting 30-day surgical morbidity in children. We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) an LR model that assumed linearity and additivity (simple LR model), (2) an LR model incorporating restricted cubic splines and interactions (flexible LR model), (3) a support vector machine, (4) a random forest, and (5) boosted classification trees for predicting surgical morbidity. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.
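The comparison can be sketched with scikit-learn on synthetic stand-in data (not the NSQIP-Pediatric dataset); AUC stands in for the fuller set of measures used in the study:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced outcome, as surgical morbidity typically is.
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.9],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("simple LR", LogisticRegression(max_iter=1000)),
                  ("random forest", RandomForestClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")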
NASA Technical Reports Server (NTRS)
Carder, K. L.; Lee, Z. P.; Marra, John; Steward, R. G.; Perry, M. J.
1995-01-01
The quantum yield of photosynthesis (mol C/mol photons) was calculated at six depths for the waters of the Marine Light-Mixed Layer (MLML) cruise of May 1991. As photosynthetically available radiation (PAR) was measured for the primary production incubations but spectral irradiance was not, three ways of calculating the photons absorbed (AP) by phytoplankton are presented here for the purpose of calculating phi. The first is based on a simple, nonspectral model; the second is based on a nonlinear regression using measured PAR values with depth; and the third is derived through remote sensing measurements. We show that the results of phi calculated using the nonlinear regression method and those using remote sensing are in good agreement with each other, and are consistent with the reported values of other studies. In deep waters, however, the simple nonspectral model may produce quantum yield values much higher than is theoretically possible.
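The nonlinear-regression idea can be sketched by fitting an exponential decay of PAR with depth (SciPy assumed; the profile and noise below are invented, not the MLML measurements):

import numpy as np
from scipy.optimize import curve_fit

def par_profile(z, par0, kd):
    # One-parameter attenuation: PAR decays exponentially with depth z.
    return par0 * np.exp(-kd * z)

depth = np.array([0.0, 5.0, 10.0, 20.0, 30.0, 40.0])   # metres (illustrative)
noise = 1 + 0.05 * np.random.default_rng(5).normal(size=depth.size)
par = 1500 * np.exp(-0.08 * depth) * noise

(par0_hat, kd_hat), _ = curve_fit(par_profile, depth, par, p0=(1000.0, 0.1))
print(f"fitted PAR(0) = {par0_hat:.0f}, Kd = {kd_hat:.3f} per m")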
Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki
2018-04-01
Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device nonwear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created pseudo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance weighting algorithms to account for missing data effects when fitting regression models. Bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple, easy-to-implement statistical tools be used to improve the analysis of accelerometer data.
Probabilistic forecasting for extreme NO2 pollution episodes.
Aznarte, José L
2017-10-01
In this study, we investigate the convenience of quantile regression for predicting extreme concentrations of NO2. In contrast to the usual point forecasting, where a single value is forecast for each horizon, probabilistic forecasting through quantile regression allows the prediction of the full probability distribution, which in turn allows building models specifically fit for the tails of this distribution. Using data from the city of Madrid, including NO2 concentrations as well as meteorological measures, we build models that predict extreme NO2 concentrations, outperforming point-forecasting alternatives, and we show that the predictions are accurate, reliable and sharp. Besides, we study the relative importance of the independent variables involved, and show how the important variables for the median quantile differ from those important for the upper quantiles. Furthermore, we present a method to compute the probability of exceedance of thresholds, which is a simple and comprehensible manner of presenting probabilistic forecasts that maximizes their usefulness. Copyright © 2017 Elsevier Ltd. All rights reserved.
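A minimal quantile-regression sketch with statsmodels, using synthetic stand-ins for the NO2 and meteorological series; the point is that different quantiles yield different models:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({"wind": rng.random(500) * 10,
                   "temp": rng.random(500) * 30})
df["no2"] = 80 - 5 * df["wind"] + rng.gamma(2, 10, 500)  # heavy right tail

# Median model versus models fitted specifically to the upper tail.
for q in (0.5, 0.9, 0.99):
    fit = smf.quantreg("no2 ~ wind + temp", df).fit(q=q)
    print(f"q = {q}: wind coefficient = {fit.params['wind']:.2f}")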
Regression to fuzziness method for estimation of remaining useful life in power plant components
NASA Astrophysics Data System (ADS)
Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.
2014-10-01
Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the estimated failure time of a component and the time that a measurement was taken. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value ranges. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then used synergistically with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electricity-generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data are not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against a data-based simple linear regression model used for predictions, which was shown to perform equally or worse than the presented methodology. Furthermore, the methodology comparison highlighted the improvement in estimation offered by the adoption of appropriate fuzzy sets for parameter representation.
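The regression step can be sketched as a linear extrapolation of a degradation parameter to an expert-defined failure point (all numbers invented; the fuzzy-set modeling of operator experience is omitted):

import numpy as np

t = np.arange(0.0, 10.0)                # inspection times (illustrative)
wear = 0.8 * t + np.random.default_rng(7).normal(0, 0.4, t.size)
failure_level = 15.0                    # expert-defined failure point

slope, intercept = np.polyfit(t, wear, 1)
t_fail = (failure_level - intercept) / slope   # time the fit hits failure
rul = t_fail - t[-1]                           # RUL from the last measurement
print(f"estimated failure at t = {t_fail:.1f}, RUL = {rul:.1f}")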
Likhvantseva, V G; Sokolov, V A; Levanova, O N; Kovelenova, I V
2018-01-01
Prediction of the clinical course of primary open-angle glaucoma (POAG) is one of the main directions in solving the problem of vision loss prevention and stabilization of the pathological process. Simple statistical methods of correlation analysis show the extent of each risk factor's impact, but do not indicate the total impact of these factors in personalized combinations. The relationships between the risk factors are therefore subject to correlation and regression analysis, with the regression equation representing the dependence of the mathematical expectation of the resulting sign on the combination of factor signs. The aim was to develop a technique for predicting the probability of development and progression of primary open-angle glaucoma based on a personalized combination of risk factors by linear multivariate regression analysis. The study included 66 patients (23 female and 43 male; 132 eyes) with newly diagnosed primary open-angle glaucoma. The control group consisted of 14 patients (8 male and 6 female). Standard ophthalmic examination was supplemented with a biochemical study of the lacrimal fluid. Concentrations of matrix metalloproteinases MMP-2 and MMP-9 in the tear fluid of both eyes were determined using a 'sandwich' enzyme-linked immunosorbent assay (ELISA). The study resulted in the development of regression equations and step-by-step multivariate logistic models that can help calculate the risk of development and progression of POAG. Those models are based on expert evaluation of clinical and instrumental indicators of hydrodynamic disturbances (coefficient of outflow ease - C, volume of intraocular fluid secretion - F, fluctuation of intraocular pressure), as well as personalized morphometric parameters of the retina (central retinal thickness in the macular area) and concentrations of MMP-2 and MMP-9 in the tear film. The newly developed regression equations are highly informative and can be a reliable tool for studying the influence vector and assessing the pathogenic potential of the independent risk factors in specific personalized combinations.
Bootstrap Methods: A Very Leisurely Look.
ERIC Educational Resources Information Center
Hinkle, Dennis E.; Winstead, Wayland H.
The Bootstrap method, a computer-intensive statistical method of estimation, is illustrated using a simple and efficient Statistical Analysis System (SAS) routine. The utility of the method for generating unknown parameters, including standard errors for simple statistics, regression coefficients, discriminant function coefficients, and factor…
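The idea translates to a few lines of Python (rather than the SAS routine the abstract describes): bootstrap the standard error of a simple regression slope by resampling cases with replacement; the data are invented:

import numpy as np

rng = np.random.default_rng(8)
x = rng.random(50)
y = 2.0 * x + rng.normal(0, 0.5, 50)

slopes = []
for _ in range(2000):
    idx = rng.integers(0, len(x), len(x))   # resample cases with replacement
    slopes.append(np.polyfit(x[idx], y[idx], 1)[0])

print(f"slope = {np.polyfit(x, y, 1)[0]:.3f}, "
      f"bootstrap SE = {np.std(slopes, ddof=1):.3f}")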
Accounting for groundwater in stream fish thermal habitat responses to climate change
Snyder, Craig D.; Hitt, Nathaniel P.; Young, John A.
2015-01-01
Forecasting climate change effects on aquatic fauna and their habitat requires an understanding of how water temperature responds to changing air temperature (i.e., thermal sensitivity). Previous efforts to forecast climate effects on brook trout habitat have generally assumed uniform air-water temperature relationships over large areas that cannot account for groundwater inputs and other processes that operate at finer spatial scales. We developed regression models that accounted for groundwater influences on thermal sensitivity from measured air-water temperature relationships within forested watersheds in eastern North America (Shenandoah National Park, USA, 78 sites in 9 watersheds). We used these reach-scale models to forecast climate change effects on stream temperature and brook trout thermal habitat, and compared our results to previous forecasts based upon large-scale models. Observed stream temperatures were generally less sensitive to air temperature than previously assumed, and we attribute this to the moderating effect of shallow groundwater inputs. Predicted groundwater temperatures from air-water regression models corresponded well to observed groundwater temperatures elsewhere in the study area. Predictions of brook trout future habitat loss derived from our fine-grained models were far less pessimistic than those from prior models developed at coarser spatial resolutions. However, our models also revealed spatial variation in thermal sensitivity within and among catchments resulting in a patchy distribution of thermally suitable habitat. Habitat fragmentation due to thermal barriers therefore may have an increasingly important role for trout population viability in headwater streams. Our results demonstrate that simple adjustments to air-water temperature regression models can provide a powerful and cost-effective approach for predicting future stream temperatures while accounting for effects of groundwater.
Albuquerque, F S; Peso-Aguiar, M C; Assunção-Albuquerque, M J T; Gálvez, L
2009-08-01
The length-weight relationship and condition factor have been broadly investigated in snails to obtain an index of the physical condition of populations and to evaluate habitat quality. Herein, our goal was to describe the best predictors that explain Achatina fulica biometrical parameters and well-being in a recently introduced population. From November 2001 to November 2002, monthly snail samples were collected in Lauro de Freitas City, Bahia, Brazil. Shell length and total weight were measured in the laboratory, and the potential curve and condition factor were calculated. Five environmental variables were considered: temperature range, mean temperature, humidity, precipitation and human density. Multiple regressions were used to generate models including multiple predictors, via a model-selection approach, and then ranked with AIC criteria. Partial regressions were used to obtain the separate coefficients of determination of the climate and human-density models. A total of 1,460 individuals were collected, with shell lengths ranging from 4.8 to 102.5 mm (mean: 42.18 mm). The relationship between total length and total weight revealed that Achatina fulica presented negative allometric growth. Simple regression indicated that humidity has a significant influence on A. fulica total length and weight. Temperature range was the main variable that influenced the condition factor. Multiple regressions showed that climatic and human variables explain a small proportion of the variance in shell length and total weight, but may explain up to 55.7% of the condition factor variance. Consequently, we believe that the well-being and biometric parameters of A. fulica can be influenced by climatic and human-density factors.
Estimation of water quality by UV/Vis spectrometry in the framework of treated wastewater reuse.
Carré, Erwan; Pérot, Jean; Jauzein, Vincent; Lin, Liming; Lopez-Ferber, Miguel
2017-07-01
The aim of this study is to investigate the potential of ultraviolet/visible (UV/Vis) spectrometry as a complementary method for routine monitoring of reclaimed water production. Robustness of the models and compliance of their sensitivity with current quality limits are investigated. The following indicators are studied: total suspended solids (TSS), turbidity, chemical oxygen demand (COD) and nitrate. Partial least squares regression (PLSR) is used to find linear correlations between absorbances and the indicators of interest. Artificial samples, made by simulating a sludge leak on the wastewater treatment plant, are added to the original dataset, which is then divided into calibration and prediction datasets. The models are built on the calibration set and tested on the prediction set. The best models are obtained with PLSR for COD (R²pred = 0.80), TSS (R²pred = 0.86) and turbidity (R²pred = 0.96), and with a simple linear regression on the absorbance at 208 nm (R²pred = 0.95) for nitrate concentration. The input of artificial data significantly enhances the robustness of the models. The sensitivity of the UV/Vis spectrometry monitoring system developed is compatible with the quality requirements of reclaimed water production processes.
Non-Linear Approach in Kinesiology Should Be Preferred to the Linear--A Case of Basketball.
Trninić, Marko; Jeličić, Mario; Papić, Vladan
2015-07-01
In kinesiology, medicine, biology and psychology, where research focuses on dynamical self-organized systems, complex connections exist between variables. The non-linear nature of complex systems has been discussed and explained using the example of non-linear anthropometric predictors of performance in basketball. Previous studies interpreted relations between anthropometric features and measures of effectiveness in basketball by (a) using linear correlation models, and by (b) including all basketball athletes in the same sample of participants regardless of their playing position. In this paper, the significance and character of linear and non-linear relations between simple anthropometric predictors (AP) and performance criteria consisting of situation-related measures of effectiveness (SE) in basketball were determined and evaluated. The sample of participants consisted of top-level junior basketball players divided into three groups according to their playing time (8 minutes or more per game) and playing position: guards (N = 42), forwards (N = 26) and centers (N = 40). Linear and non-linear regression models were calculated simultaneously and separately for each group. The conclusion is clear: non-linear regressions are frequently superior to linear correlations for interpreting the actual associations among research variables.
NASA Astrophysics Data System (ADS)
Baasch, Benjamin; Müller, Hendrik; von Dobeneck, Tilo; Oberle, Ferdinand K. J.
2017-05-01
The electric conductivity and magnetic susceptibility of sediments are fundamental parameters in environmental geophysics. Both can be derived from marine electromagnetic profiling, a novel, fast and non-invasive seafloor mapping technique. Here we present statistical evidence that electric conductivity and magnetic susceptibility can help to determine physical grain-size characteristics (size, sorting and mud content) of marine surficial sediments. Electromagnetic data acquired with the bottom-towed electromagnetic profiler MARUM NERIDIS III were analysed and compared with grain-size data from 33 samples across the NW Iberian continental shelf. A negative correlation between mean grain size and conductivity (R = -0.79) as well as mean grain size and susceptibility (R = -0.78) was found. Simple and multiple linear regression analyses were carried out to predict mean grain size, mud content and the standard deviation of the grain-size distribution from conductivity and susceptibility. The comparison of both methods showed that multiple linear regression models predict the grain-size distribution characteristics better than the simple models. This exemplary study demonstrates that electromagnetic benthic profiling is capable of estimating the mean grain size, sorting and mud content of marine surficial sediments at a very high significance level. Transfer functions can be calibrated using grain-size data from a few reference samples and extrapolated along shelf-wide survey lines. This study suggests that electromagnetic benthic profiling should play a larger role in coastal zone management, seafloor contamination and sediment provenance studies in continental shelf systems worldwide.
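The simple-versus-multiple comparison can be sketched with statsmodels on synthetic stand-ins for the conductivity, susceptibility and grain-size data (33 samples, echoing the study's sample count, but invented values):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
cond = rng.random(33)                   # stand-in electric conductivity
susc = rng.random(33)                   # stand-in magnetic susceptibility
grain = 5 - 2 * cond - 1.5 * susc + rng.normal(0, 0.3, 33)

simple = sm.OLS(grain, sm.add_constant(cond)).fit()
multiple = sm.OLS(grain, sm.add_constant(np.column_stack([cond, susc]))).fit()
print(f"simple R^2 = {simple.rsquared:.2f}, "
      f"multiple R^2 = {multiple.rsquared:.2f}")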
Svenning, J.-C.; Engelbrecht, B.M.J.; Kinner, D.A.; Kursar, T.A.; Stallard, R.F.; Wright, S.J.
2006-01-01
We used regression models and information-theoretic model selection to assess the relative importance of environment, local dispersal and historical contingency as controls of the distributions of 26 common plant species in tropical forest on Barro Colorado Island (BCI), Panama. We censused eighty-eight 0.09-ha plots scattered across the landscape. Environmental control, local dispersal and historical contingency were represented by environmental variables (soil moisture, slope, soil type, distance to shore, old-forest presence), a spatial autoregressive parameter (ρ), and four spatial trend variables, respectively. We built regression models, representing all combinations of the three hypotheses, for each species. The probability that the best model included the environmental variables, spatial trend variables and ρ averaged 33%, 64% and 50% across the study species, respectively. The environmental variables, spatial trend variables, ρ, and a simple intercept model received the strongest support for 4, 15, 5 and 2 species, respectively. Comparing the model results to information on species traits showed that species with strong spatial trends produced few and heavy diaspores, while species with strong soil moisture relationships were particularly drought-sensitive. In conclusion, history and local dispersal appeared to be the dominant controls of the distributions of common plant species on BCI. Copyright © 2006 Cambridge University Press.
Yavari, Reza; McEntee, Erin; McEntee, Michael; Brines, Michael
2011-01-01
The current world-wide epidemic of obesity has stimulated interest in developing simple screening methods to identify individuals with undiagnosed diabetes mellitus type 2 (DM2) or metabolic syndrome (MS). Prior work utilizing body composition obtained by sophisticated technology has shown that the ratio of abdominal fat to total fat is a good predictor for DM2 or MS. The goals of this study were to determine how well simple anthropometric variables predict the fat mass distribution as determined by dual energy x-ray absorptiometry (DXA), and whether these are useful to screen for DM2 or MS within a population. To accomplish this, the body composition of 341 females spanning a wide range of body mass indices and with a 23% prevalence of DM2 and MS was determined using DXA. Stepwise linear regression models incorporating age, weight, height, waistline, and hipline predicted DXA body composition (i.e., fat mass, trunk fat, fat free mass, and total mass) with good accuracy. Using body composition as independent variables, nominal logistic regression was then performed to estimate the probability of DM2. The results show good discrimination, with the receiver operating characteristic (ROC) curve having an area under the curve (AUC) of 0.78. The anthropometrically-derived body composition equations derived from the full DXA study group were then applied to a group of 1153 female patients selected from a general endocrinology practice. Similar to the smaller study group, the ROC from logistic regression using body composition had an AUC of 0.81 for the detection of DM2. These results are superior to screening based on questionnaires and compare favorably with published data derived from invasive testing, e.g., hemoglobin A1c. This anthropometric approach offers promise for the development of simple, inexpensive, non-invasive screening to identify individuals with metabolic dysfunction within large populations. PMID:21915276
Regression model for estimating inactivation of microbial aerosols by solar radiation.
Ben-David, Avishai; Sagripanti, Jose-Luis
2013-01-01
The inactivation of pathogenic aerosols by solar radiation is relevant to public health and biodefense. We investigated whether a relatively simple method to calculate solar diffuse and total irradiances could be developed and used in environmental photobiology estimations instead of complex atmospheric radiative transfer computer programs. The second-order regression model that we developed reproduced 13 radiation quantities calculated for equinoxes and solstices at 35° latitude with a computer-intensive and rather complex atmospheric radiative transfer program (MODTRAN), with a mean error <6% (2% for most radiation quantities). Extending the application of the regression model from a reference latitude and date (chosen as 35° latitude for 21 March) to different latitudes and days of the year was accomplished with variable success: usually with a mean error <15%, but as high as 150% for some combinations of latitude and day of year. The accuracy of the methodology proposed here compares favorably to photobiological experiments, where microbial survival is usually measured with an accuracy no better than ±0.5 log10 units. The approach and equations presented in this study should assist in estimating the maximum time during which microbial pathogens remain infectious after accidental or intentional aerosolization in open environments. © Published 2013. This article is a U.S. Government work and is in the public domain in the USA. Photochemistry and Photobiology © 2013 The American Society of Photobiology.
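A toy illustration of the surrogate idea: a second-order regression fitted to a computed radiation quantity, with a cosine curve standing in for MODTRAN output (not the study's regressors or coefficients):

import numpy as np

zenith = np.linspace(0, 80, 17)                 # solar zenith angle, degrees
irradiance = 1000 * np.cos(np.radians(zenith))  # stand-in "exact" values

coeffs = np.polyfit(zenith, irradiance, 2)      # second-order regression
approx = np.polyval(coeffs, zenith)
err = 100 * np.abs(approx - irradiance) / irradiance
print(f"mean error = {err.mean():.1f}%, max error = {err.max():.1f}%")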
NASA Astrophysics Data System (ADS)
Hoffman, A.; Forest, C. E.; Kemanian, A.
2016-12-01
A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to advance our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, while dust events (i.e. dust storms) affect yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information; our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into the impact of dust on yields in marginal food-producing regions.
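A hedged sketch of the statistical-crop-model comparison with scikit-learn: synthetic yields with a technology trend and a temperature threshold, which the random forest can absorb but a plain linear model cannot (all variables and coefficients invented):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
n = 600
year = rng.integers(1960, 2017, n).astype(float)
temp = rng.normal(25, 4, n)
dust = rng.gamma(2, 1, n)                         # stand-in dust loading
yields = (0.05 * (year - 1960)                    # technology trend
          - np.where(temp > 30, 0.4 * (temp - 30), 0.0)  # threshold damage
          - 0.1 * dust + rng.normal(0, 0.3, n))
X = np.column_stack([year, temp, dust])

for name, model in [("linear", LinearRegression()),
                    ("random forest", RandomForestRegressor(random_state=0))]:
    score = cross_val_score(model, X, yields, cv=5, scoring="r2").mean()
    print(f"{name}: CV R^2 = {score:.2f}")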
Explicit criteria for prioritization of cataract surgery
Ma Quintana, José; Escobar, Antonio; Bilbao, Amaia
2006-01-01
Background Consensus techniques have been used previously to create explicit criteria to prioritize cataract extraction; however, the appropriateness of the intervention was not included explicitly in previous studies. We developed a prioritization tool for cataract extraction according to the RAND method. Methods Criteria were developed using a modified Delphi panel judgment process. A panel of 11 ophthalmologists was assembled. Ratings were analyzed regarding the level of agreement among panelists. We studied the effect of all variables on the final panel score using general linear and logistic regression models. Priority scoring systems were developed by means of optimal scaling and general linear models. The explicit criteria developed were summarized by means of regression tree analysis. Results Eight variables were considered to create the indications. Of the 310 indications that the panel evaluated, 22.6% were considered high priority, 52.3% intermediate priority, and 25.2% low priority. Agreement was reached for 31.9% of the indications and disagreement for 0.3%. Logistic regression and general linear models showed that the preoperative visual acuity of the cataractous eye, visual function, and anticipated visual acuity postoperatively were the most influential variables. Alternative and simple scoring systems were obtained by optimal scaling and general linear models where the previous variables were also the most important. The decision tree also shows the importance of the previous variables and the appropriateness of the intervention. Conclusion Our results showed acceptable validity as an evaluation and management tool for prioritizing cataract extraction. It also provides easy algorithms for use in clinical practice. PMID:16512893
Remote sensing of PM2.5 from ground-based optical measurements
NASA Astrophysics Data System (ADS)
Li, S.; Joseph, E.; Min, Q.
2014-12-01
Remote sensing of particulate matter with aerodynamic diameter smaller than 2.5 μm (PM2.5) from ground-based optical measurements of aerosols is investigated using 6 years of hourly average measurements of aerosol optical properties, PM2.5, ceilometer backscatter coefficients and meteorological factors from the Howard University Beltsville Campus facility (HUBC). The accuracy of quantitative retrieval of PM2.5 from aerosol optical depth (AOD) is limited by changes in aerosol size distribution and vertical distribution. In this study, ceilometer backscatter coefficients are used to provide vertical information on the aerosol. The PM2.5-AOD ratio is found to vary widely for different aerosol vertical distributions. The ratio is also sensitive to the mode parameters of a bimodal lognormal aerosol size distribution when the geometric mean radius of the fine mode is small. Two Angstrom exponents calculated from measurements at 415, 500, and 860 nm represent aerosol size distributions better than a single Angstrom exponent. A regression model is proposed to assess the impacts of different factors on the retrieval of PM2.5. Compared to a simple linear regression model, the new model combining AOD and ceilometer backscatter markedly improves the fit of PM2.5, and further introducing the Angstrom coefficients contributes appreciably. Using combined measurements of AOD, ceilometer backscatter, Angstrom coefficients and meteorological parameters in the regression model yields a correlation coefficient of 0.79 between fitted and expected PM2.5.
Bounding species distribution models
Stohlgren, T.J.; Jarnevich, C.S.; Esaias, W.E.; Morisette, J.T.
2011-01-01
Species distribution models are increasing in popularity for mapping suitable habitat for species of management concern. Many investigators now recognize that extrapolations of these models with geographic information systems (GIS) might be sensitive to the environmental bounds of the data used in their development, yet there is no recommended best practice for "clamping" model extrapolations. We relied on two commonly used modeling approaches: classification and regression tree (CART) and maximum entropy (Maxent) models, and we tested a simple alteration of the model extrapolations, bounding extrapolations to the maximum and minimum values of primary environmental predictors, to provide a more realistic map of suitable habitat of hybridized Africanized honey bees in the southwestern United States. Findings suggest that multiple models of bounding, and the most conservative bounding of species distribution models, like those presented here, should probably replace the unbounded or loosely bounded techniques currently used. ?? 2011 Current Zoology.
A Simultaneous Equation Demand Model for Block Rates
NASA Astrophysics Data System (ADS)
Agthe, Donald E.; Billings, R. Bruce; Dobra, John L.; Raffiee, Kambiz
1986-01-01
This paper examines the problem of simultaneous-equations bias in estimation of the water demand function under an increasing block rate structure. The Hausman specification test is used to detect the presence of simultaneous-equations bias arising from correlation of the price measures with the regression error term in the results of a previously published study of water demand in Tucson, Arizona. An alternative simultaneous equation model is proposed for estimating the elasticity of demand in the presence of block rate pricing structures and availability of service charges. This model is used to reestimate the price and rate premium elasticities of demand in Tucson, Arizona for both the usual long-run static model and for a simple short-run demand model. The results from these simultaneous equation models are consistent with a priori expectations and are unbiased.
NASA Astrophysics Data System (ADS)
Molotch, N. P.; Painter, T. H.; Bales, R. C.; Dozier, J.
2003-04-01
In this study, an accumulated net radiation / accumulated degree-day index snowmelt model was coupled with remotely sensed snow covered area (SCA) data to simulate snow cover depletion and reconstruct maximum snow water equivalent (SWE) in the 19.1-km² Tokopah Basin of the Sierra Nevada, California. Simple net radiation snowmelt models are attractive for operational snowmelt runoff forecasts as they are computationally inexpensive and have low input requirements relative to physically based energy balance models. The objective of this research was to assess the accuracy of a simple net radiation snowmelt model in a topographically heterogeneous alpine environment. Previous applications of net radiation / temperature index snowmelt models have not been evaluated in alpine terrain with intensive field observations of SWE. Solar radiation data from two meteorological stations were distributed using the topographic radiation model TOPORAD. Relative humidity and temperature data were distributed based on the lapse rate calculated between three meteorological stations within the basin. Fractional SCA data from the Landsat Enhanced Thematic Mapper (5 acquisitions) and the Airborne Visible and Infrared Imaging Spectrometer (AVIRIS) (2 acquisitions) were used to derive daily SCA using a linear regression between acquisition dates. Grain size data from AVIRIS (4 acquisitions) were used to infer snow surface albedo and interpolated linearly with time to derive daily albedo values. Modeled daily snowmelt rates for each 30-m pixel were scaled by the SCA and integrated over the snowmelt season to obtain estimates of maximum SWE accumulation. Snow surveys consisting of an average of 335 depth measurements and 53 density measurements during April, May and June 1997 were interpolated using a regression-tree/co-kriging model, with independent variables of average incoming solar radiation, elevation, slope and maximum upwind slope. The basin was clustered into 7 elevation / average-solar-radiation zones for SWE accuracy assessment. Model simulations did a poor job of estimating the spatial distribution of SWE. Basin clusters where the solar radiative flux dominated the melt flux were simulated more accurately than those dominated by the turbulent fluxes or the longwave radiative flux.
A Simple Introduction to Moving Least Squares and Local Regression Estimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garimella, Rao Veerabhadra
In this brief note, a highly simplified introduction to estimating functions over a set of particles is presented. The note starts from Global Least Squares fitting, going on to Moving Least Squares estimation (MLS) and, finally, Local Regression Estimation (LRE).
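A minimal moving-least-squares sketch under simple assumptions: at each evaluation point a local linear polynomial is fitted by weighted least squares with a Gaussian weight centred on that point (one-dimensional, invented data):

import numpy as np

def mls_estimate(x_eval, x_pts, f_pts, h=0.15):
    # Square roots of the Gaussian weights, so scaling rows of the design
    # matrix reproduces weighted least squares with weights w.
    sw = np.sqrt(np.exp(-((x_pts - x_eval) / h) ** 2))
    A = np.column_stack([np.ones_like(x_pts), x_pts - x_eval])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], f_pts * sw, rcond=None)
    return coef[0]          # the local polynomial evaluated at x_eval

x_pts = np.random.default_rng(11).random(200)   # scattered particle positions
f_pts = np.sin(2 * np.pi * x_pts)               # sampled function values
print(mls_estimate(0.25, x_pts, f_pts))         # close to sin(pi/2) = 1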
Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman
2011-01-01
This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs when compared to time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate. In addition, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, this is the first work to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626
Economic decision making and the application of nonparametric prediction models
Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.
2007-01-01
Sustained increases in energy prices have focused attention on gas resources in low permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are large. Planning and development decisions for extraction of such resources must be area-wide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm the decision to enter such plays depends on reconnaissance level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional scale cost functions. The context of the worked example is the Devonian Antrim shale gas play, Michigan Basin. One finding relates to selection of the resource prediction model to be used with economic models. Models which can best predict aggregate volume over larger areas (many hundreds of sites) may lose granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined by extraneous factors. The paper also shows that when these simple prediction models are used to strategically order drilling prospects, the gain in gas volume over volumes associated with simple random site selection amounts to 15 to 20 percent. It also discusses why the observed benefit of updating predictions from results of new drilling, as opposed to following static predictions, is somewhat smaller. Copyright 2007, Society of Petroleum Engineers.
Esteki, M; Nouroozi, S; Shahsavari, Z
2016-02-01
To develop a simple and efficient spectrophotometric technique combined with chemometrics for the simultaneous determination of methyl paraben (MP) and hydroquinone (HQ) in cosmetic products, and specifically, to: (i) evaluate the potential use of the successive projections algorithm (SPA) on derivative spectrophotometric data in order to provide sufficient accuracy and model robustness and (ii) determine MP and HQ concentrations in cosmetics without tedious pre-treatments such as derivatization or extraction, which are time-consuming and require hazardous solvents. The absorption spectra were measured in the wavelength range of 200-350 nm. Prior to building the chemometric models, the original and first-derivative absorption spectra of binary mixtures were used as calibration matrices. Variables selected by the successive projections algorithm were used to obtain multiple linear regression (MLR) models based on a small subset of wavelengths. The number of wavelengths and the starting vector were optimized, and comparison of the root mean square errors of calibration (RMSEC) and cross-validation (RMSECV) was applied to select effective wavelengths with the least collinearity and redundancy. Principal component regression (PCR) and partial least squares (PLS) models were also developed for comparison. The concentrations of the calibration matrix ranged from 0.1 to 20 μg mL⁻¹ for MP, and from 0.1 to 25 μg mL⁻¹ for HQ. The constructed models were tested on an external validation data set and finally on cosmetic samples. The results indicated that successive projections algorithm-multiple linear regression (SPA-MLR), applied to the first-derivative spectra, achieved the optimal performance for both compounds when compared with the full-spectrum PCR and PLS. The root mean square error of prediction (RMSEP) was 0.083 and 0.314 for MP and HQ, respectively. To verify the accuracy of the proposed method, a recovery study on real cosmetic samples was carried out with satisfactory results (84-112%). The proposed method, which is an environmentally friendly approach using a minimum amount of solvent, is a simple, fast and low-cost analysis method that provides accurate and robust models without any complex, time-consuming extraction procedure. © 2015 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
NASA Astrophysics Data System (ADS)
Ke, Haohao; Ondov, John M.; Rogge, Wolfgang F.
2013-12-01
Composite chemical profiles of motor vehicle emissions were extracted from ambient measurements at a near-road site in Baltimore during a windless traffic episode in November, 2002, using four independent approaches, i.e., simple peak analysis, windless model-based linear regression, PMF, and UNMIX. Although the profiles are in general agreement, the windless-model-based profile treatment more effectively removes interference from non-traffic sources and is deemed to be more accurate for many species. In addition to abundances of routine pollutants (e.g., NOx, CO, PM2.5, EC, OC, sulfate, and nitrate), 11 particle-bound metals and 51 individual traffic-related organic compounds (including n-alkanes, PAHs, oxy-PAHs, hopanes, alkylcyclohexanes, and others) were included in the modeling.
Development of a simplified urban water balance model (WABILA).
Henrichs, M; Langner, J; Uhl, M
2016-01-01
During the last decade, water sensitive urban design (WSUD) has become increasingly accepted. However, no simple tool is available to evaluate the influence of WSUD measures on the local water balance. To counteract the impact of new settlements, planners focus on mitigating increases in runoff through the installation of infiltration systems. This leads to unnaturally increased groundwater recharge and decreased evapotranspiration, so simple software tools that evaluate or simulate the effect of WSUD on the local water balance are still needed. The authors developed a tool named WABILA (Wasserbilanz) that can support planners in optimizing WSUD. WABILA is an easy-to-use planning tool based on simplified regression functions for established measures and land covers. Results show that WSUD has to be site-specific, based on climate conditions and the natural water balance.
Additivity of nonlinear biomass equations
Bernard R. Parresol
2001-01-01
Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...
Modeling of Micro Deval abrasion loss based on some rock properties
NASA Astrophysics Data System (ADS)
Capik, Mehmet; Yilmaz, Ali Osman
2017-10-01
Aggregate is one of the most widely used construction materials, and its quality is determined using standard testing methods. Among these, the Micro Deval Abrasion Loss (MDAL) test is commonly used to determine the quality and abrasion resistance of aggregate. The main objective of this study is to develop models for the prediction of MDAL from rock properties, including uniaxial compressive strength, Brazilian tensile strength, point load index, Schmidt rebound hardness, apparent porosity, void ratio, Cerchar abrasivity index and Bohme abrasion loss. The MDAL is modeled using simple regression analysis and multiple linear regression analysis based on these rock properties. The study shows that the MDAL decreases with increasing uniaxial compressive strength, Brazilian tensile strength, point load index, Schmidt rebound hardness and Cerchar abrasivity index, and increases with increasing apparent porosity, void ratio and Bohme abrasion loss. The modeling results show that the models based on the Bohme abrasion test and L-type Schmidt rebound hardness give the best forecasting performance for the MDAL. Further models, including the uniaxial compressive strength, the apparent porosity and the Cerchar abrasivity index, are developed for rapid estimation of the MDAL of rocks. The developed models were verified by statistical tests and can be used for forecasting aggregate quality.
Satisfaction of active duty soldiers with family dental care.
Chisick, M C
1997-02-01
In the fall of 1992, a random, worldwide sample of 6,442 married and single parent soldiers completed a self-administered survey on satisfaction with 22 attributes of family dental care. Simple descriptive statistics for each attribute were derived, as was a composite overall satisfaction score using factor analysis. Composite scores were regressed on demographics, annual dental utilization, and access barriers to identify those factors having an impact on a soldier's overall satisfaction with family dental care. Separate regression models were constructed for single parents, childless couples, and couples with children. Results show below-average satisfaction with nearly all attributes of family dental care, with access attributes having the lowest average satisfaction scores. Factors influencing satisfaction with family dental care varied by family type with one exception: dependent dental utilization within the past year contributed positively to satisfaction across all family types.
Study of relationship between clinical factors and velopharyngeal closure in cleft palate patients
Chen, Qi; Zheng, Qian; Shi, Bing; Yin, Heng; Meng, Tian; Zheng, Guang-ning
2011-01-01
BACKGROUND: This study was carried out to analyze the relationship between clinical factors and velopharyngeal closure (VPC) in cleft palate patients. METHODS: The chi-square test was used to compare postoperative velopharyngeal closure rates. A logistic regression model was used to analyze independent variables associated with velopharyngeal closure. RESULTS: Differences in postoperative VPC rate across cleft types, operative ages and surgical techniques were significant (P=0.000). Logistic regression analysis suggested that patients whose operative age was beyond the deciduous dentition stage, whose cleft palate was complete, or who had undergone a simple palatoplasty without levator veli palatini retropositioning suffered a higher velopharyngeal insufficiency rate after primary palatal repair. CONCLUSIONS: Cleft type, operative age and surgical technique were the contributing factors influencing the VPC rate after primary palatal repair in cleft palate patients. PMID:22279464
Predictive modelling of flow in a two-dimensional intermediate-scale, heterogeneous porous media
Barth, Gilbert R.; Hill, M.C.; Illangasekare, T.H.; Rajaram, H.
2000-01-01
To better understand the role of sedimentary structures in flow through porous media, and to determine how small-scale laboratory-measured values of hydraulic conductivity relate to in situ values, this work deterministically examines flow through simple, artificial structures constructed for a series of intermediate-scale (10 m long), two-dimensional, heterogeneous, laboratory experiments. Nonlinear regression was used to determine optimal values of in situ hydraulic conductivity, which were compared to laboratory-measured values. Despite explicit numerical representation of the heterogeneity, the optimized values were generally greater than the laboratory-measured values. Discrepancies between measured and optimal values varied depending on the sand sieve size, but their contribution to error in the predicted flow was fairly consistent for all sands. Results indicate that, even under these controlled circumstances, laboratory-measured values of hydraulic conductivity need to be applied to models cautiously.
Schutten, J; Davy, A J
2000-06-01
Aquatic macrophytes are important in stabilising moderately eutrophic, shallow freshwater lakes in the clear-water state. The failure of macrophyte recovery in lakes with very soft, highly organic sediments that have been restored to clear water by biomanipulation (e.g. in the Norfolk Broads, UK) has suggested that the physical stability of the sediment may limit plant establishment. Hydraulic forces from water currents may be sufficient to break or remove plants. Our aim was to develop a simple model that could predict these forces from plant biomass, current velocity and plant form. We used an experimental flume to measure the hydraulic forces acting on shoots of 18 species of aquatic macrophyte of varying size and morphology. The hydraulic drag on the shoots was regressed on a theoretically derived predictor (shoot biomass × current velocity^1.5). Such linear regressions proved to be highly significant for most species. The slopes of these lines represent species-specific, hydraulic roughness factors that are analogous to classical drag coefficients. Shoot architecture parameters describing leaf and shoot shape had significant effects on the hydraulic roughness factor. Leaf width and shoot stiffness individually did not have a significant influence, but in combination with shoot shape they were significant. This hydraulic model was validated for a subset of species using measurements from an independent set of shoots. When measured and predicted hydraulic forces were compared, the fit was generally very good, except for two species with morphological variations. This simple model, together with the plant-specific factors, provides a basis for predicting the hydraulic forces acting on the root systems of macrophytes under field conditions. This information should allow prediction of the physical stability of individual plants, as an aid to shallow-lake management.
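A minimal sketch of the fit described above, with hypothetical numbers in place of the study's flume data: drag is regressed through the origin on the theoretical predictor biomass × velocity^1.5, and the fitted slope plays the role of the species-specific hydraulic roughness factor.

```python
import numpy as np

# Hypothetical flume measurements for one species (values illustrative only)
biomass = np.array([2.1, 4.0, 5.5, 7.2, 9.8])        # shoot dry mass, g
velocity = np.array([0.10, 0.15, 0.20, 0.25, 0.30])  # current velocity, m/s
drag = np.array([0.08, 0.21, 0.39, 0.63, 1.02])      # measured drag force, N

# Theoretically derived predictor: biomass * velocity^1.5
x = biomass * velocity**1.5

# Least-squares slope through the origin; the slope acts as the
# species-specific hydraulic roughness factor (analogous to a drag coefficient)
roughness = (x @ drag) / (x @ x)
predicted = roughness * x
r2 = 1 - np.sum((drag - predicted)**2) / np.sum((drag - drag.mean())**2)
print(f"roughness factor = {roughness:.3f}, R^2 = {r2:.3f}")
```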
Nieuwenhuijsen, Karen; Verbeek, Jos H A M; de Boer, Angela G E M; Blonk, Roland W B; van Dijk, Frank J H
2006-02-01
This study attempted to determine the factors that best predict the duration of absence from work among employees with common mental disorders. A cohort of 188 employees, of whom 102 were teachers, on sick leave with common mental disorders was followed for 1 year. Only information potentially available to the occupational physician during a first consultation was included in the predictive model. The predictive power of the variables was tested using Cox's regression analysis with a stepwise backward selection procedure. The hazard ratios (HR) from the final model were used to deduce a simple prediction rule. The resulting prognostic scores were then used to predict the probability of not returning to work after 3, 6, and 12 months. Calculating the area under the curve from the ROC (receiver operating characteristic) curve tested the discriminative ability of the prediction rule. The final Cox's regression model produced the following four predictors of a longer time until return to work: age older than 50 years [HR 0.5, 95% confidence interval (95% CI) 0.3-0.8], expectation of duration absence longer than 3 months (HR 0.5, 95% CI 0.3-0.8), higher educational level (HR 0.5, 95% CI 0.3-0.8), and diagnosis depression or anxiety disorder (HR 0.7, 95% CI 0.4-0.9). The resulting prognostic score yielded areas under the curves ranging from 0.68 to 0.73, which represent acceptable discrimination of the rule. A prediction rule based on four simple variables can be used by occupational physicians to identify unfavorable cases and to predict the duration of sickness absence.
Koneczny, Jarosław; Czekierdowski, Artur; Florczak, Marek; Poziemski, Paweł; Stachowicz, Norbert; Borowski, Dariusz
2017-01-01
Sonography-based methods combined with various tumor markers are currently used to discriminate the type of adnexal masses. The aim was to compare the predictive value of selected sonography-based models, along with subjective assessment, in ovarian cancer prediction. We analyzed data of 271 women operated on because of adnexal masses. All masses were verified by histological examination. Preoperative sonography was performed in all patients, and several predictive models, including the IOTA group logistic regression model LR1 (LR1), the IOTA simple ultrasound-based rules (SR), GI-RADS and the risk of malignancy index (RMI3), were used. ROC curves were constructed and the respective AUCs with 95% CIs were compared. Of 271 masses, 78 proved to be malignant, including 6 borderline tumors. LR1 had a sensitivity of 91.0%, specificity of 91.2%, AUC = 0.95 (95% CI: 0.92-0.98). Sensitivity of GI-RADS for 271 patients was 88.5%, with specificity of 85% and AUC = 0.91 (95% CI: 0.88-0.95). Subjective assessment yielded sensitivity and specificity of 85.9% and 96.9%, respectively, with AUC = 0.97 (95% CI: 0.94-0.99). SR were applicable in 236 masses and had a sensitivity of 90.6%, with specificity of 95.3% and AUC = 0.93 (95% CI: 0.89-0.97). RMI3 was calculated only in the 104 women who had CA125 available and had a sensitivity of 55.3%, specificity of 94% and AUC = 0.85 (95% CI: 0.77-0.93). Although subjective assessment by the ultrasound expert remains the best current method of preoperative discrimination of adnexal tumors, the simplicity and high predictive value of the IOTA SR method favor its primary use and, when it is not applicable, the IOTA LR1 or GI-RADS models.
A population-based study on the association between rheumatoid arthritis and voice problems.
Hah, J Hun; An, Soo-Youn; Sim, Songyong; Kim, So Young; Oh, Dong Jun; Park, Bumjung; Kim, Sung-Gyun; Choi, Hyo Geun
2016-07-01
The objective of this study was to investigate whether rheumatoid arthritis increases the frequency of organic laryngeal lesions and the subjective voice complaint rate in those with no organic laryngeal lesion. We performed a cross-sectional study using the data from 19,368 participants (418 rheumatoid arthritis patients and 18,950 controls) of the 2008-2011 Korea National Health and Nutrition Examination Survey. The associations between rheumatoid arthritis and organic laryngeal lesions/subjective voice complaints were analyzed using simple/multiple logistic regression analysis with complex sample adjusting for confounding factors, including age, sex, smoking status, stress level, and body mass index, which could provoke voice problems. Vocal nodules, vocal polyp, and vocal palsy were not associated with rheumatoid arthritis in a multiple regression analysis, and only laryngitis showed a positive association (adjusted odds ratio, 1.59; 95 % confidence interval, 1.01-2.52; P = 0.047). Rheumatoid arthritis was associated with subjective voice discomfort in a simple regression analysis, but not in a multiple regression analysis. Participants with rheumatoid arthritis were older, more often female, and had higher stress levels than those without rheumatoid arthritis. These factors were associated with subjective voice complaints in both simple and multiple regression analyses. Rheumatoid arthritis was not associated with organic laryngeal diseases except laryngitis. Rheumatoid arthritis did not increase the odds ratio for subjective voice complaints. Voice problems in participants with rheumatoid arthritis originated from the characteristics of the rheumatoid arthritis group (higher mean age, female sex, and stress level) rather than rheumatoid arthritis itself.
The Mantel-Haenszel Procedure Revisited: Models and Generalizations
Fidler, Vaclav; Nagelkerke, Nico
2013-01-01
Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented. PMID:23516463
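For reference, a minimal sketch of the classical Mantel-Haenszel common odds ratio that the paper generalizes (the multinomial-logistic extension itself is not reproduced here); the strata counts are hypothetical.

```python
import numpy as np

# Hypothetical 2x2 tables (a, b, c, d) for K discrete strata of Z:
# a = X1Y1, b = X1Y0, c = X0Y1, d = X0Y0
strata = np.array([
    [12, 20,  8, 40],
    [25, 15, 18, 30],
    [ 9, 11,  7, 25],
], dtype=float)

a, b, c, d = strata.T
n = strata.sum(axis=1)

# Classical Mantel-Haenszel common odds ratio, symmetric in X and Y
or_mh = np.sum(a * d / n) / np.sum(b * c / n)
print(f"Mantel-Haenszel OR = {or_mh:.3f}")
```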
ERIC Educational Resources Information Center
Osborne, Jason W.
2012-01-01
Logistic regression is slowly gaining acceptance in the social sciences, and fills an important niche in the researcher's toolkit: being able to predict important outcomes that are not continuous in nature. While OLS regression is a valuable tool, it cannot routinely be used to predict outcomes that are binary or categorical in nature. These…
Linking land cover and water quality in New York City's water supply watersheds.
Mehaffey, M H; Nash, M S; Wade, T G; Ebert, D W; Jones, K B; Rager, A
2005-08-01
The Catskill/Delaware reservoirs supply 90% of New York City's drinking water. The City has implemented a series of watershed protection measures, including land acquisition, aimed at preserving water quality in the Catskill/Delaware watersheds. The objective of this study was to examine how relationships between landscape and surface water measurements change between years. Thirty-two drainage areas delineated from surface water sample points (total nitrogen, total phosphorus, and fecal coliform bacteria concentrations) were used in step-wise regression analyses to test landscape and surface-water quality relationships. Two measurements of land use, percent agriculture and percent urban development, were positively related to water quality and consistently present in all regression models. Together these two land uses explained 25 to 75% of the regression model variation. However, the contribution of agriculture to water quality condition showed a decreasing trend with time as overall agricultural land cover decreased. Results from this study demonstrate that relationships between land cover and surface water concentrations of total nitrogen, total phosphorus, and fecal coliform bacteria counts over a large area can be evaluated using a relatively simple geographic information system method. Land managers may find this method useful for targeting resources in relation to a particular water quality concern, focusing best management efforts, and maximizing benefits to water quality with minimal costs.
Ye, Jiang-Feng; Zhao, Yu-Xin; Ju, Jian; Wang, Wei
2017-10-01
To assess the value of the Bedside Index for Severity in Acute Pancreatitis (BISAP), the Modified Early Warning Score (MEWS), serum Ca2+ and red cell distribution width (RDW) for predicting the severity grade of acute pancreatitis (AP), and to develop and verify a more accurate scoring system to predict the severity of AP. In 302 patients with AP, we calculated BISAP and MEWS scores and conducted single-factor logistic regression analyses of the relationships of BISAP, RDW, MEWS, and serum Ca2+ with the severity of AP. The variables with statistical significance in the single-factor logistic regression were entered into a multi-factor logistic regression model; forward stepwise regression was used to screen variables and build a multi-factor prediction model. A receiver operating characteristic (ROC) curve was constructed, and the significance of the multi- and single-factor prediction models in predicting the severity of AP was evaluated using the area under the ROC curve (AUC). The internal validity of the model was verified through bootstrapping. Among the 302 patients with AP, 209 had mild acute pancreatitis (MAP) and 93 had severe acute pancreatitis (SAP). The single-factor logistic regression analysis showed that BISAP, MEWS and serum Ca2+ are predictors of the severity of AP (P<0.001), whereas RDW is not (P>0.05). The multi-factor logistic regression analysis showed that BISAP and serum Ca2+ are independent predictors of AP severity (P<0.001), and MEWS is not (P>0.05); BISAP is negatively related to serum Ca2+ (r=-0.330, P<0.001). The constructed model is: ln(p/(1-p)) = 7.306 + 1.151*BISAP - 4.516*serum Ca2+. The predictive ability of the models for SAP follows the order: combined BISAP and serum Ca2+ prediction model > serum Ca2+ > BISAP. The difference in predictive ability between BISAP and serum Ca2+ alone is not statistically significant (P>0.05); however, the newly built prediction model is significantly better than either BISAP or serum Ca2+ individually (P<0.01). Verification of the internal validity of the models by bootstrapping was favorable. BISAP and serum Ca2+ have high predictive value for the severity of AP; however, the model combining BISAP and serum Ca2+ is remarkably superior to either alone. Furthermore, this model is simple, practical and appropriate for clinical use. Copyright © 2016. Published by Elsevier Masson SAS.
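A small sketch evaluating the combined prediction model quoted above; the log-odds form follows from the multi-factor logistic regression, and the example inputs are hypothetical.

```python
import math

def sap_probability(bisap: int, serum_ca_mmol_l: float) -> float:
    """Predicted probability of severe acute pancreatitis from the combined
    model reported in the abstract:
    ln(p / (1 - p)) = 7.306 + 1.151*BISAP - 4.516*serum Ca2+."""
    log_odds = 7.306 + 1.151 * bisap - 4.516 * serum_ca_mmol_l
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative patient: BISAP = 3, serum calcium = 2.0 mmol/L (values hypothetical)
print(f"P(SAP) = {sap_probability(3, 2.0):.2f}")
```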
Rispo, Antonio; Imperatore, Nicola; Testa, Anna; Bucci, Luigi; Luglio, Gaetano; De Palma, Giovanni Domenico; Rea, Matilde; Nardone, Olga Maria; Caporaso, Nicola; Castiglione, Fabiana
2018-03-08
In the management of Crohn's Disease (CD) patients, having a simple score combining clinical, endoscopic and imaging features to predict the risk of surgery could help to tailor treatment more effectively. AIMS: to prospectively evaluate the one-year risk factors for surgery in refractory/severe CD and to generate a risk matrix for predicting the probability of surgery at one year. CD patients needing a disease re-assessment at our tertiary IBD centre underwent clinical, laboratory, endoscopy and bowel sonography (BS) examinations within one week. The optimal cut-off values in predicting surgery were identified using ROC curves for the Simple Endoscopic Score for CD (SES-CD), bowel wall thickness (BWT) at BS, and small bowel CD extension at BS. Binary logistic regression and Cox's regression were then carried out. Finally, the probabilities of surgery were calculated for selected baseline levels of covariates and results were arranged in a prediction matrix. Of 100 CD patients, 30 underwent surgery within one year. SES-CD ≥ 9 (OR 15.3; p<0.001), BWT ≥ 7 mm (OR 15.8; p<0.001), small bowel CD extension at BS ≥ 33 cm (OR 8.23; p<0.001) and stricturing/penetrating behavior (OR 4.3; p<0.001) were the only independent factors predictive of surgery at one year based on binary logistic and Cox's regressions. Our matrix model combined these risk factors and the probability of surgery ranged from 0.48% to 87.5% (sixteen combinations). Our risk matrix combining clinical, endoscopic and ultrasonographic findings can accurately predict the one-year risk of surgery in patients with severe/refractory CD requiring a disease re-evaluation. This tool could be of value in clinical practice, serving as the basis for a tailored management of CD patients.
Schleicher, Rosemary L; Sternberg, Maya R; Pfeiffer, Christine M
2016-01-01
Sociodemographic and lifestyle factors exert important influences on nutritional status; however, information on their association with biomarkers of fat-soluble nutrients is limited, particularly in a representative sample of adults. Serum or plasma concentrations of vitamin A (VIA), vitamin E (VIE), carotenes (CAR), xanthophylls (XAN), 25-hydroxyvitamin D (25OHD), saturated- (SFA), monounsaturated- (MUFA), polyunsaturated- (PUFA) and total fatty acids (tFA) were measured in adults (≥20 y) during all or part of NHANES 2003–2006. Simple and multiple linear regression were used to assess 5 sociodemographic variables (age, sex, race-ethnicity, education, income) and 5 lifestyle behaviors (smoking, alcohol consumption, BMI, physical activity, supplement use) and their relation to biomarker concentrations. Adjustment for total serum cholesterol and lipid-altering drug use was added to the full regression model. Adjustment for latitude and season was added to the full model for 25OHD. Based on simple linear regression, race-ethnicity, BMI and supplement use were significantly related to all fat-soluble biomarkers. Sociodemographic variables as a group explained 5–17% of biomarker variability, whereas together, sociodemographic and lifestyle variables explained 22–23% (25OHD, VIE, XAN), 17% (VIA), 15% (MUFA), 10–11% (SFA, CAR, tFA) and 6% (PUFA). Although lipid adjustment explained additional variability for all biomarkers except 25OHD, it appeared to be largely independent of sociodemographic and lifestyle variables. After adjusting for sociodemographic, lifestyle and lipid-related variables, major differences in biomarkers were associated with race-ethnicity (from −44% to 57%); smoking (up to −25%); supplement use (up to 21%); and BMI (up to −15%). Latitude and season attenuated some race-ethnic differences. Of the sociodemographic and lifestyle variables examined, with or without lipid-adjustment, most fat-soluble nutrient biomarkers were significantly associated with race-ethnicity. PMID:23596163
Cawley, Gavin C; Talbot, Nicola L C
2006-10-01
Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
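The authors' MATLAB implementation is linked above; as a rough stand-in, the sketch below fits an L1-penalized (Laplace-prior-like) logistic regression with scikit-learn on synthetic expression data. Note that, unlike BLogReg, the regularization parameter C here still has to be tuned (e.g., by cross-validation).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical expression matrix: 40 samples x 500 genes, 5 truly informative
X = rng.normal(size=(40, 500))
y = (X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=40) > 0).astype(int)

# The L1 penalty promotes sparsity in the gene weights
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print(f"{selected.size} genes selected:", selected[:10])
```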
Berglund, Lars; Garmo, Hans; Lindbäck, Johan; Svärdsudd, Kurt; Zethelius, Björn
2008-09-30
The least-squares estimator of the slope in a simple linear regression model is biased towards zero when the predictor is measured with random error. A corrected slope may be estimated by adding data from a reliability study, which comprises a subset of subjects from the main study. The precision of this corrected slope depends on the design of the reliability study and the choice of estimator. Previous work has assumed that the reliability study constitutes a random sample from the main study. A more efficient design is to use subjects with extreme values on their first measurement. Previously, we published a variance formula for the corrected slope, when the correction factor is the slope in the regression of the second measurement on the first. In this paper we show that both designs are improved by maximum likelihood estimation (MLE). The precision gain is explained by the inclusion of data from all subjects for estimation of the predictor's variance and by the use of the second measurement for estimation of the covariance between response and predictor. The gain from MLE increases with a stronger true relationship between response and predictor and with lower precision in the predictor measurements. We present a real data example on the relationship between fasting insulin, a surrogate marker, and true insulin sensitivity measured by a gold-standard euglycaemic insulin clamp, and simulations, where the behavior of profile-likelihood-based confidence intervals is examined. MLE was shown to be a robust estimator for non-normal distributions and efficient in small-sample situations. Copyright (c) 2008 John Wiley & Sons, Ltd.
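A minimal simulation of the attenuation problem and the simple correction discussed above, with the correction factor taken as the slope of the second measurement on the first; this sketch does not reproduce the authors' MLE.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
true_x = rng.normal(size=n)
y = 2.0 * true_x + rng.normal(scale=0.5, size=n)   # true slope = 2
x1 = true_x + rng.normal(scale=0.8, size=n)        # first noisy measurement
x2 = true_x + rng.normal(scale=0.8, size=n)        # replicate (reliability study)

def slope(u, v):
    return np.cov(u, v)[0, 1] / np.var(u, ddof=1)

naive = slope(x1, y)          # biased towards zero (attenuated)
reliability = slope(x1, x2)   # slope of 2nd measurement regressed on 1st
corrected = naive / reliability
print(f"naive = {naive:.2f}, corrected = {corrected:.2f} (true = 2.00)")
```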
Robust scoring functions for protein-ligand interactions with quantum chemical charge models.
Wang, Jui-Chih; Lin, Jung-Hsin; Chen, Chung-Ming; Perryman, Alex L; Olson, Arthur J
2011-10-24
Ordinary least-squares (OLS) regression has been used widely for constructing the scoring functions for protein-ligand interactions. However, OLS is very sensitive to the existence of outliers, and models constructed using it are easily affected by outliers or even by the choice of the data set. On the other hand, the determination of atomic charges is regarded as of central importance, because the electrostatic interaction is known to be a key contributing factor for biomolecular association. In the development of the AutoDock4 scoring function, only OLS was conducted, and the simple Gasteiger method was adopted. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin-model 1-bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieves the lowest root-mean-square error of 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment of binding pose prediction with the 100 external decoy sets indicates a very high success rate of 87% using the criterion of a predicted root-mean-square deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).
NASA Astrophysics Data System (ADS)
Walawender, Ewelina; Walawender, Jakub P.; Ustrnul, Zbigniew
2017-02-01
The main purpose of the study is to introduce methods for mapping the spatial distribution of the occurrence of selected atmospheric phenomena (thunderstorms, fog, glaze and rime) over Poland from 1966 to 2010 (45 years). Limited in situ observations as well as the discontinuous and location-dependent nature of these phenomena make traditional interpolation inappropriate. Spatially continuous maps were created with the use of geospatial predictive modelling techniques. For each given phenomenon, an algorithm identifying its favourable meteorological and environmental conditions was created on the basis of observations recorded at 61 weather stations in Poland. Annual frequency maps presenting the probability of a day with a thunderstorm, fog, glaze or rime were created with the use of a modelled, gridded dataset by implementing the predefined algorithms. Relevant explanatory variables were derived from NCEP/NCAR reanalysis and downscaled with the use of a Regional Climate Model. The resulting maps of favourable meteorological conditions were found to be valuable and representative on the country scale, though with varying correlation (r) strength against in situ data (from r = 0.84 for thunderstorms to r = 0.15 for fog). The weak correlation between gridded estimates of fog occurrence and observational data indicated the very local nature of this phenomenon. For this reason, additional environmental predictors of fog occurrence were also examined. Topographic parameters derived from the SRTM elevation model and reclassified CORINE Land Cover data were used as the external, explanatory variables for the multiple linear regression kriging used to obtain the final map. The regression model explained 89% of the annual frequency of fog variability in the study area. Regression residuals were interpolated via simple kriging.
NASA Astrophysics Data System (ADS)
He, Anhua; Singh, Ramesh P.; Sun, Zhaohua; Ye, Qing; Zhao, Gang
2016-07-01
Earth tides, atmospheric pressure, precipitation and earthquakes all affect water well levels; earthquakes in particular have a large impact, and anomalous co-seismic changes in groundwater levels have been observed. In this paper, we have used four different models, simple linear regression (SLR), multiple linear regression (MLR), principal component analysis (PCA) and partial least squares (PLS), to compute the atmospheric pressure and earth tidal effects on water level. Furthermore, we have used the Akaike information criterion (AIC) to study the performance of the various models. Based on the lowest AIC and sum-of-squared-error values, the best estimate of the effects of atmospheric pressure and earth tide on water level is found using the MLR model. However, the MLR model does not deal with multicollinearity between inputs; as a result, the atmospheric pressure and earth tidal response coefficients fail to reflect the mechanisms associated with the groundwater level fluctuations. Among models that address the serious multicollinearity of the inputs, the PLS model shows the minimum AIC value, and its atmospheric pressure and earth tidal response coefficients agree closely with the observations. The atmospheric pressure and earth tidal response coefficients are found to be sensitive to the stress-strain state using the observed data for the period 1 April-8 June 2008 of the Chuan 03# well. The transient enhancement of porosity of the rock mass around the Chuan 03# well associated with the Wenchuan earthquake (Mw = 7.9, 12 May 2008), which returned to its original pre-seismic level after 13 days, indicates that the co-seismic sharp rise of the water well could be induced by static stress change, rather than by the development of new fractures.
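A minimal sketch of the AIC-based model comparison on synthetic data: simple linear regression on pressure alone versus multiple linear regression on pressure and tide (the PCA and PLS variants are omitted).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
pressure = rng.normal(size=n)             # barometric pressure (standardized)
tide = np.sin(np.linspace(0, 20, n))      # synthetic earth-tide signal
level = 0.6 * pressure + 0.3 * tide + rng.normal(scale=0.2, size=n)

slr = sm.OLS(level, sm.add_constant(pressure)).fit()
mlr = sm.OLS(level, sm.add_constant(np.column_stack([pressure, tide]))).fit()
print(f"AIC SLR = {slr.aic:.1f}, AIC MLR = {mlr.aic:.1f}")  # lower AIC preferred
```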
NASA Astrophysics Data System (ADS)
Carisi, Francesca; Domeneghetti, Alessio; Kreibich, Heidi; Schröter, Kai; Castellarin, Attilio
2017-04-01
Flood risk is a function of flood hazard and vulnerability; therefore its accurate assessment depends on a reliable quantification of both factors. The scientific literature proposes a number of objective and reliable methods for assessing flood hazard, yet it highlights a limited understanding of the fundamental damage processes. Loss modelling is associated with large uncertainty which is, among other factors, due to a lack of standard procedures; for instance, flood losses are often estimated based on damage models derived in completely different contexts (i.e. different countries or geographical regions) without checking their applicability, or by considering only one explanatory variable (typically water depth). We consider the Secchia river flood event of January 2014, when a sudden levee breach caused the inundation of nearly 200 km2 in Northern Italy. In the aftermath of this event, local authorities collected flood loss data, together with additional information on affected private households and industrial activities (e.g. building surface and economic value, number of a company's employees and others). Based on these data we implemented and compared a quadratic-regression damage function, with water depth as the only explanatory variable, and a multi-variable model that combines multiple regression trees and considers several explanatory variables (i.e. bagging decision trees). Our results show the importance of data collection, revealing that (1) a simple quadratic regression damage function based on empirical data from the study area can be significantly more accurate than literature damage models derived for a different context and (2) multi-variable modelling may outperform the uni-variable approach, yet it is more difficult to develop and apply due to a much higher demand for detailed data.
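A rough sketch of the two approaches being compared, on synthetic data rather than the Secchia loss records: a uni-variable quadratic depth-damage curve versus bagged regression trees over several explanatory variables.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
n = 300
depth = rng.uniform(0, 3, n)              # water depth, m
area = rng.uniform(50, 400, n)            # building surface, m^2
value = rng.uniform(1e5, 1e6, n)          # building value, EUR
loss = 0.1 * value * (0.4 * depth + 0.1 * depth**2) / 3 + rng.normal(0, 5e3, n)

# Uni-variable model: quadratic regression of loss on depth alone
quad = np.polyfit(depth, loss, deg=2)

# Multi-variable model: bagging decision trees on all explanatory variables
X = np.column_stack([depth, area, value])
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                       random_state=0).fit(X, loss)
print("quadratic coeffs:", np.round(quad, 1),
      "| bagged R^2:", round(bag.score(X, loss), 2))
```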
Spatial prediction of soil texture in region Centre (France) from summary data
NASA Astrophysics Data System (ADS)
Dobarco, Mercedes Roman; Saby, Nicolas; Paroissien, Jean-Baptiste; Orton, Tom G.
2015-04-01
Soil texture is a key controlling factor of important soil functions such as water and nutrient holding capacity, retention of pollutants, drainage, soil biodiversity, and C cycling. High-resolution soil texture maps enhance our understanding of the spatial distribution of soil properties and provide valuable information for decision making and crop management, environmental protection, and hydrological planning. We predicted the soil texture of agricultural topsoils in the Region Centre (France) combining regression and area-to-point kriging. Soil texture data were collected from the French soil-test database (BDAT), which is populated with soil analyses performed at farmers' request. To protect the anonymity of the farms, the data were treated by commune. In a first step, summary statistics of environmental covariates by commune were used to develop prediction models with Cubist, boosted regression trees, and random forests. In a second step, the residuals of each individual observation were summarized by commune and kriged following the method developed by Orton et al. (2012). This approach allowed us to include non-linear relationships among covariates and soil texture while accounting for the uncertainty in areal means in the area-to-point kriging step. Independent validation of the models was done using data from the systematic soil monitoring network of French soils. Future work will compare the performance of these models with a non-stationary variance geostatistical model using the most important covariates and summary statistics of texture data. The results will inform on whether the latter, statistically more challenging approach significantly improves texture predictions or whether the simpler area-to-point regression kriging can offer satisfactory results. The application of area-to-point regression kriging at the national level using BDAT data has the potential to improve soil texture predictions for agricultural topsoils, especially when combined with existing maps (i.e., model ensemble).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha
Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish a quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using the Tanimoto similarity index. Stochastic gradient boosting and bagging algorithms supplemented DTB and DTF models were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data, optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both the models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059 and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide useful strategy and robust tools in the screening of ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the constructed (c) DTB and (d) DTF regression models to predict the T. pyriformis toxicity of diverse chemicals. - Highlights: • Ensemble learning (EL) based models constructed for toxicity prediction of chemicals • Predictive models used a few simple non-quantum mechanical molecular descriptors. • EL-based DTB/DTF models successfully discriminated toxic and non-toxic chemicals. • DTB/DTF regression models precisely predicted toxicity of chemicals in multi-species. • Proposed EL based models can be used as tool to predict toxicity of new chemicals.
Learning investment indicators through data extension
NASA Astrophysics Data System (ADS)
Dvořák, Marek
2017-07-01
Stock prices in the form of time series were analysed using univariate and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, we augmented this univariate time series to a multivariate representation. The method makes use of sliding windows to calculate several dozen new variables, using simple statistics such as first and second moments as well as more complicated ones, such as autoregression coefficients and residual analysis, followed by an optional quadratic transformation that was further used for data extension. These were used as explanatory variables in a regularized logistic LASSO regression, which tried to estimate the Buy-Sell Index (BSI) from real stock market data.
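A condensed sketch of the pipeline described above, with an assumed next-day-return label standing in for the real Buy-Sell Index: log-differences, sliding-window statistics, and an L1-regularized (LASSO) logistic regression.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000))))

r = np.log(price).diff()                   # log-differences
feats = pd.DataFrame({
    "mean20": r.rolling(20).mean(),        # first moment over sliding window
    "std20": r.rolling(20).std(),          # second moment
    "mean5": r.rolling(5).mean(),
    "ac1": r.rolling(60).apply(lambda w: pd.Series(w).autocorr(1)),  # AR(1)-like
}).dropna()

# Hypothetical buy/sell label: 1 if the next-day return is positive
y = (r.shift(-1).loc[feats.index] > 0).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(feats.iloc[:-1], y.iloc[:-1])      # drop last row (label undefined)
print("coefficients:", dict(zip(feats.columns, np.round(clf.coef_[0], 3))))
```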
Estimates of Ground Temperature and Atmospheric Moisture from CERES Observations
NASA Technical Reports Server (NTRS)
Wu, Man Li C.; Schubert, Siegfried; Einaudi, Franco (Technical Monitor)
2000-01-01
A method is developed to retrieve surface ground temperature (Tg) and atmospheric moisture using clear sky fluxes (CSF) from CERES-TRMM observations. In general, the clear sky outgoing long-wave radiation (CLR) is sensitive to upper-level moisture (q_h) over wet regions and to Tg over dry regions. The clear sky window flux from 800 to 1200 /cm (RadWn) is sensitive to low-level moisture (q_l) and Tg. Combining these two measurements (CLR and RadWn), Tg and q_h can be estimated over land, while q_h and q_l can be estimated over the oceans. The approach capitalizes on the availability of satellite estimates of CLR and RadWn and other auxiliary satellite data. The basic methodology employs off-line forward radiative transfer calculations to generate synthetic CSF data from two different global 4-dimensional data assimilation products. Simple linear regression is used to relate discrepancies in CSF to discrepancies in Tg, q_h and q_l. The slopes of the regression lines define sensitivity parameters that can be exploited to help interpret mismatches between satellite observations and model-based estimates of CSF. For illustration, we analyze the discrepancies in the CSF between an early implementation of the Goddard Earth Observing System Data Assimilation System (GEOS-DAS) and a recent operational version of the European Center for Medium-Range Weather Prediction data assimilation system. In particular, our analysis of synthetic total and window region CSF differences (computed from two different assimilated data sets) shows that simple linear regression employing ΔTg, a broad-layer Δq_l from 500 hPa to the surface and Δq_h from 200 to 500 hPa provides a good approximation to the full radiative transfer calculations, typically explaining more than 90% of the 6-hourly variance in the flux differences. These simple regression relations can be inverted to "retrieve" the errors in the geophysical parameters. Uncertainties (normalized by standard deviation) in the monthly mean retrieved parameters range from 7% for ΔTg to about 20% for Δq_l. Our initial application of the methodology employed an early CERES-TRMM data set (CLR and RadWn) to assess the quality of the GEOS2 data. The results showed that over the tropical and subtropical oceans GEOS2 is, in general, too wet in the upper troposphere (mean bias of 0.99 mm) and too dry in the lower troposphere (mean bias of -4.7 mm). We note that these errors, as well as a cold bias in Tg, have largely been corrected in the current version of GEOS-2 with the introduction of a land surface model, a moist turbulence scheme and the assimilation of SSM/I total precipitable water.
NASA Astrophysics Data System (ADS)
Li, Y.; Rong, Z.
2017-12-01
The surface Bidirectional Reflectance Distribution Function (BRDF) is a key parameter that affects the vicarious calibration accuracy of visible-channel remote sensing instruments. Over the past 30 years, many studies have been made and a variety of models have been established. Among them, the Ross-Li model has been highly regarded and widely used. Unfortunately, that model does not suit desert and Gobi surfaces well, because the scattering kernel it contains requires vegetation factors such as plant height and plant spacing. A new BRDF model for surfaces without vegetation, intended mainly for use in remote sensing vicarious calibration, is established here; we call it the Equivalent Mirror Plane (EMP) BRDF. It is used to characterize the bidirectional reflectance of near-Lambertian surfaces. The accuracy of the EMP BRDF model is validated against directional reflectance data measured on the Dunhuang Gobi and compared to the Ross-Li model. Results show that the regression accuracy of the new model is 0.828, which is similar to that of the Ross-Li model (0.825). Because of its simple form (only four polynomial terms) and simple principle (derived from the Fresnel reflection principle, with no vegetation parameters), it is more suitable for near-Lambertian surfaces such as Gobi, desert, the lunar surface and reference panels. Results also show that the new model maintains high accuracy and stability under sparse observations, which is very important for the retrieval requirements of daily-updated BRDF remote sensing products.
[Using fractional polynomials to estimate the safety threshold of fluoride in drinking water].
Pan, Shenling; An, Wei; Li, Hongyan; Yang, Min
2014-01-01
To study the dose-response relationship between fluoride content in drinking water and the prevalence of dental fluorosis on the national scale, and then to determine the safety threshold of fluoride in drinking water. Meta-regression analysis was applied to the 2001-2002 national endemic fluorosis survey data of key wards. First, a fractional polynomial (FP) was adopted to establish a fixed effect model and determine the best FP structure; after that, restricted maximum likelihood (REML) was adopted to estimate between-study variance, and the best random effect model was established. The best FP structure was a first-order logarithmic transformation. Based on the best random effect model, the benchmark dose (BMD) of fluoride in drinking water and its lower limit (BMDL) were calculated as 0.98 mg/L and 0.78 mg/L. Fluoride in drinking water explained only 35.8% of the variability in prevalence; among other influencing factors, ward type was significant, while temperature condition and altitude were not. The fractional polynomial-based meta-regression method is simple and practical and provides a good fit; on this basis, the safety threshold of fluoride in drinking water in China is determined to be 0.8 mg/L.
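A minimal sketch of a first-order logarithmic fractional polynomial dose-response with a benchmark-dose readout; the coefficients and the 5-percentage-point benchmark response are illustrative, not the paper's fitted values.

```python
import numpy as np
from scipy.optimize import brentq

# First-order fractional polynomial with log transform:
# logit(prevalence) = b0 + b1 * ln(fluoride); coefficients illustrative
b0, b1 = -3.2, 2.4

def prevalence(f):
    z = b0 + b1 * np.log(f)
    return 1 / (1 + np.exp(-z))

# Benchmark dose: fluoride level where prevalence exceeds background
# (taken here at 0.1 mg/L) by a benchmark response of 5 percentage points
background = prevalence(0.1)
bmd = brentq(lambda f: prevalence(f) - (background + 0.05), 0.1, 10)
print(f"BMD ≈ {bmd:.2f} mg/L")
```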
Alves, R S; Teodoro, P E; Farias, F C; Farias, F J C; Carvalho, L P; Rodrigues, J I S; Bhering, L L; Resende, M D V
2017-08-17
Cotton produces one of the most important textile fibers of the world and has great relevance in the world economy. It is an economically important crop in Brazil, which is the world's fifth largest producer. However, studies evaluating the genotype x environment (G x E) interactions in cotton are scarce in this country. Therefore, the goal of this study was to evaluate the G x E interactions in two important traits in cotton (fiber yield and fiber length) using the method proposed by Eberhart and Russell (simple linear regression) and reaction norm models (random regression). Eight trials with sixteen upland cotton genotypes, conducted in a randomized block design, were used. It was possible to identify a genotype with wide adaptability and stability for both traits. Reaction norm models have excellent theoretical and practical properties and led to more informative and accurate results than the method proposed by Eberhart and Russell and should, therefore, be preferred. Curves of genotypic values as a function of the environmental gradient, which predict the behavior of the genotypes along the environmental gradient, were generated. These curves make possible the recommendation to untested environmental levels.
Singh, Gyanendra; Sachdeva, S N; Pal, Mahesh
2016-11-01
This work examines the application of the M5 model tree and conventionally used fixed/random effect negative binomial (FENB/RENB) regression models for accident prediction on non-urban sections of highway in Haryana (India). Road accident data for a period of 2-6 years on different sections of 8 National and State Highways in Haryana were collected from police records. Data related to road geometry, traffic and road environment variables were collected through field studies. A total of 222 data points were gathered by dividing the highways into sections with certain uniform geometric characteristics. For prediction of accident frequencies using fifteen input parameters, two modeling approaches were used: FENB/RENB regression and the M5 model tree. Results suggest that both models perform comparably well in terms of correlation coefficient and root mean square error values. The M5 model tree provides simple linear equations that are easy to interpret and provide better insight, indicating that this approach can effectively be used as an alternative to the RENB approach if the sole purpose is to predict motor vehicle crashes. Sensitivity analysis using the M5 model tree also suggests that its results reflect the physical conditions. Both models clearly indicate that to improve safety on Indian highways, minor accesses to the highways need to be properly designed and controlled, the service roads should be made functional, and the dispersion of speeds should be brought down. Copyright © 2016 Elsevier Ltd. All rights reserved.
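A minimal negative binomial baseline in the spirit of the FENB/RENB models (the fixed/random effects structure and the M5 tree itself are omitted), on synthetic section-level data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n_sections = 222
aadt = rng.uniform(5e3, 4e4, n_sections)      # traffic volume (hypothetical)
accesses = rng.poisson(6, n_sections)         # minor access points per section
mu = np.exp(-1.0 + 0.4 * np.log(aadt / 1000) + 0.08 * accesses)
crashes = rng.negative_binomial(2, 2 / (2 + mu))  # overdispersed counts, mean mu

X = sm.add_constant(np.column_stack([np.log(aadt), accesses]))
nb = sm.GLM(crashes, X, family=sm.families.NegativeBinomial()).fit()
print(nb.params)  # intercept, ln(AADT) effect, access-point effect
```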
Modeling gene expression measurement error: a quasi-likelihood approach
Strimmer, Korbinian
2003-01-01
Background Using suitable error models for gene expression measurements is essential in the statistical analysis of microarray data. However, the true probabilistic model underlying gene expression intensity readings is generally not known. Instead, in currently used approaches some simple parametric model is assumed (usually a transformed normal distribution) or the empirical distribution is estimated. However, both these strategies may not be optimal for gene expression data, as the non-parametric approach ignores known structural information whereas the fully parametric models run the risk of misspecification. A further related problem is the choice of a suitable scale for the model (e.g. observed vs. log-scale). Results Here a simple semi-parametric model for gene expression measurement error is presented. In this approach inference is based on an approximate likelihood function (the extended quasi-likelihood). Only partial knowledge about the unknown true distribution is required to construct this function. In case of gene expression this information is available in the form of the postulated (e.g. quadratic) variance structure of the data. As the quasi-likelihood behaves (almost) like a proper likelihood, it allows for the estimation of calibration and variance parameters, and it is also straightforward to obtain corresponding approximate confidence intervals. Unlike most other frameworks, it also allows analysis on any preferred scale, i.e. both on the original linear scale as well as on a transformed scale. It can also be employed in regression approaches to model systematic (e.g. array or dye) effects. Conclusions The quasi-likelihood framework provides a simple and versatile approach to analyze gene expression data that does not make any strong distributional assumptions about the underlying error model. For several simulated as well as real data sets it provides a better fit to the data than competing models. In an example it also improved the power of tests to identify differential expression. PMID:12659637
A simple method for the extraction and identification of light density microplastics from soil.
Zhang, Shaoliang; Yang, Xiaomei; Gertsen, Hennie; Peters, Piet; Salánki, Tamás; Geissen, Violette
2018-03-01
This article introduces a simple and cost-saving method developed to extract, distinguish and quantify light density microplastics of polyethylene (PE) and polypropylene (PP) in soil. A floatation method using distilled water was used to extract the light density microplastics from soil samples. Microplastics and impurities were identified using a heating method (3-5 s at 130 °C). The number and size of particles were determined using a camera (Leica DFC 425) connected to a microscope (Leica wild M3C, Type S, simple light, 6.4×). Quantification of the microplastics was conducted using a developed model. Results showed that the floatation method was effective in extracting microplastics from soils, with recovery rates of approximately 90%. After being exposed to heat, the microplastics in the soil samples melted and were transformed into circular transparent particles while other impurities, such as organic matter and silicates, were not changed by the heat. Regression analysis of microplastics weight and particle volume (a calculation based on ImageJ software analysis) after heating showed the best fit (y = 1.14x + 0.46, R² = 0.99, p < 0.001). Recovery rates based on the empirical model method were >80%. Results from field samples collected from North-western China prove that our method of repetitive floatation and heating can be used to extract, distinguish and quantify light density polyethylene microplastics in soils. Microplastics mass can be evaluated using the empirical model. Copyright © 2017 Elsevier B.V. All rights reserved.
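A tiny sketch applying the reported calibration line to estimate particle mass from image-derived particle volume (units as in the study; the input volumes are hypothetical).

```python
def microplastic_mass(volume: float) -> float:
    """Estimate particle mass from image-derived particle volume using the
    reported calibration line y = 1.14x + 0.46 (units as in the study)."""
    return 1.14 * volume + 0.46

# Hypothetical particle volumes extracted with ImageJ
volumes = [0.8, 1.5, 3.2]
print([round(microplastic_mass(v), 2) for v in volumes])
```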
Wiley, Jeffrey B.; Atkins, John T.; Newell, Dawn A.
2002-01-01
Multiple and simple least-squares regression models for the log10-transformed 1.5- and 2-year recurrence intervals of peak discharges with independent variables describing the basin characteristics (log10-transformed and untransformed) for 236 streamflow-gaging stations were evaluated, and the regression residuals were plotted as areal distributions that defined three regions in West Virginia designated as East, North, and South. Regional equations for the 1.1-, 1.2-, 1.3-, 1.4-, 1.5-, 1.6-, 1.7-, 1.8-, 1.9-, 2.0-, 2.5-, and 3-year recurrence intervals of peak discharges were determined by generalized least-squares regression. Log10-transformed drainage area was the most significant independent variable for all regions. Equations developed in this study are applicable only to rural, unregulated streams within the boundaries of West Virginia. The accuracies of estimating equations are quantified by measuring the average prediction error (from 27.4 to 52.4 percent) and equivalent years of record (from 1.1 to 3.4 years).
Anodic microbial community diversity as a predictor of the power output of microbial fuel cells.
Stratford, James P; Beecroft, Nelli J; Slade, Robert C T; Grüning, André; Avignone-Rossa, Claudio
2014-03-01
The relationship between the diversity of mixed-species microbial consortia and their electrogenic potential in the anodes of microbial fuel cells was examined using different diversity measures as predictors. Identical microbial fuel cells were sampled at multiple time-points. Biofilm and suspension communities were analysed by denaturing gradient gel electrophoresis to calculate the number and relative abundance of species. Shannon and Simpson indices and richness were examined for association with power using bivariate and multiple linear regression, with biofilm DNA as an additional variable. In simple bivariate regressions, the correlation of Shannon diversity of the biofilm and power is stronger (r=0.65, p=0.001) than between power and richness (r=0.39, p=0.076), or between power and the Simpson index (r=0.5, p=0.018). Using Shannon diversity and biofilm DNA as predictors of power, a regression model can be constructed (r=0.73, p<0.001). Ecological parameters such as the Shannon index are predictive of the electrogenic potential of microbial communities. Copyright © 2014 Elsevier Ltd. All rights reserved.
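A minimal sketch of the bivariate analysis described above: the Shannon index computed from relative band abundances, regressed against power output (all numbers hypothetical).

```python
import numpy as np
from scipy import stats

def shannon(p):
    """Shannon diversity H' = -sum(p_i * ln p_i) from relative abundances."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log(p))

# Hypothetical DGGE band abundances for 6 anode biofilms, with measured power
communities = [
    [5, 3, 2], [8, 1, 1], [4, 4, 4, 2], [9, 1], [3, 3, 3, 3], [6, 2, 2, 1],
]
power = np.array([12.0, 8.5, 15.2, 6.1, 16.0, 11.3])   # mW/m^2, illustrative

H = np.array([shannon(c) for c in communities])
res = stats.linregress(H, power)                        # bivariate regression
print(f"r = {res.rvalue:.2f}, p = {res.pvalue:.3f}")
```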
Quantitation & Case-Study-Driven Inquiry to Enhance Yeast Fermentation Studies
ERIC Educational Resources Information Center
Grammer, Robert T.
2012-01-01
We propose a procedure for the assay of fermentation in yeast in microcentrifuge tubes that is simple and rapid, permitting assay replicates, descriptive statistics, and the preparation of line graphs that indicate reproducibility. Using regression and simple derivatives to determine initial velocities, we suggest methods to compare the effects of…
Validation of the Simple View of Reading in Hebrew--A Semitic Language
ERIC Educational Resources Information Center
Joshi, R. Malatesha; Ji, Xuejun Ryan; Breznitz, Zvia; Amiel, Meirav; Yulia, Astri
2015-01-01
The Simple View of Reading (SVR) in Hebrew was tested by administering decoding, listening comprehension, and reading comprehension measures to 1,002 students from Grades 2 to 10 in the northern part of Israel. Results from hierarchical regression analyses supported the SVR in Hebrew with decoding and listening comprehension measures explaining…
NASA Technical Reports Server (NTRS)
Dome, G. J.; Fung, A. K.; Moore, R. K.
1977-01-01
Several regression models were tested to explain the wind direction dependence of the 1975 JONSWAP (Joint North Sea Wave Project) scatterometer data. The models consider the radar backscatter as a harmonic function of wind direction. The constant term accounts for the major effect of wind speed and the sinusoidal terms for the effects of direction. The fundamental accounts for the difference in upwind and downwind returns, while the second harmonic explains the upwind-crosswind difference. It is shown that a second harmonic model appears to adequately explain the angular variation. A simple inversion technique, which uses two orthogonal scattering measurements, is also described which eliminates the effect of wind speed and direction. Vertical polarization was shown to be more effective in determining both wind speed and direction than horizontal polarization.
Evaluation of the Williams-type model for barley yields in North Dakota and Minnesota
NASA Technical Reports Server (NTRS)
Barnett, T. L. (Principal Investigator)
1981-01-01
The Williams-type yield model is based on multiple regression analysis of historical time series data at CRD level pooled to regional level (groups of similar CRDs). Basic variables considered in the analysis include USDA yield, monthly mean temperature, monthly precipitation, soil texture and topographic information, and variables derived from these. Technologic trend is represented by piecewise linear and/or quadratic functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test (1970-1979) demonstrate that biases are small and performance based on root mean square appears to be acceptable for the intended AgRISTARS large area applications. The model is objective, adequate, timely, simple, and not costly. It considers scientific knowledge on a broad scale but not in detail, and does not provide a good current measure of modeled yield reliability.
Berger, Lawrence M; Bruch, Sarah K; Johnson, Elizabeth I; James, Sigrid; Rubin, David
2009-01-01
This study used data on 2,453 children aged 4-17 from the National Survey of Child and Adolescent Well-Being and 5 analytic methods that adjust for selection factors to estimate the impact of out-of-home placement on children's cognitive skills and behavior problems. Methods included ordinary least squares (OLS) regressions and residualized change, simple change, difference-in-difference, and fixed effects models. Models were estimated using the full sample and a matched sample generated by propensity scoring. Although results from the unmatched OLS and residualized change models suggested that out-of-home placement is associated with increased child behavior problems, estimates from models that more rigorously adjust for selection bias indicated that placement has little effect on children's cognitive skills or behavior problems.
Inflammation, homocysteine and carotid intima-media thickness.
Baptista, Alexandre P; Cacdocar, Sanjiva; Palmeiro, Hugo; Faísca, Marília; Carrasqueira, Herménio; Morgado, Elsa; Sampaio, Sandra; Cabrita, Ana; Silva, Ana Paula; Bernardo, Idalécio; Gome, Veloso; Neves, Pedro L
2008-01-01
Cardiovascular disease is the main cause of morbidity and mortality in chronic renal patients. Carotid intima-media thickness (CIMT) is one of the most accurate markers of atherosclerosis risk. In this study, the authors set out to evaluate a population of chronic renal patients to determine which factors are associated with an increase in intima-media thickness. We included 56 patients (F=22, M=34), with a mean age of 68.6 years, and an estimated glomerular filtration rate of 15.8 ml/min (calculated by the MDRD equation). Various laboratory and inflammatory parameters (hsCRP, IL-6 and TNF-alpha) were evaluated. All subjects underwent measurement of internal carotid artery intima-media thickness by high-resolution real-time B-mode ultrasonography using a 10 MHz linear transducer. Intima-media thickness was used as a dependent variable in a simple linear regression model, with the various laboratory parameters as independent variables. Only parameters showing a significant correlation with CIMT were evaluated in a multiple regression model: age (p=0.001), hemoglobin (p=0.03), logCRP (p=0.042), logIL-6 (p=0.004) and homocysteine (p=0.002). In the multiple regression model we found that age (p=0.001) and homocysteine (p=0.027) were independently correlated with CIMT. logIL-6 did not reach statistical significance (p=0.057), probably due to the small population size. The authors conclude that age and homocysteine correlate with carotid intima-media thickness, and thus can be considered as markers/risk factors in chronic renal patients.
Cloud-Free Satellite Image Mosaics with Regression Trees and Histogram Matching.
E.H. Helmer; B. Ruefenacht
2005-01-01
Cloud-free optical satellite imagery simplifies remote sensing, but land-cover phenology limits existing solutions to persistent cloudiness to compositing temporally resolute, spatially coarser imagery. Here, a new strategy for developing cloud-free imagery at finer resolution permits simple automatic change detection. The strategy uses regression trees to predict...
A Diagrammatic Exposition of Regression and Instrumental Variables for the Beginning Student
ERIC Educational Resources Information Center
Foster, Gigi
2009-01-01
Some beginning students of statistics and econometrics have difficulty with traditional algebraic approaches to explaining regression and related techniques. For these students, a simple and intuitive diagrammatic introduction as advocated by Kennedy (2008) may prove a useful framework to support further study. The author presents a series of…
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Fitting program for linear regressions according to Mahon (1996)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trappitsch, Reto G.
2018-01-09
This program takes the user's input data and fits a linear regression to it using the prescription presented by Mahon (1996). Compared to the commonly used York fit, this method has the correct prescription for measurement error propagation. This software should facilitate the proper fitting of measurements with a simple interface.
Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method
ERIC Educational Resources Information Center
Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Sülzle, Detlev
2018-01-01
The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…
No Evidence of Reaction Time Slowing in Autism Spectrum Disorder
ERIC Educational Resources Information Center
Ferraro, F. Richard
2016-01-01
A total of 32 studies comprising 238 simple reaction time and choice reaction time conditions were examined in individuals with autism spectrum disorder (n = 964) and controls (n = 1032). A Brinley plot/multiple regression analysis was performed on mean reaction times, regressing autism spectrum disorder performance onto the control performance as…
Silva, Fabyano Fonseca; Tunin, Karen P.; Rosa, Guilherme J.M.; da Silva, Marcos V.B.; Azevedo, Ana Luisa Souza; da Silva Verneque, Rui; Machado, Marco Antonio; Packer, Irineu Umberto
2011-01-01
Nowadays, an important and interesting alternative in the control of tick infestation in cattle is to select resistant animals and to identify the respective quantitative trait loci (QTLs) and DNA markers for later use in breeding programs. The number of ticks per animal is a discrete counting trait, which could potentially follow a Poisson distribution. However, in the case of an excess of zeros, due to the occurrence of several noninfected animals, the zero-inflated Poisson (ZIP) and generalized zero-inflated Poisson (GZIP) distributions may provide a better description of the data. Thus, the objective here was to compare, through simulation, Poisson and ZIP models (simple and generalized) with classical approaches for QTL mapping with counting phenotypes under different scenarios, and to apply these approaches to a QTL study of tick resistance in an F2 cattle (Gyr × Holstein) population. It was concluded that, when working with zero-inflated data, it is advisable to use the generalized and simple ZIP models for analysis. On the other hand, when working with data that contain zeros but are not zero-inflated, the Poisson model or a data transformation approach, such as a square-root or Box-Cox transformation, is applicable. PMID:22215960
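As a rough illustration of the ZIP-versus-Poisson comparison described in the abstract above, the following Python sketch fits both models to simulated zero-inflated counts with statsmodels; the variable names and simulation settings are assumptions for illustration, not the authors' QTL pipeline.

    # Sketch: comparing Poisson and zero-inflated Poisson (ZIP) fits on
    # simulated zero-inflated count data (illustrative, not the authors' code).
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.discrete.count_model import ZeroInflatedPoisson

    rng = np.random.default_rng(0)
    n = 500
    x = rng.normal(size=n)                    # stand-in for a marker covariate
    mu = np.exp(0.5 + 0.8 * x)                # Poisson mean for infested animals
    counts = rng.poisson(mu)
    counts[rng.random(n) < 0.4] = 0           # excess zeros (noninfected animals)

    X = sm.add_constant(x)
    poisson_fit = sm.Poisson(counts, X).fit(disp=False)
    zip_fit = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1))).fit(disp=False)

    # A lower AIC for the ZIP model suggests the zero-inflation term is warranted.
    print("Poisson AIC:", poisson_fit.aic)
    print("ZIP AIC:    ", zip_fit.aic)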
Zivkovic, Milena Z; Djuric, Sasa; Cuk, Ivan; Suzovic, Dejan; Jaric, Slobodan
2017-07-01
A range of force (F) and velocity (V) data obtained from functional movement tasks (e.g., running, jumping, throwing, lifting, cycling) performed under a variety of external loads have typically revealed strong and approximately linear F-V relationships. The regression model parameters reveal the maximum F (F-intercept), V (V-intercept), and power (P) producing capacities of the tested muscles. The aim of the present study was to evaluate the level of agreement between the routinely used "multiple-load model" and a simple "two-load model" based on direct assessment of the F-V relationship from only 2 external loads. Twelve participants were tested on maximum-performance vertical jumps, cycling, bench press throws, and bench pulls performed against a variety of different loads. All 4 tested tasks revealed both exceptionally strong relationships between the parameters of the 2 models (median R = 0.98) and a lack of meaningful differences between their magnitudes (fixed bias below 3.4%). Therefore, the addition of another load to the standard tests of various functional tasks typically conducted under a single set of mechanical conditions could allow for the assessment of muscle mechanical properties such as the muscle F, V, and P producing capacities.
Homophily and Contagion Are Generically Confounded in Observational Social Network Studies
Shalizi, Cosma Rohilla; Thomas, Andrew C.
2012-01-01
The authors consider processes on social networks that can potentially involve three factors: homophily, or the formation of social ties due to matching individual traits; social contagion, also known as social influence; and the causal effect of an individual’s covariates on his or her behavior or other measurable responses. The authors show that generically, all of these are confounded with each other. Distinguishing them from one another requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular the authors demonstrate, with simple examples, that asymmetries in regression coefficients cannot identify causal effects and that very simple models of imitation (a form of social contagion) can produce substantial correlations between an individual’s enduring traits and his or her choices, even when there is no intrinsic affinity between them. The authors also suggest some possible constructive responses to these results. PMID:22523436
Zhu, Shanyou; Zhang, Hailong; Liu, Ronggao; Cao, Yun; Zhang, Guixin
2014-01-01
Sampling designs are commonly used to estimate deforestation over large areas, but comparisons between different sampling strategies are required. Using PRODES deforestation data as a reference, deforestation in the state of Mato Grosso in Brazil from 2005 to 2006 is evaluated using Landsat imagery and a nearly synchronous MODIS dataset. The MODIS-derived deforestation is used to assist in sampling and extrapolation. Three sampling designs are compared according to the estimated deforestation of the entire study area based on simple extrapolation and linear regression models. The results show that stratified sampling for strata construction and sample allocation using the MODIS-derived deforestation hotspots provided more precise estimations than simple random and systematic sampling. Moreover, the relationship between the MODIS-derived and TM-derived deforestation provides a precise estimate of the total deforestation area as well as the distribution of deforestation in each block. PMID:25258742
van Rhee, Henk; Hak, Tony
2017-01-01
We present a new tool for meta‐analysis, Meta‐Essentials, which is free of charge and easy to use. In this paper, we introduce the tool and compare its features to other tools for meta‐analysis. We also provide detailed information on the validation of the tool. Although free of charge and simple, Meta‐Essentials automatically calculates effect sizes from a wide range of statistics and can be used for a wide range of meta‐analysis applications, including subgroup analysis, moderator analysis, and publication bias analyses. The confidence interval of the overall effect is automatically based on the Knapp‐Hartung adjustment of the DerSimonian‐Laird estimator. However, more advanced meta‐analysis methods such as meta‐analytical structural equation modelling and meta‐regression with multiple covariates are not available. In summary, Meta‐Essentials may prove a valuable resource for meta‐analysts, including researchers, teachers, and students. PMID:28801932
Habitat or matrix: which is more relevant to predict road-kill of vertebrates?
Bueno, C; Sousa, C O M; Freitas, S R
2015-11-01
We believe that in the tropics we need a community approach to evaluate road impacts on wildlife and thus to suggest mitigation measures for groups of species instead of a focal-species approach. Understanding which landscape characteristics indicate road-kill events may also provide models that can be applied in other regions. We intend to evaluate whether habitat or matrix is more relevant to predict road-kill events for a group of species. Our hypothesis is: a more permeable matrix is the most relevant factor in explaining road-kill events. To test this hypothesis, we chose vertebrates as the studied assemblage and a highway crossing an Atlantic Forest region in southeastern Brazil as the study site. Logistic regression models were designed using presence/absence of road-kill events as dependent variables and landscape characteristics as independent variables, which were selected by Akaike's Information Criterion. We considered a set of candidate models containing four types of simple regression models: a habitat effect model; matrix type effect models; a highway effect model; and reference models (intercept and buffer distance). Almost three hundred road-kills and 70 species were recorded. River proximity and herbaceous vegetation cover, both matrix effect models, were associated with most road-killed vertebrate groups. Matrix was more relevant than habitat in predicting road-kill of vertebrates. The association between river proximity and road-kill indicates that rivers may be a preferential route for most species. We discuss multi-species mitigation measures and implications for movement ecology and conservation strategies.
Nelson, Jon P
2014-01-01
Precise estimates of price elasticities are important for alcohol tax policy. Using meta-analysis, this paper corrects average beer elasticities for heterogeneity, dependence, and publication selection bias. A sample of 191 estimates is obtained from 114 primary studies. Simple and weighted means are reported. Dependence is addressed by restricting the number of estimates per study, author-restricted samples, and author-specific variables. Publication bias is addressed using the funnel graph, trim-and-fill, and Egger's intercept model. Heterogeneity and selection bias are examined jointly in meta-regressions containing moderator variables for econometric methodology, primary data, and precision of estimates. Results for fixed- and random-effects regressions are reported. Country-specific effects and sample time periods are unimportant, but several methodology variables help explain the dispersion of estimates. In models that correct for selection bias and heterogeneity, the average beer price elasticity is about -0.20, which is about 50% less elastic than values commonly used in alcohol tax policy simulations. Copyright © 2013 Elsevier B.V. All rights reserved.
A sampling study on rock properties affecting drilling rate index (DRI)
NASA Astrophysics Data System (ADS)
Yenice, Hayati; Özdoğan, Mehmet V.; Özfırat, M. Kemal
2018-05-01
The drilling rate index (DRI), developed in Norway, is a very useful index for determining the drillability of rocks and even for performance prediction of hard-rock TBMs, but it requires special laboratory test equipment. Drillability is one of the most important subjects in rock excavation. Determining a drillability index from the physical and mechanical properties of rocks is therefore very important for practicing engineers in fields such as underground excavation, drilling operations in open-pit mining, underground mining, and natural stone production. That is why many researchers have studied drillability, examining the correlations between the drilling rate index (DRI) and penetration rate, the influence of geological properties on drillability prediction in tunneling, and the correlations between rock properties and drillability. In this study, the relationships between the drilling rate index (DRI) and some physico-mechanical properties (density, Shore hardness, uniaxial compressive strength (UCS, σc), and indirect tensile strength (ITS, σt)) of three different rock groups, including magmatic, sedimentary, and metamorphic rocks, were evaluated using both simple and multiple regression analysis. This study reveals the effects of rock properties on DRI according to different types of rocks. In simple regression, quite high correlations were found between DRI and uniaxial compressive strength (UCS) and also between DRI and indirect tensile strength (ITS) values. Multiple regression analyses revealed even higher correlations when compared to simple regression. In particular, UCS, ITS, Shore hardness (SH) and the interactions between them were found to be very effective on DRI values.
Homogenization Issues in the Combustion of Heterogeneous Solid Propellants
NASA Technical Reports Server (NTRS)
Chen, M.; Buckmaster, J.; Jackson, T. L.; Massa, L.
2002-01-01
We examine random packs of discs or spheres, models for ammonium-perchlorate-in-binder propellants, and discuss their average properties. An analytical strategy is described for calculating the mean or effective heat conduction coefficient in terms of the heat conduction coefficients of the individual components, and the results are verified by comparison with those of direct numerical simulations (dns) for both 2-D (disc) and 3-D (sphere) packs across which a temperature difference is applied. Similarly, when the surface regression speed of each component is related to the surface temperature via a simple Arrhenius law, an analytical strategy is developed for calculating an effective Arrhenius law for the combination, and these results are verified using dns in which a uniform heat flux is applied to the pack surface, causing it to regress. These results are needed for homogenization strategies necessary for fully integrated 2-D or 3-D simulations of heterogeneous propellant combustion.
Buri, Luigi; Hassan, Cesare; Bersani, Gianluca; Anti, Marcello; Bianco, Maria Antonietta; Cipolletta, Livio; Di Giulio, Emilio; Di Matteo, Giovanni; Familiari, Luigi; Ficano, Leonardo; Loriga, Pietro; Morini, Sergio; Pietropaolo, Vincenzo; Zambelli, Alessandro; Grossi, Enzo; Intraligi, Marco; Buscema, Massimo
2010-06-01
Selecting patients appropriately for upper endoscopy (EGD) is crucial for efficient use of endoscopy. The objective of this study was to compare different clinical strategies and statistical methods to select patients for EGD, namely appropriateness guidelines, age and/or alarm features, and multivariate and artificial neural network (ANN) models. A nationwide, multicenter, prospective study was undertaken in which consecutive patients referred for EGD during a 1-month period were enrolled. Before EGD, the endoscopist assessed referral appropriateness according to the American Society for Gastrointestinal Endoscopy (ASGE) guidelines, also collecting clinical and demographic variables. Outcomes of the study were detection of relevant findings and new diagnosis of malignancy at EGD. The accuracy of the following clinical strategies and predictive rules was compared: (i) ASGE appropriateness guidelines (indicated vs. not indicated), (ii) simplified rule (≥45 years or alarm features vs. <45 years without alarm features), (iii) logistic regression model, and (iv) ANN models. A total of 8,252 patients were enrolled in 57 centers. Overall, 3,803 (46%) relevant findings and 132 (1.6%) new malignancies were detected. Sensitivity, specificity, and area under the receiver-operating characteristic curve (AUC) of the simplified rule were similar to that of the ASGE guidelines for both relevant findings (82%/26%/0.55 vs. 88%/27%/0.52) and cancer (97%/22%/0.58 vs. 98%/20%/0.58). Both logistic regression and ANN models seemed to be substantially more accurate in predicting new cases of malignancy, with an AUC of 0.82 and 0.87, respectively. A simple predictive rule based on age and alarm features is similarly effective to the more complex ASGE guidelines in selecting patients for EGD. Regression and ANN models may be useful in identifying a relatively small subgroup of patients at higher risk of cancer.
Aspects of porosity prediction using multivariate linear regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Byrnes, A.P.; Wilson, M.D.
1991-03-01
Highly accurate multiple linear regression models have been developed for sandstones of diverse compositions. Porosity reduction or enhancement processes are controlled by the fundamental variables pressure (P), temperature (T), time (t), and composition (X), where composition includes mineralogy, size, sorting, fluid composition, etc. The multiple linear regression equation, of which all linear porosity prediction models are subsets, takes the generalized form: Porosity = C0 + C1(P) + C2(T) + C3(X) + C4(t) + C5(PT) + C6(PX) + C7(Pt) + C8(TX) + C9(Tt) + C10(Xt) + C11(PTX) + C12(PXt) + C13(PTt) + C14(TXt) + C15(PTXt). The first four primary variables are often interactive, thus requiring terms involving two or more primary variables (the form shown implies interaction and not necessarily multiplication). The final terms used may also involve simple mathematical transforms such as log X, e^T, or X^2, or more complex transformations such as the Time-Temperature Index (TTI). The X term in the equation above represents a suite of compositional variables, and a fully expanded equation may therefore include a series of terms incorporating these variables. Numerous published bivariate porosity prediction models involving P (or depth) or Tt (TTI) are effective to a degree, largely because of the high degree of collinearity between P and TTI. However, all such bivariate models ignore the unique contributions of P and Tt, as well as various X terms. These simpler models become poor predictors in regions where collinear relations change, where important variables have been ignored, or where the database does not include a sufficient range or weight distribution for the critical variables.
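A minimal Python sketch of a multiple linear regression with interaction terms of the P, T, X, t form given above; the synthetic data, coefficient values, and single compositional index are illustrative assumptions, not calibrated petrophysical values.

    # Sketch: regression with full P*T*X*t interaction expansion via a formula.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "P": rng.uniform(10, 60, 200),    # pressure proxy (hypothetical units)
        "T": rng.uniform(40, 160, 200),   # temperature, deg C
        "X": rng.uniform(0, 1, 200),      # a single compositional index
        "t": rng.uniform(1, 100, 200),    # time, Ma
    })
    df["porosity"] = (30 - 0.2 * df["P"] - 0.05 * df["T"]
                      + 5 * df["X"] - 0.002 * df["T"] * df["t"]
                      + rng.normal(0, 1, 200))

    # "P * T * X * t" expands to all main effects and interactions,
    # mirroring the 16-term generalized form quoted above.
    model = smf.ols("porosity ~ P * T * X * t", data=df).fit()
    print(model.params.round(3))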
NASA Astrophysics Data System (ADS)
Baumgartner, D. J.; Pötzi, W.; Freislich, H.; Strutzmann, H.; Veronig, A. M.; Foelsche, U.; Rieder, H. E.
2017-06-01
In recent decades, automated sensors for sunshine duration (SD) measurements have been introduced in meteorological networks, thereby replacing traditional instruments, most prominently the Campbell-Stokes (CS) sunshine recorder. Parallel records of automated and traditional SD recording systems are rare. Nevertheless, such records are important to understand the differences/similarities in SD totals obtained with different instruments and how changes in monitoring device type affect the homogeneity of SD records. This study investigates the differences/similarities in parallel SD records obtained with a CS and two automated SD sensors between 2007 and 2016 at the Kanzelhöhe Observatory, Austria. Comparing individual records of daily SD totals, we find differences of both positive and negative sign, with the smallest differences between the automated sensors. The larger differences between CS-derived SD totals and those from automated sensors can be attributed (largely) to the higher sensitivity threshold of the CS instrument. Correspondingly, the closest agreement among all sensors is found during summer, the time of year when sensitivity thresholds are least critical. Furthermore, we investigate the performance of various models to create so-called sensor-type-equivalent (STE) SD records. Our analysis shows that regression models including all available data on a daily (or monthly) time scale perform better than simple three- (or four-) point regression models. Despite generally good performance, none of the considered regression models (of linear or quadratic form) emerges as the "optimal" model. Although STEs prove useful for relating SD records of individual sensors on daily/monthly time scales, this does not ensure that STE (or joint) records can be used for trend analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, Shih-Miao; Hwang, Ho-Ling
2007-01-01
This paper describes the development of national freight demand models for 27 industry sectors covered by the 2002 Commodity Flow Survey. It postulates that national freight demands are consistent with U.S. business patterns. Furthermore, the study hypothesizes that the flow of goods, which make up the national production processes of industries, is coherent with the information described in the 2002 Annual Input-Output Accounts developed by the Bureau of Economic Analysis. The model estimation framework hinges largely on the assumption that a relatively simple relationship exists between freight production/consumption and business patterns for each industry defined by the three-digit North American Industry Classification System (NAICS) industry codes. The national freight demand model for each selected industry sector consists of two models: a freight generation model and a freight attraction model. Thus, a total of 54 simple regression models were estimated under this study. Preliminary results indicated promising freight generation and freight attraction models. Among all models, only four had an R2 value lower than 0.70. With additional modeling effort, these freight demand models could be enhanced to allow transportation analysts to assess regional economic impacts associated with the temporary loss of transportation services on U.S. transportation network infrastructures. Using such freight demand models and available U.S. business forecasts, future national freight demands could be forecasted within certain degrees of accuracy. These freight demand models could also enable transportation analysts to further disaggregate the CFS state-level origin-destination tables to county or zip code level.
On-line Model Structure Selection for Estimation of Plasma Boundary in a Tokamak
NASA Astrophysics Data System (ADS)
Škvára, Vít; Šmídl, Václav; Urban, Jakub
2015-11-01
Control of the plasma field in the tokamak requires reliable estimation of the plasma boundary. The plasma boundary is given by a complex mathematical model and the only available measurements are responses of induction coils around the plasma. For the purpose of boundary estimation the model can be reduced to simple linear regression with potentially infinitely many elements. The number of elements must be selected manually and this choice significantly influences the resulting shape. In this paper, we investigate the use of formal model structure estimation techniques for the problem. Specifically, we formulate a sparse least squares estimator using the automatic relevance principle. The resulting algorithm is a repetitive evaluation of the least squares problem which could be computed in real time. Performance of the resulting algorithm is illustrated on simulated data and evaluated with respect to a more detailed and computationally costly model FREEBIE.
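The sparse least-squares step described above could be sketched, under assumptions, with scikit-learn's automatic relevance determination (ARD) regression, which drives irrelevant basis coefficients toward zero instead of fixing the number of elements by hand; the polynomial basis below is a stand-in for the induction-coil response model, not the paper's formulation.

    # Sketch: ARD regression selecting model structure from an over-complete basis.
    import numpy as np
    from sklearn.linear_model import ARDRegression

    rng = np.random.default_rng(2)
    t = rng.uniform(0, 1, 200)
    Phi = np.vander(t, 15, increasing=True)   # "potentially many" basis elements
    y = 2.0 * t - 3.0 * t**3 + rng.normal(0, 0.05, 200)  # only 2 active terms

    ard = ARDRegression().fit(Phi, y)
    # ARD shrinks irrelevant coefficients toward zero, so the structure
    # is selected automatically rather than chosen manually.
    print(np.round(ard.coef_, 2))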
Kashuba, Roxolana; Cha, YoonKyung; Alameddine, Ibrahim; Lee, Boknam; Cuffney, Thomas F.
2010-01-01
Multilevel hierarchical modeling methodology has been developed for use in ecological data analysis. The effect of urbanization on stream macroinvertebrate communities was measured across a gradient of basins in each of nine metropolitan regions across the conterminous United States. The hierarchical nature of this dataset was harnessed in a multi-tiered model structure, predicting both invertebrate response at the basin scale and differences in invertebrate response at the region scale. Ordination site scores, total taxa richness, Ephemeroptera, Plecoptera, Trichoptera (EPT) taxa richness, and richness-weighted mean tolerance of organisms at a site were used to describe invertebrate responses. Percentage of urban land cover was used as a basin-level predictor variable. Regional mean precipitation, air temperature, and antecedent agriculture were used as region-level predictor variables. Multilevel hierarchical models were fit to both levels of data simultaneously, borrowing statistical strength from the complete dataset to reduce uncertainty in regional coefficient estimates. Additionally, whereas non-hierarchical regressions were only able to show differing relations between invertebrate responses and urban intensity separately for each region, the multilevel hierarchical regressions were able to explain and quantify those differences within a single model. In this way, this modeling approach directly establishes the importance of antecedent agricultural conditions in masking the response of invertebrates to urbanization in metropolitan regions such as Milwaukee-Green Bay, Wisconsin; Denver, Colorado; and Dallas-Fort Worth, Texas. Also, these models show that regions with high precipitation, such as Atlanta, Georgia; Birmingham, Alabama; and Portland, Oregon, start out with better regional background conditions of invertebrates prior to urbanization but experience faster negative rates of change with urbanization. Ultimately, this urbanization-invertebrate response example is used to detail the multilevel hierarchical construction methodology, showing how the result is a set of models that are both statistically more rigorous and ecologically more interpretable than simple linear regression models.
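A hedged sketch of the two-level idea in Python using statsmodels' linear mixed models: basins nested in regions, with a region-varying urbanization slope. The column names and simulated values are hypothetical stand-ins for the study's variables, not the actual dataset.

    # Sketch: basins within regions, random intercept and urban slope by region.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    regions = np.repeat(np.arange(9), 30)          # 9 regions, 30 basins each
    urban = rng.uniform(0, 100, regions.size)      # % urban land cover
    slope = -0.05 - 0.02 * rng.normal(size=9)      # region-varying true slopes
    richness = 40 + slope[regions] * urban + rng.normal(0, 2, regions.size)

    df = pd.DataFrame({"region": regions, "urban": urban, "richness": richness})
    # Partial pooling lets each region "borrow strength" from the full dataset,
    # which is the uncertainty-reduction point made in the abstract.
    fit = smf.mixedlm("richness ~ urban", df, groups=df["region"],
                      re_formula="~urban").fit()
    print(fit.summary())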
Comparison of CEAS and Williams-type models for spring wheat yields in North Dakota and Minnesota
NASA Technical Reports Server (NTRS)
Barnett, T. L. (Principal Investigator)
1982-01-01
The CEAS and Williams-type yield models are both based on multiple regression analysis of historical time series data at CRD level. The CEAS model develops a separate relation for each CRD; the Williams-type model pools CRD data to regional level (groups of similar CRDs). Basic variables considered in the analyses are USDA yield, monthly mean temperature, monthly precipitation, and variables derived from these. The Williams-type model also used soil texture and topographic information. Technological trend is represented in both by piecewise linear functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test of each model (1970-1979) demonstrate that the models are very similar in performance in all respects. Both models are about equally objective, adequate, timely, simple, and inexpensive. Both consider scientific knowledge on a broad scale but not in detail. Neither provides a good current measure of modeled yield reliability. The CEAS model is considered very slightly preferable for AgRISTARS applications.
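The piecewise linear technology trend used by both models could look, in outline, like the following Python sketch, which adds a hinge term at an assumed breakpoint year; the breakpoint, data, and coefficients are illustrative, not the CEAS or Williams-type calibrations.

    # Sketch: piecewise linear trend fit with a hinge regressor.
    import numpy as np

    years = np.arange(1950, 1980)
    yield_bu = 20 + 0.5 * (years - 1950) + 0.3 * np.maximum(years - 1965, 0)
    yield_bu = yield_bu + np.random.default_rng(4).normal(0, 1, years.size)

    break_year = 1965                           # assumed technology breakpoint
    X = np.column_stack([
        np.ones_like(years),
        years - years[0],                       # base linear trend
        np.maximum(years - break_year, 0),      # slope change after breakpoint
    ])
    coef, *_ = np.linalg.lstsq(X, yield_bu, rcond=None)
    print("base slope:", coef[1].round(3), "post-1965 extra slope:", coef[2].round(3))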
Chicken barn climate and hazardous volatile compounds control using simple linear regression and PID
NASA Astrophysics Data System (ADS)
Abdullah, A. H.; Bakar, M. A. A.; Shukor, S. A. A.; Saad, F. S. A.; Kamis, M. S.; Mustafa, M. H.; Khalid, N. S.
2016-07-01
The hazardous volatile compounds from chicken manure in a chicken barn are potentially a health threat to the farm animals and workers. The ammonia (NH3) and hydrogen sulphide (H2S) produced in a chicken barn are influenced by climate changes. An electronic nose (e-nose) is used to sample the barn's air, temperature, and humidity data. Simple linear regression is used to identify the correlations between temperature and humidity, humidity and ammonia, and ammonia and hydrogen sulphide. MATLAB Simulink software was used for the sample data analysis with a PID controller. Results show that a PID controller tuned using the Ziegler-Nichols technique can improve climate control in the chicken barn.
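A minimal sketch of the correlation step in Python (rather than the authors' MATLAB workflow), assuming simulated e-nose readings; the fitted humidity-ammonia slope might then inform a setpoint for the PID ventilation loop.

    # Sketch: simple linear regression of ammonia on humidity (simulated data).
    import numpy as np
    from scipy.stats import linregress

    rng = np.random.default_rng(5)
    humidity = rng.uniform(60, 90, 100)                     # % RH from the e-nose
    nh3 = 2.0 + 0.15 * humidity + rng.normal(0, 0.5, 100)   # ppm NH3 (hypothetical)

    fit = linregress(humidity, nh3)
    print(f"slope={fit.slope:.3f}, r={fit.rvalue:.2f}, p={fit.pvalue:.3g}")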
NASA Astrophysics Data System (ADS)
Fouad, Geoffrey; Skupin, André; Hope, Allen
2016-04-01
The flow duration curve (FDC) is one of the most widely used tools to quantify streamflow. Its percentile flows are often required for water resource applications, but these values must be predicted for ungauged basins with insufficient or no streamflow data. Regional regression is a commonly used approach for predicting percentile flows that involves identifying hydrologic regions and calibrating regression models to each region. The independent variables used to describe the physiographic and climatic setting of the basins are a critical component of regional regression, yet few studies have investigated their effect on resulting predictions. In this study, the complexity of the independent variables needed for regional regression is investigated. Different levels of variable complexity are applied for a regional regression consisting of 918 basins in the US. Both the hydrologic regions and regression models are determined according to the different sets of variables, and the accuracy of resulting predictions is assessed. The different sets of variables include (1) a simple set of three variables strongly tied to the FDC (mean annual precipitation, potential evapotranspiration, and baseflow index), (2) a traditional set of variables describing the average physiographic and climatic conditions of the basins, and (3) a more complex set of variables extending the traditional variables to include statistics describing the distribution of physiographic data and temporal components of climatic data. The latter set of variables is not typically used in regional regression, and is evaluated for its potential to predict percentile flows. The simplest set of only three variables performed similarly to the other more complex sets of variables. Traditional variables used to describe climate, topography, and soil offered little more to the predictions, and the experimental set of variables describing the distribution of basin data in more detail did not improve predictions. These results are largely reflective of cross-correlation existing in hydrologic datasets, and highlight the limited predictive power of many traditionally used variables for regional regression. A parsimonious approach including fewer variables chosen based on their connection to streamflow may be more efficient than a data mining approach including many different variables. Future regional regression studies may benefit from having a hydrologic rationale for including different variables and attempting to create new variables related to streamflow.
A Simple Microsoft Excel Method to Predict Antibiotic Outbreaks and Underutilization.
Miglis, Cristina; Rhodes, Nathaniel J; Avedissian, Sean N; Zembower, Teresa R; Postelnick, Michael; Wunderink, Richard G; Sutton, Sarah H; Scheetz, Marc H
2017-07-01
Benchmarking strategies are needed to promote the appropriate use of antibiotics. We have adapted a simple regressive method in Microsoft Excel that is easily implementable and creates predictive indices. This method trends consumption over time and can identify periods of over- and underuse at the hospital level. Infect Control Hosp Epidemiol 2017;38:860-862.
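The regressive method itself is not published in code form here, but a sketch of the general idea, assuming monthly consumption data and a two-standard-deviation band, might look like this in Python, mirroring what Excel's SLOPE and INTERCEPT functions compute.

    # Sketch: linear consumption trend plus a band to flag over-/underuse months.
    import numpy as np

    rng = np.random.default_rng(6)
    months = np.arange(24)
    ddd = 100 + 1.5 * months + rng.normal(0, 5, 24)   # defined daily doses (simulated)
    ddd[18] += 25                                     # an injected "outbreak" month

    slope, intercept = np.polyfit(months, ddd, 1)     # Excel SLOPE/INTERCEPT analogue
    predicted = intercept + slope * months
    resid_sd = np.std(ddd - predicted, ddof=2)

    over = months[ddd > predicted + 2 * resid_sd]     # possible overuse/outbreak
    under = months[ddd < predicted - 2 * resid_sd]    # possible underutilization
    print("overuse months:", over, "underuse months:", under)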
Advanced Statistics for Exotic Animal Practitioners.
Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G
2017-09-01
Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratios in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After a definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
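A short illustrative sketch in Python: fitting a logistic regression with two explanatory variables and exponentiating the coefficients to read them as adjusted odds ratios. The data are simulated and the covariates hypothetical.

    # Sketch: logistic regression and odds ratios via exp(coefficients).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 1000
    age = rng.normal(50, 10, n)
    smoker = rng.integers(0, 2, n)
    logit = -5 + 0.06 * age + 0.9 * smoker
    disease = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X = sm.add_constant(np.column_stack([age, smoker]))
    fit = sm.Logit(disease, X).fit(disp=False)
    # exp(beta) is the odds ratio per unit change, adjusted for the other
    # variable (the "avoid confounding" point made in the abstract).
    print(np.exp(fit.params).round(2))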
Di Legge, A; Testa, A C; Ameye, L; Van Calster, B; Lissoni, A A; Leone, F P G; Savelli, L; Franchi, D; Czekierdowski, A; Trio, D; Van Holsbeke, C; Ferrazzi, E; Scambia, G; Timmerman, D; Valentin, L
2012-09-01
To estimate the ability to discriminate between benign and malignant adnexal masses of different size using: subjective assessment, two International Ovarian Tumor Analysis (IOTA) logistic regression models (LR1 and LR2), the IOTA simple rules and the risk of malignancy index (RMI). We used a multicenter IOTA database of 2445 patients with at least one adnexal mass, i.e. the database previously used to prospectively validate the diagnostic performance of LR1 and LR2. The masses were categorized into three subgroups according to their largest diameter: small tumors (diameter < 4 cm; n = 396), medium-sized tumors (diameter, 4-9.9 cm; n = 1457) and large tumors (diameter ≥ 10 cm; n = 592). Subjective assessment, LR1 and LR2, IOTA simple rules and the RMI were applied to each of the three groups. Sensitivity, specificity, positive and negative likelihood ratio (LR+, LR-), diagnostic odds ratio (DOR) and area under the receiver-operating characteristics curve (AUC) were used to describe diagnostic performance. A moving window technique was applied to estimate the effect of tumor size as a continuous variable on the AUC. The reference standard was the histological diagnosis of the surgically removed adnexal mass. The frequency of invasive malignancy was 10% in small tumors, 19% in medium-sized tumors and 40% in large tumors; 11% of the large tumors were borderline tumors vs 3% and 4%, respectively, of the small and medium-sized tumors. The type of benign histology also differed among the three subgroups. For all methods, sensitivity with regard to malignancy was lowest in small tumors (56-84% vs 67-93% in medium-sized tumors and 74-95% in large tumors) while specificity was lowest in large tumors (60-87% vs 83-95% in medium-sized tumors and 83-96% in small tumors). The DOR and the AUC value were highest in medium-sized tumors and the AUC was largest in tumors with a largest diameter of 7-11 cm. Tumor size affects the performance of subjective assessment, LR1 and LR2, the IOTA simple rules and the RMI in discriminating correctly between benign and malignant adnexal masses. The likely explanation, at least in part, is the difference in histology among tumors of different size. Copyright © 2012 ISUOG. Published by John Wiley & Sons, Ltd.
Empirical Modeling of Plant Gas Fluxes in Controlled Environments
NASA Technical Reports Server (NTRS)
Cornett, Jessie David
1994-01-01
As humans extend their reach beyond the earth, bioregenerative life support systems must replace the resupply and physical/chemical systems now used. The Controlled Ecological Life Support System (CELSS) will utilize plants to recycle the carbon dioxide (CO2) and excrement produced by humans and return oxygen (O2), purified water and food. CELSS design requires knowledge of gas flux levels for net photosynthesis (PS_n), dark respiration (R_d) and evapotranspiration (ET). Full season gas flux data regarding these processes for wheat (Triticum aestivum), soybean (Glycine max) and rice (Oryza sativa) from published sources were used to develop empirical models. Univariate models relating crop age (days after planting) and gas flux were fit by simple regression. Models are either high order (5th to 8th) or more complex polynomials whose curves describe crop development characteristics. The models provide good estimates of gas flux maxima, but are of limited utility. To broaden the applicability, data were transformed to dimensionless or correlation formats and, again, fit by regression. Polynomials, similar to those in the initial effort, were selected as the most appropriate models. These models indicate that, within a cultivar, gas flux patterns appear remarkably similar prior to maximum flux, but exhibit considerable variation beyond this point. This suggests that more broadly applicable models of plant gas flux are feasible, but univariate models defining gas flux as a function of crop age are too simplistic. Multivariate models using CO2 and crop age were fit for PS_n and R_d by multiple regression. In each case, the selected model is a subset of a full third order model with all possible interactions. These models are improvements over the univariate models because they incorporate more than the single factor, crop age, as the primary variable governing gas flux. They are still limited, however, by their reliance on the other environmental conditions under which the original data were collected. Three-dimensional plots representing the response surface of each model are included. Suitability of using empirical models to generate engineering design estimates is discussed. Recommendations for the use of more complex multivariate models to increase versatility are included.
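A univariate fit of the kind described, sketched in Python with np.polyfit on synthetic data; the polynomial degree and the idealized flux curve are assumptions for illustration, not the CELSS datasets.

    # Sketch: high-order polynomial model of gas flux versus crop age.
    import numpy as np

    days = np.arange(0, 90)
    psn = 30 * np.exp(-((days - 45) / 20.0) ** 2)        # idealized PS_n curve
    psn += np.random.default_rng(10).normal(0, 0.5, days.size)

    coeffs = np.polyfit(days, psn, deg=5)                # 5th-order polynomial
    fitted = np.polyval(coeffs, days)
    print("estimated peak day ~", days[np.argmax(fitted)])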
Terahertz spectral detection of potassium sorbate in milk powder
NASA Astrophysics Data System (ADS)
Li, Pengpeng; Zhang, Yuan; Ge, Hongyi
2017-02-01
The spectral characteristics of potassium sorbate in milk powder in the range of 0.2-2.0 THz have been measured with THz time-domain spectroscopy (THz-TDS). Its absorption and refraction spectra were obtained at room temperature in a nitrogen atmosphere. The results show that potassium sorbate has an obvious characteristic absorption peak at 0.98 THz. A simple linear regression (SLR) model was used to analyze the content of potassium sorbate in milk powder. The results show that the absorption coefficient increases as the potassium sorbate content of the mixture increases. This research is important for food quality and safety testing.
Alishiri, Gholam Hossein; Bayat, Noushin; Fathi Ashtiani, Ali; Tavallaii, Seyed Abbas; Assari, Shervin; Moharamzad, Yashar
2008-01-01
The aim of this work was to develop two logistic regression models capable of predicting physical and mental health related quality of life (HRQOL) among rheumatoid arthritis (RA) patients. In this cross-sectional study which was conducted during 2006 in the outpatient rheumatology clinic of our university hospital, Short Form 36 (SF-36) was used for HRQOL measurements in 411 RA patients. A cutoff point to define poor versus good HRQOL was calculated using the first quartiles of SF-36 physical and mental component scores (33.4 and 36.8, respectively). Two distinct logistic regression models were used to derive predictive variables including demographic, clinical, and psychological factors. The sensitivity, specificity, and accuracy of each model were calculated. Poor physical HRQOL was positively associated with pain score, disease duration, monthly family income below 300 US$, comorbidity, patient global assessment of disease activity or PGA, and depression (odds ratios: 1.1; 1.004; 15.5; 1.1; 1.02; 2.08, respectively). The variables that entered into the poor mental HRQOL prediction model were monthly family income below 300 US$, comorbidity, PGA, and bodily pain (odds ratios: 6.7; 1.1; 1.01; 1.01, respectively). Optimal sensitivity and specificity were achieved at a cutoff point of 0.39 for the estimated probability of poor physical HRQOL and 0.18 for mental HRQOL. Sensitivity, specificity, and accuracy of the physical and mental models were 73.8, 87, 83.7% and 90.38, 70.36, 75.43%, respectively. The results show that the suggested models can be used to predict poor physical and mental HRQOL separately among RA patients using simple variables with acceptable accuracy. These models can be of use in the clinical decision-making of RA patients and to recognize patients with poor physical or mental HRQOL in advance, for better management.
Hearing loss screening tool (COBRA score) for newborns in primary care setting
Poonual, Watcharapol; Navacharoen, Niramon; Kangsanarak, Jaran; Namwongprom, Sirianong
2017-01-01
Purpose: To develop and evaluate a simple screening tool to assess hearing loss in newborns. A derived score was compared with the standard clinical practice tool. Methods: This cohort study was designed to screen the hearing of newborns using transiently evoked otoacoustic emission and auditory brain stem response, and to determine the risk factors associated with hearing loss of newborns in 3 tertiary hospitals in Northern Thailand. Data were prospectively collected from November 1, 2010 to May 31, 2012. To develop the risk score, clinical-risk indicators were measured by Poisson risk regression. The regression coefficients were transformed into item scores by dividing each regression coefficient by the smallest coefficient in the model, rounding the number to its nearest integer, and adding up to a total score. Results: Five clinical risk factors (Craniofacial anomaly, Ototoxicity, Birth weight, family history [Relative] of congenital sensorineural hearing loss, and Apgar score) were included in our COBRA score. The screening tool detected, by area under the receiver operating characteristic curve, more than 80% of existing hearing loss. The positive likelihood ratios of hearing loss in patients with scores of 4, 6, and 8 were 25.21 (95% confidence interval [CI], 14.69–43.26), 58.52 (95% CI, 36.26–94.44), and 51.56 (95% CI, 33.74–78.82), respectively. This result was similar to the standard tool (The Joint Committee on Infant Hearing) at 26.72 (95% CI, 20.59–34.66). Conclusion: A simple screening tool of five predictors provides good prediction indices for newborn hearing loss, which may motivate parents to bring children for further appropriate testing and investigations. PMID:29234358
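The coefficient-to-score rule described in the Methods can be sketched in a few lines of Python; the coefficient values and risk-factor names below are hypothetical, not the fitted COBRA coefficients.

    # Sketch: turning regression coefficients into integer item scores by
    # dividing each coefficient by the smallest one and rounding.
    coefs = {                         # illustrative Poisson-regression coefficients
        "craniofacial_anomaly": 1.10,
        "ototoxicity": 0.85,
        "low_birth_weight": 0.40,
        "family_history": 0.95,
        "low_apgar": 0.42,
    }
    smallest = min(coefs.values())
    item_scores = {k: int(round(v / smallest)) for k, v in coefs.items()}
    print(item_scores)

    # Total score for a newborn is the sum of the items present:
    newborn = ["ototoxicity", "low_apgar"]
    print("COBRA-style total:", sum(item_scores[k] for k in newborn))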
Mandel, Micha; Gauthier, Susan A; Guttmann, Charles R G; Weiner, Howard L; Betensky, Rebecca A
2007-12-01
The expanded disability status scale (EDSS) is an ordinal score that measures progression in multiple sclerosis (MS). Progression is defined as reaching an EDSS of a certain level (absolute progression) or an increase of one EDSS point (relative progression). Survival methods for time to progression are not adequate for such data since they do not exploit the EDSS level at the end of follow-up. Instead, we suggest a Markov transitional model applicable to repeated categorical or ordinal data. This approach enables derivation of covariate-specific survival curves, obtained after estimation of the regression coefficients and manipulation of the resulting transition matrix. Large sample theory and resampling methods are employed to derive pointwise confidence intervals, which perform well in simulation. Methods for generating survival curves for time to an EDSS of a certain level, time to an increase of EDSS of at least one point, and time to two consecutive visits with EDSS greater than three are described explicitly. The regression models described are easily implemented using standard software packages. Survival curves are obtained from the regression results using packages that support simple matrix calculation. We present and demonstrate our method on data collected at the Partners MS center in Boston, MA. We apply our approach to progression defined by time to two consecutive visits with EDSS greater than three, and calculate crude (without covariates) and covariate-specific curves.
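A minimal sketch of the matrix manipulation described: given a fitted (here, invented) three-state transition matrix with the progression state made absorbing, repeated multiplication yields a survival-type curve for time to progression.

    # Sketch: survival curve from a Markov transition matrix (illustrative values).
    import numpy as np

    # States: 0 = low EDSS, 1 = medium, 2 = progressed (made absorbing here).
    P = np.array([[0.90, 0.08, 0.02],
                  [0.05, 0.85, 0.10],
                  [0.00, 0.00, 1.00]])

    state = np.array([1.0, 0.0, 0.0])     # start in the low-EDSS state
    survival = []
    for visit in range(20):
        state = state @ P                 # propagate one visit forward
        survival.append(1.0 - state[2])   # P(progression not yet reached)
    print(np.round(survival, 3))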
NASA Astrophysics Data System (ADS)
ul-Haq, Zia; Rana, Asim Daud; Tariq, Salman; Mahmood, Khalid; Ali, Muhammad; Bashir, Iqra
2018-03-01
We have applied regression analyses for the modeling of tropospheric NO2 (tropo-NO2) as a function of anthropogenic nitrogen oxides (NOx) emissions, aerosol optical depth (AOD), and some important meteorological parameters such as temperature (Temp), precipitation (Preci), relative humidity (RH), wind speed (WS), cloud fraction (CLF) and outgoing long-wave radiation (OLR) over different climatic zones and land use/land cover types in South Asia during October 2004-December 2015. Simple linear regression shows that, over South Asia, tropo-NO2 variability is significantly linked to AOD, WS, NOx, Preci and CLF. Also, zone-5, consisting of tropical monsoon areas of eastern India and Myanmar, is the only study zone over which all the selected parameters show their influence on tropo-NO2 at statistically significant levels. In stepwise multiple linear modeling, the tropo-NO2 column over the landmass of South Asia is significantly predicted by the combination of RH (standardized regression coefficient, β = -0.49), AOD (β = 0.42) and NOx (β = 0.25). The leading predictors of tropo-NO2 columns over zones 1-5 are OLR, AOD, Temp, OLR, and RH, respectively. Overall, as revealed by the higher correlation coefficients (r), the multiple regressions provide reasonable models for tropo-NO2 over South Asia (r = 0.82), zone-4 (r = 0.90) and zone-5 (r = 0.93). The lowest r (of 0.66) has been found for the hot semi-arid region in the northwestern Indus-Ganges Basin (zone-2). The highest value of β for urban-area AOD (of 0.42) is observed for megacity Lahore, located in warm semi-arid zone-2 with large-scale crop-residue burning, indicating a strong influence of aerosols on the modeled tropo-NO2 column. A statistically significant correlation (r = 0.22) at the 0.05 level is found between tropo-NO2 and AOD over Lahore. Also, NOx emissions appear as the highest contributor (β = 0.59) for the modeled tropo-NO2 column over megacity Dhaka.
A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction
NASA Astrophysics Data System (ADS)
Danandeh Mehr, Ali; Kahya, Ercan
2017-06-01
Genetic programming (GP) is able to systematically explore alternative model structures of different accuracy and complexity from observed input and output data. The effectiveness of GP in hydrological system identification has been recognized in recent studies. However, selecting a parsimonious (accurate and simple) model from such alternatives still remains a question. This paper proposes a Pareto-optimal moving average multigene genetic programming (MA-MGGP) approach to develop a parsimonious model for single-station streamflow prediction. The three main components of the approach that take us from observed data to a validated model are: (1) data pre-processing, (2) system identification and (3) system simplification. The data pre-processing ingredient uses a simple moving average filter to diminish the lagged prediction effect of stand-alone data-driven models. The multigene ingredient of the model tends to identify the underlying nonlinear system with expressions simpler than classical monolithic GP and, eventually, the simplification component exploits a Pareto front plot to select a parsimonious model through an interactive complexity-efficiency trade-off. The approach was tested using the daily streamflow records from a station on Senoz Stream, Turkey. Compared to the efficiency results of stand-alone GP, MGGP, and conventional multiple linear regression prediction models as benchmarks, the proposed Pareto-optimal MA-MGGP model put forward a parsimonious solution of noteworthy practical importance.
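The moving-average pre-processing ingredient might be sketched as below, assuming a three-day window; the window width and simulated series are illustrative, not the paper's settings.

    # Sketch: simple moving-average smoothing of a daily streamflow series.
    import numpy as np

    rng = np.random.default_rng(8)
    flow = 50 + 10 * np.sin(np.arange(365) / 30) + rng.normal(0, 3, 365)

    window = 3                                # assumed MA window width
    kernel = np.ones(window) / window
    smoothed = np.convolve(flow, kernel, mode="valid")   # MA(3) filter
    print(flow[:5].round(1), "->", smoothed[:5].round(1))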
Application of stepwise multiple regression techniques to inversion of Nimbus 'IRIS' observations.
NASA Technical Reports Server (NTRS)
Ohring, G.
1972-01-01
Exploratory studies with Nimbus-3 infrared interferometer-spectrometer (IRIS) data indicate that, in addition to temperature, such meteorological parameters as geopotential heights of pressure surfaces, tropopause pressure, and tropopause temperature can be inferred from the observed spectra with the use of simple regression equations. The technique of screening the IRIS spectral data by means of stepwise regression to obtain the best radiation predictors of meteorological parameters is validated. The simplicity of application of the technique and the simplicity of the derived linear regression equations - which contain only a few terms - suggest usefulness for this approach. Based upon the results obtained, suggestions are made for further development and exploitation of the stepwise regression analysis technique.
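A simplified forward-stepwise screening sketch in Python, selecting predictors by incremental R^2 on simulated "radiance" channels; a production implementation would use F-to-enter/remove criteria as in classical stepwise regression, so this is an outline of the idea rather than the paper's procedure.

    # Sketch: forward stepwise selection of spectral predictors by R^2 gain.
    import numpy as np

    rng = np.random.default_rng(9)
    X = rng.normal(size=(150, 10))            # 10 candidate radiance channels
    y = 2 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(0, 0.5, 150)

    def r2(Xs, y):
        Xd = np.column_stack([np.ones(len(y)), Xs])
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta
        return 1 - resid.var() / y.var()

    selected = []
    for _ in range(3):                        # keep the 3 best predictors
        gains = [(r2(X[:, selected + [j]], y), j)
                 for j in range(X.shape[1]) if j not in selected]
        best_r2, best_j = max(gains)
        selected.append(best_j)
        print(f"added channel {best_j}, cumulative R^2 = {best_r2:.3f}")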
On identifying relationships between the flood scaling exponent and basin attributes.
Medhi, Hemanta; Tripathi, Shivam
2015-07-01
Floods are known to exhibit self-similarity and follow scaling laws that form the basis of regional flood frequency analysis. However, the relationship between basin attributes and the scaling behavior of floods is still not fully understood. Identifying these relationships is essential for drawing connections between hydrological processes in a basin and the flood response of the basin. The existing studies mostly rely on simulation models to draw these connections. This paper proposes a new methodology that draws connections between basin attributes and the flood scaling exponents by using observed data. In the proposed methodology, region-of-influence approach is used to delineate homogeneous regions for each gaging station. Ordinary least squares regression is then applied to estimate flood scaling exponents for each homogeneous region, and finally stepwise regression is used to identify basin attributes that affect flood scaling exponents. The effectiveness of the proposed methodology is tested by applying it to data from river basins in the United States. The results suggest that flood scaling exponent is small for regions having (i) large abstractions from precipitation in the form of large soil moisture storages and high evapotranspiration losses, and (ii) large fractions of overland flow compared to base flow, i.e., regions having fast-responding basins. Analysis of simple scaling and multiscaling of floods showed evidence of simple scaling for regions in which the snowfall dominates the total precipitation.
Fukushima, Makoto; Saunders, Richard C; Fujii, Naotaka; Averbeck, Bruno B; Mishkin, Mortimer
2014-01-01
Vocal production is an example of controlled motor behavior with high temporal precision. Previous studies have decoded auditory evoked cortical activity while monkeys listened to vocalization sounds. On the other hand, there have been few attempts at decoding motor cortical activity during vocal production. Here we recorded cortical activity during vocal production in the macaque with a chronically implanted electrocorticographic (ECoG) electrode array. The array detected robust activity in motor cortex during vocal production. We used a nonlinear dynamical model of the vocal organ to reduce the dimensionality of 'Coo' calls produced by the monkey. We then used linear regression to evaluate the information in motor cortical activity for this reduced representation of calls. This simple linear model accounted for circa 65% of the variance in the reduced sound representations, supporting the feasibility of using the dynamical model of the vocal organ for decoding motor cortical activity during vocal production.
NASA Astrophysics Data System (ADS)
Kumavat, Hemraj Ramdas
2016-09-01
The compressive stress-strain behavior and mechanical properties of clay brick masonry and its constituents, clay bricks and mortar, have been studied in several laboratory tests. Using linear regression analysis, an analytical model has been proposed for obtaining the stress-strain curves for masonry that can be used in analysis and design procedures. The model requires only the compressive strengths of bricks and mortar as input data, which can easily be obtained experimentally. The analytical model was developed from the experimental results for Young's modulus and compressive strength. Simple relationships have been identified for obtaining the modulus of elasticity of bricks, mortar, and masonry from their corresponding compressive strengths. It was observed that the proposed analytical model demonstrates a reasonably good prediction of the stress-strain curves when compared with the experimental curves.
A Driving Behaviour Model of Electrical Wheelchair Users
Hamam, Y.; Djouani, K.; Daachi, B.; Steyn, N.
2016-01-01
In spite of the presence of powered wheelchairs, some users still experience steering challenges and manoeuvring difficulties that limit their capacity to navigate effectively. For such users, steering support and assistive systems may be very necessary. For the assistance to be appreciated, the assistive control needs to adapt to the user's steering behaviour. This paper contributes to wheelchair steering improvement by modelling the steering behaviour of powered wheelchair users, for integration into the control system. More precisely, the modelling is based on the improved Directed Potential Field (DPF) method for trajectory planning. The method has facilitated the formulation of a simple behaviour model that is also linear in parameters. To obtain the steering data for parameter identification, seven individuals participated in driving the wheelchair in different virtual worlds on the augmented platform. The obtained data facilitated the estimation of user parameters, using the ordinary least squares method, with satisfactory regression analysis results. PMID:27148362
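Since the behaviour model is linear in parameters and identified by ordinary least squares, the estimation step reduces to a standard least-squares solve; the regressor features in the following sketch are assumed placeholders, not the DPF formulation.

    # Sketch: OLS identification of a model that is linear in its parameters.
    import numpy as np

    rng = np.random.default_rng(11)
    n = 300
    features = np.column_stack([
        rng.normal(size=n),        # e.g., heading error to the planned path
        rng.normal(size=n),        # e.g., distance to the nearest obstacle
        np.ones(n),                # bias term
    ])
    true_theta = np.array([1.8, -0.6, 0.2])          # hypothetical user parameters
    steering = features @ true_theta + rng.normal(0, 0.1, n)

    theta_hat, *_ = np.linalg.lstsq(features, steering, rcond=None)
    print(theta_hat.round(2))    # recovers the user-specific parameters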
The Fringe-Imaging Skin Friction Technique PC Application User's Manual
NASA Technical Reports Server (NTRS)
Zilliac, Gregory G.
1999-01-01
A personal computer application (CXWIN4G) has been written which greatly simplifies the task of extracting skin friction measurements from interferograms of oil flows on the surface of wind tunnel models. Images are first calibrated, using a novel approach to one-camera photogrammetry, to obtain accurate spatial information on surfaces with curvature. As part of the image calibration process, an auxiliary file containing the wind tunnel model geometry is used in conjunction with a two-dimensional direct linear transformation to relate the image plane to the physical (model) coordinates. The application then applies a nonlinear regression model to accurately determine the fringe spacing from interferometric intensity records as required by the Fringe Imaging Skin Friction (FISF) technique. The skin friction is found through application of a simple expression that makes use of lubrication theory to relate fringe spacing to skin friction.
Negotiating Multicollinearity with Spike-and-Slab Priors.
Ročková, Veronika; George, Edward I
2014-08-01
In multiple regression under the normal linear model, the presence of multicollinearity is well known to lead to unreliable and unstable maximum likelihood estimates. This can be particularly troublesome for the problem of variable selection where it becomes more difficult to distinguish between subset models. Here we show how adding a spike-and-slab prior mitigates this difficulty by filtering the likelihood surface into a posterior distribution that allocates the relevant likelihood information to each of the subset model modes. For identification of promising high posterior models in this setting, we consider three EM algorithms, the fast closed form EMVS version of Rockova and George (2014) and two new versions designed for variants of the spike-and-slab formulation. For a multimodal posterior under multicollinearity, we compare the regions of convergence of these three algorithms. Deterministic annealing versions of the EMVS algorithm are seen to substantially mitigate this multimodality. A single simple running example is used for illustration throughout.
cp-R, an interface to the R programming language for clinical laboratory method comparisons.
Holmes, Daniel T
2015-02-01
Clinical scientists frequently need to compare two different bioanalytical methods as part of assay validation/monitoring. As a matter of necessity, regression methods for quantitative comparison in clinical chemistry, hematology and other clinical laboratory disciplines must allow for error in both the x and y variables. Traditionally the methods popularized by 1) Deming and 2) Passing and Bablok have been recommended. While commercial tools exist, no simple open-source tool is available. The purpose of this work was to develop an entirely open-source GUI-driven program for bioanalytical method comparisons capable of performing these regression methods and able to produce highly customized graphical output. The GUI is written in Python and PyQt4, with R scripts performing the regression and graphical functions. The program can be run from source code or as a pre-compiled binary executable. The software performs three forms of regression and offers weighting where applicable. Confidence bands of the regression are calculated using bootstrapping for the Deming and Passing-Bablok methods. Users can customize regression plots according to the tools available in R and can produce output in jpg, png, tiff or bmp format at any desired resolution, or in ps and pdf vector formats. Bland-Altman plots and some regression diagnostic plots are also generated. Correctness of the regression parameter estimates was confirmed against existing R packages. The program allows for rapid and highly customizable graphical output capable of conforming to the publication requirements of any clinical chemistry journal. Quick method comparisons can also be performed and the results cut and pasted into spreadsheet or word processing applications. We present a simple and intuitive open-source tool for quantitative method comparison in a clinical laboratory environment. Copyright © 2014 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
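For orientation, a closed-form Deming fit (error variance ratio delta = 1) can be written in a few lines; this is an illustrative sketch, not the cp-R source, and the measurement values are made up:

```python
import numpy as np

def deming(x, y, delta=1.0):
    """Closed-form Deming regression for errors in both variables."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    slope = (syy - delta * sxx
             + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

method_a = [1.1, 2.0, 3.2, 4.1, 5.0, 6.2]   # hypothetical paired results
method_b = [1.0, 2.2, 3.0, 4.3, 5.1, 6.0]
b, a = deming(method_a, method_b)
print(f"method_b = {a:.3f} + {b:.3f} * method_a")
```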
Edwards, Rufus D; Smith, Kirk R; Zhang, Junfeng; Ma, Yuqing
2003-01-01
Residential energy use in developing countries has traditionally been associated with combustion devices of poor energy efficiency, which have been shown to produce substantial health-damaging pollution, contributing significantly to the global burden of disease, and greenhouse gas (GHG) emissions. Precision of these estimates in China has been hampered by limited data on stove use and fuel consumption in residences. In addition, limited information is available on the variability of pollutant emissions from different stove/fuel combinations in typical use, as measurement of emission factors requires measurement of multiple chemical species in complex burn-cycle tests. Such measurements are too costly and time consuming for application in conjunction with national surveys. Emissions of most of the major health-damaging pollutants (HDP) and many of the gases that contribute to GHG emissions from cooking stoves result from the significant portion of fuel carbon that is diverted to products of incomplete combustion (PIC) because of poor combustion efficiencies. The approximately linear increase in emissions of PIC with decreasing combustion efficiency allows development of linear models that predict emissions of GHG and HDP intrinsically linked to CO2 and PIC production, and ultimately allows prediction of global warming contributions from residential stove emissions. A comprehensive emissions database of three burn cycles of 23 typical fuel/stove combinations tested in a simulated village house in China has been used to develop models that predict emissions of HDP and global warming commitment (GWC) from cooking stoves in China from simple survey information on stove and fuel use that may be incorporated into national surveys. Stepwise regression models predicted 66% of the variance in global warming commitment (CO2, CO, CH4, NOx, TNMHC) per 1 MJ delivered energy if survey information on fuel type was available; if stove type was also known, they predicted 73% of the variance. Integrated assessment of policies to change stove or fuel type requires that implications for environmental impacts, energy efficiency, global warming and human exposures to HDP emissions can be evaluated. Frequently, this involves measurement of TSP or CO as the major HDPs. Incorporating this information into the models predicted 79% and 78% of the variance in GWC, respectively. Clearly, however, the complexity of making multiple measurements in conjunction with a national survey would be both expensive and time consuming. Thus, models have been derived that predict HDP from simple survey information together with measurement of either the CO/CO2 or TSP/CO2 ratio to predict emission factors for the other HDP. Stepwise regression models predicted 65% of the variance in emissions of total suspended particulates as grams of carbon (TSPC) per 1 MJ delivered if survey information on fuel and stove type was available, and 74% if the CO/CO2 ratio was measured. Similarly, stepwise regression models predicted 76% of the variance in COC emissions per MJ delivered with survey information on stove and fuel type, and 85% if the TSPC/CO2 ratio was measured.
Ultimately, with international agreements on emissions trading frameworks, similar models based on extensive databases of the fate of fuel carbon during combustion from representative household stoves would provide a mechanism for computing greenhouse credits in the residential sector as part of clean development mechanism frameworks and monitoring compliance to control regimes.
Mapping and spatial-temporal modeling of Bromus tectorum invasion in central Utah
NASA Astrophysics Data System (ADS)
Jin, Zhenyu
Cheatgrass, or Downy Brome, is an exotic winter annual weed native to the Mediterranean region. Since its introduction to the U.S., it has become a significant weed and aggressive invader of sagebrush, pinyon-juniper, and other shrub communities, where it can completely out-compete native grasses and shrubs. In this research, remotely sensed data combined with field-collected data are used to investigate the distribution of cheatgrass in central Utah, to characterize the trend of the NDVI time-series of cheatgrass, and to construct a spatially explicit population-based model that simulates the spatial-temporal dynamics of cheatgrass. This research proposes a method for mapping the canopy closure of invasive species using remotely sensed data acquired on different dates. Different invasive species have their own distinctive phenologies, and satellite images from different dates can be used to capture the phenology. The results of cheatgrass abundance prediction fit the field data well for both the linear regression and regression tree models, although the regression tree model performed better than the linear regression model. To characterize the trend of the NDVI time-series of cheatgrass, a novel smoothing algorithm named RMMEH is presented in this research to overcome drawbacks of many other algorithms. By comparing the performance of RMMEH in smoothing a 16-day composite of the MODIS NDVI time-series with that of two other methods, the 4253EH, twice and the MVI, we found that RMMEH not only keeps the original valid NDVI points, but also effectively removes spurious spikes. The reconstructed NDVI time-series of different land covers are of higher quality and have smoother temporal trends. To simulate the spatial-temporal dynamics of cheatgrass, a spatially explicit population-based model is built using remotely sensed data. The comparison between the model output and the ground truth of cheatgrass closure demonstrates that the model can successfully simulate the spatial-temporal dynamics of cheatgrass in a simple cheatgrass-dominated environment. The simulation of the functional response to different prescribed fire rates also shows that this model is helpful for answering management questions such as, "What are the effects of prescribed fire on invasive species?" It demonstrates that a medium fire rate of 10% can successfully prevent cheatgrass invasion.
Model-based Bayesian inference for ROC data analysis
NASA Astrophysics Data System (ADS)
Lei, Tianhu; Bae, K. Ty
2013-03-01
This paper presents a study of model-based Bayesian inference applied to Receiver Operating Characteristic (ROC) data. The model is a simple version of a general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a binary (zero-one) covariate to express the binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by the Markov Chain Monte Carlo (MCMC) method, carried out with Bayesian inference Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach considers model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, posterior distributions of the parameters, as well as the parameters themselves, can be accurately estimated. MCMC-based BUGS adopts the Adaptive Rejection Sampling (ARS) protocol, which requires that the probability density function (pdf) from which samples are drawn be log-concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log-concave with respect to its scale parameter. Therefore, ARS's requirement is satisfied and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are assessed with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.
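As a non-Bayesian aside, the binormal ROC model underlying this formulation, TPF = Φ(a + b·Φ⁻¹(FPF)), is linear on probit axes, so a and b can be sketched by simple least squares on empirical operating points (toy values below; the paper itself uses MCMC/BUGS):

```python
import numpy as np
from scipy.stats import norm

fpf = np.array([0.05, 0.10, 0.20, 0.40, 0.60])   # illustrative operating points
tpf = np.array([0.40, 0.55, 0.70, 0.85, 0.93])

# fit a straight line on probit-transformed axes: probit(TPF) = a + b * probit(FPF)
b, a = np.polyfit(norm.ppf(fpf), norm.ppf(tpf), 1)
auc = norm.cdf(a / np.sqrt(1 + b ** 2))          # binormal AUC formula
print(f"a = {a:.2f}, b = {b:.2f}, AUC = {auc:.3f}")
```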
NASA Astrophysics Data System (ADS)
Prasanna, V.
2018-01-01
This study makes use of temperature and precipitation from CMIP5 climate model output for climate change application studies over the Indian region during the summer monsoon season (JJAS). Bias correction of temperature and precipitation from CMIP5 GCM simulations with respect to observations is discussed in detail. Non-linear statistical bias correction is a suitable method for climate change data because it is simple and does not add artificial uncertainties to the impact assessment of climate change scenarios for application studies (agricultural production changes) in the future. The simple statistical bias correction uses observational constraints on the GCM baseline, and the projected results are scaled with respect to the changing magnitude in future scenarios, varying from one model to the other. Two types of bias correction techniques are shown here: (1) a simple bias correction using a percentile-based quantile-mapping algorithm and (2) a simple but improved bias correction method, a cumulative distribution function (CDF; Weibull distribution function)-based quantile-mapping algorithm. This study shows that the percentile-based quantile-mapping method gives results similar to the CDF (Weibull)-based quantile-mapping method, and the two methods are comparable. The bias correction is applied to temperature and precipitation for the present climate and for future projected data, for use in a simple statistical model of future changes in crop production over the Indian region during the summer monsoon season. In total, 12 CMIP5 models are used for the Historical (1901-2005), RCP4.5 (2005-2100), and RCP8.5 (2005-2100) scenarios. The climate index from each CMIP5 model and the observed agricultural yield index over the Indian region are used in a regression model to project the changes in agricultural yield over India under the RCP4.5 and RCP8.5 scenarios. The results revealed a better convergence of model projections in the bias-corrected data compared to the uncorrected data. The study can be extended to localized regional domains aimed at understanding changes in agricultural productivity in the future with an agro-economic or a simple statistical model. The statistical model indicated that total food grain yield will increase over the Indian region in the future, by approximately 50 kg/ha under the RCP4.5 scenario from 2001 until the end of 2100, and by approximately 90 kg/ha under the RCP8.5 scenario over the same period. Many studies have used bias correction techniques; this study applies bias correction to future climate scenario data from CMIP5 models and uses it with crop statistics to estimate future crop yield changes over the Indian region.
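A minimal empirical (percentile-based) quantile-mapping sketch of the kind described, using synthetic observed and modelled series rather than the CMIP5 data:

```python
import numpy as np

def quantile_map(model_hist, obs, model_future):
    """Map model values onto the observed distribution via percentiles."""
    pct = np.linspace(0.5, 99.5, 199)
    model_q = np.percentile(model_hist, pct)   # model transfer-function knots
    obs_q = np.percentile(obs, pct)            # observed counterparts
    # pass each future value through the historical transfer function
    return np.interp(model_future, model_q, obs_q)

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 4.0, 3000)            # "observed" monsoon rainfall (synthetic)
model_hist = rng.gamma(2.5, 3.0, 3000)     # biased model baseline
model_future = rng.gamma(2.5, 3.3, 3000)   # scaled future scenario
corrected = quantile_map(model_hist, obs, model_future)
print(obs.mean(), model_future.mean(), corrected.mean())
```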
Simplified estimation of age-specific reference intervals for skewed data.
Wright, E M; Royston, P
1997-12-30
Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that it necessarily produces smooth centile curves, that the entire density is estimated, and that an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.
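A heavily simplified sketch of the general idea, assuming synthetic data and normal errors: regress the mean and the (rescaled absolute-residual) SD on age, then form centiles as mean(age) ± z·SD(age). The published method adds transformations and skewness modelling beyond this.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
age = rng.uniform(20, 70, 400)
y = 3.0 + 0.04 * age + rng.normal(0, 0.2 + 0.005 * age)   # synthetic marker, SD grows with age

mean_coef = np.polyfit(age, y, 1)                 # model the mean as a function of age
resid = y - np.polyval(mean_coef, age)
# for normal errors E|resid| = SD * sqrt(2/pi), hence the rescaling below
sd_coef = np.polyfit(age, np.abs(resid) * np.sqrt(np.pi / 2), 1)

z = norm.ppf(0.975)
for a in (25, 45, 65):
    centre = np.polyval(mean_coef, a)
    spread = z * np.polyval(sd_coef, a)
    print(f"age {a}: 95% reference interval ({centre - spread:.2f}, {centre + spread:.2f})")
```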
Prediction on carbon dioxide emissions based on fuzzy rules
NASA Astrophysics Data System (ADS)
Pauzi, Herrini; Abdullah, Lazim
2014-06-01
There are several ways to predict air quality, varying from simple regression to models based on artificial intelligence. Most of the conventional methods are not sufficiently able to provide good forecasting performances due to the problems with non-linearity uncertainty and complexity of the data. Artificial intelligence techniques are successfully used in modeling air quality in order to cope with the problems. This paper describes fuzzy inference system (FIS) to predict CO2 emissions in Malaysia. Furthermore, adaptive neuro-fuzzy inference system (ANFIS) is used to compare the prediction performance. Data of five variables: energy use, gross domestic product per capita, population density, combustible renewable and waste and CO2 intensity are employed in this comparative study. The results from the two model proposed are compared and it is clearly shown that the ANFIS outperforms FIS in CO2 prediction.
Modeling soil parameters using hyperspectral image reflectance in subtropical coastal wetlands
NASA Astrophysics Data System (ADS)
Anne, Naveen J. P.; Abd-Elrahman, Amr H.; Lewis, David B.; Hewitt, Nicole A.
2014-12-01
Developing spectral models of soil properties is an important frontier in remote sensing and soil science. Several studies have focused on modeling soil properties such as total pools of soil organic matter and carbon in bare soils. We extended this effort to model soil parameters in areas densely covered with coastal vegetation. Moreover, we investigated soil properties indicative of soil functions such as nutrient and organic matter turnover and storage. These properties include the partitioning of mineral and organic soil between particulate (>53 μm) and fine size classes, and the partitioning of soil carbon and nitrogen pools between stable and labile fractions. Soil samples were obtained from Avicennia germinans mangrove forest and Juncus roemerianus salt marsh plots on the west coast of central Florida. Spectra corresponding to field plot locations from the Hyperion hyperspectral image were extracted and analyzed. The spectral information was regressed against the soil variables to determine the best single bands and optimal band combinations for the simple ratio (SR) and normalized difference index (NDI) indices. The regression analysis yielded levels of correlation for soil variables with R2 values ranging from 0.21 to 0.47 for best individual bands, 0.28 to 0.81 for two-band indices, and 0.53 to 0.96 for partial least-squares (PLS) regressions for the Hyperion image data. Spectral models using Hyperion data adequately (RPD > 1.4) predicted particulate organic matter (POM), silt + clay, labile carbon (C), and labile nitrogen (N) (where RPD = ratio of standard deviation to root mean square error of cross-validation [RMSECV]). The SR (0.53 μm, 2.11 μm) model of labile N with R2 = 0.81, RMSECV = 0.28, and RPD = 1.94 produced the best results in this study. Our results provide optimism that remote-sensing spectral models can successfully predict soil properties indicative of ecosystem nutrient and organic matter turnover and storage, and do so in areas with dense canopy cover.
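An illustrative brute-force screen for the best two-band NDI, of the sort used to rank band combinations; the reflectance matrix and soil variable below are synthetic stand-ins for the Hyperion spectra:

```python
import numpy as np

rng = np.random.default_rng(3)
refl = rng.uniform(0.05, 0.6, (60, 40))      # 60 plots x 40 hypothetical bands
soil = 2.0 * refl[:, 5] - 1.5 * refl[:, 20] + rng.normal(0, 0.05, 60)

best = (0.0, None)
for i in range(refl.shape[1]):
    for j in range(i + 1, refl.shape[1]):
        # normalized difference index for the band pair (i, j)
        ndi = (refl[:, i] - refl[:, j]) / (refl[:, i] + refl[:, j])
        r2 = np.corrcoef(ndi, soil)[0, 1] ** 2
        if r2 > best[0]:
            best = (r2, (i, j))
print(f"best NDI bands {best[1]}, R2 = {best[0]:.2f}")
```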
NASA Astrophysics Data System (ADS)
Jing, Ran; Gong, Zhaoning; Zhao, Wenji; Pu, Ruiliang; Deng, Lei
2017-12-01
Above-bottom biomass (ABB) is considered an important parameter for measuring the growth status of aquatic plants and is of great significance for assessing the health status of wetland ecosystems. In this study, the Structure from Motion (SfM) technique was used to reconstruct the study area from highly overlapped images acquired by an unmanned aerial vehicle (UAV). We generated orthoimages and SfM dense point cloud data, from which vegetation indices (VIs) and SfM point cloud variables, including average height (HAVG), standard deviation of height (HSD) and coefficient of variation of height (HCV), were extracted. These VIs and SfM point cloud variables can effectively characterize the growth status of aquatic plants, and thus they were used to develop a simple linear regression model (SLR) and a stepwise linear regression model (SWL) with field-measured ABB samples of aquatic plants. We also used a decision tree method to discriminate different types of aquatic plants. The experimental results indicated that (1) the SfM technique could effectively process highly overlapped UAV images and was thus suitable for reconstructing the fine textural features of aquatic plant canopy structure; and (2) an SWL model with the point cloud variables HAVG, HSD and HCV and two VIs, NGRDI and ExGR, as independent variables produced the best prediction of the ABB of aquatic plants in the study area, with a coefficient of determination of 0.84 and a relative root mean square error of 7.13%. This analysis demonstrates a novel method for the quantitative inversion of a growth parameter (i.e., ABB) of aquatic plants in wetlands.
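A rough analogue of the stepwise selection step using scikit-learn's forward SequentialFeatureSelector; the variable names mirror the abstract but the data are synthetic:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
names = ["HAVG", "HSD", "HCV", "NGRDI", "ExGR"]
X = rng.standard_normal((80, 5))                               # 80 plots x 5 predictors
abb = 1.2 * X[:, 0] + 0.8 * X[:, 3] + rng.normal(0, 0.3, 80)   # toy ABB response

sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction="forward", cv=5)
sfs.fit(X, abb)
print([n for n, keep in zip(names, sfs.get_support()) if keep])
```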
Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E
2014-06-01
Leaf chlorophyll content provides valuable information about the physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination, but this measurement is expensive, laborious, and time consuming. Over the years, rapid and non-destructive alternative methods have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimating chlorophyll content in quinoa and amaranth leaves based on RGB component analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were obtained via image analysis software and correlated with leaf chlorophyll determined by the standard laboratory procedure. Single and multiple regression models using the RGB color components as independent variables were tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. The sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quicker, more effective, and lower in cost than SPAD. The proposed RGB models provided better correlation (highest R2) and prediction (lowest RMSEP) of the true foliar chlorophyll content, and had less noise across the whole range of chlorophyll studied, compared with SPAD and other leaf-image-processing-based models when applied to quinoa and amaranth.
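A minimal sketch of the regression step, assuming per-leaf mean RGB values have already been extracted; the data are simulated stand-ins, not the field measurements:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n_leaves = 50
rgb_means = rng.uniform(0, 255, (n_leaves, 3))    # mean R, G, B per leaf (hypothetical)
chl = (30 - 0.08 * rgb_means[:, 0] + 0.05 * rgb_means[:, 1]
       + rng.normal(0, 1.0, n_leaves))            # synthetic chlorophyll content

# multiple regression of chlorophyll on the three color components
model = LinearRegression().fit(rgb_means, chl)
print("R2 =", round(model.score(rgb_means, chl), 2), "coefs =", model.coef_.round(3))
```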
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A
2013-05-01
This manuscript discusses the application of and comparison between three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). The data were obtained from different chemometric methods applied to high-performance liquid chromatography response data using the internal standard method, performed on the model drug acyclovir, which was analyzed in human plasma with ganciclovir as internal standard. An in vivo study was also performed. Derivative treatment of the chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-point sin(x_i) polynomials (discrete Fourier functions). This work studies and compares the application of the WR method and Theil's method, a nonparametric regression (NPR) method, with the least-squares parametric regression (LSPR) method, which is considered the de facto standard method for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use the WR method. WR was found to be superior to LSPR, as the former assumes that the y-direction error in the calibration curve will increase as x increases. Theil's NPR method was also found to be superior to LSPR, as the former assumes that errors could occur in both the x- and y-directions and might not be normally distributed. Most of the results showed a significant improvement in precision and accuracy on applying the WR and NPR methods relative to LSPR.
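For orientation, the three approaches can be compared on simulated heteroscedastic calibration data (not the manuscript's HPLC responses) using standard library routines:

```python
import numpy as np
from scipy.stats import theilslopes
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = np.linspace(1, 100, 40)                    # nominal concentrations
y = 0.05 + 0.8 * x + rng.normal(0, 0.02 * x)   # response: error grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                        # LSPR
wls = sm.WLS(y, X, weights=1.0 / x ** 2).fit()  # WR with 1/x^2 weights
ts = theilslopes(y, x)                          # Theil's NPR slope and intercept

print("LSPR slope:", round(ols.params[1], 4))
print("WR   slope:", round(wls.params[1], 4))
print("NPR  slope:", round(ts[0], 4))
```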
Quantifying prosthetic gait deviation using simple outcome measures
Kark, Lauren; Odell, Ross; McIntosh, Andrew S; Simmons, Anne
2016-01-01
AIM: To develop a subset of simple outcome measures to quantify prosthetic gait deviation without needing three-dimensional gait analysis (3DGA). METHODS: Eight unilateral, transfemoral amputees and 12 unilateral, transtibial amputees were recruited. Twenty-eight able-bodied controls were recruited. All participants underwent 3DGA, the timed-up-and-go test and the six-minute walk test (6MWT). The lower-limb amputees also completed the Prosthesis Evaluation Questionnaire. Results from 3DGA were summarised using the gait deviation index (GDI), which was subsequently regressed, using stepwise regression, against the other measures. RESULTS: Step length (SL), self-selected walking speed (SSWS) and the distance walked during the 6MWT (6MWD) were significantly correlated with the GDI. The 6MWD was the strongest single predictor of the GDI, followed by SL and SSWS. The predictive ability of the regression equations was improved following inclusion of self-report data related to mobility and prosthetic utility. CONCLUSION: This study offers a practicable alternative for quantifying kinematic deviation without the need to conduct complete 3DGA. PMID:27335814
Forecasting outbreaks of the Douglas-fir tussock moth from lower crown cocoon samples.
Richard R. Mason; Donald W. Scott; H. Gene Paul
1993-01-01
A predictive technique using a simple linear regression was developed to forecast the midcrown density of small tussock moth larvae from estimates of cocoon density in the previous generation. The regression estimator was derived from field samples of cocoons and larvae taken from a wide range of nonoutbreak tussock moth populations. The accuracy of the predictions was...
Liu, Zun-Lei; Yuan, Xing-Wei; Yan, Li-Ping; Yang, Lin-Lin; Cheng, Jia-Hua
2013-09-01
Using 2008-2010 survey data on the body condition of small yellow croaker in the offshore waters of the southern Yellow Sea (SYS), the open waters of the northern East China Sea (NECS), and the offshore waters of the middle East China Sea (MECS), this paper analyzed the spatial heterogeneity of the body length-body mass relationships of juvenile and adult small yellow croakers using mean regression and quantile regression models. The results showed that the residual standard errors from the analysis of covariance (ANCOVA) and the linear mixed-effects model were similar, and those from the simple linear regression were the highest. For juvenile small yellow croakers, the mean body mass in SYS and NECS estimated by the mixed-effects mean regression model was higher than the overall average mass across the three regions, while the mean body mass in MECS was below the overall average. For adult small yellow croakers, the mean body mass in NECS was higher than the overall average, while the mean body mass in SYS and MECS was below the overall average. The results from quantile regression indicated substantial differences in the allometric relationships of juvenile small yellow croakers between SYS, NECS, and MECS, with the estimated mean exponent of the allometric relationship in SYS being 2.85 and the interquartile range being from 2.63 to 2.96, indicating heterogeneity of body form. The results from ANCOVA showed that the allometric body length-body mass relationships were significantly different between the 25th and 75th percentile exponent values (F=6.38, df=1737, P<0.01) and between the 25th percentile and median exponent values (F=2.35, df=1737, P=0.039). The relationship was marginally different between the median and 75th percentile exponent values (F=2.21, df=1737, P=0.051). The estimated body length-body mass exponent of adult small yellow croakers in SYS was 3.01 (10th and 95th percentiles = 2.77 and 3.1, respectively). The estimated body length-body mass relationships were significantly different between the lower and upper quantiles of the exponent (F=3.31, df=2793, P=0.01) and between the median and upper quantiles (F=3.56, df=2793, P<0.01), while no significant difference was observed between the lower and median quantiles (F=0.98, df=2793, P=0.43).
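An illustrative allometric quantile regression, log(mass) on log(length) at several quantiles, using simulated fish data rather than the survey measurements:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
length = rng.uniform(8, 25, 500)                                  # body length, cm
mass = 0.01 * length ** 2.9 * np.exp(rng.normal(0, 0.1, 500))     # body mass, g

# on log scales the allometric exponent is the slope: log(m) = log(a) + b * log(L)
X = sm.add_constant(np.log(length))
for q in (0.25, 0.50, 0.75):
    fit = sm.QuantReg(np.log(mass), X).fit(q=q)
    print(f"q={q}: exponent b = {fit.params[1]:.2f}")
```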
NASA Technical Reports Server (NTRS)
Caldwell, E. C.; Cowley, M. S.; Scott-Pandorf, M. M.
2010-01-01
Develop a model that simulates a human running in 0 G using the European Space Agency's (ESA) Subject Loading System (SLS). The model provides ground reaction forces (GRF) based on speed and pull-down forces (PDF). DESIGN: The Running Model is based on a simple spring-mass model. The dynamic properties of the spring-mass model express theoretical vertical GRF (GRFv) and shear GRF in the posterior-anterior direction (GRFsh) during running gait. ADAMS View software was used to build the model, which has a pelvis, a thigh segment, a shank segment, and a spring foot (see Figure 1). The model's movement simulates the joint kinematics of a human running at Earth gravity, with the aim of generating GRF data. DEVELOPMENT & VERIFICATION: ESA provided parabolic flight data of subjects running while using the SLS for further characterization of the model's GRF. Peak GRF data were fit to a linear regression dependent on PDF and speed. Interpolation and extrapolation of the regression equation provided a theoretical data matrix, which is used to drive the model's motion equations. Verification of the model was conducted by running the model at 4 different speeds, with each speed accounting for 3 different PDF. The model's GRF data fell within a 1-standard-deviation boundary derived from the empirical ESA data. CONCLUSION: The Running Model aids in conducting various simulations (potential scenarios include a fatigued runner or a powerful runner generating high loads at a fast cadence) to determine limitations for the T2 vibration isolation system (VIS) aboard the International Space Station. This model can predict how running with the ESA SLS affects the T2 VIS and may be used for other exercise analyses in the future.
Health Belief Model and Reasoned Action Theory in Predicting Water Saving Behaviors in Yazd, Iran
Morowatisharifabad, Mohammad Ali; Momayyezi, Mahdieh; Ghaneian, Mohammad Taghi
2012-01-01
Background: People's behaviors and intentions about healthy behaviors depend on their beliefs, values, and knowledge about the issue. Various models of health education are used in determining predictors of different healthy behaviors, but their efficacy for cultural behaviors, such as water saving behaviors, has not been studied. The study was conducted to explain water saving behaviors in Yazd, Iran on the basis of the Health Belief Model and the Reasoned Action Theory. Methods: The cross-sectional study used random cluster sampling to recruit 200 heads of households. The survey questionnaire was tested for its content validity and reliability. Analysis of the data included descriptive statistics, simple correlation, and hierarchical multiple regression. Results: Simple correlations between water saving behaviors and Reasoned Action Theory and Health Belief Model constructs were statistically significant. Health Belief Model and Reasoned Action Theory constructs explained 20.80% and 8.40% of the variance in water saving behaviors, respectively. Perceived barriers were the strongest predictor. Additionally, there was a statistically significant positive correlation between water saving behaviors and intention. Conclusion: In designing interventions aimed at water waste prevention, barriers to water saving behaviors should be addressed first, followed by people's attitudes towards water saving. The Health Belief Model, with the exception of the perceived severity and benefits constructs, is more powerful than the Reasoned Action Theory in predicting water saving behavior and may be used as a framework for educational interventions aimed at improving water saving behaviors. PMID:24688927
Physical function interfering with pain and symptoms in fibromyalgia patients.
Assumpção, A; Sauer, J F; Mango, P C; Pascual Marques, A
2010-01-01
The aim of this study was to assess the relationship between variables of physical assessment - muscular strength, flexibility and dynamic balance - and pain, pain threshold, and fibromyalgia (FM) symptoms. Our sample consisted of 55 women aged 30 to 55 years (mean 46.5, standard deviation (SD) 6.6), with a mean body mass index (BMI) of 28.7 (3.8), diagnosed with FM according to the American College of Rheumatology criteria. Pain intensity was measured using a visual analogue scale (VAS) and pain threshold (PT) using Fisher's dolorimeter. FM symptoms were assessed by the Fibromyalgia Impact Questionnaire (FIQ); flexibility by the third finger to floor test (3FF); muscular strength by the muscular strength index (MSI), from the maximum voluntary isometric contraction at flexion and extension of the right knee and elbow using a force transducer; and dynamic balance by the timed up and go (TUG) test and the functional reach test (FRT). Data were analysed using Pearson's correlation as well as simple and multivariate regression tests, with a significance level of 5%. PT and FIQ were weakly but significantly correlated with the TUG, MSI and 3FF, as was VAS with the TUG and MSI (p<0.05). VAS, PT and FIQ were not correlated with the FRT. Simple regression suggests that, alone, TUG, FRT, MSI and 3FF are weak predictors of VAS, PT and FIQ. For the VAS, the best predictive model includes TUG and MSI, explaining 12.6% of pain variability. For PT and total symptoms, as obtained by the FIQ, the most predictive model includes 3FF and MSI, which explain 30% and 21% of the variability, respectively. Muscular strength, flexibility and balance are associated with pain, pain threshold, and symptoms in FM patients.
Schut, Antonius G T; Ivits, Eva; Conijn, Jacob G; Ten Brink, Ben; Fensholt, Rasmus
2015-01-01
Detailed understanding of a possible decoupling between climatic drivers of plant productivity and the response of ecosystem vegetation is required. We compared trends in six NDVI metrics (1982-2010) derived from the GIMMS3g dataset with modelled biomass productivity and assessed uncertainty in trend estimates. Annual total biomass weight (TBW) was calculated with the LINPAC model. Trends were determined using a simple linear regression, a Theil-Sen median slope and a piecewise regression (PWR) with two segments. Values of NDVI metrics were related to Net Primary Production (MODIS-NPP) and TBW per biome and land-use type. The simple linear and Theil-Sen trends did not differ much, whereas PWR increased the fraction of explained variation, depending on the NDVI metric considered. A positive trend in TBW, indicating more favorable climatic conditions, was found for 24% of pixels on land, and a negative trend for 5%. A decoupled trend, indicating a positive TBW trend but a monotonic negative or segmented negative NDVI trend, was observed for 17-36% of all productive areas depending on the NDVI metric used. For only 1-2% of all pixels in productive areas was a diverging, greening trend found despite a strong negative trend in TBW. The choice of NDVI metric strongly affected outcomes on regional scales, differences in the fraction of explained variation in MODIS-NPP between biomes were large, and a combination of NDVI metrics is recommended for global studies. We have found an increasing difference between trends in climatic drivers and observed NDVI for large parts of the globe. Our findings suggest that future scenarios must consider impacts of constraints on plant growth such as extremes in weather and nutrient availability to predict changes in NPP and CO2 sequestration capacity.
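A minimal two-segment piecewise regression via grid search over candidate breakpoints, akin in spirit to the PWR fits; the NDVI-like series is synthetic:

```python
import numpy as np

def pwr_fit(t, y):
    """Fit independent linear segments either side of each candidate breakpoint."""
    best = (np.inf, None)
    for k in range(3, len(t) - 3):               # candidate breakpoint index
        sse = 0.0
        for seg in (slice(None, k), slice(k, None)):
            c = np.polyfit(t[seg], y[seg], 1)
            sse += np.sum((y[seg] - np.polyval(c, t[seg])) ** 2)
        if sse < best[0]:
            best = (sse, t[k])
    return best

t = np.arange(1982, 2011, dtype=float)
rng = np.random.default_rng(8)
y = np.where(t < 1998, 0.01 * (t - 1982), 0.16 - 0.008 * (t - 1998))
y = y + rng.normal(0, 0.01, t.size)              # greening then browning, plus noise
sse, brk = pwr_fit(t, y)
print(f"estimated breakpoint: {brk:.0f}")
```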
Vyskocil, Erich; Gruther, Wolfgang; Steiner, Irene; Schuhfried, Othmar
2014-07-01
Disease-specific categories of the International Classification of Functioning, Disability and Health have not yet been described for patients with chronic peripheral arterial obstructive disease (PAD). The authors examined the relationship between the categories of the Brief Core Sets for ischemic heart diseases with the Peripheral Artery Questionnaire and the ankle-brachial index to determine which International Classification of Functioning, Disability and Health categories are most relevant for patients with PAD. This is a retrospective cohort study including 77 patients with verified PAD. Statistical analyses of the relationship between International Classification of Functioning, Disability and Health categories as independent variables and the endpoints Peripheral Artery Questionnaire or ankle-brachial index were carried out by simple and stepwise linear regression models adjusting for age, sex, and leg (left vs. right). The stepwise linear regression model with the ankle-brachial index as dependent variable revealed a significant effect of the variables blood vessel functions and muscle endurance functions. Calculating a stepwise linear regression model with the Peripheral Artery Questionnaire as dependent variable, a significant effect of age, emotional functions, energy and drive functions, carrying out daily routine, as well as walking could be observed. This study identifies International Classification of Functioning, Disability and Health categories in the Brief Core Sets for ischemic heart diseases that show a significant effect on the ankle-brachial index and the Peripheral Artery Questionnaire score in patients with PAD. These categories provide fundamental information on functioning of patients with PAD and patient-centered outcomes for rehabilitation interventions.
Estimating population trends with a linear model: Technical comments
Sauer, John R.; Link, William A.; Royle, J. Andrew
2004-01-01
Controversy has sometimes arisen over whether there is a need to accommodate the limitations of survey design in estimating population change from the count data collected in bird surveys. Analyses of surveys such as the North American Breeding Bird Survey (BBS) can be quite complex; it is natural to ask if the complexity is necessary, or whether the statisticians have run amok. Bart et al. (2003) propose a very simple analysis involving nothing more complicated than simple linear regression, and contrast their approach with model-based procedures. We review the assumptions implicit to their proposed method, and document that these assumptions are unlikely to be valid for surveys such as the BBS. One fundamental limitation of a purely design-based approach is the absence of controls for factors that influence detection of birds at survey sites. We show that failure to model observer effects in survey data leads to substantial bias in estimation of population trends from BBS data for the 20 species that Bart et al. (2003) used as the basis of their simulations. Finally, we note that the simulations presented in Bart et al. (2003) do not provide a useful evaluation of their proposed method, nor do they provide a valid comparison to the estimating-equations alternative they consider.
The Plumbing of Land Surface Models: Is Poor Performance a Result of Methodology or Data Quality?
NASA Technical Reports Server (NTRS)
Haughton, Ned; Abramowitz, Gab; Pitman, Andy J.; Or, Dani; Best, Martin J.; Johnson, Helen R.; Balsamo, Gianpaolo; Boone, Aaron; Cuntz, Matthais; Decharme, Bertrand;
2016-01-01
The PALS Land sUrface Model Benchmarking Evaluation pRoject (PLUMBER) illustrated the value of prescribing a priori performance targets in model intercomparisons. It showed that the performance of turbulent energy flux predictions from different land surface models, at a broad range of flux tower sites using common evaluation metrics, was on average worse than that of relatively simple empirical models. For sensible heat fluxes, all land surface models were outperformed by a linear regression against downward shortwave radiation. For latent heat flux, all land surface models were outperformed by a regression against downward shortwave radiation, surface air temperature and relative humidity. These results are explored here in greater detail and possible causes are investigated. We examine whether particular metrics or sites unduly influence the collated results, whether results change according to time-scale aggregation and whether a lack of energy conservation in flux tower data gives the empirical models an unfair advantage in the intercomparison. We demonstrate that energy conservation in the observational data is not responsible for these results. We also show that the partitioning between sensible and latent heat fluxes in LSMs, rather than the calculation of available energy, is the cause of the original findings. Finally, we present evidence suggesting that the nature of this partitioning problem is likely shared among all contributing LSMs. While we do not find a single candidate explanation for why land surface models perform poorly relative to empirical benchmarks in PLUMBER, we do exclude multiple possible explanations and provide guidance on where future research should focus.
Heuristics as Bayesian inference under extreme priors.
Parpart, Paula; Jones, Matt; Love, Bradley C
2018-05-01
Simple heuristics are often regarded as tractable decision strategies because they ignore a great deal of information in the input data. One puzzle is why heuristics can outperform full-information models, such as linear regression, which make full use of the available information. These "less-is-more" effects, in which a relatively simpler model outperforms a more complex model, are prevalent throughout cognitive science, and are frequently argued to demonstrate an inherent advantage of simplifying computation or ignoring information. In contrast, we show at the computational level (where algorithmic restrictions are set aside) that it is never optimal to discard information. Through a formal Bayesian analysis, we prove that popular heuristics, such as tallying and take-the-best, are formally equivalent to Bayesian inference under the limit of infinitely strong priors. Varying the strength of the prior yields a continuum of Bayesian models with the heuristics at one end and ordinary regression at the other. Critically, intermediate models perform better across all our simulations, suggesting that down-weighting information with the appropriate prior is preferable to entirely ignoring it. Rather than because of their simplicity, our analyses suggest heuristics perform well because they implement strong priors that approximate the actual structure of the environment. We end by considering how new heuristics could be derived by infinitely strengthening the priors of other Bayesian models. These formal results have implications for work in psychology, machine learning and economics. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
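A toy illustration of the prior-strength continuum using ridge regression, where the penalty plays the role of prior strength; the data are synthetic and the mapping to tallying/take-the-best is only schematic here:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
X = rng.standard_normal((40, 6))                 # small sample, 6 cues
w = np.array([1.0, 0.8, 0.6, 0.4, 0.2, 0.1])     # true cue weights
y = X @ w + rng.normal(0, 1.0, 40)

# alpha ~ 0 recovers ordinary regression; very large alpha heavily shrinks the
# weights (strong prior); intermediate alpha often predicts best out of sample
for alpha in (1e-6, 1.0, 100.0):
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2").mean()
    print(f"alpha={alpha:<8} cv R2 = {score:.2f}")
```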
Lee, Cameron C; Sheridan, Scott C
2018-07-01
Temperature-mortality relationships are nonlinear, time-lagged, and can vary depending on the time of year and geographic location, all of which limits the applicability of simple regression models in describing these associations. This research demonstrates the utility of an alternative method for modeling such complex relationships that has gained recent traction in other environmental fields: nonlinear autoregressive models with exogenous input (NARX models). All-cause mortality data and multiple temperature-based data sets were gathered from 41 different US cities, for the period 1975-2010, and subjected to ensemble NARX modeling. Models generally performed better in larger cities and during the winter season. Across the US, median absolute percentage errors were 10% (ranging from 4% to 15% in various cities), the average improvement in the r-squared over that of a simple persistence model was 17% (6-24%), and the hit rate for modeling spike days in mortality (>80th percentile) was 54% (34-71%). Mortality responded acutely to hot summer days, peaking at 0-2 days of lag before dropping precipitously, and there was an extended mortality response to cold winter days, peaking at 2-4 days of lag and dropping slowly and continuing for multiple weeks. Spring and autumn showed both of the aforementioned temperature-mortality relationships, but generally to a lesser magnitude than what was seen in summer or winter. When compared to distributed lag nonlinear models, NARX model output was nearly identical. These results highlight the applicability of NARX models for use in modeling complex and time-dependent relationships for various applications in epidemiology and environmental sciences. Copyright © 2018 Elsevier Inc. All rights reserved.
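A bare-bones NARX-style setup, sketched with lagged mortality (autoregressive) and lagged temperature (exogenous) features fed to a small neural network; the series are simulated and the study's city-level ensembles are not reproduced:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(10)
n = 2000
temp = 10 * np.sin(2 * np.pi * np.arange(n) / 365) + rng.normal(0, 3, n)
mort = 100 + 0.3 * np.abs(temp - 5) + rng.normal(0, 2, n)   # U-shaped response

lags = 7
rows = []
for t in range(lags, n):
    # autoregressive mortality lags plus current and lagged temperature
    rows.append(np.r_[mort[t - lags:t], temp[t - lags:t + 1]])
X, y = np.array(rows), mort[lags:]

split = int(0.8 * len(y))
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X[:split], y[:split])
print("test R2:", round(net.score(X[split:], y[split:]), 2))
```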
Two-Stage Residual Inclusion Estimation in Health Services Research and Health Economics.
Terza, Joseph V
2018-06-01
Empirical analyses in health services research and health economics often require implementation of nonlinear models whose regressors include one or more endogenous variables-regressors that are correlated with the unobserved random component of the model. In such cases, implementation of conventional regression methods that ignore endogeneity will likely produce results that are biased and not causally interpretable. Terza et al. (2008) discuss a relatively simple estimation method that avoids endogeneity bias and is applicable in a wide variety of nonlinear regression contexts. They call this method two-stage residual inclusion (2SRI). In the present paper, I offer a 2SRI how-to guide for practitioners and a step-by-step protocol that can be implemented with any of the popular statistical or econometric software packages. We introduce the protocol and its Stata implementation in the context of a real data example. Implementation of 2SRI for a very broad class of nonlinear models is then discussed. Additional examples are given. We analyze cigarette smoking as a determinant of infant birthweight using data from Mullahy (1997). It is hoped that the discussion will serve as a practical guide to implementation of the 2SRI protocol for applied researchers. © Health Research and Educational Trust.
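A schematic 2SRI, assuming simulated data: stage 1 regresses the endogenous regressor on an instrument, and stage 2 includes the stage-1 residual in a nonlinear (here logit) outcome model. This mirrors the protocol's structure, not Terza's Stata code:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 5000
z = rng.standard_normal(n)                      # instrument
u = rng.standard_normal(n)                      # unobserved confounder
x = 0.8 * z + u + rng.standard_normal(n)        # endogenous regressor
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x - u))))   # binary outcome

stage1 = sm.OLS(x, sm.add_constant(z)).fit()    # stage 1: x on instrument
xr = np.column_stack([x, stage1.resid])         # stage 2 includes the residual
two_sri = sm.Logit(y, sm.add_constant(xr)).fit(disp=0)
naive = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

print("naive coefficient on x:", round(naive.params[1], 2))
print("2SRI  coefficient on x:", round(two_sri.params[1], 2))  # endogeneity bias mitigated
```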
Inverse odds ratio-weighted estimation for causal mediation analysis.
Tchetgen Tchetgen, Eric J
2013-11-20
An important scientific goal of studies in the health and social sciences is increasingly to determine to what extent the total effect of a point exposure is mediated by an intermediate variable on the causal pathway between the exposure and the outcome. A causal framework has recently been proposed for mediation analysis, which gives rise to new definitions, formal identification results and novel estimators of direct and indirect effects. In the present paper, the author describes a new inverse odds ratio-weighted approach to estimate so-called natural direct and indirect effects. The approach, which uses as a weight the inverse of an estimate of the odds ratio function relating the exposure and the mediator, is universal in that it can be used to decompose total effects in a number of regression models commonly used in practice. Specifically, the approach may be used for effect decomposition in generalized linear models with a nonlinear link function, and in a number of other commonly used models such as the Cox proportional hazards regression for a survival outcome. The approach is simple and can be implemented in standard software provided a weight can be specified for each observation. An additional advantage of the method is that it easily incorporates multiple mediators of a categorical, discrete or continuous nature. Copyright © 2013 John Wiley & Sons, Ltd.
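A heavily hedged sketch of one common implementation of this idea for a binary exposure (details differ across applications): fit a logistic model of exposure on the mediator, weight exposed subjects by the inverse of the fitted exposure-mediator odds ratio, and then fit a weighted outcome model for the direct effect. The data are simulated:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 20000
a = rng.binomial(1, 0.5, n)                       # exposure
m = 0.7 * a + rng.standard_normal(n)              # mediator
y = 1.0 * a + 0.8 * m + rng.standard_normal(n)    # outcome: direct effect = 1.0

expo = sm.Logit(a, sm.add_constant(m)).fit(disp=0)
beta_m = expo.params[1]
# inverse odds ratio weight for the exposed; unexposed keep weight 1
w = np.where(a == 1, np.exp(-beta_m * m), 1.0)

direct = sm.WLS(y, sm.add_constant(a), weights=w).fit()
print("estimated natural direct effect ~", round(direct.params[1], 2))
```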
Fractional Yields Inferred from Halo and Thick Disk Stars
NASA Astrophysics Data System (ADS)
Caimmi, R.
2013-12-01
Linear [Q/H]-[O/H] relations, Q = Na, Mg, Si, Ca, Ti, Cr, Fe, Ni, are inferred from a sample (N=67) of recently studied FGK-type dwarf stars in the solar neighbourhood including different populations (Nissen and Schuster 2010, Ramirez et al. 2012), namely LH (N=24, low-α halo), HH (N=25, high-α halo), KD (N=16, thick disk), and OL (N=2, globular cluster outliers). Regression line slope and intercept estimators and related variance estimators are determined. With regard to the straight line, [Q/H] = a_Q [O/H] + b_Q, sample stars are displayed along a "main sequence", [Q,O] = [a_Q, b_Q, Δb_Q], leaving aside the two OL stars, which, in most cases (e.g. Na), lie outside. A unit slope, a_Q = 1, implies that Q is a primary element synthesised via SNII progenitors in the presence of a universal stellar initial mass function (defined as a simple primary element). In this respect, Mg, Si, Ti show an estimated slope a_Q = 1 within ±2σ(a_Q); Cr, Fe, Ni within ±3σ(a_Q); and Na, Ca within ±rσ(a_Q), r > 3. The empirical, differential element abundance distributions are inferred from the LH, HH, KD, and HA = HH + KD subsamples, where related regression lines represent their theoretical counterparts within the framework of simple MCBR (multistage closed box + reservoir) chemical evolution models. Hence, the fractional yields, p_Q/p_O, are determined and (as an example) a comparison is shown with their theoretical counterparts inferred from SNII progenitor nucleosynthesis under the assumption of a power-law stellar initial mass function. The generalized fractional yields, C_Q = Z_Q/Z_O^(a_Q), are determined regardless of the chemical evolution model. The ratio of outflow to star formation rate is compared for different populations in the framework of simple MCBR models. The opposite situation of element abundance variation entirely due to cosmic scatter is also considered under reasonable assumptions. The related differential element abundance distribution fits the data, as does its counterpart inferred in the opposite limit of instantaneous mixing in the presence of chemical evolution; the latter is preferred for the HA subsample.
Estimates of Ground Temperature and Atmospheric Moisture from CERES Observations
NASA Technical Reports Server (NTRS)
Wu, Man Li C.; Schubert, Siegfried; Einaudi, Franco (Technical Monitor)
2000-01-01
A method is developed to retrieve surface ground temperature (T_g) and atmospheric moisture using clear-sky fluxes (CSF) from CERES-TRMM observations. In general, the clear-sky outgoing longwave radiation (CLR) is sensitive to upper-level moisture (q_h) over wet regions and to T_g over dry regions. The clear-sky window flux from 800 to 1200/cm (RadWn) is sensitive to low-level moisture (q_l) and T_g. Combining these two measurements (CLR and RadWn), T_g and q_h can be estimated over land, while q_h and q_l can be estimated over the oceans. The approach capitalizes on the availability of satellite estimates of CLR and RadWn and other auxiliary satellite data. The basic methodology employs off-line forward radiative transfer calculations to generate synthetic CSF data from two different global 4-dimensional data assimilation products. Simple linear regression is used to relate discrepancies in CSF to discrepancies in T_g, q_h and q_l. The slopes of the regression lines define sensitivity parameters that can be exploited to help interpret mismatches between satellite observations and model-based estimates of CSF. For illustration, we analyze the discrepancies in the CSF between an early implementation of the Goddard Earth Observing System Data Assimilation System (GEOS-DAS) and a recent operational version of the European Center for Medium-Range Weather Prediction data assimilation system. In particular, our analysis of synthetic total and window-region CSF differences (computed from two different assimilated data sets) shows that simple linear regression employing ΔT_g, a broad-layer Δq_l from 500 hPa to the surface, and Δq_h from 200 to 300 hPa provides a good approximation to the full radiative transfer calculations, typically explaining more than 90% of the 6-hourly variance in the flux differences. These simple regression relations can be inverted to "retrieve" the errors in the geophysical parameters. Uncertainties (normalized by standard deviation) in the monthly mean retrieved parameters range from 7% for ΔT_g to about 20% for Δq_l. Our initial application of the methodology employed an early CERES-TRMM data set (CLR and RadWn) to assess the quality of the GEOS-2 data. The results showed that over the tropical and subtropical oceans GEOS-2 is, in general, too wet in the upper troposphere (mean bias of 0.99 mm) and too dry in the lower troposphere (mean bias of -4.7 mm). We note that these errors, as well as a cold bias in T_g, have largely been corrected in the current version of GEOS-2 with the introduction of a land surface model, a moist turbulence scheme and the assimilation of SSM/I total precipitable water.
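A conceptual sketch of the inversion step: once regression-derived sensitivities of the two clear-sky fluxes to the geophysical parameters are in hand, flux discrepancies can be solved for parameter discrepancies. The coefficients below are illustrative, not the paper's values:

```python
import numpy as np

# rows: [dCLR, dRadWn]; columns: sensitivities to [dT_g, dq_l]
# (made-up values, in W m^-2 per K and W m^-2 per mm respectively)
S = np.array([[1.8, -0.9],
              [2.6, -2.4]])
flux_diff = np.array([3.1, 1.2])       # observed-minus-model fluxes, W m^-2

dTg, dql = np.linalg.solve(S, flux_diff)
print(f"retrieved dT_g = {dTg:.2f} K, dq_l = {dql:.2f} mm")
```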
A statistical model to predict one-year risk of death in patients with cystic fibrosis.
Aaron, Shawn D; Stephenson, Anne L; Cameron, Donald W; Whitmore, George A
2015-11-01
We constructed a statistical model to assess the risk of death for cystic fibrosis (CF) patients between scheduled annual clinical visits. Our model includes a CF health index that shows the influence of risk factors on CF chronic health and on the severity and frequency of CF exacerbations. Our study used Canadian CF registry data for 3,794 CF patients born after 1970. Data up to 2010 were analyzed, yielding 44,390 annual visit records. Our stochastic process model postulates that CF health between annual clinical visits is a superposition of chronic disease progression and an exacerbation shock stream. Death occurs when an exacerbation carries CF health across a critical threshold. The data constitute censored survival data, and hence, threshold regression was used to connect CF death to study covariates. Maximum likelihood estimates were used to determine which clinical covariates were included within the regression functions for both CF chronic health and CF exacerbations. Lung function, Pseudomonas aeruginosa infection, CF-related diabetes, weight deficiency, pancreatic insufficiency, and the deltaF508 homozygous mutation were significantly associated with CF chronic health status. Lung function, age, gender, age at CF diagnosis, P aeruginosa infection, body mass index <18.5, number of previous hospitalizations for CF exacerbations in the preceding year, and decline in forced expiratory volume in 1 second in the preceding year were significantly associated with CF exacerbations. When combined in one summative model, the regression functions for CF chronic health and CF exacerbation risk provided a simple clinical scoring tool for assessing 1-year risk of death for an individual CF patient. Goodness-of-fit tests of the model showed very encouraging results. We confirmed predictive validity of the model by comparing actual and estimated deaths in repeated hold-out samples from the data set and showed excellent agreement between estimated and actual mortality. Our threshold regression model incorporates a composite CF chronic health status index and an exacerbation risk index to produce an accurate clinical scoring tool for prediction of 1-year survival of CF patients. Our tool can be used by clinicians to decide on optimal timing for lung transplant referral. Copyright © 2015 Elsevier Inc. All rights reserved.
Geospatial modeling of plant stable isotope ratios - the development of isoscapes
NASA Astrophysics Data System (ADS)
West, J. B.; Ehleringer, J. R.; Hurley, J. M.; Cerling, T. E.
2007-12-01
Large-scale spatial variation in stable isotope ratios can yield critical insights into the spatio-temporal dynamics of biogeochemical cycles, animal movements, and shifts in climate, as well as anthropogenic activities such as commerce, resource utilization, and forensic investigation. Interpreting these signals requires that we understand and model the variation. We report progress in our development of plant stable isotope ratio landscapes (isoscapes). Our approach utilizes a GIS, gridded datasets, a range of modeling approaches, and spatially distributed observations. We synthesize findings from four studies to illustrate the general utility of the approach, its ability to represent observed spatio-temporal variability in plant stable isotope ratios, and also outline some specific areas of uncertainty. We also address two basic, but critical questions central to our ability to model plant stable isotope ratios using this approach: (1) do the continuous precipitation isotope ratio grids represent reasonable proxies for plant source water, and (2) do continuous climate grids (as is or modified) represent a reasonable proxy for the climate experienced by plants? Plant components modeled include leaf water, grape water (extracted from wine), bulk leaf material (Cannabis sativa; marijuana), and seed oil (Ricinus communis; castor bean). Our approaches to modeling the isotope ratios of these components varied from highly sophisticated process models to simple one-step fractionation models to regression approaches. The leaf water isoscapes were produced using steady-state models of enrichment and continuous grids of annual average precipitation isotope ratios and climate. These were compared to other modeling efforts, as well as a relatively sparse, but geographically distributed dataset from the literature. The latitudinal distributions and global averages compared favorably to other modeling efforts and the observational data compared well to model predictions. These results yield confidence in the precipitation isoscapes used to represent plant source water, the modified climate grids used to represent leaf climate, and the efficacy of this approach to modeling. Further work confirmed these observations. The seed oil isoscape was produced using a simple model of lipid fractionation driven with the precipitation grid, and compared well to widely distributed observations of castor bean oil, again suggesting that the precipitation grids were reasonable proxies for plant source water. The marijuana leaf δ2H observations distributed across the continental United States were regressed against the precipitation δ2H grids and yielded a strong relationship between them, again suggesting that plant source water was reasonably well represented by the precipitation grid. Finally, the wine water δ18O isoscape was developed from regressions that related precipitation isotope ratios and climate to observations from a single vintage. Favorable comparisons between year-specific wine water isoscapes and inter-annual variations in previous vintages yielded confidence in the climate grids. Clearly, significant residual variability remains to be explained in all of these cases, and uncertainties vary depending on the component modeled, but we conclude from this synthesis that isoscapes are capable of representing real spatial and temporal variability in plant stable isotope ratios.
Bankfull characteristics of Ohio streams and their relation to peak streamflows
Sherwood, James M.; Huitger, Carrie A.
2005-01-01
Regional curves, simple-regression equations, and multiple-regression equations were developed to estimate bankfull width, bankfull mean depth, bankfull cross-sectional area, and bankfull discharge of rural, unregulated streams in Ohio. The methods are based on geomorphic, basin, and flood-frequency data collected at 50 study sites on unregulated natural alluvial streams in Ohio, of which 40 sites are near streamflow-gaging stations. The regional curves and simple-regression equations relate the bankfull characteristics to drainage area. The multiple-regression equations relate the bankfull characteristics to drainage area, main-channel slope, main-channel elevation index, median bed-material particle size, bankfull cross-sectional area, and local-channel slope. Average standard errors of prediction for bankfull width equations range from 20.6 to 24.8 percent; for bankfull mean depth, 18.8 to 20.6 percent; for bankfull cross-sectional area, 25.4 to 30.6 percent; and for bankfull discharge, 27.0 to 78.7 percent. The simple-regression (drainage-area only) equations have the highest average standard errors of prediction. The multiple-regression equations in which the explanatory variables included drainage area, main-channel slope, main-channel elevation index, median bed-material particle size, bankfull cross-sectional area, and local-channel slope have the lowest average standard errors of prediction. Field surveys were done at each of the 50 study sites to collect the geomorphic data. Bankfull indicators were identified and evaluated, cross-section and longitudinal profiles were surveyed, and bed- and bank-material were sampled. Field data were analyzed to determine various geomorphic characteristics such as bankfull width, bankfull mean depth, bankfull cross-sectional area, bankfull discharge, streambed slope, and bed- and bank-material particle-size distribution. The various geomorphic characteristics were analyzed by means of a combination of graphical and statistical techniques. The logarithms of the annual peak discharges for the 40 gaged study sites were fit by a Pearson Type III frequency distribution to develop flood-peak discharges associated with recurrence intervals of 2, 5, 10, 25, 50, and 100 years. The peak-frequency data were related to geomorphic, basin, and climatic variables by multiple-regression analysis. Simple-regression equations were developed to estimate 2-, 5-, 10-, 25-, 50-, and 100-year flood-peak discharges of rural, unregulated streams in Ohio from bankfull channel cross-sectional area. The average standard errors of prediction are 31.6, 32.6, 35.9, 41.5, 46.2, and 51.2 percent, respectively. The study and methods developed are intended to improve understanding of the relations between geomorphic, basin, and flood characteristics of streams in Ohio and to aid in the design of hydraulic structures, such as culverts and bridges, where stability of the stream and structure is an important element of the design criteria. The study was done in cooperation with the Ohio Department of Transportation and the U.S. Department of Transportation, Federal Highway Administration.
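Regional curves of this kind are conventionally power laws fit in log-log space; a brief sketch with made-up observations (the coefficients recovered here are illustrative, not the report's equations):

```python
import numpy as np

# Hypothetical (drainage area [mi^2], bankfull width [ft]) observations.
area = np.array([10.0, 25.0, 60.0, 150.0, 400.0, 900.0])
width = np.array([18.0, 30.0, 45.0, 70.0, 110.0, 160.0])

# Regional curves are typically power laws, width = a * area^b,
# fit as a straight line in log10-log10 space.
b, log_a = np.polyfit(np.log10(area), np.log10(width), 1)
a = 10.0 ** log_a
print(f"width = {a:.2f} * area^{b:.2f}")

# Estimate bankfull width for a 200 mi^2 ungaged basin.
print(a * 200.0 ** b)
```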
Mathematical and computational model for the analysis of micro hybrid rocket motor
NASA Astrophysics Data System (ADS)
Stoia-Djeska, Marius; Mingireanu, Florin
2012-11-01
Hybrid rockets use a two-phase propellant system. In the present work we first develop a simplified model of the coupling of the hybrid combustion process with the complete unsteady flow, starting from the combustion port and ending with the nozzle. The physical and mathematical models are adapted to the simulation of micro hybrid rocket motors. The flow model is based on the one-dimensional Euler equations with source terms. The flow equations and the fuel regression rate law are solved in a coupled manner. The numerical simulations use an implicit fourth-order Runge-Kutta time integration with a second-order cell-centred finite-volume method. The numerical results obtained with this model show a good agreement with published experimental and numerical results. The computational model developed in this work is simple, computationally efficient and offers the advantage of taking into account a large number of functional and constructive parameters that are used by the engineers.
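The coupled solution of the flow and the fuel regression rate law can be caricatured with the common empirical law r = a*G^n; the constants and geometry below are placeholders, and the paper replaces this quasi-steady loop with the full 1-D Euler system.

```python
import numpy as np

# Placeholder empirical constants for r = a * G^n and port geometry (SI units).
a_reg, n_reg = 2.0e-5, 0.62
radius, length, rho_fuel = 0.01, 0.10, 930.0    # m, m, kg/m^3
mdot_ox, dt = 0.005, 1.0e-3                     # oxidizer flow [kg/s], step [s]

for _ in range(1000):
    G = mdot_ox / (np.pi * radius ** 2)         # oxidizer mass flux [kg/m^2/s]
    r = a_reg * G ** n_reg                      # fuel regression rate [m/s]
    mdot_fuel = rho_fuel * r * 2.0 * np.pi * radius * length
    radius += r * dt                            # port widens, feeding back into G
# In the paper this quasi-steady loop is replaced by the 1-D Euler equations
# with source terms, advanced by the implicit Runge-Kutta finite-volume scheme.
print(radius, mdot_fuel)
```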
Das, Rudra Narayan; Roy, Kunal
2014-06-01
Hazardous potential of ionic liquids is becoming an issue of high concern with the increasing application of these compounds in various industrial processes. Predictive toxicological modeling of ionic liquids provides a rational assessment strategy and aids in developing suitable guidance for designing novel analogues. The present study attempts to explore the chemical features of ionic liquids responsible for their ecotoxicity towards the green alga Scenedesmus vacuolatus by developing mathematical models using extended topochemical atom (ETA) indices along with other categories of chemical descriptors. The entire study has been conducted with reference to the OECD guidelines for QSAR model development, using predictive classification and regression modeling strategies. The best models from both analyses showed that the ecotoxicity of ionic liquids can be decreased by reducing the chain length of cationic substituents, increasing the hydrogen bond donor feature of cations, and replacing bulky unsaturated anions with a simple saturated moiety having less lipophilic heteroatoms. Copyright © 2013 Elsevier Ltd. All rights reserved.
Speiser, Jaime Lynn; Lee, William M; Karvellas, Constantine J
2015-01-01
Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients often presents significant challenges. The King's College Criteria (KCC) have been validated on hospital admission, but little has been published on later phases of illness. We aimed to improve determinations of prognosis both at the time of and following admission for APAP-ALF using Classification and Regression Tree (CART) models. CART models were applied to US ALFSG registry data to predict 21-day death or liver transplant early (on admission) and post-admission (days 3-7) for 803 APAP-ALF patients enrolled 01/1998-09/2013. Accuracy in prediction of outcome (AC), sensitivity (SN), specificity (SP), and area under receiver-operating curve (AUROC) were compared between 3 models: KCC (INR, creatinine, coma grade, pH), CART analysis using only KCC variables (KCC-CART) and a CART model using new variables (NEW-CART). Traditional KCC yielded 69% AC, 90% SP, 27% SN, and 0.58 AUROC on admission, with similar performance post-admission. KCC-CART at admission offered predictive 66% AC, 65% SP, 67% SN, and 0.74 AUROC. Post-admission, KCC-CART had predictive 82% AC, 86% SP, 46% SN and 0.81 AUROC. NEW-CART models using MELD (Model for end stage liver disease), lactate and mechanical ventilation on admission yielded predictive 72% AC, 71% SP, 77% SN and AUROC 0.79. For later stages, NEW-CART (MELD, lactate, coma grade) offered predictive AC 86%, SP 91%, SN 46%, AUROC 0.73. CARTs offer simple prognostic models for APAP-ALF patients, which have higher AUROC and SN than KCC, with similar AC and negligibly worse SP. Admission and post-admission predictions were developed. • Prognostication in acetaminophen-induced acute liver failure (APAP-ALF) is challenging beyond admission • Little has been published regarding the use of King's College Criteria (KCC) beyond admission and KCC has shown limited sensitivity in subsequent studies • Classification and Regression Tree (CART) methodology allows the development of predictive models using binary splits and offers an intuitive method for predicting outcome, using processes familiar to clinicians • Data from the ALFSG registry suggested that CART prognosis models for the APAP population offer improved sensitivity and model performance over traditional regression-based KCC, while maintaining similar accuracy and negligibly worse specificity • KCC-CART models offered modest improvement over traditional KCC, with NEW-CART models performing better than KCC-CART particularly at late time points.
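A minimal sketch of the CART approach with scikit-learn, on synthetic data with hypothetical admission variables named after those in the abstract (the learned splits here are not the published model):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 800
# Hypothetical admission variables: MELD, lactate [mmol/L], ventilation (0/1).
X = np.column_stack([rng.normal(25, 8, n),
                     rng.lognormal(1.0, 0.6, n),
                     rng.integers(0, 2, n).astype(float)])
# Synthetic 21-day death/transplant outcome loosely tied to the predictors.
p = 1.0 / (1.0 + np.exp(-(0.08 * (X[:, 0] - 25)
                          + 0.4 * (X[:, 1] - 3) + 0.9 * X[:, 2])))
y = (rng.random(n) < p).astype(int)

# A shallow tree keeps the rule set readable for clinicians, as in CART scoring.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=30).fit(X, y)
print(export_text(tree, feature_names=["MELD", "lactate", "ventilated"]))
```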
Factors associated with active commuting to work among women.
Bopp, Melissa; Child, Stephanie; Campbell, Matthew
2014-01-01
Active commuting (AC), the act of walking or biking to work, has notable health benefits, though rates of AC remain low among women. This study used a social-ecological framework to examine the factors associated with AC among women. A convenience sample of employed women (n = 709) completed an online survey about their mode of travel to work. Individual, interpersonal, institutional, community, and environmental influences were assessed. Basic descriptive statistics and frequencies described the sample. Simple logistic regression models examined associations of the independent variables with AC participation, and multiple logistic regression analysis determined the relative influence of social-ecological factors on AC participation. The sample was primarily middle-aged (44.09±11.38 years) and non-Hispanic White (92%). Univariate analyses revealed several individual, interpersonal, institutional, community and environmental factors significantly associated with AC. The multivariable logistic regression analysis results indicated that significant factors associated with AC included number of children, income, perceived behavioral control, coworker AC, coworker AC normative beliefs, employer and community supports for AC, and traffic. The results of this study contribute to the limited body of knowledge on AC participation for women and may help to inform gender-tailored interventions to enhance AC behavior and improve health.
Shi, Yuanyuan; Zhan, Hao; Zhong, Liuyi; Yan, Fangrong; Feng, Feng; Liu, Wenyuan; Xie, Ning
2016-07-01
A method combining total ion chromatograms with chemometrics and mass defect filtering was established for the prediction of active ingredients in Picrasma quassioides samples. The total ion chromatogram data of 28 batches were pretreated with wavelet transformation and correlation optimized warping to correct baseline drifts and retention time shifts. Then partial least squares regression was applied to construct a regression model bridging the total ion chromatogram fingerprints and the antitumor activity of P. quassioides. Finally, the regression coefficients were used to predict the active peaks in the total ion chromatogram fingerprints. In this strategy, mass defect filtering was employed to classify and characterize the active peaks from a chemical point of view. A total of 17 constituents were predicted as potential active compounds, 16 of which were identified as alkaloids by this approach. The results showed that the established method was not only simple and easy to operate, but also suitable for predicting ultraviolet-undetectable compounds and for providing chemical information for the prediction of active compounds in herbs. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
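The fingerprint-to-activity regression step might look like the following scikit-learn sketch, with a synthetic chromatogram matrix and an arbitrary component count (assumptions for illustration, not the paper's settings):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
# 28 batches x 500 retention-time points of total ion chromatogram intensity.
X = rng.random((28, 500))
activity = rng.random(28)        # antitumor activity per batch (synthetic)

pls = PLSRegression(n_components=3).fit(X, activity)

# Regression coefficients across retention time: large positive coefficients
# flag candidate active peaks, which are then screened by mass defect filtering.
coef = np.ravel(pls.coef_)
print(np.argsort(coef)[-17:])    # top 17 positions, mirroring the abstract
```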
Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures.
Bobb, Jennifer F; Valeri, Linda; Claus Henn, Birgit; Christiani, David C; Wright, Robert O; Mazumdar, Maitreyi; Godleski, John J; Coull, Brent A
2015-07-01
Because humans are invariably exposed to complex chemical mixtures, estimating the health effects of multi-pollutant exposures is of critical concern in environmental epidemiology, and to regulatory agencies such as the U.S. Environmental Protection Agency. However, most health effects studies focus on single agents or consider simple two-way interaction models, in part because we lack the statistical methodology to more realistically capture the complexity of mixed exposures. We introduce Bayesian kernel machine regression (BKMR) as a new approach to study mixtures, in which the health outcome is regressed on a flexible function of the mixture (e.g. air pollution or toxic waste) components that is specified using a kernel function. In high-dimensional settings, a novel hierarchical variable selection approach is incorporated to identify important mixture components and account for the correlated structure of the mixture. Simulation studies demonstrate the success of BKMR in estimating the exposure-response function and in identifying the individual components of the mixture responsible for health effects. We demonstrate the features of the method through epidemiology and toxicology applications. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
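BKMR itself is Bayesian, with hierarchical variable selection fit by MCMC; as a rough non-Bayesian stand-in for the kernel machine part, one can fit the exposure-response surface h(z) with Gaussian-kernel ridge regression (synthetic mixture data; all tuning values below are arbitrary):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(3)
n = 300
Z = rng.normal(size=(n, 4))              # four pollutant exposures
Z[:, 1] += 0.7 * Z[:, 0]                 # induce correlation within the mixture
# Nonlinear, interactive exposure-response surface (synthetic truth).
y = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] * Z[:, 2] + rng.normal(0, 0.3, n)

# The "kernel machine" part: h(z) modeled flexibly through a Gaussian kernel
# (fit here by ridge regression rather than the MCMC used in BKMR).
model = KernelRidge(kernel="rbf", gamma=0.3, alpha=1.0).fit(Z, y)

# Estimated exposure-response for pollutant 0 with the others at their medians.
grid = np.tile(np.median(Z, axis=0), (50, 1))
grid[:, 0] = np.linspace(Z[:, 0].min(), Z[:, 0].max(), 50)
print(model.predict(grid)[:5])
```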
NASA Technical Reports Server (NTRS)
Kalton, G.
1983-01-01
A number of surveys were conducted to study the relationship between the level of aircraft or traffic noise exposure experienced by people living in a particular area and their annoyance with it. These surveys generally employ a clustered sample design, which affects the precision of the survey estimates. Regression analysis of annoyance on noise measures and other variables is often an important component of the survey analysis. Formulae are presented for estimating the standard errors of regression coefficients and ratios of regression coefficients that are applicable with a two- or three-stage clustered sample design. Using a simple cost function, they also determine the optimum allocation of the sample across the stages of the sample design for the estimation of a regression coefficient.
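The report's formulae are not reproduced in the abstract, but the standard design-effect logic they build on is easy to state: with m respondents per cluster and intracluster correlation rho, variances inflate by roughly deff = 1 + (m - 1)*rho, and a cost function with cost c1 per cluster and c2 per respondent yields the classic optimal cluster size m* = sqrt((c1/c2)(1 - rho)/rho). A small sketch with hypothetical numbers:

```python
from math import sqrt

def deff(m, rho):
    """Kish design effect for m respondents per cluster, intracluster corr rho."""
    return 1 + (m - 1) * rho

def optimal_cluster_size(c1, c2, rho):
    """Cost-optimal respondents per cluster (c1: cost per cluster,
    c2: cost per respondent), the classic two-stage sampling result."""
    return sqrt((c1 / c2) * (1 - rho) / rho)

# Hypothetical noise-annoyance survey: a coefficient's standard error under
# simple random sampling is 0.05; clusters of 20 with rho = 0.08 inflate it.
print(0.05 * sqrt(deff(20, 0.08)))          # clustered-design standard error
print(optimal_cluster_size(400, 10, 0.08))  # ~21 respondents per cluster
```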
Sandborgh, Maria; Johansson, Ann-Christin; Söderlund, Anne
2016-01-01
In the fear-avoidance (FA) model, social cognitive constructs could add to explaining the disabling process in whiplash associated disorder (WAD). The aim was to exemplify the possible input from Social Cognitive Theory on the FA model. Specifically, the role of functional self-efficacy and perceived responses from a spouse/intimate partner was studied. A cross-sectional and correlational design was used. Data from 64 patients with acute WAD were used. Measures were pain intensity measured with a numerical rating scale; the Pain Disability Index; the support, punishing responses, solicitous responses, and distracting responses subscales from the Multidimensional Pain Inventory; the Catastrophizing subscale from the Coping Strategies Questionnaire; the Tampa Scale of Kinesiophobia; and the Self-Efficacy Scale. Bivariate correlational, simple linear regression, and multiple regression analyses were used. In the statistical prediction models, high pain intensity indicated high punishing responses, which indicated high catastrophizing. High catastrophizing indicated high fear of movement, which indicated low self-efficacy. Low self-efficacy indicated high disability, which indicated high pain intensity. All independent variables together explained 66.4% of the variance in pain disability, p < 0.001. Results suggest a possible link between one aspect of the social environment, perceived punishing responses from a spouse/intimate partner, pain intensity, and catastrophizing. Further, results support a mediating role of self-efficacy between fear of movement and disability in WAD.
Brezonik, Patrick L; Stadelmann, Teresa H
2002-04-01
Urban nonpoint source pollution is a significant contributor to water quality degradation. Watershed planners need to be able to estimate nonpoint source loads to lakes and streams if they are to plan effective management strategies. To meet this need for the Twin Cities metropolitan area, a large database of urban and suburban runoff data was compiled. Stormwater runoff loads and concentrations of 10 common constituents (six N and P forms, TSS, VSS, COD, Pb) were characterized, and effects of season and land use were analyzed. Relationships between runoff variables and storm and watershed characteristics were examined. The best regression equation to predict runoff volume for rain events was based on rainfall amount, drainage area, and percent impervious area (R2 = 0.78). Median event-mean concentrations (EMCs) tended to be higher in snowmelt runoff than in rainfall runoff, and significant seasonal differences were found in yields (kg/ha) and EMCs for most constituents. Simple correlations between explanatory variables and stormwater loads and EMCs were weak. Rainfall amount and intensity and drainage area were the most important variables in multiple linear regression models to predict event loads, but uncertainty was high in models developed with the pooled data set. The most accurate models for EMCs generally were found when sites were grouped according to common land use and size.
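A sketch of the kind of event-load regression described above, using statsmodels on synthetic storm events (hypothetical units and coefficients; the study's R2 = 0.78 equation is not reproduced here):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 200
rain = rng.gamma(2.0, 8.0, n)        # event rainfall [mm]
area = rng.uniform(10, 500, n)       # drainage area [ha]
imperv = rng.uniform(5, 80, n)       # percent impervious area
volume = (0.9 * rain * area * (0.2 + 0.008 * imperv) / 100
          + rng.normal(0, 5, n))     # synthetic runoff volume

# Event runoff volume regressed on rainfall amount, drainage area and
# percent imperviousness, as in the best-performing equation above.
X = sm.add_constant(np.column_stack([rain, area, imperv]))
fit = sm.OLS(volume, X).fit()
print(fit.params, fit.rsquared)
```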
Custer, Christine M.; Custer, Thomas W.; Dummer, Paul; Etterson, Matthew A.; Thogmartin, Wayne E.; Wu, Qian; Kannan, Kurunthachalam; Trowbridge, Annette; McKann, Patrick C.
2013-01-01
The exposure and effects of perfluoroalkyl substances (PFASs) were studied at eight locations in Minnesota and Wisconsin between 2007 and 2011 using tree swallows (Tachycineta bicolor). Concentrations of PFASs were quantified, as were reproductive success end points. The sample egg method was used, wherein an egg sample is collected and the hatching success of the remaining eggs in the nest is assessed. The association between PFAS exposure and reproductive success was assessed by site comparisons, logistic regression analysis, and multistate modeling, a technique not previously used in this context. There was a negative association between concentrations of perfluorooctane sulfonate (PFOS) in eggs and hatching success. The concentration at which effects became evident (150–200 ng/g wet weight) was far lower than effect levels found in laboratory feeding trials or egg-injection studies of other avian species. This discrepancy was likely due to behavioral effects and other extrinsic factors that are not accounted for in laboratory studies, and possibly to tree swallows being unusually sensitive to PFASs. The results from multistate modeling and simple logistic regression analyses were nearly identical. Multistate modeling provides a better method for examining possible effects of additional covariates and for assessing models using Akaike information criterion analyses. There was a credible association between PFOS concentrations in plasma and eggs, so extrapolation between these two commonly sampled tissues can be performed.
Shuryak, Igor; Loucas, Bradford D; Cornforth, Michael N
2017-01-01
The concept of curvature in dose-response relationships figures prominently in radiation biology, encompassing a wide range of interests including radiation protection, radiotherapy and fundamental models of radiation action. In this context, the ability to detect even small amounts of curvature becomes important. Standard (ST) statistical approaches used for this purpose typically involve least-squares regression, followed by a test on sums of squares. Because we have found that these methods are not particularly robust, we investigated an alternative information theoretic (IT) approach, which involves Poisson regression followed by information-theoretic model selection. Our first objective was to compare the performances of the ST and IT methods by using them to analyze mFISH data on gamma-ray-induced simple interchanges in human lymphocytes, and on Monte Carlo simulated data. Real and simulated data sets that contained small-to-moderate curvature were deliberately selected for this exercise. The IT method tended to detect curvature with higher confidence than the ST method. The finding of curvature in the dose response for true simple interchanges is discussed in the context of fundamental models of radiation action. Our second objective was to optimize the design of experiments aimed specifically at detecting curvature. We used Monte Carlo simulation to investigate the following design parameters, constrained by available resources (i.e., the total number of cells to be scored): the optimal number of dose points to use; the best way to apportion the total number of cells among these dose points; and the spacing of dose intervals. Counterintuitively, our simulation results suggest that 4-5 radiation doses were typically optimal, whereas adding more dose points may actually prove detrimental. Superior results were also obtained by implementing unequal dose spacing and unequal distributions in the number of cells scored at each dose.
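A condensed sketch of the IT approach on synthetic aberration counts: identity-link Poisson regression of yields on dose for linear and linear-quadratic models, compared by AIC (the dose design and rates below are invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
dose = np.repeat([0.0, 1.0, 2.0, 3.0, 4.0], 200)   # Gy, 200 cells per dose
mu = 1.0 + 0.9 * dose + 0.15 * dose ** 2           # mild true curvature
counts = rng.poisson(mu)                           # aberrations per cell

# Identity-link Poisson regression, matching the linear-quadratic form
# Y = c + alpha*D + beta*D^2 conventionally used for aberration yields.
ident = sm.families.Poisson(link=sm.families.links.Identity())
fit_lin = sm.GLM(counts, sm.add_constant(dose), family=ident).fit()
fit_lq = sm.GLM(counts, sm.add_constant(np.column_stack([dose, dose ** 2])),
                family=ident).fit()

# Information-theoretic model selection: a lower AIC for the LQ model
# is taken as evidence of curvature in the dose response.
print(fit_lin.aic, fit_lq.aic)
```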
Chan, Ta-Chien; Teng, Yung-Chu; Hwang, Jing-Shiang
2015-02-21
Emerging novel influenza outbreaks have increasingly been a threat to the public and a major concern of public health departments. Real-time data in seamless surveillance systems such as health insurance claims data for influenza-like illnesses (ILI) are ready for analysis, making it highly desirable to develop practical techniques to analyze such ready-made data for outbreak detection so that the public can receive timely influenza epidemic warnings. This study proposes a simple and effective approach to analyze area-based health insurance claims data, including outpatient and emergency department (ED) visits, for early detection of any aberrations of ILI. The health insurance claims data during 2004-2009 from a national health insurance research database were used for developing early detection methods. The proposed approach fitted the daily new ILI visits and monitored the Pearson residuals directly for aberration detection. First, negative binomial regression was used for both outpatient and ED visits to adjust for potentially influential factors such as holidays, weekends, seasons, temporal dependence and temperature. Second, if the Pearson residuals exceeded 1.96, aberration signals were issued. The empirical validation of the model was done in 2008 and 2009. In addition, we designed a simulation study to compare the time of outbreak detection, non-detection probability and false alarm rate between the proposed method and modified CUSUM. The model successfully detected the aberrations of the 2009 pandemic (H1N1) influenza virus in northern, central and southern Taiwan. The proposed approach was more sensitive in identifying aberrations in ED visits than those in outpatient visits. Simulation studies demonstrated that the proposed approach could detect the aberrations earlier, and with lower non-detection probability and mean false alarm rate, compared to modified CUSUM methods. The proposed simple approach was able to filter out temporal trends, adjust for temperature, and issue warning signals for the first wave of the influenza epidemic in a timely and accurate manner.
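A condensed sketch of the detection rule on synthetic daily counts, adjusting only for weekends and temperature (the paper's model also handles holidays, season and temporal dependence; the dispersion setting here is a library default, not the fitted value):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
day = np.arange(365)
weekend = (day % 7 >= 5).astype(float)
temp = 20 + 8 * np.sin(2 * np.pi * day / 365)        # daily temperature
lam = np.exp(3.0 + 0.3 * weekend - 0.02 * temp)
visits = rng.negative_binomial(10, 10 / (10 + lam))  # daily ILI visit counts

# Negative binomial regression of daily visits on the covariates.
X = sm.add_constant(np.column_stack([weekend, temp]))
fit = sm.GLM(visits, X, family=sm.families.NegativeBinomial()).fit()

# Aberration rule: a Pearson residual above 1.96 issues a warning signal.
signals = np.where(fit.resid_pearson > 1.96)[0]
print(signals)
```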
Adler, Peter B; Kleinhesselink, Andrew; Hooker, Giles; Taylor, J Bret; Teller, Brittany; Ellner, Stephen P
2018-04-28
Stable coexistence requires intraspecific limitations to be stronger than interspecific limitations. The greater the difference between intra- and interspecific limitations, the more stable the coexistence, and the weaker the competitive release any species should experience following removal of competitors. We conducted a removal experiment to test whether a previously estimated model, showing surprisingly weak interspecific competition for four dominant species in a sagebrush steppe, accurately predicts competitive release. Our treatments were 1) removal of all perennial grasses and 2) removal of the dominant shrub, Artemisia tripartita. We regressed survival, growth and recruitment on the locations, sizes, and species identities of neighboring plants, along with an indicator variable for removal treatment. If our "baseline" regression model, which accounts for local plant-plant interactions, accurately explains the observed responses to removals, then the removal coefficient should be non-significant. For survival, the removal coefficients were never significantly different from zero, and only A. tripartita showed a (negative) response to removals at the recruitment stage. For growth, the removal treatment effect was significant and positive for two species, Poa secunda and Pseudoroegneria spicata, indicating that the baseline model underestimated interspecific competition. For all three grass species, population models based on the vital rate regressions that included removal effects projected 1.4- to 3-fold increases in equilibrium population size relative to the baseline model (no removal effects). However, we found no evidence of higher response to removal in quadrats with higher pretreatment cover of A. tripartita, or by plants experiencing higher pretreatment crowding by A. tripartita, raising questions about the mechanisms driving the positive response to removal. While our results show the value of combining observations with a simple removal experiment, more tightly controlled experiments focused on underlying mechanisms may be required to conclusively validate or reject predictions from phenomenological models. This article is protected by copyright. All rights reserved.
Parental education predicts change in intelligence quotient after childhood epilepsy surgery.
Meekes, Joost; van Schooneveld, Monique M J; Braams, Olga B; Jennekens-Schinkel, Aag; van Rijen, Peter C; Hendriks, Marc P H; Braun, Kees P J; van Nieuwenhuizen, Onno
2015-04-01
To determine whether change in the intelligence quotient (IQ) of children who undergo epilepsy surgery is associated with the educational level of their parents. Retrospective analysis of data obtained from a cohort of children who underwent epilepsy surgery between January 1996 and September 2010. We performed simple and multiple regression analyses to identify predictors associated with IQ change after surgery. In addition to parental education, six variables previously demonstrated to be associated with IQ change after surgery were included as predictors: age at surgery, duration of epilepsy, etiology, presurgical IQ, reduction of antiepileptic drugs, and seizure freedom. We used delta IQ (IQ 2 years after surgery minus IQ shortly before surgery) as the primary outcome variable, but also performed analyses with pre- and postsurgical IQ as outcome variables to support our findings. To validate the results, we performed simple regression analysis with parental education as the predictor in specific subgroups. The sample for regression analysis included 118 children (60 male; median age at surgery 9.73 years). Parental education was significantly associated with delta IQ in simple regression analysis (p = 0.004), and also contributed significantly to postsurgical IQ in multiple regression analysis (p = 0.008). Additional analyses demonstrated that parental education made a unique contribution to prediction of delta IQ, that is, it could not be replaced by the illness-related variables. Subgroup analyses confirmed the association of parental education with IQ change after surgery for most groups. Children whose parents had higher education demonstrate on average a greater increase in IQ after surgery and a higher postsurgical--but not presurgical--IQ than children whose parents completed at most lower secondary education. Parental education--and perhaps other environmental variables--should be considered in the prognosis of cognitive function after childhood epilepsy surgery. Wiley Periodicals, Inc. © 2015 International League Against Epilepsy.
Weighing Evidence "Steampunk" Style via the Meta-Analyser.
Bowden, Jack; Jackson, Chris
2016-10-01
The funnel plot is a graphical visualization of summary data estimates from a meta-analysis, and is a useful tool for detecting departures from the standard modeling assumptions. Although perhaps not widely appreciated, a simple extension of the funnel plot can help to facilitate an intuitive interpretation of the mathematics underlying a meta-analysis at a more fundamental level, by equating it to determining the center of mass of a physical system. We used this analogy to explain the concepts of weighing evidence and of biased evidence to a young audience at the Cambridge Science Festival, without recourse to precise definitions or statistical formulas and with a little help from Sherlock Holmes! Following on from the science fair, we have developed an interactive web-application (named the Meta-Analyser) to bring these ideas to a wider audience. We envisage that our application will be a useful tool for researchers when interpreting their data. First, to facilitate a simple understanding of fixed and random effects modeling approaches; second, to assess the importance of outliers; and third, to show the impact of adjusting for small study bias. This final aim is realized by introducing a novel graphical interpretation of the well-known method of Egger regression.
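The Egger regression behind the small-study-bias adjustment reduces to a straight-line fit of standardized effects against precision; a minimal sketch on synthetic studies (the bias pattern is injected deliberately):

```python
import numpy as np

rng = np.random.default_rng(6)
k = 15
se = rng.uniform(0.05, 0.5, k)                # study standard errors
# Synthetic effects with small-study bias: smaller studies drift upward.
effect = 0.3 + 1.2 * se + rng.normal(0.0, se)

# Egger regression: standardized effect vs. precision. The intercept
# estimates the bias; the slope is the bias-adjusted pooled effect.
precision = 1.0 / se
z = effect / se
intercept, slope = np.polynomial.polynomial.polyfit(precision, z, 1)
print(f"bias (intercept): {intercept:.2f}, adjusted effect (slope): {slope:.2f}")
```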
NASA Astrophysics Data System (ADS)
Aligholi, Saeed; Lashkaripour, Gholam Reza; Ghafoori, Mohammad
2017-01-01
This paper sheds further light on the fundamental relationships between simple methods, rock strength, and brittleness of igneous rocks. In particular, the relationship between mechanical (point load strength index Is(50) and brittleness value S20), basic physical (dry density and porosity), and dynamic properties (P-wave velocity and Schmidt rebound values) for a wide range of Iranian igneous rocks is investigated. First, 30 statistical models (including simple and multiple linear regression analyses) were built to identify the relationships between mechanical properties and simple methods. The results imply that rocks with different Schmidt hardness (SH) rebound values have different physicomechanical properties or relations. Second, using these results, it was proved that dry density, P-wave velocity, and SH rebound value provide a fine complement to mechanical properties classification of rock materials. Further, a detailed investigation was conducted on the relationships between mechanical and simple tests, which are established with limited ranges of P-wave velocity and dry density. The results show that strength values decrease with the SH rebound value. In addition, there is a systematic trend between dry density, P-wave velocity, rebound hardness, and brittleness value of the studied rocks, and rocks with medium hardness have a higher brittleness value. Finally, a strength classification chart and a brittleness classification table are presented, providing reliable and low-cost methods for the classification of igneous rocks.
Van de Voorde, Tim; Vlaeminck, Jeroen; Canters, Frank
2008-01-01
Urban growth and its related environmental problems call for sustainable urban management policies to safeguard the quality of urban environments. Vegetation plays an important part in this, as it provides ecological, social, health and economic benefits to a city's inhabitants. Remotely sensed data are of great value for monitoring urban green space, and despite the clear advantages of contemporary high resolution images, the benefits of medium resolution data should not be discarded. The objective of this research was to estimate fractional vegetation cover from a Landsat ETM+ image with sub-pixel classification, and to compare accuracies obtained with multiple stepwise regression analysis, linear spectral unmixing and multi-layer perceptrons (MLP) at the level of meaningful urban spatial entities. Despite the small but nevertheless statistically significant differences at pixel level between the alternative approaches, the spatial pattern of vegetation cover and estimation errors is clearly distinctive at neighbourhood level. At this spatially aggregated level, a simple regression model appears to attain sufficient accuracy. For mapping at a spatially more detailed level, the MLP seems to be the most appropriate choice. Brightness normalisation only appeared to affect the linear models, especially the linear spectral unmixing.
NASA Astrophysics Data System (ADS)
Hegazy, Maha A.; Lotfy, Hayam M.; Rezk, Mamdouh R.; Omran, Yasmin Rostom
2015-04-01
Smart and novel spectrophotometric and chemometric methods have been developed and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and dexamethasone sodium phosphate (DSP) in the presence of interfering substances without prior separation. The first method depends upon derivative subtraction coupled with constant multiplication. The second is the ratio difference method at optimum wavelengths, which were selected after applying a derivative transformation method via multiplication by a decoding spectrum in order to cancel the contribution of non-labeled interfering substances. The third method relies on partial least squares with regression model updating. The methods are so simple that they do not require any preliminary separation steps. Accuracy, precision and linearity ranges of these methods were determined. Moreover, specificity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied for the analysis of both drugs in their pharmaceutical formulation. The obtained results were statistically compared to those of an official spectrophotometric method, leading to the conclusion that there is no significant difference between the proposed methods and the official one with respect to accuracy and precision.
Measurement of wind speed from cooling lake thermal imagery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garrett, A; Kurzeja, R; Villa-Aleman, E
2009-01-20
The Savannah River National Laboratory (SRNL) collected thermal imagery and ground truth data at two commercial power plant cooling lakes to investigate the applicability of laboratory empirical correlations between surface heat flux and wind speed, and statistics derived from thermal imagery. SRNL demonstrated in a previous paper [1] that a linear relationship exists between the standard deviation of image temperature and surface heat flux. In this paper, SRNL will show that the skewness of the temperature distribution derived from cooling lake thermal images correlates with instantaneous wind speed measured at the same location. SRNL collected thermal imagery, surface meteorology and water temperatures from helicopters and boats at the Comanche Peak and H. B. Robinson nuclear power plant cooling lakes. SRNL found that decreasing skewness correlated with increasing wind speed, as was the case for the laboratory experiments. Simple linear and orthogonal regression models both explained about 50% of the variance in the skewness - wind speed plots. A nonlinear (logistic) regression model produced a better fit to the data, apparently because the thermal convection and resulting skewness are related to wind speed in a highly nonlinear way in nearly calm and in windy conditions.
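A sketch of such a logistic fit with scipy, on synthetic skewness - wind speed pairs (the functional form and parameter values are assumptions for illustration, not SRNL's fitted curve):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(u, s_hi, s_lo, u0, k):
    """Skewness decays from s_hi (near calm) to s_lo (windy) around speed u0."""
    return s_lo + (s_hi - s_lo) / (1.0 + np.exp(k * (u - u0)))

rng = np.random.default_rng(7)
wind = rng.uniform(0.0, 10.0, 80)                    # measured wind speed [m/s]
skew = logistic(wind, 1.5, -0.2, 3.0, 1.2) + rng.normal(0.0, 0.15, 80)

params, _ = curve_fit(logistic, wind, skew, p0=[1.0, 0.0, 4.0, 1.0])
print(params)    # recovers a skewness that decreases with wind speed
```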
Covariate measurement error correction methods in mediation analysis with failure time data.
Zhao, Shanshan; Prentice, Ross L
2014-12-01
Mediation analysis is important for understanding the mechanisms whereby one variable causes changes in another. Measurement error could obscure the ability of the potential mediator to explain such changes. This article focuses on developing correction methods for measurement error in the mediator with failure time outcomes. We consider a broad definition of measurement error, including technical error and error associated with temporal variation. The underlying model with the "true" mediator is assumed to be of the Cox proportional hazards model form. The induced hazard ratio for the observed mediator no longer has a simple form independent of the baseline hazard function, due to the conditioning event. We propose a mean-variance regression calibration approach and a follow-up time regression calibration approach, to approximate the partial likelihood for the induced hazard function. Both methods demonstrate value in assessing mediation effects in simulation studies. These methods are generalized to multiple biomarkers and to both case-cohort and nested case-control sampling designs. We apply these correction methods to the Women's Health Initiative hormone therapy trials to understand the mediation effect of several serum sex hormone measures on the relationship between postmenopausal hormone therapy and breast cancer risk. © 2014, The International Biometric Society.
Computational approaches for predicting biomedical research collaborations.
Zhang, Qing; Yu, Hong
2014-01-01
Biomedical research is increasingly collaborative, and successful collaborations often produce high impact work. Computational approaches can be developed for automatically predicting biomedical research collaborations. Previous work on collaboration prediction mainly explored the topological structures of research collaboration networks, leaving out the rich semantic information in the publications themselves. In this paper, we propose supervised machine learning approaches to predict research collaborations in the biomedical field. We explored both semantic features extracted from author research interest profiles and author network topological features. We found that the most informative semantic features for author collaborations are related to research interest, including similarity of out-citing citations and similarity of abstracts. Of the four supervised machine learning models (naïve Bayes, naïve Bayes multinomial, SVMs, and logistic regression), the best performing model is logistic regression, with an area under the ROC curve ranging from 0.766 to 0.980 on different datasets. To our knowledge, we are the first to study in depth how research interest and productivity can be used for collaboration prediction. Our approach is computationally efficient, scalable and yet simple to implement. The datasets of this study are available at https://github.com/qingzhanggithub/medline-collaboration-datasets.
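A minimal sketch of the prediction setup: each candidate author pair gets semantic-similarity and topological features, and logistic regression is scored by ROC AUC (features and coefficients below are synthetic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)
n = 2000
# Hypothetical author-pair features: abstract similarity, out-citing citation
# similarity, and a shared-coauthor (topological) overlap score.
X = rng.random((n, 3))
p = 1.0 / (1.0 + np.exp(-(4 * X[:, 0] + 3 * X[:, 1] + 2 * X[:, 2] - 4.5)))
y = (rng.random(n) < p).astype(int)      # 1 = the pair later collaborates

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```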
Drivers and potential predictability of summer time North Atlantic polar front jet variability
NASA Astrophysics Data System (ADS)
Hall, Richard J.; Jones, Julie M.; Hanna, Edward; Scaife, Adam A.; Erdélyi, Róbert
2017-06-01
The variability of the North Atlantic polar front jet stream is crucial in determining summer weather around the North Atlantic basin. Recent extreme summers in western Europe and North America have highlighted the need for greater understanding of this variability, in order to aid seasonal forecasting and mitigate societal, environmental and economic impacts. Here we find that simple linear regression and composite models based on a few predictable factors are able to explain up to 35% of summertime jet stream speed and latitude variability from 1955 onwards. Sea surface temperature forcings impact predominantly on jet speed, whereas solar and cryospheric forcings appear to influence jet latitude. The cryospheric associations come from the previous autumn, suggesting the survival of an ice-induced signal through the winter season, whereas solar influences lead jet variability by a few years. Regression models covering the earlier part of the twentieth century are much less effective, presumably due to decreased availability of data and increased uncertainty in observational reanalyses. Wavelet coherence analysis identifies that associations fluctuate over the study period, but it is not clear whether this is just internal variability or genuine non-stationarity. Finally, we identify areas for future research.
Optical identification of subjects at high risk for developing breast cancer
NASA Astrophysics Data System (ADS)
Taroni, Paola; Quarto, Giovanna; Pifferi, Antonio; Ieva, Francesca; Paganoni, Anna Maria; Abbate, Francesca; Balestreri, Nicola; Menna, Simona; Cassano, Enrico; Cubeddu, Rinaldo
2013-06-01
Time-domain multiwavelength (635 to 1060 nm) optical mammography was performed on 147 subjects with recent x-ray mammograms available, and average breast tissue composition (water, lipid, collagen, oxy- and deoxyhemoglobin) and scattering parameters (amplitude a and slope b) were estimated. Correlation was observed between optically derived parameters and mammographic density [Breast Imaging and Reporting Data System (BI-RADS) categories], which is a strong risk factor for breast cancer. A logistic regression model was obtained to best identify high-risk (BI-RADS 4) subjects, based on collagen content and scattering parameters. The model presents a total misclassification error of 12.3%, sensitivity of 69%, specificity of 94%, and simple kappa of 0.84, which compares favorably even with intraradiologist assignments of BI-RADS categories.
Divya, O; Mishra, Ashok K
2007-05-29
Quantitative determination of the kerosene fraction present in diesel has been carried out based on excitation emission matrix fluorescence (EEMF) along with parallel factor analysis (PARAFAC) and N-way partial least squares regression (N-PLS). EEMF is a simple, sensitive and nondestructive method suitable for the analysis of multifluorophoric mixtures. Calibration models consisting of varying compositions of diesel and kerosene were constructed, and their validation was carried out using the leave-one-out cross-validation method. The accuracy of the models was evaluated through the root mean square error of prediction (RMSEP) for the PARAFAC, N-PLS and unfold-PLS methods. N-PLS was found to be a better method compared to PARAFAC and unfold-PLS because of its low RMSEP values.
NASA Astrophysics Data System (ADS)
Alp, E.; Yücel, Ö.; Özcan, Z.
2014-12-01
Turkey has been making many legal arrangements for sustainable water management during the harmonization process with the European Union. In order to make cost-effective and efficient decisions, the monitoring network in Turkey has been expanding. However, due to time and budget constraints, the desired number of monitoring campaigns cannot be carried out. Hence, in this study, independent parameters that can be measured easily and quickly are used to estimate water quality parameters in Lakes Mogan and Eymir using linear regression. Nonpoint sources are among the major pollutant sources in Lakes Eymir and Mogan. In this paper, the correlation between easily measurable parameters (DO, temperature, electrical conductivity, pH, precipitation) and dependent variables (TN, TP, COD, Chl-a, TSS, total coliform) is investigated. Simple regression analysis is performed for each season in Lakes Eymir and Mogan using the SPSS statistical program and the water quality data collected between 2006 and 2012. Regression analysis demonstrated significant linear relationships between measured and simulated concentrations for TN (R2=0.86), TP (R2=0.85), TSS (R2=0.91), Chl-a (R2=0.94), COD (R2=0.99) and total coliform (R2=0.97), which are the best results for each season in Lakes Eymir and Mogan. The overall results of this study show that the water quality of lakes can be predicted from easily measurable parameters, even in ungauged situations. Moreover, the outputs obtained from the regression equations can be used as input for water quality models, such as the phosphorus budget model used to calculate the required reduction in the external phosphorus load to Lake Mogan to meet the water quality standards.
NASA Astrophysics Data System (ADS)
Muthukrishnan, S.; Harbor, J.
2001-12-01
Hydrological studies are a significant part of engineering and development projects, and of geological studies undertaken to assess and understand the interactions between hydrology and the environment. Such studies are generally conducted both before a project begins and after it is completed, so that a comprehensive analysis can be made of the project's impact on the local and regional hydrology of the area. A good understanding of the chain of relationships that form the hydro-eco-biological and environmental cycle can be of immense help in maintaining the natural balance as we work towards exploration and exploitation of natural resources, as well as urbanization of undeveloped land. Rainfall-runoff modeling techniques have been of great use here for decades, since they provide fast and efficient means of analyzing the vast amounts of data that are gathered. Though process-based, detailed models are better than simple models, the latter are used more often due to their simplicity, ease of use, and the ready availability of the data needed to run them. The Curve Number (CN) method developed by the United States Department of Agriculture (USDA) is one of the most widely used hydrologic modeling tools in the US, and has earned worldwide acceptance as a practical method for evaluating the effects of land use changes on the hydrology of an area. The Long-Term Hydrological Impact Assessment (L-THIA) model is a basic, CN-based, user-oriented model that has gained popularity among watershed planners because of its reliance on readily available data, because it is easy to use (http://www.ecn.purdue.edu/runoff), and because it produces results geared to the general information needs of planners. The L-THIA model was initially developed to study the relative long-term hydrologic impacts of different (past/current/future) land use scenarios, and it has been successful in meeting this goal. However, one weakness of L-THIA, as of other models that focus strictly on surface runoff, is that many users are interested in predictions of runoff that match observations of flow in streams and rivers. To make L-THIA more useful for planners and engineers alike, a simple long-term calibration method based on linear regression of L-THIA-predicted and observed surface runoff has been developed and tested here. The results from Little Eagle Creek (LEC) in Indiana show that such calibrations are successful and valuable. This method can be used to calibrate other simple rainfall-runoff models as well.
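A compact sketch of the two pieces described above, the SCS curve number runoff equation and a linear-regression calibration against observed runoff (all numbers illustrative; this is not the L-THIA code):

```python
import numpy as np

def cn_runoff(P, CN):
    """SCS curve number runoff depth [in] for rainfall depth P [in]."""
    S = 1000.0 / CN - 10.0                 # potential maximum retention
    Ia = 0.2 * S                           # initial abstraction
    return np.where(P > Ia, (P - Ia) ** 2 / (P + 0.8 * S), 0.0)

rng = np.random.default_rng(8)
rain = rng.gamma(2.0, 0.4, 365)                       # daily rainfall [in]
predicted = cn_runoff(rain, CN=78)                    # L-THIA-style estimate
observed = np.clip(1.15 * predicted + rng.normal(0, 0.02, 365), 0, None)

# Long-term calibration: regress observed runoff on predicted runoff, then
# apply the fitted line to adjust future model output toward streamflow data.
slope, intercept = np.polyfit(predicted, observed, 1)
print(slope, intercept)
```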
Overton, Edgar Turner; Kauwe, John S.K.; Paul, Rob; Tashima, Karen; Tate, David F.; Patel, Pragna; Carpenter, Chuck; Patty, David; Brooks, John T.; Clifford, David B
2013-01-01
HIV-associated neurocognitive disorders (HAND) remain prevalent but challenging to diagnose, particularly among non-demented individuals. To determine whether a brief computerized battery correlates with formal neurocognitive testing, we identified 46 HIV-infected persons who had undergone both formal neurocognitive testing and a brief computerized battery. Simple detection tests correlated best with formal neuropsychological testing. In a multivariable regression model, 53% of the variance in the composite Global Deficit Score was accounted for by elements from the brief computerized tool (p<0.01). These data confirm previous correlation data with the computerized battery, yet illustrate remaining challenges for neurocognitive screening.