Partial covariate adjusted regression
Şentürk, Damla; Nguyen, Danh V.
2008-01-01
Covariate adjusted regression (CAR) is a recently proposed adjustment method for regression analysis where neither the response nor the predictors are directly observed (Şentürk and Müller, 2005). The available data have been distorted by unknown functions of an observable confounding covariate. CAR provides consistent estimators for the coefficients of the regression between the variables of interest, adjusted for the confounder. We develop a broader class of partial covariate adjusted regression (PCAR) models to accommodate both distorted and undistorted (adjusted/unadjusted) predictors. The PCAR model allows for unadjusted predictors, such as age, gender and demographic variables, which are common in the analysis of biomedical and epidemiological data. The available estimation and inference procedures for CAR are shown to be invalid for the proposed PCAR model. We propose new estimators and develop new inference tools for the more general PCAR setting. In particular, we establish the asymptotic normality of the proposed estimators and propose consistent estimators of their asymptotic variances. Finite sample properties of the proposed estimators are investigated using simulation studies and the method is also illustrated with a Pima Indians diabetes data set. PMID:20126296
Coercively Adjusted Auto Regression Model for Forecasting in Epilepsy EEG
Kim, Sun-Hee; Faloutsos, Christos; Yang, Hyung-Jeong
2013-01-01
Recently, data with complex characteristics such as epilepsy electroencephalography (EEG) time series has emerged. Epilepsy EEG data has special characteristics including nonlinearity, nonnormality, and nonperiodicity. Therefore, it is important to find a suitable forecasting method that covers these special characteristics. In this paper, we propose a coercively adjusted autoregression (CA-AR) method that forecasts future values from a multivariable epilepsy EEG time series. We use the technique of random coefficients, which forcefully adjusts the coefficients with −1 and 1. The fractal dimension is used to determine the order of the CA-AR model. We applied the CA-AR method reflecting special characteristics of data to forecast the future value of epilepsy EEG data. Experimental results show that, compared to previous methods, the proposed method can forecast faster and more accurately. PMID:23710252
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, Anne B.; Lizarraga, Joy S.
1996-01-01
Statistical operations termed model-adjustment procedures can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each procedure is a form of regression analysis in which the local data base is used as a calibration data set; the resulting adjusted regression models can then be used to predict storm-runoff quality at unmonitored sites. Statistical tests of the calibration data set guide selection among proposed procedures.
Comparison of the Properties of Regression and Categorical Risk-Adjustment Models
Averill, Richard F.; Muldoon, John H.; Hughes, John S.
2016-01-01
Clinical risk-adjustment, the ability to standardize the comparison of individuals with different health needs, is based upon 2 main alternative approaches: regression models and clinical categorical models. In this article, we examine the impact of the differences in the way these models are constructed on end user applications. PMID:26945302
ERIC Educational Resources Information Center
Olejnik, Stephen; Mills, Jamie; Keselman, Harvey
2000-01-01
Evaluated the use of Mallows' C(p) and Wherry's adjusted R squared (R. Wherry, 1931) statistics to select a final model from a pool of model solutions using computer generated data. Neither statistic identified the underlying regression model any better than, and usually less well than, the stepwise selection method, which itself was poor for…
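As a rough illustration of the two selection statistics named in this abstract, the sketch below computes the commonly used textbook forms of adjusted R squared and Mallows' C(p). Whether these match the exact variants evaluated in the study is an assumption; the function and argument names are hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted).

    Common textbook form; assumed, not taken from the study itself.
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def mallows_cp(sse_subset, mse_full, n, k_subset):
    """Mallows' C_p for a candidate subset model.

    sse_subset: residual sum of squares of the candidate model
    mse_full:   residual mean square of the full model
    k_subset:   number of fitted parameters in the candidate, intercept included
    """
    return sse_subset / mse_full - n + 2 * k_subset
```

A candidate model with little bias is expected to have C(p) close to its number of fitted parameters, which is how the statistic is typically read when comparing subsets.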
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAPs) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting 'adjusted' regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAPs examined in this study were: single-factor regression against the regional model prediction, P (termed MAP-1F-P); regression against P (termed MAP-R-P); regression against P and additional local variables (termed MAP-R-P+nV); and a weighted combination of P and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-1F-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAPs were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAPs for
Kleinman, Lawrence C; Norton, Edward C
2009-01-01
Objective: To develop and validate a general method (called regression risk analysis) to estimate adjusted risk measures from logistic and other nonlinear multiple regression models. We show how to estimate standard errors for these estimates. These measures could supplant various approximations (e.g., adjusted odds ratio [AOR]) that may diverge, especially when outcomes are common. Study Design: Regression risk analysis estimates were compared with internal standards as well as with Mantel–Haenszel estimates, Poisson and log-binomial regressions, and a widely used (but flawed) equation to calculate adjusted risk ratios (ARR) from AOR. Data Collection: Data sets produced using Monte Carlo simulations. Principal Findings: Regression risk analysis accurately estimates ARR and differences directly from multiple regression models, even when confounders are continuous, distributions are skewed, outcomes are common, and effect size is large. It is statistically sound and intuitive, and has properties favoring it over other methods in many cases. Conclusions: Regression risk analysis should be the new standard for presenting findings from multiple regression analysis of dichotomous outcomes for cross-sectional, cohort, and population-based case–control studies, particularly when outcomes are common or effect size is large. PMID:18793213
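The abstract does not spell out the computation, so the following is only a sketch of one standard way to obtain adjusted risk measures from a fitted logistic model: marginal standardization. Predicted risks are averaged over the sample with exposure set to 1 and then to 0. Treating this as the idea behind regression risk analysis is an assumption, and the coefficient values and function names are hypothetical.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def adjusted_risk_measures(intercept, b_exposure, b_confounder, confounder_values):
    """Average predicted risks over the sample under exposure = 1 and exposure = 0.

    Coefficients are assumed to come from an already fitted logistic model
    with one binary exposure and one confounder (hypothetical setup).
    """
    n = len(confounder_values)
    risk_exposed = sum(
        expit(intercept + b_exposure + b_confounder * c) for c in confounder_values
    ) / n
    risk_unexposed = sum(
        expit(intercept + b_confounder * c) for c in confounder_values
    ) / n
    return {
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
        "odds_ratio": math.exp(b_exposure),
    }
```

With a baseline risk of 0.5 and an odds ratio of 3, for example, the standardized risk ratio is only 1.5, which illustrates how the odds ratio and risk ratio diverge when outcomes are common.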
Barks, C.S.
1995-01-01
Storm-runoff water-quality data were used to verify and, when appropriate, adjust regional regression models previously developed to estimate urban storm-runoff loads and mean concentrations in Little Rock, Arkansas. Data collected at 5 representative sites during 22 storms from June 1992 through January 1994 compose the Little Rock data base. Comparison of observed values (O) of storm-runoff loads and mean concentrations to the predicted values (Pu) from the regional regression models for nine constituents (chemical oxygen demand, suspended solids, total nitrogen, total ammonia plus organic nitrogen as nitrogen, total phosphorus, dissolved phosphorus, total recoverable copper, total recoverable lead, and total recoverable zinc) shows large prediction errors ranging from 63 to several thousand percent. Prediction errors for six of the regional regression models are less than 100 percent, and can be considered reasonable for water-quality models. Differences between O and Pu are due to variability in the Little Rock data base and error in the regional models. Where applicable, a model adjustment procedure (termed MAP-R-P) based upon regression of O against Pu was applied to improve predictive accuracy. For 11 of the 18 regional water-quality models, O and Pu are significantly correlated; that is, much of the variation in O is explained by the regional models. Five of these 11 regional models consistently overestimate O; therefore, MAP-R-P can be used to provide a better estimate. For the remaining seven regional models, O and Pu are not significantly correlated; thus neither the unadjusted regional models nor the MAP-R-P is appropriate. A simple estimator, such as the mean of the observed values, may be used if the regression models are not appropriate. Standard error of estimate of the adjusted models ranges from 48 to 130 percent. Calibration results may be biased due to the limited data set sizes in the Little Rock data base. The relatively large values of
Li, Li; Brumback, Babette A; Weppelmann, Thomas A; Morris, J Glenn; Ali, Afsar
2016-08-15
Motivated by an investigation of the effect of surface water temperature on the presence of Vibrio cholerae in water samples collected from different fixed surface water monitoring sites in Haiti in different months, we investigated methods to adjust for unmeasured confounding due to either of the two crossed factors site and month. In the process, we extended previous methods that adjust for unmeasured confounding due to one nesting factor (such as site, which nests the water samples from different months) to the case of two crossed factors. First, we developed a conditional pseudolikelihood estimator that eliminates fixed effects for the levels of each of the crossed factors from the estimating equation. Using the theory of U-Statistics for independent but non-identically distributed vectors, we show that our estimator is consistent and asymptotically normal, but that its variance depends on the nuisance parameters and thus cannot be easily estimated. Consequently, we apply our estimator in conjunction with a permutation test, and we investigate use of the pigeonhole bootstrap and the jackknife for constructing confidence intervals. We also incorporate our estimator into a diagnostic test for a logistic mixed model with crossed random effects and no unmeasured confounding. For comparison, we investigate between-within models extended to two crossed factors. These generalized linear mixed models include covariate means for each level of each factor in order to adjust for the unmeasured confounding. We conduct simulation studies, and we apply the methods to the Haitian data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26892025
Hoos, Anne B.; Patel, Anant R.
1996-01-01
Model-adjustment procedures were applied to the combined data bases of storm-runoff quality for Chattanooga, Knoxville, and Nashville, Tennessee, to improve predictive accuracy for storm-runoff quality for urban watersheds in these three cities and throughout Middle and East Tennessee. Data for 45 storms at 15 different sites (five sites in each city) constitute the data base. Comparison of observed values of storm-runoff load and event-mean concentration to the predicted values from the regional regression models for 10 constituents shows prediction errors as large as 806,000 percent. Model-adjustment procedures, which combine the regional model predictions with local data, are applied to improve predictive accuracy. Standard error of estimate after model adjustment ranges from 67 to 322 percent. Calibration results may be biased due to sampling error in the Tennessee data base. The relatively large values of standard error of estimate for some of the constituent models, although representing significant reduction (at least 50 percent) in prediction error compared to estimation with unadjusted regional models, may be unacceptable for some applications. The user may wish to collect additional local data for these constituents and repeat the analysis, or calibrate an independent local regression model.
Jen, Min-Hua; Bottle, Alex; Kirkwood, Graham; Johnston, Ron; Aylin, Paul
2011-09-01
We have previously described a system for monitoring a number of healthcare outcomes using case-mix adjustment models. It is desirable to automate the model fitting process in such a system if monitoring covers a large number of outcome measures or subgroup analyses. Our aim was to compare the performance of three different variable selection strategies: "manual", "automated" backward elimination and re-categorisation, and including all variables at once, irrespective of their apparent importance, with automated re-categorisation. Logistic regression models for predicting in-hospital mortality and emergency readmission within 28 days were fitted to an administrative database for 78 diagnosis groups and 126 procedures from 1996 to 2006 for National Health Services hospital trusts in England. The performance of models was assessed with Receiver Operating Characteristic (ROC) c statistics (measuring discrimination) and Brier score (assessing the average of the predictive accuracy). Overall, discrimination was similar for diagnoses and procedures and consistently better for mortality than for emergency readmission. Brier scores were generally low overall (showing higher accuracy) and were lower for procedures than diagnoses, with a few exceptions for emergency readmission within 28 days. Among the three variable selection strategies, the automated procedure had similar performance to the manual method in almost all cases except low-risk groups with few outcome events. For the rapid generation of multiple case-mix models we suggest applying automated modelling to reduce the time required, in particular when examining different outcomes of large numbers of procedures and diseases in routinely collected administrative health data. PMID:21556848
Weather adjustment using seemingly unrelated regression
Noll, T.A.
1995-05-01
Seemingly unrelated regression (SUR) is a system estimation technique that accounts for time-contemporaneous correlation between individual equations within a system of equations. SUR is suited to weather adjustment estimations when the estimation is: (1) composed of a system of equations and (2) the system of equations represents either different weather stations, different sales sectors or a combination of different weather stations and different sales sectors. SUR utilizes the cross-equation error values to develop more accurate estimates of the system coefficients than are obtained using ordinary least-squares (OLS) estimation. SUR estimates can be generated using a variety of statistical software packages including MicroTSP and SAS.
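To make the role of the cross-equation errors concrete, here is a minimal sketch of a two-step feasible GLS estimator for a system of two equations, each with an intercept and one regressor. It is not the MicroTSP or SAS implementation; the two-equation restriction, the helper names, and the small pure-Python solver are all simplifying assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cross(xa, xb):
    """X_a' X_b, where each design matrix has columns [1, x]."""
    return [[float(len(xa)), sum(xb)],
            [sum(xa), sum(p * q for p, q in zip(xa, xb))]]


def xty(x, y):
    """X' y for a design matrix with columns [1, x]."""
    return [sum(y), sum(p * q for p, q in zip(x, y))]


def ols(x, y):
    """Single-equation OLS: returns [intercept, slope]."""
    return solve(cross(x, x), xty(x, y))


def residuals(x, y, beta):
    return [yi - beta[0] - beta[1] * xi for xi, yi in zip(x, y)]


def sur_two_equations(x1, y1, x2, y2):
    """Two-step FGLS; returns [intercept1, slope1, intercept2, slope2]."""
    n = len(y1)
    # Step 1: per-equation OLS residuals give the cross-equation covariance.
    e1 = residuals(x1, y1, ols(x1, y1))
    e2 = residuals(x2, y2, ols(x2, y2))
    s11 = sum(e * e for e in e1) / n
    s22 = sum(e * e for e in e2) / n
    s12 = sum(p * q for p, q in zip(e1, e2)) / n
    det = s11 * s22 - s12 * s12
    w11, w22, w12 = s22 / det, s11 / det, -s12 / det  # inverse covariance
    # Step 2: GLS normal equations with weights from the inverse covariance.
    a11, a12 = cross(x1, x1), cross(x1, x2)
    a21, a22 = cross(x2, x1), cross(x2, x2)
    normal = [[w11 * a11[i][0], w11 * a11[i][1],
               w12 * a12[i][0], w12 * a12[i][1]] for i in range(2)]
    normal += [[w12 * a21[i][0], w12 * a21[i][1],
                w22 * a22[i][0], w22 * a22[i][1]] for i in range(2)]
    c11, c12, c21, c22 = xty(x1, y1), xty(x1, y2), xty(x2, y1), xty(x2, y2)
    rhs = [w11 * c11[i] + w12 * c12[i] for i in range(2)]
    rhs += [w12 * c21[i] + w22 * c22[i] for i in range(2)]
    return solve(normal, rhs)
```

A useful sanity check is that when both equations share exactly the same regressors, the SUR estimates coincide with equation-by-equation OLS; any gain over OLS comes from equations with different regressors and correlated errors.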
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Kjelstrom, L.C.
1995-01-01
Previously developed U.S. Geological Survey regional regression models of runoff and 11 chemical constituents were evaluated to assess their suitability for use in urban areas in Boise and Garden City. Data collected in the study area were used to develop adjusted regional models of storm-runoff volumes and mean concentrations and loads of chemical oxygen demand, dissolved and suspended solids, total nitrogen and total ammonia plus organic nitrogen as nitrogen, total and dissolved phosphorus, and total recoverable cadmium, copper, lead, and zinc. Explanatory variables used in these models were drainage area, impervious area, land-use information, and precipitation data. Mean annual runoff volume and loads at the five outfalls were estimated from 904 individual storms during 1976 through 1993. Two methods were used to compute individual storm loads. The first method used adjusted regional models of storm loads and the second used adjusted regional models for mean concentration and runoff volume. For large storms, the first method seemed to produce excessively high loads for some constituents and the second method provided more reliable results for all constituents except suspended solids. The first method provided more reliable results for large storms for suspended solids.
ERIC Educational Resources Information Center
Tipton, Elizabeth; Pustejovsky, James E.
2015-01-01
Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…
Estimation of adjusted rate differences using additive negative binomial regression.
Donoghoe, Mark W; Marschner, Ian C
2016-08-15
Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation-conditional maximisation-either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27073156
Agogo, George O; van der Voet, Hilko; Van't Veer, Pieter; van Eeuwijk, Fred A; Boshuizen, Hendriek C
2016-07-01
Dietary questionnaires are prone to measurement error, which biases the perceived association between dietary intake and risk of disease. Short-term measurements are required to adjust for the bias in the association. For foods that are not consumed daily, the short-term measurements are often characterized by excess zeroes. Via a simulation study, the performance of a two-part calibration model that was developed for a single-replicate study design was assessed by mimicking leafy vegetable intake reports from the multicenter European Prospective Investigation into Cancer and Nutrition (EPIC) study. In part I of the fitted two-part calibration model, a logistic distribution was assumed; in part II, a gamma distribution was assumed. The model was assessed with respect to the magnitude of the correlation between the consumption probability and the consumed amount (hereafter, cross-part correlation), the number and form of covariates in the calibration model, the percentage of zero response values, and the magnitude of the measurement error in the dietary intake. From the simulation study results, transforming the dietary variable in the regression calibration to an appropriate scale was found to be the most important factor for the model performance. Reducing the number of covariates in the model could be beneficial, but was not critical in large-sample studies. The performance was remarkably robust when fitting a one-part rather than a two-part model. The model performance was minimally affected by the cross-part correlation. PMID:27003183
Technology Transfer Automated Retrieval System (TEKTRAN)
A method of accounting for differences in variation in components of test-day milk production records was developed. This method could improve the accuracy of genetic evaluations. A random regression model is used to analyze the data, then a transformation is applied to the random regression coeffic...
Kautter, John; Pope, Gregory C.
2004-01-01
The authors document the development of the CMS frailty adjustment model, a Medicare payment approach that adjusts payments to a Medicare managed care organization (MCO) according to the functional impairment of its community-residing enrollees. Beginning in 2004, this approach is being applied to certain organizations, such as Program of All-Inclusive Care for the Elderly (PACE), that specialize in providing care to the community-residing frail elderly. In the future, frailty adjustment could be extended to more Medicare managed care organizations. PMID:25372243
Ridge Regression for Interactive Models.
ERIC Educational Resources Information Center
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
Regression Models of Atlas Appearance
Rohlfing, Torsten; Sullivan, Edith V.; Pfefferbaum, Adolf
2010-01-01
Models of object appearance based on principal components analysis provide powerful and versatile tools in computer vision and medical image analysis. A major shortcoming is that they rely entirely on the training data to extract principal modes of appearance variation and ignore underlying variables (e.g., subject age, gender). This paper introduces an appearance modeling framework based instead on generalized multi-linear regression. The training of regression appearance models is controlled by independent variables. This makes it straightforward to create model instances for specific values of these variables, which is akin to model interpolation. We demonstrate the new framework by creating an appearance model of the human brain from MR images of 36 subjects. Instances of the model created for different ages are compared with average shape atlases created from age-matched sub-populations. Relative tissue volumes vs. age in models are also compared with tissue volumes vs. subject age in the original images. In both experiments, we found excellent agreement between the regression models and the comparison data. We conclude that regression appearance models are a promising new technique for image analysis, with one potential application being the representation of a continuum of mutually consistent, age-specific atlases of the human brain. PMID:19694260
Interquantile Shrinkage in Regression Models
Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.
2012-01-01
Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
Regression modelling of Dst index
NASA Astrophysics Data System (ADS)
Parnowski, Aleksei
We developed a new approach to the problem of real-time space weather indices forecasting using readily available data from ACE and a number of ground stations. It is based on the regression modelling method [1-3], which combines the benefits of empirical and statistical approaches. Mathematically it is based upon the partial regression analysis and Monte Carlo simulations to deduce the empirical relationships in the system. The typical elapsed time per forecast is a few seconds on an average PC. This technique can be easily extended to other indices like AE and Kp. The proposed system can also be useful for investigating physical phenomena related to interactions between the solar wind and the magnetosphere; it has already helped uncover two new geoeffective parameters. 1. Parnowski A.S. Regression modeling method of space weather prediction. Astrophysics and Space Science, 2009, V. 323, No. 2, P. 169-180. doi:10.1007/s10509-009-0060-4 [arXiv:0906.3271] 2. Parnovskiy A.S. Regression Modeling and its Application to the Problem of Prediction of Space Weather. Journal of Automation and Information Sciences, 2009, V. 41, No. 5, P. 61-69. doi:10.1615/JAutomatInfScien.v41.i5.70 3. Parnowski A.S. Statistically predicting Dst without satellite data. Earth, Planets and Space, 2009, V. 61, No. 5, P. 621-624.
Heteroscedastic transformation cure regression models.
Chen, Chyong-Mei; Chen, Chen-Hsin
2016-06-30
Cure models have been applied to analyze clinical trials with cures and age-at-onset studies with nonsusceptibility. Lu and Ying (On semiparametric transformation cure model. Biometrika 2004; 91:331-343. DOI: 10.1093/biomet/91.2.331) developed a general class of semiparametric transformation cure models, which assumes that the failure times of uncured subjects, after an unknown monotone transformation, follow a regression model with homoscedastic residuals. However, it cannot deal with frequently encountered heteroscedasticity, which may result from dispersed ranges of failure time span among uncured subjects' strata. To tackle the phenomenon, this article presents semiparametric heteroscedastic transformation cure models. The cure status and the failure time of an uncured subject are fitted by a logistic regression model and a heteroscedastic transformation model, respectively. Unlike the approach of Lu and Ying, we derive score equations from the full likelihood for estimating the regression parameters in the proposed model. A martingale difference function similar to theirs is used to estimate the infinite-dimensional transformation function. Our proposed estimating approach is intuitively applicable and can be conveniently extended to other complicated models when the maximization of the likelihood may be too tedious to be implemented. We conduct simulation studies to validate large-sample properties of the proposed estimators and to compare with the approach of Lu and Ying via the relative efficiency. The estimating method and the two relevant goodness-of-fit graphical procedures are illustrated by using breast cancer data and melanoma data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26887342
Assessing Longitudinal Change: Adjustment for Regression to the Mean Effects
ERIC Educational Resources Information Center
Rocconi, Louis M.; Ethington, Corinna A.
2009-01-01
Pascarella (J Coll Stud Dev 47:508-520, 2006) has called for an increase in use of longitudinal data with pretest-posttest design when studying effects on college students. However, such designs that use multiple measures to document change are vulnerable to an important threat to internal validity, regression to the mean. Herein, we discuss a…
Adjustment of regional regression equations for urban storm-runoff quality using at-site data
Barks, C.S.
1996-01-01
Regional regression equations have been developed to estimate urban storm-runoff loads and mean concentrations using a national data base. Four statistical methods using at-site data to adjust the regional equation predictions were developed to provide better local estimates. The four adjustment procedures are a single-factor adjustment, a regression of the observed data against the predicted values, a regression of the observed values against the predicted values and additional local independent variables, and a weighted combination of a local regression with the regional prediction. Data collected at five representative storm-runoff sites during 22 storms in Little Rock, Arkansas, were used to verify, and, when appropriate, adjust the regional regression equation predictions. Comparison of observed values of storm-runoff loads and mean concentrations to the predicted values from the regional regression equations for nine constituents (chemical oxygen demand, suspended solids, total nitrogen as N, total ammonia plus organic nitrogen as N, total phosphorus as P, dissolved phosphorus as P, total recoverable copper, total recoverable lead, and total recoverable zinc) showed large prediction errors ranging from 63 percent to more than several thousand percent. Prediction errors for 6 of the 18 regional regression equations were less than 100 percent and could be considered reasonable for water-quality prediction equations. The regression adjustment procedure was used to adjust five of the regional equation predictions to improve the predictive accuracy. For seven of the regional equations the observed and the predicted values are not significantly correlated. Thus neither the unadjusted regional equations nor any of the adjustments were appropriate. The mean of the observed values was used as a simple estimator when the regional equation predictions and adjusted predictions were not appropriate.
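Two of the four adjustment procedures named above can be sketched in a few lines. This is only an illustration: the abstract does not define the single-factor adjustment, so the sketch assumes it is the ratio of the mean observed value to the mean regional prediction. The function names and the numbers are hypothetical.

```python
from statistics import mean

def single_factor_adjustment(observed, predicted):
    """Scale factor: mean observed / mean predicted (assumed definition)."""
    return mean(observed) / mean(predicted)

def regress_observed_on_predicted(observed, predicted):
    """Ordinary least squares fit of observed = a + b * predicted."""
    mp, mo = mean(predicted), mean(observed)
    sxx = sum((p - mp) ** 2 for p in predicted)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    b = sxy / sxx
    a = mo - b * mp
    return a, b

# Made-up storm loads: regional predictions and at-site observations.
predicted = [10.0, 20.0, 30.0, 40.0]
observed = [6.0, 11.0, 16.0, 21.0]  # constructed as 1 + 0.5 * predicted

factor = single_factor_adjustment(observed, predicted)
a, b = regress_observed_on_predicted(observed, predicted)
adjusted = [a + b * p for p in predicted]  # locally adjusted predictions
```

With real data the fit would not be exact, and a check that observed and predicted values are significantly correlated would come first, as the abstract notes.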
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The applications show some results of recent research projects in medicine and business administration.
Building Regression Models: The Importance of Graphics.
ERIC Educational Resources Information Center
Dunn, Richard
1989-01-01
Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)
Computing measures of explained variation for logistic regression models.
Mittlböck, M; Schemper, M
1999-01-01
The proportion of explained variation (R2) is frequently used in the general linear model but in logistic regression no standard definition of R2 exists. We present a SAS macro which calculates two R2-measures based on Pearson and on deviance residuals for logistic regression. Also, adjusted versions for both measures are given, which should prevent the inflation of R2 in small samples. PMID:10195643
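The abstract does not give the formulas behind the two measures. The sketch below uses two definitions that are common in the literature, a sum-of-squares R2 and a deviance-based R2, as stand-ins; they are assumptions and may differ from what the SAS macro computes. The small-sample adjusted versions are not shown, and the data are made up.

```python
import math

def r2_sum_of_squares(y, p):
    """1 - SSE/SST using raw residuals y - p (assumed definition)."""
    ybar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

def binomial_deviance(y, p):
    """-2 * log-likelihood for binary outcomes y and fitted probabilities p."""
    return -2.0 * sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    )

def r2_deviance(y, p):
    """1 - D(model)/D(null), where the null model predicts the overall mean."""
    ybar = sum(y) / len(y)
    null = binomial_deviance(y, [ybar] * len(y))
    return 1.0 - binomial_deviance(y, p) / null

# Made-up outcomes and fitted probabilities from some logistic model.
y = [0, 0, 1, 1]
p = [0.2, 0.3, 0.7, 0.8]
r2_ss = r2_sum_of_squares(y, p)
r2_d = r2_deviance(y, p)
```

The two values generally differ, which is one reason no single definition has become standard for logistic regression.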
Improving phylogenetic regression under complex evolutionary models.
Mazel, Florent; Davies, T Jonathan; Georges, Damien; Lavergne, Sébastien; Thuiller, Wilfried; Peres-Neto, Pedro R
2016-02-01
Phylogenetic Generalized Least Square (PGLS) is the tool of choice among phylogenetic comparative methods to measure the correlation between species features such as morphological and life-history traits or niche characteristics. In its usual form, it assumes that the residual variation follows a homogenous model of evolution across the branches of the phylogenetic tree. Since a homogenous model of evolution is unlikely to be realistic in nature, we explored the robustness of the phylogenetic regression when this assumption is violated. We did so by simulating a set of traits under various heterogeneous models of evolution, and evaluating the statistical performance (type I error [the percentage of tests based on samples that incorrectly rejected a true null hypothesis] and power [the percentage of tests that correctly rejected a false null hypothesis]) of classical phylogenetic regression. We found that PGLS has good power but unacceptable type I error rates. This finding is important since this method has been increasingly used in comparative analyses over the last decade. To address this issue, we propose a simple solution based on transforming the underlying variance-covariance matrix to adjust for model heterogeneity within PGLS. We suggest that heterogeneous rates of evolution might be particularly prevalent in large phylogenetic trees, while most current approaches assume a homogenous rate of evolution. Our analysis demonstrates that overlooking rate heterogeneity can result in inflated type I errors, thus misleading comparative analyses. We show that it is possible to correct for this bias even when the underlying model of evolution is not known a priori. PMID:27145604
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
NASA Astrophysics Data System (ADS)
Darnah
2016-04-01
Poisson regression is used when the response variable is count data following a Poisson distribution. The Poisson distribution assumes equidispersion (variance equal to the mean). In practice, count data are often overdispersed or underdispersed, so Poisson regression is inappropriate: it may underestimate the standard errors and overstate the significance of the regression parameters, and consequently give misleading inference about the regression parameters. This paper suggests the generalized Poisson regression model for handling overdispersion and underdispersion in the Poisson regression model. The Poisson regression model and the generalized Poisson regression model are applied to the number of filariasis cases in East Java. Based on the Poisson regression model, the factors influencing filariasis are the percentage of families who do not practice clean and healthy living and the percentage of families who do not have a healthy house. The Poisson regression model exhibits overdispersion, so we use generalized Poisson regression. In the best generalized Poisson regression model, the factor influencing filariasis is the percentage of families who do not have a healthy house. The model is interpreted as follows: each additional 1 percent of families who do not have a healthy house adds 1 filariasis patient.
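A quick way to see whether equidispersion is plausible is to compare the sample variance of the counts with their mean. The sketch below is a crude, covariate-free check and not the test used in the paper; the counts are made up and the function name is hypothetical.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    return variance(counts) / mean(counts)

# Made-up case counts for ten districts.
counts = [0, 1, 0, 2, 9, 0, 1, 12, 0, 3]
ratio = dispersion_ratio(counts)

# A ratio well above 1 suggests overdispersion; well below 1, underdispersion.
overdispersed = ratio > 1.0
```

In a regression setting the analogous check divides the Pearson chi-square statistic of the fitted model by its residual degrees of freedom.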
Climate Change Projections Using Regional Regression Models
NASA Astrophysics Data System (ADS)
Griffis, V. W.; Gyawali, R.; Watkins, D. W.
2012-12-01
A typical approach to project climate change impacts on water resources systems is to downscale general circulation model (GCM) or regional climate model (RCM) outputs as forcing data for a watershed model. With downscaled climate model outputs becoming readily available, multi-model ensemble approaches incorporating multiple GCMs, multiple emissions scenarios and multiple initializations are increasingly being used. While these multi-model climate ensembles represent a range of plausible futures, different hydrologic models and methods may complicate impact assessment. In particular, associated loss, flow routing, snowmelt and evapotranspiration computation methods can markedly increase hydrological modeling uncertainty. Other challenges include properly calibrating and verifying the watershed model and maintaining a consistent energy budget between climate and hydrologic models. An alternative approach, particularly appealing for ungauged basins or locations where record lengths are short, is to directly predict selected streamflow quantiles from regional regression equations that include physical basin characteristics as well as meteorological variables output by climate models (Fennessey 2011). Two sets of regional regression models are developed for the Great Lakes states using ordinary least squares and weighted least squares regression. The regional regression modeling approach is compared with physically based hydrologic modeling approaches for selected Great Lakes watersheds using downscaled outputs from the Coupled Model Intercomparison Project (CMIP3) as inputs to the Large Basin Runoff Model (LBRM) and the U.S. Army Corps Hydrologic Modeling System (HEC-HMS).
A new bivariate negative binomial regression model
NASA Astrophysics Data System (ADS)
Faroughi, Pouya; Ismail, Noriszura
2014-12-01
This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB-1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on a Malaysian motor insurance dataset. The results indicated that BNB-1 regression has a better fit than bivariate Poisson and BNB-2 models with regard to the Akaike information criterion.
A Logistic Regression Model for Personnel Selection.
ERIC Educational Resources Information Center
Raju, Nambury S.; And Others
1991-01-01
A two-parameter logistic regression model for personnel selection is proposed. The model was tested with a database of 84,808 military enlistees. The probability of job success was related directly to trait levels, addressing such topics as selection, validity generalization, employee classification, selection bias, and utility-based fair…
Evaluating Aptness of a Regression Model
ERIC Educational Resources Information Center
Matson, Jack E.; Huguenard, Brian R.
2007-01-01
The data for 104 software projects is used to develop a linear regression model that uses function points (a measure of software project size) to predict development effort. The data set is particularly interesting in that it violates several of the assumptions required of a linear model; but when the data are transformed, the data set satisfies…
A Spline Regression Model for Latent Variables
ERIC Educational Resources Information Center
Harring, Jeffrey R.
2014-01-01
Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…
Heritability Estimation using Regression Models for Correlation
Lee, Hye-Seung; Paik, Myunghee Cho; Rundek, Tatjana; Sacco, Ralph L; Dong, Chuanhui; Krischer, Jeffrey P
2012-01-01
Heritability estimates a polygenic effect on a trait for a population. Reliable interpretation of heritability is critical in planning further genetic studies to locate a gene responsible for the trait. This study accommodates both single and multiple trait cases by employing regression models for the correlation parameter to infer the heritability. Sharing the properties of the regression approach, the proposed methods are flexible enough to incorporate non-genetic and/or non-additive genetic information in the analysis. The performances of the proposed model are compared with those using the likelihood approach through simulations and an analysis of carotid intima media thickness from the Northern Manhattan Family Study. PMID:22457844
Regression models for estimating coseismic landslide displacement
Jibson, R.W.
2007-01-01
Newmark's sliding-block model is widely used to estimate coseismic slope performance. Early efforts to develop simple regression models to estimate Newmark displacement were based on analysis of the small number of strong-motion records then available. The current availability of a much larger set of strong-motion records dictates that these regression equations be updated. Regression equations were generated using data derived from a collection of 2270 strong-motion records from 30 worldwide earthquakes. The regression equations predict Newmark displacement in terms of (1) critical acceleration ratio, (2) critical acceleration ratio and earthquake magnitude, (3) Arias intensity and critical acceleration, and (4) Arias intensity and critical acceleration ratio. These equations are well constrained and fit the data well (71% < R2 < 88%), but they have standard deviations of about 0.5 log units, such that the range defined by the mean ± one standard deviation spans about an order of magnitude. These regression models, therefore, are not recommended for use in site-specific design, but rather for regional-scale seismic landslide hazard mapping or for rapid preliminary screening of sites. © 2007 Elsevier B.V. All rights reserved.
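The "order of magnitude" remark follows from simple arithmetic, assuming the log units are base 10: a band of plus or minus 0.5 log units runs from the prediction divided by 10^0.5 to the prediction times 10^0.5, a ratio of 10. A minimal sketch, with a hypothetical function name and a made-up predicted value:

```python
def one_sigma_range(predicted, sigma_log10=0.5):
    """Return (lower, upper) for a prediction whose scatter is given in log10 units."""
    factor = 10.0 ** sigma_log10
    return predicted / factor, predicted * factor

# Hypothetical predicted displacement of 10 (units as in the regression).
lower, upper = one_sigma_range(10.0)
spread = upper / lower  # equals 10 ** (2 * 0.5)
```

This spread is why the abstract recommends the equations for regional screening rather than site-specific design.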
Quality Reporting of Multivariable Regression Models in Observational Studies
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M.
2016-01-01
Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE. Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0–30.3) of the articles and 18.5% (95% CI: 14.8–22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature. PMID:27196467
A new method for dealing with measurement error in explanatory variables of regression models.
Freedman, Laurence S; Fainberg, Vitaly; Kipnis, Victor; Midthune, Douglas; Carroll, Raymond J
2004-03-01
We introduce a new method, moment reconstruction, of correcting for measurement error in covariates in regression models. The central idea is similar to regression calibration in that the values of the covariates that are measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is the variance-preserving empirical Bayes estimate of the true value conditional on the outcome variable. The adjusted values thereby have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. We show that moment reconstruction is equivalent to regression calibration in the case of linear regression, but leads to different results for logistic regression. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, we show that the resulting estimates of the regression coefficients are consistent. In simulations we demonstrate that for logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean-square error to the standard regression calibration approach. Finally, we give an example of the use of moment reconstruction in linear discriminant analysis and a nonstandard problem where we wish to adjust a classification tree for measurement error in the explanatory variables. PMID:15032787
A Skew-Normal Mixture Regression Model
ERIC Educational Resources Information Center
Liu, Min; Lin, Tsung-I
2014-01-01
A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…
Student Selection and the Special Regression Model.
ERIC Educational Resources Information Center
Deck, Dennis D.
The feasibility of constructing composite scores which will yield pretest measures having all the properties required by the special regression model is explored as an alternative to the single pretest score usually used in student selection for Elementary Secondary Education Act Title I compensatory education programs. Reading data, including…
Bootstrap inference longitudinal semiparametric regression model
NASA Astrophysics Data System (ADS)
Pane, Rahmawati; Otok, Bambang Widjanarko; Zain, Ismaini; Budiantara, I. Nyoman
2016-02-01
Semiparametric regression contains two components: a parametric component and a nonparametric component. The semiparametric regression model is represented by $y_{ti} = \mu(\tilde{x}'_{ti}, z_{ti}) + \varepsilon_{ti}$, where $\mu(\tilde{x}'_{ti}, z_{ti}) = \tilde{x}'_{ti}\tilde{\beta} + g(z_{ti})$ and $y_{ti}$ is the response variable. The response is assumed to have a linear relationship with the predictor variables $\tilde{x}'_{ti} = (x_{1i1}, x_{2i2}, \ldots, x_{Tir})$. The random error $\varepsilon_{ti}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$, is normally distributed with zero mean and variance $\sigma^2$, and $g(z_{ti})$ is a nonparametric component. The results of this study show that the PLS approach to longitudinal semiparametric regression models yields the estimators $\hat{\tilde{\beta}}_t = [X'H(\lambda)X]^{-1}X'H(\lambda)\tilde{y}$ and $\hat{\tilde{g}}_\lambda(z) = M(\lambda)\tilde{y}$. The results also show that the bootstrap is valid for the longitudinal semiparametric regression model with $\hat{g}_\lambda^{(b)}(z)$ as the nonparametric component estimator.
Validation of a heteroscedastic hazards regression model.
Wu, Hong-Dar Isaac; Hsieh, Fushing; Chen, Chen-Hsin
2002-03-01
A Cox-type regression model accommodating heteroscedasticity, with a power factor of the baseline cumulative hazard, is investigated for analyzing data with crossing hazards behavior. Since the approach of partial likelihood cannot eliminate the baseline hazard, an overidentified estimating equation (OEE) approach is introduced in the estimation procedure. Its by-product, a model checking statistic, is presented to test for the overall adequacy of the heteroscedastic model. Further, under the heteroscedastic model setting, we propose two statistics to test the proportional hazards assumption. Implementation of this model is illustrated in a data analysis of a cancer clinical trial. PMID:11878222
Algamal, Zakariya Yahya; Lee, Muhammad Hisyam
2015-12-01
Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, which is called the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to tackle both estimating the gene coefficients and performing gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weight. However, using this weight may not be preferable for two reasons: first, the elastic net estimator is biased in selecting genes; second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and encourage grouping effects simultaneously. The real data results indicate that AAElastic is significantly consistent in selecting genes compared to the other three competitor regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method in the field of high-dimensional cancer classification. PMID:26520484
Modeling confounding by half-sibling regression
Schölkopf, Bernhard; Hogg, David W.; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas
2016-01-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as “half-sibling regression,” is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154
Modeling confounding by half-sibling regression.
Schölkopf, Bernhard; Hogg, David W; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas
2016-07-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154
General Regression and Representation Model for Classification
Qian, Jianjun; Yang, Jian; Xu, Yong
2014-01-01
Recently, the regularized coding-based classification methods (e.g. SRC and CRC) show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients) and the specific information (weight matrix of image pixels) to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR) and robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms. PMID:25531882
An operational GLS model for hydrologic regression
Tasker, Gary D.; Stedinger, J.R.
1989-01-01
Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities. © 1989.
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. PMID:26188633
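The two SIR-motivated covariates mentioned above, the logarithm of lagged disease counts and sums of past cases, can be built from a case series as follows. This is a sketch under assumptions: the function names are hypothetical, the counts are made up, and the offset of 1 added before taking logs (to avoid log of zero) is a choice made here, not something the abstract specifies.

```python
import math

def lagged_log_counts(cases, lag=1, offset=1.0):
    """log(cases[t - lag] + offset); None where the lagged value is unavailable."""
    return [
        None if t < lag else math.log(cases[t - lag] + offset)
        for t in range(len(cases))
    ]

def past_case_sums(cases, window):
    """Sum of cases over the previous `window` periods, excluding the current one."""
    return [
        None if t < window else sum(cases[t - window:t])
        for t in range(len(cases))
    ]

# Made-up weekly case counts.
cases = [3, 0, 5, 8, 2, 1]
log_lag1 = lagged_log_counts(cases, lag=1)   # autocorrelation control term
immune_proxy = past_case_sums(cases, window=3)  # proxy for the immune population
```

Both series would then enter a quasi-Poisson or negative binomial regression alongside the weather terms.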
Regression models for expected length of stay.
Grand, Mia Klinten; Putter, Hein
2016-03-30
In multi-state models, the expected length of stay (ELOS) in a state is not a straightforward object to relate to covariates, and the traditional approach has instead been to construct regression models for the transition intensities and calculate ELOS from these. The disadvantage of this approach is that the effect of covariates on the intensities is not easily translated into the effect on ELOS, and it typically relies on the Markov assumption. We propose to use pseudo-observations to construct regression models for ELOS, thereby allowing a direct interpretation of covariate effects while at the same time avoiding the Markov assumption. For this approach, all we need is a non-parametric consistent estimator for ELOS. For every subject (and for every state of interest), a pseudo-observation is constructed, and they are then used as outcome variables in the regression model. We furthermore show how to construct longitudinal (pseudo-) data when combining the concept of pseudo-observations with landmarking. In doing so, covariates are allowed to be time-varying, and we can investigate potential time-varying effects of the covariates. The models can be fitted using generalized estimating equations, and dependence between observations on the same subject is handled by applying the sandwich estimator. The method is illustrated using data from the US Health and Retirement Study where the impact of socio-economic factors on ELOS in health and disability is explored. Finally, we investigate the performance of our approach under different degrees of left-truncation, non-Markovianity, and right-censoring by means of simulation. PMID:26497637
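The abstract does not spell out how a pseudo-observation is built. The sketch below assumes the usual jackknife-type construction from the pseudo-observation literature: for subject i, n times the full-sample estimate minus (n - 1) times the estimate with subject i left out. The sample mean stands in for the non-parametric ELOS estimator purely for illustration, since it ignores censoring and truncation; the function name and data are hypothetical.

```python
from statistics import mean

def pseudo_observations(data, estimator):
    """Jackknife-type pseudo-observations: n * theta_hat - (n - 1) * theta_hat_minus_i."""
    n = len(data)
    full = estimator(data)
    return [
        n * full - (n - 1) * estimator(data[:i] + data[i + 1:])
        for i in range(n)
    ]

# Made-up, fully observed lengths of stay (years).
stays = [2.0, 5.0, 1.0, 8.0]
pseudo = pseudo_observations(stays, mean)
```

With the sample mean as estimator the pseudo-observations reproduce the original values, which shows why they can act as outcome variables when the raw outcomes are only partly observed.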
Quantile regression modeling for Malaysian automobile insurance premium data
NASA Astrophysics Data System (ADS)
Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz
2015-09-01
Quantile regression is more robust to outliers than mean regression models. Traditional mean regression models such as the Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of the response variable (pure premium). We then compare the results of the quantile regression model with those of the Gamma regression model. The results from quantile regression show that some rating classes increase as quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = 0.5) is always smaller than that of Gamma regression for all risk factors.
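The robustness claim can be illustrated with the check (pinball) loss that quantile regression minimizes. The sketch below covers only the intercept-only case with made-up premiums: at τ = 0.5 the minimizer is the sample median, which a single extreme premium barely moves, whereas the mean is pulled far upward. Function names are hypothetical.

```python
from statistics import mean

def check_loss(residual, tau):
    """Pinball loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def total_check_loss(values, q, tau):
    """Total check loss when every value is predicted by the constant q."""
    return sum(check_loss(v - q, tau) for v in values)

# Made-up premiums with one extreme value.
premiums = [100.0, 120.0, 130.0, 150.0, 900.0]

# Search the observed values for the constant minimizing the loss at tau = 0.5.
best_median = min(premiums, key=lambda q: total_check_loss(premiums, q, 0.5))
average = mean(premiums)
```

Here `best_median` is 130.0 while `average` is 280.0; a full quantile regression replaces the constant q with a linear function of the rating factors.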
Regression models for convex ROC curves.
Lloyd, C J
2000-09-01
The performance of a diagnostic test is summarized by its receiver operating characteristic (ROC) curve. Under quite natural assumptions about the latent variable underlying the test, the ROC curve is convex. Empirical data on a test's performance often comes in the form of observed true positive and false positive relative frequencies under varying conditions. This paper describes a family of regression models for analyzing such data. The underlying ROC curves are specified by a quality parameter delta and a shape parameter mu and are guaranteed to be convex provided delta > 1. Both the position along the ROC curve and the quality parameter delta are modeled linearly with covariates at the level of the individual. The shape parameter mu enters the model through the link functions log(p^mu) - log(1 - p^mu) of a binomial regression and is estimated either by search or from an appropriate constructed variate. One simple application is to the meta-analysis of independent studies of the same diagnostic test, illustrated on some data of Moses, Shapiro, and Littenberg (1993). A second application, to so-called vigilance data, is given, where ROC curves differ across subjects and modeling of the position along the ROC curve is of primary interest. PMID:10985227
Regression Models For Saffron Yields in Iran
NASA Astrophysics Data System (ADS)
Sanaeinejad, S. H.; Hosseini, S. N.
Saffron is an important crop, both socially and economically, in Khorassan Province (northeast Iran). In this research we tried to evaluate trends in saffron yield in recent years and to study the relationship between saffron yield and climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in the cities of Birjand, Ghaen and Ferdows. Climatological data for the same periods were provided by the database of the Khorassan Climatology Center. Climatological data included temperature, rainfall, relative humidity and sunshine hours for Model I, and temperature and rainfall for Model II. The results showed that the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. The coefficients of determination for the same cities for Model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysis indicated that, among weather variables, temperature was the key parameter for variation of saffron yield. It was concluded that increasing temperature in spring was the main cause of declining saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.
Ho Hoang, Khai-Long; Mombaur, Katja
2015-10-15
Dynamic modeling of the human body is an important tool to investigate the fundamentals of the biomechanics of human movement. To model the human body in terms of a multi-body system, it is necessary to know the anthropometric parameters of the body segments. For young healthy subjects, several data sets exist that are widely used in the research community, e.g. the tables provided by de Leva. No such comprehensive anthropometric parameter sets exist for elderly people. It is, however, well known that body proportions change significantly during aging, e.g. due to degenerative effects in the spine, such that parameters for young people cannot be used for realistically simulating the dynamics of elderly people. In this study, regression equations are derived from the inertial parameters, center of mass positions, and body segment lengths provided by de Leva to be adjustable to the changes in proportion of the body parts of male and female humans due to aging. Additional adjustments are made to the reference points of the parameters for the upper body segments as they are chosen in a more practicable way in the context of creating a multi-body model in a chain structure with the pelvis representing the most proximal segment. PMID:26338096
Ong, Hong Choon; Alih, Ekele
2015-01-01
The tendency for experimental and industrial variables to include a certain proportion of outliers has become a rule rather than an exception. These clusters of outliers, if left undetected, have the capability to distort the mean and the covariance matrix of the Hotelling’s T² multivariate control charts constructed to monitor individual quality characteristics. The effect of this distortion is that the control chart constructed from it becomes unreliable as it exhibits masking and swamping, phenomena in which an out-of-control process is erroneously declared in control or an in-control process is erroneously declared out of control. To handle these problems, this article proposes a control chart that is based on cluster-regression adjustment for retrospective monitoring of individual quality characteristics in a multivariate setting. The performance of the proposed method is investigated through Monte Carlo simulation experiments and historical datasets. Results obtained indicate that the proposed method is an improvement over the state-of-the-art methods in terms of outlier detection as well as keeping masking and swamping rates under control. PMID:25923739
Flexible regression models over river networks
O’Donnell, David; Rushworth, Alastair; Bowman, Adrian W; Marian Scott, E; Hallard, Mark
2014-01-01
Many statistical models are available for spatial data but the vast majority of these assume that spatial separation can be measured by Euclidean distance. Data which are collected over river networks constitute a notable and commonly occurring exception, where distance must be measured along complex paths and, in addition, account must be taken of the relative flows of water into and out of confluences. Suitable models for this type of data have been constructed based on covariance functions. The aim of the paper is to place the focus on underlying spatial trends by adopting a regression formulation and using methods which allow smooth but flexible patterns. Specifically, kernel methods and penalized splines are investigated, with the latter proving more suitable from both computational and modelling perspectives. In addition to their use in a purely spatial setting, penalized splines also offer a convenient route to the construction of spatiotemporal models, where data are available over time as well as over space. Models which include main effects and spatiotemporal interactions, as well as seasonal terms and interactions, are constructed for data on nitrate pollution in the River Tweed. The results give valuable insight into the changes in water quality in both space and time. PMID:25653460
Reconstruction of missing daily streamflow data using dynamic regression models
NASA Astrophysics Data System (ADS)
Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault
2015-12-01
River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data-gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data of incomplete data sets is an important step regarding the performance of the environmental models, engineering, and research applications, thus it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average models (ARIMA) called dynamic regression model. This model uses the linear relationship between neighbor and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.
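The two-stage idea described above (regress on a neighbouring station, then model the residuals as a time series) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' procedure: a single neighbour station is used, the ARIMA structure is reduced to AR(1), and the data and the function names `fit_ols`, `fit_ar1` and `predict_next` are hypothetical.

```python
def fit_ols(x, y):
    """Least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_ar1(residuals):
    """Lag-1 autocorrelation (Yule-Walker style) of zero-mean residuals."""
    num = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum(r * r for r in residuals)
    return num / den


def predict_next(intercept, slope, phi, x_next, last_residual):
    """Regression prediction plus the AR(1) forecast of the next residual."""
    return intercept + slope * x_next + phi * last_residual


# Hypothetical daily flows: neighbour station (predictor) and target station.
neighbour = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [6.0, 9.0, 9.0, 12.0, 18.0, 21.0]

intercept, slope = fit_ols(neighbour, target)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(neighbour, target)]
phi = fit_ar1(residuals)

# Estimate a missing target value for the day after the record, given the
# neighbour's observed flow on that day (7.0 is an assumed value).
estimate = predict_next(intercept, slope, phi, 7.0, residuals[-1])
print(intercept, slope, phi, estimate)
```

In practice the residual model would be chosen among ARIMA orders and the regression would use several correlated stations; the sketch only shows how the residual term adjusts the plain regression estimate.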
ERIC Educational Resources Information Center
Hedeker, Donald; And Others
1994-01-01
Proposes random-effects regression model for analysis of clustered data. Suggests model assumes some dependency of within-cluster data. Model adjusts effects for resulting dependency from data clustering. Describes maximum marginal likelihood solution. Discusses available statistical software. Demonstrates model via investigation involving…
Analyzing Historical Count Data: Poisson and Negative Binomial Regression Models.
ERIC Educational Resources Information Center
Beck, E. M.; Tolnay, Stewart E.
1995-01-01
Asserts that traditional approaches to multivariate analysis, including standard linear regression techniques, ignore the special character of count data. Explicates three suitable alternatives to standard regression techniques, a simple Poisson regression, a modified Poisson regression, and a negative binomial model. (MJP)
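One reason count data call for these alternatives is overdispersion: a Poisson model assumes the variance equals the mean, whereas a negative binomial model allows the variance to exceed it. A minimal sketch of a descriptive check follows; the counts and the function name `dispersion_ratio` are hypothetical, and a ratio well above 1 is only an informal signal, not a formal test.

```python
def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical yearly event counts (not data from the article).
counts = [0, 0, 1, 0, 2, 0, 9, 0, 1, 7]
ratio = dispersion_ratio(counts)

# A ratio far above 1 suggests a simple Poisson regression would understate
# the variability, which is the situation a negative binomial model addresses.
print(round(ratio, 2))
```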
Three-Dimensional Modeling in Linear Regression.
ERIC Educational Resources Information Center
Herman, James D.
Linear regression examines the relationship between one or more independent (predictor) variables and a dependent variable. By using a particular formula, regression determines the weights needed to minimize the error term for a given set of predictors. With one predictor variable, the relationship between the predictor and the dependent variable…
Overpaying morbidity adjusters in risk equalization models.
van Kleef, R C; van Vliet, R C J A; van de Ven, W P M M
2016-09-01
Most competitive social health insurance markets include risk equalization to compensate insurers for predictable variation in healthcare expenses. Empirical literature shows that even the most sophisticated risk equalization models, with advanced morbidity adjusters, substantially undercompensate insurers for selected groups of high-risk individuals. In the presence of premium regulation, these undercompensations confront consumers and insurers with incentives for risk selection. An important reason for the undercompensations is that not all information with predictive value regarding healthcare expenses is appropriate for use as a morbidity adjuster. To reduce incentives for selection regarding specific groups we propose overpaying morbidity adjusters that are already included in the risk equalization model. This paper illustrates the idea of overpaying by merging data on morbidity adjusters and healthcare expenses with health survey information, and derives three preconditions for meaningful application. Given these preconditions, we think overpaying may be particularly useful for pharmacy-based cost groups. PMID:26420555
NASA Astrophysics Data System (ADS)
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Spatial regression models for extreme precipitation in Belgium
NASA Astrophysics Data System (ADS)
van de Vyver, H.
2012-09-01
Quantification of precipitation extremes is important for flood planning purposes, and a common measure of extreme events is the T year return level. Extreme precipitation depths in Belgium are analyzed for accumulation durations ranging from 10 min to 30 days. Spatial generalized extreme value (GEV) models are presented by considering multisite data and relating GEV parameters to geographical/climatological covariates through a common regression relationship. Methods of combining data from several sites are in common use, and in such cases, there is likely to be nonnegligible intersite dependence. However, parameter estimation in GEV models is generally done with the maximum likelihood estimation method (MLE) that assumes independence. Estimates of uncertainty are adjusted for spatial dependence using methodologies proposed earlier. Consistency of GEV distributions for various durations is obtained by fitting a smooth function to the preliminary estimations of the shape parameter. Model quality has been assessed by various statistical tests and indicates the relevance of our approach. In addition, a methodology is applied to account for the fact that measurements have been made in fixed intervals (usually 09:00 UTC-09:00 UTC). The distribution of the annual sliding 24 h maxima was specified through extremal indices of a more than 110 year time series of 24 h aggregated 10 min rainfall and daily rainfall. Finally, the selected models are used for producing maps of precipitation return levels.
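The T year return level mentioned above follows directly from the GEV distribution function: it is the value exceeded with probability 1/T in any one year. The sketch below uses the standard GEV parameterisation (location `mu`, scale `sigma`, shape `xi`, with the Gumbel limit at `xi = 0`); the parameter values are hypothetical, and in the spatial models of the abstract they would come from regressions on geographical or climatological covariates.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function; xi = 0 is treated as the Gumbel limit."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(T, mu, sigma, xi):
    """Level exceeded on average once every T years (annual maxima)."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical GEV parameters for 24 h precipitation depth in mm.
mu, sigma, xi = 30.0, 8.0, 0.1
level_10 = gev_return_level(10, mu, sigma, xi)
level_100 = gev_return_level(100, mu, sigma, xi)
print(round(level_10, 1), round(level_100, 1))
```

Evaluating `gev_cdf` at a return level recovers the non-exceedance probability 1 - 1/T, which is a convenient consistency check.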
Modeling maximum daily temperature using a varying coefficient regression model
NASA Astrophysics Data System (ADS)
Li, Han; Deng, Xinwei; Kim, Dong-Yun; Smith, Eric P.
2014-04-01
Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature. A good predictive model for daily maximum temperature is required because daily maximum temperature is an important measure for predicting survival of temperature sensitive fish. To appropriately model the strong relationship between water and air temperatures at a daily time step, it is important to incorporate information related to the time of the year into the modeling. In this work, a time-varying coefficient model is used to study the relationship between air temperature and water temperature. The time-varying coefficient model enables dynamic modeling of the relationship, and can be used to understand how the air-water temperature relationship varies over time. The proposed model is applied to 10 streams in Maryland, West Virginia, Virginia, North Carolina, and Georgia using daily maximum temperatures. It provides a better fit and better predictions than those produced by a simple linear regression model or a nonlinear logistic model.
Dehesh, Tania; Zare, Najaf; Ayatollahi, Seyyed Mohammad Taghi
2015-01-01
Background. Univariate meta-analysis (UM) procedure, as a technique that provides a single overall result, has become increasingly popular. Neglecting the existence of other concomitant covariates in the models leads to loss of treatment efficiency. Our aim was to propose four new approximation approaches for the covariance matrix of the coefficients, which is not readily available for the multivariate generalized least square (MGLS) method as a multivariate meta-analysis approach. Methods. We evaluated the efficiency of four new approaches including zero correlation (ZC), common correlation (CC), estimated correlation (EC), and multivariate multilevel correlation (MMC) on the estimation bias, mean square error (MSE), and 95% probability coverage of the confidence interval (CI) in the synthesis of Cox proportional hazard model coefficients in a simulation study. Result. Comparing the results of the simulation study on the MSE, bias, and CI of the estimated coefficients indicated that the MMC approach was the most accurate procedure compared to the EC, CC, and ZC procedures. The precision ranking of the four approaches according to all above settings was MMC ≥ EC ≥ CC ≥ ZC. Conclusion. This study highlights advantages of MGLS meta-analysis over the UM approach. The results suggested the use of the MMC procedure to overcome the lack of information for having a complete covariance matrix of the coefficients. PMID:26413142
An Importance Sampling EM Algorithm for Latent Regression Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2007-01-01
Reporting methods used in large-scale assessments such as the National Assessment of Educational Progress (NAEP) rely on latent regression models. To fit the latent regression model using the maximum likelihood estimation technique, multivariate integrals must be evaluated. In the computer program MGROUP used by the Educational Testing Service for…
Tolerance bounds for log gamma regression models
NASA Technical Reports Server (NTRS)
Jones, R. A.; Scholz, F. W.; Ossiander, M.; Shorack, G. R.
1985-01-01
The present procedure for finding lower confidence bounds for the quantiles of Weibull populations, on the basis of the solution of a quadratic equation, is more accurate than current Monte Carlo tables and extends to any location-scale family. It is shown that this method is accurate for all members of the log gamma(K) family, where K = 1/2 to infinity, and works well for censored data, while also extending to regression data. An even more accurate procedure involving an approximation to the Lawless (1982) conditional procedure, with numerical integrations whose tables are independent of the data, is also presented. These methods are applied to the case of failure strengths of ceramic specimens from each of three billets of Si3N4, which have undergone flexural strength testing.
Impact of multicollinearity on small sample hydrologic regression models
NASA Astrophysics Data System (ADS)
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R²). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
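The variance inflation factor used for screening has a simple closed form when there are only two explanatory variables: VIF = 1 / (1 - r²), where r is their correlation. The sketch below illustrates that special case only; the basin descriptors and the function names `pearson_r` and `vif_two_predictors` are hypothetical, and the screening threshold of 10 is a common rule of thumb rather than a value taken from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical basin descriptors for five gauged sites.
log_area = [1.0, 2.0, 3.0, 4.0, 5.0]
log_stream_length = [1.1, 1.9, 3.2, 3.9, 5.1]   # nearly collinear with area
slope_index = [1.0, -1.0, 0.0, -1.0, 1.0]        # uncorrelated with area

vif_collinear = vif_two_predictors(log_area, log_stream_length)
vif_independent = vif_two_predictors(log_area, slope_index)

# Rule-of-thumb screening (assumed threshold): drop one predictor if VIF > 10.
print(round(vif_collinear, 1), round(vif_independent, 1))
```

With more than two predictors, r² is replaced by the R² from regressing each predictor on all the others.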
Quantile Regression Adjusting for Dependent Censoring from Semi-Competing Risks
Li, Ruosha; Peng, Limin
2014-01-01
Summary. In this work, we study quantile regression when the response is an event time subject to potentially dependent censoring. We consider the semi-competing risks setting, where time to censoring remains observable after the occurrence of the event of interest. While such a scenario frequently arises in biomedical studies, most current quantile regression methods for censored data are not applicable because they generally require that the censoring time and the event time be independent. By imposing rather mild assumptions on the association structure between the time-to-event response and the censoring time variable, we propose quantile regression procedures, which allow us to garner a comprehensive view of the covariate effects on the event time outcome as well as to examine the informativeness of censoring. An efficient and stable algorithm is provided for implementing the new method. We establish the asymptotic properties of the resulting estimators including uniform consistency and weak convergence. The theoretical development may serve as a useful template for addressing estimating settings that involve stochastic integrals. Extensive simulation studies suggest that the proposed method performs well with moderate sample sizes. We illustrate the practical utility of our proposals through an application to a bone marrow transplant trial. PMID:25574152
Comparison of Logistic Regression and Linear Regression in Modeling Percentage Data
Zhao, Lihui; Chen, Yuhuan; Schaffner, Donald W.
2001-01-01
Percentage is widely used to describe different results in food microbiology, e.g., probability of microbial growth, percent inactivated, and percent of positive samples. Four sets of percentage data, percent-growth-positive, germination extent, probability for one cell to grow, and maximum fraction of positive tubes, were obtained from our own experiments and the literature. These data were modeled using linear and logistic regression. Five methods were used to compare the goodness of fit of the two models: percentage of predictions closer to observations, range of the differences (predicted value minus observed value), deviation of the model, linear regression between the observed and predicted values, and bias and accuracy factors. Logistic regression was a better predictor of at least 78% of the observations in all four data sets. In all cases, the deviation of logistic models was much smaller. The linear correlation between observations and logistic predictions was always stronger. Validation (accomplished using part of one data set) also demonstrated that the logistic model was more accurate in predicting new data points. Bias and accuracy factors were found to be less informative when evaluating models developed for percentage data, since neither of these indices can compare predictions at zero. Model simplification for the logistic model was demonstrated with one data set. The simplified model was as powerful in making predictions as the full linear model, and it also gave clearer insight in determining the key experimental factors. PMID:11319091
Competing Risk Regression Models for Epidemiologic Data
Cole, Stephen R.; Gange, Stephen J.
2009-01-01
Competing events can preclude the event of interest from occurring in epidemiologic data and can be analyzed by using extensions of survival analysis methods. In this paper, the authors outline 3 regression approaches for estimating 2 key quantities in competing risks analysis: the cause-specific relative hazard (csRH) and the subdistribution relative hazard (sdRH). They compare and contrast the structure of the risk sets and the interpretation of parameters obtained with these methods. They also demonstrate the use of these methods with data from the Women's Interagency HIV Study established in 1993, treating time to initiation of highly active antiretroviral therapy or to clinical disease progression as competing events. In our example, women with an injection drug use history were less likely than those without a history of injection drug use to initiate therapy prior to progression to acquired immunodeficiency syndrome or death by both measures of association (csRH = 0.67, 95% confidence interval: 0.57, 0.80 and sdRH = 0.60, 95% confidence interval: 0.50, 0.71). Moreover, the relative hazards for disease progression prior to treatment were elevated (csRH = 1.71, 95% confidence interval: 1.37, 2.13 and sdRH = 2.01, 95% confidence interval: 1.62, 2.51). Methods for competing risks should be used by epidemiologists, with the choice of method guided by the scientific question. PMID:19494242
Rank-preserving regression: a more robust rank regression model against outliers.
Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M
2016-08-30
Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26934999
A Latent Transition Model with Logistic Regression
ERIC Educational Resources Information Center
Chung, Hwan; Walls, Theodore A.; Park, Yousung
2007-01-01
Latent transition models increasingly include covariates that predict prevalence of latent classes at a given time or transition rates among classes over time. In many situations, the covariate of interest may be latent. This paper describes an approach for handling both manifest and latent covariates in a latent transition model. A Bayesian…
Symbolic regression of generative network models
NASA Astrophysics Data System (ADS)
Menezes, Telmo; Roth, Camille
2014-09-01
Networks are a powerful abstraction with applicability to a variety of scientific fields. Models explaining their morphology and growth processes permit a wide range of phenomena to be more systematically analysed and understood. At the same time, creating such models is often challenging and requires insights that may be counter-intuitive. Yet there currently exists no general method to arrive at better models. We have developed an approach to automatically detect realistic decentralised network growth models from empirical data, employing a machine learning technique inspired by natural selection and defining a unified formalism to describe such models as computer programs. As the proposed method is completely general and does not assume any pre-existing models, it can be applied "out of the box" to any given network. To validate our approach empirically, we systematically rediscover pre-defined growth laws underlying several canonical network generation models and credible laws for diverse real-world networks. We were able to find programs that are simple enough to lead to an actual understanding of the mechanisms proposed, namely for a simple brain and a social network.
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Methods for Adjusting U.S. Geological Survey Rural Regression Peak Discharges in an Urban Setting
Moglen, Glenn E.; Shivers, Dorianne E.
2006-01-01
A study was conducted of 78 U.S. Geological Survey gaged streams that have been subjected to varying degrees of urbanization over the last three decades. Flood-frequency analysis coupled with nonlinear regression techniques were used to generate a set of equations for converting peak discharge estimates determined from rural regression equations to a set of peak discharge estimates that represent known urbanization. Specifically, urban regression equations for the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year return periods were calibrated as a function of the corresponding rural peak discharge and the percentage of impervious area in a watershed. The results of this study indicate that two sets of equations, one set based on imperviousness and one set based on population density, performed well. Both sets of equations are dependent on rural peak discharges, a measure of development (average percentage of imperviousness or average population density), and a measure of homogeneity of development within a watershed. Average imperviousness was readily determined by using geographic information system methods and commonly available land-cover data. Similarly, average population density was easily determined from census data. Thus, a key advantage to the equations developed in this study is that they do not require field measurements of watershed characteristics as did the U.S. Geological Survey urban equations developed in an earlier investigation. During this study, the U.S. Geological Survey PeakFQ program was used as an integral tool in the calibration of all equations. The scarcity of historical land-use data, however, made exclusive use of flow records necessary for the 30-year period from 1970 to 2000. Such relatively short-duration streamflow time series required a nonstandard treatment of the historical data function of the PeakFQ program in comparison to published guidelines. Thus, the approach used during this investigation does not fully comply with the
Ksantini, Riadh; Ziou, Djemel; Colin, Bernard; Dubeau, François
2008-02-01
In this paper, we investigate the effectiveness of a Bayesian logistic regression model to compute the weights of a pseudo-metric, in order to improve its discriminatory capacity and thereby increase image retrieval accuracy. In the proposed Bayesian model, the prior knowledge of the observations is incorporated and the posterior distribution is approximated by a tractable Gaussian form using variational transformation and Jensen's inequality, which allow a fast and straightforward computation of the weights. The pseudo-metric makes use of the compressed and quantized versions of wavelet decomposed feature vectors, and in our previous work, the weights were adjusted by classical logistic regression model. A comparative evaluation of the Bayesian and classical logistic regression models is performed for content-based image retrieval as well as for other classification tasks, in a decontextualized evaluation framework. In this same framework, we compare the Bayesian logistic regression model to some relevant state-of-the-art classification algorithms. Experimental results show that the Bayesian logistic regression model outperforms these linear classification algorithms, and is a significantly better tool than the classical logistic regression model to compute the pseudo-metric weights and improve retrieval and classification performance. Finally, we perform a comparison with results obtained by other retrieval methods. PMID:18084057
Regression Model Optimization for the Analysis of Experimental Data
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
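The search metric described above can be sketched for the simplest possible case. A PRESS residual is the prediction error for a point when the model is fitted without that point; for least squares it equals the ordinary residual divided by (1 - leverage), so no refitting is needed. The example below is an illustration under assumptions, not the Ames algorithm: it compares only two candidate models (mean-only and straight line), takes the sample standard deviation of the PRESS residuals as the metric, and uses hypothetical data and function names.

```python
import math


def fit_line(x, y):
    """Least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b * x_bar, b


def press_residuals_line(x, y):
    """PRESS residuals of the straight-line model via the leverage identity."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a, b = fit_line(x, y)
    out = []
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        out.append((yi - (a + b * xi)) / (1.0 - leverage))
    return out


def press_residuals_mean(y):
    """PRESS residuals of the mean-only model (every leverage equals 1/n)."""
    n = len(y)
    y_bar = sum(y) / n
    return [(yi - y_bar) / (1.0 - 1.0 / n) for yi in y]


def sample_std(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


# Hypothetical calibration data: one regressor, one response.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]

metric_line = sample_std(press_residuals_line(x, y))
metric_mean = sample_std(press_residuals_mean(y))
best = "line" if metric_line < metric_mean else "mean"
print(best)
```

A full search would evaluate many candidate term combinations in the same way and apply the rejection constraints listed in the abstract before picking the minimum.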
Multivariate Regression Models for Estimating Journal Usefulness in Physics.
ERIC Educational Resources Information Center
Bennion, Bruce C.; Karschamroon, Sunee
1984-01-01
This study examines possibility of ranking journals in physics by means of bibliometric regression models that estimate usefulness as it is reported by 167 physicists in United States and Canada. Development of four models, patterns of deviation from models, and validity and application are discussed. Twenty-six references are cited. (EJS)
CONCEPTUAL FRAMEWORK FOR REGRESSION MODELING OF GROUND-WATER FLOW.
Cooley, Richard L.
1985-01-01
The author examines the uses of ground-water flow models and which classes of use require treatment of stochastic components. He then compares traditional and stochastic procedures for modeling actual (as distinguished from hypothetical) systems. Finally, he examines the conceptual basis and characteristics of the regression approach to modeling ground-water flow.
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. PMID:23379851
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Joint regression analysis and AMMI model applied to oat improvement
NASA Astrophysics Data System (ADS)
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely the AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena sativa L.) grain yield was carried out using data from the Portuguese Plant Breeding Board: a sample of 22 different genotypes during the years 2002, 2003 and 2004 in six locations. Ferreira et al. (2006) state the relevance of regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model for studying and estimating phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae package available in R for the AMMI model analysis.
Optimization of Regression Models of Experimental Data Using Confirmation Points
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2010-01-01
A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less than or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance is used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that was already tested at regression model independent confirmation points before it is ever used to predict an unknown response from a set of regressors.
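A minimal sketch of the metric described above, assuming an ordinary least squares fit and the sample standard deviation. The function name `confirmation_search_metric`, the synthetic data, and the choice of every fourth point as a confirmation point are hypothetical and do not come from the report.

```python
import numpy as np

def confirmation_search_metric(X_fit, y_fit, X_conf, y_conf):
    """Return (metric, s_press, s_conf) for one candidate term combination.

    The coefficients are estimated from the fit subset only; the confirmation
    subset is used exclusively to test the fitted model.
    """
    X_fit = np.asarray(X_fit, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    X_conf = np.asarray(X_conf, dtype=float)
    y_conf = np.asarray(y_conf, dtype=float)

    pinv = np.linalg.pinv(X_fit)
    beta = pinv @ y_fit
    leverage = np.diag(X_fit @ pinv)

    # First value: standard deviation of the PRESS residuals of the data points.
    press = (y_fit - X_fit @ beta) / (1.0 - leverage)
    s_press = float(np.std(press, ddof=1))

    # Second value: standard deviation of the residuals at the confirmation points.
    s_conf = float(np.std(y_conf - X_conf @ beta, ddof=1))

    # The greater of the two values is the search metric.
    return max(s_press, s_conf), s_press, s_conf

# Hypothetical data set, split into fit points and confirmation points.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = 0.5 + 1.5 * x + rng.normal(scale=0.05, size=x.size)
X = np.column_stack([np.ones_like(x), x])

is_conf = np.arange(x.size) % 4 == 0  # every fourth point is held out
metric, s_press, s_conf = confirmation_search_metric(
    X[~is_conf], y[~is_conf], X[is_conf], y[is_conf]
)
print(metric, s_press, s_conf)
```

Because the metric is the maximum of the two values, minimizing it bounds both standard deviations at once.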
Default Bayes Factors for Model Selection in Regression
ERIC Educational Resources Information Center
Rouder, Jeffrey N.; Morey, Richard D.
2012-01-01
In this article, we present a Bayes factor solution for inference in multiple regression. Bayes factors are principled measures of the relative evidence from data for various models or positions, including models that embed null hypotheses. In this regard, they may be used to state positive evidence for a lack of an effect, which is not possible…
Simulation study for model performance of multiresponse semiparametric regression
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Haryatmi, Sri; Budiantara, I. Nyoman
2015-12-01
The objective of this paper is to evaluate the performance of the multiresponse semiparametric regression model as a function of both function type and sample size. In general, a multiresponse semiparametric regression model consists of parametric and nonparametric functions. This paper focuses on linear and quadratic functions for the parametric components and a spline function for the nonparametric component. Moreover, this model can also be seen as a spline semiparametric seemingly unrelated regression model. The simulation study evaluates three combinations of parametric and nonparametric components, i.e. linear-trigonometric, quadratic-exponential, and multiple linear-polynomial functions, respectively. Two criteria are used for assessing model performance: R-square and Mean Square Error (MSE). The results show that both function type and sample size significantly influence model performance. In addition, the model yields its best performance at the small sample size with the combination of multiple linear and polynomial functions as the parametric and nonparametric components, respectively. Moreover, model performance at the large sample size tends to be similar for any combination of parametric and nonparametric components.
Evaluation of land use regression models in Detroit, Michigan
Introduction: Land use regression (LUR) models have emerged as a cost-effective tool for characterizing exposure in epidemiologic health studies. However, little critical attention has been focused on validation of these models as a step toward temporal and spatial extension of ...
Rethinking the linear regression model for spatial ecological data.
Wagner, Helene H
2013-11-01
The linear regression model, with its numerous extensions including multivariate ordination, is fundamental to quantitative research in many disciplines. However, spatial or temporal structure in the data may invalidate the regression assumption of independent residuals. Spatial structure at any spatial scale can be modeled flexibly based on a set of uncorrelated component patterns (e.g., Moran's eigenvector maps, MEM) that is derived from the spatial relationships between sampling locations as defined in a spatial weight matrix. Spatial filtering thus addresses spatial autocorrelation in the residuals by adding such component patterns (spatial eigenvectors) as predictors to the regression model. However, space is not an ecologically meaningful predictor, and commonly used tests for selecting significant component patterns do not take into account the specific nature of these variables. This paper proposes "spatial component regression" (SCR) as a new way of integrating the linear regression model with Moran's eigenvector maps. In its unconditioned form, SCR decomposes the relationship between response and predictors by component patterns, whereas conditioned SCR provides an alternative method of spatial filtering, taking into account the statistical properties of component patterns in the design of statistical hypothesis tests. Application to the well-known multivariate mite data set illustrates how SCR may be used to condition for significant residual spatial structure and to identify additional predictors associated with residual spatial structure. Finally, I argue that all variance is spatially structured, hence spatial independence is best characterized by a lack of excess variance at any spatial scale, i.e., spatial white noise. PMID:24400490
The Consequences Of Model Misspecification In Regression Analysis.
Deegan, J
1976-04-01
In ordinary least squares regression analysis the desired property of unbiasedness in estimated coefficients is contingent upon the correspondence of the fitted model with the true underlying data generating process. This paper focuses on developing a systematic characterization of the error forms resulting from model misspecification in single equation models. The consequences of model misspecification, for the error forms identified, are also evaluated. PMID:26821674
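One standard textbook instance of such a misspecification error, given here only as an illustration and not necessarily one of the error forms characterized in the paper, is the omission of relevant regressors. Assume the data are generated by two regressor blocks $X_1$ and $X_2$ (symbols introduced for this sketch) but only $X_1$ is fitted:

```latex
\begin{align*}
  \text{true model:}\quad   & y = X_1\beta_1 + X_2\beta_2 + \varepsilon,
                              \qquad \mathrm{E}[\varepsilon \mid X_1, X_2] = 0, \\
  \text{fitted model:}\quad & y = X_1 b_1 + u, \\
  \text{OLS estimator:}\quad & \hat{b}_1 = (X_1^{\top}X_1)^{-1}X_1^{\top}y
      = \beta_1 + (X_1^{\top}X_1)^{-1}X_1^{\top}X_2\,\beta_2
        + (X_1^{\top}X_1)^{-1}X_1^{\top}\varepsilon, \\
  \text{expectation:}\quad  & \mathrm{E}[\hat{b}_1 \mid X_1, X_2]
      = \beta_1 + (X_1^{\top}X_1)^{-1}X_1^{\top}X_2\,\beta_2 .
\end{align*}
```

The estimator is therefore unbiased only if $\beta_2 = 0$ (the fitted model matches the data generating process) or $X_1^{\top}X_2 = 0$ (the omitted regressors are orthogonal to the included ones).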
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Bader, Jon B.
2009-01-01
Calibration data of a wind tunnel sting balance was processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm's two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation when regression models of balance calibration data can directly be derived from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression
Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.
2014-01-01
The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T² test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPReM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM out-performs other state-of-the-art methods. PMID:26527844
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed. PMID:26609741
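The idea of using the third moment of residuals can be sketched for the simpler bivariate case only. The sketch assumes a skewed predictor and a normally distributed error, in which case the residuals of the correctly directed model are approximately symmetric while those of the reversed model inherit skewness. The function name `residual_skewness`, the data, and the "smaller absolute skewness wins" decision rule are illustrative assumptions; the multiple regression extension and the three inference procedures of the paper are not implemented here.

```python
import numpy as np

def residual_skewness(predictor, outcome):
    """Skewness (standardized third central moment) of OLS residuals."""
    X = np.column_stack([np.ones_like(predictor), predictor])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    r = outcome - X @ beta
    r = r - r.mean()
    return float(np.mean(r**3) / np.mean(r**2) ** 1.5)

# Hypothetical data: x is skewed, the error is normal, and x causes y.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=5000)
y = 0.8 * x + rng.normal(size=5000)

skew_x_to_y = residual_skewness(x, y)  # residuals of the model y ~ x
skew_y_to_x = residual_skewness(y, x)  # residuals of the competing model x ~ y

# Informal decision rule for this sketch: prefer the direction whose
# residuals are closer to symmetric.
preferred = "x -> y" if abs(skew_x_to_y) < abs(skew_y_to_x) else "y -> x"
print(preferred, skew_x_to_y, skew_y_to_x)
```

In practice a formal test of the skewness difference is needed before drawing a conclusion; the point estimate alone does not quantify uncertainty.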
Multiple Response Regression for Gaussian Mixture Models with Known Labels.
Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng
2012-12-01
Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes. PMID:24416092
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, R.M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
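The two influence statistics can be sketched for the linear case with the standard closed-form expressions. Applying them to a nonlinear groundwater model would presumably replace the matrix X with the sensitivity matrix of the linearized model; that is an assumption of this note and is not shown. The function name `influence_statistics` and the data are hypothetical.

```python
import numpy as np

def influence_statistics(X, y):
    """Cook's D (n,) and DFBETAS (n, p) for an ordinary least squares fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii = x_i^T (X^T X)^{-1} x_i for every observation.
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    s2 = resid @ resid / (n - p)

    # Cook's D: effect of omitting observation i on the whole parameter set.
    cooks_d = resid**2 / (p * s2) * leverage / (1.0 - leverage) ** 2

    # Row i holds beta - beta_(i), the change caused by deleting observation i.
    delta_beta = (X @ XtX_inv) * (resid / (1.0 - leverage))[:, None]
    # Residual variance estimated without observation i.
    s2_deleted = ((n - p) * s2 - resid**2 / (1.0 - leverage)) / (n - p - 1)
    # DFBETAS: per-parameter change scaled by its deleted standard error.
    dfbetas = delta_beta / np.sqrt(np.outer(s2_deleted, np.diag(XtX_inv)))
    return cooks_d, dfbetas

# Hypothetical data: ten points near a line plus one high-leverage outlier.
rng = np.random.default_rng(0)
x = np.append(np.arange(10.0), 30.0)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
y[-1] = 0.0  # the last observation is far from the trend
X = np.column_stack([np.ones_like(x), x])

cooks_d, dfbetas = influence_statistics(X, y)
most_influential = int(np.argmax(cooks_d))
print(most_influential, cooks_d[most_influential], dfbetas[most_influential])
```

Cook's D summarizes the effect on all parameters jointly, while each DFBETAS column isolates one parameter, which is why the two statistics are reported together.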
Spatial stochastic regression modelling of urban land use
NASA Astrophysics Data System (ADS)
Arshad, S. H. M.; Jaafar, J.; Abiden, M. Z. Z.; Latif, Z. A.; Rasam, A. R. A.
2014-02-01
Urbanization is very closely linked to industrialization, commercialization and overall economic growth and development. This brings innumerable benefits to the quantity and quality of the urban environment and lifestyle, but on the other hand contributes to unbounded development, urban sprawl, overcrowding and a decreasing standard of living. Regulation and observation of urban development activities are crucial. An understanding of the urban systems that promote urban growth is also essential for policy making, formulating development strategies and preparing development plans. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same study area. Both techniques utilize the same datasets, and their results are analyzed. The work starts by producing an urban growth model using two stochastic regression modeling techniques, namely Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR). The two techniques are compared, and GWR appears to be the more suitable stochastic regression model: it gives a smaller AICc (corrected Akaike Information Criterion) value than OLS, and its output is more spatially explainable.
Modeling urban growth with geographically weighted multinomial logistic regression
NASA Astrophysics Data System (ADS)
Luo, Jun; Kanala, Nagaraj Kapi
2008-10-01
Spatial heterogeneity has usually been ignored in previous land use change studies. This paper presents a geographically weighted multinomial logistic regression model for investigating multiple land use conversion in the urban growth process. The proposed model makes an estimation at each sample location and generates local coefficients of driving factors for land use conversion. A Gaussian function is used to determine the geographic weights, guaranteeing that all other samples are involved in the calibration of the model for one location. A case study of the Springfield metropolitan area is conducted. A set of independent variables is selected as driving factors. A traditional multinomial logistic regression model is set up and compared with the proposed model. Spatial variations of coefficients of independent variables are revealed by investigating the estimations at sample locations.
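A minimal sketch of Gaussian geographic weighting, assuming the commonly used kernel w = exp(-0.5 (d / b)^2) with Euclidean distance d and a fixed bandwidth b. The abstract does not give the exact kernel or bandwidth, so the function name `gaussian_geographic_weights`, the coordinates and the bandwidth value are all hypothetical.

```python
import numpy as np

def gaussian_geographic_weights(coords, target, bandwidth):
    """Gaussian kernel weights of all samples relative to one target location."""
    coords = np.asarray(coords, dtype=float)
    target = np.asarray(target, dtype=float)
    distance = np.linalg.norm(coords - target, axis=1)
    # The weight decays smoothly with distance and never drops to exactly
    # zero for finite distances, so every sample contributes to the local fit.
    return np.exp(-0.5 * (distance / bandwidth) ** 2)

# Hypothetical sample locations, ordered by distance from the first one.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
weights = gaussian_geographic_weights(coords, target=coords[0], bandwidth=2.0)
print(weights)
```

These weights would then enter the local likelihood at each sample location, so that nearby samples influence the local coefficients more than distant ones.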
Modeling energy expenditure in children and adolescents using quantile regression
Technology Transfer Automated Retrieval System (TEKTRAN)
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obes...
LACIE: Yield-weather regression models for the Canadian prairies
NASA Technical Reports Server (NTRS)
1976-01-01
Most of the variability in wheat production is due to weather fluctuations. Climatic differences within the region account for a large portion of the variability in yields for different parts of the region. Separate regression models were developed for each of the areas indicated.
REGRESSION MODELS OF RESIDENTIAL EXPOSURE TO CHLORPYRIFOS AND DIAZINON
This study examines the ability of regression models to predict residential exposures to chlorpyrifos and diazinon, based on the information from the NHEXAS-AZ database. The robust method was used to generate "fill-in" values for samples that are below the detection l...
A regression model to estimate regional ground water recharge.
Lorenz, David L; Delin, Geoffrey N
2007-01-01
A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available. PMID:17335484
A regressive model of isochronism in speech units
NASA Astrophysics Data System (ADS)
Jassem, W.; Krzysko, M.; Stolarski, P.
1981-09-01
To define linguistic isochronism in quantitative terms, a statistical regressive method of analyzing the number of rhythmic units in human speech was employed. The material used was two taped texts spoken in standard British English totaling approximately 2,500 sounds. The sounds were divided into statistically homogeneous classes, and the mean values in each class were utilized in regressive models. Abercrombie's theory of speech rhythm postulating anacrusis and Jassem's theory postulating two types of speech units, anacrusis and a rhythmic unit in the strict sense, were tested using this material.
Using regression models to determine the poroelastic properties of cartilage.
Chung, Chen-Yuan; Mansour, Joseph M
2013-07-26
The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds. PMID:23796400
Multiobjective optimization for model selection in kernel methods in regression.
You, Di; Benitez-Quiroz, Carlos Fabian; Martinez, Aleix M
2014-10-01
Regression plays a major role in many scientific and engineering problems. The goal of regression is to learn the unknown underlying function from a set of sample vectors with known outcomes. In recent years, kernel methods in regression have facilitated the estimation of nonlinear functions. However, two major (interconnected) problems remain open. The first problem is given by the bias-versus-variance tradeoff. If the model used to estimate the underlying function is too flexible (i.e., high model complexity), the variance will be very large. If the model is fixed (i.e., low complexity), the bias will be large. The second problem is to define an approach for selecting the appropriate parameters of the kernel function. To address these two problems, this paper derives a new smoothing kernel criterion, which measures the roughness of the estimated function as a measure of model complexity. Then, we use multiobjective optimization to derive a criterion for selecting the parameters of that kernel. The goal of this criterion is to find a tradeoff between the bias and the variance of the learned function. That is, the goal is to increase the model fit while keeping the model complexity in check. We provide extensive experimental evaluations using a variety of problems in machine learning, pattern recognition, and computer vision. The results demonstrate that the proposed approach yields smaller estimation errors as compared with methods in the state of the art. PMID:25291740
Multiobjective Optimization for Model Selection in Kernel Methods in Regression
You, Di; Benitez-Quiroz, C. Fabian; Martinez, Aleix M.
2016-01-01
Regression plays a major role in many scientific and engineering problems. The goal of regression is to learn the unknown underlying function from a set of sample vectors with known outcomes. In recent years, kernel methods in regression have facilitated the estimation of nonlinear functions. However, two major (interconnected) problems remain open. The first problem is given by the bias-vs-variance trade-off. If the model used to estimate the underlying function is too flexible (i.e., high model complexity), the variance will be very large. If the model is fixed (i.e., low complexity), the bias will be large. The second problem is to define an approach for selecting the appropriate parameters of the kernel function. To address these two problems, this paper derives a new smoothing kernel criterion, which measures the roughness of the estimated function as a measure of model complexity. Then, we use multiobjective optimization to derive a criterion for selecting the parameters of that kernel. The goal of this criterion is to find a trade-off between the bias and the variance of the learned function. That is, the goal is to increase the model fit while keeping the model complexity in check. We provide extensive experimental evaluations using a variety of problems in machine learning, pattern recognition and computer vision. The results demonstrate that the proposed approach yields smaller estimation errors as compared to methods in the state of the art. PMID:25291740
Modelling Nitrogen Oxides in Los Angeles Using a Hybrid Dispersion/Land Use Regression Model
NASA Astrophysics Data System (ADS)
Wilton, Darren C.
The goal of this dissertation is to develop models capable of predicting long-term annual average NOx concentrations in urban areas. Predictions from simple meteorological dispersion models and seasonal proxies for NO2 oxidation were included as covariates in a land use regression (LUR) model for NOx in Los Angeles, CA. The NOx measurements were obtained from a comprehensive measurement campaign that is part of the Multi-Ethnic Study of Atherosclerosis Air Pollution Study (MESA Air). Simple land use regression models were initially developed using a suite of GIS-derived land use variables developed from various buffer sizes (R²=0.15). Caline3, a simple steady-state Gaussian line source model, was initially incorporated into the land-use regression framework. The addition of this spatio-temporally varying Caline3 covariate improved the simple LUR model predictions. The extent of improvement was much more pronounced for models based solely on the summer measurements (simple LUR: R²=0.45; Caline3/LUR: R²=0.70) than it was for models based on all seasons (R²=0.20). We then used a Lagrangian dispersion model to convert static land use covariates for population density and commercial/industrial area into spatially and temporally varying covariates. The inclusion of these covariates resulted in significant improvement in model prediction (R²=0.57). In addition to the dispersion model covariates described above, a two-week average value of daily peak-hour ozone was included as a surrogate of the oxidation of NO2 during the different sampling periods. This additional covariate further improved overall model performance for all models. The best model by 10-fold cross validation (R²=0.73) contained the Caline3 prediction, a static covariate for length of A3 roads within 50 meters, the Calpuff-adjusted covariates derived from both population density and industrial/commercial land area, and the ozone covariate. This model was tested against annual average NOx
Modeling population density across major US cities: a polycentric spatial regression approach
NASA Astrophysics Data System (ADS)
Griffith, Daniel A.; Wong, David W.
2007-04-01
A common approach to modeling population density gradients across a city is to adjust the specification of a selected set of mathematical functions to achieve the best fit to an urban place’s empirical density values. In this paper, we employ a spatial regression approach that takes into account the spatial autocorrelation latent in urban population density. We also use a Minkowskian distance metric instead of Euclidean or network distance to better describe spatial separation. We apply our formulation to the 20 largest metropolitan areas in the US according to the 2000 census, using block group level data. The general model furnishes good descriptions for both monocentric and polycentric cities.
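The Minkowskian distance mentioned above generalizes Manhattan distance (p = 1) and Euclidean distance (p = 2). A minimal sketch follows; the function name and the sample coordinates are illustrative assumptions, and the abstract does not state which exponent the authors used.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length coordinate tuples.

    p = 1 gives Manhattan distance and p = 2 gives Euclidean distance.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical coordinates (not taken from the paper).
origin, place = (0.0, 0.0), (3.0, 4.0)
for order in (1, 1.5, 2):
    print(order, minkowski_distance(origin, place, order))
```

An exponent between 1 and 2 yields a separation between the Manhattan and Euclidean values, which is one reason such a metric can sit between straight-line and grid-like travel distance.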
Procedure for Detecting Outliers in a Circular Regression Model
Rambli, Adzhar; Abuzaid, Ali H. M.; Mohamed, Ibrahim Bin; Hussin, Abdul Ghapor
2016-01-01
A number of circular regression models have been proposed in the literature. In recent years, there has been strong interest in outlier detection in circular regression. An outlier detection procedure can be developed by defining a new statistic in terms of the circular residuals. In this paper, we propose a new measure which transforms the circular residuals into linear measures using a trigonometric function. We then employ the row deletion approach to identify the observations that affect the measure the most, which are candidate outliers. The corresponding cut-off points and the performance of the detection procedure when applied to Down and Mardia’s model are studied via simulations. For illustration, we apply the procedure to circadian data. PMID:27064566
Bailit, Jennifer L.; Grobman, William A.; Rice, Madeline Murguia; Spong, Catherine Y.; Wapner, Ronald J.; Varner, Michael W.; Thorp, John M.; Leveno, Kenneth J.; Caritis, Steve N.; Shubert, Phillip J.; Tita, Alan T. N.; Saade, George; Sorokin, Yoram; Rouse, Dwight J.; Blackwell, Sean C.; Tolosa, Jorge E.; Van Dorsten, J. Peter
2014-01-01
Objective: Regulatory bodies and insurers evaluate hospital quality using obstetrical outcomes; however, meaningful comparisons should take pre-existing patient characteristics into account. Furthermore, if risk-adjusted outcomes are consistent within a hospital, fewer measures and resources would be needed to assess obstetrical quality. Our objective was to establish risk-adjusted models for five obstetric outcomes and assess hospital performance across these outcomes. Study Design: A cohort study of 115,502 women and their neonates born in 25 hospitals in the United States between March 2008 and February 2011. Hospitals were ranked according to their unadjusted and risk-adjusted frequency of venous thromboembolism, postpartum hemorrhage, peripartum infection, severe perineal laceration, and a composite neonatal adverse outcome. Correlations between hospital risk-adjusted outcome frequencies were assessed. Results: Venous thromboembolism occurred too infrequently (0.03%, 95% CI 0.02% – 0.04%) for meaningful assessment. Other outcomes occurred frequently enough for assessment (postpartum hemorrhage 2.29% (95% CI 2.20–2.38), peripartum infection 5.06% (95% CI 4.93–5.19), severe perineal laceration at spontaneous vaginal delivery 2.16% (95% CI 2.06–2.27), neonatal composite 2.73% (95% CI 2.63–2.84)). Although there was high concordance between unadjusted and adjusted hospital rankings, several individual hospitals had an adjusted rank that was substantially different (by as much as 12 rank tiers) from their unadjusted rank. None of the correlations between hospital adjusted outcome frequencies was significant. For example, the hospital with the lowest adjusted frequency of peripartum infection had the highest adjusted frequency of severe perineal laceration. Conclusions: Evaluations based on a single risk-adjusted outcome cannot be generalized to overall hospital obstetric performance. PMID:23891630
Direct regression models for longitudinal rates of change
Bryan, Matthew; Heagerty, Patrick J.
2014-01-01
Comparing rates of growth, or rates of change, across covariate-defined subgroups is a primary objective for many longitudinal studies. In the special case of a linear trend over time, the interaction between a covariate and time will characterize differences in longitudinal rates of change. However, in the presence of a non-linear longitudinal trajectory, the standard mean regression approach does not permit parsimonious description or inference regarding differences in rates of change. Therefore, we propose regression methodology for longitudinal data that allows a direct, structured comparison of rates across subgroups even in the presence of a non-linear trend over time. Our basic longitudinal rate regression method assumes a proportional difference across covariate groups in the rate of change across time, but this assumption can be relaxed. Rates are compared relative to a generally specified time trend for which we discuss both parametric and non-parametric estimating approaches. We develop mixed model longitudinal methodology that explicitly characterizes subject-to-subject variation in rates, as well as a marginal estimating equation-based method. In addition, we detail a score test to detect violations of the proportionality assumption, and we allow time-varying rate effects as a natural generalization. Simulation results demonstrate potential gains in power for the longitudinal rate regression model relative to a linear mixed effects model in the presence of a non-linear trend in time. We apply our method to a study of growth among infants born to HIV infected mothers, and conclude with a discussion of possible extensions for our methods. PMID:24497427
Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached 100% versus 67% and specificity 73% versus 95% using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation. PMID:24328031
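For readers unfamiliar with the Youden index cited above, it is defined as sensitivity + specificity − 1. The sketch below applies that standard definition to the sensitivity and specificity values reported in the abstract; the function and variable names are illustrative assumptions, and the resulting index values are derived here rather than quoted from the paper.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1 (both as fractions)."""
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivity and specificity must lie in [0, 1]")
    return sensitivity + specificity - 1.0


# Operating points reported in the abstract, expressed as fractions.
reported = {
    "Bayesian network": (1.00, 0.73),
    "logistic regression": (0.67, 0.95),
}
for model, (sens, spec) in reported.items():
    print(f"{model}: J = {youden_index(sens, spec):.2f}")
```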
Patton, Allison P; Zamore, Wig; Naumova, Elena N; Levy, Jonathan I; Brugge, Doug; Durant, John L
2015-05-19
Land use regression (LUR) models have been used to assess air pollutant exposure, but limited evidence exists on whether location-specific LUR models are applicable to other locations (transferability) or general models are applicable to smaller areas (generalizability). We tested transferability and generalizability of spatial-temporal LUR models of hourly particle number concentration (PNC) for Boston-area (MA, U.S.A.) urban neighborhoods near Interstate 93. Four neighborhood-specific regression models and one Boston-area model were developed from mobile monitoring measurements (34-46 days/neighborhood over one year each). Transferability was tested by applying each neighborhood-specific model to the other neighborhoods; generalizability was tested by applying the Boston-area model to each neighborhood. Both the transferability and generalizability of models were tested with and without neighborhood-specific calibration. Important PNC predictors (adjusted R² = 0.24-0.43) included wind speed and direction, temperature, highway traffic volume, and distance from the highway edge. Direct model transferability was poor (R² < 0.17). Locally-calibrated transferred models (R² = 0.19-0.40) and the Boston-area model (adjusted R² = 0.26, range: 0.13-0.30) performed similarly to neighborhood-specific models; however, some coefficients of locally calibrated transferred models were uninterpretable. Our results show that transferability of neighborhood-specific LUR models of hourly PNC was limited, but that a general model performed acceptably in multiple areas when calibrated with local data. PMID:25867675
Aulenbach, Brent T.
2013-01-01
A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads: the Adjusted Maximum Likelihood Estimator (AMLE) and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model’s calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals caused the observed AMLE precision to be significantly worse than the model-calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of 15 days or shorter when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration–discharge relationship. The models with the largest errors typically had poor high flow sampling coverage, resulting in unrepresentative models. Increasing sampling frequency and/or targeted high flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models than increasing calibration period length.
A Product Partition Model With Regression on Covariates
Müller, Peter; Quintana, Fernando; Rosner, Gary L.
2011-01-01
We propose a probability model for random partitions in the presence of covariates. In other words, we develop a model-based clustering algorithm that exploits available covariates. The motivating application is predicting time to progression for patients in a breast cancer trial. We proceed by reporting a weighted average of the responses of clusters of earlier patients. The weights should be determined by the similarity of the new patient’s covariate with the covariates of patients in each cluster. We achieve the desired inference by defining a random partition model that includes a regression on covariates. Patients with similar covariates are a priori more likely to be clustered together. Posterior predictive inference in this model formalizes the desired prediction. We build on product partition models (PPM). We define an extension of the PPM to include a regression on covariates by including in the cohesion function a new factor that increases the probability that experimental units with similar covariates are included in the same cluster. We discuss implementations suitable for any combination of continuous, categorical, count, and ordinal covariates. An implementation of the proposed model as an R package is available for download. PMID:21566678
Regression models for vegetation radar-backscattering and radiometric emission
NASA Technical Reports Server (NTRS)
Eom, H. J.
1986-01-01
Simple regression estimation of radar backscatter and radiometric emission from vegetative terrain is proposed, based on the exact radiative transfer models. A vegetative canopy is modeled as a Rayleigh scattering layer above an irregular Kirchhoff surface. The rms errors between the exact and the estimated values are found to be less than 5 percent for emission, and 1 dB for the backscattering case, in most practical uses. The proposed formulas are useful in quickly estimating backscattering and emission from vegetative terrain.
Robust, Adaptive Functional Regression in Functional Mixed Model Framework
Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.
2012-01-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient
Augmented Beta rectangular regression models: A Bayesian perspective.
Wang, Jue; Luo, Sheng
2016-01-01
Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportional data ranging between zero and one. However, Beta distributions are not flexible to extreme outliers or excessive events around tail areas, and they do not account for the presence of the boundary values zeros and ones because these values are not in the support of the Beta distributions. To address these issues, we propose a mixed effects model using Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson's Disease (PD) Long-term Study-1 (LS-1 study, n = 1741), developed by The National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson's Disease (NINDS NET-PD) network. PMID:26289406
The Dantzig Selector for Censored Linear Regression Models
Li, Yi; Dicker, Lee; Zhao, Sihai Dave
2013-01-01
The Dantzig variable selector has recently emerged as a powerful tool for fitting regularized regression models. To our knowledge, most work involving the Dantzig selector has been performed with fully-observed response variables. This paper proposes a new class of adaptive Dantzig variable selectors for linear regression models when the response variable is subject to right censoring. This is motivated by a clinical study to identify genes predictive of event-free survival in newly diagnosed multiple myeloma patients. Under some mild conditions, we establish the theoretical properties of our procedures, including consistency in model selection (i.e. the right subset model will be identified with a probability tending to 1) and the optimal efficiency of estimation (i.e. the asymptotic distribution of the estimates is the same as that when the true subset model is known a priori). The practical utility of the proposed adaptive Dantzig selectors is verified via extensive simulations. We apply our new methods to the aforementioned myeloma clinical trial and identify important predictive genes. PMID:24478569
Clustering of trend data using joinpoint regression models.
Kim, Hyune-Ju; Luo, Jun; Kim, Jeankyung; Chen, Huann-Sheng; Feuer, Eric J
2014-10-15
In this paper, we propose methods to cluster groups of two-dimensional data whose mean functions are piecewise linear into several clusters with common characteristics such as the same slopes. To fit segmented line regression models with common features for each possible cluster, we use a restricted least squares method. In implementing the restricted least squares method, we estimate the maximum number of segments in each cluster by using both the permutation test method and the Bayes information criterion method and then propose to use the Bayes information criterion to determine the number of clusters. For a more effective implementation of the clustering algorithm, we propose a measure of the minimum distance worth detecting and illustrate its use in two examples. We summarize simulation results to study properties of the proposed methods and also prove the consistency of the cluster grouping estimated with a given number of clusters. The presentation and examples in this paper focus on the segmented line regression model with the ordered values of the independent variable, which has been the model of interest in cancer trend analysis, but the proposed method can be applied to a general model with design points either ordered or unordered. PMID:24895073
Change point testing in logistic regression models with interaction term.
Fong, Youyi; Di, Chongzhi; Permar, Sallie
2015-04-30
A threshold effect takes place in situations where the relationship between an outcome variable and a predictor variable changes as the predictor value crosses a certain threshold/change point. Threshold effects are often plausible in a complex biological system, especially in defining immune responses that are protective against infections such as HIV-1, which motivates the current work. We study two hypothesis testing problems in change point models. We first compare three different approaches to obtaining a p-value for the maximum of scores test in a logistic regression model with change point variable as a main effect. Next, we study the testing problem in a logistic regression model with the change point variable both as a main effect and as part of an interaction term. We propose a test based on the maximum of likelihood ratios test statistic and obtain its reference distribution through a Monte Carlo method. We also propose a maximum of weighted scores test that can be more powerful than the maximum of likelihood ratios test when we know the direction of the interaction effect. In simulation studies, we show that the proposed tests have a correct type I error and higher power than several existing methods. We illustrate the application of change point model-based testing methods in a recent study of immune responses that are associated with the risk of mother to child transmission of HIV-1. PMID:25612253
Cox Regression Models with Functional Covariates for Survival Data
Gellar, Jonathan E.; Colantuoni, Elizabeth; Needham, Dale M.; Crainiceanu, Ciprian M.
2015-01-01
We extend the Cox proportional hazards model to cases when the exposure is a densely sampled functional process, measured at baseline. The fundamental idea is to combine penalized signal regression with methods developed for mixed effects proportional hazards models. The model is fit by maximizing the penalized partial likelihood, with smoothing parameters estimated by a likelihood-based criterion such as AIC or EPIC. The model may be extended to allow for multiple functional predictors, time varying coefficients, and missing or unequally-spaced data. Methods were inspired by and applied to a study of the association between time to death after hospital discharge and daily measures of disease severity collected in the intensive care unit, among survivors of acute respiratory distress syndrome. PMID:26441487
NASA Astrophysics Data System (ADS)
Lu, Lin; Chang, Yunlong; Li, Yingmin; He, Youyou
2013-05-01
A transverse magnetic field was introduced to the arc plasma in the process of welding stainless steel tubes by high-speed Tungsten Inert Gas Arc Welding (TIG for short) without filler wire. The influence of the external magnetic field on welding quality was investigated. Nine sets of parameters were designed by means of an orthogonal experiment. The tensile strength of the welded joint and the form factor of the weld were regarded as the main standards of welding quality. A binary quadratic nonlinear regression equation was established with magnetic induction and Ar gas flow rate as the conditions. The residual standard deviation was calculated to assess the accuracy of the regression model. The results showed that the regression model was correct and effective in calculating the tensile strength and aspect ratio of the weld. Two 3D regression models were designed, one for each response, and then the influence of magnetic induction on welding quality was studied.
NASA Astrophysics Data System (ADS)
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
2015-12-01
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided close agreement with the calculated values for FAO-PM (R² > 0.95 and RMSE < 12.07 mm month⁻¹). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
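The two agreement statistics quoted above can be computed as in the following sketch. It assumes R² is the usual coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not say whether the authors used this form or the squared correlation. The monthly values are hypothetical and only illustrate the calculation.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly reference evapotranspiration (mm per month).
fao_pm = [40.0, 65.0, 110.0, 160.0, 210.0, 240.0]
model = [45.0, 60.0, 115.0, 150.0, 215.0, 235.0]
print("RMSE:", rmse(fao_pm, model))
print("R^2:", r_squared(fao_pm, model))
```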
NASA Astrophysics Data System (ADS)
Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa
2015-11-01
A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.
2011-01-01
Background: Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG) signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods: Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data, were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on accuracy of models. Acquired SEMG signals were filtered, rectified, normalized and then fed to models for training. Results: It was shown that mean adjusted coefficient of determination (Ra²) values decreased by 20% to 35% for different models after one hour, while altering arm posture decreased mean Ra² values by 64% to 74% for different models. Conclusions: Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, ordinary least squares linear regression model (OLS) was shown to have high isometric torque estimation accuracy combined with very short training times. PMID:21943179
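The adjusted coefficient of determination used as the accuracy measure above is conventionally computed from the ordinary R², the number of observations n and the number of predictors p as 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below uses that textbook formula; the function name and all numeric inputs are hypothetical, not values from the study.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical example: 8 predictors, 101 samples, ordinary R^2 of 0.90.
print(adjusted_r2(0.90, n_samples=101, n_predictors=8))
```

The penalty grows with the number of predictors, so the adjusted value is never larger than the ordinary R² when at least one predictor is used.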
NASA Astrophysics Data System (ADS)
Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei
2014-10-01
Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, and it has difficulty capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse models were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. A leaf radiative transfer model (the PROSPECT model) was employed to simulate leaf reflectance at wavelengths from 400 to 780 nm at 1 nm intervals, and these values were treated as data from remote sensing observations. Simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were each taken as the target of a regression model. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures; 70% of these samples were used for training and the remaining 30% for model validation. Reflectance (r) and its mathematical transformations (1/r and log(1/r)) were each employed to build regression models. Results showed fair agreement between pigments and simulated reflectance, with all adjusted coefficients of determination (R²) larger than 0.8 when 6 wavebands were selected to build the SMLR model. The largest values of R² for Cab, Car and Cab/Car are 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematical transformations of reflectance showed little influence on regression accuracy. We concluded that it was feasible to estimate chlorophyll, carotenoids and their ratio based on a statistical model with leaf reflectance data.
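The 70%/30% partition of the 4000 simulated samples described above can be sketched as follows. The abstract does not say how samples were assigned to each subset, so the random shuffle, the seed and the function name are assumptions made only for illustration.

```python
import random


def split_indices(n_samples, train_percent=70, seed=0):
    """Shuffle sample indices and split them into training and validation lists."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = n_samples * train_percent // 100  # integer arithmetic avoids float rounding
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(4000)
print(len(train_idx), len(valid_idx))
```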
Development and Application of Nonlinear Land-Use Regression Models
NASA Astrophysics Data System (ADS)
Champendal, Alexandre; Kanevski, Mikhail; Huguenot, Pierre-Emmanuel
2014-05-01
The problem of air pollution modelling in urban zones is of great importance from both scientific and applied points of view. At present there are several fundamental approaches, based either on science-based modelling (air pollution dispersion) or on the application of space-time geostatistical methods (e.g. the family of kriging models or conditional stochastic simulations). Recently, there were important developments in so-called Land Use Regression (LUR) models. These models take into account geospatial information (e.g. traffic network, sources of pollution, average traffic, population census, land use, etc.) at different scales, for example, using buffering operations. Usually the dimension of the input space (number of independent variables) is within the range of 10 to 100. It was shown that LUR models have some potential to model complex and highly variable patterns of air pollution in urban zones. Most LUR models currently in use are linear models. In the present research, nonlinear LUR models are developed and applied to the city of Geneva. Two main nonlinear data-driven models were elaborated: a multilayer perceptron and a random forest. An important part of the research also deals with a comprehensive exploratory data analysis using statistical, geostatistical and time series tools. Unsupervised self-organizing maps were applied to better understand space-time patterns of the pollution. The real data case study deals with spatial-temporal air pollution data of Geneva (2002-2011). The study focuses on nitrogen dioxide (NO2). It has effects on human health and on plants; NO2 contributes to the phenomenon of acid rain. The negative effects of nitrogen dioxide on plants are reductions in growth, production and pesticide resistance. Finally, there are effects on materials: nitrogen dioxide increases corrosion. The data used for this study consist of a set of 106 NO2 passive sensors. 80 were used to build the models and the remaining 36 have constituted
Fuzzy regression modeling for tool performance prediction and degradation detection.
Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L
2010-10-01
In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed. PMID:20945519
Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media
Cooley, R.L.; Christensen, S.
2006-01-01
Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector β that reflects both small and large scales of heterogeneity in the inputs by a lumped or smoothed m-dimensional approximation γθ*, where γ is an interpolation matrix and θ* is a stochastic vector of parameters. Vector θ* has small enough dimension to allow its estimation with the available data. The consequence of the replacement is that model function f(γθ*) written in terms of the approximate inputs is in error with respect to the same model function written in terms of β, f(β), which is assumed to be nearly exact. The difference f(β) - f(γθ*), termed model error, is spatially correlated, generates prediction biases, and causes standard confidence and prediction intervals to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate θ* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(β) and f(γθ*) are small, then most of the biases are small and the correction factors are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is
A new inverse regression model applied to radiation biodosimetry
Higueras, Manuel; Puig, Pedro; Ainsbury, Elizabeth A.; Rothkamm, Kai
2015-01-01
Biological dosimetry based on chromosome aberration scoring in peripheral blood lymphocytes enables timely assessment of the ionizing radiation dose absorbed by an individual. Here, new Bayesian-type count data inverse regression methods are introduced for situations where responses are Poisson or two-parameter compound Poisson distributed. Our Poisson models are calculated in a closed form, by means of Hermite and negative binomial (NB) distributions. For compound Poisson responses, complete and simplified models are provided. The simplified models are also expressible in a closed form and involve the use of compound Hermite and compound NB distributions. Three examples of applications are given that demonstrate the usefulness of these methodologies in cytogenetic radiation biodosimetry and in radiotherapy. We provide R and SAS codes which reproduce these examples. PMID:25663804
Diagnostic Measures for the Cox Regression Model with Missing Covariates
Zhu, Hongtu; Ibrahim, Joseph G.; Chen, Ming-Hui
2015-01-01
Summary This paper investigates diagnostic measures for assessing the influence of observations and model misspecification in the presence of missing covariate data for the Cox regression model. Our diagnostics include case-deletion measures, conditional martingale residuals, and score residuals. The Q-distance is proposed to examine the effects of deleting individual observations on the estimates of finite-dimensional and infinite-dimensional parameters. Conditional martingale residuals are used to construct goodness of fit statistics for testing possible misspecification of the model assumptions. A resampling method is developed to approximate the p-values of the goodness of fit statistics. Simulation studies are conducted to evaluate our methods, and a real data set is analyzed to illustrate their use. PMID:26903666
A flexible count data regression model for risk analysis.
Guikema, Seth D; Coffelt, Jeremy P; Goffelt, Jeremy P
2008-02-01
In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLM) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM can fit overdispersed data sets as well as the commonly used existing models while outperforming these commonly used models for underdispersed data sets. PMID:18304118
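To illustrate how one distribution can cover both dispersion regimes, the sketch below evaluates the Conway-Maxwell Poisson probability mass function in its standard (lambda, nu) parameterization, P(Y = y) proportional to lambda^y / (y!)^nu. This is an assumption made for illustration: the article works with a reformulation of this distribution, which is not reproduced here, and the function names and the truncation point are hypothetical.

```python
import math


def com_poisson_pmf(lam, nu, max_count=200):
    """Probabilities P(Y = 0..max_count) for the standard COM-Poisson form.

    The infinite normalizing sum is truncated at max_count, which is an
    approximation that is adequate for the small rates used below.
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_count + 1)]
    peak = max(log_terms)  # subtract the peak for numerical stability
    unnormalized = [math.exp(t - peak) for t in log_terms]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def mean_and_variance(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, variance


for nu in (0.5, 1.0, 2.0):
    m, v = mean_and_variance(com_poisson_pmf(2.0, nu))
    print(f"nu={nu}: mean={m:.3f}, variance={v:.3f}")
```

With nu = 1 the distribution reduces to the ordinary Poisson; nu below 1 gives variance greater than the mean and nu above 1 gives variance less than the mean.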
Evaluating sediment chemistry and toxicity data using logistic regression modeling
Field, L.J.; MacDonald, D.D.; Norton, S.B.; Severn, C.G.; Ingersoll, C.G.
1999-01-01
This paper describes the use of logistic-regression modeling for evaluating matching sediment chemistry and toxicity data. Contaminant-specific logistic models were used to estimate the percentage of samples expected to be toxic at a given concentration. These models enable users to select the probability of effects of concern corresponding to their specific assessment or management objective or to estimate the probability of observing specific biological effects at any contaminant concentration. The models were developed using a large database (n = 2,524) of matching saltwater sediment chemistry and toxicity data for field-collected samples compiled from a number of different sources and geographic areas. The models for seven chemicals selected as examples showed a wide range in goodness of fit, reflecting high variability in toxicity at low concentrations and limited data on toxicity at higher concentrations for some chemicals. The models for individual test endpoints (e.g., amphipod mortality) provided a better fit to the data than the models based on all endpoints combined. A comparison of the relative sensitivity of two amphipod species to specific contaminants illustrated an important application of the logistic model approach.
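The two uses described in this record (estimating the probability of toxicity at a given concentration, and finding the concentration that corresponds to a chosen probability) can be sketched as below. The sketch assumes a single-chemical logistic model on log10 concentration; that functional form, the function names and the coefficient values are illustrative assumptions, not values from the paper.

```python
import math


def prob_toxic(concentration, b0, b1):
    """Probability that a sample is toxic at the given concentration."""
    z = b0 + b1 * math.log10(concentration)
    return 1.0 / (1.0 + math.exp(-z))


def concentration_for_probability(p, b0, b1):
    """Invert the model: concentration at which the toxicity probability is p."""
    logit = math.log(p / (1.0 - p))
    return 10.0 ** ((logit - b0) / b1)


# Hypothetical coefficients for one contaminant.
b0, b1 = -2.0, 1.5
threshold = concentration_for_probability(0.5, b0, b1)
print(threshold, prob_toxic(threshold, b0, b1))
```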
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
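For readers unfamiliar with the cumulative logit model named in this record, a minimal sketch of how it turns a linear predictor into score-category probabilities is given below. It assumes the common proportional-odds form P(score <= k) = logistic(threshold_k - eta); the function name, the thresholds and the predictor values are hypothetical and do not come from the article.

```python
import math


def cumulative_logit_probs(eta, thresholds):
    """Category probabilities for an ordinal score.

    eta is the linear predictor built from essay features; thresholds must
    be increasing, and there is one more category than there are thresholds.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    cumulative.append(1.0)  # P(score <= highest category) is always 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical four-category score scale with three thresholds.
print(cumulative_logit_probs(eta=0.8, thresholds=[-1.0, 0.5, 2.0]))
```

Unlike a linear regression on the raw score, this formulation always returns a valid probability for each discrete score level.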
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools for both spatial and temporal data and are based on nonparametric kernel methods closely related to the classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can also be applied to feature selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into a 13 dimensional space constructed from geographical coordinates and geo-features calculated from a digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in the case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training was carried out using a leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN, with their ability to select features and efficiently model complex high dimensional data, can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press
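The kernel estimator and the leave-one-out procedure mentioned in this record can be sketched as follows. This is a simplified illustration that assumes an isotropic Gaussian kernel with a single bandwidth; the adaptive, anisotropic kernels used in the research are not implemented, and all names and data are hypothetical.

```python
import math


def grnn_predict(x_train, y_train, x, sigma):
    """Nadaraya-Watson estimate: a kernel-weighted average of training targets."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(xi, x)) / (2.0 * sigma ** 2))
        for xi in x_train
    ]
    total = sum(weights)
    if total == 0.0:  # every kernel underflowed; fall back to the plain mean
        return sum(y_train) / len(y_train)
    return sum(w * y for w, y in zip(weights, y_train)) / total


def leave_one_out_mse(xs, ys, sigma):
    """Predict each point from all the others and average the squared errors."""
    errors = []
    for i in range(len(xs)):
        pred = grnn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], sigma)
        errors.append((pred - ys[i]) ** 2)
    return sum(errors) / len(errors)


# Hypothetical one-dimensional data; choose the bandwidth with the lowest error.
xs = [[0.0], [0.5], [1.0], [1.5], [2.0]]
ys = [0.1, 0.6, 0.9, 1.6, 2.1]
best_sigma = min([0.2, 0.5, 1.0], key=lambda s: leave_one_out_mse(xs, ys, s))
print(best_sigma)

# Non-empty feature subsets of a 13-dimensional input space.
n_feature_subsets = 2 ** 13 - 1
```

Ranking every feature subset, as in the second approach of the record, amounts to repeating the leave-one-out evaluation once per subset, which is why the count of 8191 models matters for run time.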
Regression models of sprint, vertical jump, and change of direction performance.
Swinton, Paul A; Lloyd, Ray; Keogh, Justin W L; Agouris, Ioannis; Stewart, Arthur D
2014-07-01
It was the aim of the present study to expand on previous correlation analyses that have attempted to identify factors that influence performance of jumping, sprinting, and changing direction. This was achieved by using a regression approach to obtain linear models that combined anthropometric, strength, and other biomechanical variables. Thirty rugby union players participated in the study (age: 24.2 ± 3.9 years; stature: 181.2 ± 6.6 cm; mass: 94.2 ± 11.1 kg). The athletes' ability to sprint, jump, and change direction was assessed using a 30-m sprint, vertical jump, and 505 agility test, respectively. Regression variables were collected during maximum strength tests (1 repetition maximum [1RM] deadlift and squat) and performance of fast velocity resistance exercises (deadlift and jump squat) using submaximum loads (10-70% 1RM). Force, velocity, power, and rate of force development (RFD) values were measured during fast velocity exercises with the greatest values produced across loads selected for further analysis. Anthropometric data, including lengths, widths, and girths were collected using a 3-dimensional body scanner. Potential regression variables were first identified using correlation analyses. Suitable variables were then regressed using a best subsets approach. Three factor models generally provided the most appropriate balance between explained variance and model complexity. Adjusted R² values of 0.86, 0.82, and 0.67 were obtained for sprint, jump, and change of direction performance, respectively. Anthropometric measurements did not feature in any of the top models because of their strong association with body mass. For each performance measure, variance was best explained by relative maximum strength. Improvements in models were then obtained by including velocity and power values for jumping and sprinting performance, and by including RFD values for change of direction performance. PMID:24345969
Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.
Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao
2016-04-01
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes. PMID:27104857
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-01-01
Background: The purpose of this study was to draw a regression model of the organizational climate of the central libraries of Iran’s universities. Methods: This study is applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran’s public universities, selected from among the 117 universities affiliated to the Ministry of Health by stratified sampling (510 people). The localized ClimateQual questionnaire was used as the research tool. Multivariate linear regression and a track diagram were used to predict the organizational climate pattern of the libraries. Results: Of the 9 variables affecting organizational climate, 5 variables (innovation, teamwork, customer service, psychological safety and deep diversity) play a major role in predicting the organizational climate of Iran’s libraries. The results also indicate that each of these variables, with a different coefficient, has the power to predict organizational climate, but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. The track diagram showed that the five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly affect the organizational climate variable, and that the contribution of teamwork to this influence is greater than that of any other variable. Conclusions: Of the ClimateQual indicators of organizational climate, teamwork contributes more than any other variable, so reinforcing teamwork in academic libraries can be more effective in improving the organizational climate of this type of library. PMID:26622203
Drought Patterns Forecasting using an Auto-Regressive Logistic Model
NASA Astrophysics Data System (ADS)
del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.
2014-12-01
Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, many times of catastrophic dimensions. A quantifiable definition of drought is elusive because, depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought by virtue of some definitions but not by others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate. In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows covariates, continuous or categorical, to be included to improve the performance of the auto-regressive component. Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-Means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while including information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representations of complex climatic phenomena, such as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows the uncertainty of the forecasts to be quantified and can be easily adapted to make predictions under future climatic scenarios. The framework presented herein may be extended to other applications such as flash flood analysis, or risk assessment of natural hazards.
Sheehan, Kenneth R.; Strager, Michael P.; Welsh, Stuart
2013-01-01
Stream habitat assessments are commonplace in fish management, and often involve nonspatial analysis methods for quantifying or predicting habitat, such as ordinary least squares regression (OLS). Spatial relationships, however, often exist among stream habitat variables. For example, water depth, water velocity, and benthic substrate sizes within streams are often spatially correlated and may exhibit spatial nonstationarity or inconsistency in geographic space. Thus, analysis methods should address spatial relationships within habitat datasets. In this study, OLS and a recently developed method, geographically weighted regression (GWR), were used to model benthic substrate from water depth and water velocity data at two stream sites within the Greater Yellowstone Ecosystem. For data collection, each site was represented by a grid of 0.1 m² cells, where actual values of water depth, water velocity, and benthic substrate class were measured for each cell. Accuracies of regressed substrate class data by OLS and GWR methods were calculated by comparing maps, parameter estimates, and determination coefficient r². For analysis of data from both sites, Akaike’s Information Criterion corrected for sample size indicated the best approximating model for the data resulted from GWR and not from OLS. Adjusted r² values also supported GWR as a better approach than OLS for prediction of substrate. This study supports GWR (a spatial analysis approach) over nonspatial OLS methods for prediction of habitat for stream habitat assessments.
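The difference between OLS and GWR described in this record is that GWR refits the regression at each location, weighting nearby observations more heavily. A minimal sketch of one such local fit follows, assuming a single predictor and a Gaussian distance kernel with a fixed bandwidth; the function name, the bandwidth and the data are hypothetical and are not taken from the study.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Weighted least-squares line fitted at one location.

    Each observation is weighted by a Gaussian kernel of its distance to
    the target location, so the returned (intercept, slope) is local.
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical cells along a stream: the depth-substrate slope changes downstream.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (8.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
depth = [0.2, 0.4, 0.6, 0.2, 0.4, 0.6]
substrate = [1.2, 1.4, 1.6, 1.6, 2.2, 2.8]
print(gwr_local_fit(coords, depth, substrate, target=(1.0, 0.0), bandwidth=1.5))
print(gwr_local_fit(coords, depth, substrate, target=(9.0, 0.0), bandwidth=1.5))
```

A single OLS fit would report one slope for the whole site, whereas the two local fits above return different slopes at the upstream and downstream ends.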
A Unified Approach to Power Calculation and Sample Size Determination for Random Regression Models
ERIC Educational Resources Information Center
Shieh, Gwowen
2007-01-01
The underlying statistical models for multiple regression analysis are typically attributed to two types of modeling: fixed and random. The procedures for calculating power and sample size under the fixed regression models are well known. However, the literature on random regression models is limited and has been confined to the case of all…
Collision prediction models using multivariate Poisson-lognormal regression.
El-Basyouny, Karim; Sayed, Tarek
2009-07-01
This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform, which facilitates computation of posterior distributions and provides a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN, leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a better fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models. PMID:19540972
Forecasting Groundwater Temperature with Linear Regression Models Using Historical Data.
Figura, Simon; Livingstone, David M; Kipfer, Rolf
2015-01-01
Although temperature is an important determinant of many biogeochemical processes in groundwater, very few studies have attempted to forecast the response of groundwater temperature to future climate warming. Using a composite linear regression model based on the lagged relationship between historical groundwater and regional air temperature data, empirical forecasts were made of groundwater temperature in several aquifers in Switzerland up to the end of the current century. The model was fed with regional air temperature projections calculated for greenhouse-gas emissions scenarios A2, A1B, and RCP3PD. Model evaluation revealed that the approach taken is adequate only when the data used to calibrate the models are sufficiently long and contain sufficient variability. These conditions were satisfied for three aquifers, all fed by riverbank infiltration. The forecasts suggest that with respect to the reference period 1980 to 2009, groundwater temperature in these aquifers will most likely increase by 1.1 to 3.8 K by the end of the current century, depending on the greenhouse-gas emissions scenario employed. PMID:25412761
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Almedeij, Jaber
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984
Kernel Averaged Predictors for Spatio-Temporal Regression Models.
Heaton, Matthew J; Gelfand, Alan E
2012-12-01
In applications where covariates and responses are observed across space and time, a common goal is to quantify the effect of a change in the covariates on the response while adequately accounting for the spatio-temporal structure of the observations. The most common approach for building such a model is to confine the relationship between a covariate and response variable to a single spatio-temporal location. However, oftentimes the relationship between the response and predictors may extend across space and time. In other words, the response may be affected by levels of predictors in spatio-temporal proximity to the response location. Here, a flexible modeling framework is proposed to capture such spatial and temporal lagged effects between a predictor and a response. Specifically, kernel functions are used to weight a spatio-temporal covariate surface in a regression model for the response. The kernels are assumed to be parametric and non-stationary with the data informing the parameter values of the kernel. The methodology is illustrated on simulated data as well as a physical data set of ozone concentrations to be explained by temperature. PMID:24010051
ERIC Educational Resources Information Center
Thatcher, Greg W.; Henson, Robin K.
This study examined research in training and development to determine effect size reporting practices. It focused on the reporting of corrected effect sizes in research articles using multiple regression analyses. When possible, researchers calculated corrected effect sizes and determined whether the associated shrinkage could have impacted researcher…
NASA Astrophysics Data System (ADS)
Zamani, Hossein; Faroughi, Pouya; Ismail, Noriszura
2014-06-01
This study relates the Poisson, mixed Poisson (MP), generalized Poisson (GP) and finite Poisson mixture (FPM) regression models through their mean-variance relationships, and suggests the application of these models to overdispersed count data. As an illustration, the regression models are fitted to the US skin care count data. The results indicate that the FPM regression model is the best model, since it provides the largest log likelihood and the smallest AIC, followed by the Poisson-Inverse Gaussian (PIG), GP and negative binomial (NB) regression models. The results also show that the NB, PIG and GP regression models provide similar results.
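The mean-variance relationships that link these models can be written out as below. These are the standard textbook forms under common parameterizations, given here as an assumption for orientation; the parameterizations used in the study itself may differ. Here $\mu$ is the mean, $\nu$ is a mixing variable with $E(\nu)=1$, $\alpha$ is the NB dispersion parameter, and $\theta$, $\lambda$ are the parameters of Consul's generalized Poisson distribution.

```latex
\begin{align*}
\text{Poisson:} \quad & \operatorname{Var}(Y) = \mu \\
\text{Mixed Poisson, } Y \mid \nu \sim \text{Poisson}(\mu\nu): \quad
  & \operatorname{Var}(Y) = \mu + \mu^{2}\operatorname{Var}(\nu) \\
\text{NB (gamma mixing, } \operatorname{Var}(\nu)=\alpha\text{):} \quad
  & \operatorname{Var}(Y) = \mu + \alpha\mu^{2} \\
\text{GP, } \mu = \frac{\theta}{1-\lambda}: \quad
  & \operatorname{Var}(Y) = \frac{\theta}{(1-\lambda)^{3}} = \frac{\mu}{(1-\lambda)^{2}}
\end{align*}
```

In each non-Poisson case the variance exceeds the mean whenever the extra parameter is positive, which is what makes these models candidates for overdispersed counts.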
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
Ruggeri, Christina; Eng, Kevin H
2014-01-01
Modeling signal transduction in cancer cells has implications for targeting new therapies and inferring the mechanisms that improve or threaten a patient’s treatment response. For transcriptome-wide studies, it has been proposed that simple correlation between a ligand and receptor pair implies a relationship to the disease process. Statistically, a differential correlation (DC) analysis across groups stratified by prognosis can link the pair to clinical outcomes. While the prognostic effect and the apparent change in correlation are both biological consequences of activation of the signaling mechanism, a correlation-driven analysis does not clearly capture this assumption and makes inefficient use of continuous survival phenotypes. To augment the correlation hypothesis, we propose that a regression framework assuming a patient-specific, latent level of signaling activation exists and generates both prognosis and correlation. Data from these systems can be inferred via interaction terms in survival regression models allowing signal transduction models beyond one pair at a time and adjusting for other factors. We illustrate the use of this model on ovarian cancer data from the Cancer Genome Atlas (TCGA) and discuss how the finding may be used to develop markers to guide targeted molecular therapies. PMID:25657571
Tutorial on Using Regression Models with Count Outcomes Using R
ERIC Educational Resources Information Center
Beaujean, A. Alexander; Morgan, Grant B.
2016-01-01
Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can…
Storm Water Management Model Climate Adjustment Tool (SWMM-CAT)
The US EPA’s newest tool, the Stormwater Management Model (SWMM) – Climate Adjustment Tool (CAT) is meant to help municipal stormwater utilities better address potential climate change impacts affecting their operations. SWMM, first released in 1971, models hydrology and hydrauli...
Unified Model for Academic Competence, Social Adjustment, and Psychopathology.
ERIC Educational Resources Information Center
Schaefer, Earl S.; And Others
A unified conceptual model is needed to integrate the extensive research on (1) social competence and adaptive behavior, (2) converging conceptualizations of social adjustment and psychopathology, and (3) emerging concepts and measures of academic competence. To develop such a model, a study was conducted in which teacher ratings were collected on…
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. PMID:26774211
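The split-sample comparison described in this abstract can be sketched for the linear regression side as follows. The data are synthetic stand-ins, and the number of predictors, coefficients, and helper names are assumptions made for illustration only; the fuzzy logic model is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 664 cases, 3 predictors.
n_cases, n_predictors = 664, 3
X = rng.normal(size=(n_cases, n_predictors))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n_cases)

# Randomly divide the cases into two halves of 332.
order = rng.permutation(n_cases)
fit_idx, holdout_idx = order[:332], order[332:]

def add_intercept(features):
    """Prepend a column of ones so the model has an intercept."""
    return np.column_stack([np.ones(len(features)), features])

def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against observed values."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Estimate the linear model on the first half only.
coef, *_ = np.linalg.lstsq(add_intercept(X[fit_idx]), y[fit_idx], rcond=None)

r2_fit = r_squared(y[fit_idx], add_intercept(X[fit_idx]) @ coef)
r2_holdout = r_squared(y[holdout_idx], add_intercept(X[holdout_idx]) @ coef)
print(f"fit R2 = {r2_fit:.3f}, hold-out R2 = {r2_holdout:.3f}")
```

Any competing model would be scored on the same two index sets so that the fit and hold-out comparisons use identical cases.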
Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana
2015-05-01
Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model produced dependent (i.e. amount of waste generated) and independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R² value of 0.694, which means that it predicts approximately 69% of the factors involved in the generation of waste in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. PMID:25704604
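For readers unfamiliar with the adjusted coefficient of determination reported here, a minimal sketch of the usual formula follows. The sample size of 18 matches the case study, but the unadjusted R² and the number of predictors used in the example call are hypothetical, since the abstract does not report them.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need n_obs > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# 18 buildings as in the case study; R^2 = 0.80 and 3 predictors are hypothetical.
print(round(adjusted_r2(0.80, n_obs=18, n_predictors=3), 3))
```

With so few observations the penalty for each added predictor is substantial, which is why the adjusted value is the one usually reported for small samples.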
ERIC Educational Resources Information Center
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Hirozawa, Anne M; Montez-Rath, Maria E; Johnson, Elizabeth C; Solnit, Stephen A; Drennan, Michael J; Katz, Mitchell H; Marx, Rani
2016-01-01
We compared prospective risk adjustment models for adjusting patient panels at the San Francisco Department of Public Health. We used 4 statistical models (linear regression, two-part model, zero-inflated Poisson, and zero-inflated negative binomial) and 4 subsets of predictor variables (age/gender categories, chronic diagnoses, homelessness, and a loss to follow-up indicator) to predict primary care visit frequency. Predicted visit frequency was then used to calculate patient weights and adjusted panel sizes. The two-part model using all predictor variables performed best (R = 0.20). This model, designed specifically for safety net patients, may prove useful for panel adjustment in other public health settings. PMID:27576054
Spatial distribution of ultrafine particles in urban settings: A land use regression model
NASA Astrophysics Data System (ADS)
Rivera, Marcela; Basagaña, Xavier; Aguilera, Inmaculada; Agis, David; Bouso, Laura; Foraster, Maria; Medina-Ramón, Mercedes; Pey, Jorge; Künzli, Nino; Hoek, Gerard
2012-07-01
Background: The toxic effects of ultrafine particles (UFP) are a public health concern. However, epidemiological studies on the long term effects of UFP are limited due to lacking exposure models. Given the high spatial variation of UFP, the assignment of exposure levels in epidemiological studies requires a fine spatial scale. The aim of this study was to assess the performance of a short-term measurement protocol used at a large number of locations to derive a land use regression (LUR) model of the spatial variation of UFP in Girona, Spain. Methods: We measured UFP for 15 min on the sidewalk of 644 participants' homes in 12 towns of Girona province (Spain). The measurements were done during non-rush traffic hours 9:15-12:45 and 15:15-16:45 during 32 days between June 15 and July 31, 2009. In parallel, we counted the number of vehicles driving in both directions. Measurements were repeated on a different day for a subset of 25 sites in Girona city. Potential predictor variables such as building density, distance to bus lines and land cover were derived using geographic information systems. We adjusted for temporal variation using daily mean NOx concentrations at a central monitor. Land use regression models for the entire area (Core model) and for individual towns were derived using a supervised forward selection algorithm. Results: The best predictors of UFP were traffic intensity, distance to nearest major crossroad, area of high density residential land and household density. The LUR Core model explained 36% of UFP total variation. Adding sampling date and hour of the day to the Core model increased the R2 to 51% without changing the regression slopes. Local models included predictor variables similar to those in the Core model, but performed better with an R2 of 50% in Girona city. Independent LUR models for the first and second measurements at the subset of sites with repetitions had R2's of about 47%. When the mean of the two measurements was used R2 improved to
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M
2016-05-01
Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE. Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature. PMID:27196467
Regression of retinopathy by squalamine in a mouse model.
Higgins, Rosemary D; Yan, Yun; Geng, Yixun; Zasloff, Michael; Williams, Jon I
2004-07-01
The goal of this study was to determine whether an antiangiogenic agent, squalamine, given late during the evolution of oxygen-induced retinopathy (OIR) in the mouse, could improve retinal neovascularization. OIR was induced in neonatal C57BL6 mice and the neonates were treated s.c. with squalamine doses begun at various times after OIR induction. A system of retinal whole mounts and assessment of neovascular nuclei extending beyond the inner limiting membrane from animals reared under room air or OIR conditions and killed periodically from d 12 to 21 were used to assess retinopathy in squalamine-treated and untreated animals. OIR evolved after 75% oxygen exposure in neonatal mice with florid retinal neovascularization developing by d 14. Squalamine (single dose, 25 mg/kg s.c.) given on d 15 or 16, but not d 17, substantially improved retinal neovascularization in the mouse model of OIR. There was improvement seen in the degree of blood vessel tuft formation, blood vessel tortuosity, and central vasoconstriction with squalamine treatment at d 15 or 16. Single-dose squalamine at d 12 was effective at reducing subsequent development of retinal neovascularization at doses as low as 1 mg/kg. Squalamine is a very active inhibitor of OIR in mouse neonates at doses as low as 1 mg/kg given once. Further, squalamine given late in the course of OIR improves retinopathy by inducing regression of retinal neovessels and abrogating invasion of new vessels beyond the inner-limiting membrane of the retina. PMID:15128931
Using Bibliotherapy to Help Children Adjust to Changing Role Models.
ERIC Educational Resources Information Center
Pardeck, John T.; Pardeck, Jean A.
One technique for helping children adjust to changing role models is bibliotherapy--the use of children's books to facilitate identification with and exploration of sex role behavior. Confronted with change in various social systems, particularly the family, children are faced with conflicts concerning their sex role development. The process…
Catastrophe, Chaos, and Complexity Models and Psychosocial Adjustment to Disability.
ERIC Educational Resources Information Center
Parker, Randall M.; Schaller, James; Hansmann, Sandra
2003-01-01
Rehabilitation professionals may unknowingly rely on stereotypes and specious beliefs when dealing with people with disabilities, despite the formulation of theories that suggest new models of the adjustment process. Suggests that Catastrophe, Chaos, and Complexity Theories hold considerable promise in this regard. This article reviews these…
Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models
ERIC Educational Resources Information Center
Shieh, Gwowen
2009-01-01
In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
NASA Astrophysics Data System (ADS)
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents prediction models that analyze and compute CO2 emissions in Malaysia. Each prediction model for CO2 emissions is analyzed based on three main groups: transportation; electricity and heat production; and residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. The best subset method is used to remove irrelevant data, followed by multiple linear regression to produce the prediction models. The results show high R-square (prediction) values, which implies that the models are reliable for predicting CO2 emissions from specific data. In addition, the CO2 emissions from these three groups are forecast using trend analysis plots for observation purposes.
Mixed-Effects Logistic Regression Models for Indirectly Observed Discrete Outcome Variables
ERIC Educational Resources Information Center
Vermunt, Jeroen K.
2005-01-01
A well-established approach to modeling clustered data introduces random effects in the model of interest. Mixed-effects logistic regression models can be used to predict discrete outcome variables when observations are correlated. An extension of the mixed-effects logistic regression model is presented in which the dependent variable is a latent…
De Mello, Fernanda; Oliveira, Carlos A L; Ribeiro, Ricardo P; Resende, Emiko K; Povh, Jayme A; Fornari, Darci C; Barreto, Rogério V; McManus, Concepta; Streit, Danilo
2015-01-01
The growth pattern of female and male tambaqui was evaluated using the Gompertz nonlinear regression model. Five traits of economic importance were measured on 145 animals over three years, totaling 981 morphometric records. Separate curves were fitted for males and females for body weight, height and head length, and a single curve was fitted for body width and body length. The asymptotic weight (a) and relative growth rate to maturity (k) differed between sexes in animals of about 5 kg, the slaughter weight used by a specific, very profitable niche market. However, there was no difference between males and females up to about 2 kg, the slaughter weight established to supply the larger consumer market. Females were heavier than males (by about 280 g) and are therefore more suitable for fish farming aimed at the niche market for larger animals. In general, males had a lower maximum growth rate (8.66 g/day) than females (9.34 g/day) but reached it sooner (at 476 and 486 days, respectively). Body height and body length were the traits that contributed most to weight at 516 days (P < 0.001). PMID:26628036
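To make the roles of the Gompertz parameters concrete, the sketch below assumes one common parameterization, W(t) = a·exp(−b·exp(−k·t)), which may differ from the form used in the study. Under this form the growth rate peaks at age ln(b)/k, where the weight is a/e and the rate is a·k/e. The parameter values are hypothetical and are not the fitted estimates.

```python
import math

def gompertz(t, a, b, k):
    """Weight at age t: a * exp(-b * exp(-k * t)); a is the asymptotic weight."""
    return a * math.exp(-b * math.exp(-k * t))

def age_at_max_growth(b, k):
    """Age at the inflection point, where the growth rate peaks."""
    return math.log(b) / k

def max_growth_rate(a, k):
    """Growth rate at the inflection point: a * k / e."""
    return a * k / math.e

# Hypothetical parameters (grams, days); not the fitted values from the study.
a, b, k = 9000.0, 4.5, 0.003
t_star = age_at_max_growth(b, k)
print(t_star, gompertz(t_star, a, b, k), max_growth_rate(a, k))
```

Because the peak rate depends on the product a·k, two groups can share a similar early trajectory yet differ in maximum growth rate and in the age at which it occurs.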
ERIC Educational Resources Information Center
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
ERIC Educational Resources Information Center
Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.
2010-01-01
Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…
Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.
2013-01-01
Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.
ERIC Educational Resources Information Center
Kaplan, David
2005-01-01
This article considers the problem of estimating dynamic linear regression models when the data are generated from finite mixture probability density function where the mixture components are characterized by different dynamic regression model parameters. Specifically, conventional linear models assume that the data are generated by a single…
Embedding IRT in Structural Equation Models: A Comparison with Regression Based on IRT Scores
ERIC Educational Resources Information Center
Lu, Irene R. R.; Thomas, D. Roland; Zumbo, Bruno D.
2005-01-01
This article reviews the problems associated with using item response theory (IRT)-based latent variable scores for analytical modeling, discusses the connection between IRT and structural equation modeling (SEM)-based latent regression modeling for discrete data, and compares regression parameter estimates obtained using predicted IRT scores and…
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
ERIC Educational Resources Information Center
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we…
ERIC Educational Resources Information Center
Pakenham, Kenneth I.; Samios, Christina; Sofronoff, Kate
2005-01-01
The present study examined the applicability of the double ABCX model of family adjustment in explaining maternal adjustment to caring for a child diagnosed with Asperger syndrome. Forty-seven mothers completed questionnaires at a university clinic while their children were participating in an anxiety intervention. The children were aged between…
Asquith, William H.; Roussel, Meghan C.
2009-01-01
Annual peak-streamflow frequency estimates are needed for flood-plain management; for objective assessment of flood risk; for cost-effective design of dams, levees, and other flood-control structures; and for design of roads, bridges, and culverts. Annual peak-streamflow frequency represents the peak streamflow for nine recurrence intervals of 2, 5, 10, 25, 50, 100, 200, 250, and 500 years. Common methods for estimation of peak-streamflow frequency for ungaged or unmonitored watersheds are regression equations for each recurrence interval developed for one or more regions; such regional equations are the subject of this report. The method is based on analysis of annual peak-streamflow data from U.S. Geological Survey streamflow-gaging stations (stations). Beginning in 2007, the U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, began a 3-year investigation concerning the development of regional equations to estimate annual peak-streamflow frequency for undeveloped watersheds in Texas. The investigation focuses primarily on 638 stations with 8 or more years of data from undeveloped watersheds and other criteria. The general approach is explicitly limited to the use of L-moment statistics, which are used in conjunction with a technique of multi-linear regression referred to as PRESS minimization. The approach used to develop the regional equations, which was refined during the investigation, is referred to as the 'L-moment-based, PRESS-minimized, residual-adjusted approach'. For the approach, seven unique distributions are fit to the sample L-moments of the data for each of 638 stations and trimmed means of the seven results of the distributions for each recurrence interval are used to define the station specific, peak-streamflow frequency. As a first iteration of regression, nine weighted-least-squares, PRESS-minimized, multi-linear regression equations are computed using the watershed
Gurnani, Ashita S.; John, Samantha E.; Gavett, Brandon E.
2015-01-01
The current study developed regression-based normative adjustments for a bi-factor model of the Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation, alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment. PMID:25724515
Spatial Variation and Land Use Regression Modeling of the Oxidative Potential of Fine Particles
Wang, Meng; Eeftens, Marloes; Beelen, Rob; Dons, Evi; Leseman, Daan L.A.C.; Brunekreef, Bert; Cassee, Flemming R.; Janssen, Nicole A.H.; Hoek, Gerard
2015-01-01
Background: Oxidative potential (OP) has been suggested to be a more health-relevant metric than particulate matter (PM) mass. Land use regression (LUR) models can estimate long-term exposure to air pollution in epidemiological studies, but few have been developed for OP. Objectives: We aimed to characterize the spatial contrasts of two OP methods and to develop and evaluate LUR models to assess long-term exposure to the OP of PM2.5. Methods: Three 2-week PM2.5 samples were collected at 10 regional background, 12 urban background, and 18 street sites spread over the Netherlands/Belgium in 1 year and analyzed for OP using electron spin resonance (OPESR) and dithiothreitol (OPDTT). LUR models were developed using temporally adjusted annual averages and a range of land-use and traffic-related GIS variables. Results: Street/urban background site ratio was 1.2 for OPDTT and 1.4 for OPESR, whereas regional/urban background ratio was 0.8 for both. OPESR correlated moderately with OPDTT (R2 = 0.35). The LUR models included estimated regional background OP, local traffic, and large-scale urbanity with explained variance (R2) of 0.60 for OPDTT and 0.67 for OPESR. OPDTT and OPESR model predictions were moderately correlated (R2 = 0.44). OP model predictions were moderately to highly correlated with predictions from a previously published PM2.5 model (R2 = 0.37–0.52), and highly correlated with predictions from previously published models of traffic components (R2 > 0.50). Conclusion: LUR models explained a large fraction of the spatial variation of the two OP metrics. The moderate correlations among the predictions of OPDTT, OPESR, and PM2.5 models offer the potential to investigate which metric is the strongest predictor of health effects. Citation: Yang A, Wang M, Eeftens M, Beelen R, Dons E, Leseman DL, Brunekreef B, Cassee FR, Janssen NA, Hoek G. 2015. Spatial variation and land use regression modeling of the oxidative potential of fine particles. Environ Health Perspect 123
Dynamic errors modeling of CMM based on generalized regression neural network
NASA Astrophysics Data System (ADS)
Zhong, Weihong; Guan, Hongwei; Li, Yingdao; Ma, Xiushui
2010-12-01
The development of modern manufacturing requires higher speed and accuracy from coordinate measuring machines (CMM). Dynamic error is the main factor affecting measurement accuracy at high speed, and dynamic error modeling and estimation are the basis of dynamic error correction. This paper applies a generalized regression neural network (GRNN) to establish and estimate the dynamic error model. Compared with a BP neural network (BPNN), GRNN has fewer parameters: only one smoothing factor needs to be adjusted, so it can produce predictions faster and with a greater computing advantage. The running speed of the CMM axis is set through software, and the machine is run along the X axis. The values of the grating and the dual-frequency laser interferometer are acquired synchronously at the same measuring point. The difference between the two values is the real-time dynamic measurement error. A total of 150 values are collected. The first 100 values of the error sequence are used as training data to establish the GRNN model, and the next 50 values are used to test the estimation results. When the smoothing factor is set at 0.5, the estimation of the GRNN training data is better. The simulation with the experimental data shows that the GRNN method obtains better error estimation accuracy and higher computing speed compared with BPNN. GRNN can be applied to dynamic error estimation of CMM under certain conditions.
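The single smoothing factor mentioned in this abstract can be seen in a minimal sketch of the standard GRNN predictor, which is a Gaussian-kernel weighted average of the training targets. The abstract does not state the network inputs, so axis position is assumed here as the input, and the 150 error values are synthetic stand-ins; only the 100/50 split and the smoothing factor of 0.5 come from the text.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    x_train = np.asarray(x_train, dtype=float).reshape(len(x_train), -1)
    x_query = np.asarray(x_query, dtype=float).reshape(len(x_query), -1)
    y_train = np.asarray(y_train, dtype=float)
    # Squared Euclidean distance between every query and every training point.
    sq_dist = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return weights @ y_train / weights.sum(axis=1)

rng = np.random.default_rng(0)
# Synthetic stand-in for 150 measured dynamic errors along one axis (hypothetical).
position = rng.uniform(0.0, 10.0, size=150)
error = np.sin(position) + rng.normal(scale=0.05, size=150)

# First 100 values train the model; the next 50 test it.
pred = grnn_predict(position[:100], error[:100], position[100:], sigma=0.5)
rmse = float(np.sqrt(np.mean((pred - error[100:]) ** 2)))
print(f"test RMSE = {rmse:.3f}")
```

There is no iterative training step: the training data are stored and only sigma is tuned, which is consistent with the speed advantage over back-propagation claimed in the abstract.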
Dynamic errors modeling of CMM based on generalized regression neural network
NASA Astrophysics Data System (ADS)
Zhong, Weihong; Guan, Hongwei; Li, Yingdao; Ma, Xiushui
2011-05-01
The development of modern manufacturing requires higher speed and accuracy from coordinate measuring machines (CMM). Dynamic error is the main factor affecting measurement accuracy at high speed, and dynamic error modeling and estimation are the basis of dynamic error correction. This paper applies a generalized regression neural network (GRNN) to establish and estimate the dynamic error model. Compared with a BP neural network (BPNN), GRNN has fewer parameters: only one smoothing factor needs to be adjusted, so it can produce predictions faster and with a greater computing advantage. The running speed of the CMM axis is set through software, and the machine is run along the X axis. The values of the grating and the dual-frequency laser interferometer are acquired synchronously at the same measuring point. The difference between the two values is the real-time dynamic measurement error. A total of 150 values are collected. The first 100 values of the error sequence are used as training data to establish the GRNN model, and the next 50 values are used to test the estimation results. When the smoothing factor is set at 0.5, the estimation of the GRNN training data is better. The simulation with the experimental data shows that the GRNN method obtains better error estimation accuracy and higher computing speed compared with BPNN. GRNN can be applied to dynamic error estimation of CMM under certain conditions.
A modified GM-estimation for robust fitting of mixture regression models
NASA Astrophysics Data System (ADS)
Booppasiri, Slun; Srisodaphol, Wuttichai
2015-02-01
In mixture regression models, the regression parameters are estimated by maximum likelihood estimation (MLE) via the EM algorithm. Generally, maximum likelihood estimation is sensitive to outliers and heavy-tailed error distributions. The robust M-estimation method can handle outliers only in the dependent variable when estimating regression coefficients. GM-estimation, in contrast, can handle outliers in both the dependent and independent variables. In this study, modified GM-estimations for estimating the regression coefficients in mixture regression models are proposed. A Monte Carlo simulation is used to evaluate the efficiency of the proposed methods. The results show that the proposed modified GM-estimations are close to the MLE when there are no outliers and the error is normally distributed. Furthermore, the proposed methods are more efficient than the MLE when there are leverage points.
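The sensitivity of least squares to response outliers, and the way M-estimation bounds their influence, can be illustrated with a plain Huber M-estimator fitted by iteratively reweighted least squares. This is a deliberately simpler stand-in: it is neither the modified GM-estimator proposed in the paper nor a mixture regression, and the data, tuning constant, and function name are assumptions for illustration.

```python
import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-10):
    """Huber M-estimate of linear regression coefficients via IRLS."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares start
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust residual scale from the median absolute deviation (MAD).
        scale = max(np.median(np.abs(resid - np.median(resid))) / 0.6745, 1e-12)
        u = np.abs(resid) / scale
        # Huber weights: 1 inside the threshold, c/|u| outside it.
        weights = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(weights)
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.max(np.abs(new_beta - beta)) < tol:
            beta = new_beta
            break
        beta = new_beta
    return beta

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=50)
y[-5:] += 50.0  # five gross outliers in the response only (synthetic)

X = np.column_stack([np.ones_like(x), x])
ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
huber_beta = huber_irls(X, y)
print("OLS slope:", ols_beta[1], "Huber slope:", huber_beta[1])
```

The weights here depend only on residual size, so an observation with an extreme predictor value is not downweighted for its leverage; GM-type estimators add weights based on the predictors to address that case.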
NASA Astrophysics Data System (ADS)
Naipal, V.; Reick, C.; Pongratz, J.; Van Oost, K.
2015-03-01
Large uncertainties exist in estimated rates and the extent of soil erosion by surface runoff on a global scale, and this limits our understanding of the global impact that soil erosion might have on agriculture and climate. The Revised Universal Soil Loss Equation (RUSLE) model is due to its simple structure and empirical basis a frequently used tool in estimating average annual soil erosion rates at regional to global scales. However, large spatial scale applications often rely on coarse data input, which is not compatible with the local scale at which the model is parameterized. This study aimed at providing the first steps in improving the global applicability of the RUSLE model in order to derive more accurate global soil erosion rates. We adjusted the topographical and rainfall erosivity factors of the RUSLE model and compared the resulting soil erosion rates to extensive empirical databases on soil erosion from the USA and Europe. Adjusting the topographical factor required scaling of slope according to the fractal method, which resulted in improved topographical detail in a coarse resolution global digital elevation model. Applying the linear multiple regression method to adjust rainfall erosivity for various climate zones resulted in values that are in good comparison with high resolution erosivity data for different regions. However, this method needs to be extended to tropical climates, for which erosivity is biased due to the lack of high resolution erosivity data. After applying the adjusted and the unadjusted versions of the RUSLE model on a global scale we find that the adjusted RUSLE model not only shows a global higher mean soil erosion rate but also more variability in the soil erosion rates. Comparison to empirical datasets of the USA and Europe shows that the adjusted RUSLE model is able to decrease the very high erosion rates in hilly regions that are observed in the unadjusted RUSLE model results. Although there are still some regional
Flexible regression models for ROC and risk analysis, with or without a gold standard.
Branscum, Adam J; Johnson, Wesley O; Hanson, Timothy E; Baron, Andre T
2015-12-30
A novel semiparametric regression model is developed for evaluating the covariate-specific accuracy of a continuous medical test or biomarker. Ideally, studies designed to estimate or compare medical test accuracy will use a separate, flawless gold-standard procedure to determine the true disease status of sampled individuals. We treat this as a special case of the more complicated and increasingly common scenario in which disease status is unknown because a gold-standard procedure does not exist or is too costly or invasive for widespread use. To compensate for missing data on disease status, covariate information is used to discriminate between diseased and healthy units. We thus model the probability of disease as a function of 'disease covariates'. In addition, we model test/biomarker outcome data to depend on 'test covariates', which provides researchers the opportunity to quantify the impact of covariates on the accuracy of a medical test. We further model the distributions of test outcomes using flexible semiparametric classes. An important new theoretical result demonstrating model identifiability under mild conditions is presented. The modeling framework can be used to obtain inferences about covariate-specific test accuracy and the probability of disease based on subject-specific disease and test covariate information. The value of the model is illustrated using multiple simulation studies and data on the age-adjusted ability of soluble epidermal growth factor receptor - a ubiquitous serum protein - to serve as a biomarker of lung cancer in men. SAS code for fitting the model is provided. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26239173
Applying land use regression model to estimate spatial variation of PM₂.₅ in Beijing, China.
Wu, Jiansheng; Li, Jiacheng; Peng, Jian; Li, Weifeng; Xu, Guang; Dong, Chengcheng
2015-05-01
Fine particulate matter (PM2.5) is the major air pollutant in Beijing, posing serious threats to human health. Land use regression (LUR) has been widely used in predicting spatiotemporal variation of ambient air-pollutant concentrations, though restricted to the European and North American context. We aimed to estimate spatiotemporal variations of PM2.5 by building separate LUR models in Beijing. Hourly routine PM2.5 measurements were collected at 35 sites from 4th March 2013 to 5th March 2014. Seventy-seven predictor variables were generated in GIS, including street network, land cover, population density, catering services distribution, bus stop density, intersection density, and others. Eight LUR models were developed on annual, seasonal, peak/non-peak, and incremental concentration subsets. The annual mean concentration across all sites is 90.7 μg/m³ (SD = 13.7). PM2.5 shows more temporal variation than spatial variation, indicating the necessity of building different models to capture spatiotemporal trends. The adjusted R² of these models ranges between 0.43 and 0.65. Most LUR models are driven by significant predictors including major road length, vegetation, and water land use. Annual outdoor exposure in Beijing is as high as 96.5 μg/m³. This is among the first LUR studies implemented in a seriously air-polluted Chinese context, which generally produce acceptable results and reliable spatial air-pollution maps. Apart from the models for winter and incremental concentration, LUR models are driven by similar variables, suggesting that the spatial variations of PM2.5 remain steady for most of the time. Temporal variations are explained by the intercepts, and spatial variations in the measurements determine the strength of variable coefficients in our models. PMID:25487555
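For readers unfamiliar with the adjusted R² reported above, the following minimal sketch shows the standard adjustment for the number of predictors. The function name and the example inputs are hypothetical; the abstract gives the number of sites (35) but not the number of predictors retained in each model, so the values below are illustrative only.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for the number of predictors used."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Illustrative values only: 35 sites as in the abstract, 4 predictors assumed.
example = adjusted_r2(0.5, n_obs=35, n_predictors=4)
```

With few monitoring sites, each extra predictor lowers the adjusted value noticeably, which is one reason LUR models tend to retain only a handful of variables.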
Development and Evaluation of Land-Use Regression Models Using Modeled Air Quality Concentrations
Abstract Land-use regression (LUR) models have emerged as a preferred methodology for estimating individual exposure to ambient air pollution in epidemiologic studies in absence of subject-specific measurements. Although there is a growing literature focused on LUR evaluation, fu...
Analysis for Regression Model Behavior by Sampling Strategy for Annual Pollutant Load Estimation.
Park, Youn Shik; Engel, Bernie A
2015-11-01
Water quality data are typically collected less frequently than streamflow data due to the cost of collection and analysis, and therefore water quality data may need to be estimated for additional days. Regression models are applicable to interpolate water quality data associated with streamflow data and have come to be extensively used, requiring relatively small amounts of data. There is a need to evaluate how well the regression models represent pollutant loads from intermittent water quality data sets. Both the specific regression model and water quality data frequency are important factors in pollutant load estimation. In this study, nine regression models from the Load Estimator (LOADEST) and one regression model from the Web-based Load Interpolation Tool (LOADIN) were evaluated with subsampled water quality data sets from daily measured water quality data sets for N, P, and sediment. Each water quality parameter had different correlations with streamflow, and the subsampled water quality data sets had various proportions of storm samples. The behaviors of the regression models differed not only by water quality parameter but also by proportion of storm samples. The regression models from LOADEST provided accurate and precise annual sediment and P load estimates using the water quality data of 20 to 40% storm samples. LOADIN provided more accurate and precise annual N load estimates than LOADEST. In addition, the results indicate that avoidance of water quality data extrapolation and availability of water quality data from storm events were crucial in annual pollutant load estimation using pollutant regression models. PMID:26641336
Cooley, R.L.
1983-01-01
Investigates factors influencing the degree of improvement in estimates of parameters of a nonlinear regression groundwater flow model by incorporating prior information of unknown reliability. Consideration of expected behavior of the regression solutions and results of a hypothetical modeling problem lead to several general conclusions. -from Author
NASA Astrophysics Data System (ADS)
Naipal, V.; Reick, C.; Pongratz, J.; Van Oost, K.
2015-09-01
Large uncertainties exist in estimated rates and the extent of soil erosion by surface runoff on a global scale. This limits our understanding of the global impact that soil erosion might have on agriculture and climate. The Revised Universal Soil Loss Equation (RUSLE) model is, due to its simple structure and empirical basis, a frequently used tool in estimating average annual soil erosion rates at regional to global scales. However, large spatial-scale applications often rely on coarse data input, which is not compatible with the local scale on which the model is parameterized. Our study aims at providing the first steps in improving the global applicability of the RUSLE model in order to derive more accurate global soil erosion rates. We adjusted the topographical and rainfall erosivity factors of the RUSLE model and compared the resulting erosion rates to extensive empirical databases from the USA and Europe. By scaling the slope according to the fractal method to adjust the topographical factor, we managed to improve the topographical detail in a coarse resolution global digital elevation model. Applying the linear multiple regression method to adjust rainfall erosivity for various climate zones resulted in values that compared well to high resolution erosivity data for different regions. However, this method needs to be extended to tropical climates, for which erosivity is biased due to the lack of high resolution erosivity data. After applying the adjusted and the unadjusted versions of the RUSLE model on a global scale we find that the adjusted version shows a global higher mean erosion rate and more variability in the erosion rates. Comparison to empirical data sets of the USA and Europe shows that the adjusted RUSLE model is able to decrease the very high erosion rates in hilly regions that are observed in the unadjusted RUSLE model results. Although there are still some regional differences with the empirical databases, the results indicate that the
Beta Regression Finite Mixture Models of Polarization and Priming
ERIC Educational Resources Information Center
Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay
2011-01-01
This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…
Radman, Andreja; Gredičak, Matija; Kopriva, Ivica; Jerić, Ivanka
2011-01-01
Predicting antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet, it occurs very often within the academic community. To counteract, to some extent, overfitting problems caused by a small training data set, we propose to use a consensus of six regression models for prediction of the biological activity of a virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met) with known antitumor activity were used to train the regression models: the feed-forward artificial neural network, the k-nearest neighbor, sparseness-constrained linear regression, and the linear and nonlinear (with polynomial and Gaussian kernel) support vector machines. The regression models were applied to a virtual library of 429 compounds, which resulted in six lists of candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for antiproliferative activity. Some of the prepared peptides showed more pronounced activity than the native OGF; however, they were less active than highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM) regression model. The ill-posedness of the related inverse problem causes unstable behavior of trained regression models on test data. These results point to the high complexity of prediction based on regression models trained on a small data sample. PMID:22272081
NASA Astrophysics Data System (ADS)
Suhartono, Lee, Muhammad Hisyam; Prastyo, Dedy Dwi
2015-12-01
The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two-level ARIMAX and regression methods. Two-level ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as a case study. In general, the two-level calendar variation model yields two models: the first reconstructs the sales pattern that has already occurred, and the second forecasts the increase in sales due to Eid ul-Fitr, which affects sales in the same month and the previous month. The results show that the proposed two-level calendar variation model based on ARIMAX and regression methods yields better forecasts than the seasonal ARIMA model and neural networks.
Box–Cox Transformation and Random Regression Models for Fecal Egg Count Data
da Silva, Marcos Vinícius Gualberto Barbosa; Van Tassell, Curtis P.; Sonstegard, Tad S.; Cobuci, Jaime Araujo; Gasbarre, Louis C.
2012-01-01
Accurate genetic evaluation of livestock is based on appropriate modeling of phenotypic measurements. In ruminants, fecal egg count (FEC) is commonly used to measure resistance to nematodes. FEC values are not normally distributed and logarithmic transformations have been used in an effort to achieve normality before analysis. However, the transformed data are often still not normally distributed, especially when data are extremely skewed. A series of repeated FEC measurements may provide information about the population dynamics of a group or individual. A total of 6375 FEC measures were obtained for 410 animals between 1992 and 2003 from the Beltsville Agricultural Research Center Angus herd. Original data were transformed using an extension of the Box–Cox transformation to approach normality and to estimate (co)variance components. We also proposed using random regression models (RRM) for genetic and non-genetic studies of FEC. Phenotypes were analyzed using RRM and restricted maximum likelihood. Within the different orders of Legendre polynomials used, those with more parameters (order 4) adjusted FEC data best. Results indicated that the transformation of FEC data utilizing the Box–Cox transformation family was effective in reducing the skewness and kurtosis, and dramatically increased estimates of heritability, and measurements of FEC obtained in the period between 12 and 26 weeks in a 26-week experimental challenge period are genetically correlated. PMID:22303406
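The transformation mentioned above can be sketched in its standard one-parameter form. This is not the extension the authors used, which the abstract does not specify; the `shift` argument is a hypothetical device for handling zero counts and is labeled as such in the code.

```python
import math


def box_cox(y, lam, shift=0.0):
    """Standard one-parameter Box-Cox transform of a positive value.

    shift is a hypothetical offset (an assumption, not from the paper) that
    lets zero counts be transformed, since the transform needs y + shift > 0.
    """
    z = y + shift
    if z <= 0:
        raise ValueError("y + shift must be positive")
    if lam == 0:
        return math.log(z)  # limiting case as lam approaches 0
    return (z ** lam - 1.0) / lam


# Illustrative counts only, not data from the Beltsville herd.
counts = [0, 3, 12, 150]
transformed = [box_cox(c, lam=0.5, shift=1.0) for c in counts]
```

A logarithmic transformation is the special case `lam = 0`; letting `lam` vary is what gives the family extra flexibility for strongly skewed counts.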
A spectral graph regression model for learning brain connectivity of Alzheimer's disease.
Hu, Chenhui; Cheng, Lin; Sepulcre, Jorge; Johnson, Keith A; Fakhri, Georges E; Lu, Yue M; Li, Quanzheng
2015-01-01
Understanding network features of brain pathology is essential to reveal underpinnings of neurodegenerative diseases. In this paper, we introduce a novel graph regression model (GRM) for learning structural brain connectivity of Alzheimer's disease (AD) measured by amyloid-β deposits. The proposed GRM regards 11C-labeled Pittsburgh Compound-B (PiB) positron emission tomography (PET) imaging data as smooth signals defined on an unknown graph. This graph is then estimated through an optimization framework, which fits the graph to the data with an adjustable level of uniformity of the connection weights. Under the assumed data model, results based on simulated data illustrate that our approach can accurately reconstruct the underlying network, often with better reconstruction than those obtained by both sample correlation and ℓ1-regularized partial correlation estimation. Evaluations performed upon PiB-PET imaging data of 30 AD and 40 elderly normal control (NC) subjects demonstrate that the connectivity patterns revealed by the GRM are easy to interpret and consistent with known pathology. Moreover, the hubs of the reconstructed networks match the cortical hubs given by functional MRI. The discriminative network features including both global connectivity measurements and degree statistics of specific nodes discovered from the AD and NC amyloid-beta networks provide new potential biomarkers for preclinical and clinical AD. PMID:26024224
Effect of air pollution on lung cancer: A Poisson regression model based on vital statistics
Tango, Toshiro
1994-11-01
This article describes a Poisson regression model for time trends of mortality to detect the long-term effects of common levels of air pollution on lung cancer, in which the adjustment for cigarette smoking is not always necessary. The main hypothesis to be tested in the model is that if the long-term and common-level air pollution had an effect on lung cancer, the death rate from lung cancer could be expected to increase gradually at a higher rate in the region with relatively high levels of air pollution than in the region with low levels, and that this trend would not be expected for other control diseases in which cigarette smoking is a risk factor. Using this approach, we analyzed the trend of mortality in females aged 40 to 79, from lung cancer and two control diseases, ischemic heart disease and cerebrovascular disease, based on vital statistics in 23 wards of the Tokyo metropolitan area for 1972 to 1988. Ward-specific mean levels per day of SO₂ and NO₂ from 1974 through 1976 estimated by Makino (1978) were used as the ward-specific exposure measure of air pollution. No data on tobacco consumption in each ward is available. Our analysis supported the existence of long-term effects of air pollution on lung cancer. 14 refs., 5 figs., 2 tabs.
Roesch, Scott C; Vaughn, Allison A; Aldridge, Arianna A; Villodas, Feion
2009-10-01
Many researchers underscore the importance of coping in the daily lives of adolescents, yet very few studies measure this and related constructs at this level. Using a daily diary approach to stress and coping, the current study evaluated a series of mediational coping models in a sample of low-income minority adolescents (N = 89). Specifically, coping was hypothesized to mediate the relationship between attributional style (and dimensions) and daily affect. Using random coefficient regression modeling, the relationship between (a) the locus of causality dimension and positive affect was completely mediated by the use of acceptance and humor as coping strategies; (b) the stability dimension and positive affect was completely mediated by the use of both problem-solving and positive thinking; and (c) the stability dimension and negative affect was partially mediated by the use of religious coping. In addition, the locus of causality and stability (but not globality) dimensions were also directly related to affect. However, the relationship between pessimistic explanatory style and affect was not mediated by coping. Consistent with previous research, these findings suggest that attributions are both directly and indirectly related to indices of affect or adjustment. Thus, attributions may not only influence the type of coping strategy employed, but may also serve as coping strategies themselves. PMID:22029618
Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models
Elliott, Michael R.
2012-01-01
In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights; weight trimming reduces large weights to a maximum value, reducing variability but introducing bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance trade-offs. This article uses Bayesian model averaging to create “data driven” weight trimming estimators. We extend previous results for linear regression models (Elliott 2008) to generalized linear regression models, developing robust models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical. PMID:23275683
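The basic weight-trimming operation described above, in which large weights are reduced to a maximum value, can be sketched as follows. This shows only the ad hoc fixed-cap version that the article contrasts with its Bayesian model averaging approach; the function name and the cap are hypothetical, and no redistribution of the trimmed weight is attempted.

```python
def trim_weights(weights, cap):
    """Fixed-cap weight trimming: any weight above cap is set to cap."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [min(w, cap) for w in weights]


# Illustrative inverse-probability weights, not survey data.
raw = [1.0, 2.0, 10.0, 40.0]
trimmed = trim_weights(raw, cap=5.0)
```

Lowering the cap shrinks the spread of the weights, which reduces variance, but moves the estimator toward the unweighted one, which is where the bias enters.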
Şentürk, Damla; Dalrymple, Lorien S.; Mu, Yi; Nguyen, Danh V.
2014-01-01
SUMMARY We propose a new weighted hurdle regression method for modeling count data, with particular interest in modeling cardiovascular events in patients on dialysis. Cardiovascular disease remains one of the leading causes of hospitalization and death in this population. Our aim is to jointly model the relationship/association between covariates and (a) the probability of cardiovascular events, a binary process and (b) the rate of events once the realization is positive - when the ‘hurdle’ is crossed - using a zero-truncated Poisson distribution. When the observation period or follow-up time, from the start of dialysis, varies among individuals the estimated probability of positive cardiovascular events during the study period will be biased. Furthermore, when the model contains covariates, then the estimated relationship between the covariates and the probability of cardiovascular events will also be biased. These challenges are addressed with the proposed weighted hurdle regression method. Estimation for the weighted hurdle regression model is a weighted likelihood approach, where standard maximum likelihood estimation can be utilized. The method is illustrated with data from the United States Renal Data System. Simulation studies show the ability of proposed method to successfully adjust for differential follow-up times and incorporate the effects of covariates in the weighting. PMID:24930810
Sentürk, Damla; Dalrymple, Lorien S; Mu, Yi; Nguyen, Danh V
2014-11-10
We propose a new weighted hurdle regression method for modeling count data, with particular interest in modeling cardiovascular events in patients on dialysis. Cardiovascular disease remains one of the leading causes of hospitalization and death in this population. Our aim is to jointly model the relationship/association between covariates and (i) the probability of cardiovascular events, a binary process, and (ii) the rate of events once the realization is positive-when the 'hurdle' is crossed-using a zero-truncated Poisson distribution. When the observation period or follow-up time, from the start of dialysis, varies among individuals, the estimated probability of positive cardiovascular events during the study period will be biased. Furthermore, when the model contains covariates, then the estimated relationship between the covariates and the probability of cardiovascular events will also be biased. These challenges are addressed with the proposed weighted hurdle regression method. Estimation for the weighted hurdle regression model is a weighted likelihood approach, where standard maximum likelihood estimation can be utilized. The method is illustrated with data from the United States Renal Data System. Simulation studies show the ability of proposed method to successfully adjust for differential follow-up times and incorporate the effects of covariates in the weighting. PMID:24930810
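To make the two-part structure concrete, the sketch below writes out the probability mass function of a plain (unweighted) hurdle model with a zero-truncated Poisson count part. It does not implement the weighting for differential follow-up time proposed in the paper; the function names and parameter values are assumptions chosen for illustration.

```python
import math


def zero_truncated_poisson_pmf(y, lam):
    """P(Y = y | Y > 0) for a Poisson(lam) count, defined for y >= 1."""
    if y < 1:
        return 0.0
    prob_positive = -math.expm1(-lam)  # 1 - exp(-lam), computed stably
    return math.exp(-lam) * lam ** y / (math.factorial(y) * prob_positive)


def hurdle_pmf(y, p_positive, lam):
    """Hurdle model: a binary part for crossing the hurdle, then a truncated count."""
    if y == 0:
        return 1.0 - p_positive
    return p_positive * zero_truncated_poisson_pmf(y, lam)


# Illustrative parameters: 30% chance of any event, rate 2.0 once positive.
probs = [hurdle_pmf(y, p_positive=0.3, lam=2.0) for y in range(6)]
```

In a regression setting, `p_positive` and `lam` would each depend on covariates, typically through a logit link and a log link respectively; that is a common convention rather than something stated in the abstract.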
NASA Astrophysics Data System (ADS)
Preobrazhenskii, M. P.; Rudakov, O. B.
2016-01-01
A regression model for calculating the boiling point isobars of tetrachloromethane-organic solvent binary homogeneous systems is proposed. The parameters of the model proposed were calculated for a series of solutions. The correlation between the nonadditivity parameter of the regression model and the hydrophobicity criterion of the organic solvent is established. The parameter value of the proposed model is shown to allow prediction of the potential formation of azeotropic mixtures of solvents with tetrachloromethane.
Determination of airplane model structure from flight data by using modified stepwise regression
NASA Technical Reports Server (NTRS)
Klein, V.; Batterson, J. G.; Murphy, P. C.
1981-01-01
The linear and stepwise regressions are briefly introduced, then the problem of determining airplane model structure is addressed. The MSR was constructed to force a linear model for the aerodynamic coefficient first, then add significant nonlinear terms and delete nonsignificant terms from the model. In addition to the statistical criteria in the stepwise regression, the prediction sum of squares (PRESS) criterion and the analysis of residuals were examined for the selection of an adequate model. The procedure is used in examples with simulated and real flight data. It is shown that the MSR performs better than the ordinary stepwise regression and that the technique can also be applied to the large amplitude maneuvers.
Bertipaglia, T S; Carreño, L O D; Aspilcueta-Borquis, R R; Boligon, A A; Farah, M M; Gomes, F J; Machado, C H C; Rey, F S B; da Fonseca, R
2015-08-01
Random regression models (RRM) and multitrait models (MTM) were used to estimate genetic parameters for growth traits in Brazilian Brahman cattle and to compare the estimated breeding values obtained by these 2 methodologies. For RRM, 78,641 weight records taken between 60 and 550 d of age from 16,204 cattle were analyzed, and for MTM, the analysis consisted of 17,385 weight records taken at the same ages from 12,925 cattle. All models included the fixed effects of contemporary group and the additive genetic, maternal genetic, and animal permanent environmental effects and the quadratic effect of age at calving (AAC) as covariate. For RRM, the AAC was nested in the animal's age class. The best RRM considered cubic polynomials and the residual variance heterogeneity (5 levels). For MTM, the weights were adjusted for standard ages. For RRM, additive heritability estimates ranged from 0.42 to 0.75, and for MTM, the estimates ranged from 0.44 to 0.72 for both models at 60, 120, 205, 365, and 550 d of age. The maximum maternal heritability estimate (0.08) was at 140 d for RRM, but for MTM, it was highest at weaning (0.09). The magnitude of the genetic correlations was generally from moderate to high. The RRM adequately modeled changes in variance or covariance with age, and provided there was sufficient number of samples, increased accuracy in the estimation of the genetic parameters can be expected. Correlation of bull classifications were different in both methods and at all the ages evaluated, especially at high selection intensities, which could affect the response to selection. PMID:26440161
A Noncentral "t" Regression Model for Meta-Analysis
ERIC Educational Resources Information Center
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
Incremental logistic regression for customizing automatic diagnostic models.
Tortajada, Salvador; Robles, Montserrat; García-Gómez, Juan Miguel
2015-01-01
In the last decades, and following the new trends in medicine, statistical learning techniques have been used for developing automatic diagnostic models for aiding the clinical experts through the use of Clinical Decision Support Systems. The development of these models requires a large, representative amount of data, which is commonly obtained from one hospital or a group of hospitals after expensive and time-consuming gathering, preprocessing, and validation of cases. After the model is developed, it has to pass an external validation that is often carried out in a different hospital or health center. Experience shows that the models often perform below expectations. Furthermore, patient data needs ethical approval and patient consent to send and store data. For these reasons, we introduce an incremental learning algorithm based on the Bayesian inference approach that may allow us to build an initial model with a smaller number of cases and update it incrementally when new data are collected or even perform a new calibration of a model from a different center by using a reduced number of cases. The performance of our algorithm is demonstrated by employing different benchmark datasets and a real brain tumor dataset; and we compare its performance to a previous incremental algorithm and a non-incremental Bayesian model, showing that the algorithm is independent of the data model, iterative, and has a good convergence. PMID:25417079
A Negative Binomial Regression Model for Accuracy Tests
ERIC Educational Resources Information Center
Hung, Lai-Fa
2012-01-01
Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…
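The equal mean and variance property mentioned above can be checked on data with a dispersion index. The following is a minimal sketch, not taken from the article: the error counts are hypothetical, and the variance formula assumes the common NB2 parameterization of the negative binomial (variance = mu + mu^2 / r).

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    return variance(counts) / mean(counts)

def negbin_variance(mu, r):
    """Variance of a negative binomial in the NB2 parameterization (assumed here)."""
    return mu + mu ** 2 / r

# Hypothetical reading-error counts for eight examinees (illustrative only).
errors = [0, 1, 1, 2, 3, 5, 8, 12]

index = dispersion_index(errors)
print(f"mean={mean(errors):.2f}, dispersion index={index:.2f}")
print("overdispersed" if index > 1 else "not overdispersed")
```

An index well above 1 suggests that a Poisson model understates the variance, which is the situation a negative binomial model is meant to handle.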
KUPPER, Lawrence L.
2012-01-01
A common goal in environmental epidemiologic studies is to undertake logistic regression modeling to associate a continuous measure of exposure with binary disease status, adjusting for covariates. A frequent complication is that exposure may only be measurable indirectly, through a collection of subject-specific variables assumed associated with it. Motivated by a specific study to investigate the association between lung function and exposure to metal working fluids, we focus on a multiplicative-lognormal structural measurement error scenario and approaches to address it when external validation data are available. Conceptually, we emphasize the case in which true untransformed exposure is of interest in modeling disease status, but measurement error is additive on the log scale and thus multiplicative on the raw scale. Methodologically, we favor a pseudo-likelihood (PL) approach that exhibits fewer computational problems than direct full maximum likelihood (ML) yet maintains consistency under the assumed models without necessitating small exposure effects and/or small measurement error assumptions. Such assumptions are required by computationally convenient alternative methods like regression calibration (RC) and ML based on probit approximations. We summarize simulations demonstrating considerable potential for bias in the latter two approaches, while supporting the use of PL across a variety of scenarios. We also provide accessible strategies for obtaining adjusted standard errors to accompany RC and PL estimates. PMID:24027381
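As a reading aid, the error structure described above (additive on the log scale, hence multiplicative on the raw scale) can be written out as follows. The symbols are assumptions introduced here, not the article's notation: X is the true exposure, W the error-prone measurement, U the lognormal error, C the covariates, and Y the binary disease status.

```latex
% Hedged sketch; symbols X, W, U, C, Y are assumed notation.
\begin{align}
  \log W &= \log X + \log U, \qquad \log U \sim N(\mu_u, \sigma_u^2)
  \quad\Longleftrightarrow\quad W = X\,U, \\
  \operatorname{logit} P(Y = 1 \mid X, C) &= \beta_0 + \beta_1 X + \gamma^{\top} C .
\end{align}
```

The second line keeps the untransformed exposure X in the disease model, which is the case the abstract emphasizes.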
Mixture regression models for closed population capture-recapture data.
Tounkara, Fodé; Rivest, Louis-Paul
2015-09-01
In capture-recapture studies, the use of individual covariates has been recommended to get stable population estimates. However, some residual heterogeneity might still exist and ignoring such heterogeneity could lead to underestimating the population size (N). In this work, we explore two new models with capture probabilities depending on both covariates and unobserved random effects, to estimate the size of a population. Inference techniques, including the Horvitz-Thompson estimate and confidence intervals for the population size, are derived. The selection of a particular model is carried out using the Akaike information criterion (AIC). First, we extend the random effect model of Darroch et al. (1993, Journal of American Statistical Association 88, 1137-1148) to handle unit level covariates and discuss its limitations. The second approach is a generalization of the traditional zero-truncated binomial model that includes a random effect to account for an unobserved heterogeneity. This approach provides useful tools for inference about N, since key quantities such as moments, likelihood functions and estimates of N and their standard errors have closed form expressions. Several models for the unobserved heterogeneity are available and the marginal capture probability is expressed using the Logit and the complementary Log-Log link functions. The sensitivity of the inference to the specification of a model is also investigated through simulations. A numerical example is presented. We compare the performance of the proposed estimator with that obtained under model Mh of Huggins (1989 Biometrika 76, 130-140). PMID:25963047
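A Horvitz-Thompson estimate of N weights each captured unit by the inverse of its probability of being captured at least once. The sketch below is a simplified illustration under stated assumptions: a single covariate, a logit link, independent capture occasions, and no random effect (so it does not implement the article's mixture models). The function names and coefficient values are hypothetical.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def horvitz_thompson_n(covariates, intercept, slope, occasions):
    """Estimate population size from the covariates of captured units.

    Each unit's per-occasion capture probability follows a logit model;
    its inclusion probability is the chance of at least one capture.
    """
    total = 0.0
    for x in covariates:
        p = expit(intercept + slope * x)           # per-occasion capture probability
        inclusion = 1.0 - (1.0 - p) ** occasions   # captured at least once
        total += 1.0 / inclusion
    return total

# Hypothetical covariate values for five captured animals and assumed coefficients.
captured_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(horvitz_thompson_n(captured_x, intercept=-1.0, slope=0.8, occasions=4))
```

Because every inclusion probability is at most 1, the estimate is never smaller than the number of units actually captured.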
ERIC Educational Resources Information Center
Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette
2011-01-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…
NASA Technical Reports Server (NTRS)
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables: 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
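The model form described above (an intercept plus one fixed-power term per regression variable, with EDR as the day response and its natural log as the night response) might be written as below. The symbols are assumptions: x_j denotes the j-th regression variable, p_j and q_j the fixed powers, and the coefficients are estimated by least squares.

```latex
% Hedged sketch of the described model form; notation is assumed, not the report's.
\begin{align}
  \mathrm{EDR}_{\text{day}}      &= \beta_0  + \sum_{j} \beta_j\,  x_j^{\,p_j} + \varepsilon, \\
  \ln \mathrm{EDR}_{\text{night}} &= \gamma_0 + \sum_{j} \gamma_j\, x_j^{\,q_j} + \varepsilon' .
\end{align}
```

With the powers held fixed, both models remain linear in their coefficients.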
A regressive storm model for extreme space weather
NASA Astrophysics Data System (ADS)
Terkildsen, Michael; Steward, Graham; Neudegg, Dave; Marshall, Richard
2012-07-01
Extreme space weather events, while rare, pose significant risk to society in the form of impacts on critical infrastructure such as power grids, and the disruption of high end technological systems such as satellites and precision navigation and timing systems. There has been an increased focus on modelling the effects of extreme space weather, as well as improving the ability of space weather forecast centres to identify, with sufficient lead time, solar activity with the potential to produce extreme events. This paper describes the development of a data-based model for predicting the occurrence of extreme space weather events from solar observation. The motivation for this work was to develop a tool to assist space weather forecasters in early identification of solar activity conditions with the potential to produce extreme space weather, and with sufficient lead time to notify relevant customer groups. Data-based modelling techniques were used to construct the model, and an extensive archive of solar observation data used to train, optimise and test the model. The optimisation of the base model aimed to eliminate false negatives (missed events) at the expense of a tolerable increase in false positives, under the assumption of an iterative improvement in forecast accuracy during progression of the solar disturbance, as subsequent data becomes available.
Robertson, D.M.; Saad, D.A.; Heisey, D.M.
2006-01-01
Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. © 2006 Springer Science+Business Media, Inc.
Regression Models for Demand Reduction based on Cluster Analysis of Load Profiles
Yamaguchi, Nobuyuki; Han, Junqiao; Ghatikar, Girish; Piette, Mary Ann; Asano, Hiroshi; Kiliccote, Sila
2009-06-28
This paper provides new regression models for demand reduction of Demand Response programs, for the purpose of ex ante evaluation of the programs and of screening customers to recruit for enrollment. The proposed regression models employ load sensitivity to outside air temperature and a representative load pattern derived from cluster analysis of customer baseline load as explanatory variables. The performance of the proposed models is examined in terms of the validity of the explanatory variables and the goodness of fit of the regressions, using actual load profile data of Pacific Gas and Electric Company's commercial and industrial customers who participated in the 2008 Critical Peak Pricing program, including Manual and Automated Demand Response.
Nationwide regression models for predicting urban runoff water quality at unmonitored sites
Tasker, Gary D.; Driver, N.E.
1988-01-01
Regression models are presented that can be used to estimate mean loads for chemical oxygen demand, suspended solids, dissolved solids, total nitrogen, total ammonia plus nitrogen, total phosphorus, dissolved phosphorus, total copper, total lead, and total zinc at unmonitored sites in urban areas. Explanatory variables include drainage area, imperviousness of drainage basin to infiltration, mean annual rainfall, a land-use indicator variable, and mean minimum January temperature. Model parameters are estimated by a generalized-least-squares regression method that accounts for cross correlation and differences in reliability of sample estimates between sites. The regression models account for 20 to 65 percent of the total variation in observed loads.
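For reference, the textbook generalized-least-squares estimator has the form below. The notation is assumed rather than taken from the report: y holds the site mean loads (possibly transformed), X the explanatory variables, and Lambda a covariance matrix whose diagonal reflects the differing reliability of the site estimates and whose off-diagonal entries reflect cross correlation between sites.

```latex
% Standard GLS estimator; the structure of \Lambda is an assumption for illustration.
\begin{equation}
  \hat{\beta}_{\mathrm{GLS}}
    = \left( X^{\top} \Lambda^{-1} X \right)^{-1} X^{\top} \Lambda^{-1} y,
  \qquad
  \operatorname{Cov}\!\left( \hat{\beta}_{\mathrm{GLS}} \right)
    = \left( X^{\top} \Lambda^{-1} X \right)^{-1}.
\end{equation}
```

When Lambda is a multiple of the identity matrix, this reduces to ordinary least squares.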
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules…
NASA Astrophysics Data System (ADS)
Sykas, Dimitris; Karathanassi, Vassilia
2015-06-01
This paper presents a new method for automatically determining the optimum regression model, which enables the estimation of a parameter. The concept relies on the combination of k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated to the desired parameter. Initially, a pre-processing algorithm takes a single spectral signature as input and transforms it according to the SPPA function. A k-step combination of SPPAs applies k pre-processing algorithms serially. The result of each SPPA is used as input to the next SPPA, and so on until the k desired pre-processed signatures are reached. These signatures are then used as input to three different regression methods: the Normalized band Difference Regression (NDR), the Multiple Linear Regression (MLR) and the Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs) are used, one for each regression method, for the selection of the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation indicates not only the optimum SPPA combination but also the regression method that produces the optimum prediction model. The proposed method was applied to soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy, while NDR's accuracy was satisfactory given its complexity. The MLR method showed severe drawbacks due to noise in the form of collinearity among the spectral bands. Most of the regression methods required a 3-step combination of SPPAs to achieve the highest performance. The selected pre-processing algorithms were different for each regression method, since each regression method handles the explanatory variables in a different way.
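The serial k-step combination described above amounts to function composition over a spectral signature. The sketch below is illustrative only: the three pre-processing functions are hypothetical stand-ins for the article's SPPAs, and it enumerates every chain exhaustively instead of searching with a genetic algorithm as the article does.

```python
import math
from itertools import product

def absorbance(spectrum):
    """Convert reflectance to apparent absorbance, log10(1/R)."""
    return [math.log10(1.0 / r) for r in spectrum]

def mean_center(spectrum):
    """Subtract the mean of the signature from every band."""
    m = sum(spectrum) / len(spectrum)
    return [v - m for v in spectrum]

def first_difference(spectrum):
    """Difference between adjacent bands (shortens the signature by one)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

# Hypothetical SPPA catalogue; the article's actual algorithms are not listed here.
SPPAS = {
    "absorbance": absorbance,
    "mean_center": mean_center,
    "first_difference": first_difference,
}

def apply_chain(spectrum, names):
    """Apply the named SPPAs serially, each output feeding the next step."""
    for name in names:
        spectrum = SPPAS[name](spectrum)
    return spectrum

def all_chains(k):
    """Every ordered k-step combination of SPPAs (repeats allowed)."""
    return list(product(SPPAS, repeat=k))

signature = [0.10, 0.20, 0.40, 0.50]  # hypothetical reflectance values
for chain in all_chains(2)[:3]:
    print(chain, apply_chain(signature, chain))
```

With a catalogue of three algorithms and k = 3 there are only 27 chains, so enumeration is feasible in this toy setting; a genetic search becomes useful as the catalogue grows.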
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
An interface model for dosage adjustment connects hematotoxicity to pharmacokinetics.
Meille, C; Iliadis, A; Barbolosi, D; Frances, N; Freyer, G
2008-12-01
When modeling is required to describe pharmacokinetics and pharmacodynamics simultaneously, it is difficult to link time-concentration profiles and drug effects. When patients are under chemotherapy, despite the huge amount of blood monitoring numerations, there is a lack of exposure variables to describe hematotoxicity linked with the circulating drug blood levels. We developed an interface model that transforms circulating pharmacokinetic concentrations to adequate exposures, destined to be inputs of the pharmacodynamic process. The model is materialized by a nonlinear differential equation involving three parameters. The relevance of the interface model for dosage adjustment is illustrated by numerous simulations. In particular, the interface model is incorporated into a complex system including pharmacokinetics and neutropenia induced by docetaxel and by cisplatin. Emphasis is placed on the sensitivity of neutropenia with respect to the variations of the drug amount. This complex system including pharmacokinetic, interface, and pharmacodynamic hematotoxicity models is an interesting tool for analysis of hematotoxicity induced by anticancer agents. The model could be a new basis for further improvements aimed at incorporating new experimental features. PMID:19107581
Dreiseitl, Stephan; Harbauer, Alexandra; Binder, Michael; Kittler, Harald
2005-10-01
Logistic regression models are widely used in medicine, but difficult to apply without the aid of electronic devices. In this paper, we present a novel approach to represent logistic regression models as nomograms that can be evaluated by simple line drawings. As a case study, we show how data obtained from a questionnaire-based patient self-assessment study on the risks of developing melanoma can be used to first identify a subset of significant covariates, build a logistic regression model, and finally transform the model to a graphical format. The advantage of the nomogram is that it can easily be mass-produced, distributed and evaluated, while providing the same information as the logistic regression model it represents. PMID:16198997
Aboveground biomass and carbon stocks modelling using non-linear regression model
NASA Astrophysics Data System (ADS)
Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd
2016-06-01
Aboveground biomass (AGB) is an important source of uncertainty in carbon estimation for tropical forests because of the variation in species biodiversity and the complex structure of the tropical rain forest. Nevertheless, the tropical rainforest is the most extensive forest in the world, with a vast diversity of trees and layered canopies. Integrating optical sensor data with empirical models is a common way to assess AGB. Regression makes it possible to link remotely sensed data to a biophysical parameter of the forest. This paper therefore examines the accuracy of a non-linear regression equation, a quadratic function, for estimating the AGB and carbon stocks of the tropical lowland Dipterocarp forest of the Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameters measured in field plots and the remotely sensed data using a nonlinear regression model. The results showed a good relationship between crown projection area (CPA) and carbon stocks (CS), with a Pearson correlation coefficient (r) of 0.671 (p < 0.01). The study concluded that integrating Worldview-3 imagery with the LiDAR-based canopy height model (CHM) raster was useful for quantifying the AGB and carbon stocks over a larger sample area of the lowland Dipterocarp forest.
Vatcheva, KP; Lee, M; McCormick, JB; Rahbar, MH
2016-01-01
Objective: To demonstrate the adverse impact of ignoring statistical interactions in regression models used in epidemiologic studies. Study design and setting: Based on different scenarios that involved known values for the coefficient of the interaction term in Cox regression models, we generated 1000 samples of size 600 each. The simulated samples and a real-life data set from the Cameron County Hispanic Cohort were used to evaluate the effect of ignoring statistical interactions in these models. Results: Compared to correctly specified Cox regression models with interaction terms, misspecified models without interaction terms resulted in up to 8.95-fold bias in estimated regression coefficients. In contrast, when data were generated from a perfectly additive Cox proportional hazards regression model, including the interaction between the two covariates resulted in only 2% estimated bias in the main-effect regression coefficient estimates and did not alter the main finding of no significant interactions. Conclusions: When the effects are synergistic, failure to account for an interaction effect could lead to bias and misinterpretation of the results, and in some instances to incorrect policy decisions. Best practices in regression analysis must include identification of interactions, including in the analysis of data from epidemiologic studies.
Estimation of Mediation Effects for Zero-inflated Regression Models
Wang, Wei; Albert, Jeffrey M.
2012-01-01
The goal of mediation analysis is to identify and explicate the mechanism that underlies a relationship between a risk factor and an outcome via an intermediate variable (mediator). In this paper, we consider the estimation of mediation effects in zero-inflated (ZI) models intended to accommodate 'extra' zeros in count data. Focusing on the ZI negative binomial (ZINB) models, we provide a mediation formula approach to estimate the (overall) mediation effect in the standard two-stage mediation framework under a key sequential ignorability assumption. We also consider a novel decomposition of the overall mediation effect for the ZI context using a three-stage mediation model. Estimation of the components of the overall mediation effect requires an assumption involving the joint distribution of two counterfactuals. Simulation study results demonstrate low bias of mediation effect estimators and close-to-nominal coverage probability (CP) of confidence intervals. We also modify the mediation formula method by replacing 'exact' integration with a Monte Carlo integration method. The method is applied to a cohort study of dental caries in very low birth weight adolescents. For overall mediation effect estimation, sensitivity analysis was conducted to quantify the degree to which the key assumption must be violated to reverse the original conclusion. PMID:22714572
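The mediation formula averages the expected outcome at exposure x over the mediator distribution at exposure x', and Monte Carlo integration replaces that integral with an average over simulated mediator values. The sketch below is a toy illustration under assumptions not found in the article: a normally distributed mediator, a log-link count mean, a constant zero-inflation probability, and hypothetical coefficient values.

```python
import math
import random

def zi_mean(x, m, b0, b1, b2, g0):
    """Mean of a zero-inflated count: (1 - P(structural zero)) * count mean."""
    p_zero = 1.0 / (1.0 + math.exp(-g0))   # constant zero-inflation probability (assumed)
    mu = math.exp(b0 + b1 * x + b2 * m)    # log-link mean of the count part
    return (1.0 - p_zero) * mu

def expected_outcome(x, x_med, params, draws=20000, seed=0):
    """Monte Carlo estimate of E[Y(x, M(x_med))]."""
    rng = random.Random(seed)
    a0, a1, sd = params["mediator"]
    total = 0.0
    for _ in range(draws):
        m = rng.gauss(a0 + a1 * x_med, sd)  # mediator drawn at exposure level x_med
        total += zi_mean(x, m, *params["outcome"])
    return total / draws

def mediation_effect(params, **kwargs):
    """E[Y(1, M(1))] - E[Y(1, M(0))] for a binary exposure."""
    return expected_outcome(1, 1, params, **kwargs) - expected_outcome(1, 0, params, **kwargs)

# Hypothetical coefficients: mediator (a0, a1, sd) and outcome (b0, b1, b2, g0).
params = {"mediator": (0.0, 0.5, 1.0), "outcome": (0.2, 0.3, 0.4, -1.0)}
print(mediation_effect(params))
```

Using the same seed for both expectations makes the two averages share random draws, which reduces the Monte Carlo noise in their difference.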
Bayesian regression model for seasonal forecast of precipitation over Korea
NASA Astrophysics Data System (ADS)
Jo, Seongil; Lim, Yaeji; Lee, Jaeyong; Kang, Hyun-Suk; Oh, Hee-Seok
2012-08-01
In this paper, we apply three different Bayesian methods to the seasonal forecasting of the precipitation in a region around Korea (32.5°N-42.5°N, 122.5°E-132.5°E). We focus on the precipitation of summer season (June-July-August; JJA) for the period of 1979-2007 using the precipitation produced by the Global Data Assimilation and Prediction System (GDAPS) as predictors. Through cross-validation, we demonstrate improvement for seasonal forecast of precipitation in terms of root mean squared error (RMSE) and linear error in probability space score (LEPS). The proposed methods yield RMSE of 1.09 and LEPS of 0.31 between the predicted and observed precipitations, while the prediction using GDAPS output only produces RMSE of 1.20 and LEPS of 0.33 for CPC Merged Analyzed Precipitation (CMAP) data. For station-measured precipitation data, the RMSE and LEPS of the proposed Bayesian methods are 0.53 and 0.29, while GDAPS output is 0.66 and 0.33, respectively. The methods seem to capture the spatial pattern of the observed precipitation. The Bayesian paradigm incorporates the model uncertainty as an integral part of modeling in a natural way. We provide a probabilistic forecast integrating model uncertainty.
Application of wavelet-based multiple linear regression model to rainfall forecasting in Australia
NASA Astrophysics Data System (ADS)
He, X.; Guan, H.; Zhang, X.; Simmons, C.
2013-12-01
In this study, a wavelet-based multiple linear regression model is applied to forecast monthly rainfall in Australia by using monthly historical rainfall data and climate indices as inputs. The wavelet-based model is constructed by incorporating the multi-resolution analysis (MRA) with the discrete wavelet transform and multiple linear regression (MLR) model. The standardized monthly rainfall anomaly and large-scale climate index time series are decomposed using MRA into a certain number of component subseries at different temporal scales. The hierarchical lag relationship between the rainfall anomaly and each potential predictor is identified by cross correlation analysis with a lag time of at least one month at different temporal scales. The components of predictor variables with known lag times are then screened with a stepwise linear regression algorithm to be selectively included into the final forecast model. The MRA-based rainfall forecasting method is examined with 255 stations over Australia, and compared to the traditional multiple linear regression model based on the original time series. The models are trained with data from the 1959-1995 period and then tested in the 1996-2008 period for each station. The performance is compared with observed rainfall values, and evaluated by common statistics of relative absolute error and correlation coefficient. The results show that the wavelet-based regression model provides considerably more accurate monthly rainfall forecasts for all of the selected stations over Australia than the traditional regression model.
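The multi-resolution step splits a series into components at different temporal scales that add back up to the original. The abstract does not name the wavelet, so the sketch below assumes the Haar wavelet, for which the level-j approximation is simply the mean over blocks of 2^j points; the rainfall anomaly values are hypothetical. The lag screening and stepwise regression stages are not shown.

```python
def haar_mra(series, levels):
    """Haar multi-resolution analysis.

    Returns [D1, D2, ..., D_levels, A_levels]: detail components from the
    finest to the coarsest scale, followed by the final approximation.
    The components sum elementwise to the original series.
    """
    n = len(series)
    if n % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2**levels")
    components = []
    approx = list(series)
    for j in range(1, levels + 1):
        block = 2 ** j
        smoother = []
        for start in range(0, n, block):
            block_mean = sum(series[start:start + block]) / block
            smoother.extend([block_mean] * block)
        # Detail at scale j is what is lost when moving to the coarser approximation.
        components.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    components.append(approx)
    return components

# Hypothetical standardized monthly rainfall anomalies.
anomaly = [1, 3, 2, 6, 5, 7, 4, 4]
d1, d2, a2 = haar_mra(anomaly, levels=2)
print(d1, d2, a2, sep="\n")
```

Each component could then be lagged and offered as a candidate predictor to a regression, which is the role the subseries play in the model described above.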
Modeling absolute differences in life expectancy with a censored skew-normal regression approach
Clough-Gorr, Kerri; Zwahlen, Marcel
2015-01-01
Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negatively skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544
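A generic log-likelihood for right-censored, left-truncated observations under a skew-normal model can be sketched as follows. The notation is assumed here and may differ from the article's parameterization: t_i is the observed age at death or censoring, delta_i the death indicator, a_i the age at study entry, xi_i = x_i^T beta the location, omega the scale, and alpha the shape.

```latex
% Hedged sketch; standard skew-normal density with assumed notation.
\begin{align}
  f(t \mid \xi, \omega, \alpha)
    &= \frac{2}{\omega}\,
       \phi\!\left( \frac{t - \xi}{\omega} \right)
       \Phi\!\left( \alpha\, \frac{t - \xi}{\omega} \right),
  \qquad S(t) = 1 - \int_{-\infty}^{t} f(u)\, du, \\
  \ell(\beta, \omega, \alpha)
    &= \sum_{i} \Big[ \delta_i \log f(t_i \mid \xi_i, \omega, \alpha)
       + (1 - \delta_i) \log S(t_i \mid \xi_i, \omega, \alpha)
       - \log S(a_i \mid \xi_i, \omega, \alpha) \Big].
\end{align}
```

The last term conditions each contribution on survival to the entry age, which is how left truncation is usually handled.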
NASA Astrophysics Data System (ADS)
Bae, Gihyun; Huh, Hoon; Park, Sungho
This paper deals with a regression model for the lightweight and crashworthiness-enhancing design of automotive parts in a frontal car crash. The ULSAB-AVC model is employed for the crash analysis, and effective parts are selected based on the amount of energy absorbed during the crash. Finite element analyses are carried out for designated design cases in order to investigate the crashworthiness and weight according to the material and thickness of the main energy absorption parts. Based on the simulation results, a regression analysis is performed to construct a regression model for the lightweight and crashworthiness-enhancing design of automotive parts. An example of weight reduction of the main energy absorption parts demonstrates the validity of the constructed regression model.
MULTIPLE REGRESSION MODELS FOR HINDCASTING AND FORECASTING MIDSUMMER HYPOXIA IN THE GULF OF MEXICO
A new suite of multiple regression models was developed that describes the relationship between the area of bottom water hypoxia along the northern Gulf of Mexico and Mississippi-Atchafalaya River nitrate concentration, total phosphorus (TP) concentration, and discharge. Variabil...
Evaluation of Land use Regression Models for NO2 in El Paso, Texas, USA
Developing suitable exposure estimates for air pollution health studies is problematic due to spatial and temporal variation in concentrations and often limited monitoring data. Though land use regression models (LURs) are often used for this purpose, their applicability to later...
Technology Transfer Automated Retrieval System (TEKTRAN)
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in heat production, or energy expenditure (EE). Multivariate adaptive regression splines (MARS) is a nonparametric method that estimates complex nonlinear relationships by a seri...
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However, whether ADD is an important variable in predicting stormwater discharge characteristics has been controversial among many studies. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments located in Gwangju, Korea. Data from the monitoring were used to calibrate the United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was run for 55 storm events, and the resulting total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R² and p-values of the regression on ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMCs in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. PMID:25079842
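The coefficient of determination used to judge such regressions can be computed directly from the fitted line. The sketch below fits a one-predictor least-squares line and reports R²; the ADD and TSS load values are hypothetical and are not the study's data, and p-values are not computed.

```python
def simple_linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical antecedent dry days and TSS loads (kg) for six storm events.
add_days = [1, 2, 4, 6, 9, 12]
tss_load = [3.1, 4.0, 4.2, 6.5, 6.1, 8.9]

slope, intercept, r2 = simple_linear_fit(add_days, tss_load)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

A low R² for a single predictor such as ADD indicates that it explains little of the variation in the response on its own.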
Approaches to retrospective sampling for longitudinal transition regression models
Hunsberger, Sally; Albert, Paul S.; Thoma, Marie
2016-01-01
For binary diseases that relapse and remit, it is often of interest to estimate the effect of covariates on the transition process between disease states over time. The transition process can be characterized by modeling the probability of the binary event given the individual’s history. Designing studies that examine the impact of time varying covariates over time can lead to collection of extensive amounts of data. Sometimes it may be possible to collect and store tissue, blood or images and retrospectively analyze this covariate information. In this paper we consider efficient sampling designs that do not require biomarker measurements on all subjects. We describe appropriate estimation methods for transition probabilities and functions of these probabilities, and evaluate efficiency of the estimates from the proposed sampling designs. These new methods are illustrated with data from a longitudinal study of bacterial vaginosis, a common relapsing-remitting vaginal infection of women of child bearing age.
A Regression Algorithm for Model Reduction of Large-Scale Multi-Dimensional Problems
NASA Astrophysics Data System (ADS)
Rasekh, Ehsan
2011-11-01
Model reduction is an approach for fast and cost-efficient modelling of large-scale systems governed by Ordinary Differential Equations (ODEs). Multi-dimensional model reduction has been suggested for reduction of the linear systems simultaneously with respect to frequency and any other parameter of interest. Multi-dimensional model reduction is also used to reduce the weakly nonlinear systems based on Volterra theory. Multiple dimensions degrade the efficiency of reduction by increasing the size of the projection matrix. In this paper a new methodology is proposed to efficiently build the reduced model based on regression analysis. A numerical example confirms the validity of the proposed regression algorithm for model reduction.
SCI model structure determination program (OSR) user's guide. [optimal subset regression
NASA Technical Reports Server (NTRS)
1979-01-01
The computer program OSR (Optimal Subset Regression), which estimates models for rotorcraft body and rotor force and moment coefficients, is described. The technique used is based on the subset regression algorithm. Given time histories of aerodynamic coefficients, aerodynamic variables, and control inputs, the program computes correlations between the various time histories. The model structure determination is based on these correlations. Inputs and outputs of the program are given.
Cooley, R.L.
1982-01-01
Prior information on the parameters of a groundwater flow model can be used to improve parameter estimates obtained from nonlinear regression solution of a modeling problem. Two scales of prior information can be available: 1) prior information having known reliability (that is, bias and random error structure), and 2) prior information consisting of best available estimates of unknown reliability. It is shown that if both scales of prior information are available, then a combined regression analysis may be made. -from Author
Diepgen, T L; Blettner, M
1996-05-01
In order to determine the relative importance of genetics and the environment on the occurrence of atopic diseases, we investigated the familial aggregation of atopic eczema, allergic rhinitis, and allergic asthma in the relatives of 426 patients with atopic eczema and 628 subjects with no history of eczema (5,136 family members in total). Analyses were performed by regression models for odds ratios (OR) allowing us to estimate OR for the familial aggregation and simultaneously to adjust for other covariates. Three models were analyzed assuming that the OR i) is the same among any two members of a family, ii) depends on different familial constellations, i.e., whether the pairs are siblings, parents, or parent/sibling pairs, and iii) is not the same between the father and the children and between the mother and the children. The OR of familial aggregation for atopic eczema was 2.16 (95% confidence interval (95%-CI) 1.58-2.96) if no distinction was made between the degree of relationship. Further analyses within the members of the family showed a high OR among siblings (OR = 3.86; 95%-CI 2.10-7.09), while the OR between parents and siblings was only 1.90 (95%-CI 1.31-2.97). Only for atopic eczema was the familial aggregation between mothers and siblings stronger than that between fathers and siblings (ms: OR = 2.66; fs: OR = 1.29). This can be explained by stronger maternal heritability, shared physical environment of mother and child, or environmental events that affect the fetus in utero. Since for all atopic diseases a stronger correlation was found between siblings than between siblings and parents, our study indicates that environmental factors, especially during childhood, seem to explain the recently observed increased frequencies of atopic diseases. PMID:8618061
Regression models for estimating urban storm-runoff quality and quantity in the United States
Driver, N.E.; Troutman, B.M.
1989-01-01
Urban planners and managers need information about the local quantity of precipitation and the quality and quantity of storm runoff if they are to plan adequately for the effects of storm runoff from urban areas. As a result of this need, linear regression models were developed for the estimation of storm-runoff loads and volumes from physical, land-use, and climatic characteristics of urban watersheds throughout the United States. Three statistically different regions were delineated, based on mean annual rainfall, to improve linear regression models. One use of these models is to estimate storm-runoff loads and volumes at gaged and ungaged urban watersheds. The most significant explanatory variables in all linear regression models were total storm rainfall and total contributing drainage area. Impervious area, land-use, and mean annual climatic characteristics were also significant explanatory variables in some linear regression models. Models for dissolved solids, total nitrogen, and total ammonia plus organic nitrogen as nitrogen were the most accurate models for most areas, whereas models for suspended solids were the least accurate. The most accurate models were those for more arid western United States, and the least accurate models were those for areas that had large quantities of mean annual rainfall.
Demenais, F M; Laing, A E; Bonney, G E
1992-01-01
Segregation analysis of discrete traits can be conducted by the classical mixed model and the recently introduced regressive models. The mixed model assumes an underlying liability to the disease, to which a major gene, a multifactorial component, and random environment contribute independently. Affected persons have a liability exceeding a threshold. The regressive logistic models assume that the logarithm of the odds of being affected is a linear function of major genotype effects, the phenotypes of older relatives, and other covariates. A formulation of the regressive models, based on an underlying liability model, has been recently proposed. The regression coefficients on antecedents are expressed in terms of the relevant familial correlations and a one-to-one correspondence with the parameters of the mixed model can thus be established. Computer simulations are conducted to evaluate the fit of the two formulations of the regressive models to the mixed model on nuclear families. The two forms of the class D regressive model provide a good fit to a generated mixed model, in terms of both hypothesis testing and parameter estimation. The simpler class A regressive model, which assumes that the outcomes of children depend solely on the outcomes of parents, is not robust against a sib-sib correlation exceeding that specified by the model, emphasizing testing class A against class D. The studies reported here show that if the true state of nature is that described by the mixed model, then a regressive model will do just as well. Moreover, the regressive models, allowing for more patterns of family dependence, provide a flexible framework to understand gene-environment interactions in complex diseases. PMID:1487139
Sternberg, Maya R; Schleicher, Rosemary L; Pfeiffer, Christine M
2013-06-01
The collection of articles in this supplement issue provides insight into the association of various covariates with concentrations of biochemical indicators of diet and nutrition (biomarkers), beyond age, race, and sex, using linear regression. We studied 10 specific sociodemographic and lifestyle covariates in combination with 29 biomarkers from NHANES 2003-2006 for persons aged ≥ 20 y. The covariates were organized into 2 sets or "chunks": sociodemographic (age, sex, race-ethnicity, education, and income) and lifestyle (dietary supplement use, smoking, alcohol consumption, BMI, and physical activity) and fit in hierarchical fashion by using each category or set of related variables to determine how covariates, jointly, are related to biomarker concentrations. In contrast to many regression modeling applications, all variables were retained in a full regression model regardless of significance to preserve the interpretation of the statistical properties of β coefficients, P values, and CIs and to keep the interpretation consistent across a set of biomarkers. The variables were preselected before data analysis, and the data analysis plan was designed at the outset to minimize the reporting of false-positive findings by limiting the amount of preliminary hypothesis testing. Although we generally found that demographic differences seen in biomarkers were over- or underestimated when ignoring other key covariates, the demographic differences generally remained significant after adjusting for sociodemographic and lifestyle variables. These articles are intended to provide a foundation to researchers to help them generate hypotheses for future studies or data analyses and/or develop predictive regression models using the wealth of NHANES data. PMID:23596165
Disaster Hits Home: A Model of Displaced Family Adjustment after Hurricane Katrina
ERIC Educational Resources Information Center
Peek, Lori; Morrissey, Bridget; Marlatt, Holly
2011-01-01
The authors explored individual and family adjustment processes among parents (n = 30) and children (n = 55) who were displaced to Colorado after Hurricane Katrina. Drawing on in-depth interviews with 23 families, this article offers an inductive model of displaced family adjustment. Four stages of family adjustment are presented in the model: (a)…
Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Volden, Thomas R.
2012-01-01
An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
ERIC Educational Resources Information Center
Tay, Louis; Drasgow, Fritz
2012-01-01
Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X[superscript 2]/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…
Aguilera, Inmaculada; Foraster, Maria; Basagaña, Xavier; Corradi, Elisabetta; Deltell, Alexandre; Morelli, Xavier; Phuleria, Harish C; Ragettli, Martina S; Rivera, Marcela; Thomasson, Alexandre; Slama, Rémy; Künzli, Nino
2015-01-01
Noise prediction models and noise maps are used to estimate the exposure to road traffic noise, but their availability and the quality of the noise estimates are sometimes limited. This paper explores the application of land use regression (LUR) modelling to assess the long-term intraurban spatial variability of road traffic noise in three European cities. Short-term measurements of road traffic noise taken in Basel, Switzerland (n=60), Girona, Spain (n=40), and Grenoble, France (n=41), were used to develop two LUR models: (a) a "GIS-only" model, which considered only predictor variables derived with Geographic Information Systems; and (b) a "Best" model, which in addition considered the variables collected while visiting the measurement sites. Both noise measurements and noise estimates from LUR models were compared with noise estimates from standard noise models developed for each city by the local authorities. Model performance (adjusted R²) was 0.66-0.87 for "GIS-only" models, and 0.70-0.89 for "Best" models. Short-term noise measurements showed a high correlation (r=0.62-0.78) with noise estimates from the standard noise models. LUR noise estimates did not show any systematic differences in the spatial patterns when compared with those from standard noise models. LUR modelling with accurate GIS source data can be a promising tool for noise exposure assessment with applications in epidemiological studies. PMID:25227731
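The abstract above reports model performance as adjusted R². As a minimal sketch of how that quantity relates to the ordinary R², the function below applies the standard adjustment for the number of predictors. The function name and the example inputs are illustrative assumptions; the abstract does not state how many predictors each LUR model used.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 for a linear model with an intercept.

    r2            ordinary coefficient of determination
    n_obs         number of observations (e.g. measurement sites)
    n_predictors  number of predictors, not counting the intercept
    """
    df_resid = n_obs - n_predictors - 1
    if df_resid <= 0:
        raise ValueError("need more observations than fitted parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / df_resid


if __name__ == "__main__":
    # Hypothetical inputs: 60 sites and 5 predictors (the predictor count is assumed).
    print(adjusted_r2(0.80, n_obs=60, n_predictors=5))
```

The adjustment penalizes each added predictor, which is why it is commonly preferred over the raw R² when comparing models of different size on small samples.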
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis
NASA Astrophysics Data System (ADS)
Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia
2015-03-01
The citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method and the magneto-optical retardance property is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in ferrofluids are executed by combining the Taguchi method and Excel. From the nine tests for four parameters, including pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature, the order of parameter influence and the optimal parameter combination are found. Multiple regression analysis and an F-test on the significance of the regression equation are performed. It is found that the model F value is much larger than the critical F value, with significance level P < 0.0001. It can therefore be concluded that the regression model has statistically significant predictive ability. Substituting the optimal parameter combination into the equation, retardance is obtained as 32.703°, higher than the highest value in the tests by 11.4%.
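The F-test on the significance of a regression equation mentioned above can be sketched as follows. This is the textbook overall F statistic for a linear model with an intercept, not the authors' Excel workflow; the function name and the example values are assumptions, and the abstract does not state how many terms the fitted equation contains.

```python
def overall_f_statistic(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """F statistic for H0: all slope coefficients are zero.

    Uses F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) with p predictors
    and n observations, for a model that includes an intercept.
    """
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    if df_model <= 0 or df_resid <= 0:
        raise ValueError("degrees of freedom must be positive")
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


if __name__ == "__main__":
    # Hypothetical: 9 runs and 4 predictors, so the test has (4, 4) degrees of freedom.
    print(overall_f_statistic(0.9, n_obs=9, n_predictors=4))
```

The statistic is then compared with the critical value of the F distribution at the chosen significance level; obtaining that critical value requires a statistics table or library, which is left out of this sketch.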
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert M.
2013-01-01
A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new search algorithm.
New class of Johnson SB distributions and its associated regression model for rates and proportions.
Lemonte, Artur J; Bazán, Jorge L
2016-07-01
By starting from the Johnson SB distribution pioneered by Johnson, we propose a broad class of distributions with bounded support on the basis of the symmetric family of distributions. The new class of distributions provides a rich source of alternative distributions for analyzing univariate bounded data. A comprehensive account of the mathematical properties of the new family is provided. We briefly discuss estimation of the model parameters of the new class of distributions based on two estimation methods. Additionally, a new regression model is introduced by considering the distribution proposed in this article, which is useful for situations where the response is restricted to the standard unit interval and the regression structure involves regressors and unknown parameters. The regression model allows modeling of both location and dispersion effects. We define two residuals for the proposed regression model to assess departures from model assumptions as well as to detect outlying observations, and discuss some influence methods such as the local influence and generalized leverage. Finally, an application to real data is presented to show the usefulness of the new regression model. PMID:26659998
Regression models based on new local strategies for near infrared spectroscopic data.
Allegrini, F; Fernández Pierna, J A; Fragoso, W D; Olivieri, A C; Baeten, V; Dardenne, P
2016-08-24
In this work, a comparative study of two novel algorithms to perform sample selection in local regression based on Partial Least Squares Regression (PLS) is presented. These methodologies were applied for Near Infrared Spectroscopy (NIRS) quantification of five major constituents in corn seeds and are compared and contrasted with global PLS calibrations. Validation results show a significant improvement in the prediction quality when local models implemented by the proposed algorithms are applied to large data bases. PMID:27496996
NASA Astrophysics Data System (ADS)
Alih, Ekele; Ong, Hong Choon
2014-07-01
The application of Ordinary Least Squares (OLS) to a single equation assumes, among other things, that the predictor variables are truly exogenous; that is, that there is only one-way causation between the dependent variable yi and the predictor variables xij. If this is not true and the xij's are at the same time determined by yi, the OLS assumption will be violated and a single equation method will give biased and inconsistent parameter estimates. OLS also suffers a huge setback in the presence of contaminated data. In order to rectify these problems, simultaneous equation models have been introduced as well as robust regression. In this paper, we construct a simultaneous equation model with variables that exhibit simultaneous dependence and we propose a robust multivariate regression procedure for estimating the parameters of such models. The performance of the robust multivariate regression procedure was examined and compared with the OLS multivariate regression technique and the Three-Stage Least Squares (3SLS) procedure using a numerical simulation experiment. The robust multivariate regression and 3SLS performed about equally well, and both better than OLS, when there was no contamination in the data. Nevertheless, when contamination occurred in the data, the robust multivariate regression outperformed both 3SLS and OLS.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables; diagnostics are used to check that assumptions are valid, which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers; and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographic profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
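The logit relationship described above, in which the log odds are a linear combination of the predictors, can be illustrated with a short sketch. The coefficient values and predictor coding below are made up for illustration and are not estimates from the study.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable inverse of the logit function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predicted_probability(intercept: float, coefs, x) -> float:
    """P(outcome = 1 | x) when log odds = intercept + sum_j coefs[j] * x[j]."""
    log_odds = intercept + sum(b * v for b, v in zip(coefs, x))
    return sigmoid(log_odds)


def odds_ratio(coef: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coef)


if __name__ == "__main__":
    # Hypothetical coefficients: intercept, family history (0/1), frequent meals (0/1).
    print(predicted_probability(-1.0, [0.8, 0.3], [1, 0]))
    print(odds_ratio(0.8))
```

Exponentiating a coefficient gives the odds ratio that statements such as "the odds are higher for students with a family history of obesity" refer to.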
Prediction of lip response to orthodontic treatment using a multivariable regression model
Shirvani, Amin; Sadeghian, Saeid; Abbasi, Safieh
2016-01-01
Background: This was a retrospective cephalometric study to develop a more precise estimation of soft tissue changes related to underlying tooth movement than a simple relationship between hard and soft tissues. Materials and Methods: The lateral cephalograms of 61 adult patients undergoing orthodontic treatment (31 = premolar extraction, 31 = nonextraction) were obtained, scanned and digitized before and immediately after the end of treatment. Hard and soft tissues, angular and linear measures were calculated by Viewbox 4.0 software. The changes of the values were analyzed using paired t-test. The accuracy of predictions of soft tissue changes was compared between two methods: (1) use of ratios of the means of soft tissue to hard tissue changes (Viewbox 4.0 software), (2) use of stepwise multivariable regression analysis to create prediction equations for soft tissue changes at superior labial sulcus, labrale superius, stomion superius, inferior labial sulcus, labrale inferius, stomion inferius (all on a horizontal plane). Results: Stepwise multiple regressions to predict lip movements showed strong relations for the upper lip (adjusted R² = 0.92) and the lower lip (adjusted R² = 0.91) in the extraction group. Regression analysis showed slightly weaker relations in the nonextraction group. Conclusion: Within the limitations of this study, the multiple regression technique was slightly more accurate than the ratio of mean prediction (Viewbox 4.0 software) and appears to be useful in the prediction of soft tissue changes. As the variability of the predicted individual outcome seems to be relatively high, caution should be taken in predicting hard and soft tissue positional changes. PMID:26962314
Preisser, J. S.; Phillips, C.; Perin, J.; Schwartz, T. A.
2011-01-01
Objectives: The article reviews proportional and partial proportional odds regression for ordered categorical outcomes, such as patient-reported measures, that are frequently used in clinical research in dentistry. Methods: The proportional odds regression model for ordinal data is a generalization of ordinary logistic regression for dichotomous responses. When the proportional odds assumption holds for some but not all of the covariates, the lesser known partial proportional odds model is shown to provide a useful extension. Results: The ordinal data models are illustrated for the analysis of repeated ordinal outcomes to determine whether the burden associated with sensory alteration following a bilateral sagittal split osteotomy procedure differed for those patients who were given opening exercises only following surgery and those who received sensory retraining exercises in conjunction with standard opening exercises. Conclusions: Proportional and partial proportional odds models are broadly applicable to the analysis of cross-sectional and longitudinal ordinal data in dental research. PMID:21070317
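For readers unfamiliar with the two models named above, one common way to write them is sketched below. The notation is an assumption chosen for illustration (the abstract gives no formulas), and sign conventions differ between textbooks and software. Here Y is an ordinal outcome with categories 1, ..., J, x holds the covariates that satisfy the proportional odds assumption, and z holds those that do not.

```latex
% Proportional odds model: one slope vector shared by all cumulative logits
\operatorname{logit} P(Y \le j \mid x) = \alpha_j - x^{\top}\beta,
\qquad j = 1, \dots, J-1, \qquad \alpha_1 \le \alpha_2 \le \dots \le \alpha_{J-1}

% Partial proportional odds model: covariates in z get cutpoint-specific slopes
\operatorname{logit} P(Y \le j \mid x, z) = \alpha_j - x^{\top}\beta - z^{\top}\gamma_j,
\qquad j = 1, \dots, J-1
```

With J = 2 the first model reduces to ordinary logistic regression, which is the sense in which it generalizes the dichotomous case.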
NASA Astrophysics Data System (ADS)
Pradhan, B.; Buchroithner, M. F.; Mansor, S.
2009-04-01
This paper presents the assessment results of three spatially based probabilistic models using Geoinformation Techniques (GIT) for landslide susceptibility analysis at Penang Island in Malaysia. Landslide locations within the study area were identified by interpreting aerial photographs and satellite images, supported by field surveys. Maps of the topography, soil type, lineaments and land cover were constructed from the spatial data sets. Nine landslide-related factors were extracted from the spatial database, and the neural network, frequency ratio and logistic regression coefficients of each factor were computed. Landslide susceptibility maps were drawn for the study area using neural network, frequency ratio and logistic regression models. For verification, the results of the analyses were compared with actual landslide locations in the study area. The verification results show that the frequency ratio model provides higher prediction accuracy than the ANN and regression models.
Regression model estimation of early season crop proportions: North Dakota, some preliminary results
NASA Technical Reports Server (NTRS)
Lin, K. K. (Principal Investigator)
1982-01-01
To estimate crop proportions early in the season, an approach is proposed based on: use of a regression-based prediction equation to obtain an a priori estimate for specific major crop groups; modification of this estimate using current-year LANDSAT and weather data; and a breakdown of the major crop groups into specific crops by regression models. Results from the development and evaluation of appropriate regression models for the first portion of the proposed approach are presented. The results show that the model predicts 1980 crop proportions very well at both county and crop reporting district levels. In terms of planted acreage, the model underpredicted the 1980 published planted acreage by 9.1 percent at the county level. It predicted almost exactly the 1980 published data on planted acreage at the crop reporting district level and overpredicted the planted acreage by just 0.92 percent.
Alley, William M.
1986-01-01
Problems involving the combined use of contaminant transport models and nonlinear optimization schemes can be very expensive to solve. This paper explores the use of transport models with ordinary regression and regression on ranks to develop approximate response functions of concentrations at critical locations as a function of pumping and recharge at decision wells. These response functions combined with other constraints can often be solved very easily and may suggest reasonable starting points for combined simulation-management modeling or even relatively efficient operating schemes in themselves.
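Regression on ranks, mentioned above as one way to build approximate response functions, replaces each variable by its rank before fitting an ordinary regression. The following is a minimal single-predictor sketch in plain Python; the function names and the data are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rank_regression(x, y):
    """Fit rank(y) = a + b * rank(x) and return (a, b)."""
    return simple_ols(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    pumping = [1.0, 10.0, 100.0, 1000.0]      # hypothetical pumping rates
    concentration = [2.0, 3.0, 50.0, 51.0]    # hypothetical simulated concentrations
    print(rank_regression(pumping, concentration))
```

Because only the ordering of the data enters the fit, a monotone but strongly nonlinear response can still be captured by a straight line in rank space.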
Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis
NASA Technical Reports Server (NTRS)
Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)
1979-01-01
The advantages and disadvantages of the causal (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling for grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first generation wheat-yield model is given. Topics discussed include truncation, trend variable, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.
Bazzazian, S; Besharat, M A
2012-01-01
The aim of this study was to develop and test a model of adjustment to type I diabetes. Three hundred young adults (172 females and 128 males) with type I diabetes were asked to complete the Adult Attachment Inventory (AAI), the Brief Illness Perception Questionnaire (Brief IPQ), the Task-oriented subscale of the Coping Inventory for Stressful Situations (CISS), the D-39, and the well-being subscale of the Mental Health Inventory (MHI). HbA1c was obtained from laboratory examination. Results from structural equation analysis partly supported the hypothesized model. Secure and avoidant attachment styles were found to have effects on illness perception, whereas ambivalent attachment style did not have a significant effect on illness perception. All three attachment styles had a significant effect on task-oriented coping strategy. Avoidant attachment also had a negative direct effect on adjustment. Regression effects of illness perception and task-oriented coping strategy on adjustment were positive. Therefore, positive illness perception and greater use of task-oriented coping strategy predict better adjustment to diabetes. The results thus confirmed the theoretical bases and empirical evidence for the role of attachment styles in adjustment to chronic disease and can be helpful in devising preventive policies, identifying high-risk maladjusted patients, and planning special psychological treatment. PMID:21678193
Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.
2015-01-01
In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.
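The orthogonality constraint at the heart of restricted spatial regression can be shown with a small sketch: a vector of spatial random effects is projected onto the orthogonal complement of the space spanned by the fixed-effect covariates. This only illustrates the constraint and is not the authors' fitting procedure; the function names and data are hypothetical, and plain Python with modified Gram-Schmidt is used to keep the example dependency-free.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(columns, tol=1e-12):
    """Modified Gram-Schmidt on the covariate columns; near-dependent columns are skipped."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis


def restrict_to_orthogonal_complement(effect, covariate_columns):
    """Remove from `effect` every component lying in the span of the covariates."""
    residual = list(effect)
    for q in orthonormal_basis(covariate_columns):
        c = dot(q, residual)
        residual = [r - c * qi for r, qi in zip(residual, q)]
    return residual


if __name__ == "__main__":
    intercept = [1.0, 1.0, 1.0, 1.0]
    elevation = [0.0, 1.0, 2.0, 3.0]        # hypothetical spatially smooth covariate
    spatial_effect = [1.0, 0.0, 2.0, 5.0]   # hypothetical random effect at 4 sites
    restricted = restrict_to_orthogonal_complement(spatial_effect, [intercept, elevation])
    print(restricted)
    print(dot(restricted, intercept), dot(restricted, elevation))  # both close to 0
```

After the projection the random effect can no longer compete with the covariates for the same variation, which is what removes the collinearity described in the abstract.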
Linard, Joshua I.
2013-01-01
Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
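For readers unfamiliar with the adjusted r-squared quoted above, the following minimal sketch shows the standard textbook formula. The sample size and number of predictors in the example call are hypothetical, since the abstract does not report them.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical values: R^2 = 0.90 from 11 observations and 5 predictors.
example = adjusted_r_squared(0.90, n_obs=11, n_predictors=5)
```

The penalty grows with the number of predictors, which is why the statistic is useful when comparing models built from different combinations of variables.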
Time series modeling by a regression approach based on a latent process.
Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice
2009-01-01
Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations. PMID:19616918
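The idea of a logistic process that switches smoothly or abruptly between polynomial regimes can be sketched as follows. This is an illustration only, assuming logistic (softmax) weights that are linear in time; the parameter values are invented, and the EM/IRLS estimation described in the abstract is not shown.

```python
import math


def logistic_weights(t, params):
    """Softmax weights over regimes; params holds one (w0, w1) pair per regime."""
    scores = [w0 + w1 * t for w0, w1 in params]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def polynomial(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2 + ..."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


def expected_response(t, params, polys):
    """Weighted mixture of the regime polynomials at time t."""
    weights = logistic_weights(t, params)
    return sum(w * polynomial(c, t) for w, c in zip(weights, polys))


# Hypothetical two-regime model: constant 1.0 before t = 0.5, constant 3.0 after.
# A large slope (100) makes the switch abrupt; a small slope would make it smooth.
params = [(0.0, 0.0), (-50.0, 100.0)]
polys = [[1.0], [3.0]]
```

With these values the mixture sits near the first polynomial at `t = 0` and near the second at `t = 1`; shrinking the slope in `params` spreads the transition over a longer interval.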
Least squares estimation of Generalized Space Time AutoRegressive (GSTAR) model and its properties
NASA Astrophysics Data System (ADS)
Ruchjana, Budi Nurani; Borovkova, Svetlana A.; Lopuhaa, H. P.
2012-05-01
In this paper we study least squares estimation of the parameters of the Generalized Space Time AutoRegressive (GSTAR) model and its properties, especially consistency and asymptotic normality. We use R software to estimate the GSTAR parameters and apply the model to real data, such as oil production data from a volcanic layer.
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Decision Models for Admission into Teacher Preparation: An Application of Regression Analysis.
ERIC Educational Resources Information Center
Garcia, Ricardo A.; Denton, Jon J.
This investigation explores the feasibility of constructing regression models for predicting teaching success and classroom behavior utilizing various measures of attitudes, personality, and psychological factors as predictors. The purpose of the study is to devise decision models that predict a candidate's success in student teaching as…
Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice
ERIC Educational Resources Information Center
Rapp, Kelly E.
2012-01-01
The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational…
Some Classroom Experiences in the Teaching of Empirical Model Building and Regression Analysis.
ERIC Educational Resources Information Center
Utter, Merlin; Wilkinson, John W.
The use of the digital computer for the presentation of the topics of empirical model building and regression analysis is discussed. The author concentrates upon a description of computing exercises which are employed to provide the students with experience in model building and evaluation in a controlled situation. The types of exercises given…
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
ERIC Educational Resources Information Center
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
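A Monte Carlo approach to sample size can be sketched in a few lines. The article itself works in R; the Python below is an independent illustration under simplifying assumptions: one standard-normal predictor, standard-normal errors, and a normal-approximation critical value of 1.96 instead of the exact t quantile. All function names and default values are hypothetical.

```python
import math
import random


def slope_is_significant(xs, ys, z_crit=1.96):
    """Fit y = a + b*x by least squares and test b with an approximate z test."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    return abs(b / se) > z_crit


def estimated_power(n, slope, n_sims=500, seed=0):
    """Share of simulated data sets in which the slope is declared significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [slope * x + rng.gauss(0.0, 1.0) for x in xs]
        hits += slope_is_significant(xs, ys)
    return hits / n_sims


def smallest_n(slope, target_power=0.8, candidates=range(10, 201, 10)):
    """First candidate sample size whose simulated power reaches the target."""
    for n in candidates:
        if estimated_power(n, slope) >= target_power:
            return n
    return None
```

Calling `smallest_n(0.5)` scans the candidate sizes for a true slope of 0.5; replacing the data-generating lines with a model closer to the planned study is the point of the simulation approach.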
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
The role of a murine transplantation model of atherosclerosis regression in drug discovery
Feig, Jonathan E; Quick, John S
2015-01-01
Atherosclerosis is the leading cause of death worldwide. To date, the use of statins to lower LDL levels has been the major intervention used to delay or halt disease progression. These drugs have an incomplete impact on plaque burden and risk, however, as evidenced by the substantial rates of myocardial infarctions that occur in large-scale clinical trials of statins. Thus, it is hoped that by understanding the factors that lead to plaque regression, better approaches to treating atherosclerosis may be developed. A transplantation-based mouse model of atherosclerosis regression has been developed by allowing plaques to form in a model of human atherosclerosis, the apoE-deficient mouse, and then placing these plaques into recipient mice with a normolipidemic plasma environment. Under these conditions, the depletion of foam cells occurs. Interestingly, the disappearance of foam cells was primarily due to migration in a CCR7-dependent manner to regional and systemic lymph nodes after 3 days in the normolipidemic (regression) environment. Further studies using this transplant model demonstrated that liver X receptor and HDL are other factors likely to be involved in plaque regression. In conclusion, through the use of this transplant model, the process of uncovering the pathways regulating atherosclerosis regression has begun, which will ultimately lead to the identification of new therapeutic targets. PMID:19333880
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
Logistic regression was originally intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and odds ratio, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression. When they are grouped, the logistic regression is said to be binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihood ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effect, individual effect, selection of variables to build a model, measure of the fitness of the model, prediction of new values, and so on. The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is, the polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
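The interpretation of coefficients via odds and odds ratios can be checked numerically. The sketch below assumes a binary logistic model with a single covariate; the intercept and coefficient values are hypothetical and chosen only for illustration.

```python
import math


def probability(intercept, coef, x):
    """P(event | x) under a one-covariate logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * x)))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def odds_ratio_per_unit(intercept, coef, x):
    """Ratio of the odds at x + 1 to the odds at x."""
    return odds(probability(intercept, coef, x + 1.0)) / odds(probability(intercept, coef, x))


# Hypothetical coefficients: the ratio equals exp(coef) whatever x is.
ratio = odds_ratio_per_unit(intercept=-1.0, coef=0.7, x=2.0)
```

The ratio does not depend on `x` or on the intercept, which is why `exp(coef)` is reported as the odds ratio for a one-unit increase in the covariate.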
Nagai, Mika; Konno, Yoshihiro; Satsukawa, Masahiro; Yamashita, Shinji; Yoshinari, Kouichi
2016-08-01
Drug-drug interactions (DDIs) via cytochrome P450 (P450) induction are one clinical problem leading to increased risk of adverse effects and the need for dosage adjustments and additional therapeutic monitoring. In silico models for predicting P450 induction are useful for avoiding DDI risk. In this study, we have established regression models for CYP3A4 and CYP2B6 induction in human hepatocytes using several physicochemical parameters for a set of azole compounds with different P450 induction characteristics as model compounds. To obtain a well-correlated regression model, the compounds for CYP3A4 or CYP2B6 induction were independently selected from the tested azole compounds using principal component analysis with fold-induction data. The multiple linear regression models obtained for CYP3A4 and CYP2B6 induction are represented by different sets of physicochemical parameters. The adjusted coefficients of determination for these models were 0.8 and 0.9, respectively. The fold-induction of the validation compounds, another set of 12 azole-containing compounds, was predicted within twofold limits for both CYP3A4 and CYP2B6. The concordance for the prediction of CYP3A4 induction was 87% with another validation set, 23 marketed drugs. However, the prediction of CYP2B6 induction tended to be overestimated for these marketed drugs. The regression models show that lipophilicity mostly contributes to CYP3A4 induction, whereas not only the lipophilicity but also the molecular polarity is important for CYP2B6 induction. Our regression models, especially that for CYP3A4 induction, might provide useful methods to avoid potent CYP3A4 or CYP2B6 inducers during the lead optimization stage without performing induction assays in human hepatocytes. PMID:27208383
Effects of model sensitivity and nonlinearity on nonlinear regression of ground water flow
Yager, R.M.
2004-01-01
Nonlinear regression is increasingly applied to the calibration of hydrologic models through the use of perturbation methods to compute the Jacobian or sensitivity matrix required by the Gauss-Newton optimization method. Sensitivities obtained by perturbation methods can be less accurate than those obtained by direct differentiation, however, and concern has arisen that the optimal parameter values and the associated parameter covariance matrix computed by perturbation could also be less accurate. Sensitivities computed by both perturbation and direct differentiation were applied in nonlinear regression calibration of seven ground water flow models. The two methods gave virtually identical optimum parameter values and covariances for the three models that were relatively linear and two of the models that were relatively nonlinear, but gave widely differing results for two other nonlinear models. The perturbation method performed better than direct differentiation in some regressions with the nonlinear models, apparently because approximate sensitivities computed for an interval yielded better search directions than did more accurately computed sensitivities for a point. The method selected to avoid overshooting minima on the error surface when updating parameter values with the Gauss-Newton procedure appears for nonlinear models to be more important than the method of sensitivity calculation in controlling regression convergence.
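The two ways of obtaining sensitivities compared in this abstract can be illustrated on a toy model. The sketch below is not a ground water flow model: it assumes a hypothetical two-parameter exponential decay, y = a * exp(-b * t), and compares the Jacobian from direct differentiation with a forward-difference (perturbation) approximation. The step size is likewise an assumption.

```python
import math


def model(a, b, t):
    """Hypothetical stand-in for a simulated observation at time t."""
    return a * math.exp(-b * t)


def analytic_sensitivities(a, b, times):
    """Rows of (dy/da, dy/db) from direct differentiation."""
    return [(math.exp(-b * t), -a * t * math.exp(-b * t)) for t in times]


def perturbation_sensitivities(a, b, times, rel_step=1e-6):
    """Rows of (dy/da, dy/db) from forward differences."""
    da = rel_step * max(abs(a), 1.0)
    db = rel_step * max(abs(b), 1.0)
    rows = []
    for t in times:
        base = model(a, b, t)
        rows.append(((model(a + da, b, t) - base) / da,
                     (model(a, b + db, t) - base) / db))
    return rows


times = [0.0, 1.0, 2.0, 4.0]
exact = analytic_sensitivities(2.0, 0.5, times)
approx = perturbation_sensitivities(2.0, 0.5, times)
```

For a smooth model like this one the two Jacobians agree closely; a larger `rel_step` turns the perturbation estimate into an average slope over an interval, which is the effect the abstract credits for better search directions in some nonlinear cases.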
Larson, S.J.; Gilliom, R.J.
2001-01-01
Regression models were developed for estimating stream concentrations of the herbicides alachlor, atrazine, cyanazine, metolachlor, and trifluralin from use-intensity data and watershed characteristics. Concentrations were determined from samples collected from 45 streams throughout the United States during 1993 to 1995 as part of the U.S. Geological Survey's National Water-Quality Assessment (NAWQA). Separate regression models were developed for each of six percentiles (10th, 25th, 50th, 75th, 90th, 95th) of the annual distribution of stream concentrations and for the annual time-weighted mean concentration. Estimates for the individual percentiles can be combined to provide an estimate of the annual distribution of concentrations for a given stream. Agricultural use of the herbicide in the watershed was a significant predictor in nearly all of the models. Several hydrologic and soil parameters also were useful in explaining the variability in concentrations of herbicides among the streams. Most of the regression models developed for estimation of concentration percentiles and annual mean concentrations accounted for 50 percent to 90 percent of the variability among streams. Predicted concentrations were nearly always within an order of magnitude of the measured concentrations for the model-development streams, and predicted concentration distributions reasonably matched the actual distributions in most cases. Results from application of the models to streams not included in the model development data set are encouraging, but further validation of the regression approach described in this paper is needed.
Mirzaei, H R; Pitchford, W S; Verbyla, A P
2011-01-01
Two analyses, cubic and piecewise random regression, were conducted to model growth of crossbred cattle from birth to about two years of age, investigating the ability of a piecewise procedure to fit growth traits without the complications of the cubic model. During a four-year period (1994-1997) of the Australian "Southern Crossbreeding Project", mature Hereford cows (N = 581) were mated to 97 sires of Angus, Belgian Blue, Hereford, Jersey, Limousin, South Devon, and Wagyu breeds, resulting in 1141 steers and heifers born over four years. Data included 13 (for steers) and eight (for heifers) live body weight measurements, made approximately every 50 days from birth until slaughter. The mixed model included fixed effects of sex, sire breed, age (linear, quadratic and cubic), and the interactions of sex and sire breed with age. Random effects were sire, dam, management (birth location, year, post-weaning groups), and permanent environmental effects and, for each of these when possible, their interactions with linear, quadratic and cubic growth. In both models, body weights of all breeds increased over the pre-weaning period, held fairly steady (slightly flattening) over the dry season, then increased again towards the end of the feedlot period. The number of estimated parameters for the cubic model was 22 while for the piecewise model it was 32. It was concluded that the piecewise model was very similar to the cubic model in the fit to the data, with the piecewise model being marginally better. The piecewise model seems to fit the data better at the end of the growth period. PMID:21968730
Thomas, Michael S C; Knowland, Victoria C P; Karmiloff-Smith, Annette
2011-10-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by overaggressive synaptic pruning and identifying the mechanisms involved. We used a novel population-modeling technique to investigate developmental deficits, in which both neurocomputational parameters and the learning environment were varied across a large number of simulated individuals. Regression was generated by the atypical setting of a single pruning-related parameter. We observed a probabilistic relationship between the atypical pruning parameter and the presence of regression, as well as variability in the onset, severity, behavioral specificity, and recovery from regression. Other neurocomputational parameters that varied across the population modulated the risk that an individual would show regression. We considered a further hypothesis that behavioral regression may index an underlying anomaly characterizing the broader autism phenotype. If this is the case, we show how the model also accounts for several additional findings: shared gene variants between autism and language impairment (Vernes et al., 2008); larger brain size in autism but only in early development (Redcay & Courchesne, 2005); and the possibility of quasi-autism, caused by extreme environmental deprivation (Rutter et al., 1999). We make a novel prediction that the earliest developmental symptoms in the emergence of autism should be sensory and motor rather than social and review empirical data offering preliminary support for this prediction. PMID:21875243
CANFIS: A non-linear regression procedure to produce statistical air-quality forecast models
Burrows, W.R.; Montpetit, J.; Pudykiewicz, J.
1997-12-31
Statistical models for forecasts of environmental variables can provide a good trade-off between significance and precision in return for substantial saving of computer execution time. Recent non-linear regression techniques give significantly increased accuracy compared to traditional linear regression methods. Two are Classification and Regression Trees (CART) and the Neuro-Fuzzy Inference System (NFIS). Both can model predictand distributions, including the tails, with much better accuracy than linear regression. Given a learning data set of matched predictand and predictors, CART regression produces a non-linear, tree-based, piecewise-continuous model of the predictand data. Its variance-minimizing procedure optimizes the task of predictor selection, often greatly reducing initial data dimensionality. NFIS reduces dimensionality by a procedure known as subtractive clustering but it does not of itself eliminate predictors. Overlapping coverage in predictor space is enhanced by NFIS with a Gaussian membership function for each cluster component. Coefficients for a continuous response model based on the fuzzified cluster centers are obtained by a least-squares estimation procedure. CANFIS is a two-stage data-modeling technique that combines the strength of CART to optimize the process of selecting predictors from a large pool of potential predictors with the modeling strength of NFIS. A CANFIS model requires negligible computer time to run. CANFIS models for ground-level O3, particulates, and other pollutants will be produced for each of about 100 Canadian sites. The air-quality models will run twice daily using a small number of predictors isolated from a large pool of upstream and local Lagrangian potential predictors.
Comparing regression methods for the two-stage clonal expansion model of carcinogenesis.
Kaiser, J C; Heidenreich, W F
2004-11-15
In the statistical analysis of cohort data with risk estimation models, both Poisson and individual likelihood regressions are widely used methods of parameter estimation. In this paper, their performance has been tested with the biologically motivated two-stage clonal expansion (TSCE) model of carcinogenesis. To exclude inevitable uncertainties of existing data, cohorts with simple individual exposure history have been created by Monte Carlo simulation. To generate some similar properties of atomic bomb survivors and radon-exposed mine workers, both acute and protracted exposure patterns have been generated. Then the capacity of the two regression methods has been compared to retrieve a priori known model parameters from the simulated cohort data. For simple models with smooth hazard functions, the parameter estimates from both methods come close to their true values. However, for models with strongly discontinuous functions which are generated by the cell mutation process of transformation, the Poisson regression method fails to produce reliable estimates. This behaviour is explained by the construction of class averages during data stratification. Thereby, some indispensable information on the individual exposure history was destroyed. It could not be repaired by countermeasures such as the refinement of Poisson classes or a more adequate choice of Poisson groups. Although this choice might still exist we were unable to discover it. In contrast to this, the individual likelihood regression technique was found to work reliably for all considered versions of the TSCE model. PMID:15490436
NASA Astrophysics Data System (ADS)
Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat
2015-04-01
Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue is considered a major international concern and is primarily attributed to the use of different fossil fuels. In this paper, a regression model is used to analyze the causal relationship among CO2 emissions based on energy consumption in Malaysia, using time series data for the period 1980-2010. The equations were developed using a regression model based on the eight major sources that contribute to CO2 emissions, namely non-energy use, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil, and motor petrol. Part of the data (1980-2000) was used to fit the regression model and part (2001-2010) was used to validate it. Comparison of the model predictions with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred
2013-01-01
A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.
Liang, Hua; Wu, Hulin
2008-12-01
Differential equation (DE) models are widely used in many scientific fields that include engineering, physics and biomedical sciences. The so-called "forward problem", the problem of simulations and predictions of state variables for given parameter values in the DE models, has been extensively studied by mathematicians, physicists, engineers and other scientists. However, the "inverse problem", the problem of parameter estimation based on the measurements of output variables, has not been well explored using modern statistical methods, although some least squares-based approaches have been proposed and studied. In this paper, we propose parameter estimation methods for ordinary differential equation models (ODE) based on the local smoothing approach and a pseudo-least squares (PsLS) principle under a framework of measurement error in regression models. The asymptotic properties of the proposed PsLS estimator are established. We also compare the PsLS method to the corresponding SIMEX method and evaluate their finite sample performances via simulation studies. We illustrate the proposed approach using an application example from an HIV dynamic study. PMID:19956350
Accounting for spatial effects in land use regression for urban air pollution modeling.
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
2015-01-01
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R² values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. PMID:26530819
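The local fitting step that distinguishes GWR from an ordinary regression can be sketched briefly. The code below is an illustration under stated assumptions, not the authors' GWR-wind model: it uses a single predictor, a Gaussian distance kernel, and a hypothetical bandwidth, and it fits a weighted least squares line centred on one target location.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Kernel weight that decays with distance from the target location."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, sites, xs, ys, bandwidth):
    """Weighted least squares fit of y = a + b*x centred on `target`.

    Returns the local (intercept, slope) pair.
    """
    w = [gaussian_weight(math.dist(target, s), bandwidth) for s in sites]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical monitoring sites (coordinates), predictor values, and responses.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = local_fit((0.5, 0.5), sites, xs, ys, bandwidth=1.0)
```

Repeating `local_fit` at each location of interest yields coefficients that vary over space, which is how GWR accommodates spatial non-stationarity.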
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
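Quantile regression differs from least squares regression in the loss it minimizes. The sketch below shows the check (pinball) loss for the simplest case, a constant fitted to a sample; the data values are invented and are not the survey data from the study.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r otherwise."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def total_loss(values, q, tau):
    """Summed check loss when every value is predicted by the constant q."""
    return sum(pinball_loss(v - q, tau) for v in values)


def best_constant(values, tau):
    """Sample value that minimizes the summed check loss."""
    return min(sorted(values), key=lambda q: total_loss(values, q, tau))


# Hypothetical scores with one extreme value.
scores = [1, 2, 3, 4, 100]
median_fit = best_constant(scores, tau=0.5)
lower_fit = best_constant(scores, tau=0.25)
```

Changing `tau` shifts the fitted value toward a different part of the distribution, and the extreme score has little influence on either fit, in contrast to the sample mean. A quantile regression model replaces the constant `q` with a linear function of the independent variables.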
A novel strategy for forensic age prediction by DNA methylation and support vector regression model
Xu, Cheng; Qu, Hongzhu; Wang, Guangyu; Xie, Bingbing; Shi, Yi; Yang, Yaran; Zhao, Zhao; Hu, Lan; Fang, Xiangdong; Yan, Jiangwei; Feng, Lei
2015-01-01
High deviations resulting from the prediction model, gender and population differences have limited the application of DNA methylation markers to age estimation. Here we identified 2,957 novel age-associated DNA methylation sites (P < 0.01 and R2 > 0.5) in blood of eight pairs of Chinese Han female monozygotic twins. Among them, nine novel sites (false discovery rate < 0.01), along with three other reported sites, were further validated in 49 unrelated female volunteers with ages of 20–80 years by Sequenom Massarray. A total of 95 CpGs were covered in the PCR products and 11 of them were used to build the age prediction models. After comparing four different models, including multivariate linear regression, multivariate nonlinear regression, back propagation neural network and support vector regression, SVR was identified as the most robust model with the least mean absolute deviation from real chronological age (2.8 years) and an average accuracy of 4.7 years predicted by only six loci from the 11 loci, as well as a lower cross-validated error compared with the linear regression model. Our novel strategy provides an accurate measurement that is highly useful in estimating the individual age in forensic practice as well as in tracking the aging process in other related applications. PMID:26635134
Regression based modeling of vegetation and climate variables for the Amazon rainforests
NASA Astrophysics Data System (ADS)
Kodali, A.; Khandelwal, A.; Ganguly, S.; Bongard, J.; Das, K.
2015-12-01
Both short-term (weather) and long-term (climate) variations in the atmosphere directly impact various ecosystems on earth. Forest ecosystems, especially tropical forests, are crucial as they are the largest terrestrial carbon sinks. For example, the Amazon forests are a critical component of the global carbon cycle, storing about 100 billion tons of carbon in their woody biomass. There is a growing concern that these forests could succumb to precipitation reduction in a progressively warming climate, leading to the release of a significant amount of carbon into the atmosphere. Therefore, there is a need to accurately quantify the dependence of vegetation growth on different climate variables and obtain better estimates of drought-induced changes to atmospheric CO2. The availability of globally consistent climate and earth observation datasets has allowed global scale monitoring of various climate and vegetation variables such as precipitation, radiation, surface greenness, etc. Using these diverse datasets, we aim to quantify the magnitude and extent of ecosystem exposure, sensitivity and resilience to droughts in forests. The Amazon rainforests have undergone severe droughts twice in the last decade (2005 and 2010), which makes them an ideal candidate for regional scale analysis. Current studies on vegetation and climate relationships have mostly explored linear dependence due to computational and domain knowledge constraints. We explore a modeling technique called symbolic regression based on evolutionary computation that allows discovery of the dependency structure without any prior assumptions. In symbolic regression the population of possible solutions is defined via tree structures. Each tree represents a mathematical expression that includes pre-defined functions (mathematical operators) and terminal sets (independent variables from data). Selection of these sets is critical to computational efficiency and model accuracy. In this work we investigate
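The tree representation used in symbolic regression can be made concrete with a short sketch. It is an illustration only: the function set, the variable names (`precip`, `radiation`) and the example expression are hypothetical, and the evolutionary search over trees is not shown.

```python
import operator

# Pre-defined function set (mathematical operators).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, row):
    """Evaluate an expression tree on one observation.

    A tree is a (function_name, left, right) tuple, a variable name
    (terminal taken from the data), or a numeric constant.
    """
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):
        return row[tree]
    return float(tree)


def mean_squared_error(tree, rows, targets):
    """Fitness of a candidate tree: average squared prediction error."""
    errors = [(evaluate(tree, r) - y) ** 2 for r, y in zip(rows, targets)]
    return sum(errors) / len(errors)


# Hypothetical candidate: greenness = 2 * precip + radiation
candidate = ("add", ("mul", 2.0, "precip"), "radiation")
rows = [{"precip": 3.0, "radiation": 1.0}, {"precip": 0.5, "radiation": 2.0}]
targets = [7.0, 3.0]
fitness = mean_squared_error(candidate, rows, targets)
```

An evolutionary algorithm would keep a population of such trees, score each with a fitness like `mean_squared_error`, and recombine or mutate subtrees; the choice of entries in `FUNCTIONS` and of terminals is what the abstract calls selection of the function and terminal sets.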
NASA Astrophysics Data System (ADS)
Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho
2008-07-01
Factorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between response variable NO₃⁻ and explanatory variables Ca²⁺, Cl⁻, SO₄²⁻, depth to water, aquifer media and land use. Substituting Cl⁻ by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area, when horticulture is replaced by other land use with much lower fertilization and irrigation rates.
NASA Astrophysics Data System (ADS)
Gaeta, Alessandra; Cattani, Giorgio; Di Menno di Bucchianico, Alessandro; De Santis, Antonella; Cesaroni, Giulia; Badaloni, Chiara; Ancona, Carla; Forastiere, Francesco; Sozzi, Roberto; Bolignano, Andrea; Sacco, Fabrizio
2016-04-01
The aim of this study was to evaluate the small scale spatial variability of nitrogen dioxide (NO2) and selected VOCs (benzene, toluene, acrolein and formaldehyde) concentrations using Land Use Regression models (LURs) in a complex multi sources domain (64 km2), containing a mid-size airport: the Ciampino Airport, located in Ciampino, Rome, Italy. 46 diffusion tube samplers were deployed within a domain centred in the airport over two 2-weekly periods (June 2011-January 2012). GIS-derived predictor variables, with varying buffer size, were evaluated to model spatial variation of NO2, benzene, toluene, formaldehyde and acrolein annual average concentrations. The airport apportionment to air quality was investigated using a Lagrangian dispersion model (SPRAY). A stepwise selection procedure was used to develop the linear regression models. The models were validated using leave one out cross validation (LOOCV) method. In this study, the use of LURs was found to be effective to explain spatial variability of NO2 (adjusted-R2 = 0.72), benzene (adjusted-R2 = 0.53), toluene (adjusted-R2 = 0.50) and acrolein (adjusted-R2 = 0.51), while limited power was achieved with the formaldehyde modeling (adjusted-R2 = 0.24). For all pollutants LURs output showed that the small scale spatial variability was mainly explained by local traffic. The airport contribution to the observed spatial variability was adequately quantified only for acrolein (0.43 (±0.69) μg/m3 in an area of about 6 km2, SW located to the airport runway), while for NO2 and formaldehyde, only a little portion of the spatial variability in a limited portion of the study domain was attributable to airport related emissions.
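The leave-one-out cross-validation (LOOCV) step mentioned in this abstract can be sketched in plain Python as follows. This is a simplified illustration with a single predictor and made-up numbers; the study itself used stepwise-selected multivariable models, and the function names are assumptions introduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_rmse(xs, ys):
    """Refit with each site held out once; return RMSE of held-out errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (b0 + b1 * xs[i])) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Hypothetical data: a traffic-related predictor vs. a pollutant level.
traffic = [1.0, 2.0, 3.0, 4.0, 5.0]
pollutant = [12.0, 14.0, 16.0, 18.0, 20.0]
rmse = loocv_rmse(traffic, pollutant)
```

Because the illustrative data lie exactly on a line, the held-out error is essentially zero; with real monitoring data the LOOCV error would typically exceed the in-sample error.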
Liu, Dawei; Lin, Xihong; Ghosh, Debashis
2007-12-01
We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations. PMID:18078480
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. PMID:26720396
Ratnarajah, Nagulan; Simmons, Andy; Hojjatoleslami, Ali
2011-01-01
We present a novel approach for probabilistic clustering of white matter fibre pathways using curve-based regression mixture modelling techniques in 3D curve space. The clustering algorithm is based on a principled method for probabilistic modelling of a set of fibre trajectories as individual sequences of points generated from a finite mixture model consisting of multivariate polynomial regression model components. Unsupervised learning is carried out using maximum likelihood principles. Specifically, conditional mixture is used together with an EM algorithm to estimate cluster membership. The result of clustering is a probabilistic assignment of fibre trajectories to each cluster and an estimate of cluster parameters. A statistical shape model is calculated for each clustered fibre bundle using fitted parameters of the probabilistic clustering. We illustrate the potential of our clustering approach on synthetic and real data. PMID:21995009
A Comparison of Robust and Nonparametric Estimators under the Simple Linear Regression Model.
ERIC Educational Resources Information Center
Nevitt, Jonathan; Tam, Hak P.
This study investigates parameter estimation under the simple linear regression model for situations in which the underlying assumptions of ordinary least squares estimation are untenable. Classical nonparametric estimation methods are directly compared against some robust estimation methods for conditions in which varying degrees of outliers are…
Modeling protein tandem mass spectrometry data with an extended linear regression strategy.
Liu, Han; Bonner, Anthony J; Emili, Andrew
2004-01-01
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. The intensity patterns present in mass spectra are useful information for the identification of peptides and proteins. However, widely used algorithms cannot predict the peak intensity patterns exactly. We have developed a systematic analytical approach based on a family of extended regression models, which permits routine, large scale protein expression profile modeling. By proving an important technical result, namely that the regression coefficient vector is just the eigenvector corresponding to the least eigenvalue of a space-transformed version of the original data, this extended regression problem can be reduced to a singular value decomposition (SVD) problem, thus gaining robustness and efficiency. To evaluate the performance of our model, from 60,960 spectra we chose 2,859 with high-confidence, non-redundant matches as training data. Based on this specific problem, we derived some measures of goodness of fit to show that our modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes. PMID:17270923
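The idea that a coefficient vector can be read off as the eigenvector belonging to the smallest eigenvalue can be illustrated on a toy two-variable problem. The sketch below is a generic homogeneous least-squares example, not the authors' space transformation: it assumes data rows (x, y) that approximately satisfy a*x + b*y = 0 and recovers (a, b) from the 2x2 scatter matrix in closed form. The function name and the data are hypothetical.

```python
import math


def smallest_eigvec_2x2(sxx, sxy, syy):
    """Smallest eigenvalue and unit eigenvector of [[sxx, sxy], [sxy, syy]]."""
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    disc = max(trace * trace - 4.0 * det, 0.0)  # guard against rounding
    lam = (trace - math.sqrt(disc)) / 2.0
    if sxy != 0.0:
        vec = (sxy, lam - sxx)        # solves (S - lam * I) v = 0
    elif sxx <= syy:
        vec = (1.0, 0.0)              # diagonal matrix, smallest entry is sxx
    else:
        vec = (0.0, 1.0)              # diagonal matrix, smallest entry is syy
    norm = math.hypot(vec[0], vec[1])
    return lam, (vec[0] / norm, vec[1] / norm)


# Toy rows lying exactly on y = 2x, i.e. 2*x - y = 0.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
sxx = sum(x * x for x, _ in rows)
sxy = sum(x * y for x, y in rows)
syy = sum(y * y for _, y in rows)

lam, (a, b) = smallest_eigvec_2x2(sxx, sxy, syy)
slope = -a / b  # rewrite a*x + b*y = 0 as y = slope * x
```

For larger problems the same vector is usually obtained as the right singular vector of the data matrix with the smallest singular value, which is where an SVD routine would come in.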
ERIC Educational Resources Information Center
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
Not Quite Normal: Consequences of Violating the Assumption of Normality in Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Smith, Jessalyn; Fagan, Abigail A.; Jaki, Thomas; Feaster, Daniel J.; Masyn, Katherine; Hawkins, J. David; Howe, George
2012-01-01
Regression mixture models, which have only recently begun to be used in applied research, are a new approach for finding differential effects. This approach comes at the cost of the assumption that error terms are normally distributed within classes. This study uses Monte Carlo simulations to explore the effects of relatively minor violations of…
ERIC Educational Resources Information Center
Cason, Gerald J.; Cason, Carolyn L.
A more familiar and efficient method for estimating the parameters of Cason and Cason's model was examined. Using a two-step analysis based on linear regression, rather than the direct search iterative procedure, gave about equally good results while providing a 33 to 1 computer processing time advantage, across 14 cohorts of junior medical…
The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.
ERIC Educational Resources Information Center
Fanning, Fred; Newman, Isadore
Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regression models were developed to test the statistical significance between slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…
NASA Astrophysics Data System (ADS)
Hill, D.; Bell, K. R. W.; McMillan, D.; Infield, D.
2014-05-01
The growth of wind power production in the electricity portfolio is striving to meet ambitious targets set, for example by the EU, to reduce greenhouse gas emissions by 20% by 2020. Huge investments are now being made in new offshore wind farms around UK coastal waters that will have a major impact on the GB electrical supply. Representations of the UK wind field in syntheses which capture the inherent structure and correlations between different locations including offshore sites are required. Here, Vector Auto-Regressive (VAR) models are presented and extended in a novel way to incorporate offshore time series from a pan-European meteorological model called COSMO, with onshore wind speeds from the MIDAS dataset provided by the British Atmospheric Data Centre. Forecasting ability onshore is shown to be improved with the inclusion of the offshore sites with improvements of up to 25% in RMS error at 6 h ahead. In addition, the VAR model is used to synthesise time series of wind at each offshore site, which are then used to estimate wind farm capacity factors at the sites in question. These are then compared with estimates of capacity factors derived from the work of Hawkins et al. (2011). A good degree of agreement is established indicating that this synthesis tool should be useful in power system impact studies.
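As a rough illustration of how a vector autoregressive model propagates information between sites, the sketch below iterates a first-order VAR, x_{t+1} = A x_t, without a noise term. The coefficient matrix and the starting wind speeds are made-up values, not estimates from the COSMO or MIDAS data, and the model order used in the study is not stated in the abstract.

```python
def var1_forecast(coeffs, state, steps):
    """Iterate x_{t+1} = A x_t for `steps` steps and return the forecast path."""
    path = []
    for _ in range(steps):
        state = [sum(a * x for a, x in zip(row, state)) for row in coeffs]
        path.append(state)
    return path


# Hypothetical two-site system: site 0 is onshore, site 1 is offshore.
# The off-diagonal 0.25 lets the offshore value inform the onshore forecast.
A = [[0.5, 0.25],
     [0.0, 0.5]]
x0 = [4.0, 8.0]  # current (mean-removed) wind speeds at the two sites

path = var1_forecast(A, x0, 2)
```

Setting the off-diagonal entry to zero would reduce the onshore forecast to a univariate autoregression, which is the kind of comparison behind the reported forecast improvement.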
Evaluation and application of regional turbidity-sediment regression models in Virginia
Hyer, Kenneth; Jastram, John D.; Moyer, Douglas; Webber, James; Chanat, Jeffrey G.
2015-01-01
Conventional thinking has long held that turbidity-sediment surrogate-regression equations are site specific and that regression equations developed at a single monitoring station should not be applied to another station; however, few studies have evaluated this issue in a rigorous manner. If robust regional turbidity-sediment models can be developed successfully, their applications could greatly expand the usage of these methods. Suspended sediment load estimation could occur as soon as flow and turbidity monitoring commence at a site, suspended sediment sampling frequencies for various projects potentially could be reduced, and special-project applications (sediment monitoring following dam removal, for example) could be significantly enhanced. The objective of this effort was to investigate the turbidity-suspended sediment concentration (SSC) relations at all available USGS monitoring sites within Virginia to determine whether meaningful turbidity-sediment regression models can be developed by combining the data from multiple monitoring stations into a single model, known as a “regional” model. Following the development of the regional model, additional objectives included a comparison of predicted SSCs between the regional model and commonly used site-specific models, as well as an evaluation of why specific monitoring stations did not fit the regional model.
Schmidtmann, I; Elsäßer, A; Weinmann, A; Binder, H
2014-12-30
For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivated by a clinical cancer registry application, where complex event patterns have to be dealt with and variable selection is needed at the same time, we propose a general approach for linking variable selection between several Cox models. Specifically, we combine score statistics for each covariate across models by Fisher's method as a basis for variable selection. This principle is implemented for a stepwise forward selection approach as well as for a regularized regression technique. In an application to data from hepatocellular carcinoma patients, the coupled stepwise approach is seen to facilitate joint interpretation of the different cause-specific Cox models. In conditional survival models at landmark times, which address updates of prediction as time progresses and both treatment and other potential explanatory variables may change, the coupled regularized regression approach identifies potentially important, stably selected covariates together with their effect time pattern, despite having only a small number of events. These results highlight the promise of the proposed approach for coupling variable selection between Cox models, which is particularly relevant for modeling for clinical cancer registries with their complex event patterns. PMID:25345575
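Fisher's method, which this abstract uses to link variable selection across models, combines k p-values into the statistic X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis when the p-values are independent. The sketch below is a generic implementation of that textbook formula; how the authors derive per-model p-values from score statistics, and how they handle dependence between models, is not shown here.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Return (statistic, combined p-value) for Fisher's method.

    Uses the closed-form chi-square survival function for even degrees
    of freedom 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical p-values for one covariate in two cause-specific Cox models.
stat, combined = fisher_combined_pvalue([0.05, 0.05])
```

A covariate with moderate evidence in each model separately can thus reach a smaller combined p-value, which is what allows selection to be coupled across models.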
Regressions by leaps and bounds and biased estimation techniques in yield modeling
NASA Technical Reports Server (NTRS)
Marquina, N. E. (Principal Investigator)
1979-01-01
The author has identified the following significant results. It was observed that OLS was not adequate as an estimation procedure when the independent or regressor variables were involved in multicollinearities. This was shown to cause the presence of small eigenvalues of the extended correlation matrix A'A. It was demonstrated that the biased estimation techniques and the all-possible subset regression could help in finding a suitable model for predicting yield. Latent root regression was an excellent tool that found how many predictive and nonpredictive multicollinearities there were.
ERIC Educational Resources Information Center
Fong, Duncan K. H.; Ebbes, Peter; DeSarbo, Wayne S.
2012-01-01
Multiple regression is frequently used across the various social sciences to analyze cross-sectional data. However, it can oftentimes be challenging to justify the assumption of common regression coefficients across all respondents. This manuscript presents a heterogeneous Bayesian regression model that enables the estimation of…
Exploratory regression analysis: a tool for selecting models and determining predictor importance.
Braun, Michael T; Oswald, Frederick L
2011-06-01
Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all 2^p - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits. PMID:21298571
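The count of submodels follows from taking every non-empty subset of the p predictors. A short Python sketch of that enumeration is given below; the predictor names are placeholders, and the fitting and importance indices computed by the Excel program are not reproduced.

```python
from itertools import combinations


def all_submodels(predictors):
    """List every non-empty subset of the predictors (2^p - 1 in total)."""
    return [subset
            for size in range(1, len(predictors) + 1)
            for subset in combinations(predictors, size)]


predictors = ["x1", "x2", "x3"]  # placeholder predictor names
submodels = all_submodels(predictors)
```

The number of subsets doubles with each added predictor, so exhaustive enumeration is practical only for a modest number of predictors.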
Fatigue design of a cellular phone folder using regression model-based multi-objective optimization
NASA Astrophysics Data System (ADS)
Kim, Young Gyun; Lee, Jongsoo
2016-08-01
In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
Neural Network and Regression Soft Model Extended for PAX-300 Aircraft Engine
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Hopkins, Dale A.
2002-01-01
In fiscal year 2001, the neural network and regression capabilities of NASA Glenn Research Center's COMETBOARDS design optimization testbed were extended to generate approximate models for the PAX-300 aircraft engine. The analytical model of the engine is defined through nine variables: the fan efficiency factor, the low pressure of the compressor, the high pressure of the compressor, the high pressure of the turbine, the low pressure of the turbine, the operating pressure, and three critical temperatures (T(sub 4), T(sub vane), and T(sub metal)). Numerical Propulsion System Simulation (NPSS) calculations of the specific fuel consumption (TSFC), as a function of the variables can become time consuming, and numerical instabilities can occur during these design calculations. "Soft" models can alleviate both deficiencies. These approximate models are generated from a set of high-fidelity input-output pairs obtained from the NPSS code and a design of the experiment strategy. A neural network and a regression model with 45 weight factors were trained for the input/output pairs. Then, the trained models were validated through a comparison with the original NPSS code. Comparisons of TSFC versus the operating pressure and of TSFC versus the three temperatures (T(sub 4), T(sub vane), and T(sub metal)) are depicted in the figures. The overall performance was satisfactory for both the regression and the neural network model. The regression model required fewer calculations than the neural network model, and it produced marginally superior results. Training the approximate methods is time consuming. Once trained, the approximate methods generated the solution with only a trivial computational effort, reducing the solution time from hours to less than a minute.
A Multilevel Regression Model for Geographical Studies in Sets of Non-Adjacent Cities
Marí-Dell’Olmo, Marc; Martínez-Beneito, Miguel Ángel
2015-01-01
In recent years, small-area-based ecological regression analyses have been published that study the association between a health outcome and a covariate in several cities. These analyses have usually been performed independently for each city and have therefore yielded unrelated estimates for the cities considered, even though the same process has been studied in all of them. In this study, we propose a joint ecological regression model for multiple cities that accounts for spatial structure both within and between cities and explore the advantages of this model. The proposed model merges both disease mapping and geostatistical ideas. Our proposal is compared with two alternatives, one that models the association for each city as fixed effects and another that treats them as independent and identically distributed random effects. The proposed model allows us to estimate the association (and assess its significance) at locations with no available data. Our proposal is illustrated by an example of the association between unemployment (as a deprivation surrogate) and lung cancer mortality among men in 31 Spanish cities. In this example, the associations found were far more accurate for the proposed model than those from the fixed effects model. Our main conclusion is that ecological regression analyses can be markedly improved by performing joint analyses at several locations that share information among them. This finding should be taken into consideration in the design of future epidemiological studies. PMID:26308613
Age estimation based on pelvic ossification using regression models from conventional radiography.
Zhang, Kui; Dong, Xiao-Ai; Fan, Fei; Deng, Zhen-Hua
2016-07-01
To establish regression models for age estimation from the combination of the ossification of iliac crest and ischial tuberosity. One thousand three hundred and seventy-nine conventional pelvic radiographs at the West China Hospital of Sichuan University between January 2010 and June 2012 were evaluated retrospectively. The receiver operating characteristic analysis was performed to measure the value of estimation of 18 years of age with the classification scheme for the iliac crest and ischial tuberosity. Regression analysis was performed, and formulas for calculating approximate chronological age according to the combination developmental status of the ossification for the iliac crest and ischial tuberosity were developed. The areas under the receiver operating characteristic (ROC) curves were above 0.9 (p < 0.001), indicating a good prediction of the grading systems, and the cubic regression model was found to have the highest R-square value (R² = 0.744 for female and R² = 0.753 for male). The present classification scheme for apophyseal iliac crest ossification and the ischial tuberosity may be used for age estimation. And the present established cubic regression model according to the combination developmental status of the ossification for the iliac crest and ischial tuberosity can be used for age estimation. PMID:27169673
Larsson, A
1997-08-01
The objective of this study was to investigate the conditions for regression analysis of data from equilibrium experiments. One important issue was to recognize that Kd and the binding site concentration (A) are not of equal nature, although both are parameters in the regression analysis. Whereas Kd approximates to a true constant, A is subject to experimental variation due to pipetting errors and in solid-phase experiments also to uneven coating properties. While recognizing that the ideal assumptions for ordinary regression analysis are poorly satisfied, different regression models were evaluated by extensive simulations. It was first established by a 'worst case' investigation that a limited error (8%) in the dependent variable is not critical for the results obtained at curve-fitting to Langmuir's equation. Seven different equations were compared for the calculation of data representing a solid-phase equilibrium experiment with statistical but no systematic errors. All the equations are rearrangements of the law of mass action. In this setting the Scatchard plot gave the best result, but also the double reciprocal and the Woolf plots worked well in weighted analysis. Langmuir's equation gave the best result of the 4 nonlinear regression models tested. The influence of one type of systematic error was also investigated. This assumed that 10% of the label was positioned on particles other than the functional ligand molecules. This systematic error was amplified, which resulted in a substantial bias. The calculated Kd-values varied slightly with the regression method used and were almost 24% too high in the best methods. PMID:9328576
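For orientation, the standard textbook forms of four of the rearrangements named in this abstract are written out below. The symbols B (bound ligand concentration) and F (free ligand concentration) are notation assumed here; Kd and A are as in the abstract. The exact parameterizations and weighting schemes used in the study are not given in the abstract, so these are the usual forms rather than a reproduction of the paper's equations.

```latex
% Langmuir (binding isotherm), nonlinear in F:
B = \frac{A\,F}{K_d + F}

% Scatchard plot: B/F against B, slope -1/K_d
\frac{B}{F} = \frac{A - B}{K_d}

% Double reciprocal plot: 1/B against 1/F
\frac{1}{B} = \frac{1}{A} + \frac{K_d}{A}\cdot\frac{1}{F}

% Woolf plot: F/B against F
\frac{F}{B} = \frac{F}{A} + \frac{K_d}{A}
```

Each linear form follows from the Langmuir equation by multiplying out B(K_d + F) = A F and dividing by a different quantity, which is why measurement error enters each plot differently.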
Probabilistic Mixture Regression Models for Alignment of LC-MS Data
Befekadu, Getachew K.; Tadesse, Mahlet G.; Tsai, Tsung-Heng; Ressom, Habtom W.
2010-01-01
A novel framework of a probabilistic mixture regression model (PMRM) is presented for alignment of liquid chromatography-mass spectrometry (LC-MS) data with respect to both retention time (RT) and mass-to-charge ratio (m/z). The expectation maximization algorithm is used to estimate the joint parameters of spline-based mixture regression models and prior transformation density models. The latter accounts for the variability in RT points, m/z values, and peak intensities. The applicability of PMRM for alignment of LC-MS data is demonstrated through three datasets. The performance of PMRM is compared with other alignment approaches including dynamic time warping, correlation optimized warping, and continuous profile model in terms of coefficient variation of replicate LC-MS runs and accuracy in detecting differentially abundant peptides/proteins. PMID:20837998
NASA Astrophysics Data System (ADS)
Urrutia, J. D.; Bautista, L. A.; Baccay, E. B.
2014-04-01
The aim of this study was to develop mathematical models for estimating earthquake casualties such as death, number of injured persons, affected families and total cost of damage. To quantify the direct damages from earthquakes to human beings and properties given the magnitude, intensity, depth of focus, location of epicentre and time duration, the regression models were made. The researchers formulated models through regression analysis using matrices and used α = 0.01. The study considered thirty destructive earthquakes that hit the Philippines from the inclusive years 1968 to 2012. Relevant data about these said earthquakes were obtained from Philippine Institute of Volcanology and Seismology. Data on damages and casualties were gathered from the records of National Disaster Risk Reduction and Management Council. The mathematical models made are as follows: This study will be of great value in emergency planning, initiating and updating programs for earthquake hazard reduction in the Philippines, which is an earthquake-prone country.
Application of Dynamic Grey-Linear Auto-regressive Model in Time Scale Calculation
NASA Astrophysics Data System (ADS)
Yuan, H. T.; Don, S. W.
2009-01-01
Because of the influence of different noise and other factors, the running of an atomic clock is very complex. In order to forecast the velocity of an atomic clock accurately, it is necessary to study and design a model to calculate its velocity in the near future. By using the velocity, the clock could be used in the calculation of local atomic time and the steering of local universal time. In this paper, a new forecast model called the dynamic grey-linear auto-regressive model is studied, and the precision of the new model is given. The new model is tested using real data from the National Time Service Center.
Path model analyzed with ordinary least squares multiple regression versus LISREL.
Kline, T J; Klammer, J D
2001-03-01
The data of a specified path model using the variables of voice, perceived organizational support, being heard, and procedural justice were subjected to the two separate structural equation modeling analytic techniques--that of ordinary least squares regression and LISREL. A comparison of the results and differences between the analyses is discussed, with the LISREL approach being stronger from both theoretical and statistical perspectives. PMID:11403343
Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B
2016-09-01
Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data has become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h² = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that body weight at any age can be used as a selection criterion. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that
Modeling data for pancreatitis in presence of a duodenal diverticula using logistic regression
NASA Astrophysics Data System (ADS)
Dineva, S.; Prodanova, K.; Mlachkova, D.
2013-12-01
The presence of a periampullary duodenal diverticulum (PDD) is often observed during upper digestive tract barium meal studies and endoscopic retrograde cholangiopancreatography (ERCP). A few papers have reported that the diverticulum is associated with the incidence of pancreatitis. The aim of this study is to investigate whether the presence of duodenal diverticula predisposes to the development of pancreatic disease. A total of 3966 patients who had undergone ERCP were studied retrospectively. They were divided into 2 groups: with and without PDD. Patients with a duodenal diverticulum had a higher rate of acute pancreatitis. The duodenal diverticulum is a risk factor for acute idiopathic pancreatitis. Multiple logistic regression was performed to obtain adjusted estimates of odds and to identify whether a PDD is a predictor of acute or chronic pancreatitis. The software package STATISTICA 10.0 was used for analyzing the real data.
Knafl, George J; Fennie, Kristopher P; Bova, Carol; Dieckhaus, Kevin; Williams, Ann B
2004-03-15
An adaptive approach to Poisson regression modelling is presented for analysing event data from electronic devices monitoring medication-taking. The emphasis is on applying this approach to data for individual subjects although it also applies to data for multiple subjects. This approach provides for visualization of adherence patterns as well as for objective comparison of actual device use with prescribed medication-taking. Example analyses are presented using data on openings of electronic pill bottle caps monitoring adherence of subjects with HIV undergoing highly active antiretroviral therapies. The modelling approach consists of partitioning the observation period, computing grouped event counts/rates for intervals in this partition, and modelling these event counts/rates in terms of elapsed time after entry into the study using Poisson regression. These models are based on adaptively selected sets of power transforms of elapsed time determined by rule-based heuristic search through arbitrary sets of parametric models, thereby effectively generating a smooth non-parametric regression fit to the data. Models are compared using k-fold likelihood cross-validation. PMID:14981675
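The first two steps described in the abstract above (partitioning the observation period and computing grouped event counts/rates) can be illustrated with a minimal Python sketch. It assumes an equal-width partition and made-up opening times; the function name `grouped_rates`, the interval count and the example data are hypothetical and are not taken from the paper, whose adaptive search over power transforms of elapsed time is not reproduced here.

```python
def grouped_rates(event_times, period_end, n_intervals):
    """Split [0, period_end) into equal-width intervals (an assumption of this
    sketch) and return (midpoint, count, rate) for each interval."""
    width = period_end / n_intervals
    counts = [0] * n_intervals
    for t in event_times:
        if 0 <= t < period_end:
            # Clamp the index so rounding never pushes an event past the last interval.
            index = min(int(t // width), n_intervals - 1)
            counts[index] += 1
    return [((i + 0.5) * width, c, c / width) for i, c in enumerate(counts)]


# Hypothetical cap-opening times, in days since study entry.
openings = [0.3, 0.8, 1.4, 2.2, 2.9, 7.5, 8.1, 15.0, 15.6, 16.2, 22.4]
table = grouped_rates(openings, period_end=28.0, n_intervals=4)
for midpoint, count, rate in table:
    print(f"t = {midpoint:5.1f} d  count = {count}  rate = {rate:.3f} per day")
```

The midpoints play the role of elapsed time, so power transforms of them (for example `midpoint ** 0.5`) would be candidate predictors for a Poisson regression of the counts.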
NASA Astrophysics Data System (ADS)
Mel'nikov, A. V.
1996-10-01
Contents Introduction Chapter I. Basic notions and results from contemporary martingale theory §1.1. General notions of the martingale theory §1.2. Convergence (a.s.) of semimartingales. The strong law of large numbers and the law of the iterated logarithm Chapter II. Stochastic differential equations driven by semimartingales §2.1. Basic notions and results of the theory of stochastic differential equations driven by semimartingales §2.2. The method of monotone approximations. Existence of strong solutions of stochastic equations with non-smooth coefficients §2.3. Linear stochastic equations. Properties of stochastic exponentials §2.4. Linear stochastic equations. Applications to models of the financial market Chapter III. Procedures of stochastic approximation as solutions of stochastic differential equations driven by semimartingales §3.1. Formulation of the problem. A general model and its relation to the classical one §3.2. A general description of the approach to the procedures of stochastic approximation. Convergence (a.s.) and asymptotic normality §3.3. The Gaussian model of stochastic approximation. Averaged procedures and their effectiveness Chapter IV. Statistical estimation in regression models with martingale noises §4.1. The formulation of the problem and classical regression models §4.2. Asymptotic properties of MLS-estimators. Strong consistency, asymptotic normality, the law of the iterated logarithm §4.3. Regression models with deterministic regressors §4.4. Sequential MLS-estimators with guaranteed accuracy and sequential statistical inferences Bibliography
NONLINEAR-REGRESSION GROUNDWATER FLOW MODELING OF A DEEP REGIONAL AQUIFER SYSTEM.
Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.
1986-01-01
A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi² area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow.
NASA Astrophysics Data System (ADS)
Asavaskulkiet, Krissada
2014-01-01
This paper proposes a novel face super-resolution reconstruction (hallucination) technique for the YCbCr color space. The underlying idea is to learn with an error regression model and multi-linear principal component analysis (MPCA). Within the hallucination framework, color face images are represented in YCbCr space. To reduce the time complexity of color face hallucination, the color face images are naturally described as tensors, or multi-linear arrays. In addition, error regression analysis is used to find the error estimate, which can be obtained from the existing low-resolution (LR) image in tensor space. The learning process starts from the errors made when reconstructing the face images of the training dataset by MPCA, and then finds the relationship between input and error by regression analysis. The hallucinating process uses the normal method of MPCA backprojection, after which the result is corrected with the error estimate. In this contribution we show that our hallucination technique is suitable for color face images in both RGB and YCbCr space. By using the MPCA subspace with the error regression model, we can generate photorealistic color face images. Our approach is demonstrated by extensive experiments with high-quality hallucinated color faces. Comparison with existing algorithms shows the effectiveness of the proposed method.
NASA Astrophysics Data System (ADS)
Das, Iswar; Stein, Alfred; Kerle, Norman; Dadhwal, Vinay K.
2012-12-01
Landslide susceptibility mapping (LSM) along road corridors in the Indian Himalayas is an essential exercise that helps planners and decision makers in determining the severity of probable slope failure areas. Logistic regression is commonly applied for this purpose, as it is a robust and straightforward technique that is relatively easy to handle. Ordinary logistic regression as a data-driven technique, however, does not allow inclusion of prior information. This study presents Bayesian logistic regression (BLR) for landslide susceptibility assessment along road corridors. The methodology is tested in a landslide-prone area in the Bhagirathi river valley in the Indian Himalayas. Parameter estimates from BLR are compared with those obtained from ordinary logistic regression. By means of iterative Markov Chain Monte Carlo simulation, BLR provides a rich set of results on parameter estimation. We assessed model performance by receiver operating characteristic (ROC) curve analysis, and validated the model using 50% of the landslide cells kept apart for testing and validation. The study concludes that BLR performs better in posterior parameter estimation in general and in uncertainty estimation in particular.
Model-wise and point-wise random sample consensus for robust regression and outlier detection.
El-Melegy, Moumen T
2014-11-01
Popular regression techniques often suffer in the presence of data outliers. Most previous efforts to solve this problem have focused on using an estimation algorithm that minimizes a robust M-estimator based error criterion instead of the usual non-robust mean squared error. However, the robustness gained from M-estimators is still low. This paper addresses robust regression and outlier detection in a random sample consensus (RANSAC) framework. It studies the classical RANSAC framework and highlights its model-wise nature for processing the data. Furthermore, it introduces for the first time a point-wise strategy of RANSAC. New estimation algorithms are developed following both the model-wise and point-wise RANSAC concepts. The proposed algorithms' theoretical robustness and breakdown points are investigated in a novel probabilistic setting. While the proposed concepts and algorithms are generic and general enough to adopt many regression machineries, the paper focuses on multilayered feed-forward neural networks in solving regression problems. The algorithms are evaluated on synthetic and real data, contaminated with high degrees of outliers, and compared to existing neural network training algorithms. Furthermore, to improve the time performance, parallel implementations of the two algorithms are developed and assessed to utilize the multiple CPU cores available on today's computers. PMID:25047916
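For orientation, the classical (model-wise) RANSAC loop that the abstract above takes as its starting point can be sketched in a few lines of Python. This is a minimal illustration on a straight-line model, not the paper's neural-network algorithms or its point-wise variant; the function names, the inlier threshold, the iteration count and the synthetic data are all assumptions made for this example.

```python
import random


def fit_line(points):
    """Least-squares line y = a*x + b through the given (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(points, n_iter=200, threshold=0.5, seed=0):
    """Classical RANSAC: repeatedly fit a line to a random minimal sample,
    keep the largest consensus set, then refit on that set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # a vertical pair cannot define y = a*x + b
        a, b = fit_line([p, q])
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    a, b = fit_line(best_inliers)
    return a, b, best_inliers


# Synthetic data: 20 points on y = 2x + 1 plus 5 gross outliers.
clean = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(3.0, 40.0), (7.0, -25.0), (12.0, 90.0), (15.0, -60.0), (18.0, 5.0)]
slope, intercept, inliers = ransac_line(clean + outliers)
print(slope, intercept, len(inliers))
```

An ordinary least-squares fit to all 25 points would be pulled toward the outliers, whereas the consensus step discards them before the final refit.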
2013-01-01
Background Integrase inhibitors (INI) form a new drug class in the treatment of HIV-1 patients. We developed a linear regression modeling approach to make a quantitative raltegravir (RAL) resistance phenotype prediction, as Fold Change in IC50 against a wild type virus, from mutations in the integrase genotype. Methods We developed a clonal genotype-phenotype database with 991 clones from 153 clinical isolates of INI naïve and RAL treated patients, and 28 site-directed mutants. We did the development of the RAL linear regression model in two stages, employing a genetic algorithm (GA) to select integrase mutations by consensus. First, we ran multiple GAs to generate first order linear regression models (GA models) that were stochastically optimized to reach a goal R² accuracy, and consisted of a fixed-length subset of integrase mutations to estimate INI resistance. Secondly, we derived a consensus linear regression model in a forward stepwise regression procedure, considering integrase mutations or mutation pairs by descending prevalence in the GA models. Results The most frequently occurring mutations in the GA models were 92Q, 97A, 143R and 155H (all 100%), 143G (90%), 148H/R (89%), 148K (88%), 151I (81%), 121Y (75%), 143C (72%), and 74M (69%). The RAL second order model contained 30 single mutations and five mutation pairs (p < 0.01): 143C/R&97A, 155H&97A/151I and 74M&151I. The R² performance of this model on the clonal training data was 0.97, and 0.78 on an unseen population genotype-phenotype dataset of 171 clinical isolates from RAL treated and INI naïve patients. Conclusions We describe a systematic approach to derive a model for predicting INI resistance from a limited amount of clonal samples. Our RAL second order model is made available as an Additional file for calculating a resistance phenotype as the sum of integrase mutations and mutation pairs. PMID:23282253
Adaptive multitrack reconstruction for particle trajectories based on fuzzy c-regression models
NASA Astrophysics Data System (ADS)
Niu, Li-Bo; Li, Yu-Lan; Huang, Meng; Fu, Jian-Qiang; He, Bin; Li, Yuan-Jing
2015-03-01
In this paper, an approach to straight and circle track reconstruction is presented, which is suitable for particle trajectories in a homogeneous magnetic field (or 0 T) or Cherenkov rings. The method is based on fuzzy c-regression models, where the number of models stands for the number of tracks. The approximate number of tracks and a rough evaluation of the track parameters given by the Hough transform are used to initialize the fuzzy c-regression models. The technique effectively represents a merger between track candidate finding and parameter fitting. The performance of this approach is tested on simulated data under various scenarios. Results show that this technique is robust and can provide very accurate results efficiently. Supported by National Natural Science Foundation of China (11275109)
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-01-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection, etc.) as the traditional frequentist Logistic Regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to be retrained on the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronous communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. PMID:23562651
Bayesian Generalized Low Rank Regression Models for Neuroimaging Phenotypes and Genetic Markers
Zhu, Hongtu; Khondker, Zakaria; Lu, Zhaohua; Ibrahim, Joseph G.
2014-01-01
We propose a Bayesian generalized low rank regression model (GLRR) for the analysis of both high-dimensional responses and covariates. This development is motivated by performing searches for associations between genetic variants and brain imaging phenotypes. GLRR integrates a low rank matrix to approximate the high-dimensional regression coefficient matrix of GLRR and a dynamic factor model to model the high-dimensional covariance matrix of brain imaging phenotypes. Local hypothesis testing is developed to identify significant covariates on high-dimensional responses. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of GLRR and its comparison with several competing approaches. We apply GLRR to investigate the impact of 1,071 SNPs on top 40 genes reported by AlzGene database on the volumes of 93 regions of interest (ROI) obtained from Alzheimer's Disease Neuroimaging Initiative (ADNI). PMID:25349462
NASA Astrophysics Data System (ADS)
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points (outliers) in the data set causes many problems for the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measurements: the proportion of detected outliers, and the masking and swamping rates.
Oliveira, María; Einbeck, Jochen; Higueras, Manuel; Ainsbury, Elizabeth; Puig, Pedro; Rothkamm, Kai
2016-03-01
Within the field of cytogenetic biodosimetry, Poisson regression is the classical approach for modeling the number of chromosome aberrations as a function of radiation dose. However, it is common to find data that exhibit overdispersion. In practice, the assumption of equidispersion may be violated due to unobserved heterogeneity in the cell population, which will render the variance of observed aberration counts larger than their mean, and/or the frequency of zero counts greater than expected for the Poisson distribution. This phenomenon is observable for both full- and partial-body exposure, but more pronounced for the latter. In this work, different methodologies for analyzing cytogenetic chromosomal aberrations datasets are compared, with special focus on zero-inflated Poisson and zero-inflated negative binomial models. A score test for testing for zero inflation in Poisson regression models under the identity link is also developed. PMID:26461836
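The overdispersion mechanism described in the abstract above can be made concrete with the standard zero-inflated Poisson (ZIP) distribution, in which an extra probability mass is placed at zero. The following Python sketch is only an illustration under assumed parameter values (`lam = 2.0`, `pi = 0.3`); it does not reproduce the paper's dose-response models, identity link or score test.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson: with probability pi the count is a
    structural zero, otherwise it is Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson


def zip_loglik(counts, lam, pi):
    """Log-likelihood of independent counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


# Assumed, purely illustrative parameter values.
lam, pi = 2.0, 0.3
support = range(60)  # the tail beyond 60 is negligible for lam = 2
mean = sum(k * zip_pmf(k, lam, pi) for k in support)
var = sum(k * k * zip_pmf(k, lam, pi) for k in support) - mean ** 2
dispersion = var / mean
print(mean, var, dispersion)
```

The numerical moments agree with the closed forms E[Y] = (1 - pi) * lam and Var[Y] = (1 - pi) * lam * (1 + pi * lam), so the variance exceeds the mean whenever pi > 0, which is the departure from equidispersion that the plain Poisson model cannot capture.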
Bentamapimod (JNK Inhibitor AS602801) Induces Regression of Endometriotic Lesions in Animal Models.
Palmer, Stephen S; Altan, Melis; Denis, Deborah; Tos, Enrico Gillio; Gotteland, Jean-Pierre; Osteen, Kevin G; Bruner-Tran, Kaylon L; Nataraja, Selvaraj G
2016-01-01
Endometriosis is an estrogen (ER)-dependent gynecological disease caused by the growth of endometrial tissue at extrauterine sites. Current endocrine therapies address the estrogenic aspect of disease and offer some relief from pain but are associated with significant side effects. Immune dysfunction is also widely believed to be an underlying contributor to the pathogenesis of this disease. This study evaluated an inhibitor of c-Jun N-terminal kinase, bentamapimod (AS602801), which interrupts immune pathways, in 2 rodent endometriosis models. Treatment of nude mice bearing xenografts biopsied from women with endometriosis (BWE) with 30 mg/kg AS602801 caused 29% regression of lesion. Medroxyprogesterone acetate (MPA) or progesterone (PR) alone did not cause regression of BWE lesions, but combining 10 mg/kg AS602801 with MPA caused 38% lesion regression. In human endometrial organ cultures (from healthy women), treatment with AS602801 or MPA reduced matrix metalloproteinase-3 (MMP-3) release into culture medium. In organ cultures established with BWE, PR or MPA failed to inhibit MMP-3 secretion, whereas AS602801 alone or MPA + AS602801 suppressed MMP-3 production. In an autologous rat endometriosis model, AS602801 caused 48% regression of lesions compared to GnRH antagonist Antide (84%). AS602801 reduced inflammatory cytokines in endometriotic lesions, while levels of cytokines in ipsilateral horns were unaffected. Furthermore, AS602801 enhanced natural killer cell activity, without apparent negative effects on uterus. These results indicate that bentamapimod induced regression of endometriotic lesions in endometriosis rodent animal models without suppressing ER action. c-Jun N-terminal kinase inhibition mediated a comprehensive reduction in cytokine secretion and moreover was able to overcome PR resistance. PMID:26335175
Stiglic, Gregor; Povalej Brzan, Petra; Fijacko, Nino; Wang, Fei; Delibasic, Boris; Kalousis, Alexandros; Obradovic, Zoran
2015-01-01
Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755–0.771) to 0.769 (95% CI: 0.761–0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression. PMID:26645087
ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.
Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won
2016-07-01
In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). The machine learning methods for multicategory classification were applied to a rat model in acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage in our previous study. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolaemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of support vector regression and MLR models with relative values by predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide possibility of the future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values. PMID:26825636
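The idea of classifying indirectly, by first predicting blood loss in percent and then mapping it to a shock class, can be sketched as follows. The class boundaries used here (15%, 30% and 40% of blood volume) are the commonly cited ATLS cut-offs and are an assumption of this sketch, since the abstract does not state them; the handling of values exactly on a boundary, the function name and the example predictions are likewise hypothetical.

```python
def atls_class(blood_loss_percent):
    """Map a predicted blood loss (% of blood volume) to an ATLS class 1-4.

    Assumed cut-offs: class I below 15%, class II from 15% to below 30%,
    class III from 30% to 40%, class IV above 40%.
    """
    if blood_loss_percent < 15:
        return 1
    if blood_loss_percent < 30:
        return 2
    if blood_loss_percent <= 40:
        return 3
    return 4


# Hypothetical regression outputs (predicted blood loss in percent).
predictions = [8.2, 22.5, 36.0, 47.3]
classes = [atls_class(p) for p in predictions]
print(classes)
```

With this design the regression model only has to solve a single continuous prediction problem, and the multicategory decision reduces to fixed thresholds.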
Predictive Regression Models of Monthly Seismic Energy Emissions Induced by Longwall Mining
NASA Astrophysics Data System (ADS)
Jakubowski, Jacek; Tajduś, Antoni
2014-10-01
This article presents the development and validation of predictive regression models of longwall mining-induced seismicity, based on observations in 63 longwalls, in 12 seams, in the Bielszowice colliery in the Upper Silesian Coal Basin, which took place between 1992 and 2012. The predicted variable is the logarithm of the monthly sum of seismic energy induced in a longwall area. The set of predictors includes seven quantitative and qualitative variables describing some mining and geological conditions and earlier seismicity in longwalls. Two machine learning methods have been used to develop the models: boosted regression trees and neural networks. Two types of model validation have been applied: on a random validation sample and on a time-based validation sample. The set of a few selected variables enabled nonlinear regression models to be built which gave relatively small prediction errors, taking the complex and strongly stochastic nature of the phenomenon into account. The article presents both the models of periodic forecasting for the following month as well as long-term forecasting.
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM10) levels. These levels are often higher over urban areas, making them one of the main air pollution concerns. This is the case in the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7 years of historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative (log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity) and qualitative (working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation). Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms selected following Sawa's Bayesian Information Criterion (INT-BIC). Each type of model is calculated using two different periods: the training dataset (6 years) and the testing dataset (1 year). The results show that the INT-BIC-based model (R² = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the-art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 for seasonal means) with respect to shorter periods. PMID:23247520
Nonlinear regression modeling of nutrient loads in streams: A Bayesian approach
Qian, S.S.; Reckhow, K.H.; Zhai, J.; McMahon, G.
2005-01-01
A Bayesian nonlinear regression modeling method is introduced and compared with the least squares method for modeling nutrient loads in stream networks. The objective of the study is to better model spatial correlation in river basin hydrology and land use for improving the model as a forecasting tool. The Bayesian modeling approach is introduced in three steps, each with a more complicated model and data error structure. The approach is illustrated using a data set from three large river basins in eastern North Carolina. Results indicate that the Bayesian model better accounts for model and data uncertainties than does the conventional least squares approach. Applications of the Bayesian models for ambient water quality standards compliance and TMDL assessment are discussed. Copyright 2005 by the American Geophysical Union.
Revisiting Gaussian Process Regression Modeling for Localization in Wireless Sensor Networks.
Richter, Philipp; Toledano-Ayala, Manuel
2015-01-01
Signal strength-based positioning in wireless sensor networks is a key technology for seamless, ubiquitous localization, especially in areas where Global Navigation Satellite System (GNSS) signals propagate poorly. To enable wireless local area network (WLAN) location fingerprinting in larger areas while maintaining accuracy, methods to reduce the effort of radio map creation must be consolidated and automatized. Gaussian process regression has been applied to overcome this issue, also with auspicious results, but the fit of the model was never thoroughly assessed. Instead, most studies trained a readily available model, relying on the zero mean and squared exponential covariance function, without further scrutinization. This paper studies the Gaussian process regression model selection for WLAN fingerprinting in indoor and outdoor environments. We train several models for indoor/outdoor- and combined areas; we evaluate them quantitatively and compare them by means of adequate model measures, hence assessing the fit of these models directly. To illuminate the quality of the model fit, the residuals of the proposed model are investigated, as well. Comparative experiments on the positioning performance verify and conclude the model selection. In this way, we show that the standard model is not the most appropriate, discuss alternatives and present our best candidate. PMID:26370996
NASA Astrophysics Data System (ADS)
Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza
2014-10-01
The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of
Regression models for near-infrared measurement of subcutaneous adipose tissue thickness.
Wang, Yu; Hao, Dongmei; Shi, Jingbin; Yang, Zeqiang; Jin, Liu; Zhang, Song; Yang, Yimin; Bin, Guangyu; Zeng, Yanjun; Zheng, Dingchang
2016-07-01
Obesity is often associated with the risks of diabetes and cardiovascular disease, and there is a need to measure subcutaneous adipose tissue (SAT) thickness for acquiring the distribution of body fat. The present study aimed to develop and evaluate different model-based methods for SAT thickness measurement using an SATmeter developed in our laboratory. Near-infrared signals backscattered from the body surfaces from 40 subjects at 20 body sites each were recorded. Linear regression (LR) and support vector regression (SVR) models were established to predict SAT thickness on different body sites. The measurement accuracy was evaluated by ultrasound, and compared with results from a mechanical skinfold caliper (MSC) and a body composition balance monitor (BCBM). The results showed that both LR- and SVR-based measurement produced better accuracy than MSC and BCBM. It was also concluded that by using regression models specifically designed for certain parts of human body, higher measurement accuracy could be achieved than using a general model for the whole body. Our results demonstrated that the SATmeter is a feasible method, which can be applied at home and in the community due to its portability and convenience. PMID:27243599
Lee, Soo Min; Lee, Jae-Won
2014-11-01
In this study, the optimal conditions for biomass torrefaction were determined by comparing the gain of energy content to the weight loss of biomass from the final products. Torrefaction experiments were performed at temperatures ranging from 220 to 280°C using 20-80 min reaction times. Polynomial regression models ranging from the 1st to the 3rd order were used to determine a relationship between the severity factor (SF) and calorific value or weight loss. The intersection of two regression models for calorific value and weight loss was determined and assumed to be the optimized SF. The optimized SFs on each biomass ranged from 6.056 to 6.372. Optimized torrefaction conditions were determined at various reaction times of 15, 30, and 60 min. The average optimized temperature was 248.55°C in the studied biomass when torrefaction was performed for 60 min. PMID:25266685
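The intersection step described in this abstract can be sketched as follows. This is only a minimal illustration, assuming first-order fits and two responses already scaled to a common unit; the helper names `fit_line` and `intersection` and all data values are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def intersection(line_a: Tuple[float, float], line_b: Tuple[float, float]) -> float:
    """x-coordinate where two fitted lines (intercept, slope) cross."""
    (a0, a1), (b0, b1) = line_a, line_b
    if a1 == b1:
        raise ValueError("parallel lines do not intersect")
    return (b0 - a0) / (a1 - b1)


# Hypothetical severity factors with made-up, normalized responses.
sf = [5.0, 5.5, 6.0, 6.5, 7.0]
energy_gain = [0.10, 0.20, 0.30, 0.40, 0.50]  # rises slowly with SF
weight_loss = [0.00, 0.15, 0.30, 0.45, 0.60]  # rises faster with SF

optimized_sf = intersection(fit_line(sf, energy_gain), fit_line(sf, weight_loss))
print(round(optimized_sf, 3))
```

For second- or third-order fits, the crossing point would instead be a root of the difference of the two polynomials, found numerically within the studied SF range.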
Lee, Myung Hee; Liu, Yufeng
2013-12-01
The continuum regression technique provides an appealing regression framework connecting ordinary least squares, partial least squares and principal component regression in one family. It offers some insight on the underlying regression model for a given application. Moreover, it helps to provide deep understanding of various regression techniques. Despite the useful framework, however, the current development on continuum regression is only for linear regression. In many applications, nonlinear regression is necessary. The extension of continuum regression from linear models to nonlinear models using kernel learning is considered. The proposed kernel continuum regression technique is quite general and can handle very flexible regression model estimation. An efficient algorithm is developed for fast implementation. Numerical examples have demonstrated the usefulness of the proposed technique. PMID:24058224
Determination of airplane model structure from flight data using splines and stepwise regression
NASA Technical Reports Server (NTRS)
Klein, V.; Batterson, J. G.
1983-01-01
A procedure for the determination of airplane model structure from flight data is presented. The model is based on a polynomial spline representation of the aerodynamic coefficients, and the procedure is implemented by use of a stepwise regression. First, a form of the aerodynamic force and moment coefficients amenable to the utilization of splines is developed. Next, expressions for the splines in one and two variables are introduced. Then the steps in the determination of an aerodynamic model structure and the estimation of parameters are discussed briefly. The focus is on the application to flight data of the techniques developed.
Ratios as a size adjustment in morphometrics.
Albrecht, G H; Gelvin, B R; Hartman, S E
1993-08-01
Simple ratios in which a measurement variable is divided by a size variable are commonly used but known to be inadequate for eliminating size correlations from morphometric data. Deficiencies in the simple ratio can be alleviated by incorporating regression coefficients describing the bivariate relationship between the measurement and size variables. Recommendations have included: 1) subtracting the regression intercept to force the bivariate relationship through the origin (intercept-adjusted ratios); 2) exponentiating either the measurement or the size variable using an allometry coefficient to achieve linearity (allometrically adjusted ratios); or 3) both subtracting the intercept and exponentiating (fully adjusted ratios). These three strategies for deriving size-adjusted ratios imply different data models for describing the bivariate relationship between the measurement and size variables (i.e., the linear, simple allometric, and full allometric models, respectively). Algebraic rearrangement of the equation associated with each data model leads to a correctly formulated adjusted ratio whose expected value is constant (i.e., size correlation is eliminated). Alternatively, simple algebra can be used to derive an expected value function for assessing whether any proposed ratio formula is effective in eliminating size correlations. Some published ratio adjustments were incorrectly formulated as indicated by expected values that remain a function of size after ratio transformation. Regression coefficients incorporated into adjusted ratios must be estimated using least-squares regression of the measurement variable on the size variable. Use of parameters estimated by any other regression technique (e.g., major axis or reduced major axis) results in residual correlations between size and the adjusted measurement variable. Correctly formulated adjusted ratios, whose parameters are estimated by least-squares methods, do control for size correlations. The size-adjusted
Development and comparison of regression models for the uptake of metals into various field crops.
Novotná, Markéta; Mikeš, Ondřej; Komprdová, Klára
2015-12-01
Field crops represent one of the highest contributions to dietary metal exposure. The aim of this study was to develop specific regression models for the uptake of metals into various field crops and to compare the usability of other available models. We analysed samples of potato, hop, maize, barley, wheat, rape seed, and grass from 66 agricultural sites. The influence of measured soil concentrations and soil factors (pH, organic carbon, content of silt and clay) on the plant concentrations of Cd, Cr, Cu, Mo, Ni, Pb and Zn was evaluated. Bioconcentration factors (BCF) and plant-specific metal models (PSMM) developed from multivariate regressions were calculated. The explained variability of the models was from 19 to 64% and correlations between measured and predicted concentrations were between 0.43 and 0.90. The developed hop and rapeseed models are new in this field. Available models from literature showed inaccurate results, except for Cd; the modelling efficiency was mostly around zero. The use of interaction terms between parameters can significantly improve plant-specific models. PMID:26448504
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform. PMID:26835234
Nieto, P J García; Antón, J C Álvarez; Vilán, J A Vilán; García-Gonzalo, E
2015-05-01
The aim of this research work is to build a regression model of air quality by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (northern Spain) at a local scale. To accomplish the objective of this study, the experimental data set made up of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), and dust (PM10) was collected over 3 years (2006-2008). The US National Ambient Air Quality Standards (NAAQS) establish the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main ideas of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of these numerical calculations, using the MARS technique, the conclusions of this research work are presented. PMID:25414030
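MARS models are built from hinge functions of the form max(0, x - t) and max(0, t - x), which is what makes the fitted models piecewise linear and easy to read. The sketch below only evaluates such a model; the knot, the coefficients, and the names `hinge` and `mars_like_predict` are hypothetical and are not taken from the study, and the forward/backward term selection that MARS performs is not shown.

```python
from typing import List, Tuple


def hinge(x: float, knot: float, direction: int = 1) -> float:
    """Hinge basis: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_like_predict(x: float, intercept: float,
                      terms: List[Tuple[float, float, int]]) -> float:
    """Evaluate intercept + sum of coef * hinge(x, knot, direction)."""
    return intercept + sum(coef * hinge(x, knot, direction)
                           for coef, knot, direction in terms)


# Hypothetical one-variable model with a single knot at x = 40.
terms = [(0.5, 40.0, +1), (-0.2, 40.0, -1)]
predictions = [mars_like_predict(x, 10.0, terms) for x in (20.0, 40.0, 60.0)]
print(predictions)
```

Each term contributes only on one side of its knot, so the contribution of an input variable can be read directly from the coefficients of the hinges that involve it.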
Oil and gas pipeline construction cost analysis and developing regression models for cost estimation
NASA Astrophysics Data System (ADS)
Thaduri, Ravi Kiran
In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, Labor, ROW and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In a pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The Compressor stations are analyzed based on the capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor station, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor station for various capacities and locations.
Shi, Yuan; Lau, Kevin Ka-Lun; Ng, Edward
2016-08-01
Monitoring street-level particulates is essential to air quality management but challenging in high-density Hong Kong due to limitations in the local monitoring network and the complexities of the street environment. By employing vehicle-based mobile measurements, land use regression (LUR) models were developed to estimate the spatial variation of PM2.5 and PM10 in the downtown area of Hong Kong. Sampling runs were conducted along routes measuring a total of 30 km during a selected measurement period totalling 14 days. In total, 321 independent variables were examined to develop LUR models by using stepwise regression with PM2.5 and PM10 as dependent variables. Increases of approximately 10% in the model adjusted R² were achieved by integrating urban/building morphology as independent variables into the LUR models. The resultant LUR models show that the most decisive factors for street-level air quality in Hong Kong are frontal area index, an urban/building morphological parameter, and road network line density and traffic volume, two parameters of road traffic. The adjusted R² values of the final LUR models of PM2.5 and PM10 are 0.633 and 0.707, respectively. These results indicate that urban morphology is more decisive for street-level air quality in high-density cities than in other cities. Air pollution hotspots were also identified based on the LUR mapping. PMID:27381187
A marginalized zero-inflated Poisson regression model with overall exposure effects.
Long, D Leann; Preisser, John S; Herring, Amy H; Golin, Carol E
2014-12-20
The zero-inflated Poisson (ZIP) regression model is often employed in public health research to examine the relationships between exposures of interest and a count outcome exhibiting many zeros, in excess of the amount expected under sampling from a Poisson distribution. The regression coefficients of the ZIP model have latent class interpretations, which correspond to a susceptible subpopulation at risk for the condition with counts generated from a Poisson distribution and a non-susceptible subpopulation that provides the extra or excess zeros. The ZIP model parameters, however, are not well suited for inference targeted at marginal means, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. We develop a marginalized ZIP model approach for independent responses to model the population mean count directly, allowing straightforward inference for overall exposure effects and empirical robust variance estimation for overall log-incidence density ratios. Through simulation studies, the performance of maximum likelihood estimation of the marginalized ZIP model is assessed and compared with other methods of estimating overall exposure effects. The marginalized ZIP model is applied to a recent study of a motivational interviewing-based safer sex counseling intervention, designed to reduce unprotected sexual act counts. PMID:25220537
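The gap between the latent-class parameters and the overall mean can be made concrete with a short sketch. The symbols `pi` (probability of an excess zero) and `mu` (Poisson mean of the susceptible class) are notation assumed here, not taken from the abstract. Under the standard ZIP mixture the overall mean is (1 - pi) * mu, which is the quantity a marginalized model targets directly.

```python
import math


def zip_pmf(k: int, pi: float, mu: float) -> float:
    """P(Y = k) for a zero-inflated Poisson with excess-zero probability pi."""
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    excess_zero = pi if k == 0 else 0.0
    return excess_zero + (1.0 - pi) * poisson


def marginal_mean(pi: float, mu: float) -> float:
    """Overall (mixture population) mean count of the ZIP distribution."""
    return (1.0 - pi) * mu


pi, mu = 0.3, 2.5  # illustrative values only
total_probability = sum(zip_pmf(k, pi, mu) for k in range(100))
numeric_mean = sum(k * zip_pmf(k, pi, mu) for k in range(100))
print(round(numeric_mean, 6), marginal_mean(pi, mu))
```

Because a covariate can move both `pi` and `mu`, its effect on the overall mean is not given by either set of latent-class coefficients alone. That is the motivation for placing the regression on the marginal mean.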
NASA Astrophysics Data System (ADS)
Montanari, A.
2006-12-01
This contribution introduces a statistically based approach for uncertainty assessment in hydrological modeling, in an optimality context. Indeed, in several real world applications, there is the need for the user to select a model that is deemed to be the best possible choice according to a given goodness-of-fit criterion. In this case, it is extremely important to assess the model uncertainty, intended as the range around the model output within which the measured hydrological variable is expected to fall with a given probability. This indication allows the user to quantify the risk associated with a decision that is based on the model response. The technique proposed here is carried out by inferring the probability distribution of the hydrological model error through a nonlinear multiple regression approach, depending on an arbitrary number of selected conditioning variables. These may include the current and previous model output as well as internal state variables of the model. The purpose is to indirectly relate the model error to the sources of uncertainty, through the conditioning variables. The method can be applied to any model of arbitrary complexity, including distributed approaches. The probability distribution of the model error is derived in the Gaussian space, through a meta-Gaussian approach. The normal quantile transform is applied in order to make the marginal probability distribution of the model error and the conditioning variables Gaussian. Then the above marginal probability distributions are related through the multivariate Gaussian distribution, whose parameters are estimated via multiple regression. Application of the inverse of the normal quantile transform allows the user to derive the confidence limits of the model output for an assigned significance level. The proposed technique is valid under statistical assumptions that are essentially those conditioning the validity of the multiple regression in the Gaussian space. Statistical tests
Notes on power of normality tests of error terms in regression models
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary for inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
NASA Astrophysics Data System (ADS)
Rosli, A. D.; Hashim, H.; Khairuzzaman, N. A.; Mohd Sampian, A. F.; Baharudin, R.; Abdullah, N. E.; Sulaiman, M. S.; Kamaru'zzaman, M.
2015-11-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clones 2002, 2008, 2014 and 3001. Conventionally, rubber tree clone identification is based on observation of tree features such as shape of leaf, trunk, branching habit and pattern of seed texture. This conventional method requires experts and is very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to identify different clones from latex samples. Hence, with the hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via a Capacitance Comparison Bridge (known as capacitance sensor) to measure the output voltage of different latex samples. The proposed sensor is initially tested with 30 ml of latex sample prior to the gradual addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using a Simple Linear Regression (SLR) model. The outcome of this work indicates that latex from clone 2002 produced the highest and most reliable linear regression line, with a determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate when diluted with a higher volume of water.
Liu, Fengchen; Porco, Travis C.; Amza, Abdou; Kadri, Boubacar; Nassirou, Baido; West, Sheila K.; Bailey, Robin L.; Keenan, Jeremy D.; Solomon, Anthony W.; Emerson, Paul M.; Gambhir, Manoj; Lietman, Thomas M.
2015-01-01
Background: Trachoma programs rely on guidelines made in large part using expert opinion of what will happen with and without intervention. Large community-randomized trials offer an opportunity to actually compare forecasting methods in a masked fashion. Methods: The Program for the Rapid Elimination of Trachoma trials estimated longitudinal prevalence of ocular chlamydial infection from 24 communities treated annually with mass azithromycin. Given antibiotic coverage and biannual assessments from baseline through 30 months, forecasts of the prevalence of infection in each of the 24 communities at 36 months were made by three methods: the sum of 15 experts’ opinion, statistical regression of the square-root-transformed prevalence, and a stochastic hidden Markov model of infection transmission (Susceptible-Infectious-Susceptible, or SIS model). All forecasters were masked to the 36-month results and to the other forecasts. Forecasts of the 24 communities were scored by the likelihood of the observed results and compared using Wilcoxon’s signed-rank statistic. Findings: Regression and SIS hidden Markov models had significantly better likelihood than community expert opinion (p = 0.004 and p = 0.01, respectively). All forecasts scored better when perturbed to decrease Fisher’s information. Each individual expert’s forecast was poorer than the sum of experts. Interpretation: Regression and SIS models performed significantly better than expert opinion, although all forecasts were overly confident. Further model refinements may score better, although they would need to be tested and compared in new masked studies. Construction of guidelines that rely on forecasting future prevalence could consider use of mathematical and statistical models. PMID:26302380
An hourly regression model for ultrafine particles in a near-highway urban area
Patton, Allison P.; Collins, Caitlin; Naumova, Elena N.; Zamore, Wig; Brugge, Doug; Durant, John L.
2015-01-01
Estimating ultrafine particle number concentrations (PNC) near highways for exposure assessment in chronic health studies requires models capable of capturing PNC spatial and temporal variations over the course of a full year. The objectives of this work were to describe the relationship between near-highway PNC and potential predictors, and to build and validate hourly log-linear regression models. PNC was measured near Interstate 93 (I-93) in Somerville, MA (USA) using a mobile monitoring platform driven for 234 hours on 43 days between August 2009 and September 2010. Compared to urban background, PNC levels were consistently elevated within 100–200 m of I-93, with gradients impacted by meteorological and traffic conditions. Temporal and spatial variables including wind speed and direction, temperature, highway traffic, and distance to I-93 and major roads contributed significantly to the full regression model. Cross-validated model R2 values ranged from 0.38–0.47, with higher values achieved (0.43–0.53) when short-duration PNC spikes were removed. The model predicts highest PNC near major roads and on cold days with low wind speeds. The model allows estimation of hourly ambient PNC at 20-m resolution in a near-highway neighborhood. PMID:24559198
Lim, Changwon
2015-03-30
Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. PMID:25490981
Design Sensitivity for a Subsonic Aircraft Predicted by Neural Network and Regression Models
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Patnaik, Surya N.
2005-01-01
A preliminary methodology was obtained for the design optimization of a subsonic aircraft by coupling NASA Langley Research Center's Flight Optimization System (FLOPS) with NASA Glenn Research Center's design optimization testbed (COMETBOARDS with regression and neural network analysis approximators). The aircraft modeled can carry 200 passengers at a cruise speed of Mach 0.85 over a range of 2500 n mi and can operate on standard 6000-ft takeoff and landing runways. The design simulation was extended to evaluate the optimal airframe and engine parameters for the subsonic aircraft to operate on nonstandard runways. Regression and neural network approximators were used to examine aircraft operation on runways ranging in length from 4500 to 7500 ft.
Regression Models for the Analysis of Longitudinal Gaussian Data from Multiple Sources
O’Brien, Liam M.; Fitzmaurice, Garrett M.
2006-01-01
We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable and on the same scale. This type of data generally produces a relatively large number of observations per subject; thus estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which parsimonious models for the covariance can be obtained for longitudinal multiple source data. The methods are illustrated with an example of multiple informant data arising from a longitudinal interventional trial in psychiatry. PMID:15726666
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy. PMID:26687087
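The validation metric used here, the area under the ROC curve, can be computed without plotting the curve. It equals the probability that a randomly chosen positive location receives a higher predicted score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition; the scores and labels are invented for illustration and are unrelated to the study's spring data.

```python
from typing import List


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted potentials (1 = spring present, 0 = absent).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
example_auc = auc(scores, labels)
print(round(example_auc, 4))
```

The pairwise form is quadratic in the number of points, which is acceptable for a validation set of a few hundred locations; a rank-based computation gives the same value more efficiently for larger sets.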
Comparing Spatial and Multilevel Regression Models for Binary Outcomes in Neighborhood Studies
Xu, Hongwei
2013-01-01
The standard multilevel regressions that are widely used in neighborhood research typically ignore potential between-neighborhood correlations due to underlying spatial processes, and hence produce inappropriate inferences about neighborhood effects. In contrast, spatial models make estimations and predictions across areas by explicitly modeling the spatial correlations among observations in different locations. A better understanding of the strengths and limitations of spatial models as compared to the standard multilevel model is needed to improve the research on neighborhood and spatial effects. This research systematically compares model estimations and predictions for binary outcomes between (distance- and lattice-based) spatial and the standard multilevel models in the presence of both within- and between-neighborhood correlations, through simulations. Results from simulation analysis reveal that the standard multilevel and spatial models produce similar estimates of fixed effects, but different estimates of random effects variances. Both the standard multilevel and pure spatial models tend to overestimate the corresponding random effects variances, compared to hybrid models when both non-spatial within neighborhood and spatial between-neighborhood effects exist. Spatial models also outperform the standard multilevel model by a narrow margin in case of fully out-of-sample predictions. Distance-based spatial models provide extra spatial information and have stronger predictive power than lattice-based models under certain circumstances. These merits of spatial modeling are exhibited in an empirical analysis of the child mortality data from 1880 Newark, New Jersey. PMID:25284905
Covariate-Adjusted Linear Mixed Effects Model with an Application to Longitudinal Data
Nguyen, Danh V.; Şentürk, Damla; Carroll, Raymond J.
2009-01-01
Linear mixed effects (LME) models are useful for longitudinal data/repeated measurements. We propose a new class of covariate-adjusted LME models for longitudinal data that nonparametrically adjusts for a normalizing covariate. The proposed approach involves fitting a parametric LME model to the data after adjusting for the nonparametric effects of a baseline confounding covariate. In particular, the effect of the observable covariate on the response and predictors of the LME model is modeled nonparametrically via smooth unknown functions. In addition to covariate-adjusted estimation of fixed/population parameters and random effects, an estimation procedure for the variance components is also developed. Numerical properties of the proposed estimators are investigated with simulation studies. The consistency and convergence rates of the proposed estimators are also established. An application to a longitudinal data set on calcium absorption, accounting for baseline distortion from body mass index, illustrates the proposed methodology. PMID:19266053
Sesana, R C; Bignardi, A B; Borquis, R R A; El Faro, L; Baldi, F; Albuquerque, L G; Tonhati, H
2010-10-01
The objective of this work was to estimate covariance functions for additive genetic and permanent environmental effects and, subsequently, to obtain genetic parameters for buffalo's test-day milk production using random regression models on Legendre polynomials (LPs). A total of 17 935 test-day milk yield (TDMY) records from 1433 first lactations of Murrah buffaloes, calving from 1985 to 2005 and belonging to 12 herds located in São Paulo state, Brazil, were analysed. Contemporary groups (CGs) were defined by herd, year and month of milk test. Residual variances were modelled through variance functions, from second to fourth order, and also by a step function with 1, 4, 6, 22 and 42 classes. The analysis model included the fixed effects of CGs, number of milkings, age of cow at calving as a covariable (linear and quadratic) and the mean trend of the population. The additive genetic and permanent environmental effects were included as random effects and were modelled by LPs of days in milk, from quadratic to seventh-degree polynomial functions. The model with additive genetic and animal permanent environmental effects adjusted by quintic and sixth order LP, respectively, and residual variance modelled through a step function with six classes was the most adequate model to describe the covariance structure of the data. Heritability estimates decreased from 0.44 (first week) to 0.18 (fourth week). Unexpected negative genetic correlation estimates were obtained between TDMY records at first weeks and records from the middle to the end of lactation, with values varying from -0.07 (second with eighth week) to -0.34 (1st with 42nd week). TDMY heritability estimates were moderate in the course of the lactation, suggesting that this trait could be applied as a selection criterion in milking buffaloes. PMID:20831561
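Random regression on Legendre polynomials requires mapping days in milk onto the interval [-1, 1] and evaluating the polynomials there. The sketch below does this with Bonnet's recurrence. The days-in-milk range of 5 to 305 and the function names are assumptions made for illustration; the unnormalized polynomials are used here, whereas applications often rescale them.

```python
from typing import List


def standardize_dim(dim: float, dim_min: float, dim_max: float) -> float:
    """Map days in milk from [dim_min, dim_max] onto [-1, 1]."""
    return -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)


def legendre_basis(x: float, degree: int) -> List[float]:
    """Return [P_0(x), ..., P_degree(x)] using Bonnet's recurrence:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)."""
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for n in range(1, degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


# Hypothetical lactation window of 5 to 305 days in milk.
x_mid = standardize_dim(155.0, 5.0, 305.0)
basis_at_half = legendre_basis(0.5, 3)
print(x_mid, basis_at_half)
```

Each row of the random regression design matrix is then the basis evaluated at the standardized test day, with one column per polynomial order.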
Harlim, John; Mahdi, Adam; Majda, Andrew J.
2014-01-15
A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models was developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data. PMID:17610294
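The final testing step can be sketched with one common form of the adaptive Neyman statistic, computed from standardized coefficients in the transformed domain. The function name and the assumption that the inputs are already standardized to unit variance under the null hypothesis are illustrative; the paper's exact implementation may differ.

```python
import numpy as np


def adaptive_neyman(z):
    """Adaptive Neyman statistic: max over m of sum_{j<=m}(z_j^2 - 1) / sqrt(2m).

    z is assumed to hold standardized coefficients (for example Fourier
    coefficients of a regression effect), ordered from low to high frequency.
    """
    z = np.asarray(z, dtype=float)
    m = np.arange(1, z.size + 1)
    # Centered and scaled partial sums; a signal concentrated in the first
    # few frequencies produces a large value at a small m.
    partial = np.cumsum(z ** 2 - 1.0) / np.sqrt(2.0 * m)
    return partial.max()


# A signal concentrated in the first coefficient.
stat = adaptive_neyman([3.0, 0.0, 0.0])
```

Taking the maximum over m is what makes the test adaptive: the data choose how many leading coefficients to pool instead of the analyst fixing a truncation point in advance.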
Selecting Spatial Scale of Covariates in Regression Models of Environmental Exposures
Grant, Lauren P.; Gennings, Chris; Wheeler, David C.
2015-01-01
Environmental factors or socioeconomic status variables used in regression models to explain environmental chemical exposures or health outcomes are often in practice modeled at the same buffer distance or spatial scale. In this paper, we present four model selection algorithms that select the best spatial scale for each buffer-based or area-level covariate. Contamination of drinking water by nitrate is a growing problem in agricultural areas of the United States, as ingested nitrate can lead to the endogenous formation of N-nitroso compounds, which are potent carcinogens. We applied our methods to model nitrate levels in private wells in Iowa. We found that environmental variables were selected at different spatial scales and that a model allowing spatial scale to vary across covariates provided the best goodness of fit. Our methods can be applied to investigate the association between environmental risk factors available at multiple spatial scales or buffer distances and measures of disease, including cancers. PMID:25983543
Forecasting peak asthma admissions in London: an application of quantile regression models
NASA Astrophysics Data System (ADS)
Soyiri, Ireneous N.; Reidpath, Daniel D.; Sarran, Christophe
2013-07-01
Asthma is a chronic condition of great public health concern globally. The associated morbidity, mortality and healthcare utilisation place an enormous burden on healthcare infrastructure and services. This study demonstrates a multistage quantile regression approach to predicting excess demand for health care services in the form of asthma daily admissions in London, using retrospective data from the Hospital Episode Statistics, weather and air quality. Trivariate quantile regression models (QRM) of asthma daily admissions were fitted to a 14-day range of lags of environmental factors, accounting for seasonality in a hold-in sample of the data. Representative lags were pooled to form multivariate predictive models, selected through a systematic backward stepwise reduction approach. Models were cross-validated using a hold-out sample of the data, and their respective root mean square error measures, sensitivity, specificity and predictive values compared. Two of the predictive models were able to detect extreme number of daily asthma admissions at sensitivity levels of 76 % and 62 %, as well as specificities of 66 % and 76 %. Their positive predictive values were slightly higher for the hold-out sample (29 % and 28 %) than for the hold-in model development sample (16 % and 18 %). QRMs can be used in multistage to select suitable variables to forecast extreme asthma events. The associations between asthma and environmental factors, including temperature, ozone and carbon monoxide can be exploited in predicting future events using QRMs.
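The validation measures quoted above (sensitivity, specificity and positive predictive value) can be computed from flags marking which days were truly extreme and which days the model predicted as extreme. The sketch below is a generic illustration; the function name and the toy flags are assumptions, not data from the study.

```python
def detection_metrics(actual, predicted):
    """Return (sensitivity, specificity, positive predictive value).

    actual and predicted are equal-length sequences of booleans, one per day,
    marking an extreme number of admissions. The function assumes at least
    one day in each of the relevant categories, so no denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    sensitivity = tp / (tp + fn)   # share of extreme days that were flagged
    specificity = tn / (tn + fp)   # share of ordinary days left unflagged
    ppv = tp / (tp + fp)           # share of flagged days that were extreme
    return sensitivity, specificity, ppv


# Five hypothetical days: two truly extreme, two flagged by the model.
sens, spec, ppv = detection_metrics(
    [True, True, False, False, False],
    [True, False, True, False, False],
)
```

Because extreme days are rare by construction, the positive predictive value can stay low even when sensitivity and specificity look reasonable, which is consistent with the figures reported in the abstract.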
Grid Binary LOgistic REgression (GLORE): building shared models without sharing data
Jiang, Xiaoqian; Kim, Jihoon; Ohno-Machado, Lucila
2012-01-01
Objective: The classification of complex or rare patterns in clinical and genomic data requires the availability of a large, labeled patient set. While methods that operate on large, centralized data sources have been extensively used, little attention has been paid to understanding whether models such as binary logistic regression (LR) can be developed in a distributed manner, allowing researchers to share models without necessarily sharing patient data. Material and methods: Instead of bringing data to a central repository for computation, we bring computation to the data. The Grid Binary LOgistic REgression (GLORE) model integrates decomposable partial elements or non-privacy sensitive prediction values to obtain model coefficients, the variance-covariance matrix, the goodness-of-fit test statistic, and the area under the receiver operating characteristic (ROC) curve. Results: We conducted experiments on both simulated and clinically relevant data, and compared the computational costs of GLORE with those of a traditional LR model estimated using the combined data. We showed that our results are the same as those of LR to a 10^−15 precision. In addition, GLORE is computationally efficient. Limitation: In GLORE, the calculation of coefficient gradients must be synchronized at different sites, which involves some effort to ensure the integrity of communication. Ensuring that the predictors have the same format and meaning across the data sets is necessary. Conclusion: The results suggest that GLORE performs as well as LR and allows data to remain protected at their original sites. PMID:22511014
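The idea of combining decomposable partial elements can be illustrated with Newton-Raphson logistic regression, where each site returns only its gradient and Hessian contributions and a coordinator sums them. This is a minimal sketch under that assumption; the function names and simulated data are hypothetical and do not reproduce the GLORE software.

```python
import numpy as np


def site_summaries(X, y, beta):
    """Computed locally at one site; only these aggregates leave the site."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    gradient = X.T @ (y - p)
    hessian = (X * (p * (1.0 - p))[:, None]).T @ X
    return gradient, hessian


def fit_distributed(sites, n_features, iterations=25):
    """Newton-Raphson driven by summed per-site gradients and Hessians."""
    beta = np.zeros(n_features)
    for _ in range(iterations):
        grad = np.zeros(n_features)
        hess = np.zeros((n_features, n_features))
        for X, y in sites:  # every site must report in each iteration
            g, h = site_summaries(X, y, beta)
            grad += g
            hess += h
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Simulated data: fit once on the pooled data and once split over two sites.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
true_beta = np.array([-0.5, 1.0, -0.8])
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

pooled = fit_distributed([(X, y)], n_features=3)
split = fit_distributed([(X[:120], y[:120]), (X[120:], y[120:])], n_features=3)
```

Because the log-likelihood is a sum over patients, its gradient and Hessian are sums over sites, so `split` agrees with `pooled` up to floating-point rounding. The inner loop also shows why the sites must be synchronized at every iteration.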
Prediction of Wind Speeds Based on Digital Elevation Models Using Boosted Regression Trees
NASA Astrophysics Data System (ADS)
Fischer, P.; Etienne, C.; Tian, J.; Krauß, T.
2015-12-01
In this paper a new approach is presented to predict maximum wind speeds using Gradient Boosted Regression Trees (GBRT). GBRT are a non-parametric regression technique used in various applications, suitable for making predictions without in-depth a priori knowledge of the functional dependencies between the predictors and the response variables. Our aim is to predict maximum wind speeds based on predictors which are derived from a digital elevation model (DEM). The predictors describe the orography of the Area-of-Interest (AoI) by various means, such as first and second order derivatives of the DEM, but also more sophisticated classifications describing the exposure and sheltering of the terrain with respect to wind flux. In order to take into account the different scales which probably influence the streams and turbulence of wind flow over complex terrain, the predictors are computed at different spatial resolutions ranging from 30 m up to 2000 m. The geographic area used for examination of the approach is Switzerland, a mountainous region in the heart of Europe, dominated by the Alps, but also covering large valleys. The full workflow is described in this paper, which consists of data preparation using image processing techniques, model training using a state-of-the-art machine learning algorithm, in-depth analysis of the trained model, validation of the model and application of the model to generate a wind speed map.
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research. PMID:23156922
Inhibition and regression of tumors in hamster DMBA model following laser microvascular targeting
NASA Astrophysics Data System (ADS)
McMillan, Kathleen; Wang, Zhi; Shapshay, Stanley M.
1998-07-01
Vascular targeting is a recent approach to cancer therapy that aims at damaging tumor vasculature to induce tumor cell hypoxia and subsequent cell death. Squamous cell cancer arises in the superficial mucosal and cutaneous epithelial layers, and tumor microvasculature therefore may be particularly well suited for targeting by selective photothermolysis. An initial evaluation of the effect of selective eradication of microvasculature on tumor development was undertaken here using the chemically-induced hamster cheek pouch model and a 585 nm pulsed dye laser. In a first group of 6 hamsters, progression of premalignant mucosal lesions was compared between control and laser treatment groups, and laser-induced regression of established tumors was evaluated. In a second group of 12 hamsters, the number of laser treatments required to produce complete regression of tumors of the buccal mucosa was determined. The effect of the laser on tumors appearing on the skin in these animals was also investigated. These experiments showed that laser treatment inhibited tumor development and caused complete regression of established tumors 10 mm^3 or smaller. Photothermal microvascular targeting may be useful in treating dysplasia and early tumors of the upper aerodigestive tract and skin, with fewer adverse sequelae than existing modalities.
You, Jinhong; Zhou, Haibo
2009-01-01
We consider statistical inference on a regression model in which some covariables are measured with errors together with an auxiliary variable. The proposed estimation for the regression coefficients is based on some estimating equations. This new method alleviates some drawbacks of previously proposed estimations. These include the requirement of undersmoothing the regressor functions over the auxiliary variable and the restriction on other covariables which can be observed exactly, among others. The large sample properties of the proposed estimator are established. We further propose a jackknife estimation, which consists of deleting one estimating equation (instead of one observation) at a time. We show that the jackknife estimator of the regression coefficients and the estimating equations based estimator are asymptotically equivalent. Simulations show that the jackknife estimator has smaller biases when sample size is small or moderate. In addition, the jackknife estimation can also provide a consistent estimator of the asymptotic covariance matrix, which is robust to heteroscedasticity. We illustrate these methods by applying them to a real data set from marketing science. PMID:22199460
Creating a non-linear total sediment load formula using polynomial best subset regression model
NASA Astrophysics Data System (ADS)
Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali
2016-08-01
The aim of this study is to derive a new total sediment load formula which is more accurate and has fewer application constraints than the well-known formulae in the literature. The five best-known stream power concept sediment formulae approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach, called polynomial best subset regression (PBSR) analysis. The aim of the PBSR analysis is to fit and test all possible combinations of the input variables and to select the best subset. All the input variables, together with their second and third powers, are included in the regression to test the possible relation between the explanatory variables and the dependent variable. The best subset is selected with a multistep approach that depends on significance values and on the degree of multicollinearity of the inputs. The new formula is compared to the others on a holdout dataset, and detailed performance investigations are conducted for the field and lab datasets within this holdout data. Different goodness-of-fit statistics are used, as they represent different perspectives on model accuracy. After detailed comparisons, we identified the most accurate equation that is also applicable to both flume and river data. On the field dataset in particular, the proposed formula outperformed the benchmark formulations in prediction performance.
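The subset search described above can be sketched as an exhaustive loop over combinations of the inputs and their second and third powers. Adjusted R^2 is used here as a simple stand-in selection criterion; the study itself selects on significance values and multicollinearity, which this sketch does not reproduce. All names and the synthetic data are assumptions.

```python
import itertools

import numpy as np


def expand_powers(X, max_power=3):
    """Stack each input column with its powers up to max_power."""
    cols, names = [], []
    for j in range(X.shape[1]):
        for p in range(1, max_power + 1):
            cols.append(X[:, j] ** p)
            names.append(f"x{j}^{p}")
    return np.column_stack(cols), names


def best_subset(X, y, max_terms=2):
    """Fit every subset of up to max_terms terms; keep the best adjusted R^2."""
    Z, names = expand_powers(X)
    n = len(y)
    tss = np.sum((y - y.mean()) ** 2)
    best_score, best_terms = -np.inf, ()
    for k in range(1, max_terms + 1):
        for subset in itertools.combinations(range(Z.shape[1]), k):
            A = np.column_stack([np.ones(n), Z[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ coef) ** 2)
            adj_r2 = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))
            if adj_r2 > best_score:
                best_score = adj_r2
                best_terms = tuple(names[i] for i in subset)
    return best_score, best_terms


# Synthetic check: the response depends on x0 squared and x1 only.
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 3.0, size=(60, 2))
y = 2.0 + 3.0 * X[:, 0] ** 2 - X[:, 1]
score, terms = best_subset(X, y, max_terms=2)
```

The number of candidate models grows combinatorially with the number of inputs and powers, which is one reason a practical procedure would add screening steps such as the multicollinearity checks mentioned in the abstract.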
A New Climate Adjustment Tool: An update to EPA’s Storm Water Management Model
The US EPA’s newest tool, the Stormwater Management Model (SWMM) – Climate Adjustment Tool (CAT) is meant to help municipal stormwater utilities better address potential climate change impacts affecting their operations.
Kondric, Miran; Trajkovski, Biljana; Strbad, Maja; Foretić, Nikola; Zenić, Natasa
2013-12-01
There is an evident lack of studies investigating morphological influence on physical fitness (PF) among preschool children. The aim of this study was to (1) calculate and interpret linear and nonlinear relationships between simple anthropometric predictors and PF criteria among preschoolers of both genders, and (2) find critical values of the anthropometric predictors which should be recognized as the breakpoint of the negative influence on PF. The sample of subjects consisted of 413 preschoolers aged 4 to 6 (mean age, 5.08 years; 176 girls and 237 boys) from Rijeka, Croatia. The anthropometric variables included body height (BH), body weight (BW), sum of triceps and subscapular skinfold (SUMSF), and calculated BMI (BMI = BW (kg)/BH (m)^2). PF was screened through testing of flexibility, repetitive strength, explosive strength, and agility. Linear and nonlinear (general quadratic model y = a + bx + cx^2) regressions were calculated and interpreted simultaneously. BH and BW are far better predictors of physical fitness status than BMI and SUMSF. In all calculated regressions excluding the flexibility criterion, linear and nonlinear prediction of PF from BH and BW reached statistical significance, indicating an influence of the advancement in maturity status on PF variables. Differences between linear and nonlinear regressions are smaller in males than in females. There are some indications that the age of 4 to 6 years is a critical period in the prevention of obesity, mostly because the extensively studied and proven negative influence of overweight and adiposity on PF tests is not yet evident. In some cases we found evident regression breakpoints (approximately 25 kg in boys), which should be interpreted as critical values of the anthropometric measures for the studied sample of subjects. PMID:24611341
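One simple way to read a breakpoint off the general quadratic model y = a + bx + cx^2 is to take the turning point of the fitted parabola, x* = -b / (2c). The sketch below fits the model with NumPy and returns that point. Treating the vertex as the breakpoint is an assumption for illustration, and the data are synthetic, not the study's measurements.

```python
import numpy as np


def quadratic_turning_point(x, y):
    """Fit y = a + b*x + c*x^2 by least squares; return (a, b, c, x_star)."""
    # np.polyfit returns coefficients from the highest power down.
    c, b, a = np.polyfit(x, y, 2)
    x_star = -b / (2.0 * c)  # where the fitted slope b + 2*c*x is zero
    return a, b, c, x_star


# Synthetic body weights (kg) and a fitness score that peaks at 25 kg.
weight = np.arange(15.0, 36.0)
score = 10.0 + 2.0 * weight - 0.04 * weight ** 2
a, b, c, breakpoint = quadratic_turning_point(weight, score)
```

A turning point is only meaningful when it falls inside the observed range of the predictor and the quadratic term is statistically significant; otherwise the linear fit is the more defensible summary.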
NASA Astrophysics Data System (ADS)
Deglint, Jason; Kazemzadeh, Farnoud; Wong, Alexander; Clausi, David A.
2015-09-01
One method to acquire multispectral images is to sequentially capture a series of images where each image contains information from a different bandwidth of light. Another method is to use a series of beamsplitters and dichroic filters to guide different bandwidths of light onto different cameras. However, these methods are very time consuming and expensive and perform poorly in dynamic scenes or when observing transient phenomena. An alternative strategy to capturing multispectral data is to infer this data using sparse spectral reflectance measurements captured using an imaging device with overlapping bandpass filters, such as a consumer digital camera using a Bayer filter pattern. Currently the only method of inferring dense reflectance spectra is the Wiener adaptive filter, which makes Gaussian assumptions about the data. However, these assumptions may not always hold true for all data. We propose a new technique to infer dense reflectance spectra from sparse spectral measurements through the use of a non-linear regression model. The non-linear regression model used in this technique is the random forest model, which is an ensemble of decision trees and trained via the spectral characterization of the optical imaging system and spectral data pair generation. This model is then evaluated by spectrally characterizing different patches on the Macbeth color chart, as well as by reconstructing inferred multispectral images. Results show that the proposed technique can produce inferred dense reflectance spectra that correlate well with the true dense reflectance spectra, which illustrates the merits of the technique.
Comparing tests appear in model-check for normal regression with spatially correlated observations
NASA Astrophysics Data System (ADS)
Somayasa, Wayan; Wibawa, Gusti A.
2016-06-01
The problem of investigating the appropriateness of an assumed model in regression analysis was traditionally handled by means of the F test under independent observations. In this work we propose a more modern method based on the so-called set-indexed partial sums processes of the least squares residuals of the observations. We consider throughout this work univariate and multivariate regression models with spatially correlated observations, which are frequently encountered in statistical modelling in the geosciences as well as in mining. The decision is drawn by performing an asymptotic test of the statistical hypothesis based on the Kolmogorov-Smirnov (KS) and Cramér-von Mises functionals of the processes. We compare the two tests by investigating their power functions. The finite sample size behavior of the tests is studied by simulating the empirical probability of rejection of H0. It is shown that for the univariate model the KS test seems to be more powerful. Conversely, the Cramér-von Mises test tends to be more powerful than the KS test in the multivariate case.
NASA Astrophysics Data System (ADS)
Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.
2016-02-01
Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8-coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine the selection of regression model inputs by removing nonessential information.
NASA Astrophysics Data System (ADS)
Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.
2016-02-01
The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can influence the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01. The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB), resulting in a total of 96 observations for each variable. The data underwent a logarithmic transformation, particularly the Dependent Variable (y), to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can predict Real GDP (y). The regression analysis gives a coefficient of determination of 98.7%, meaning the Independent Variables account for 98.7% of the variation in the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
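Estimation through the normal equations in matrix form, b = (X'X)^(-1) X'y, can be sketched as follows. The abstract reports using MATLAB; this Python version is only an illustration. The simulated predictors and coefficients are hypothetical and stand in for the quarterly series, which are not reproduced here.

```python
import numpy as np


def fit_normal_equations(X, y):
    """Solve (A'A) b = A'y, where A is X with an intercept column prepended.

    Returns the coefficient vector (intercept first) and R^2.
    """
    A = np.column_stack([np.ones(len(y)), X])
    # Solving the linear system avoids forming the explicit matrix inverse.
    beta = np.linalg.solve(A.T @ A, A.T @ y)
    residuals = y - A @ beta
    r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return beta, r_squared


# 96 simulated quarterly observations of four predictors (x1..x4).
rng = np.random.default_rng(2)
X = rng.normal(size=(96, 4))
true_beta = np.array([1.0, 0.5, 0.0, 0.3, -0.2])  # intercept, x1..x4
log_y = true_beta[0] + X @ true_beta[1:]           # log-transformed response

beta_hat, r2 = fit_normal_equations(X, log_y)
```

With noise-free simulated data the fit recovers the coefficients exactly. On real data the same code applies, but a QR- or SVD-based solver such as `np.linalg.lstsq` is numerically safer when the predictors are strongly correlated.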
NASA Astrophysics Data System (ADS)
Jokar Arsanjani, Jamal; Helbich, Marco; Kainz, Wolfgang; Darvishi Boloorani, Ali
2013-04-01
This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-economic variables dealing with urban sprawl were operationalised to create a probability surface of spatiotemporal states of built-up land use for the years 2006, 2016, and 2026. For validation, the model was evaluated by means of relative operating characteristic values for different sets of variables. The approach was calibrated for 2006 by cross comparing of actual and simulated land use maps. The achieved outcomes represent a match of 89% between simulated and actual maps of 2006, which was satisfactory to approve the calibration process. Thereafter, the calibrated hybrid approach was implemented for forthcoming years. Finally, future land use maps for 2016 and 2026 were predicted by means of this hybrid approach. The simulated maps illustrate a new wave of suburban development in the vicinity of Tehran at the western border of the metropolis during the next decades.
Wang, Bing; Shen, Hao; Fang, Aiqin; Huang, De-Shuang; Jiang, Changjun; Zhang, Jun; Chen, Peng
2016-06-17
Comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC/TOF-MS) has become a key analytical technology in high-throughput analysis. The retention index has proven helpful for compound identification in one-dimensional gas chromatography, which is also true for two-dimensional gas chromatography. In this work, a novel regression model was proposed for calculating the second dimension retention index of target components, with n-alkanes used as reference compounds. This model was developed to depict the relationship among adjusted second dimension retention time, temperature of the second dimension column and carbon number of n-alkanes by an exponential nonlinear function with only five parameters. Three different criteria were introduced to find the optimal values of the parameters. The performance of this model was evaluated using experimental data of n-alkanes (C7-C31) at 24 temperatures, which cover the whole 0-6 s adjusted retention time area. The experimental results show that the mean relative error between predicted adjusted retention time and experimental data of n-alkanes was only 2%. Furthermore, our proposed model demonstrates a good extrapolation capability for predicting the adjusted retention time of target compounds located outside the range of the reference compounds in the second dimension adjusted retention time space. Our work shows the deviation was less than 9 retention index units (iu) while the number of alkanes were added up to 5. The performance of our proposed model has also been demonstrated by analyzing a mixture of compounds in temperature programmed experiments. PMID:27208985
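For orientation, the classical isothermal (Kovats) retention index interpolates a target compound between two bracketing n-alkanes on a logarithmic adjusted-retention-time scale. The sketch below implements that textbook formula, not the five-parameter model proposed in the paper; the function name and the reference times are hypothetical.

```python
import bisect
import math


def kovats_index(t_target, carbons, t_refs):
    """Classical isothermal retention index from adjusted retention times.

    carbons: carbon numbers of the reference n-alkanes, ascending.
    t_refs:  their adjusted retention times, ascending, same length.
    """
    # Find the pair of reference alkanes that brackets the target.
    i = bisect.bisect_right(t_refs, t_target) - 1
    # Outside the reference range, fall back to the nearest pair
    # (log-linear extrapolation).
    i = max(0, min(i, len(t_refs) - 2))
    n_low, n_high = carbons[i], carbons[i + 1]
    frac = (math.log(t_target) - math.log(t_refs[i])) / (
        math.log(t_refs[i + 1]) - math.log(t_refs[i])
    )
    return 100.0 * (n_low + frac * (n_high - n_low))


# Hypothetical adjusted retention times (s) for C7, C8 and C9.
carbons = [7, 8, 9]
t_refs = [1.0, 2.0, 4.0]
index_on_alkane = kovats_index(2.0, carbons, t_refs)
index_between = kovats_index(2.0 * math.sqrt(2.0), carbons, t_refs)
```

By construction an n-alkane with n carbons receives an index of 100n. The extrapolation branch is the weak point of this simple formula, and it is the case the paper's regression model is reported to handle with deviations below 9 index units.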
Stone, Wesley W.; Gilliom, Robert J.
2012-01-01
Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, are improved for application to the United States (U.S.) Corn Belt region by developing region-specific models that include watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. The WARP-CB models accounted for 53 to 62% of the variability in the various concentration statistics among the model-development sites. Model predictions were within a factor of 5 of the observed concentration statistic for over 90% of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. Although atrazine-use intensity is the most important explanatory variable in the National WARP models, it is not a significant variable in the WARP-CB models. The WARP-CB models provide improved predictions for Corn Belt streams draining watersheds with atrazine-use intensities of 17 kg/km^2 of watershed area or greater.
Estimating riparian understory vegetation cover with beta regression and copula models
Eskelson, Bianca N.I.; Madsen, Lisa; Hagar, Joan C.; Temesgen, Hailemariam
2011-01-01
Understory vegetation communities are critical components of forest ecosystems. As a result, the importance of modeling understory vegetation characteristics in forested landscapes has become more apparent. Abundance measures such as shrub cover are bounded between 0 and 1, exhibit heteroscedastic error variance, and are often subject to spatial dependence. These distributional features tend to be ignored when shrub cover data are analyzed. The beta distribution has been used successfully to describe the frequency distribution of vegetation cover. Beta regression models ignoring spatial dependence (BR) and accounting for spatial dependence (BRdep) were used to estimate percent shrub cover as a function of topographic conditions and overstory vegetation structure in riparian zones in western Oregon. The BR models showed poor explanatory power (pseudo-R^2 ≤ 0.34) but outperformed ordinary least-squares (OLS) and generalized least-squares (GLS) regression models with logit-transformed response in terms of mean square prediction error and absolute bias. We introduce a copula (COP) model that is based on the beta distribution and accounts for spatial dependence. A simulation study was designed to illustrate the effects of incorrectly assuming normality, equal variance, and spatial independence. It showed that BR, BRdep, and COP models provide unbiased parameter estimates, whereas OLS and GLS models result in slightly biased estimates for two of the three parameters. On the basis of the simulation study, 93–97% of the GLS, BRdep, and COP confidence intervals covered the true parameters, whereas OLS and BR only resulted in 84–88% coverage, which demonstrated the superiority of GLS, BRdep, and COP over OLS and BR models in providing standard errors for the parameter estimates in the presence of spatial dependence.
Model-Based Evaluation of Spontaneous Tumor Regression in Pilocytic Astrocytoma.
Buder, Thomas; Deutsch, Andreas; Klink, Barbara; Voss-Böhme, Anja
2015-12-01
Pilocytic astrocytoma (PA) is the most common brain tumor in children. This tumor is usually benign and has a good prognosis. Total resection is the treatment of choice and will cure the majority of patients. However, often only partial resection is possible due to the location of the tumor. In that case, spontaneous regression, regrowth, or progression to a more aggressive form have been observed. The dependency between the residual tumor size and spontaneous regression is not understood yet. Therefore, the prognosis is largely unpredictable and there is controversy regarding the management of patients for whom complete resection cannot be achieved. Strategies span from pure observation (wait and see) to combinations of surgery, adjuvant chemotherapy, and radiotherapy. Here, we introduce a mathematical model to investigate the growth and progression behavior of PA. In particular, we propose a Markov chain model incorporating cell proliferation and death as well as mutations. Our model analysis shows that the tumor behavior after partial resection is essentially determined by a risk coefficient γ, which can be deduced from epidemiological data about PA. Our results quantitatively predict the regression probability of a partially resected benign PA given the residual tumor size and lead to the hypothesis that this dependency is linear, implying that removing any amount of tumor mass will improve prognosis. This finding stands in contrast to diffuse malignant glioma where an extent of resection threshold has been experimentally shown, below which no benefit for survival is expected. These results have important implications for future therapeutic studies in PA that should include residual tumor volume as a prognostic factor. PMID:26658166
Turkson, Anthony Joe; Otchey, James Eric
2015-01-01
Introduction: Various psychosocial studies on health-related lifestyles emphasize that perceiving oneself as being at risk of HIV/AIDS infection is a necessary condition for adopting preventive behaviors. Hierarchical multiple regression models were used to examine the relationship between eight independent variables and one dependent variable to isolate predictors that have a significant influence on behavior and sexual practices. Methods: A cross-sectional design was used for the study. A structured, close-ended, interviewer-administered questionnaire was used to collect primary data. A multistage stratified technique was used to sample views from 380 students from Takoradi Polytechnic, Ghana. A hierarchical multiple regression model was used to ascertain the significance of certain predictors of sexual behavior and practices. Results: The variables extracted from the multiple regression were: the constant (β=14.202, t=2.279, p=0.023; significant); marital status (β=0.092, t=1.996, p<0.05; significant); knowledge of AIDS (β=0.090, t=1.996, p<0.05; significant); and attitude towards HIV/AIDS (β=0.486, t=10.575, p<0.001; highly significant). Thus, the best-fitting model for predicting behavior and sexual practices was a linear combination of the constant, marital status, knowledge of HIV/AIDS, and attitude towards HIV/AIDS: Y (Behavior and sexual practices) = β0 + β1 (Marital status) + β2 (Knowledge of HIV/AIDS issues) + β3 (Attitude towards HIV/AIDS issues), where β0, β1, β2 and β3 are respectively 14.201, 2.038, 0.148 and 0.486; the higher the better. Conclusions: Attitude and behavior change education on HIV/AIDS should be intensified in the institution so that students can adopt better lifestyles. PMID:25946917
Regression Models for Aquifer Vulnerability to Nitrate Pollution in Osona (NE Spain)
NASA Astrophysics Data System (ADS)
Boy Roura, M.; Nolan, B. T.; Menció Domingo, A.; Mas-Pla, J.
2012-12-01
Regression models were developed at a local scale in the Osona region (1,260 square kilometers) to predict nitrate concentrations in groundwater. Osona is a semi-arid region in northeast Spain, where livestock and agricultural activities are very intensive, and therefore, it is vulnerable to nitrate pollution from agricultural sources (European Nitrate Directive (91/676/EEC)). Nitrate concentrations in groundwater are commonly above 50 mg/L as nitrate, reaching up to 500 mg/L in some of the sampled wells. Regression models were based on explanatory variables such as geology, land use, and nitrogen inputs, which control the fate, transport and attenuation of nitrate in groundwater. Regression has been widely used to determine aquifer vulnerability to nitrate in groundwater at large spatial scales. We developed models with and without site-specific groundwater chemistry data to see the extent to which the latter improved the models. Although chemistry data could explain additional variation in groundwater nitrate concentration, such data were available only at the well locations and therefore were less amenable for spatial extrapolation. The data set consisted of nitrate data from 63 sampled wells and the following explanatory variables: 1) soils data consisting of texture and other physical properties; 2) geology indicating presence or absence of aquifers in the region, and their type (unconfined, leaky or confined); 3) land use (agricultural, urban, forested); 4) nitrogen input as manure; 5) occurrence of irrigated crops; 6) estimates of nitrogen uptake developed for 10 different crops; 7) slope; 8) population density, and 9) groundwater chemistry data comprising major ions and trace elements. Variables 1 and 2 were compiled as point data because their polygons were much larger than the well buffers which represented contributing areas to the sampled wells. Variables 3 to 8 were compiled within a 500-meter radius buffer around wells using a GIS-based weighted
Stock price forecasting using secondary self-regression model and wavelet neural networks
NASA Astrophysics Data System (ADS)
Yang, Chi-I.; Wang, Kai-Cheng; Chang, Kuei-Fang
2015-07-01
We have established a DWT-based secondary self-regression model (AR(2)) to forecast stock value. This method requires the user to decide upon the trend of the stock prices. We later used WNN to forecast stock prices which does not require the user to decide upon the trend. When comparing these two methods, we could see that AR(2) does not perform as well if there are no trends for the stock prices. On the other hand, WNN would not be influenced by the presence of trends.
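A second-order autoregression of the kind named above can be sketched as follows. This is a plain least-squares AR(2) fit, x_t = c + a1·x_{t−1} + a2·x_{t−2}; it omits the wavelet (DWT) preprocessing and the trend decision described in the abstract, and all function names are illustrative assumptions rather than the authors' implementation.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_ar2(series):
    """Least-squares estimates (c, a1, a2) from the normal equations."""
    rows = [[1.0, series[t - 1], series[t - 2]] for t in range(2, len(series))]
    targets = series[2:]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve(XtX, Xty)

def forecast_next(series, coef):
    """One-step-ahead forecast from the last two observations."""
    c, a1, a2 = coef
    return c + a1 * series[-1] + a2 * series[-2]
```

A typical use would be `coef = fit_ar2(prices)` followed by `forecast_next(prices, coef)`, where `prices` is a hypothetical list of closing values.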
Yang, Aileen; Hoek, Gerard; Montagne, Denise; Leseman, Daan L A C; Hellack, Bryan; Kuhlbusch, Thomas A J; Cassee, Flemming R; Brunekreef, Bert; Janssen, Nicole A H
2015-07-01
Oxidative potential (OP) of ambient particulate matter (PM) has been suggested as a health-relevant exposure metric. In order to use OP for exposure assessment, information is needed about how well central site OP measurements and modeled average OP at the home address reflect temporal and spatial variation of personal OP. We collected 96-hour personal, home outdoor and indoor PM2.5 samples from 15 volunteers living either at traffic, urban or regional background locations in Utrecht, the Netherlands. OP was also measured at one central reference site to account for temporal variations. OP was assessed using electron spin resonance (OP(ESR)) and dithiothreitol (OP(DTT)). Spatial variation of average OP at the home address was modeled using land use regression (LUR) models. For both OP(ESR) and OP(DTT), temporal correlations of central site measurements with home outdoor measurements were high (R>0.75), and moderate to high (R=0.49-0.70) with personal measurements. The LUR model predictions for OP correlated significantly with the home outdoor concentrations for OP(DTT) and OP(ESR) (R=0.65 and 0.62, respectively). LUR model predictions were moderately correlated with personal OP(DTT) measurements (R=0.50). Adjustment for indoor sources, such as vacuum cleaning and absence of fume-hood, improved the temporal and spatial agreement with measured personal exposure for OP(ESR). OP(DTT) was not associated with any indoor sources. Our study results support the use of central site OP for exposure assessment of epidemiological studies focusing on short-term health effects. PMID:25942578
Gaussian functional regression for output prediction: Model assimilation and experimental design
NASA Astrophysics Data System (ADS)
Nguyen, N. C.; Peraire, J.
2016-03-01
In this paper, we introduce a Gaussian functional regression (GFR) technique that integrates multi-fidelity models with model reduction to efficiently predict the input-output relationship of a high-fidelity model. The GFR method combines the high-fidelity model with a low-fidelity model to provide an estimate of the output of the high-fidelity model in the form of a posterior distribution that can characterize uncertainty in the prediction. A reduced basis approximation is constructed upon the low-fidelity model and incorporated into the GFR method to yield an inexpensive posterior distribution of the output estimate. As this posterior distribution depends crucially on a set of training inputs at which the high-fidelity models are simulated, we develop a greedy sampling algorithm to select the training inputs. Our approach results in an output prediction model that inherits the fidelity of the high-fidelity model and has the computational complexity of the reduced basis approximation. Numerical results are presented to demonstrate the proposed approach.
Linear regression model for predicting interactive mixture toxicity of pesticide and ionic liquid.
Qin, Li-Tang; Wu, Jie; Mo, Ling-Yun; Zeng, Hong-Hu; Liang, Yan-Peng
2015-08-01
Most environmental contaminants occur as chemical mixtures rather than as individual chemicals. Most existing mixture models are valid only for non-interactive mixture toxicity. Therefore, we built two simple linear regression-based models, concentration addition (LCA) and independent action (LIA), that aim to predict the combined toxicities of interactive mixtures. The LCA model was built between the negative log-transformation of experimental and expected effect concentrations of concentration addition (CA), while the LIA model was developed between the negative log-transformation of experimental and expected effect concentrations of independent action (IA). Twenty-four mixtures of pesticide and ionic liquid were used to evaluate the predictive abilities of the LCA and LIA models. The models correlated well with the observed responses of the 24 binary mixtures. The values of the coefficient of determination (R²) and the leave-one-out (LOO) cross-validated correlation coefficient (Q²) for the LCA and LIA models are larger than 0.99, which indicates high predictive power. The results showed that the developed LCA and LIA models accurately predict mixture toxicities showing synergism, additive effect, and antagonism. The proposed LCA and LIA models may serve as a useful tool in ecotoxicological assessment. PMID:25929456
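The LCA construction can be sketched in two steps, under stated assumptions. The first function uses the standard concentration addition formula, EC_mix = 1 / Σ(p_i / EC_i), where p_i is the fraction of component i in the mixture. The second regresses −log10 of the observed effect concentrations on −log10 of the CA-expected ones. The choice of base-10 logarithm and all function names are assumptions; the abstract does not give the fitted coefficients.

```python
import math

def ca_expected_ec(fractions, ecs):
    """Concentration addition: expected mixture effect concentration."""
    return 1.0 / sum(p / ec for p, ec in zip(fractions, ecs))

def simple_linreg(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_lca(expected_ec, observed_ec):
    """Regress -log10(observed EC) on -log10(CA-expected EC)."""
    x = [-math.log10(e) for e in expected_ec]
    y = [-math.log10(o) for o in observed_ec]
    return simple_linreg(x, y)
```

A slope near 1 and an intercept near 0 would indicate that plain CA already describes the mixtures; departures from those values are what let the regression absorb synergistic or antagonistic interaction.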
Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Kim, Moon S; Chao, Kuanglin; Qin, Jianwei; Fu, Xiaping; Baek, Insuck; Cho, Byoung-Kwan
2016-05-01
Illegal use of nitrogen-rich melamine (C3H6N6) to boost the perceived protein content of food products such as milk, infant formula, frozen yogurt, pet food, biscuits, and coffee drinks has caused serious food safety problems. Conventional methods to detect melamine in foods, such as enzyme-linked immunosorbent assay (ELISA), high-performance liquid chromatography (HPLC), and gas chromatography-mass spectrometry (GC-MS), are sensitive, but they are time-consuming, expensive, and labor-intensive. In this research, a near-infrared (NIR) hyperspectral imaging technique combined with the regression coefficients of a partial least squares regression (PLSR) model was used to detect melamine particles in milk powders easily and quickly. NIR hyperspectral reflectance imaging data in the spectral range of 990-1700 nm were acquired from melamine-milk powder mixture samples prepared at various concentrations ranging from 0.02% to 1%. PLSR models were developed to correlate the spectral data (independent variables) with melamine concentration (dependent variable) in melamine-milk powder mixture samples. PLSR models applying various pretreatment methods were used to reconstruct the two-dimensional PLS images. PLS images were converted to binary images to detect the suspected melamine pixels in milk powder. As the melamine concentration increased, the number of suspected melamine pixels in the binary images also increased. These results suggest that the NIR hyperspectral imaging technique and the PLSR model can be regarded as an effective tool to detect melamine particles in milk powders. PMID:26946026
Hema, M; Srinivasan, K
2011-07-01
The nickel removal efficiency of powdered activated carbons from coconut oilcake, neem oilcake and commercial carbon was investigated using an artificial neural network (ANN). The effective parameters for the removal of nickel (%R) by the adsorption process, which included the pH, contact time (T), distinctiveness of activated carbon (Cn), amount of activated carbon (Cw) and initial concentration of nickel (Co), were investigated. The Levenberg-Marquardt (LM) back-propagation algorithm was used to train the network. The network topology was optimized by varying the number of hidden layers and the number of neurons in each hidden layer. The model was developed using training, validation and testing subsets containing 60%, 20% and 20% of the total experimental data, respectively. A multiple regression equation was developed for the nickel adsorption system and its output was compared with both simulated and experimental outputs. The standard deviation (SD) with respect to the experimental output was considerably higher for the regression model than for the ANN model. The experimental data were best fitted by the artificial neural network. PMID:23029923
Ayuso, Mercedes; Bermúdez, Lluís; Santolino, Miguel
2016-04-01
The analysis of factors influencing the severity of the personal injuries suffered by victims of motor accidents is an issue of major interest. Yet, most of the extant literature has tended to address this question by focusing on either the severity of temporary disability or the severity of permanent injury. In this paper, a bivariate copula-based regression model for temporary disability and permanent injury severities is introduced for the joint analysis of the relationship with the set of factors that might influence both categories of injury. Using a motor insurance database with 21,361 observations, the copula-based regression model is shown to give a better performance than that of a model based on the assumption of independence. The inclusion of the dependence structure in the analysis has a higher impact on the variance estimates of the injury severities than it does on the point estimates. By taking into account the dependence between temporary and permanent severities a more extensive factor analysis can be conducted. We illustrate that the conditional distribution functions of injury severities may be estimated, thus, providing decision makers with valuable information. PMID:26871615
Deng, Weiping; Chen, Hanfeng; Li, Zhaohai
2006-01-01
Often in genetic research, the presence or absence of a disease is affected not only by the trait locus genotypes but also by some covariates. Finite logistic regression mixture models and associated methods are developed for detection of a binary trait locus (BTL) through an interval-mapping procedure. The maximum-likelihood estimates (MLEs) of the logistic regression parameters are asymptotically unbiased. The null asymptotic distributions of the likelihood-ratio test (LRT) statistics for detection of a BTL are found to be given by the supremum of a χ²-process. The limiting null distributions are free of the null model parameters and are determined explicitly through only four (backcross case) or nine (intercross case) independent standard normal random variables. Therefore a threshold for detecting a BTL in a flanking marker interval can be approximated easily by using a Monte Carlo method. It is pointed out that use of a threshold incorrectly determined by reading off a χ²-probability table can result in an excessive false BTL detection rate, much more severe than many researchers might anticipate. Simulation results show that the BTL detection procedures based on the thresholds determined by the limiting distributions perform quite well when the sample sizes are moderately large. PMID:16272416
Single-step genomic evaluation using multitrait random regression model and test-day data.
Koivula, M; Strandén, I; Pösö, J; Aamand, G P; Mäntysaari, E A
2015-04-01
The objectives of this study were to evaluate the feasibility of use of the test-day (TD) single-step genomic BLUP (ssGBLUP) using phenotypic records of Nordic Red Dairy cows. The critical point in ssGBLUP is how genomically derived relationships (G) are integrated with population-based pedigree relationships (A) into a combined relationship matrix (H). Therefore, we also tested how different weights for genomic and pedigree relationships affect ssGBLUP, validation reliability, and validation regression coefficients. Deregressed proofs for 305-d milk, protein, and fat yields were used for a posteriori validation. The results showed that the use of phenotypic TD records in ssGBLUP is feasible. Moreover, the TD ssGBLUP model gave considerably higher validation reliabilities and validation regression coefficients than the TD model without genomic information. No significant differences were found in validation reliability between the different TD ssGBLUP models according to bootstrap confidence intervals. However, the degree of inflation in genomic enhanced breeding values is affected by the method used in construction of the H matrix. The results showed that ssGBLUP provides a good alternative to the currently used multi-step approach but there is a great need to find the best option to combine pedigree and genomic information in the genomic matrix. PMID:25660739
NASA Astrophysics Data System (ADS)
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, the Taguchi parameter design method and regression analysis are applied to optimize the cutting conditions for surface finish while machining AISI 4340 steel with newly developed yttria-based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through a wet chemical co-precipitation route followed by a powder metallurgy process. Experiments have been carried out based on an L9 orthogonal array with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal-to-noise ratio (SNR), the optimal cutting condition was found to be A3B1C1, i.e. a cutting speed of 420 m/min, a depth of cut of 0.5 mm and a feed rate of 0.12 m/min, under the smaller-the-better criterion. Analysis of Variance (ANOVA) is applied to find the significance and percentage contribution of each parameter. A mathematical model of surface roughness has been developed using regression analysis as a function of the above-mentioned independent variables. The predicted values from the developed model and the experimental values are found to be very close to each other, supporting the adequacy of the model. A confirmation run has been carried out at the 95 % confidence level to verify the optimized result, and the values obtained are within the prescribed limit.
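The smaller-the-better signal-to-noise ratio used in Taguchi analysis is conventionally SNR = −10·log10(mean(y²)), where y are the replicate responses (here, surface roughness) for one run. The sketch below computes that ratio and picks the factor level with the highest mean SNR. The function names and the dictionary layout are illustrative assumptions; no experimental values from the study are reproduced.

```python
import math

def sn_smaller_the_better(values):
    """Taguchi S/N ratio (dB) when a smaller response is better."""
    mean_sq = sum(v * v for v in values) / len(values)
    return -10.0 * math.log10(mean_sq)

def best_level(sn_by_level):
    """Return the factor level whose mean S/N ratio is largest."""
    return max(sn_by_level, key=sn_by_level.get)
```

For each factor (cutting speed, depth of cut, feed rate), the S/N ratios of the runs at a given level are averaged, and `best_level` selects the level to use in the optimal combination.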
Brakebill, J.W.; Preston, S.D.
2003-01-01
The U.S. Geological Survey has developed a methodology for statistically relating nutrient sources and land-surface characteristics to nutrient loads of streams. The methodology is referred to as SPAtially Referenced Regressions On Watershed attributes (SPARROW), and relates measured stream nutrient loads to nutrient sources using nonlinear statistical regression models. A spatially detailed digital hydrologic network of stream reaches, stream-reach characteristics such as mean streamflow, water velocity, reach length, and travel time, and their associated watersheds supports the regression models. This network serves as the primary framework for spatially referencing potential nutrient source information such as atmospheric deposition, septic systems, point-sources, land use, land cover, and agricultural sources and land-surface characteristics such as land use, land cover, average-annual precipitation and temperature, slope, and soil permeability. In the Chesapeake Bay watershed that covers parts of Delaware, Maryland, Pennsylvania, New York, Virginia, West Virginia, and Washington D.C., SPARROW was used to generate models estimating loads of total nitrogen and total phosphorus representing 1987 and 1992 land-surface conditions. The 1987 models used a hydrologic network derived from an enhanced version of the U.S. Environmental Protection Agency's digital River Reach File, and coarse resolution Digital Elevation Models (DEMs). A new hydrologic network was created to support the 1992 models by generating stream reaches representing surface-water pathways defined by flow direction and flow accumulation algorithms from higher resolution DEMs. On a reach-by-reach basis, stream reach characteristics essential to the modeling were transferred to the newly generated pathways or reaches from the enhanced River Reach File used to support the 1987 models. To complete the new network, watersheds for each reach were generated using the direction of surface-water flow derived
Comparison of universal kriging and regression tree modelling for soil property mapping
NASA Astrophysics Data System (ADS)
Kempen, Bas
2013-04-01
Geostatistical modelling approaches have been dominating the field of digital soil mapping (DSM) since its inception in the early 1980s. In recent years, however, machine learning methods such as classification and regression trees, random forests, and neural networks have quickly gained popularity among researchers in the DSM community. The increased use of these methods has largely come at the cost of geostatistical approaches. Despite the apparent shift in the application of DSM methods from geostatistics to machine learning, quantitative comparisons of the prediction performance of these methods are largely lacking. The aims of this research, therefore, are: i) to map two soil properties (topsoil organic matter content and thickness of the peat layer in the soil profile) using regression tree (RT) modelling and universal kriging (UK), and ii) to compare the prediction performance of these methods with independent data obtained by probability sampling. Using such data for validation not only yields statistically valid and unbiased estimates of the map accuracy, but also allows a statistical comparison of the accuracies of the maps generated by the two methods. The topsoil organic matter content and the thickness of the peat layer were mapped for a 14,000 ha area in the province of Drenthe, The Netherlands. The calibration dataset contained soil property observations at 1,715 sites. The covariates used include layers derived from soil and paleogeography maps, land cover, relative elevation, drainage class, land reclamation period, elevation change, and historic land use. The validation dataset contained 125 observations selected by stratified simple random sampling of the study area. The root mean squared error (RMSE) of the soil organic matter map obtained by RT modelling was 0.603 log(%), that of the map obtained by UK 0.595 log(%). The difference in map accuracy was not significant (p = 0.377). The RMSE of the peat thickness map obtained by RT
NASA Astrophysics Data System (ADS)
Liberman, Neomi; Ben-David Kolikant, Yifat; Beeri, Catriel
2012-09-01
Due to a program reform in Israel, experienced CS high-school teachers faced the need to master and teach a new programming paradigm. This situation served as an opportunity to explore the relationship between teachers' content knowledge (CK) and their pedagogical content knowledge (PCK). This article focuses on three case studies, with emphasis on one of them. Using observations and interviews, we examine how the teachers we observed taught and what development of their teaching, if any, occurred as a result of their teaching experience. Our findings suggest that this situation creates a new hybrid state of teachers, which we term "regressed experts." These teachers incorporate in their professional practice some elements typical of novices and some typical of experts. We also found that these teachers' experience, although established when teaching a different CK, serves as leverage to improve their knowledge and understanding of aspects of the new content.
Burger, Divan Aristo; Schall, Robert
2015-01-01
Trials of the early bactericidal activity (EBA) of tuberculosis (TB) treatments assess the decline, during the first few days to weeks of treatment, in colony forming unit (CFU) count of Mycobacterium tuberculosis in the sputum of patients with smear-microscopy-positive pulmonary TB. Profiles over time of CFU data have conventionally been modeled using linear, bilinear, or bi-exponential regression. We propose a new biphasic nonlinear regression model for CFU data that comprises linear and bilinear regression models as special cases and is more flexible than bi-exponential regression models. A Bayesian nonlinear mixed-effects (NLME) regression model is fitted jointly to the data of all patients from a trial, and statistical inference about the mean EBA of TB treatments is based on the Bayesian NLME regression model. The posterior predictive distribution of relevant slope parameters of the Bayesian NLME regression model provides insight into the nature of the EBA of TB treatments; specifically, the posterior predictive distribution allows one to judge whether treatments are associated with monolinear or bilinear decline of log(CFU) count, and whether CFU count initially decreases fast, followed by a slower rate of decrease, or vice versa. PMID:25322214
Modeling of an Adjustable Beam Solid State Light Project
NASA Technical Reports Server (NTRS)
Clark, Toni
2015-01-01
This proposal is for the development of a computational model of a prototype variable beam light source using optical modeling software, Zemax Optics Studio. The variable beam light source would be designed to generate flood, spot, and directional beam patterns, while maintaining the same average power usage. The optical model would demonstrate the possibility of such a light source and its ability to address several issues: commonality of design, human task variability, and light source design process improvements. An adaptive lighting solution that utilizes the same electronics footprint and power constraints while addressing variability of lighting needed for the range of exploration tasks can save costs and allow for the development of common avionics for lighting controls.
Daniels, Bryan C.; Nemenman, Ilya
2015-01-01
The nonlinearity of dynamics in systems biology makes it hard to infer them from experimental data. Simple linear models are computationally efficient, but cannot incorporate these important nonlinearities. An adaptive method based on the S-system formalism, which is a sensible representation of nonlinear mass-action kinetics typically found in cellular dynamics, maintains the efficiency of linear regression. We combine this approach with adaptive model selection to obtain efficient and parsimonious representations of cellular dynamics. The approach is tested by inferring the dynamics of yeast glycolysis from simulated data. With little computing time, it produces dynamical models with high predictive power and with structural complexity adapted to the difficulty of the inference problem. PMID:25806510
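For readers unfamiliar with the S-system formalism, the sketch below evaluates its standard power-law form, dx_i/dt = α_i·Π_j x_j^{g_ij} − β_i·Π_j x_j^{h_ij}. Each production or degradation term is linear in the logarithms of the concentrations, which is what allows linear-regression-style fitting. The function and parameter names are assumptions for illustration; the adaptive model selection described in the abstract is not implemented here.

```python
import math

def s_system_rhs(x, alpha, g, beta, h):
    """Return dx/dt for each species under the S-system power-law form.

    x     : current concentrations (all > 0)
    alpha : production rate constants, one per species
    g     : g[i][j] is the kinetic order of species j in the production of i
    beta  : degradation rate constants, one per species
    h     : h[i][j] is the kinetic order of species j in the degradation of i
    """
    rates = []
    for i in range(len(x)):
        production = alpha[i] * math.prod(xj ** g[i][j] for j, xj in enumerate(x))
        degradation = beta[i] * math.prod(xj ** h[i][j] for j, xj in enumerate(x))
        rates.append(production - degradation)
    return rates
```

Taking logarithms of a single term gives log(production_i) = log α_i + Σ_j g_ij·log x_j, so the exponents play the role of regression coefficients.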
PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes
Liverani, Silvia; Hastie, David I.; Azizi, Lamiae; Papathomas, Michail; Richardson, Sylvia
2016-01-01
PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and continuous response, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection. PMID:27307779
Support of total maximum daily load programs using spatially referenced regression models
McMahon, G.; Alexander, R.B.; Qian, S.
2003-01-01
The spatially referenced regressions on watershed attributes modeling approach, as applied to predictions of total nitrogen flux in three North Carolina river basins, addresses several information needs identified by a National Research Council evaluation of the total maximum daily load program. The model provides reach-level predictions of the probability of exceeding water-quality criteria, and estimates of total nitrogen budgets. Model estimates of point- and diffuse-source contributions and nitrogen loss rates in streams and reservoirs compared moderately well with literature estimates. Maps of reach-level predictions of nutrient inputs and delivery provide an intuitive and spatially detailed summary of the origins and fate of nutrients within a basin.
Using regression heteroscedasticity to model trends in the mean and variance of floods
NASA Astrophysics Data System (ADS)
Hecht, Jory; Vogel, Richard
2015-04-01
Changes in the frequency of extreme floods have been observed and anticipated in many hydrological settings in response to numerous drivers of environmental change, including climate, land cover, and infrastructure. To help decision-makers design flood control infrastructure in settings with non-stationary hydrological regimes, a parsimonious approach for detecting and modeling trends in extreme floods is needed. An approach using ordinary least squares (OLS) to fit a heteroscedastic regression model can accommodate nonstationarity in both the mean and variance of flood series while simultaneously offering a means of (i) analytically evaluating type I and type II trend detection errors, (ii) analytically generating expressions of uncertainty, such as confidence and prediction intervals, (iii) providing updated estimates of the frequency of floods exceeding the flood of record, (iv) accommodating a wide range of non-linear functions through ladder of powers transformations, and (v) communicating hydrological changes in a single graphical image. Previous research has shown that the two-parameter lognormal distribution can adequately model the annual maximum flood distribution of both stationary and non-stationary hydrological regimes in many regions of the United States. A simple logarithmic transformation of annual maximum flood series enables an OLS heteroscedastic regression modeling approach to be especially suitable for creating a non-stationary flood frequency distribution with parameters that are conditional upon time or physically meaningful covariates. While heteroscedasticity is often viewed as an impediment, we document how detecting and modeling heteroscedasticity presents an opportunity for characterizing both the conditional mean and variance of annual maximum floods. We introduce an approach through which variance trend models can be analytically derived from the behavior of residuals of the conditional mean flood model. Through case studies of
Circumplex and Spherical Models for Child School Adjustment and Competence.
ERIC Educational Resources Information Center
Schaefer, Earl S.; Edgerton, Marianna
The goal of this study is to broaden the scope of a conceptual model for child behavior by analyzing constructs relevant to cognition, conation, and affect. Two samples were drawn from school populations. For the first sample, 28 teachers from 8 rural, suburban, and urban schools rated 193 kindergarten children. Each teacher rated up to eight…
A General Linear Model Approach to Adjusting the Cumulative GPA.
ERIC Educational Resources Information Center
Young, John W.
A general linear model (GLM), using least-squares techniques, was used to develop a criterion measure to replace freshman year grade point average (GPA) in college admission predictive validity studies. Problems with the use of GPA include those associated with the combination of grades from different courses and disciplines into a single measure,…
Modeling Group Size and Scalar Stress by Logistic Regression from an Archaeological Perspective
Alberti, Gianmarco
2014-01-01
Johnson’s scalar stress theory, describing the mechanics of (and the remedies to) the increase in in-group conflictuality that parallels the increase in groups’ size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout). Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson’s theory and on Dunbar’s findings on the cognitive constraints on human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered an expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a) confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b) allows calculating the intercept and slope of the logistic regression model, which can be used at any time to estimate the probability that a community experienced a critical level of scalar stress; c) allows locating a critical scalar stress threshold at community size 127 (95% CI: 122–132), while the maximum probability of critical scalar stress is predicted at size 158 (95% CI: 147–170). The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion. PMID:24626241
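As a rough illustration of how an intercept and slope from a logistic regression can be turned into a probability for a given community size, consider the minimal sketch below. The coefficient values are hypothetical placeholders, not the values estimated in the article, and the function names are ours. The 0.5 cutoff is used only as an example and is not necessarily how the article defines its threshold.

```python
import math

def fission_probability(size, intercept, slope):
    """Logistic model: P(critical stress) = 1 / (1 + exp(-(intercept + slope * size)))."""
    z = intercept + slope * size
    return 1.0 / (1.0 + math.exp(-z))

def size_at_probability(p, intercept, slope):
    """Invert the logistic model: community size at which the probability equals p."""
    return (math.log(p / (1.0 - p)) - intercept) / slope

# Hypothetical coefficients chosen for illustration only (NOT from the article).
INTERCEPT = -10.0
SLOPE = 0.08

p_150 = fission_probability(150, INTERCEPT, SLOPE)
size_half = size_at_probability(0.5, INTERCEPT, SLOPE)
print(f"P(critical stress | size=150) = {p_150:.3f}")
print(f"size at which P = 0.5: {size_half:.1f}")
```

With real coefficients, the same two functions would give the probability for any observed community size and the size corresponding to any chosen probability cutoff.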
Development of a charge adjustment model for cardiac catheterization.
Brennan, Andrew; Gauvreau, Kimberlee; Connor, Jean; O'Connell, Cheryl; David, Sthuthi; Almodovar, Melvin; DiNardo, James; Banka, Puja; Mayer, John E; Marshall, Audrey C; Bergersen, Lisa
2015-02-01
A methodology that would allow for comparison of charges across institutions has not been developed for catheterization in congenital heart disease. A single institution catheterization database with prospectively collected case characteristics was linked to hospital charges related and limited to an episode of care in the catheterization laboratory for fiscal years 2008-2010. Catheterization charge categories (CCC) were developed to group types of catheterization procedures using a combination of empiric data and expert consensus. A multivariable model with outcome charges was created using CCC and additional patient and procedural characteristics. In 3 fiscal years, 3,839 cases were available for analysis. Forty catheterization procedure types were categorized into 7 CCC yielding a grouper variable with an R² explanatory value of 72.6%. In the final CCC, the largest proportion of cases was in CCC 2 (34%), which included diagnostic cases without intervention. Biopsy cases were isolated in CCC 1 (12%), and percutaneous pulmonary valve placement alone made up CCC 7 (2%). The final model included CCC, number of interventions, and cardiac diagnosis (R² = 74.2%). Additionally, current financial metrics such as APR-DRG severity of illness and case mix index demonstrated a lack of correlation with CCC. We have developed a catheterization procedure type financial grouper that accounts for the diverse case population encountered in catheterization for congenital heart disease. CCC and our multivariable model could be used to understand financial characteristics of a population at a single point in time, longitudinally, and to compare populations. PMID:25113520
Zhao, Hongya; Logothetis, Christopher J.; Gorlov, Ivan P.; Zeng, Jia; Dai, Jianguo
2013-01-01
Predicting disease progression is one of the most challenging problems in prostate cancer research. Adding gene expression data to prediction models that are based on clinical features has been proposed to improve accuracy. In the current study, we applied a logistic regression (LR) model combining clinical features and gene co-expression data to improve the accuracy of the prediction of prostate cancer progression. The top-scoring pair (TSP) method was used to select genes for the model. The proposed models not only preserved the basic properties of the TSP algorithm but also incorporated the clinical features into the prognostic models. Based on the statistical inference with the iterative cross validation, we demonstrated that prediction LR models that included genes selected by the TSP method provided better predictions of prostate cancer progression than those using clinical variables only and/or those that included genes selected by the one-gene-at-a-time approach. Thus, we conclude that TSP selection is a useful tool for feature (and/or gene) selection to use in prognostic models and our model also provides an alternative for predicting prostate cancer progression. PMID:24367394
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
[Application of Land-use Regression Models in Spatial-temporal Differentiation of Air Pollution].
Wu, Jian-sheng; Xie, Wu-dan; Li, Jia-cheng
2016-02-15
With the rapid development of urbanization, industrialization and motorization, air pollution has become one of the most serious environmental problems in China, with negative impacts on public health and the ecological environment. The LUR model is one of the common methods for simulating the spatial-temporal differentiation of air pollution at city scale. It has been applied widely in Europe and North America, but much less in China. Drawing on studies at home and abroad, this study first describes the main steps in developing an LUR model: obtaining the monitoring data, generating variables, developing models, model validation and regression mapping. It then summarizes the progress of LUR models in the spatial-temporal differentiation of air pollution. Finally, it outlines future research directions, including highlighting spatial-temporal differentiation, increasing the classes of model variables and improving the methods of model development. The paper aims to popularize the application of the LUR model in China and to provide a methodological basis for human exposure, epidemiologic study and health risk assessment. PMID:27363125
Belfatto, Antonella; Riboldi, Marco; Ciardo, Delia; Cattani, Federica; Cecconi, Agnese; Lazzari, Roberta; Jereczek-Fossa, Barbara Alicja; Orecchia, Roberto; Baroni, Guido; Cerveri, Pietro
2016-03-01
This paper describes a patient-specific mathematical model to predict the evolution of uterine cervical tumors at a macroscopic scale, during fractionated external radiotherapy. The model provides estimates of tumor regrowth and dead-cell reabsorption, incorporating the interplay between tumor regression rate and radiosensitivity, as a function of the tumor oxygenation level. Model parameters were estimated by minimizing the difference between predicted and measured tumor volumes, these latter being obtained from a set of 154 serial cone-beam computed tomography scans acquired on 16 patients along the course of the therapy. The model stratified patients according to two different estimated dynamics of dead-cell removal and to the predicted initial value of the tumor oxygenation. The comparison with a simpler model demonstrated an improvement in fitting properties of this approach (fitting error average value <5%, p < 0.01), especially in case of tumor late responses, which can hardly be handled by models entailing a constant radiosensitivity, failing to model changes from initial severe hypoxia to aerobic conditions during the treatment course. The model predictive capabilities suggest the need of clustering patients accounting for cancer cell line, tumor staging, as well as microenvironment conditions (e.g., oxygenation level). PMID:25647734
Regression model for estimating inactivation of microbial aerosols by solar radiation.
Ben-David, Avishai; Sagripanti, Jose-Luis
2013-01-01
The inactivation of pathogenic aerosols by solar radiation is relevant to public health and biodefense. We investigated whether a relatively simple method to calculate solar diffuse and total irradiances could be developed and used in environmental photobiology estimations instead of complex atmospheric radiative transfer computer programs. The second-order regression model that we developed reproduced 13 radiation quantities calculated for equinoxes and solstices at 35° latitude with a computer-intensive and rather complex atmospheric radiative transfer program (MODTRAN) with a mean error <6% (2% for most radiation quantities). Extending the application of the regression model from a reference latitude and date (chosen as 35° latitude for 21 March) to different latitudes and days of the year was accomplished with variable success: usually with a mean error <15% (but as high as 150% for some combinations of latitudes and days of year). The accuracy of the methodology proposed here compares favorably to photobiological experiments where the microbial survival is usually measured with an accuracy no better than ±0.5 log10 units. The approach and equations presented in this study should assist in estimating the maximum time during which microbial pathogens remain infectious after accidental or intentional aerosolization in open environments. PMID:23445252
NASA Astrophysics Data System (ADS)
Somayasa, Wayan
2016-02-01
We derive a functional central limit theorem for heteroscedastic spatial regressions by applying the generalized version of Prohorov's theorem. By our technique we obtain the limit process, which is expressed as a function of a centered set-indexed Gaussian process including the standard set-indexed Brownian sheet as a special case. The result can be used to approximate the distributions of Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) type functionals of the set-indexed partial sums (Cumulative Sum) processes of the least squares residuals, which are useful for testing the adequacy of an assumed regression model. A simulation study is performed to investigate the finite-sample behavior of the tests. The simulation shows that, of the two tests, the CvM test tends to have better power than the KS test for testing a first-order model against nonparametric or parametric alternatives. An application of the established method to real data is also discussed.
Technology diffusion in hospitals: a log odds random effects regression model.
Blank, Jos L T; Valdmanis, Vivian G
2015-01-01
This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describe the diffusion of technologies. We distinguish a number of determinants, such as service, physician, and environmental, financial and organizational characteristics of the 60 Dutch hospitals in our sample. On the basis of this data set on Dutch general hospitals over the period 1995-2002, we conclude that there is a relation between a number of determinants and the diffusion of innovations underlining conclusions from earlier research. Positive effects were found on the basis of the size of the hospitals, competition and a hospital's commitment to innovation. It appears that if a policy is developed to further diffuse innovations, the external effects of demand and market competition need to be examined, which would de facto lead to an efficient use of technology. For the individual hospital, instituting an innovations office appears to be the most prudent course of action. PMID:24323484
NASA Astrophysics Data System (ADS)
Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav; Reeves, Geoffrey D.; Clilverd, Mark
2016-04-01
The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the prediction of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). A path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current (Dst), AE, and wave activity.
Reliability estimation for cutting tools based on logistic regression model using vibration signals
NASA Astrophysics Data System (ADS)
Chen, Baojia; Chen, Xuefeng; Li, Bing; He, Zhengjia; Cao, Hongrui; Cai, Gaigai
2011-10-01
As an important part of a CNC machine, the cutting tool and its reliability influence the overall manufacturing effectiveness and stability of the equipment. The present study proposes a novel reliability estimation approach for cutting tools based on a logistic regression model using vibration signals. The operation condition information of the CNC machine is incorporated into reliability analysis to reflect the product time-varying characteristics. The proposed approach is superior to other degradation estimation methods in that it does not necessitate any assumption about degradation paths and probability density functions of condition parameters. The steps of the new reliability estimation approach for cutting tools are as follows. First, on-line vibration signals of cutting tools are measured during the manufacturing process. Second, wavelet packet (WP) transform is employed to decompose the original signals and correlation analysis is employed to find the feature frequency bands which indicate tool wear. Third, correlation analysis is also used to select the salient feature parameters, which are composed of feature band energy, energy entropy and time-domain features. Finally, reliability estimation is carried out based on the logistic regression model. The approach has been validated on an NC lathe. Under different failure thresholds, the reliability and failure time of the cutting tools are all estimated accurately. The positive results show the plausibility and effectiveness of the proposed approach, which can facilitate machine performance and reliability estimation.
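The feature parameters named in the abstract (band energy and energy entropy) can be illustrated with a small sketch. It assumes the wavelet packet coefficients for each frequency band have already been obtained by some decomposition step, which is not shown. It also assumes that "energy entropy" means the Shannon entropy of the normalized band energies, a common definition that the abstract does not spell out. The function names are ours.

```python
import math

def band_energies(bands):
    """Energy of each frequency band: sum of squared coefficients."""
    return [sum(c * c for c in coeffs) for coeffs in bands]

def energy_entropy(bands):
    """Shannon entropy (natural log) of the normalized band energies.

    Assumed definition: p_j = E_j / sum(E), H = -sum(p_j * ln p_j).
    Bands with zero energy contribute nothing to the sum.
    """
    energies = band_energies(bands)
    total = sum(energies)
    if total == 0:
        return 0.0
    entropy = 0.0
    for energy in energies:
        if energy > 0:
            p = energy / total
            entropy -= p * math.log(p)
    return entropy

# Hypothetical coefficients for two bands (illustration only).
example_bands = [[1.0, 1.0], [1.0, -1.0]]
print(band_energies(example_bands))
print(energy_entropy(example_bands))
```

Energy spread evenly across bands gives the highest entropy, while energy concentrated in a single band gives zero, which is why such a quantity can serve as a compact condition feature.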
NASA Astrophysics Data System (ADS)
Liu, Pao-Wen Grace; Tsai, Jiun-Horng; Lai, Hsin-Chih; Tsai, Der-Min; Li, Li-Wei
2013-11-01
The sensitivity of air quality to meteorological variation has attracted attention since climate change became a world issue. The goal of this study is to investigate the sensitivity of ground-level ozone concentrations to temperature variation in Taiwan. Several multivariate regression models were built based on historical data of ozone and meteorological variables at three cities located in northern, mid-western, and southern Taiwan. Descriptive statistics indicate that pollution severity, from highest to lowest, followed the order southern (Pingtung), mid-western (Fengyuan), and northern (Hsichih) sites. Multiple regression models containing a principal component (PC) trigger variable effectively simulated the historical ozone exceedance during 2004-2009. Inclusion of the PC trigger improved R2 from the lowest 0.38 to the highest 0.58. High probability of detection and critical success index (mostly between 85% and 90%) and low false alarm rates (0-2.6%) were achieved for predicting the high ozone days (≥100 ppb). The results of sensitivity analysis indicated that (1) the ozone sensitivity was positively correlated with the temperature variation, (2) the sensitivity levels ran opposite to the severity of the ozone problem, (3) the sensitivity was most apparent in ozone seasons, and (4) the sensitivity strongly depended on seasonality in the urban cities Hsichih and Fengyuan, but weakly depended on seasonality in the rural city Pingtung.
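The forecast verification scores mentioned in the abstract can be computed from a 2x2 contingency table of observed versus predicted exceedances. The sketch below uses the standard definitions of probability of detection (POD) and critical success index (CSI). Whether the abstract's "false alarm rate" is the share of non-events wrongly flagged or the share of alarms that were false is not stated, so the code assumes the former. The ozone values are invented for illustration.

```python
def _ratio(numerator, denominator):
    """Safe division: returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def exceedance_scores(observed, predicted, threshold=100.0):
    """Score predicted exceedances (>= threshold, in ppb) against observations."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, pred in zip(observed, predicted):
        obs_high = obs >= threshold
        pred_high = pred >= threshold
        if obs_high and pred_high:
            hits += 1
        elif obs_high:
            misses += 1
        elif pred_high:
            false_alarms += 1
        else:
            correct_negatives += 1
    return {
        # POD: fraction of observed exceedances that were predicted.
        "pod": _ratio(hits, hits + misses),
        # CSI: hits over all cases where an exceedance was observed or predicted.
        "csi": _ratio(hits, hits + misses + false_alarms),
        # Assumed definition: fraction of non-exceedance days wrongly flagged.
        "false_alarm_rate": _ratio(false_alarms, false_alarms + correct_negatives),
    }

# Hypothetical daily ozone maxima in ppb (NOT data from the study).
observed = [120, 80, 101, 95, 130, 60, 99, 105]
predicted = [110, 85, 98, 102, 125, 70, 90, 100]
scores = exceedance_scores(observed, predicted)
print(scores)
```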
Development of a land-use regression model for ultrafine particles in Toronto, Canada
NASA Astrophysics Data System (ADS)
Sabaliauskas, Kelly; Jeong, Cheol-Heon; Yao, Xiaohong; Reali, Christopher; Sun, Tim; Evans, Greg J.
2015-06-01
This study applies land-use regression (LUR) to characterize the spatial distribution of ultrafine particles (UFP) in a large city. Particle number (PN) concentrations were measured in residential areas around Toronto, Canada, between June and August 2008. A combination of fixed and mobile monitoring was used to assess spatial gradients between and within communities. The fixed monitoring locations included a central site, two downtown sites, and four residential sites located 6-15 km from the downtown core. The mobile data included average PN concentrations collected on 112 road segments from 10 study routes that were repeated on three separate days. The mobile data was used to create the land-use regression model while the fixed sites were used for validation purposes. The predictor variables that best described the spatial variation of PN concentration (R2 = 0.72, validated R2 = 0.68) included population density within 300 m, total resource and industrial area within 1000 m, total residential area within 3000 m, and major roadway and highway length within 3000 m. The LUR model successfully predicted the afternoon peak PN concentration (slope = 0.96, R2 = 0.86) but over-predicted the 24-h average PN concentration (slope = 1.28, R2 = 0.72) measured at seven fixed monitoring sites.
Knight, Rodney R.; Gain, W. Scott; Wolfe, William J.
2011-01-01
Predictive equations were developed using stepbackward regression for 19 ecologically relevant streamflow characteristics grouped in five major classes (magnitude, ratio, frequency, variability, and date) for use in the Tennessee and Cumberland River watersheds. Basin characteristics explain 50 percent or more of the variation for 10 of the 19 equations. Independent variables identified through stepbackward regression were statistically significant in 81 of 304 coefficients tested across 19 models (⬚ < 0.0001) and represent four major groups: climate, physical landscape features, regional indicators, and land use. The most influential variables for determining hydrologic response were in the land-use and climate groups: daily temperature range, percent agricultural land use, and monthly mean precipitation. These three variables were major explanatory factors in 17, 15, and 13 models, respectively. The equations and independent datasets were used to explore the broad relation between basin properties and streamflow and its implications for the study of ecological flow requirements. Key results include a high degree of hydrologic variability among least disturbed Blue Ridge streams, similar hydrologic behavior for watersheds with widely varying degrees of forest cover, and distinct hydrologic profiles for streams in different geographic regions.
Combining regression analysis and air quality modelling to predict benzene concentration levels
NASA Astrophysics Data System (ADS)
Vlachokostas, Ch.; Achillas, Ch.; Chourdakis, E.; Moussiopoulos, N.
2011-05-01
State of the art epidemiological research has found consistent associations between traffic-related air pollution and various outcomes, such as respiratory symptoms and premature mortality. However, many urban areas are characterised by the absence of the necessary monitoring infrastructure, especially for benzene (C6H6), which is a known human carcinogen. The use of environmental statistics combined with air quality modelling can be of vital importance in order to assess air quality levels of traffic-related pollutants in an urban area in the case where there are no available measurements. This paper aims at developing and presenting a reliable approach, in order to forecast C6H6 levels in urban environments, demonstrated for Thessaloniki, Greece. Multiple stepwise regression analysis is used and a strong statistical relationship is detected between C6H6 and CO. The adopted regression model is validated in order to depict its applicability and representativeness. The presented results demonstrate that the adopted approach is capable of capturing C6H6 concentration trends and should be considered as complementary to air quality monitoring.
Error analysis of leaf area estimates made from allometric regression models
NASA Technical Reports Server (NTRS)
Feiveson, A. H.; Chhikara, R. S.
1986-01-01
Biological net productivity, measured in terms of the change in biomass with time, affects global productivity and the quality of life through biochemical and hydrological cycles and by its effect on the overall energy balance. Estimating leaf area for large ecosystems is one of the more important means of monitoring this productivity. For a particular forest plot, the leaf area is often estimated by a two-stage process. In the first stage, known as dimension analysis, a small number of trees are felled so that their areas can be measured as accurately as possible. These leaf areas are then related to non-destructive, easily-measured features such as bole diameter and tree height, by using a regression model. In the second stage, the non-destructive features are measured for all or for a sample of trees in the plots and then used as input into the regression model to estimate the total leaf area. Because both stages of the estimation process are subject to error, it is difficult to evaluate the accuracy of the final plot leaf area estimates. This paper illustrates how a complete error analysis can be made, using an example from a study made on aspen trees in northern Minnesota. The study was a joint effort by NASA and the University of California at Santa Barbara known as COVER (Characterization of Vegetation with Remote Sensing).
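The two-stage process described above can be sketched as follows. The sketch assumes a log-log (power-law) allometric form with bole diameter as the only predictor. The abstract also mentions tree height and does not specify the model form, so both choices are ours. The numbers are invented, and the sketch deliberately omits what the paper is actually about, namely propagating the error from both stages (it also applies a naive back-transformation with no bias correction).

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_allometric(diameters, leaf_areas):
    """Stage 1: fit ln(leaf area) = a + b * ln(diameter) on the felled trees."""
    return fit_line([math.log(d) for d in diameters],
                    [math.log(a) for a in leaf_areas])

def plot_leaf_area(diameters, intercept, slope):
    """Stage 2: predict each standing tree's leaf area and sum over the plot."""
    return sum(math.exp(intercept + slope * math.log(d)) for d in diameters)

# Hypothetical felled-tree data following an exact power law (illustration only).
felled_diameters = [5.0, 10.0, 15.0, 20.0]
felled_areas = [0.5 * d ** 2 for d in felled_diameters]

a, b = fit_allometric(felled_diameters, felled_areas)
total = plot_leaf_area([10.0, 20.0], a, b)
print(f"exponent b = {b:.3f}, coefficient exp(a) = {math.exp(a):.3f}")
print(f"estimated plot leaf area = {total:.1f}")
```

Because the example data are noise-free, the fit recovers the generating power law exactly; with real measurements, both the regression coefficients and the sampled tree features carry error, which is the situation the paper analyzes.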
Villain, Jonathan; Lozano, Sylvain; Halm-Lemeille, Marie-Pierre; Durrieu, Gilles; Bureau, Ronan
2014-12-01
The potential of quantile regression (QR) and quantile support vector machine regression (QSVMR) was analyzed for the definitions of quantitative structure-activity relationship (QSAR) models associated with a diverse set of chemicals toward a particular endpoint. This study focused on a specific sensitive endpoint (acute toxicity to algae) for which even a narcosis QSAR model is not actually clear. An initial dataset including more than 401 ecotoxicological data for one species of algae (Selenastrum capricornutum) was defined. This set corresponds to a large sample of chemicals ranging from classical organic chemicals to pesticides. From this original data set, the selection of the different subsets was made in terms of the notion of toxic ratio (TR), a parameter based on the ratio between predicted and experimental values. The robustness of QR and QSVMR to outliers was clearly observed, thus demonstrating that this approach represents a major interest for QSAR associated with a diverse set of chemicals. We focused particularly on descriptors related to molecular surface properties. PMID:25431186
Zhao, Lue Ping; Bolouri, Hamid
2016-04-01
Maturing omics technologies enable researchers to generate high dimension omics data (HDOD) routinely in translational clinical studies. In the field of oncology, The Cancer Genome Atlas (TCGA) provided funding support to researchers to generate different types of omics data on a common set of biospecimens with accompanying clinical data and has made the data available for the research community to mine. One important application, and the focus of this manuscript, is to build predictive models for prognostic outcomes based on HDOD. To complement prevailing regression-based approaches, we propose to use an object-oriented regression (OOR) methodology to identify exemplars specified by HDOD patterns and to assess their associations with prognostic outcome. Through computing patient's similarities to these exemplars, the OOR-based predictive model produces a risk estimate using a patient's HDOD. The primary advantages of OOR are twofold: reducing the penalty of high dimensionality and retaining the interpretability to clinical practitioners. To illustrate its utility, we apply OOR to gene expression data from non-small cell lung cancer patients in TCGA and build a predictive model for prognostic survivorship among stage I patients, i.e., we stratify these patients by their prognostic survival risks beyond histological classifications. Identification of these high-risk patients helps oncologists to develop effective treatment protocols and post-treatment disease management plans. Using the TCGA data, the total sample is divided into training and validation data sets. After building up a predictive model in the training set, we compute risk scores from the predictive model, and validate associations of risk scores with prognostic outcome in the validation data (P-value=0.015). PMID:26972839
A land use regression model incorporating data on industrial point source pollution.
Chen, Li; Wang, Yuming; Li, Peiwu; Ji, Yaqin; Kong, Shaofei; Li, Zhiyong; Bai, Zhipeng
2012-01-01
Advancing the understanding of the spatial aspects of air pollution in the city regional environment is an area where improved methods can be of great benefit to exposure assessment and policy support. We created land use regression (LUR) models for SO2, NO2 and PM10 for Tianjin, China. Traffic volumes, road networks, land use data, population density, meteorological conditions, physical conditions and satellite-derived greenness, brightness and wetness were used for predicting SO2, NO2 and PM10 concentrations. We incorporated data on industrial point sources to improve LUR model performance. In order to consider the impact of different sources, we calculated the PSIndex, LSIndex and area of different land use types (agricultural land, industrial land, commercial land, residential land, green space and water area) within different buffer radii (1 to 20 km). This method compensates for the lack of consideration of source impact in the standard LUR model. Remote sensing-derived variables were significantly correlated with gaseous pollutant concentrations such as SO2 and NO2. R2 values of the multiple linear regression equations for SO2, NO2 and PM10 were 0.78, 0.89 and 0.84, respectively, and the RMSE values were 0.32, 0.18 and 0.21, respectively. Model predictions at validation monitoring sites agreed well with measurements, generally falling within 15% of measured values. Compared to the relationship between dependent variables and simple variables (such as traffic variables or meteorological condition variables), the relationship between dependent variables and integrated variables was more consistent with a linear relationship. Such integration has a discernible influence on both the overall model prediction and health effects assessment on the spatial distribution of air pollution in the city region. PMID:23513446
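The summary statistics quoted above (R2, RMSE and the share of predictions within 15% of measured values) can be computed for any set of paired measurements and predictions as in the sketch below. It computes R2 as 1 - SS_res / SS_tot on the supplied pairs, which may differ from how the paper derived its regression R2 values. The concentrations are invented and the function name is ours.

```python
import math

def validation_summary(measured, predicted, tolerance=0.15):
    """Return R2, RMSE and the share of predictions within +/- tolerance of measured."""
    n = len(measured)
    mean_measured = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    within = sum(1 for m, p in zip(measured, predicted)
                 if abs(p - m) <= tolerance * abs(m))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "share_within_tolerance": within / n,
    }

# Hypothetical concentrations at four validation sites (NOT data from the study).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 30.0]
summary = validation_summary(measured, predicted)
print(summary)
```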
Regression models for mixed discrete and continuous responses with potentially missing values.
Fitzmaurice, G M; Laird, N M
1997-03-01
In this paper a likelihood-based method for analyzing mixed discrete and continuous regression models is proposed. We focus on marginal regression models, that is, models in which the marginal expectation of the response vector is related to covariates by known link functions. The proposed model is based on an extension of the general location model of Olkin and Tate (1961, Annals of Mathematical Statistics 32, 448-465), and can accommodate missing responses. When there are no missing data, our particular choice of parameterization yields maximum likelihood estimates of the marginal mean parameters that are robust to misspecification of the association between the responses. This robustness property does not, in general, hold for the case of incomplete data. There are a number of potential benefits of a multivariate approach over separate analyses of the distinct responses. First, a multivariate analysis can exploit the correlation structure of the response vector to address intrinsically multivariate questions. Second, multivariate test statistics allow for control over the inflation of the type I error that results when separate analyses of the distinct responses are performed without accounting for multiple comparisons. Third, it is generally possible to obtain more precise parameter estimates by accounting for the association between the responses. Finally, separate analyses of the distinct responses may be difficult to interpret when there is nonresponse because different sets of individuals contribute to each analysis. Furthermore, separate analyses can introduce bias when the missing responses are missing at random (MAR). A multivariate analysis can circumvent both of these problems. The proposed methods are applied to two biomedical datasets. PMID:9147588
Landslide susceptibility analysis with logistic regression model based on FCM sampling strategy
NASA Astrophysics Data System (ADS)
Wang, Liang-Jie; Sawada, Kazuhide; Moriguchi, Shuji
2013-08-01
Several mathematical models are used to predict the spatial distribution characteristics of landslides to mitigate damage caused by landslide disasters. Although some studies have achieved excellent results around the world, few studies take the inter-relationship of the selected points (training points) into account. In this paper, we present the Fuzzy c-means (FCM) algorithm as an optimal method for choosing the appropriate input landslide points as training data. Based on different combinations of the Fuzzy exponent (m) and the number of clusters (c), five groups of sampling points were derived from formal seed cells points and applied to analyze the landslide susceptibility in Mizunami City, Gifu Prefecture, Japan. A logistic regression model is applied to create the models of the relationships between landslide-conditioning factors and landslide occurrence. The pre-existing landslide bodies and the area under the relative operative characteristic (ROC) curve were used to evaluate the performance of all the models with different m and c. The results revealed that Model no. 4 (m=1.9, c=4) and Model no. 5 (m=1.9, c=5) have significantly high classification accuracies, i.e., 90.0%. Moreover, over 30% of the landslide bodies were grouped under the very high susceptibility zone. Otherwise, Model no. 4 and Model no. 5 had higher area under the ROC curve (AUC) values, which were 0.78 and 0.79, respectively. Therefore, Model no. 4 and Model no. 5 offer better model results for landslide susceptibility mapping. Maps derived from Model no. 4 and Model no. 5 would offer the local authorities crucial information for city planning and development.
Vansteelandt, Stijn; Rotnitzky, Andrea; Robins, James
2012-01-01
We propose a new class of models for making inference about the mean of a vector of repeated outcomes when the outcome vector is incompletely observed in some study units and missingness is nonmonotone. Each model in our class is indexed by a set of unidentified selection bias functions which quantify the residual association of the outcome at each occasion t and the probability that this outcome is missing after adjusting for variables observed prior to time t and for the past nonresponse pattern. In particular, selection bias functions equal to zero encode the investigator’s a priori belief that nonresponse of the next outcome does not depend on that outcome after adjusting for the observed past. We call this assumption sequential explainability. Since each model in our class is nonparametric, it fits the data perfectly well. As such, our models are ideal for conducting sensitivity analyses aimed at evaluating the impact that different degrees of departure from sequential explainability have on inference about the marginal means of interest. Although the marginal means are identified under each of our models, their estimation is not feasible in practice because it requires the auxiliary estimation of conditional expectations and probabilities given high-dimensional variables. We henceforth discuss estimation of the marginal means under each model in our class assuming, additionally, that at each occasion either one of the following two models holds: a parametric model for the conditional probability of nonresponse given current outcomes and past recorded data, or a parametric model for the conditional mean of the outcome on the nonrespondents given the past recorded data. We call the resulting procedure 2T-multiply robust as it protects at each of the T time points against misspecification of one of these two working models, although not against simultaneous misspecification of both. We extend our proposed class of models and estimators to incorporate data
Eberly, Lynn E
2007-01-01
This chapter describes multiple linear regression, a statistical approach used to describe the simultaneous associations of several variables with one continuous outcome. Important steps in using this approach include estimation and inference, variable selection in model building, and assessing model fit. The special cases of regression with interactions among the variables, polynomial regression, regressions with categorical (grouping) variables, and separate slopes models are also covered. Examples in microbiology are used throughout. PMID:18450050
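As an illustration of multiple linear regression with an interaction term, the sketch below fits y = b0 + b1·x1 + b2·x2 + b3·x1·x2 by ordinary least squares through the normal equations. It uses only the Python standard library; the helper names and the noise-free toy data are hypothetical, and the example is not drawn from the chapter.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_with_interaction(x1, x2, y):
    """Least-squares fit with columns: intercept, x1, x2, x1*x2."""
    rows = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Toy data generated exactly from known coefficients (no noise).
x1 = [0.0, 1.0, 2.0, 3.0, 1.0, 2.0]
x2 = [0.0, 0.0, 1.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * a + 3.0 * b + 0.5 * a * b for a, b in zip(x1, x2)]

coefficients = fit_with_interaction(x1, x2, y)
print(coefficients)  # intercept, x1, x2, interaction
```

Solving the normal equations directly is adequate for a small, well-conditioned example; for real data a QR-based routine from a numerical library is generally preferable.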
NASA Astrophysics Data System (ADS)
Creaco, E.; Berardi, L.; Sun, Siao; Giustolisi, O.; Savic, D.
2016-04-01
The growing availability of field data, from information and communication technologies (ICTs) in "smart" urban infrastructures, allows data modeling to understand complex phenomena and to support management decisions. Among the analyzed phenomena, those related to storm water quality modeling have recently been gaining interest in the scientific literature. Nonetheless, the large amount of available data poses the problem of selecting relevant variables to describe a phenomenon and enable robust data modeling. This paper presents a procedure for the selection of relevant input variables using the multiobjective evolutionary polynomial regression (EPR-MOGA) paradigm. The procedure is based on scrutinizing the explanatory variables that appear inside the set of EPR-MOGA symbolic model expressions of increasing complexity and goodness of fit to target output. The strategy also enables the selection to be validated by engineering judgement. In such context, the multiple case study extension of EPR-MOGA, called MCS-EPR-MOGA, is adopted. The application of the proposed procedure to modeling storm water quality parameters in two French catchments shows that it was able to significantly reduce the number of explanatory variables for successive analyses. Finally, the EPR-MOGA models obtained after the input selection are compared with those obtained by using the same technique without benefitting from input selection and with those obtained in previous works where other data-modeling techniques were used on the same data. The comparison highlights the effectiveness of both EPR-MOGA and the input selection procedure.
Huang, Yangxin; Liang, Hua; Wu, Hulin
2008-10-15
In this paper, the mechanism-based ordinary differential equation (ODE) model and the flexible semiparametric regression model are employed to identify the significant covariates for antiretroviral response in AIDS clinical trials. We consider the treatment effect as a function of three factors (or covariates) including pharmacokinetics, drug adherence and susceptibility. Both clinical and simulated data examples are given to illustrate these two different kinds of modeling approaches. We found that the ODE model is more powerful to model the mechanism-based nonlinear relationship between treatment effects and virological response biomarkers. The ODE model is also better in identifying the significant factors for virological response, although it is slightly liberal and there is a trend to include more factors (or covariates) in the model. The semiparametric mixed-effects regression model is very flexible to fit the virological response data, but it is too liberal to identify correct factors for the virological response; sometimes it may miss the correct factors. The ODE model is also biologically justifiable and good for predictions and simulations for various biological scenarios. The limitations of the ODE models include the high cost of computation and the requirement of biological assumptions that sometimes may not be easy to validate. The methodologies reviewed in this paper are also generally applicable to studies of other viruses such as hepatitis B virus or hepatitis C virus. PMID:18407583
Cuffney, T.F.; Kashuba, R.; Qian, S.S.; Alameddine, I.; Cha, Y.K.; Lee, B.; Coles, J.F.; McMahon, G.
2011-01-01
Multilevel hierarchical regression was used to examine regional patterns in the responses of benthic macroinvertebrates and algae to urbanization across 9 metropolitan areas of the conterminous USA. Linear regressions established that responses (intercepts and slopes) to urbanization of invertebrates and algae varied among metropolitan areas. Multilevel hierarchical regression models were able to explain these differences on the basis of region-scale predictors. Regional differences in the type of land cover (agriculture or forest) being converted to urban and climatic factors (precipitation and air temperature) accounted for the differences in the response of macroinvertebrates to urbanization based on ordination scores, total richness, Ephemeroptera, Plecoptera, Trichoptera richness, and average tolerance. Regional differences in climate and antecedent agriculture also accounted for differences in the responses of salt-tolerant diatoms, but differences in the responses of other diatom metrics (% eutraphenic, % sensitive, and % silt tolerant) were best explained by regional differences in soils (mean % clay soils). The effects of urbanization were most readily detected in regions where forest lands were being converted to urban land because agricultural development significantly degraded assemblages before urbanization and made detection of urban effects difficult. The effects of climatic factors (temperature, precipitation) on background conditions (biogeographic differences) and rates of response to urbanization were most apparent after accounting for the effects of agricultural development. The effects of climate and land cover on responses to urbanization provide strong evidence that monitoring, mitigation, and restoration efforts must be tailored for specific regions and that attainment goals (background conditions) may not be possible in regions with high levels of prior disturbance (e.g., agricultural development).
Prediction of Filamentous Sludge Bulking using a State-based Gaussian Processes Regression Model
Liu, Yiqi; Guo, Jianhua; Wang, Qilin; Huang, Daoping
2016-01-01
Activated sludge process has been widely adopted to remove pollutants in wastewater treatment plants (WWTPs). However, stable operation of activated sludge process is often compromised by the occurrence of filamentous bulking. The aim of this study is to build a proper model for timely diagnosis and prediction of filamentous sludge bulking in an activated sludge process. This study developed a state-based Gaussian Process Regression (GPR) model to monitor the filamentous sludge bulking related parameter, sludge volume index (SVI), in such a way that the evolution of SVI can be predicted over multi-step ahead. This methodology was validated with SVI data collected from one full-scale WWTP. Online diagnosis and prediction of filamentous bulking sludge with real-time SVI prediction was tested through a simulation study. The results showed that the proposed methodology was capable of predicting future SVIs with good accuracy, thus providing sufficient time for predicting and controlling filamentous sludge bulking. PMID:27498888
Ghouri, A F; Zamora, R L; Sessions, D G; Spitznagel, E L; Harvey, J E
1994-10-01
The ability to accurately predict the presence of subclinical metastatic neck disease in clinically N0 patients with primary epidermoid cancer of the larynx would be of great value in determining whether to perform an elective neck dissection. We describe a statistical approach to estimating the probability of occult neck disease given pretreatment clinical parameters. A retrospective study was performed involving 736 clinically N0 patients with primary laryngeal cancer who were treated surgically with primary resection and ipsilateral neck dissection. Nodal involvement was determined histologically after surgical lymphadenectomy. A logistic regression model was used to derive an equation that calculated the probability of occult neck metastasis based on pretreatment T stage, tumor location, and histologic grade. The model has a sensitivity of 74%, a specificity of 87%, and can be entered into a programmable calculator. PMID:7934602
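A logistic regression equation of the kind described here maps pretreatment covariates to a probability, and sensitivity and specificity summarize how a thresholded version of that probability classifies patients. The sketch below shows both calculations. The intercept, coefficients, covariate coding and confusion-matrix counts are all hypothetical; they are not the fitted values from this study.

```python
import math

def logistic_probability(intercept, coefficients, features):
    """P(event) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical coefficients for T stage, tumor location code, histologic grade.
probability = logistic_probability(-2.0, [0.8, 0.5, 0.4], [3, 1, 2])
print(probability)

# Hypothetical counts chosen only to illustrate the two ratios.
print(sensitivity_specificity(tp=74, fn=26, tn=87, fp=13))
```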
Wang, Xiaojing; Chen, Ming-Hui; Yan, Jun
2013-07-01
Cox models with time-varying coefficients offer great flexibility in capturing the temporal dynamics of covariate effects on event times, which could be hidden from a Cox proportional hazards model. Methodology development for varying coefficient Cox models, however, has been largely limited to right censored data; only limited work on interval censored data has been done. In most existing methods for varying coefficient models, analysts need to specify which covariate coefficients are time-varying and which are not at the time of fitting. We propose a dynamic Cox regression model for interval censored data in a Bayesian framework, where the coefficient curves are piecewise constant but the number of pieces and the jump points are covariate specific and estimated from the data. The model automatically determines the extent to which the temporal dynamics is needed for each covariate, resulting in smoother and more stable curve estimates. The posterior computation is carried out via an efficient reversible jump Markov chain Monte Carlo algorithm. Inference of each coefficient is based on an average of models with different number of pieces and jump points. A simulation study with three covariates, each with a coefficient of different degree in temporal dynamics, confirmed that the dynamic model is preferred to the existing time-varying model in terms of model comparison criteria through conditional predictive ordinate. When applied to a dental health data of children with age between 7 and 12 years, the dynamic model reveals that the relative risk of emergence of permanent tooth 24 between children with and without an infected primary predecessor is the highest at around age 7.5, and that it gradually reduces to one after age 11. These findings were not seen from the existing studies with Cox proportional hazards models. PMID:23389549
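To make the idea of a piecewise-constant coefficient curve concrete, the sketch below evaluates a log hazard ratio that is constant between jump points and converts it to a hazard ratio. The jump points and coefficient values are hypothetical and only loosely mimic the qualitative pattern described above (a high relative risk at young ages that declines to one); the sketch does not implement the Bayesian reversible jump sampler that estimates these quantities.

```python
import math
from bisect import bisect_right

def piecewise_constant(jump_points, values, t):
    """Value at time t of a right-continuous step function.

    `values` needs one more entry than `jump_points`: values[0] applies
    before the first jump, values[i] from jump_points[i - 1] onwards.
    """
    if len(values) != len(jump_points) + 1:
        raise ValueError("values must have len(jump_points) + 1 entries")
    return values[bisect_right(jump_points, t)]

# Hypothetical jump points (ages in years) and log hazard ratios.
jump_points = [8.0, 9.5, 11.0]
log_hazard_ratios = [1.2, 0.7, 0.3, 0.0]

for age in (7.5, 9.0, 10.0, 12.0):
    beta = piecewise_constant(jump_points, log_hazard_ratios, age)
    print(age, math.exp(beta))  # hazard ratio at this age
```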
Mapping soil organic carbon stocks by robust geostatistical and boosted regression models
NASA Astrophysics Data System (ADS)
Nussbaum, Madlene; Papritz, Andreas; Baltensweiler, Andri; Walthert, Lorenz
2013-04-01
Carbon (C) sequestration in forests offsets greenhouse gas emissions. Therefore, quantifying C stocks and fluxes in forest ecosystems is of interest for greenhouse gas reporting according to the Kyoto protocol. In Switzerland, the National Forest Inventory offers comprehensive data to quantify the aboveground forest biomass and its change in time. Estimating stocks of soil organic C (SOC) in forests is more difficult because the variables needed to quantify stocks vary strongly in space and precise quantification of some of them is very costly. Based on data from 1'033 plots we modeled SOC stocks of the organic layer and the mineral soil to depths of 30 cm and 100 cm for the Swiss forested area. For the statistical modeling a broad range of covariates were available: climate data (e.g., precipitation, temperature), two elevation models (resolutions 25 and 2 m) with respective terrain attributes and spectral reflectance data representing vegetation. Furthermore, the main mapping units of an overview soil map and a coarse scale geological map were used to coarsely represent the parent material of the soils. The selection of important covariates for SOC stocks modeling out of a large set was a major challenge for the statistical modeling. We used two approaches to deal with this problem: 1) A robust restricted maximum likelihood method to fit a linear regression model with spatially correlated errors. The large number of covariates was first reduced by LASSO (Least Absolute Shrinkage and Selection Operator) and then further narrowed down to a parsimonious set of important covariates by cross-validation of the robustly fitted model. To account for nonlinear dependencies of the response on the covariates, interaction terms of the latter were included in the model if this improved the fit. 2) A boosted structured regression model with componentwise linear least squares or componentwise smoothing splines as base procedures. The selection of important covariates was done by the
NASA Astrophysics Data System (ADS)
Zakaria, Siti Aisyah; Ismail, Wan Nurul Aiffah; Noor, Nor Fashihah Mohd; Ariffin, Wan Nor Munirah
2014-12-01
Liquid-liquid extraction is one of the most important separation processes. Different kinds of liquid-liquid extractors, such as the Rotating Disc Contactor (RDC) column, are used in industry. The study of liquid-liquid extraction in an RDC column has become an important subject not only among chemical engineers but among mathematicians as well. Multiple Linear Regression (MLR) was applied to characterize the relationship between the input variables and output variables. The input variables considered include rotor speed (Nr); ratio of flow (Fd); concentration of continuous inlet (Ccin); concentration of dispersed inlet (Cdin); the interaction between Nr and Fd; the interaction between Nr and Ccin; and the interaction between Nr and Cdin. The output variables are the concentration of continuous outlet (Ccout) and the concentration of dispersed outlet (Cdout), which describe RDC column performance. The regression model is applied to estimate the dependent variable outside the period used to fit the data. We therefore have two linear models, one for each of the outputs Ccout and Cdout. The results show that there is a positive relationship between Ccout and Ccin, as well as Cdin and the interaction of Nr with Cdin. For the first model, based on Ccout, the coefficient of Nr records the highest value, meaning that the rotor speed (Nr) has a great influence on the concentration of continuous outlet (Ccout). In contrast, for the second model, based on Cdout, there is a negative relationship between Cdout and the interaction of Nr with Cdin. The coefficient of Ccin records the highest value, meaning that the concentration of continuous inlet (Ccin) has a great influence on the concentration of dispersed outlet (Cdout).
NASA Astrophysics Data System (ADS)
Li, Weixuan; Lin, Guang; Li, Bing
2016-09-01
Many uncertainty quantification (UQ) approaches suffer from the curse of dimensionality, that is, their computational costs become intractable for problems involving a large number of uncertainty parameters. In these situations, the classic Monte Carlo often remains the preferred method of choice because its convergence rate O(n^(-1/2)), where n is the required number of model simulations, does not depend on the dimension of the problem. However, many high-dimensional UQ problems are intrinsically low-dimensional, because the variation of the quantity of interest (QoI) is often caused by only a few latent parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace in the statistics literature. Motivated by this observation, we propose two inverse regression-based UQ algorithms (IRUQ) for high-dimensional problems. Both algorithms use inverse regression to convert the original high-dimensional problem to a low-dimensional one, which is then efficiently solved by building a response surface for the reduced model, for example via the polynomial chaos expansion. The first algorithm, which is for the situations where an exact SDR subspace exists, is proved to converge at rate O(n^(-1)), hence much faster than MC. The second algorithm, which doesn't require an exact SDR, employs the reduced model as a control variate to reduce the error of the MC estimate. The accuracy gain could still be significant, depending on how well the reduced model approximates the original high-dimensional one. IRUQ also provides several additional practical advantages: it is non-intrusive; it does not require computing the high-dimensional gradient of the QoI; and it reports an error bar so the user knows how reliable the result is.
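The O(n^(-1/2)) Monte Carlo rate mentioned above can be checked empirically. The sketch below estimates E[U²] = 1/3 for U uniform on (0, 1) and compares the root-mean-square error of the estimate at two sample sizes; increasing n by a factor of 64 should shrink the error by roughly a factor of 8. The toy integrand, function name, sample sizes and seeds are assumptions made for illustration, and the sketch does not implement the IRUQ algorithms themselves.

```python
import math
import random

def mc_rmse(n_samples, n_replications, seed):
    """RMSE of the plain Monte Carlo estimate of E[U^2], U ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    true_value = 1.0 / 3.0
    squared_errors = 0.0
    for _ in range(n_replications):
        estimate = sum(rng.random() ** 2 for _ in range(n_samples)) / n_samples
        squared_errors += (estimate - true_value) ** 2
    return math.sqrt(squared_errors / n_replications)

rmse_small = mc_rmse(100, 200, seed=0)
rmse_large = mc_rmse(6400, 200, seed=1)
ratio = rmse_small / rmse_large

print(rmse_small, rmse_large)
print(ratio)  # expected to be near sqrt(6400 / 100) = 8
```

Because the RMSE is itself estimated from a finite number of replications, the observed ratio fluctuates around 8 rather than matching it exactly.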
Milly, P.C.D.; Dunne, K.A.
2011-01-01
Hydrologic models often are applied to adjust projections of hydroclimatic change that come from climate models. Such adjustment includes climate-bias correction, spatial refinement ("downscaling"), and consideration of the roles of hydrologic processes that were neglected in the climate model. Described herein is a quantitative analysis of the effects of hydrologic adjustment on the projections of runoff change associated with projected twenty-first-century climate change. In a case study including three climate models and 10 river basins in the contiguous United States, the authors find that relative (i.e., fractional or percentage) runoff change computed with hydrologic adjustment more often than not was less positive (or, equivalently, more negative) than what was projected by the climate models. The dominant contributor to this decrease in runoff was a ubiquitous change in runoff (median -11%) caused by the hydrologic model's apparent amplification of the climate-model-implied growth in potential evapotranspiration. Analysis suggests that the hydrologic model, on the basis of the empirical, temperature-based modified Jensen-Haise formula, calculates a change in potential evapotranspiration that is typically 3 times the change implied by the climate models, which explicitly track surface energy budgets. In comparison with the amplification of potential evapotranspiration, central tendencies of other contributions from hydrologic adjustment (spatial refinement, climate-bias adjustment, and process refinement) were relatively small. The authors' findings highlight the need for caution when projecting changes in potential evapotranspiration for use in hydrologic models or drought indices to evaluate climate change impacts on water.
A note on modeling of tumor regression for estimation of radiobiological parameters
Zhong, Hualiang; Chetty, Indrin
2014-08-15
Purpose: Accurate calculation of radiobiological parameters is crucial to predicting radiation treatment response. Modeling differences may have a significant impact on derived parameters. In this study, the authors have integrated two existing models with kinetic differential equations to formulate a new tumor regression model for estimation of radiobiological parameters for individual patients. Methods: A system of differential equations that characterizes the birth-and-death process of tumor cells in radiation treatment was analytically solved. The solution of this system was used to construct an iterative model (Z-model). The model consists of three parameters: tumor doubling time T_d, half-life of dead cells T_r, and cell survival fraction SF_D under dose D. The Jacobian determinant of this model was proposed as a constraint to optimize the three parameters for six head and neck cancer patients. The derived parameters were compared with those generated from the two existing models: Chvetsov's model (C-model) and Lim's model (L-model). The C-model and L-model were optimized with the parameter T_d fixed. Results: With the Jacobian-constrained Z-model, the mean of the optimized cell survival fractions is 0.43 ± 0.08, and the half-life of dead cells averaged over the six patients is 17.5 ± 3.2 days. The parameters T_r and SF_D optimized with the Z-model differ by 1.2% and 20.3% from those optimized with the T_d-fixed C-model, and by 32.1% and 112.3% from those optimized with the T_d-fixed L-model, respectively. Conclusions: The Z-model was analytically constructed from the differential equations of cell populations that describe changes in the number of different tumor cells during the course of radiation treatment. The Jacobian constraints were proposed to optimize the three radiobiological parameters. The generated model and its optimization method may help develop high-quality treatment regimens for individual patients.
NASA Astrophysics Data System (ADS)
Zhang, L.; Wylie, B. K.; Fosnight, E. A.
2005-12-01
To better understand carbon fluxes in grassland ecosystems, an empirical piecewise regression (PWR) model was developed to estimate gross primary production (GPP) for the grassland ecosystems in the Northern Great Plains and Northern Kazakhstan. The PWR model spatially scales up the localized flux tower measurements across an ecoregion at 1-km resolution. In this study, we compared the PWR GPP and the MODIS GPP with five grassland flux tower measurements. Then we employed cross-validation to evaluate the PWR GPP values. We also compared PWR GPP and MODIS GPP for grasslands for the entire study area. Factors that may explain the spatial pattern of the GPP differences between the two models were explored using a decision tree technique. The results indicated that the PWR modeling approach was robust, with good agreement (agreement coefficient d=0.71-0.97) between the PWR model and tower measurements. Cross-validation showed a relatively low agreement (d=0.71-0.78) at two influential flux tower sites. We also observed that the PWR GPP was lower than or similar to the MODIS GPP in the east and higher in the west and south. Further analysis suggested that percentage of C4 grasses, soil water holding capacity, percentage of clay, and percentage of crop mixed in the grassland contributed to the GPP difference between the PWR and MODIS models.
Regularization in finite mixture of regression models with diverging number of parameters.
Khalili, Abbas; Lin, Shili
2013-06-01
Feature (variable) selection has become a fundamentally important problem in recent statistical literature. Sometimes, in applications, many variables are introduced to reduce possible modeling biases, but the number of variables a model can accommodate is often limited by the amount of data available. In other words, the number of variables considered depends on the sample size, which reflects the estimability of the parametric model. In this article, we consider the problem of feature selection in finite mixture of regression models when the number of parameters in the model can increase with the sample size. We propose a penalized likelihood approach for feature selection in these models. Under certain regularity conditions, our approach leads to consistent variable selection. We carry out extensive simulation studies to evaluate the performance of the proposed approach under controlled settings. We also applied the proposed method to two real data sets. The first is on telemonitoring of Parkinson's disease (PD), where the problem concerns whether dysphonic features extracted from the patients' speech signals recorded at home can be used as surrogates to study PD severity and progression. The second is on breast cancer prognosis, in which one is interested in assessing whether cell nuclear features may offer prognostic value for long-term survival of breast cancer patients. Our analysis in each application revealed a mixture structure in the study population and uncovered a unique relationship between the features and the response variable in each mixture component. PMID:23556535
NASA Astrophysics Data System (ADS)
Ahangar-Asr, A.; Faramarzi, A.; Mottaghifard, N.; Javadi, A. A.
2011-11-01
This paper presents a new approach, based on evolutionary polynomial regression (EPR), for prediction of permeability (K), maximum dry density (MDD), and optimum moisture content (OMC) as functions of some physical properties of soil. EPR is a data-driven method based on evolutionary computing aimed to search for polynomial structures representing a system. In this technique, a combination of the genetic algorithm (GA) and the least-squares method is used to find feasible structures and the appropriate parameters of those structures. EPR models are developed based on results from a series of classification, compaction, and permeability tests from the literature. The tests included standard Proctor tests, constant head permeability tests, and falling head permeability tests conducted on soils made of four components, bentonite, limestone dust, sand, and gravel, mixed in different proportions. The results of the EPR model predictions are compared with those of a neural network model, a correlation equation from the literature, and the experimental data. Comparison of the results shows that the proposed models are highly accurate and robust in predicting permeability and compaction characteristics of soils. Results from sensitivity analysis indicate that the models trained from experimental data have been able to capture many physical relationships between soil parameters. The proposed models are also able to represent the degree to which individual contributing parameters affect the maximum dry density, optimum moisture content, and permeability.
Spatial analysis studies have included application of land use regression models (LURs) for health and air quality assessments. Recent LUR studies have collected nitrogen dioxide (NO2) and volatile organic compounds (VOCs) using passive samplers at urban air monitoring networks ...
NASA Astrophysics Data System (ADS)
Eghnam, Karam M.; Sheta, Alaa F.
2008-06-01
Development of accurate models is necessary in critical applications such as prediction. In this paper, a solution to the stock prediction problem of the Barents Sea capelin is introduced using Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) models. The capelin stock in the Barents Sea is one of the largest in the world. It has normally maintained a fishery with annual catches of up to 3 million tons. The capelin stock problem has an impact on fish stock development. The proposed prediction model was developed using an ANN with its weights adapted using a Genetic Algorithm (GA). The proposed model was compared to a traditional linear model, the MLR. The results showed that the ANN-GA model produced an overall accuracy 21% better than that of the MLR model.
The overlooked potential of Generalized Linear Models in astronomy, I: Binomial regression
NASA Astrophysics Data System (ADS)
de Souza, R. S.; Cameron, E.; Killedar, M.; Hilbe, J.; Vilalta, R.; Maio, U.; Biffi, V.; Ciardi, B.; Riggs, J. D.
2015-09-01
Revealing hidden patterns in astronomical data is often the path to fundamental scientific breakthroughs; meanwhile the complexity of scientific enquiry increases as more subtle relationships are sought. Contemporary data analysis problems often elude the capabilities of classical statistical techniques, suggesting the use of cutting edge statistical methods. In this light, astronomers have overlooked a whole family of statistical techniques for exploratory data analysis and robust regression, the so-called Generalized Linear Models (GLMs). In this paper (the first in a series aimed at illustrating the power of these methods in astronomical applications), we elucidate the potential of a particular class of GLMs for handling binary/binomial data, the so-called logit and probit regression techniques, from both a maximum likelihood and a Bayesian perspective. As a case in point, we present the use of these GLMs to explore the conditions of star formation activity and metal enrichment in primordial minihaloes from cosmological hydro-simulations including detailed chemistry, gas physics, and stellar feedback. We predict that for a dark mini-halo with metallicity ≈ 1.3 × 10^-4 Z⊙, an increase of 1.2 × 10^-2 in the gas molecular fraction increases the probability of star formation occurrence by a factor of 75%. Finally, we highlight the use of receiver operating characteristic curves as a diagnostic for binary classifiers, and ultimately we use these to demonstrate the competitive predictive performance of GLMs against the popular technique of artificial neural networks.
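The maximum likelihood side of the logit regression mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single predictor and synthetic data, and it uses plain NumPy with Newton's method (iteratively reweighted least squares); the function names `fit_logit` and `predict_proba` are hypothetical, and nothing here reproduces the authors' analysis or data.

```python
import numpy as np

def fit_logit(X, y, n_iter=25, tol=1e-8):
    """Fit a logit GLM by Newton's method (iteratively reweighted least squares)."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # current fitted probabilities
        w = p * (1.0 - p)                       # binomial variance weights
        hessian = X.T @ (X * w[:, None])        # X' W X
        grad = X.T @ (y - p)                    # score vector
        delta = np.linalg.solve(hessian, grad)  # Newton step
        beta += delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta

def predict_proba(beta, X):
    """Predicted success probability for each row of X."""
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

# Synthetic binary data from a known logit model (illustrative values only).
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
true_beta = np.array([-0.5, 1.5])
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))
y = (rng.uniform(size=x.size) < p_true).astype(float)

beta_hat = fit_logit(x.reshape(-1, 1), y)
print("estimated coefficients:", beta_hat)
print("odds ratio per unit increase in x:", np.exp(beta_hat[1]))
```

In a logit model the exponentiated slope is the multiplicative change in the odds per unit increase of the predictor, which is the usual way such coefficients are reported.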
Proteomics Improves the Prediction of Burns Mortality: Results from Regression Spline Modeling
Finnerty, Celeste C.; Ju, Hyunsu; Spratt, Heidi; Victor, Sundar; Jeschke, Marc G.; Hegde, Sachin; Bhavnani, Suresh K.; Luxon, Bruce A.; Brasier, Allan R.; Herndon, David N.
2012-01-01
Prediction of mortality in severely burned patients remains unreliable. Although clinical covariates and plasma protein abundance have been used with varying degrees of success, the triad of burn size, inhalation injury, and age remains the most reliable predictor. We investigated the effect of combining proteomics variables with these three clinical covariates on prediction of mortality in burned children. Serum samples were collected from 330 burned children (burns covering >25% of the total body surface area) between admission and the time of the first operation for clinical chemistry analyses and proteomic assays of cytokines. Principal component analysis revealed that serum protein abundance and the clinical covariates each provided independent information regarding patient survival. To determine whether combining proteomics with clinical variables improves prediction of patient mortality, we used multivariate adaptive regression splines, since the relationships between analytes and mortality were not linear. Combining these factors increased overall outcome prediction accuracy from 52% to 81% and area under the receiver operating characteristic curve from 0.82 to 0.95. Thus, the predictive accuracy of burns mortality is substantially improved by combining protein abundance information with clinical covariates in a multivariate adaptive regression splines classifier, a model currently being validated in a prospective study. PMID:22686201
NASA Technical Reports Server (NTRS)
Tomberlin, T. J.
1985-01-01
Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
Schmiege, Sarah J; Levin, Michael E; Bryan, Angela D
2009-12-01
Adolescents involved with the criminal justice system engage in high levels of both risky sexual behavior and alcohol use. Yet a strong relationship between the two constructs has not been consistently observed, possibly due to heterogeneity in the data. Regression mixture models were estimated in the current study to address such potential heterogeneity. Criminally-involved adolescents (n = 409) were clustered into latent classes based on patterns of the regression of two measures of risky sexual behavior, condom use and frequency of intercourse, on alcohol use. A three-class solution emerged where alcohol use did not significantly predict either risky sex outcome for approximately 25% of the sample; alcohol use negatively predicted condom use and positively predicted frequency of intercourse for approximately 38% of participants; and alcohol use negatively predicted condom use but not frequency of intercourse for the remaining participants. These classes were then distinguished on the basis of five covariates previously found to influence either alcohol use, risky sexual behavior, or the relationship between the two: self-esteem, gender, participant age, relationship status, and impulsivity/sensation-seeking. High self-esteem, being female, being older, and being in a relationship predicted membership in the class with no observed relationship of alcohol use to risky sex, relative to the other classes. Implications of the present findings are discussed in terms of exploring different risky sex and alcohol use patterns within criminally involved adolescents, as well as understanding the effectiveness of interventions for subgroups of individuals. PMID:19459047
A History of Regression and Related Model-Fitting in the Earth Sciences (1636?-2000)
Howarth, Richard J.
2001-12-15
its roots in meeting the evident need for improved estimators in spatial interpolation. Technical advances in regression analysis during the 1970s embraced the development of regression diagnostics and consequent attention to outliers; the recognition of problems caused by correlated predictors, and the subsequent introduction of ridge regression to overcome them; and techniques for fitting errors-in-variables and mixture models. Improvements in computational power have enabled ever more computer-intensive methods to be applied. These include algorithms which are robust in the presence of outliers, for example Rousseeuw's 1984 Least Median Squares; nonparametric smoothing methods, such as kernel-functions, splines and Cleveland's 1979 LOcally WEighted Scatterplot Smoother (LOWESS); and the Classification and Regression Tree (CART) technique of Breiman and others in 1984. Despite a continuing improvement in the rate of technology-transfer from the statistical to the earth-science community, despite an abrupt drop to a time-lag of about 10 years following the introduction of digital computers, these more recent developments are only just beginning to penetrate beyond the research community of earth scientists. Examples of applications to problem-solving in the earth sciences are given.
Milly, Paul C.; Dunne, Krista A.
2011-01-01
Hydrologic models often are applied to adjust projections of hydroclimatic change that come from climate models. Such adjustment includes climate-bias correction, spatial refinement ("downscaling"), and consideration of the roles of hydrologic processes that were neglected in the climate model. Described herein is a quantitative analysis of the effects of hydrologic adjustment on the projections of runoff change associated with projected twenty-first-century climate change. In a case study including three climate models and 10 river basins in the contiguous United States, the authors find that relative (i.e., fractional or percentage) runoff change computed with hydrologic adjustment more often than not was less positive (or, equivalently, more negative) than what was projected by the climate models. The dominant contributor to this decrease in runoff was a ubiquitous change in runoff (median -11%) caused by the hydrologic model’s apparent amplification of the climate-model-implied growth in potential evapotranspiration. Analysis suggests that the hydrologic model, on the basis of the empirical, temperature-based modified Jensen–Haise formula, calculates a change in potential evapotranspiration that is typically 3 times the change implied by the climate models, which explicitly track surface energy budgets. In comparison with the amplification of potential evapotranspiration, central tendencies of other contributions from hydrologic adjustment (spatial refinement, climate-bias adjustment, and process refinement) were relatively small. The authors’ findings highlight the need for caution when projecting changes in potential evapotranspiration for use in hydrologic models or drought indices to evaluate climate-change impacts on water.
Watts, Kenneth R.
1995-01-01
regression models. These models also include an autoregressive term to account for serial correlation in the residuals. The adjusted coefficient of determination (Ra^2) for the 46 regression models ranges from 0.08 to 0.89, and the standard errors of estimate range from 0.034 to 2.483 feet. The regression models of monthly water-level change can be used to evaluate whether post-1985 monthly water-level change values at the selected observation wells are within the 95-percent confidence limits of predicted monthly water-level change.
NASA Astrophysics Data System (ADS)
Shu, Yuqin; Lam, Nina S. N.
2011-01-01
Detailed estimates of carbon dioxide emissions at fine spatial scales are critical to both modelers and decision makers dealing with global warming and climate change. Globally, traffic-related emissions of carbon dioxide are growing rapidly. This paper presents a new method based on a multiple linear regression model to disaggregate traffic-related CO2 emission estimates from the parish-level scale to a 1 × 1 km grid scale. Considering the allocation factors (population density, urban area, income, road density) together, we used a correlation and regression analysis to determine the relationship between these factors and traffic-related CO2 emissions, and developed the best-fit model. The method was applied to downscale the traffic-related CO2 emission values by parish (i.e. county) for the State of Louisiana into 1-km² grid cells. Among the four parishes with the highest traffic-related CO2 emissions, the biggest area that has above average CO2 emissions is found in East Baton Rouge, and the smallest area with no CO2 emissions is also in East Baton Rouge, but Orleans has the most CO2 emissions per unit area. The result reveals that high CO2 emissions are concentrated in the dense road networks of urban areas with high population density, and low CO2 emissions are distributed in rural areas with low population density and sparse road networks. The proposed method can be used to identify the emission "hot spots" at fine scale and is considered more accurate and less time-consuming than the previous methods.
Block adjustment of Chang'E-1 images based on rational function model
NASA Astrophysics Data System (ADS)
Liu, Bin; Liu, Yiliang; Di, Kaichang; Sun, Xiliang
2014-05-01
Chang'E-1 (CE-1) is the first lunar orbiter of China's lunar exploration program. The CCD camera carried by CE-1 has acquired stereo images covering the entire lunar surface. Block adjustment and 3D mapping using CE-1 images are of great importance for morphological and other scientific research of the Moon. Traditional block adjustment based on a rigorous sensor model is complicated due to a large number of parameters and possible correlations among them. To tackle this problem, this paper presents a block adjustment method using the Rational Function Model (RFM). The RFM parameters are generated from the rigorous sensor model using a virtual grid of control points. Afterwards, the RFM-based block adjustment solves for the refinement parameters through a least squares solution. Experimental results using CE-1 images located in Sinus Iridum show that the RFM can fit the rigorous sensor model with a high precision of 1% pixel level. Through the RFM-based block adjustment, the back-projection residuals in image space can be reduced from around 1.5 pixels to the sub-pixel level, indicating that the RFM can replace the rigorous sensor model for geometric processing of lunar images.
Modeling Source Water TOC Using Hydroclimate Variables and Local Polynomial Regression.
Samson, Carleigh C; Rajagopalan, Balaji; Summers, R Scott
2016-04-19
To control disinfection byproduct (DBP) formation in drinking water, an understanding of the source water total organic carbon (TOC) concentration variability can be critical. Previously, TOC concentrations in water treatment plant source waters have been modeled using streamflow data. However, the lack of streamflow data or unimpaired flow scenarios makes it difficult to model TOC. In addition, TOC variability under climate change further exacerbates the problem. Here we propose a modeling approach based on local polynomial regression that uses climate (e.g., temperature) and land surface (e.g., soil moisture) variables as predictors of TOC concentration, obviating the need for streamflow. The local polynomial approach has the ability to capture non-Gaussian and nonlinear features that might be present in the relationships. The utility of the methodology is demonstrated using source water quality and climate data in three case study locations with surface source waters including river and reservoir sources. The models show good predictive skill in general at these locations, with lower skill at locations with the most anthropogenic influences in their streams. Source water TOC predictive models can provide water treatment utilities important information for making treatment decisions for DBP regulation compliance under future climate scenarios. PMID:26998784
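The idea of local polynomial regression referred to in this abstract can be sketched for the simplest case: one predictor and a local linear (degree 1) fit. The Gaussian kernel, the fixed bandwidth, the function name `local_linear` and the synthetic data are all assumptions made for illustration; the abstract does not state which kernel, polynomial degree or bandwidth rule the authors used.

```python
import numpy as np

def local_linear(x_train, y_train, x_eval, bandwidth):
    """Local linear regression: a weighted straight-line fit around each x0."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        d = x_train - x0                          # distances to the target point
        w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel weights (assumed)
        design = np.column_stack([np.ones_like(d), d])
        sw = np.sqrt(w)
        # Weighted least squares via ordinary least squares on rescaled rows.
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y_train * sw, rcond=None)
        fitted[i] = coef[0]                       # local intercept = estimate at x0
    return fitted

# Synthetic nonlinear relationship with noise (illustrative only).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, size=200))
y = np.sin(x) + 0.2 * rng.normal(size=x.size)
grid = np.linspace(0.5, 9.5, 10)
print(np.round(local_linear(x, y, grid, bandwidth=0.7), 3))
```

Because the fit is refitted at every evaluation point, no global functional form is imposed, which is what lets the method follow nonlinear relationships; the bandwidth controls the trade-off between smoothness and local detail.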
Richardson, David B.; Laurier, Dominique; Schubauer-Berigan, Mary K.; Tchetgen, Eric Tchetgen; Cole, Stephen R.
2014-01-01
Workers' smoking histories are not measured in many occupational cohort studies. Here we discuss the use of negative control outcomes to detect and adjust for confounding in analyses that lack information on smoking. We clarify the assumptions necessary to detect confounding by smoking and the additional assumptions necessary to indirectly adjust for such bias. We illustrate these methods using data from 2 studies of radiation and lung cancer: the Colorado Plateau cohort study (1950–2005) of underground uranium miners (in which smoking was measured) and a French cohort study (1950–2004) of nuclear industry workers (in which smoking was unmeasured). A cause-specific relative hazards model is proposed for estimation of indirectly adjusted associations. Among the miners, the proposed method suggests no confounding by smoking of the association between radon and lung cancer—a conclusion supported by adjustment for measured smoking. Among the nuclear workers, the proposed method suggests substantial confounding by smoking of the association between radiation and lung cancer. Indirect adjustment for confounding by smoking resulted in an 18% decrease in the adjusted estimated hazard ratio, yet this cannot be verified because smoking was unmeasured. Assumptions underlying this method are described, and a cause-specific proportional hazards model that allows easy implementation using standard software is presented. PMID:25245043
Attar-Schwartz, Shalhevet
2015-09-01
Warm and emotionally close relationships with parents and grandparents have been found in previous studies to be linked with better adolescent adjustment. The present study, informed by Family Systems Theory and Intergenerational Solidarity Theory, uses a moderated mediation model analyzing the contribution of the dynamics of these intergenerational relationships to adolescent adjustment. Specifically, it examines the mediating role of emotional closeness to the closest grandparent in the relationship between emotional closeness to a parent (the offspring of the closest grandparent) and adolescent adjustment difficulties. The model also examines the moderating role of emotional closeness to parents in the relationship between emotional closeness to grandparents and adjustment difficulties. The study was based on a sample of 1,405 Jewish Israeli secondary school students (ages 12-18) who completed a structured questionnaire. It was found that emotional closeness to the closest grandparent was more strongly associated with reduced adjustment difficulties among adolescents with higher levels of emotional closeness to their parents. In addition, the association between adolescent adjustment and emotional closeness to parents was partially mediated by emotional closeness to grandparents. Examining the family conditions under which adolescents' relationships with grandparents are stronger and more beneficial for them can help elucidate variations in grandparent-grandchild ties and expand our understanding of the mechanisms that shape child outcomes. PMID:26237053
ERIC Educational Resources Information Center
Laird, Robert D.; Weems, Carl F.
2011-01-01
Research on informant discrepancies has increasingly utilized difference scores. This article demonstrates the statistical equivalence of regression models using difference scores (raw or standardized) and regression models using separate scores for each informant to show that interpretations should be consistent with both models. First,…
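One standard way to see why a difference-score regression and a separate-scores regression are linked is the algebra below. The symbols are assumptions introduced here for illustration (Y for the outcome, A and B for the two informants' reports, D = A - B for the raw difference score); the truncated abstract does not show the article's own notation or derivation.

```latex
% Difference score alone: a two-predictor model with an equality constraint.
\begin{align*}
Y &= b_0 + b_1 D + e = b_0 + b_1 (A - B) + e \\
  &= b_0 + b_1 A + (-b_1) B + e ,
\end{align*}
% i.e. the model Y = c_0 + c_1 A + c_2 B + e restricted to c_1 = -c_2.

% Difference score plus one informant's score: a full reparametrization.
\begin{align*}
Y &= b_0 + b_1 D + b_2 A + e \\
  &= b_0 + (b_1 + b_2) A + (-b_1) B + e ,
\end{align*}
% so c_1 = b_1 + b_2 and c_2 = -b_1, and both models give identical fitted values.
```

Under these assumptions, a coefficient on the difference score carries no information beyond what the two separate coefficients already express.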
Hu, Qinghua; Zhang, Shiguang; Xie, Zongxia; Mi, Jusheng; Wan, Jie
2014-09-01
Support vector regression (SVR) techniques are aimed at discovering a linear or nonlinear structure hidden in sample data. Most existing regression techniques assume that the error distribution is Gaussian. However, it has been observed that the noise in some real-world applications, such as wind power forecasting and direction-of-arrival estimation, does not follow a Gaussian distribution but rather a beta distribution, a Laplacian distribution, or other models. In these cases the current regression techniques are not optimal. Following the Bayesian approach, we derive a general loss function and develop a uniform model of ν-support vector regression for the general noise model (N-SVR). The Augmented Lagrange Multiplier method is introduced to solve N-SVR. Numerical experiments on artificial data sets, UCI data and short-term wind speed prediction are conducted. The results show the effectiveness of the proposed technique. PMID:24874183
Modeling Quality-Adjusted Life Expectancy Loss Resulting from Tobacco Use in the United States
ERIC Educational Resources Information Center
Kaplan, Robert M.; Anderson, John P.; Kaplan, Cameron M.
2007-01-01
Purpose: To describe the development of a model for estimating the effects of tobacco use upon Quality Adjusted Life Years (QALYs) and to estimate the impact of tobacco use on health outcomes for the United States (US) population using the model. Method: We obtained estimates of tobacco consumption from 6 years of the National Health Interview…
Evaluation of the Stress Adjustment and Adaptation Model among Families Reporting Economic Pressure
ERIC Educational Resources Information Center
Vandsburger, Etty; Biggerstaff, Marilyn A.
2004-01-01
This research evaluates the Stress Adjustment and Adaptation Model (double ABCX model), examining the effects of resiliency resources on family functioning when families experience economic pressure. Families (N = 128) with incomes at or below the poverty line from a rural area of a southern state completed measures of perceived economic pressure,…
A Model of Divorce Adjustment for Use in Family Service Agencies.
ERIC Educational Resources Information Center
Faust, Ruth Griffith
1987-01-01
Presents a combined educationally and therapeutically oriented model of treatment to (1) control and lessen disruptive experiences associated with divorce; (2) enable individuals to improve their skill in coping with adjustment reactions to divorce; and (3) modify the pressures and response of single parenthood. Describes the model's four-session…
ERIC Educational Resources Information Center
Nettles, Saundra Murray; Caughy, Margaret O'Brien; O'Campo, Patricia J.
2008-01-01
Examining recent research on neighborhood influences on child development, this review focuses on social influences on school adjustment in the early elementary years. A model to guide community research and intervention is presented. The components of the model of integrated processes are neighborhoods and their effects on academic outcomes and…
Risk adjustment of Medicare capitation payments using the CMS-HCC model.
Pope, Gregory C; Kautter, John; Ellis, Randall P; Ash, Arlene S; Ayanian, John Z; Lezzoni, Lisa I; Ingber, Melvin J; Levy, Jesse M; Robst, John
2004-01-01
This article describes the CMS hierarchical condition categories (HCC) model implemented in 2004 to adjust Medicare capitation payments to private health care plans for the health expenditure risk of their enrollees. We explain the model's principles, elements, organization, calibration, and performance. Modifications to reduce plan data reporting burden and adaptations for disabled, institutionalized, newly enrolled, and secondary payer subpopulations are discussed. PMID:15493448
Community Influences on Adjustment in First Grade: An Examination of an Integrated Process Model
ERIC Educational Resources Information Center
Caughy, Margaret O'Brien; Nettles, Saundra M.; O'Campo, Patricia J.
2007-01-01
We examined the impact of neighborhood characteristics both directly and indirectly as mediated by parent coaching and the parent/child affective relationship on behavioral and school adjustment in a sample of urban dwelling first graders. We used structural equations modeling to assess model fit and estimate direct, indirect, and total effects of…
Statistical modelling for thoracic surgery using a nomogram based on logistic regression.
Liu, Run-Zhong; Zhao, Ze-Rui; Ng, Calvin S H
2016-08-01
A well-developed clinical nomogram is a popular decision-tool, which can be used to predict the outcome of an individual, bringing benefits to both clinicians and patients. With just a few steps on a user-friendly interface, the approximate clinical outcome of patients can easily be estimated based on their clinical and laboratory characteristics. Therefore, nomograms have recently been developed to predict the different outcomes or even the survival rate at a specific time point for patients with different diseases. However, on the establishment and application of nomograms, there is still a lot of confusion that may mislead researchers. The objective of this paper is to provide a brief introduction on the history, definition, and application of nomograms and then to illustrate simple procedures to develop a nomogram with an example based on a multivariate logistic regression model in thoracic surgery. In addition, validation strategies and common pitfalls have been highlighted. PMID:27621910
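The step from a fitted logistic regression to nomogram points can be sketched as follows. The usual convention, assumed here, is that the predictor with the largest possible effect spans 0 to 100 points and the others are scaled proportionally. The intercept, coefficients and predictor ranges are hypothetical numbers chosen only for illustration; they are not taken from the paper or from any clinical data.

```python
import math

# Hypothetical fitted logistic model (illustrative numbers, not from any study).
intercept = -3.0
coefs = {"age": 0.04, "fev1": -0.8, "smoker": 0.9}
ranges = {"age": (40.0, 80.0), "fev1": (1.0, 4.0), "smoker": (0.0, 1.0)}

def zero_point(name):
    """Predictor value that earns 0 points (the lowest-risk end of its range)."""
    lo, hi = ranges[name]
    return lo if coefs[name] >= 0 else hi

# The predictor with the largest possible effect defines the 100-point scale.
max_span = max(abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs)

def points(name, value):
    """Points read off the nomogram axis for one predictor value."""
    return 100.0 * coefs[name] * (value - zero_point(name)) / max_span

def probability(total_points):
    """Map total points back to the linear predictor, then to a probability."""
    base = intercept + sum(coefs[k] * zero_point(k) for k in coefs)
    lp = base + total_points * max_span / 100.0
    return 1.0 / (1.0 + math.exp(-lp))

patient = {"age": 65.0, "fev1": 2.0, "smoker": 1.0}
total = sum(points(k, v) for k, v in patient.items())
p_nomogram = probability(total)

# Cross-check against the logistic model evaluated directly.
lp_direct = intercept + sum(coefs[k] * patient[k] for k in coefs)
p_direct = 1.0 / (1.0 + math.exp(-lp_direct))
print(round(total, 1), round(p_nomogram, 4), round(p_direct, 4))
```

The cross-check at the end shows that the point scale is only a rescaling of the linear predictor, so a nomogram read-out and the underlying model give the same predicted probability.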
Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict
NASA Astrophysics Data System (ADS)
Ismail, Mohd Tahir; Alias, Siti Nor Shadila
2014-07-01
Malaysia has faced drug addiction issues for many years. The most serious problem is relapse among treated drug addicts (drug addicts who have undergone the rehabilitation programme at the Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factors that contribute to relapse. Binary logistic regression analysis was employed to model the relationship between the independent variables (predictors) and the dependent variable. The dependent variable is the relapse status of the drug addict, either relapse (Yes, coded as 1) or not (No, coded as 0). The predictors are age, age at first taking drugs, family history, education level, family crisis, community support and self motivation. The sample comprises 200 cases, with data provided by AADK (National Antidrug Agency). The findings of the study revealed that age and self motivation are statistically significant predictors of relapse.
Evaluation of Land Use Regression Models for Nitrogen Dioxide and Benzene in Four US Cities
Mukerjee, Shaibal; Smith, Luther; Neas, Lucas; Norris, Gary
2012-01-01
Spatial analysis studies have included the application of land use regression models (LURs) for health and air quality assessments. Recent LUR studies have collected nitrogen dioxide (NO2) and volatile organic compounds (VOCs) using passive samplers at urban air monitoring networks in El Paso and Dallas, TX, Detroit, MI, and Cleveland, OH to assess spatial variability and source influences. LURs were successfully developed to estimate pollutant concentrations throughout the study areas. Comparisons of development and predictive capabilities of LURs from these four cities are presented to address this issue of uniform application of LURs across study areas. Traffic and other urban variables were important predictors in the LURs although city-specific influences (such as border crossings) were also important. In addition, transferability of variables or LURs from one city to another may be problematic due to intercity differences and data availability or comparability. Thus, developing common predictors in future LURs may be difficult. PMID:23226985
Chuah, Candy; Jones, Malcolm K; McManus, Donald P; Nawaratna, Sujeevi K; Burke, Melissa L; Owen, Helen C; Ramm, Grant A; Gobert, Geoffrey N
2016-04-01
For hepatic schistosomiasis the egg-induced granulomatous response and the development of extensive fibrosis are the main pathologies. We used a Schistosoma japonicum-infected mouse model to characterise the multi-cellular pathways associated with the recovery from hepatic fibrosis following clearance of the infection with the anti-schistosomal drug, praziquantel. In the recovering liver splenomegaly, granuloma density and liver fibrosis were all reduced. Inflammatory cell infiltration into the liver was evident, and the numbers of neutrophils, eosinophils and macrophages were significantly decreased. Transcriptomic analysis revealed the up-regulation of fatty acid metabolism genes and the identification of Peroxisome proliferator activated receptor alpha as the upstream regulator of liver recovery. The aryl hydrocarbon receptor signalling pathway which regulates xenobiotic metabolism was also differentially up-regulated. These findings provide a better understanding of the mechanisms associated with the regression of hepatic schistosomiasis. PMID:26812024
Fienen, Michael N.; Selbig, William R.
2012-01-01
A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.
Giacomo, Della Riccia; Stefania, Del Zotto
2013-12-15
Fumonisins are mycotoxins produced by Fusarium species that commonly live in maize. Whereas the fungi damage plants, fumonisins cause disease in both cattle and human beings. Legal limits set the tolerable daily intake of fumonisins for several maize-based feeds and foods. Chemical techniques give the most reliable and accurate measurements, but they are expensive and time consuming. A method based on near-infrared spectroscopy and multivariate statistical regression is described as a simpler, cheaper and faster alternative. We apply Partial Least Squares with full cross validation. Two models are described, having high correlation of calibration (0.995, 0.998) and of validation (0.908, 0.909), respectively. The description of the observed phenomenon is accurate and overfitting is avoided. Screening of contaminated maize with respect to the European legal limit of 4 mg/kg should be assured. PMID:23993617
Domain selection for the varying coefficient model via local polynomial regression
Kong, Dehan; Bondell, Howard; Wu, Yichao
2014-01-01
In this article, we consider the varying coefficient model, which allows the relationship between the predictors and response to vary across the domain of interest, such as time. In applications, it is possible that certain predictors only affect the response in particular regions and not everywhere. This corresponds to identifying the domain where the varying coefficient is nonzero. Towards this goal, local polynomial smoothing and penalized regression are incorporated into one framework. Asymptotic properties of our penalized estimators are provided. Specifically, the estimators enjoy the oracle properties in the sense that they have the same bias and asymptotic variance as the local polynomial estimators would have if the sparsity were known a priori. The choice of an appropriate bandwidth and the computational algorithms are discussed. The proposed method is examined via simulations and a real data example. PMID:25506112
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The
NASA Astrophysics Data System (ADS)
Kwon, Y.
2013-12-01
As evidence of global warming continues to increase, being able to predict forest response to climate changes, such as the expected rise in temperature and precipitation, will be vital for maintaining the sustainability and productivity of forests. Mapping forest species redistribution under climate change scenarios has been successful; however, most species redistribution maps lack the mechanistic understanding needed to explain why trees grow under the novel conditions of a changing climate. A distributional map is only capable of prediction under the equilibrium assumption that the communities would exist following a prolonged period under the new climate. In this context, forest NPP, as a surrogate for growth rate, the most important facet determining stand dynamics, can lead to valid predictions of the transition stage to a new vegetation-climate equilibrium, as it represents changes in forest structure reflecting site conditions and climate factors. The objective of this study is to develop a forest growth map using regression tree analysis by extracting large-scale non-linear structures from both field-based FIA and remotely sensed MODIS data sets. The major issue addressed in this approach is the non-linear spatial patterns of forest attributes. Forest inventory data showed complex spatial patterns that reflect environmental states and processes originating at different spatial scales. At broad scales, non-linear spatial trends in forest attributes and a mixture of continuous and discrete types of environmental variables make traditional statistical (multivariate regression) and geostatistical (kriging) models inefficient. This calls into question some traditional underlying assumptions about spatial trends that are uncritically accepted in forest data. To resolve the controversy surrounding the suitability of forest data, regression tree analyses are performed using the software See5 and Cubist. Four publicly available data sets were obtained: First, field-based Forest Inventory and Analysis (USDA
Regression Splines in the Time-Dependent Coefficient Rates Model for Recurrent Event Data
Amorim, Leila D.; Cai, Jianwen; Zeng, Donglin; Barreto, Maurício L.
2009-01-01
Many epidemiologic studies involve the occurrence of recurrent events, and much attention has been given to the development of modelling techniques that take into account the dependence structure of multiple event data. This paper presents a time-dependent coefficient rates model that incorporates regression splines in its estimation procedure. Such a method would be appropriate in situations where the effect of an exposure or covariates changes over time in recurrent event data settings. The finite sample properties of the estimators are studied via simulation. Using data from a randomized community trial that was designed to evaluate the effect of vitamin A supplementation on recurrent diarrheal episodes in small children, we model the functional form of the treatment effect on the time to the occurrence of diarrhea. The results describe how this effect varies over time. In summary, we observed a major impact of the vitamin A supplementation on diarrhea after 2 months of the dosage, with the effect diminishing after the third dosage. The proposed method can be viewed as a flexible alternative to the marginal rates model with constant effect in situations where the effect of interest may vary over time. PMID:18696748
NASA Astrophysics Data System (ADS)
Rajab, Jasim Mohammed; Jafri, Mohd. Zubir Mat; Lim, Hwee San; Abdullah, Khiruddin
2012-10-01
This study encompasses air surface temperature (AST) modeling in the lower atmosphere. A dataset of four atmospheric pollutant gases (CO, O3, CH4, and H2O), retrieved from the National Aeronautics and Space Administration Atmospheric Infrared Sounder (AIRS) for 2003 to 2008, was employed to develop a model to predict AST values over the Malaysian peninsula using the multiple regression method. For the entire period, the pollutants were highly correlated (R=0.821) with predicted AST. Comparisons among five stations in 2009 showed close agreement between the predicted AST and the observed AST from AIRS, especially in the southwest monsoon (SWM) season, within 1.3 K, and for in situ data, within 1 to 2 K. The validation of predicted AST against AST from AIRS showed a high correlation coefficient (R=0.845 to 0.918), indicating the model's efficiency and accuracy. Statistical analysis in terms of β showed that H2O (0.565 to 1.746) tended to contribute significantly to high AST values during the northeast monsoon season. Generally, these results clearly indicate the advantage of using the satellite AIRS data and a correlation analysis study to investigate the impact of atmospheric greenhouse gases on AST over the Malaysian peninsula. A model was developed that is capable of retrieving AST over the Malaysian peninsula in all weather conditions, with total uncertainties ranging between 1 and 2 K.
A Bayesian proportional hazards regression model with non-ignorably missing time-varying covariates
Bradshaw, Patrick T.; Ibrahim, Joseph G.; Gammon, Marilie D.
2010-01-01
Missing covariate data is common in observational studies of time to an event, especially when covariates are repeatedly measured over time. Failure to account for the missing data can lead to bias or loss of efficiency, especially when the data are non-ignorably missing. Previous work has focused on the case of fixed covariates rather than those that are repeatedly measured over the follow-up period, so here we present a selection model that allows for proportional hazards regression with time-varying covariates when some covariates may be non-ignorably missing. We develop a fully Bayesian model and obtain posterior estimates of the parameters via the Gibbs sampler in WinBUGS. We illustrate our model with an analysis of post-diagnosis weight change and survival after breast cancer diagnosis in the Long Island Breast Cancer Study Project (LIBCSP) follow-up study. Our results indicate that post-diagnosis weight gain is associated with lower all-cause and breast cancer specific survival among women diagnosed with new primary breast cancer. Our sensitivity analysis showed only slight differences between models with different assumptions on the missing data mechanism yet the complete case analysis yielded markedly different results. PMID:20960582
An adaptive regression mixture model for fMRI cluster analysis.
Oikonomou, Vangelis P; Blekas, Konstantinos
2013-04-01
Functional magnetic resonance imaging (fMRI) has become one of the most important techniques for studying the human brain in action. A common problem in fMRI analysis is the detection of activated brain regions in response to an experimental task. In this work we propose a novel clustering approach for addressing this issue using an adaptive regression mixture model. The main contribution of our method is the employment of both spatial and sparse properties over the body of the mixture model. Thus, the clustering approach is converted into a maximum a posteriori estimation approach, where the expectation-maximization algorithm is applied for model training. Special care is also given to estimating the kernel scalar parameter per cluster of the design matrix by presenting a multi-kernel scheme. In addition, an incremental training procedure is presented so as to make the approach independent of the initialization of the model parameters. The latter also allows us to introduce an efficient stopping criterion of the process for determining the optimum brain activation area. To assess the effectiveness of our method, we have conducted experiments with simulated and real fMRI data, where we have demonstrated its ability to produce improved performance and functional activation detection capabilities. PMID:23047865
NASA Astrophysics Data System (ADS)
Zhao, Hong; Li, Changjun; Li, Hongping; Lv, Kebo; Zhao, Qinghui
2016-06-01
Sea surface salinity (SSS) is a key parameter in monitoring ocean states. Observing SSS can promote the understanding of the global water cycle. This paper provides a new approach for retrieving sea surface salinity from Soil Moisture and Ocean Salinity (SMOS) satellite data. Based on the principal component regression (PCR) model, SSS can also be retrieved from the brightness temperature data of SMOS L2 measurements and auxiliary data. Twenty-six pairs of matchup data are used in model validation for the South China Sea (in the area of 4°-25°N, 105°-125°E). The RMSE of the SSS retrieved by the PCR model reaches 0.37 psu (practical salinity units), whereas the RMSE of SMOS SSS1 is 1.65 psu when compared with in-situ SSS. The corresponding Argo daily salinity data from April to June 2013 are also used in our validation, with an RMSE of 0.46 psu compared to 1.82 psu for daily averaged SMOS L2 products. This indicates that the PCR model is valid and may provide a good approach for retrieving SSS from SMOS satellite data.
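As a rough illustration of the principal component regression idea named in the abstract above, the following Python sketch regresses a response on the leading principal components of centered predictors. The function name `pcr_fit`, the toy data, and the use of NumPy are assumptions made for illustration only; the abstract does not describe the authors' implementation or which SMOS variables enter the regression.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression sketch; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                        # center the predictors
    # Rows of Vt are the principal directions of the centered predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                # (p, k) loadings of the kept components
    scores = Xc @ V                        # (n, k) component scores
    # Ordinary least squares of the centered response on the scores.
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                       # map back to the original predictors
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Hypothetical toy data: three predictors and a noise-free linear response.
X = np.array([[1, 2, 0], [2, 1, 1], [3, 5, 2],
              [4, 3, 5], [5, 8, 3], [6, 4, 7]], dtype=float)
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2]

# Keeping all three components reduces PCR to ordinary least squares.
intercept, beta = pcr_fit(X, y, n_components=3)
```

Keeping fewer components than predictors trades some bias for stability when the predictors are strongly correlated, which is the usual motivation for PCR.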
Optimization of end-members used in multiple linear regression geochemical mixing models
NASA Astrophysics Data System (ADS)
Dunlea, Ann G.; Murray, Richard W.
2015-11-01
Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).
Smith, Matthew I.; de Lusignan, Simon; Mullett, David; Correa, Ana; Tickner, Jermaine; Jones, Simon
2016-01-01
Introduction: Falls are the leading cause of injury in older people. Reducing falls could reduce financial pressures on health services. We carried out this research to develop a falls risk model, using routine primary care and hospital data to identify those at risk of falls, and to apply a cost analysis to enable commissioners of health services to identify those in whom savings can be made through referral to a falls prevention service. Methods: Multilevel logistic regression was performed on routinely collected general practice and hospital data from 74,751 people aged over 65 to produce a risk model for falls. Validation measures were carried out. A cost analysis was performed to identify at which level of risk it would be cost-effective to refer patients to a falls prevention service. 95% confidence intervals were calculated using a Monte Carlo Model (MCM), allowing us to adjust for uncertainty in the estimates of these variables. Results: A risk model for falls was produced with an area under the receiver operating characteristic curve of 0.87. The risk cut-off with the highest combination of sensitivity and specificity was at p = 0.07 (sensitivity of 81% and specificity of 78%). The risk cut-off at which savings outweigh costs was p = 0.27 and the risk cut-off with the maximum savings was p = 0.53, which would result in referral of 1.8% and 0.45% of the over-65 population, respectively. Above a risk cut-off of p = 0.27, costs do not exceed savings. Conclusions: This model is the best performing falls predictive tool developed to date; it has been developed on a large UK city population; can be readily run from routine data; and can be implemented in a way that optimises the use of health service resources. Commissioners of health services should use this model to flag and refer patients at risk to their falls service and save resources. PMID:27448280
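The idea of choosing the risk cut-off with the highest combined sensitivity and specificity, as reported in the abstract above, can be sketched in a few lines of Python. The function names, the candidate cut-offs, and the toy probabilities and labels below are hypothetical and are not taken from the study; the sketch simply maximizes sensitivity plus specificity over a list of candidates.

```python
def sensitivity_specificity(probs, labels, cutoff):
    """Flag a case as high risk when its predicted probability is >= cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(probs, labels, candidates):
    """Return the candidate cut-off with the largest sensitivity + specificity."""
    def score(cutoff):
        sens, spec = sensitivity_specificity(probs, labels, cutoff)
        return sens + spec
    return max(candidates, key=score)

# Hypothetical predicted fall risks and observed outcomes (1 = fell).
probs = [0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.30, 0.60]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

cutoff = best_cutoff(probs, labels, [0.05, 0.07, 0.20, 0.50])
sens, spec = sensitivity_specificity(probs, labels, cutoff)
```

A cost-based cut-off, such as the savings thresholds mentioned in the abstract, would replace the `score` function with an expected net saving; that calculation depends on cost inputs the abstract does not give.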
Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes
Faranda, Davide; Dubrulle, Bérengère; Daviaud, François; Pons, Flavio Maria Emanuele; Saint-Michel, Brice; Herbert, Éric; Cortet, Pierre-Philippe
2014-10-15
We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the Υ is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system.
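For readers unfamiliar with ARMA processes, the short Python sketch below generates an ARMA(1,1) series from a given noise sequence. It only illustrates the recursion that defines such a process; it is not the fitting procedure or the index Υ used in the paper, and the function name `arma11` and the parameter values are assumptions chosen for illustration.

```python
import random

def arma11(phi, theta, noise):
    """Generate x_t = phi * x_{t-1} + e_t + theta * e_{t-1}.

    The recursion starts from x_{-1} = 0 and e_{-1} = 0.
    """
    series = []
    prev_x = 0.0
    prev_e = 0.0
    for e in noise:
        x = phi * prev_x + e + theta * prev_e
        series.append(x)
        prev_x, prev_e = x, e
    return series

# Response to a single unit shock: shows how the autoregressive term (phi)
# makes the effect of the shock decay geometrically.
impulse = arma11(0.5, 0.2, [1.0, 0.0, 0.0, 0.0])

# A longer series driven by Gaussian white noise (seeded for repeatability).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
series = arma11(0.5, 0.2, noise)
```

In practice the orders and coefficients of the ARMA model would be estimated from the measured time series rather than fixed in advance as they are here.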
Brakebill, JW; Wolock, DM; Terziotti, SE
2011-01-01
Digital hydrologic networks depicting surface-water pathways and their associated drainage catchments provide a key component to hydrologic analysis and modeling. Collectively, they form common spatial units that can be used to frame the descriptions of aquatic and watershed processes. In addition, they provide the ability to simulate and route the movement of water and associated constituents throughout the landscape. Digital hydrologic networks have evolved from derivatives of mapping products to detailed, interconnected, spatially referenced networks of water pathways, drainage areas, and stream and watershed characteristics. These properties are important because they enhance the ability to spatially evaluate factors that affect the sources and transport of water-quality constituents at various scales. SPAtially Referenced Regressions On Watershed attributes (SPARROW), a process-based/statistical model, relies on a digital hydrologic network in order to establish relations between quantities of monitored contaminant flux, contaminant sources, and the associated physical characteristics affecting contaminant transport. Digital hydrologic networks modified from the River Reach File (RF1) and National Hydrography Dataset (NHD) geospatial datasets provided frameworks for SPARROW in six regions of the conterminous United States. In addition, characteristics of the modified RF1 were used to update estimates of mean-annual streamflow. This produced more current flow estimates for use in SPARROW modeling. PMID:22457575
NASA Astrophysics Data System (ADS)
Snedden, Gregg A.; Steyer, Gregory D.
2013-02-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007-Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
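The weighted kappa statistic mentioned in the abstract above is computed from a confusion matrix of predicted versus observed classes. The Python sketch below shows one common form of the calculation. The linear disagreement weights and the assumption that the classes are ordered (for example, along the salinity gradient) are choices made here for illustration, since the abstract does not state the weighting scheme; the function name and the toy matrices are likewise hypothetical.

```python
def weighted_kappa(confusion):
    """Linearly weighted kappa for a square confusion matrix (list of lists).

    Rows are observed classes, columns are predicted classes, and the
    classes are assumed to be listed in a meaningful order.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0   # weighted observed disagreement
    expected = 0.0   # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)   # 0 on the diagonal, 1 at the extremes
            observed += weight * confusion[i][j] / n
            expected += weight * row_tot[i] * col_tot[j] / (n * n)
    return 1.0 - observed / expected

# Perfect agreement gives kappa = 1.
perfect = weighted_kappa([[5, 0, 0], [0, 5, 0], [0, 0, 5]])

# A small two-class example with some misclassification.
partial = weighted_kappa([[2, 1], [1, 2]])
```

With only two classes the linear weights reduce to ordinary (unweighted) Cohen's kappa, so the second example can be checked by hand.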
Kernel-based logistic regression model for protein sequence without vectorialization.
Fong, Youyi; Datta, Saheli; Georgiev, Ivelin S; Kwong, Peter D; Tomaras, Georgia D
2015-07-01
Protein sequence data arise more and more often in vaccine and infectious disease research. These types of data are discrete, high-dimensional, and complex. We propose to study the impact of protein sequences on binary outcomes using a kernel-based logistic regression model, which models the effect of protein through a random effect whose variance-covariance matrix is mostly determined by a kernel function. We propose a novel, biologically motivated, profile hidden Markov model (HMM)-based mutual information (MI) kernel. Hypothesis testing can be carried out using the maximum of the score statistics and a parametric bootstrap procedure. To improve the power of testing, we propose intuitive modifications to the test statistic. We show through simulation studies that the profile HMM-based MI kernel can be substantially more powerful than competing kernels, and that the modified test statistics bring incremental gains in power. We use these proposed methods to investigate two problems from HIV-1 vaccine research: (1) identifying segments of HIV-1 envelope (Env) protein that confer resistance to neutralizing antibody and (2) identifying segments of Env that are associated with attenuation of protective vaccine effect by antibodies of isotype A in the RV144 vaccine trial. PMID:25532524
NASA Astrophysics Data System (ADS)
Winahju, W. S.; Mukarromah, A.; Putri, S.
2015-03-01
Leprosy is a chronic infectious disease caused by the bacterium Mycobacterium leprae. Leprosy is an important concern in Indonesia because its morbidity is quite high. According to WHO data from 2014, in 2012 Indonesia had the highest number of new leprosy patients after India and Brazil, contributing 18,994 people (8.7% of the world total). This number makes Indonesia the country with the highest leprosy morbidity among ASEAN countries. The province that contributes most to the number of leprosy patients in Indonesia is East Java. There are two kinds of leprosy: paucibacillary and multibacillary. The morbidity of multibacillary leprosy is higher than that of paucibacillary leprosy. This paper discusses modeling the numbers of multibacillary and paucibacillary leprosy patients jointly as response variables. These responses are count variables, so modeling is conducted using the bivariate Poisson regression method. The units of observation are in East Java, and the predictors involved are environment, demography, and poverty. The model uses data from 2012, and the results indicate that all predictors have a significant influence.