Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, Anne B.; Lizarraga, Joy S.
1996-01-01
Statistical operations termed model-adjustment procedures can be used to incorporate local data into existing regression modes to improve the predication of urban-runoff quality. Each procedure is a form of regression analysis in which the local data base is used as a calibration data set; the resulting adjusted regression models can then be used to predict storm-runoff quality at unmonitored sites. Statistical tests of the calibration data set guide selection among proposed procedures.
ERIC Educational Resources Information Center
Olejnik, Stephen; Mills, Jamie; Keselman, Harvey
2000-01-01
Evaluated the use of Mallow's C(p) and Wherry's adjusted R squared (R. Wherry, 1931) statistics to select a final model from a pool of model solutions using computer generated data. Neither statistic identified the underlying regression model any better than, and usually less well than, the stepwise selection method, which itself was poor for…
Lidauer, M H; Emmerling, R; Mäntysaari, E A
2008-06-01
A multiplicative random regression (M-RRM) test-day (TD) model was used to analyse daily milk yields from all available parities of German and Austrian Simmental dairy cattle. The method to account for heterogeneous variance (HV) was based on the multiplicative mixed model approach of Meuwissen. The variance model for the heterogeneity parameters included a fixed region x year x month x parity effect and a random herd x test-month effect with a within-herd first-order autocorrelation between test-months. Acceleration of variance model solutions after each multiplicative model cycle enabled fast convergence of adjustment factors and reduced total computing time significantly. Maximum Likelihood estimation of within-strata residual variances was enhanced by inclusion of approximated information on loss in degrees of freedom due to estimation of location parameters. This improved heterogeneity estimates for very small herds. The multiplicative model was compared with a model that assumed homogeneous variance. Re-estimated genetic variances, based on Mendelian sampling deviations, were homogeneous for the M-RRM TD model but heterogeneous for the homogeneous random regression TD model. Accounting for HV had large effect on cow ranking but moderate effect on bull ranking.
Barks, C.S.
1995-01-01
Storm-runoff water-quality data were used to verify and, when appropriate, adjust regional regression models previously developed to estimate urban storm- runoff loads and mean concentrations in Little Rock, Arkansas. Data collected at 5 representative sites during 22 storms from June 1992 through January 1994 compose the Little Rock data base. Comparison of observed values (0) of storm-runoff loads and mean concentrations to the predicted values (Pu) from the regional regression models for nine constituents (chemical oxygen demand, suspended solids, total nitrogen, total ammonia plus organic nitrogen as nitrogen, total phosphorus, dissolved phosphorus, total recoverable copper, total recoverable lead, and total recoverable zinc) shows large prediction errors ranging from 63 to several thousand percent. Prediction errors for six of the regional regression models are less than 100 percent, and can be considered reasonable for water-quality models. Differences between 0 and Pu are due to variability in the Little Rock data base and error in the regional models. Where applicable, a model adjustment procedure (termed MAP-R-P) based upon regression with 0 against Pu was applied to improve predictive accuracy. For 11 of the 18 regional water-quality models, 0 and Pu are significantly correlated, that is much of the variation in 0 is explained by the regional models. Five of these 11 regional models consistently overestimate O; therefore, MAP-R-P can be used to provide a better estimate. For the remaining seven regional models, 0 and Pu are not significanfly correlated, thus neither the unadjusted regional models nor the MAP-R-P is appropriate. A simple estimator, such as the mean of the observed values may be used if the regression models are not appropriate. Standard error of estimate of the adjusted models ranges from 48 to 130 percent. Calibration results may be biased due to the limited data set sizes in the Little Rock data base. The relatively large values of
Li, Li; Brumback, Babette A; Weppelmann, Thomas A; Morris, J Glenn; Ali, Afsar
2016-08-15
Motivated by an investigation of the effect of surface water temperature on the presence of Vibrio cholerae in water samples collected from different fixed surface water monitoring sites in Haiti in different months, we investigated methods to adjust for unmeasured confounding due to either of the two crossed factors site and month. In the process, we extended previous methods that adjust for unmeasured confounding due to one nesting factor (such as site, which nests the water samples from different months) to the case of two crossed factors. First, we developed a conditional pseudolikelihood estimator that eliminates fixed effects for the levels of each of the crossed factors from the estimating equation. Using the theory of U-Statistics for independent but non-identically distributed vectors, we show that our estimator is consistent and asymptotically normal, but that its variance depends on the nuisance parameters and thus cannot be easily estimated. Consequently, we apply our estimator in conjunction with a permutation test, and we investigate use of the pigeonhole bootstrap and the jackknife for constructing confidence intervals. We also incorporate our estimator into a diagnostic test for a logistic mixed model with crossed random effects and no unmeasured confounding. For comparison, we investigate between-within models extended to two crossed factors. These generalized linear mixed models include covariate means for each level of each factor in order to adjust for the unmeasured confounding. We conduct simulation studies, and we apply the methods to the Haitian data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26892025
Hoos, Anne B.; Patel, Anant R.
1996-01-01
Model-adjustment procedures were applied to the combined data bases of storm-runoff quality for Chattanooga, Knoxville, and Nashville, Tennessee, to improve predictive accuracy for storm-runoff quality for urban watersheds in these three cities and throughout Middle and East Tennessee. Data for 45 storms at 15 different sites (five sites in each city) constitute the data base. Comparison of observed values of storm-runoff load and event-mean concentration to the predicted values from the regional regression models for 10 constituents shows prediction errors, as large as 806,000 percent. Model-adjustment procedures, which combine the regional model predictions with local data, are applied to improve predictive accuracy. Standard error of estimate after model adjustment ranges from 67 to 322 percent. Calibration results may be biased due to sampling error in the Tennessee data base. The relatively large values of standard error of estimate for some of the constituent models, although representing significant reduction (at least 50 percent) in prediction error compared to estimation with unadjusted regional models, may be unacceptable for some applications. The user may wish to collect additional local data for these constituents and repeat the analysis, or calibrate an independent local regression model.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Kjelstrom, L.C.
1995-01-01
Previously developed U.S. Geological Survey regional regression models of runoff and 11 chemical constituents were evaluated to assess their suitability for use in urban areas in Boise and Garden City. Data collected in the study area were used to develop adjusted regional models of storm-runoff volumes and mean concentrations and loads of chemical oxygen demand, dissolved and suspended solids, total nitrogen and total ammonia plus organic nitrogen as nitrogen, total and dissolved phosphorus, and total recoverable cadmium, copper, lead, and zinc. Explanatory variables used in these models were drainage area, impervious area, land-use information, and precipitation data. Mean annual runoff volume and loads at the five outfalls were estimated from 904 individual storms during 1976 through 1993. Two methods were used to compute individual storm loads. The first method used adjusted regional models of storm loads and the second used adjusted regional models for mean concentration and runoff volume. For large storms, the first method seemed to produce excessively high loads for some constituents and the second method provided more reliable results for all constituents except suspended solids. The first method provided more reliable results for large storms for suspended solids.
ERIC Educational Resources Information Center
Tipton, Elizabeth; Pustejovsky, James E.
2015-01-01
Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…
Estimation of adjusted rate differences using additive negative binomial regression.
Donoghoe, Mark W; Marschner, Ian C
2016-08-15
Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation-conditional maximisation-either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27073156
Ridge Regression for Interactive Models.
ERIC Educational Resources Information Center
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
Interquantile Shrinkage in Regression Models
Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.
2012-01-01
Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
Assessing Longitudinal Change: Adjustment for Regression to the Mean Effects
ERIC Educational Resources Information Center
Rocconi, Louis M.; Ethington, Corinna A.
2009-01-01
Pascarella (J Coll Stud Dev 47:508-520, 2006) has called for an increase in use of longitudinal data with pretest-posttest design when studying effects on college students. However, such designs that use multiple measures to document change are vulnerable to an important threat to internal validity, regression to the mean. Herein, we discuss a…
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The application show some results of recent research projects in medicine and business administration.
Adjustment of regional regression equations for urban storm-runoff quality using at-site data
Barks, C.S.
1996-01-01
Regional regression equations have been developed to estimate urban storm-runoff loads and mean concentrations using a national data base. Four statistical methods using at-site data to adjust the regional equation predictions were developed to provide better local estimates. The four adjustment procedures are a single-factor adjustment, a regression of the observed data against the predicted values, a regression of the observed values against the predicted values and additional local independent variables, and a weighted combination of a local regression with the regional prediction. Data collected at five representative storm-runoff sites during 22 storms in Little Rock, Arkansas, were used to verify, and, when appropriate, adjust the regional regression equation predictions. Comparison of observed values of stormrunoff loads and mean concentrations to the predicted values from the regional regression equations for nine constituents (chemical oxygen demand, suspended solids, total nitrogen as N, total ammonia plus organic nitrogen as N, total phosphorus as P, dissolved phosphorus as P, total recoverable copper, total recoverable lead, and total recoverable zinc) showed large prediction errors ranging from 63 percent to more than several thousand percent. Prediction errors for 6 of the 18 regional regression equations were less than 100 percent and could be considered reasonable for water-quality prediction equations. The regression adjustment procedure was used to adjust five of the regional equation predictions to improve the predictive accuracy. For seven of the regional equations the observed and the predicted values are not significantly correlated. Thus neither the unadjusted regional equations nor any of the adjustments were appropriate. The mean of the observed values was used as a simple estimator when the regional equation predictions and adjusted predictions were not appropriate.
Building Regression Models: The Importance of Graphics.
ERIC Educational Resources Information Center
Dunn, Richard
1989-01-01
Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)
Computing measures of explained variation for logistic regression models.
Mittlböck, M; Schemper, M
1999-01-01
The proportion of explained variation (R2) is frequently used in the general linear model but in logistic regression no standard definition of R2 exists. We present a SAS macro which calculates two R2-measures based on Pearson and on deviance residuals for logistic regression. Also, adjusted versions for both measures are given, which should prevent the inflation of R2 in small samples. PMID:10195643
Process modeling with the regression network.
van der Walt, T; Barnard, E; van Deventer, J
1995-01-01
A new connectionist network topology called the regression network is proposed. The structural and underlying mathematical features of the regression network are investigated. Emphasis is placed on the intricacies of the optimization process for the regression network and some measures to alleviate these difficulties of optimization are proposed and investigated. The ability of the regression network algorithm to perform either nonparametric or parametric optimization, as well as a combination of both, is also highlighted. It is further shown how the regression network can be used to model systems which are poorly understood on the basis of sparse data. A semi-empirical regression network model is developed for a metallurgical processing operation (a hydrocyclone classifier) by building mechanistic knowledge into the connectionist structure of the regression network model. Poorly understood aspects of the process are provided for by use of nonparametric regions within the structure of the semi-empirical connectionist model. The performance of the regression network model is compared to the corresponding generalization performance results obtained by some other nonparametric regression techniques.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R(2)) indicates the importance of independent variables in the outcome.
NASA Astrophysics Data System (ADS)
Darnah
2016-04-01
Poisson regression has been used if the response variable is count data that based on the Poisson distribution. The Poisson distribution assumed equal dispersion. In fact, a situation where count data are over dispersion or under dispersion so that Poisson regression inappropriate because it may underestimate the standard errors and overstate the significance of the regression parameters, and consequently, giving misleading inference about the regression parameters. This paper suggests the generalized Poisson regression model to handling over dispersion and under dispersion on the Poisson regression model. The Poisson regression model and generalized Poisson regression model will be applied the number of filariasis cases in East Java. Based regression Poisson model the factors influence of filariasis are the percentage of families who don't behave clean and healthy living and the percentage of families who don't have a healthy house. The Poisson regression model occurs over dispersion so that we using generalized Poisson regression. The best generalized Poisson regression model showing the factor influence of filariasis is percentage of families who don't have healthy house. Interpretation of result the model is each additional 1 percentage of families who don't have healthy house will add 1 people filariasis patient.
An empirical evaluation of spatial regression models
NASA Astrophysics Data System (ADS)
Gao, Xiaolu; Asami, Yasushi; Chung, Chang-Jo F.
2006-10-01
Conventional statistical methods are often ineffective to evaluate spatial regression models. One reason is that spatial regression models usually have more parameters or smaller sample sizes than a simple model, so their degree of freedom is reduced. Thus, it is often unlikely to evaluate them based on traditional tests. Another reason, which is theoretically associated with statistical methods, is that statistical criteria are crucially dependent on such assumptions as normality, independence, and homogeneity. This may create problems because the assumptions are open for testing. In view of these problems, this paper proposes an alternative empirical evaluation method. To illustrate the idea, a few hedonic regression models for a house and land price data set are evaluated, including a simple, ordinary linear regression model and three spatial models. Their performance as to how well the price of the house and land can be predicted is examined. With a cross-validation technique, the prices at each sample point are predicted with a model estimated with the samples excluding the one being concerned. Then, empirical criteria are established whereby the predicted prices are compared with the real, observed prices. The proposed method provides an objective guidance for the selection of a suitable model specification for a data set. Moreover, the method is seen as an alternative way to test the significance of the spatial relationships being concerned in spatial regression models.
A Spline Regression Model for Latent Variables
ERIC Educational Resources Information Center
Harring, Jeffrey R.
2014-01-01
Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…
Quality Reporting of Multivariable Regression Models in Observational Studies
Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M.
2016-01-01
Abstract Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE. Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0–30.3) of the articles and 18.5% (95% CI: 14.8–22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature. PMID:27196467
Bootstrap inference longitudinal semiparametric regression model
NASA Astrophysics Data System (ADS)
Pane, Rahmawati; Otok, Bambang Widjanarko; Zain, Ismaini; Budiantara, I. Nyoman
2016-02-01
Semiparametric regression contains two components, i.e. parametric and nonparametric component. Semiparametric regression model is represented by yt i=μ (x˜'ti,zt i)+εt i where μ (x˜'ti,zt i)=x˜'tiβ ˜+g (zt i) and yti is response variable. It is assumed to have a linear relationship with the predictor variables x˜'ti=(x1 i 1,x2 i 2,…,xT i r) . Random error εti, i = 1, …, n, t = 1, …, T is normally distributed with zero mean and variance σ2 and g(zti) is a nonparametric component. The results of this study showed that the PLS approach on longitudinal semiparametric regression models obtain estimators β˜^t=[X'H(λ)X]-1X'H(λ )y ˜ and g˜^λ(z )=M (λ )y ˜ . The result also show that bootstrap was valid on longitudinal semiparametric regression model with g^λ(b )(z ) as nonparametric component estimator.
VanderWeele, Tyler J; Robinson, Whitney R
2014-07-01
We consider several possible interpretations of the "effect of race" when regressions are run with race as an exposure variable, controlling also for various confounding and mediating variables. When adjustment is made for socioeconomic status early in a person's life, we discuss under what contexts the regression coefficients for race can be interpreted as corresponding to the extent to which a racial inequality would remain if various socioeconomic distributions early in life across racial groups could be equalized. When adjustment is also made for adult socioeconomic status, we note how the overall racial inequality can be decomposed into the portion that would be eliminated by equalizing adult socioeconomic status across racial groups and the portion of the inequality that would remain even if adult socioeconomic status across racial groups were equalized. We also discuss a stronger interpretation of the effect of race (stronger in terms of assumptions) involving the joint effects of race-associated physical phenotype (eg, skin color), parental physical phenotype, genetic background, and cultural context when such variables are thought to be hypothetically manipulable and if adequate control for confounding were possible. We discuss some of the challenges with such an interpretation. Further discussion is given as to how the use of selected populations in examining racial disparities can additionally complicate the interpretation of the effects.
On causal interpretation of race in regressions adjusting for confounding and mediating variables
VanderWeele, Tyler J.; Robinson, Whitney R.
2014-01-01
We consider several possible interpretations of the “effect of race” when regressions are run with race as an exposure variable, controlling also for various confounding and mediating variables. When adjustment is made for socioeconomic status early in a person’s life, we discuss under what contexts the regression coefficients for race can be interpreted as corresponding to the extent to which a racial inequality would remain if various socioeconomic distributions early in life across racial groups could be equalized. When adjustment is also made for adult socioeconomic status, we note how the overall racial inequality can be decomposed into the portion that would be eliminated by equalizing adult socioeconomic status across racial groups and the portion of the inequality that would remain even if adult socioeconomic status across racial groups were equalized. We also discuss a stronger interpretation of the “effect of race” (stronger in terms of assumptions) involving the joint effects of race-associated physical phenotype (e.g. skin color), parental physical phenotype, genetic background and cultural context when such variables are thought to be hypothetically manipulable and if adequate control for confounding were possible. We discuss some of the challenges with such an interpretation. Further discussion is given as to how the use of selected populations in examining racial disparities can additionally complicate the interpretation of the effects. PMID:24887159
Modeling confounding by half-sibling regression.
Schölkopf, Bernhard; Hogg, David W; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas
2016-07-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154
Modeling confounding by half-sibling regression.
Schölkopf, Bernhard; Hogg, David W; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas
2016-07-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application.
Modeling confounding by half-sibling regression
Schölkopf, Bernhard; Hogg, David W.; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas
2016-01-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as “half-sibling regression,” is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154
Algamal, Zakariya Yahya; Lee, Muhammad Hisyam
2015-12-01
Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, which is called the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to tackle both estimating the gene coefficients and performing gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weight, however, using this weight may not be preferable for certain reasons: First, the elastic net estimator is biased in selecting genes. Second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and encourage grouping effects simultaneously. The real data results indicate that AAElastic is significantly consistent in selecting genes compared to the other three competitor regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method in the field of high-dimensional cancer classification.
General Regression and Representation Model for Classification
Qian, Jianjun; Yang, Jian; Xu, Yong
2014-01-01
Recently, the regularized coding-based classification methods (e.g. SRC and CRC) show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients) and the specific information (weight matrix of image pixels) to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR) and robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms. PMID:25531882
An operational GLS model for hydrologic regression
Tasker, Gary D.; Stedinger, J.R.
1989-01-01
Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities. ?? 1989.
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context.
Lopez, Michael J; Gutman, Roee
2014-11-28
Propensity score methods are common for estimating a binary treatment effect when treatment assignment is not randomized. When exposure is measured on an ordinal scale (i.e. low-medium-high), however, propensity score inference requires extensions which have received limited attention. Estimands of possible interest with an ordinal exposure are the average treatment effects between each pair of exposure levels. Using these estimands, it is possible to determine an optimal exposure level. Traditional methods, including dichotomization of the exposure or a series of binary propensity score comparisons across exposure pairs, are generally inadequate for identification of optimal levels. We combine subclassification with regression adjustment to estimate transitive, unbiased average causal effects across an ordered exposure, and apply our method on the 2005-2006 National Health and Nutrition Examination Survey to estimate the effects of nutritional label use on body mass index.
Regression Models For Saffron Yields in Iran
NASA Astrophysics Data System (ADS)
S. H, Sanaeinejad; S. N, Hosseini
Saffron is an important crop in social and economical aspects in Khorassan Province (Northeast of Iran). In this research wetried to evaluate trends of saffron yield in recent years and to study the relationship between saffron yield and the climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in Birjand, Ghaen and Ferdows cities.Climatologically data for the same periods was provided by database of Khorassan Climatology Center. Climatologically data includedtemperature, rainfall, relative humidity and sunshine hours for ModelI, and temperature and rainfall for Model II. The results showed the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. Also coefficients of determination for the same cities for model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysisindicated that among weather variables, temperature was the key parameter for variation ofsaffron yield. It was concluded that increasing temperature at spring was the main cause of declined saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression.
Quantile regression modeling for Malaysian automobile insurance premium data
NASA Astrophysics Data System (ADS)
Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz
2015-09-01
Quantile regression is a robust regression to outliers compared to mean regression models. Traditional mean regression models like Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of response variable (pure premium). We then compare the results of quantile regression model with Gamma regression model. The results from quantile regression show that some rating classes increase as quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = O.5) is always smaller than Gamma regression in all risk factors.
Data correction for seven activity trackers based on regression models.
Andalibi, Vafa; Honko, Harri; Christophe, Francois; Viik, Jari
2015-08-01
Using an activity tracker for measuring activity-related parameters, e.g. steps and energy expenditure (EE), can be very helpful in assisting a person's fitness improvement. Unlike the measuring of number of steps, an accurate EE estimation requires additional personal information as well as accurate velocity of movement, which is hard to achieve due to inaccuracy of sensors. In this paper, we have evaluated regression-based models to improve the precision for both steps and EE estimation. For this purpose, data of seven activity trackers and two reference devices was collected from 20 young adult volunteers wearing all devices at once in three different tests, namely 60-minute office work, 6-hour overall activity and 60-minute walking. Reference data is used to create regression models for each device and relative percentage errors of adjusted values are then statistically compared to that of original values. The effectiveness of regression models are determined based on the result of a statistical test. During a walking period, EE measurement was improved in all devices. The step measurement was also improved in five of them. The results show that improvement of EE estimation is possible only with low-cost implementation of fitting model over the collected data e.g. in the app or in corresponding service back-end. PMID:26736578
Flexible regression models over river networks
O’Donnell, David; Rushworth, Alastair; Bowman, Adrian W; Marian Scott, E; Hallard, Mark
2014-01-01
Many statistical models are available for spatial data but the vast majority of these assume that spatial separation can be measured by Euclidean distance. Data which are collected over river networks constitute a notable and commonly occurring exception, where distance must be measured along complex paths and, in addition, account must be taken of the relative flows of water into and out of confluences. Suitable models for this type of data have been constructed based on covariance functions. The aim of the paper is to place the focus on underlying spatial trends by adopting a regression formulation and using methods which allow smooth but flexible patterns. Specifically, kernel methods and penalized splines are investigated, with the latter proving more suitable from both computational and modelling perspectives. In addition to their use in a purely spatial setting, penalized splines also offer a convenient route to the construction of spatiotemporal models, where data are available over time as well as over space. Models which include main effects and spatiotemporal interactions, as well as seasonal terms and interactions, are constructed for data on nitrate pollution in the River Tweed. The results give valuable insight into the changes in water quality in both space and time. PMID:25653460
Ho Hoang, Khai-Long; Mombaur, Katja
2015-10-15
Dynamic modeling of the human body is an important tool to investigate the fundamentals of the biomechanics of human movement. To model the human body in terms of a multi-body system, it is necessary to know the anthropometric parameters of the body segments. For young healthy subjects, several data sets exist that are widely used in the research community, e.g. the tables provided by de Leva. None such comprehensive anthropometric parameter sets exist for elderly people. It is, however, well known that body proportions change significantly during aging, e.g. due to degenerative effects in the spine, such that parameters for young people cannot be used for realistically simulating the dynamics of elderly people. In this study, regression equations are derived from the inertial parameters, center of mass positions, and body segment lengths provided by de Leva to be adjustable to the changes in proportion of the body parts of male and female humans due to aging. Additional adjustments are made to the reference points of the parameters for the upper body segments as they are chosen in a more practicable way in the context of creating a multi-body model in a chain structure with the pelvis representing the most proximal segment.
Ho Hoang, Khai-Long; Mombaur, Katja
2015-10-15
Dynamic modeling of the human body is an important tool to investigate the fundamentals of the biomechanics of human movement. To model the human body in terms of a multi-body system, it is necessary to know the anthropometric parameters of the body segments. For young healthy subjects, several data sets exist that are widely used in the research community, e.g. the tables provided by de Leva. None such comprehensive anthropometric parameter sets exist for elderly people. It is, however, well known that body proportions change significantly during aging, e.g. due to degenerative effects in the spine, such that parameters for young people cannot be used for realistically simulating the dynamics of elderly people. In this study, regression equations are derived from the inertial parameters, center of mass positions, and body segment lengths provided by de Leva to be adjustable to the changes in proportion of the body parts of male and female humans due to aging. Additional adjustments are made to the reference points of the parameters for the upper body segments as they are chosen in a more practicable way in the context of creating a multi-body model in a chain structure with the pelvis representing the most proximal segment. PMID:26338096
Reconstruction of missing daily streamflow data using dynamic regression models
NASA Astrophysics Data System (ADS)
Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault
2015-12-01
River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data-gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data of incomplete data sets is an important step regarding the performance of the environmental models, engineering, and research applications, thus it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average models (ARIMA) called dynamic regression model. This model uses the linear relationship between neighbor and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.
Analyzing Historical Count Data: Poisson and Negative Binomial Regression Models.
ERIC Educational Resources Information Center
Beck, E. M.; Tolnay, Stewart E.
1995-01-01
Asserts that traditional approaches to multivariate analysis, including standard linear regression techniques, ignore the special character of count data. Explicates three suitable alternatives to standard regression techniques, a simple Poisson regression, a modified Poisson regression, and a negative binomial model. (MJP)
Three-Dimensional Modeling in Linear Regression.
ERIC Educational Resources Information Center
Herman, James D.
Linear regression examines the relationship between one or more independent (predictor) variables and a dependent variable. By using a particular formula, regression determines the weights needed to minimize the error term for a given set of predictors. With one predictor variable, the relationship between the predictor and the dependent variable…
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination.
Testing Different Model Building Procedures Using Multiple Regression.
ERIC Educational Resources Information Center
Thayer, Jerome D.
The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…
NASA Astrophysics Data System (ADS)
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Modeling maximum daily temperature using a varying coefficient regression model
NASA Astrophysics Data System (ADS)
Li, Han; Deng, Xinwei; Kim, Dong-Yun; Smith, Eric P.
2014-04-01
Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature. A good predictive model for daily maximum temperature is required because daily maximum temperature is an important measure for predicting survival of temperature sensitive fish. To appropriately model the strong relationship between water and air temperatures at a daily time step, it is important to incorporate information related to the time of the year into the modeling. In this work, a time-varying coefficient model is used to study the relationship between air temperature and water temperature. The time-varying coefficient model enables dynamic modeling of the relationship, and can be used to understand how the air-water temperature relationship varies over time. The proposed model is applied to 10 streams in Maryland, West Virginia, Virginia, North Carolina, and Georgia using daily maximum temperatures. It provides a better fit and better predictions than those produced by a simple linear regression model or a nonlinear logistic model.
Tolerance bounds for log gamma regression models
NASA Technical Reports Server (NTRS)
Jones, R. A.; Scholz, F. W.; Ossiander, M.; Shorack, G. R.
1985-01-01
The present procedure for finding lower confidence bounds for the quantiles of Weibull populations, on the basis of the solution of a quadratic equation, is more accurate than current Monte Carlo tables and extends to any location-scale family. It is shown that this method is accurate for all members of the log gamma(K) family, where K = 1/2 to infinity, and works well for censored data, while also extending to regression data. An even more accurate procedure involving an approximation to the Lawless (1982) conditional procedure, with numerical integrations whose tables are independent of the data, is also presented. These methods are applied to the case of failure strengths of ceramic specimens from each of three billets of Si3N4, which have undergone flexural strength testing.
Yang, Xue; Lauzon, Carolyn B.; Crainiceanu, Ciprian; Caffo, Brian; Resnick, Susan M.; Landman, Bennett A.
2012-01-01
Massively univariate regression and inference in the form of statistical parametric mapping have transformed the way in which multi-dimensional imaging data are studied. In functional and structural neuroimaging, the de facto standard “design matrix”-based general linear regression model and its multi-level cousins have enabled investigation of the biological basis of the human brain. With modern study designs, it is possible to acquire multi-modal three-dimensional assessments of the same individuals — e.g., structural, functional and quantitative magnetic resonance imaging, alongside functional and ligand binding maps with positron emission tomography. Largely, current statistical methods in the imaging community assume that the regressors are non-random. For more realistic multi-parametric assessment (e.g., voxel-wise modeling), distributional consideration of all observations is appropriate. Herein, we discuss two unified regression and inference approaches, model II regression and regression calibration, for use in massively univariate inference with imaging data. These methods use the design matrix paradigm and account for both random and non-random imaging regressors. We characterize these methods in simulation and illustrate their use on an empirical dataset. Both methods have been made readily available as a toolbox plug-in for the SPM software. PMID:22609453
An Importance Sampling EM Algorithm for Latent Regression Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2007-01-01
Reporting methods used in large-scale assessments such as the National Assessment of Educational Progress (NAEP) rely on latent regression models. To fit the latent regression model using the maximum likelihood estimation technique, multivariate integrals must be evaluated. In the computer program MGROUP used by the Educational Testing Service for…
Relative risk regression models with inverse polynomials.
Ning, Yang; Woodward, Mark
2013-08-30
The proportional hazards model assumes that the log hazard ratio is a linear function of parameters. In the current paper, we model the log relative risk as an inverse polynomial, which is particularly suitable for modeling bounded and asymmetric functions. The parameters estimated by maximizing the partial likelihood are consistent and asymptotically normal. The advantages of the inverse polynomial model over the ordinary polynomial model and the fractional polynomial model for fitting various asymmetric log relative risk functions are shown by simulation. The utility of the method is further supported by analyzing two real data sets, addressing the specific question of the location of the minimum risk threshold.
Impact of multicollinearity on small sample hydrologic regression models
NASA Astrophysics Data System (ADS)
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
Rank-preserving regression: a more robust rank regression model against outliers.
Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M
2016-08-30
Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26934999
Rank-preserving regression: a more robust rank regression model against outliers.
Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M
2016-08-30
Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Detecting Heterogeneity in Logistic Regression Models
ERIC Educational Resources Information Center
Balazs, Katalin; Hidegkuti, Istvan; De Boeck, Paul
2006-01-01
In the context of item response theory, it is not uncommon that person-by-item data are correlated beyond the correlation that is captured by the model--in other words, there is extra binomial variation. Heterogeneity of the parameters can explain this variation. There is a need for proper statistical methods to indicate possible extra…
[Application of tobit regression models in modelling censored epidemiological variables].
Bleda Hernández, M J; Tobías Garcés, A
2002-01-01
Many variables in epidemiological studies are continuous measures obtained by means of measurement equipments with detection limits, generating censored distributions. The censorship, opposite to the trucation, takes place for a defect of the data of the sample. The distribution of a censored variable is a mixture between a continuous and a categorical distributions. In this case, results from lineal regression models, by means of ordinary least squares, will provide biased estimates. With one only censorhip point the tobit model must be used, while with several censorship points this model's generalization should also be used. The illustration of these models is presented through the analysis of the levels of mercury measured in urine in the study about health effects of a municipal solid-waste incinerator in the county of Mataró (Spain).
Stratton, Kelly G; Cook, Andrea J; Jackson, Lisa A; Nelson, Jennifer C
2015-03-30
Sequential methods are well established for randomized clinical trials (RCTs), and their use in observational settings has increased with the development of national vaccine and drug safety surveillance systems that monitor large healthcare databases. Observational safety monitoring requires that sequential testing methods be better equipped to incorporate confounder adjustment and accommodate rare adverse events. New methods designed specifically for observational surveillance include a group sequential likelihood ratio test that uses exposure matching and generalized estimating equations approach that involves regression adjustment. However, little is known about the statistical performance of these methods or how they compare to RCT methods in both observational and rare outcome settings. We conducted a simulation study to determine the type I error, power and time-to-surveillance-end of group sequential likelihood ratio test, generalized estimating equations and RCT methods that construct group sequential Lan-DeMets boundaries using data from a matched (group sequential Lan-DeMets-matching) or unmatched regression (group sequential Lan-DeMets-regression) setting. We also compared the methods using data from a multisite vaccine safety study. All methods had acceptable type I error, but regression methods were more powerful, faster at detecting true safety signals and less prone to implementation difficulties with rare events than exposure matching methods. Method performance also depended on the distribution of information and extent of confounding by site. Our results suggest that choice of sequential method, especially the confounder control strategy, is critical in rare event observational settings. These findings provide guidance for choosing methods in this context and, in particular, suggest caution when conducting exposure matching.
Many regression algorithms, one unified model: A review.
Stulp, Freek; Sigaud, Olivier
2015-09-01
Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. The history of regression is closely related to the history of artificial neural networks since the seminal work of Rosenblatt (1958). The aims of this paper are to provide an overview of many regression algorithms, and to demonstrate how the function representation whose parameters they regress fall into two classes: a weighted sum of basis functions, or a mixture of linear models. Furthermore, we show that the former is a special case of the latter. Our ambition is thus to provide a deep understanding of the relationship between these algorithms, that, despite being derived from very different principles, use a function representation that can be captured within one unified model. Finally, step-by-step derivations of the algorithms from first principles and visualizations of their inner workings allow this article to be used as a tutorial for those new to regression.
Methods for Adjusting U.S. Geological Survey Rural Regression Peak Discharges in an Urban Setting
Moglen, Glenn E.; Shivers, Dorianne E.
2006-01-01
A study was conducted of 78 U.S. Geological Survey gaged streams that have been subjected to varying degrees of urbanization over the last three decades. Flood-frequency analysis coupled with nonlinear regression techniques were used to generate a set of equations for converting peak discharge estimates determined from rural regression equations to a set of peak discharge estimates that represent known urbanization. Specifically, urban regression equations for the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year return periods were calibrated as a function of the corresponding rural peak discharge and the percentage of impervious area in a watershed. The results of this study indicate that two sets of equations, one set based on imperviousness and one set based on population density, performed well. Both sets of equations are dependent on rural peak discharges, a measure of development (average percentage of imperviousness or average population density), and a measure of homogeneity of development within a watershed. Average imperviousness was readily determined by using geographic information system methods and commonly available land-cover data. Similarly, average population density was easily determined from census data. Thus, a key advantage to the equations developed in this study is that they do not require field measurements of watershed characteristics as did the U.S. Geological Survey urban equations developed in an earlier investigation. During this study, the U.S. Geological Survey PeakFQ program was used as an integral tool in the calibration of all equations. The scarcity of historical land-use data, however, made exclusive use of flow records necessary for the 30-year period from 1970 to 2000. Such relatively short-duration streamflow time series required a nonstandard treatment of the historical data function of the PeakFQ program in comparison to published guidelines. Thus, the approach used during this investigation does not fully comply with the
Matrix variate logistic regression model with application to EEG data.
Hung, Hung; Wang, Chen-Chien
2013-01-01
Logistic regression has been widely applied in the field of biomedical research for a long time. In some applications, the covariates of interest have a natural structure, such as that of a matrix, at the time of collection. The rows and columns of the covariate matrix then have certain physical meanings, and they must contain useful information regarding the response. If we simply stack the covariate matrix as a vector and fit a conventional logistic regression model, relevant information can be lost, and the problem of inefficiency will arise. Motivated from these reasons, we propose in this paper the matrix variate logistic (MV-logistic) regression model. The advantages of the MV-logistic regression model include the preservation of the inherent matrix structure of covariates and the parsimony of parameters needed. In the EEG Database Data Set, we successfully extract the structural effects of covariate matrix, and a high classification accuracy is achieved.
Regression Model Optimization for the Analysis of Experimental Data
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.
Joint regression analysis and AMMI model applied to oat improvement
NASA Astrophysics Data System (ADS)
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.
Penalized spline estimation for functional coefficient regression models
Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z.
2011-01-01
The functional coefficient regression models assume that the regression coefficients vary with some “threshold” variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called “curse of dimensionality” in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application. PMID:21516260
copCAR: A Flexible Regression Model for Areal Data
Hughes, John
2014-01-01
Non-Gaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data. The new model is called copCAR because it is copula-based and employs the areal GLMM’s conditional autoregression (CAR). copCAR overcomes many of the drawbacks of the automodel and the areal GLMM. Specifically, copCAR (1) is flexible and intuitive, (2) permits positive spatial dependence for all types of data, (3) permits efficient computation, and (4) provides reliable spatial regression inference and information about dependence strength. An implementation is provided by R package copCAR, which is available from the Comprehensive R Archive Network, and supplementary materials are available online. PMID:26539023
Default Bayes Factors for Model Selection in Regression
ERIC Educational Resources Information Center
Rouder, Jeffrey N.; Morey, Richard D.
2012-01-01
In this article, we present a Bayes factor solution for inference in multiple regression. Bayes factors are principled measures of the relative evidence from data for various models or positions, including models that embed null hypotheses. In this regard, they may be used to state positive evidence for a lack of an effect, which is not possible…
Evaluation of land use regression models in Detroit, Michigan
Introduction: Land use regression (LUR) models have emerged as a cost-effective tool for characterizing exposure in epidemiologic health studies. However, little critical attention has been focused on validation of these models as a step toward temporal and spatial extension of ...
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Bader, Jon B.
2009-01-01
Calibration data of a wind tunnel sting balance was processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm s two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation when regression models of balance calibration data can directly be derived from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.
Calibration of stormwater quality regression models: a random process?
Dembélé, A; Bertrand-Krajewski, J-L; Barillon, B
2010-01-01
Regression models are among the most frequently used models to estimate pollutants event mean concentrations (EMC) in wet weather discharges in urban catchments. Two main questions dealing with the calibration of EMC regression models are investigated: i) the sensitivity of models to the size and the content of data sets used for their calibration, ii) the change of modelling results when models are re-calibrated when data sets grow and change with time when new experimental data are collected. Based on an experimental data set of 64 rain events monitored in a densely urbanised catchment, four TSS EMC regression models (two log-linear and two linear models) with two or three explanatory variables have been derived and analysed. Model calibration with the iterative re-weighted least squares method is less sensitive and leads to more robust results than the ordinary least squares method. Three calibration options have been investigated: two options accounting for the chronological order of the observations, one option using random samples of events from the whole available data set. Results obtained with the best performing non linear model clearly indicate that the model is highly sensitive to the size and the content of the data set used for its calibration.
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, R.M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
Spatial stochastic regression modelling of urban land use
NASA Astrophysics Data System (ADS)
Arshad, S. H. M.; Jaafar, J.; Abiden, M. Z. Z.; Latif, Z. A.; Rasam, A. R. A.
2014-02-01
Urbanization is very closely linked to industrialization, commercialization or overall economic growth and development. This results in innumerable benefits of the quantity and quality of the urban environment and lifestyle but on the other hand contributes to unbounded development, urban sprawl, overcrowding and decreasing standard of living. Regulation and observation of urban development activities is crucial. The understanding of urban systems that promotes urban growth are also essential for the purpose of policy making, formulating development strategies as well as development plan preparation. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same specific study area. Both techniques will utilize the same datasets and their results will be analyzed. The work starts by producing an urban growth model by using stochastic regression modeling techniques namely the Ordinary Least Square (OLS) and Geographically Weighted Regression (GWR). The two techniques are compared to and it is found that, GWR seems to be a more significant stochastic regression model compared to OLS, it gives a smaller AICc (Akaike's Information Corrected Criterion) value and its output is more spatially explainable.
Analyzing industrial energy use through ordinary least squares regression models
NASA Astrophysics Data System (ADS)
Golden, Allyson Katherine
Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and
Bayesian residual analysis for beta-binomial regression models
NASA Astrophysics Data System (ADS)
Pires, Rubiane Maria; Diniz, Carlos Alberto Ribeiro
2012-10-01
The beta-binomial regression model is an alternative model to the sum of any sequence of equicorrelated binary variables with common probability of success p. In this work a Bayesian perspective of this model is presented considering different link functions and different correlation structures. A general Bayesian residual analysis for this model, a issue which is often neglected in Bayesian analysis, using the residuals based on the predicted values obtained by the conditional predictive ordinate [1], the residuals based on the posterior distribution of the model parameters [2] and the Bayesian deviance residual [3] are presented in order to check the assumptions in the model.
REGRESSION MODELS OF RESIDENTIAL EXPOSURE TO CHLORPYRIFOS AND DIAZINON
This study examines the ability of regression models to predict residential exposures to chlorpyrifos and diazinon, based on the information from the NHEXAS-AZ database. The robust method was used to generate "fill-in" values for samples that are below the detection l...
Modeling energy expenditure in children and adolescents using quantile regression
Technology Transfer Automated Retrieval System (TEKTRAN)
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obes...
A regression model to estimate regional ground water recharge
Lorenz, D.L.; Delin, G.N.
2007-01-01
A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available.
A regression model to estimate regional ground water recharge.
Lorenz, David L; Delin, Geoffrey N
2007-01-01
A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available.
A regressive model of isochronism in speech units
NASA Astrophysics Data System (ADS)
Jassem, W.; Krzysko, M.; Stolarski, P.
1981-09-01
To define linguistic isochronism in quantitative terms, a statistical regressive method of analyzing the number of rhythmic units in human speech was employed. The material used was two taped texts spoken in standard British English totaling approximately 2,500 sounds. The sounds were divided into statistically homogeneous classes, and the mean values in each class were utilized in regressive models. Abercrombie's theory of speech rhythm postulating anacrusis and Jassem's theory postulating two types of speech units, anacrusis and a rhythmic unit in the strict sense, were tested using this material.
Using regression models to determine the poroelastic properties of cartilage.
Chung, Chen-Yuan; Mansour, Joseph M
2013-07-26
The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds.
A Study of Perfectionism, Attachment, and College Student Adjustment: Testing Mediational Models.
ERIC Educational Resources Information Center
Hood, Camille A.; Kubal, Anne E.; Pfaller, Joan; Rice, Kenneth G.
Mediational models predicting college students' adjustment were tested using regression analyses. Contemporary adult attachment theory was employed to explore the cognitive/affective mechanisms by which adult attachment and perfectionism affect various aspects of psychological functioning. Consistent with theoretical expectations, results…
Flexible regression models for rate differences, risk differences and relative risks.
Donoghoe, Mark W; Marschner, Ian C
2015-05-01
Generalized additive models (GAMs) based on the binomial and Poisson distributions can be used to provide flexible semi-parametric modelling of binary and count outcomes. When used with the canonical link function, these GAMs provide semi-parametrically adjusted odds ratios and rate ratios. For adjustment of other effect measures, including rate differences, risk differences and relative risks, non-canonical link functions must be used together with a constrained parameter space. However, the algorithms used to fit these models typically rely on a form of the iteratively reweighted least squares algorithm, which can be numerically unstable when a constrained non-canonical model is used. We describe an application of a combinatorial EM algorithm to fit identity link Poisson, identity link binomial and log link binomial GAMs in order to estimate semi-parametrically adjusted rate differences, risk differences and relative risks. Using smooth regression functions based on B-splines, the method provides stable convergence to the maximum likelihood estimates, and it ensures that the estimates always remain within the parameter space. It is also straightforward to apply a monotonicity constraint to the smooth regression functions. We illustrate the method using data from a clinical trial in heart attack patients. PMID:25781711
Linear regression models of floor surface parameters on friction between Neolite and quarry tiles.
Chang, Wen-Ruey; Matz, Simon; Grönqvist, Raoul; Hirvonen, Mikko
2010-01-01
For slips and falls, friction is widely used as an indicator of surface slipperiness. Surface parameters, including surface roughness and waviness, were shown to influence friction by correlating individual surface parameters with the measured friction. A collective input from multiple surface parameters as a predictor of friction, however, could provide a broader perspective on the contributions from all the surface parameters evaluated. The objective of this study was to develop regression models between the surface parameters and measured friction. The dynamic friction was measured using three different mixtures of glycerol and water as contaminants. Various surface roughness and waviness parameters were measured using three different cut-off lengths. The regression models indicate that the selected surface parameters can predict the measured friction coefficient reliably in most of the glycerol concentrations and cut-off lengths evaluated. The results of the regression models were, in general, consistent with those obtained from the correlation between individual surface parameters and the measured friction in eight out of nine conditions evaluated in this experiment. A hierarchical regression model was further developed to evaluate the cumulative contributions of the surface parameters in the final iteration by adding these parameters to the regression model one at a time from the easiest to measure to the most difficult to measure and evaluating their impacts on the adjusted R(2) values. For practical purposes, the surface parameter R(a) alone would account for the majority of the measured friction even if it did not reach a statistically significant level in some of the regression models.
Forecasting relativistic electron flux using dynamic multiple regression models
NASA Astrophysics Data System (ADS)
Wei, H.-L.; Billings, S. A.; Surjalal Sharma, A.; Wing, S.; Boynton, R. J.; Walker, S. N.
2011-02-01
The forecast of high energy electron fluxes in the radiation belts is important because the exposure of modern spacecraft to high energy particles can result in significant damage to onboard systems. A comprehensive physical model of processes related to electron energisation that can be used for such a forecast has not yet been developed. In the present paper a systems identification approach is exploited to deduce a dynamic multiple regression model that can be used to predict the daily maximum of high energy electron fluxes at geosynchronous orbit from data. It is shown that the model developed provides reliable predictions.
Quasi-likelihood estimation for relative risk regression models.
Carter, Rickey E; Lipsitz, Stuart R; Tilley, Barbara C
2005-01-01
For a prospective randomized clinical trial with two groups, the relative risk can be used as a measure of treatment effect and is directly interpretable as the ratio of success probabilities in the new treatment group versus the placebo group. For a prospective study with many covariates and a binary outcome (success or failure), relative risk regression may be of interest. If we model the log of the success probability as a linear function of covariates, the regression coefficients are log-relative risks. However, using such a log-linear model with a Bernoulli likelihood can lead to convergence problems in the Newton-Raphson algorithm. This is likely to occur when the success probabilities are close to one. A constrained likelihood method proposed by Wacholder (1986, American Journal of Epidemiology 123, 174-184), also has convergence problems. We propose a quasi-likelihood method of moments technique in which we naively assume the Bernoulli outcome is Poisson, with the mean (success probability) following a log-linear model. We use the Poisson maximum likelihood equations to estimate the regression coefficients without constraints. Using method of moment ideas, one can show that the estimates using the Poisson likelihood will be consistent and asymptotically normal. We apply these methods to a double-blinded randomized trial in primary biliary cirrhosis of the liver (Markus et al., 1989, New England Journal of Medicine 320, 1709-1713). PMID:15618526
Kolasa-Wiecek, Alicja
2015-04-01
The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Poland, among other European Union countries, occupies a leading position with regard to coal consumption. Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere, through a gradual decrease of the share of coal in the fuel mix and development of renewable energy sources. All evidence which completes the knowledge about issues related to GHG emissions is a valuable source of information. The article presents the results of modeling of GHG emissions which are generated by the energy sector in Poland. For a better understanding of the quantitative relationship between total consumption of primary energy and greenhouse gas emission, multiple stepwise regression model was applied. The modeling results of CO2 emissions demonstrate a high relationship (0.97) with the hard coal consumption variable. Adjustment coefficient of the model to actual data is high and equal to 95%. The backward step regression model, in the case of CH4 emission, indicated the presence of hard coal (0.66), peat and fuel wood (0.34), solid waste fuels, as well as other sources (-0.64) as the most important variables. The adjusted coefficient is suitable and equals R2=0.90. For N2O emission modeling the obtained coefficient of determination is low and equal to 43%. A significant variable influencing the amount of N2O emission is the peat and wood fuel consumption.
A New Approach in Regression Analysis for Modeling Adsorption Isotherms
Onjia, Antonije E.
2014-01-01
Numerous regression approaches to isotherm parameters estimation appear in the literature. The real insight into the proper modeling pattern can be achieved only by testing methods on a very big number of cases. Experimentally, it cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Margart's percent standard deviation is suggested. It was found that in case when experiments are repeated three times the simple method of weighted least squares performed as well as more complicated orthogonal distance regression method. PMID:24672394
Patton, Allison P; Zamore, Wig; Naumova, Elena N; Levy, Jonathan I; Brugge, Doug; Durant, John L
2015-05-19
Land use regression (LUR) models have been used to assess air pollutant exposure, but limited evidence exists on whether location-specific LUR models are applicable to other locations (transferability) or general models are applicable to smaller areas (generalizability). We tested transferability and generalizability of spatial-temporal LUR models of hourly particle number concentration (PNC) for Boston-area (MA, U.S.A.) urban neighborhoods near Interstate 93. Four neighborhood-specific regression models and one Boston-area model were developed from mobile monitoring measurements (34-46 days/neighborhood over one year each). Transferability was tested by applying each neighborhood-specific model to the other neighborhoods; generalizability was tested by applying the Boston-area model to each neighborhood. Both the transferability and generalizability of models were tested with and without neighborhood-specific calibration. Important PNC predictors (adjusted-R(2) = 0.24-0.43) included wind speed and direction, temperature, highway traffic volume, and distance from the highway edge. Direct model transferability was poor (R(2) < 0.17). Locally-calibrated transferred models (R(2) = 0.19-0.40) and the Boston-area model (adjusted-R(2) = 0.26, range: 0.13-0.30) performed similarly to neighborhood-specific models; however, some coefficients of locally calibrated transferred models were uninterpretable. Our results show that transferability of neighborhood-specific LUR models of hourly PNC was limited, but that a general model performed acceptably in multiple areas when calibrated with local data.
2015-01-01
Land use regression (LUR) models have been used to assess air pollutant exposure, but limited evidence exists on whether location-specific LUR models are applicable to other locations (transferability) or general models are applicable to smaller areas (generalizability). We tested transferability and generalizability of spatial-temporal LUR models of hourly particle number concentration (PNC) for Boston-area (MA, U.S.A.) urban neighborhoods near Interstate 93. Four neighborhood-specific regression models and one Boston-area model were developed from mobile monitoring measurements (34–46 days/neighborhood over one year each). Transferability was tested by applying each neighborhood-specific model to the other neighborhoods; generalizability was tested by applying the Boston-area model to each neighborhood. Both the transferability and generalizability of models were tested with and without neighborhood-specific calibration. Important PNC predictors (adjusted-R2 = 0.24–0.43) included wind speed and direction, temperature, highway traffic volume, and distance from the highway edge. Direct model transferability was poor (R2 < 0.17). Locally-calibrated transferred models (R2 = 0.19–0.40) and the Boston-area model (adjusted-R2 = 0.26, range: 0.13–0.30) performed similarly to neighborhood-specific models; however, some coefficients of locally calibrated transferred models were uninterpretable. Our results show that transferability of neighborhood-specific LUR models of hourly PNC was limited, but that a general model performed acceptably in multiple areas when calibrated with local data. PMID:25867675
A Product Partition Model With Regression on Covariates
Müller, Peter; Quintana, Fernando; Rosner, Gary L.
2011-01-01
We propose a probability model for random partitions in the presence of covariates. In other words, we develop a model-based clustering algorithm that exploits available covariates. The motivating application is predicting time to progression for patients in a breast cancer trial. We proceed by reporting a weighted average of the responses of clusters of earlier patients. The weights should be determined by the similarity of the new patient’s covariate with the covariates of patients in each cluster. We achieve the desired inference by defining a random partition model that includes a regression on covariates. Patients with similar covariates are a priori more likely to be clustered together. Posterior predictive inference in this model formalizes the desired prediction. We build on product partition models (PPM). We define an extension of the PPM to include a regression on covariates by including in the cohesion function a new factor that increases the probability of experimental units with similar covariates to be included in the same cluster. We discuss implementations suitable for any combination of continuous, categorical, count, and ordinal covariates. An implementation of the proposed model as R-package is available for download. PMID:21566678
Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.
Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation. PMID:24328031
Aulenbach, Brent T.
2013-01-01
A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads, the Adjusted Maximum Likelihood Estimator (AMLE), and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model’s calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals resulted in observed AMLE precision to be significantly worse than the model calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of at least 15-days or shorter, when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration–discharge relationship. The models with the largest errors typically had poor high flow sampling coverage resulting in unrepresentative models. Increasing sampling frequency and/or targeted high flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models, than increasing calibration period length.
Bailit, Jennifer L.; Grobman, William A.; Rice, Madeline Murguia; Spong, Catherine Y.; Wapner, Ronald J.; Varner, Michael W.; Thorp, John M.; Leveno, Kenneth J.; Caritis, Steve N.; Shubert, Phillip J.; Tita, Alan T. N.; Saade, George; Sorokin, Yoram; Rouse, Dwight J.; Blackwell, Sean C.; Tolosa, Jorge E.; Van Dorsten, J. Peter
2014-01-01
Objective Regulatory bodies and insurers evaluate hospital quality using obstetrical outcomes, however meaningful comparisons should take pre-existing patient characteristics into account. Furthermore, if risk-adjusted outcomes are consistent within a hospital, fewer measures and resources would be needed to assess obstetrical quality. Our objective was to establish risk-adjusted models for five obstetric outcomes and assess hospital performance across these outcomes. Study Design A cohort study of 115,502 women and their neonates born in 25 hospitals in the United States between March 2008 and February 2011. Hospitals were ranked according to their unadjusted and risk-adjusted frequency of venous thromboembolism, postpartum hemorrhage, peripartum infection, severe perineal laceration, and a composite neonatal adverse outcome. Correlations between hospital risk-adjusted outcome frequencies were assessed. Results Venous thromboembolism occurred too infrequently (0.03%, 95% CI 0.02% – 0.04%) for meaningful assessment. Other outcomes occurred frequently enough for assessment (postpartum hemorrhage 2.29% (95% CI 2.20–2.38), peripartum infection 5.06% (95% CI 4.93–5.19), severe perineal laceration at spontaneous vaginal delivery 2.16% (95% CI 2.06–2.27), neonatal composite 2.73% (95% CI 2.63–2.84)). Although there was high concordance between unadjusted and adjusted hospital rankings, several individual hospitals had an adjusted rank that was substantially different (as much as 12 rank tiers) than their unadjusted rank. None of the correlations between hospital adjusted outcome frequencies was significant. For example, the hospital with the lowest adjusted frequency of peripartum infection had the highest adjusted frequency of severe perineal laceration. Conclusions Evaluations based on a single risk-adjusted outcome cannot be generalized to overall hospital obstetric performance. PMID:23891630
Augmented Beta rectangular regression models: A Bayesian perspective.
Wang, Jue; Luo, Sheng
2016-01-01
Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportional data ranging between zero and one. However, Beta distributions are not flexible to extreme outliers or excessive events around tail areas, and they do not account for the presence of the boundary values zeros and ones because these values are not in the support of the Beta distributions. To address these issues, we propose a mixed effects model using Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson's Disease (PD) Long-term Study-1 (LS-1 study, n = 1741), developed by The National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson's Disease (NINDS NET-PD) network. PMID:26289406
Development and Application of Nonlinear Land-Use Regression Models
NASA Astrophysics Data System (ADS)
Champendal, Alexandre; Kanevski, Mikhail; Huguenot, Pierre-Emmanuel
2014-05-01
The problem of air pollution modelling in urban zones is of great importance both from scientific and applied points of view. At present there are several fundamental approaches either based on science-based modelling (air pollution dispersion) or on the application of space-time geostatistical methods (e.g. family of kriging models or conditional stochastic simulations). Recently, there were important developments in so-called Land Use Regression (LUR) models. These models take into account geospatial information (e.g. traffic network, sources of pollution, average traffic, population census, land use, etc.) at different scales, for example, using buffering operations. Usually the dimension of the input space (number of independent variables) is within the range of (10-100). It was shown that LUR models have some potential to model complex and highly variable patterns of air pollution in urban zones. Most of LUR models currently used are linear models. In the present research the nonlinear LUR models are developed and applied for Geneva city. Mainly two nonlinear data-driven models were elaborated: multilayer perceptron and random forest. An important part of the research deals also with a comprehensive exploratory data analysis using statistical, geostatistical and time series tools. Unsupervised self-organizing maps were applied to better understand space-time patterns of the pollution. The real data case study deals with spatial-temporal air pollution data of Geneva (2002-2011). Nitrogen dioxide (NO2) has caught our attention. It has effects on human health and on plants; NO2 contributes to the phenomenon of acid rain. The negative effects of nitrogen dioxides on plants are the reduction of the growth, production and pesticide resistance. And finally, the effects on materials: nitrogen dioxide increases the corrosion. The data used for this study consist of a set of 106 NO2 passive sensors. 80 were used to build the models and the remaining 36 have constituted
NASA Astrophysics Data System (ADS)
Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa
2015-11-01
A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.
NASA Astrophysics Data System (ADS)
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
2015-12-01
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided a closer agreement with the calculated values for FAO-PM (R 2 > 0.95 and RMSE < 12.07 mm month-1). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
Fuzzy regression modeling for tool performance prediction and degradation detection.
Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L
2010-10-01
In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.
2011-01-01
Background Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG) signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on accuracy of models. Acquired SEMG signals were filtered, rectified, normalized and then fed to models for training. Results It was shown that mean adjusted coefficient of determination (Ra2) values decrease between 20%-35% for different models after one hour while altering arm posture decreased mean Ra2 values between 64% to 74% for different models. Conclusions Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, ordinary least squares linear regression model (OLS) was shown to have high isometric torque estimation accuracy combined with very short training times. PMID:21943179
NASA Astrophysics Data System (ADS)
Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei
2014-10-01
Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, which also has the difficulty in capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse model were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e. PROSPECT model) was employed to simulate the leaf reflectance where wavelength varies from 400 to 780 nm at 1 nm interval, and then these values were treated as the data from remote sensing observations. Meanwhile, simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were taken as target to build the regression model respectively. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures as 70% of these samples were applied for training while the last 30% for model validation. Reflectance (r) and its mathematic transformations (1/r and log (1/r)) were all employed to build regression model respectively. Results showed fair agreements between pigments and simulated reflectance with all adjusted coefficients of determination (R2) larger than 0.8 as 6 wavebands were selected to build the SMLR model. The largest value of R2 for Cab, Car and Cab/Car are 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematic transformations of reflectance showed little influence on regression accuracy. We concluded that it was feasible to estimate the chlorophyll and carotenoids and their ratio based on statistical model with leaf reflectance data.
Ordinal logistic regression models: application in quality of life studies.
Abreu, Mery Natali Silva; Siqueira, Arminda Lucia; Cardoso, Clareci Silva; Caiaffa, Waleska Teixeira
2008-01-01
Quality of life has been increasingly emphasized in public health research in recent years. Typically, the results of quality of life are measured by means of ordinal scales. In these situations, specific statistical methods are necessary because procedures such as either dichotomization or misinformation on the distribution of the outcome variable may complicate the inferential process. Ordinal logistic regression models are appropriate in many of these situations. This article presents a review of the proportional odds model, partial proportional odds model, continuation ratio model, and stereotype model. The fit, statistical inference, and comparisons between models are illustrated with data from a study on quality of life in 273 patients with schizophrenia. All tested models showed good fit, but the proportional odds or partial proportional odds models proved to be the best choice due to the nature of the data and ease of interpretation of the results. Ordinal logistic models perform differently depending on categorization of outcome, adequacy in relation to assumptions, goodness-of-fit, and parsimony.
Modeling pan evaporation for Kuwait by multiple linear regression.
Almedeij, Jaber
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values.
NASA Astrophysics Data System (ADS)
Lu, Lin; Chang, Yunlong; Li, Yingmin; He, Youyou
2013-05-01
A transverse magnetic field was introduced to the arc plasma in the process of welding stainless steel tubes by high-speed Tungsten Inert Gas Arc Welding (TIG for short) without filler wire. The influence of external magnetic field on welding quality was investigated. 9 sets of parameters were designed by the means of orthogonal experiment. The welding joint tensile strength and form factor of weld were regarded as the main standards of welding quality. A binary quadratic nonlinear regression equation was established with the conditions of magnetic induction and flow rate of Ar gas. The residual standard deviation was calculated to adjust the accuracy of regression model. The results showed that, the regression model was correct and effective in calculating the tensile strength and aspect ratio of weld. Two 3D regression models were designed respectively, and then the impact law of magnetic induction on welding quality was researched.
Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media
Cooley, R.L.; Christensen, S.
2006-01-01
Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector ?? that reflects both small and large scales of heterogeneity in the inputs by a lumped or smoothed m-dimensional approximation ????*, where ?? is an interpolation matrix and ??* is a stochastic vector of parameters. Vector ??* has small enough dimension to allow its estimation with the available data. The consequence of the replacement is that model function f(????*) written in terms of the approximate inputs is in error with respect to the same model function written in terms of ??, ??,f(??), which is assumed to be nearly exact. The difference f(??) - f(????*), termed model error, is spatially correlated, generates prediction biases, and causes standard confidence and prediction intervals to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate ??* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(??) and f(????*) are small, then most of the biases are small and the correction factors are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is
Evaluating Geographically Weighted Regression Models for Environmental Chemical Risk Analysis
Czarnota, Jenna; Wheeler, David C; Gennings, Chris
2015-01-01
In the evaluation of cancer risk related to environmental chemical exposures, the effect of many correlated chemicals on disease is often of interest. The relationship between correlated environmental chemicals and health effects is not always constant across a study area, as exposure levels may change spatially due to various environmental factors. Geographically weighted regression (GWR) has been proposed to model spatially varying effects. However, concerns about collinearity effects, including regression coefficient sign reversal (ie, reversal paradox), may limit the applicability of GWR for environmental chemical risk analysis. A penalized version of GWR, the geographically weighted lasso, has been proposed to remediate the collinearity effects in GWR models. Our focus in this study was on assessing through a simulation study the ability of GWR and GWL to correctly identify spatially varying chemical effects for a mixture of correlated chemicals within a study area. Our results showed that GWR suffered from the reversal paradox, while GWL overpenalized the effects for the chemical most strongly related to the outcome. PMID:25983546
Forecasting volatility with neural regression: a contribution to model adequacy.
Refenes, A N; Holt, W T
2001-01-01
Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy with the results confirming the presence of nonlinear relationships in implied volatility innovations.
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can be also applied for features selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into 13 dimensional space constructed by geographical coordinates and geo-features calculated from digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training were carried out applying leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN with their ability to select features and efficient modelling of complex high dimensional data can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press
A new form of bivariate generalized Poisson regression model
NASA Astrophysics Data System (ADS)
Faroughi, Pouya; Ismail, Noriszura
2014-09-01
This paper introduces a new form of bivariate generalized Poisson (BGP) regression which can be fitted to bivariate and correlated count data with covariates. The BGP regression suggested in this study can be fitted not only to bivariate count data with positive, zero or negative correlations, but also to underdispersed or overdispersed bivariate count data. Applications of bivariate Poisson (BP) regression and the new BGP regression are illustrated on Malaysian motor insurance data.
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-01-01
Background: The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran’s universities. Methods: This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran’s public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For predicting the organizational climate pattern of the libraries is used from the multivariate linear regression and track diagram. Results: of the 9 variables affecting organizational climate, 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in prediction of the organizational climate of Iran’s libraries. The results also indicate that each of these variables with different coefficient have the power to predict organizational climate but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. Track diagram showed that five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly effects on the organizational climate variable that contribution of the team work from this influence is more than any other variables. Conclusions: Of the indicator of the organizational climate of climateQual, the contribution of the team work from this influence is more than any other variables that reinforcement of teamwork in academic libraries can be more effective in improving the organizational climate of this type libraries. PMID:26622203
Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.
Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao
2016-04-01
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remains fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.
Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.
Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao
2016-04-01
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remains fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes. PMID:27104857
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Drought Patterns Forecasting using an Auto-Regressive Logistic Model
NASA Astrophysics Data System (ADS)
del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.
2014-12-01
Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, many times of catastrophic dimensions. A quantifiable definition of drought is elusive because depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought by virtue of some definitions but not by others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate.In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows to include covariates, continuous or categorical, to improve the performance of the auto-regressive component.Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-Means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while including information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representation of complex climatic phenomena, as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows to quantify the uncertainty of the forecasts and can be easily adapted to make predictions under future climatic scenarios. The framework herein presented may be extended to other applications such as flash flood analysis, or risk assessment of natural hazards.
Regression models of sprint, vertical jump, and change of direction performance.
Swinton, Paul A; Lloyd, Ray; Keogh, Justin W L; Agouris, Ioannis; Stewart, Arthur D
2014-07-01
It was the aim of the present study to expand on previous correlation analyses that have attempted to identify factors that influence performance of jumping, sprinting, and changing direction. This was achieved by using a regression approach to obtain linear models that combined anthropometric, strength, and other biomechanical variables. Thirty rugby union players participated in the study (age: 24.2 ± 3.9 years; stature: 181.2 ± 6.6 cm; mass: 94.2 ± 11.1 kg). The athletes' ability to sprint, jump, and change direction was assessed using a 30-m sprint, vertical jump, and 505 agility test, respectively. Regression variables were collected during maximum strength tests (1 repetition maximum [1RM] deadlift and squat) and performance of fast velocity resistance exercises (deadlift and jump squat) using submaximum loads (10-70% 1RM). Force, velocity, power, and rate of force development (RFD) values were measured during fast velocity exercises with the greatest values produced across loads selected for further analysis. Anthropometric data, including lengths, widths, and girths were collected using a 3-dimensional body scanner. Potential regression variables were first identified using correlation analyses. Suitable variables were then regressed using a best subsets approach. Three factor models generally provided the most appropriate balance between explained variance and model complexity. Adjusted R values of 0.86, 0.82, and 0.67 were obtained for sprint, jump, and change of direction performance, respectively. Anthropometric measurements did not feature in any of the top models because of their strong association with body mass. For each performance measure, variance was best explained by relative maximum strength. Improvements in models were then obtained by including velocity and power values for jumping and sprinting performance, and by including RFD values for change of direction performance. PMID:24345969
Collision prediction models using multivariate Poisson-lognormal regression.
El-Basyouny, Karim; Sayed, Tarek
2009-07-01
This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform which facilitates computation of posterior distributions as well as providing a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a superior fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models. PMID:19540972
Epistasis analysis for quantitative traits by functional regression model.
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao
2014-06-01
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10(-10)) in the ESP, and 11 were replicated in the CHARGE-S study.
Forecasting Groundwater Temperature with Linear Regression Models Using Historical Data.
Figura, Simon; Livingstone, David M; Kipfer, Rolf
2015-01-01
Although temperature is an important determinant of many biogeochemical processes in groundwater, very few studies have attempted to forecast the response of groundwater temperature to future climate warming. Using a composite linear regression model based on the lagged relationship between historical groundwater and regional air temperature data, empirical forecasts were made of groundwater temperature in several aquifers in Switzerland up to the end of the current century. The model was fed with regional air temperature projections calculated for greenhouse-gas emissions scenarios A2, A1B, and RCP3PD. Model evaluation revealed that the approach taken is adequate only when the data used to calibrate the models are sufficiently long and contain sufficient variability. These conditions were satisfied for three aquifers, all fed by riverbank infiltration. The forecasts suggest that with respect to the reference period 1980 to 2009, groundwater temperature in these aquifers will most likely increase by 1.1 to 3.8 K by the end of the current century, depending on the greenhouse-gas emissions scenario employed. PMID:25412761
Forecasting Groundwater Temperature with Linear Regression Models Using Historical Data.
Figura, Simon; Livingstone, David M; Kipfer, Rolf
2015-01-01
Although temperature is an important determinant of many biogeochemical processes in groundwater, very few studies have attempted to forecast the response of groundwater temperature to future climate warming. Using a composite linear regression model based on the lagged relationship between historical groundwater and regional air temperature data, empirical forecasts were made of groundwater temperature in several aquifers in Switzerland up to the end of the current century. The model was fed with regional air temperature projections calculated for greenhouse-gas emissions scenarios A2, A1B, and RCP3PD. Model evaluation revealed that the approach taken is adequate only when the data used to calibrate the models are sufficiently long and contain sufficient variability. These conditions were satisfied for three aquifers, all fed by riverbank infiltration. The forecasts suggest that with respect to the reference period 1980 to 2009, groundwater temperature in these aquifers will most likely increase by 1.1 to 3.8 K by the end of the current century, depending on the greenhouse-gas emissions scenario employed.
Sheehan, Kenneth R.; Strager, Michael P.; Welsh, Stuart
2013-01-01
Stream habitat assessments are commonplace in fish management, and often involve nonspatial analysis methods for quantifying or predicting habitat, such as ordinary least squares regression (OLS). Spatial relationships, however, often exist among stream habitat variables. For example, water depth, water velocity, and benthic substrate sizes within streams are often spatially correlated and may exhibit spatial nonstationarity or inconsistency in geographic space. Thus, analysis methods should address spatial relationships within habitat datasets. In this study, OLS and a recently developed method, geographically weighted regression (GWR), were used to model benthic substrate from water depth and water velocity data at two stream sites within the Greater Yellowstone Ecosystem. For data collection, each site was represented by a grid of 0.1 m2 cells, where actual values of water depth, water velocity, and benthic substrate class were measured for each cell. Accuracies of regressed substrate class data by OLS and GWR methods were calculated by comparing maps, parameter estimates, and determination coefficient r 2. For analysis of data from both sites, Akaike’s Information Criterion corrected for sample size indicated the best approximating model for the data resulted from GWR and not from OLS. Adjusted r 2 values also supported GWR as a better approach than OLS for prediction of substrate. This study supports GWR (a spatial analysis approach) over nonspatial OLS methods for prediction of habitat for stream habitat assessments.
A Unified Approach to Power Calculation and Sample Size Determination for Random Regression Models
ERIC Educational Resources Information Center
Shieh, Gwowen
2007-01-01
The underlying statistical models for multiple regression analysis are typically attributed to two types of modeling: fixed and random. The procedures for calculating power and sample size under the fixed regression models are well known. However, the literature on random regression models is limited and has been confined to the case of all…
Karst aquifer dynamic modelling by evolutionary polynomial regression
NASA Astrophysics Data System (ADS)
Doglioni, Angelo; Giustolisi, Orazio; Simeone, Vincenzo
2010-05-01
Evolutionary Polynomial Regression (EPR) is an evolutionary modelling technique which has been successfully applied to multiple problems related to environmental engineering. In particular, it proved quite effective at modelling the dynamic relationship between groundwater levels and rainfall heights for a specific case study related to a porous aquifer. This paper introduces an application of EPR aimed at modelling the relationship between rainfall heights and groundwater tables of a karst aquifer. From a hydrogeological point of view, a karst aquifer is characterized by a quick response to rainfall due to the preferential paths through the ground. It has been monitored over the years thus producing a reasonably long dataset covering about 44 years. On the one hand, these data show some discontinuities, but on the other hand, they are available from a well located in a neighbourhood where there is almost no pumping as well as further disturbances related to human activities. The use of multiobjective EPR will allow finding a set of feasible symbolic models which helps to make a robust choice of models as well as to investigate about the structures of the models and how the aquifer response is influenced by rainfall. The authors makes also a comparison with the results they found for the porous aquifer, thus trying to assess which differences exist, from the physical point of view, between the two cases study and the capability of EPR at catching a quicker dynamics. Finally, it is noteworthy that the investigated aquifer is relatively geographically close to the already investigated one, about 40 km. This will also allow for investigating the effect of rainfall change, in terms of intensity variations, on differently structured aquifers whereas there is a similar climate regime.
Symbolic regression modeling of noise generation at porous airfoils
NASA Astrophysics Data System (ADS)
Sarradj, Ennes; Geyer, Thomas
2014-07-01
Based on data sets from previous experimental studies, the tool of symbolic regression is applied to find empirical models that describe the noise generation at porous airfoils. Both the self noise from the interaction of a turbulent boundary layer with the trailing edge of an porous airfoil and the noise generated at the leading edge due to turbulent inflow are considered. Following a dimensional analysis, models are built for trailing edge noise and leading edge noise in terms of four and six dimensionless quantities, respectively. Models of different accuracy and complexity are proposed and discussed. For the trailing edge noise case, a general dependency of the sound power on the fifth power of the flow velocity was found and the frequency spectrum is controlled by the flow resistivity of the porous material. Leading edge noise power is proportional to the square of the turbulence intensity and shows a dependency on the fifth to sixth power of the flow velocity, while the spectrum is governed by the flow resistivity and the integral length scale of the incoming turbulence.
Kernel Averaged Predictors for Spatio-Temporal Regression Models.
Heaton, Matthew J; Gelfand, Alan E
2012-12-01
In applications where covariates and responses are observed across space and time, a common goal is to quantify the effect of a change in the covariates on the response while adequately accounting for the spatio-temporal structure of the observations. The most common approach for building such a model is to confine the relationship between a covariate and response variable to a single spatio-temporal location. However, oftentimes the relationship between the response and predictors may extend across space and time. In other words, the response may be affected by levels of predictors in spatio-temporal proximity to the response location. Here, a flexible modeling framework is proposed to capture such spatial and temporal lagged effects between a predictor and a response. Specifically, kernel functions are used to weight a spatio-temporal covariate surface in a regression model for the response. The kernels are assumed to be parametric and non-stationary with the data informing the parameter values of the kernel. The methodology is illustrated on simulated data as well as a physical data set of ozone concentrations to be explained by temperature. PMID:24010051
NASA Astrophysics Data System (ADS)
Zamani, Hossein; Faroughi, Pouya; Ismail, Noriszura
2014-06-01
This study relates the Poisson, mixed Poisson (MP), generalized Poisson (GP) and finite Poisson mixture (FPM) regression models through mean-variance relationship, and suggests the application of these models for overdispersed count data. As an illustration, the regression models are fitted to the US skin care count data. The results indicate that FPM regression model is the best model since it provides the largest log likelihood and the smallest AIC, followed by Poisson-Inverse Gaussion (PIG), GP and negative binomial (NB) regression models. The results also show that NB, PIG and GP regression models provide similar results.
Heterogeneous Breast Phantom Development for Microwave Imaging Using Regression Models
Hahn, Camerin; Noghanian, Sima
2012-01-01
As new algorithms for microwave imaging emerge, it is important to have standard accurate benchmarking tests. Currently, most researchers use homogeneous phantoms for testing new algorithms. These simple structures lack the heterogeneity of the dielectric properties of human tissue and are inadequate for testing these algorithms for medical imaging. To adequately test breast microwave imaging algorithms, the phantom has to resemble different breast tissues physically and in terms of dielectric properties. We propose a systematic approach in designing phantoms that not only have dielectric properties close to breast tissues but also can be easily shaped to realistic physical models. The approach is based on regression model to match phantom's dielectric properties with the breast tissue dielectric properties found in Lazebnik et al. (2007). However, the methodology proposed here can be used to create phantoms for any tissue type as long as ex vivo, in vitro, or in vivo tissue dielectric properties are measured and available. Therefore, using this method, accurate benchmarking phantoms for testing emerging microwave imaging algorithms can be developed. PMID:22550473
Augmented mixed beta regression models for periodontal proportion data
Galvis, Diana M.; Bandyopadhyay, Dipankar; Lachos, Victor H.
2014-01-01
Continuous (clustered) proportion data often arise in various domains of medicine and public health where the response variable of interest is a proportion (or percentage) quantifying disease status for the cluster units, ranging between zero and one. However, because of the presence of relatively disease-free as well as heavily diseased subjects in any study, the proportion values can lie in the interval [0, 1]. While beta regression can be adapted to assess covariate effects in these situations, its versatility is often challenged because of the presence/excess of zeros and ones because the beta support lies in the interval (0, 1). To circumvent this, we augment the probabilities of zero and one with the beta density, controlling for the clustering effect. Our approach is Bayesian with the ability to borrow information across various stages of the complex model hierarchy and produces a computationally convenient framework amenable to available freeware. The marginal likelihood is tractable and can be used to develop Bayesian case-deletion influence diagnostics based on q-divergence measures. Both simulation studies and application to a real dataset from a clinical periodontology study quantify the gain in model fit and parameter estimation over other ad hoc alternatives and provide quantitative insight into assessing the true covariate effects on the proportion responses. PMID:24764045
Tutorial on Using Regression Models with Count Outcomes Using R
ERIC Educational Resources Information Center
Beaujean, A. Alexander; Morgan, Grant B.
2016-01-01
Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can…
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
ERIC Educational Resources Information Center
Thatcher, Greg W.; Henson, Robin K.
This study examined research in training and development to determine effect size reporting practices. It focused on the reporting of corrected effect sizes in research articles using multiple regression analyses. When possible, researchers calculated corrected effect sizes and determine if the associated shrinkage could have impacted researcher…
Multiple regression models for hindcasting and forecasting midsummer hypoxia in the Gulf of Mexico.
Greene, Richard M; Lehrter, John C; Hagy, James D
2009-07-01
A new suite of multiple regression models was developed that describes relationships between the area of bottom water hypoxia along the northern Gulf of Mexico and Mississippi-Atchafalaya River nitrate concentration, total phosphorus (TP) concentration, and discharge. Model input variables were derived from two load estimation methods, the adjusted maximum likelihood estimation (AMLE) and the composite (COMP) method, developed by the U.S. Geological Survey. Variability in midsummer hypoxic area was described by models that incorporated May discharge, May nitrate, and February TP concentrations or their spring (discharge and nitrate) and winter (TP) averages. The regression models predicted the observed hypoxic area within +/-30%, yet model residuals showed an increasing trend with time. An additional model variable, Epoch, which allowed post-1993 observations to have a different intercept than earlier observations, suggested that hypoxic area has been 6450 km2 greater per unit discharge and nutrients since 1993. Model forecasts predicted that a dual 45% reduction in nitrate and TP concentration would likely reduce hypoxic area to approximately 5000 km2, the coastal goal established by the Mississippi River/Gulf of Mexico Watershed Nutrient Task Force. However, the COMP load estimation method, which is more accurate than the AMLE method, resulted in a smaller predicted hypoxia response to any given nutrient reduction than models based on the AMLE method. Monte Carlo simulations predicted that five years after an instantaneous 50% nitrate reduction or dual 45% nitrate and TP reduction it would be possible to resolve a significant reduction in hypoxic area. However, if nutrient reduction targets were achieved gradually (e.g., over 10 years), much more than a decade would be required before a significant downward trend in both nutrient concentrations and hypoxic area could be resolved against the large background of interannual variability. The multiple regression
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. PMID:26774211
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique.
Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana
2015-05-01
Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data of eighteen residential buildings. The resulting statistical model produced dependent (i.e. amount of waste generated) and independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R(2) value of 0.694, which means that it predicts approximately 69% of the factors involved in the generation of waste in similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable.
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Storm Water Management Model Climate Adjustment Tool (SWMM-CAT)
The US EPA’s newest tool, the Stormwater Management Model (SWMM) – Climate Adjustment Tool (CAT) is meant to help municipal stormwater utilities better address potential climate change impacts affecting their operations. SWMM, first released in 1971, models hydrology and hydrauli...
ERIC Educational Resources Information Center
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
Regression of retinopathy by squalamine in a mouse model.
Higgins, Rosemary D; Yan, Yun; Geng, Yixun; Zasloff, Michael; Williams, Jon I
2004-07-01
The goal of this study was to determine whether an antiangiogenic agent, squalamine, given late during the evolution of oxygen-induced retinopathy (OIR) in the mouse, could improve retinal neovascularization. OIR was induced in neonatal C57BL6 mice and the neonates were treated s.c. with squalamine doses begun at various times after OIR induction. A system of retinal whole mounts and assessment of neovascular nuclei extending beyond the inner limiting membrane from animals reared under room air or OIR conditions and killed periodically from d 12 to 21 were used to assess retinopathy in squalamine-treated and untreated animals. OIR evolved after 75% oxygen exposure in neonatal mice with florid retinal neovascularization developing by d 14. Squalamine (single dose, 25 mg/kg s.c.) given on d 15 or 16, but not d 17, substantially improved retinal neovascularization in the mouse model of OIR. There was improvement seen in the degree of blood vessel tuft formation, blood vessel tortuosity, and central vasoconstriction with squalamine treatment at d 15 or 16. Single-dose squalamine at d 12 was effective at reducing subsequent development of retinal neovascularization at doses as low as 1 mg/kg. Squalamine is a very active inhibitor of OIR in mouse neonates at doses as low as 1 mg/kg given once. Further, squalamine given late in the course of OIR improves retinopathy by inducing regression of retinal neovessels and abrogating invasion of new vessels beyond the inner-limiting membrane of the retina. PMID:15128931
Hirozawa, Anne M; Montez-Rath, Maria E; Johnson, Elizabeth C; Solnit, Stephen A; Drennan, Michael J; Katz, Mitchell H; Marx, Rani
2016-01-01
We compared prospective risk adjustment models for adjusting patient panels at the San Francisco Department of Public Health. We used 4 statistical models (linear regression, two-part model, zero-inflated Poisson, and zero-inflated negative binomial) and 4 subsets of predictor variables (age/gender categories, chronic diagnoses, homelessness, and a loss to follow-up indicator) to predict primary care visit frequency. Predicted visit frequency was then used to calculate patient weights and adjusted panel sizes. The two-part model using all predictor variables performed best (R = 0.20). This model, designed specifically for safety net patients, may prove useful for panel adjustment in other public health settings.
Hirozawa, Anne M; Montez-Rath, Maria E; Johnson, Elizabeth C; Solnit, Stephen A; Drennan, Michael J; Katz, Mitchell H; Marx, Rani
2016-01-01
We compared prospective risk adjustment models for adjusting patient panels at the San Francisco Department of Public Health. We used 4 statistical models (linear regression, two-part model, zero-inflated Poisson, and zero-inflated negative binomial) and 4 subsets of predictor variables (age/gender categories, chronic diagnoses, homelessness, and a loss to follow-up indicator) to predict primary care visit frequency. Predicted visit frequency was then used to calculate patient weights and adjusted panel sizes. The two-part model using all predictor variables performed best (R = 0.20). This model, designed specifically for safety net patients, may prove useful for panel adjustment in other public health settings. PMID:27576054
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
Zhang, Yiwei; Pan, Wei
2015-03-01
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.
Catastrophe, Chaos, and Complexity Models and Psychosocial Adjustment to Disability.
ERIC Educational Resources Information Center
Parker, Randall M.; Schaller, James; Hansmann, Sandra
2003-01-01
Rehabilitation professionals may unknowingly rely on stereotypes and specious beliefs when dealing with people with disabilities, despite the formulation of theories that suggest new models of the adjustment process. Suggests that Catastrophe, Chaos, and Complexity Theories hold considerable promise in this regard. This article reviews these…
Order Effects in Belief Updating: The Belief-Adjustment Model.
ERIC Educational Resources Information Center
Hogarth, Robin M.; Einhorn, Hillel J.
1992-01-01
A theory of the updating of beliefs over time is presented that explicitly accounts for order-effect phenomena as arising from the interaction of information-processing strategies and task characteristics. The belief-adjustment model is supported by 5 experiments involving 192 adult subjects. (SLD)
NASA Astrophysics Data System (ADS)
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.
Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models
ERIC Educational Resources Information Center
Shieh, Gwowen
2009-01-01
In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…
Risk-adjusted outcome models for public mental health outpatient programs.
Hendryx, M S; Dyck, D G; Srebnik, D
1999-01-01
OBJECTIVE: To develop and test risk-adjustment outcome models in publicly funded mental health outpatient settings. We developed prospective risk models that used demographic and diagnostic variables; client-reported functioning, satisfaction, and quality of life; and case manager clinical ratings to predict subsequent client functional status, health-related quality of life, and satisfaction with services. DATA SOURCES/STUDY SETTING: Data collected from 289 adult clients at five- and ten-month intervals, from six community mental health agencies in Washington state located primarily in suburban and rural areas. Data sources included client self-report, case manager ratings, and management information system data. STUDY DESIGN: Model specifications were tested using prospective linear regression analyses. Models were validated in a separate sample and comparative agency performance examined. PRINCIPAL FINDINGS: Presence of severe diagnoses, substance abuse, client age, and baseline functional status and quality of life were predictive of mental health outcomes. Unadjusted versus risk-adjusted scores resulted in differently ranked agency performance. CONCLUSIONS: Risk-adjusted functional status and patient satisfaction outcome models can be developed for public mental health outpatient programs. Research is needed to improve the predictive accuracy of the outcome models developed in this study, and to develop techniques for use in applied settings. The finding that risk adjustment changes comparative agency performance has important consequences for quality monitoring and improvement. Issues in public mental health risk adjustment are discussed, including static versus dynamic risk models, utilization versus outcome models, choice and timing of measures, and access and quality improvement incentives. PMID:10201857
Mixed-Effects Logistic Regression Models for Indirectly Observed Discrete Outcome Variables
ERIC Educational Resources Information Center
Vermunt, Jeroen K.
2005-01-01
A well-established approach to modeling clustered data introduces random effects in the model of interest. Mixed-effects logistic regression models can be used to predict discrete outcome variables when observations are correlated. An extension of the mixed-effects logistic regression model is presented in which the dependent variable is a latent…
De Mello, Fernanda; Oliveira, Carlos A L; Ribeiro, Ricardo P; Resende, Emiko K; Povh, Jayme A; Fornari, Darci C; Barreto, Rogério V; McManus, Concepta; Streit, Danilo
2015-01-01
Was evaluated the pattern of growth among females and males of tambaqui by Gompertz nonlinear regression model. Five traits of economic importance were measured on 145 animals during the three years, totaling 981 morphometric data analyzed. Different curves were adjusted between males and females for body weight, height and head length and only one curve was adjusted to the width and body length. The asymptotic weight (a) and relative growth rate to maturity (k) were different between sexes in animals with ± 5 kg; slaughter weight practiced by a specific niche market, very profitable. However, there was no difference between males and females up to ± 2 kg; slaughter weight established to supply the bigger consumer market. Females showed weight greater than males (± 280 g), which are more suitable for fish farming purposes defined for the niche market to larger animals. In general, males had lower maximum growth rate (8.66 g / day) than females (9.34 g / day), however, reached faster than females, 476 and 486 days growth rate, respectively. The height and length body are the traits that contributed most to the weight at 516 days (P <0.001).
De Mello, Fernanda; Oliveira, Carlos A L; Ribeiro, Ricardo P; Resende, Emiko K; Povh, Jayme A; Fornari, Darci C; Barreto, Rogério V; McManus, Concepta; Streit, Danilo
2015-01-01
Was evaluated the pattern of growth among females and males of tambaqui by Gompertz nonlinear regression model. Five traits of economic importance were measured on 145 animals during the three years, totaling 981 morphometric data analyzed. Different curves were adjusted between males and females for body weight, height and head length and only one curve was adjusted to the width and body length. The asymptotic weight (a) and relative growth rate to maturity (k) were different between sexes in animals with ± 5 kg; slaughter weight practiced by a specific niche market, very profitable. However, there was no difference between males and females up to ± 2 kg; slaughter weight established to supply the bigger consumer market. Females showed weight greater than males (± 280 g), which are more suitable for fish farming purposes defined for the niche market to larger animals. In general, males had lower maximum growth rate (8.66 g / day) than females (9.34 g / day), however, reached faster than females, 476 and 486 days growth rate, respectively. The height and length body are the traits that contributed most to the weight at 516 days (P <0.001). PMID:26628036
ERIC Educational Resources Information Center
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
ERIC Educational Resources Information Center
Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.
2010-01-01
Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…
Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.
2013-01-01
Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38% As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.
ERIC Educational Resources Information Center
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we…
ERIC Educational Resources Information Center
Kaplan, David
2005-01-01
This article considers the problem of estimating dynamic linear regression models when the data are generated from finite mixture probability density function where the mixture components are characterized by different dynamic regression model parameters. Specifically, conventional linear models assume that the data are generated by a single…
Hybrid hotspot detection using regression model and lithography simulation
NASA Astrophysics Data System (ADS)
Kimura, Taiki; Matsunawa, Tetsuaki; Nojima, Shigeki; Pan, David Z.
2016-03-01
As minimum feature sizes shrink, unexpected hotspots appear on wafers. Therefore, it is important to detect and fix these hotspots at design stage to reduce development time and manufacturing cost. Currently, as the most accurate approach, lithography simulation is widely used to detect such hotspots. However, it is known to be time-consuming. This paper proposes a novel aerial image synthesizing method using regression and minimum lithography simulation for only hotspot detection. Experimental results show hotspot detection on the proposed method is equivalent compared with the results on the conventional hotspot detection method which uses only lithography simulation with much less computational cost.
Comparison of multiplicative heterogeneous variance adjustment models for genetic evaluations.
Márkus, Sz; Mäntysaari, E A; Strandén, I; Eriksson, J-Å; Lidauer, M H
2014-06-01
Two heterogeneous variance adjustment methods and two variance models were compared in a simulation study. The method used for heterogeneous variance adjustment in the Nordic test-day model, which is a multiplicative method based on Meuwissen (J. Dairy Sci., 79, 1996, 310), was compared with a restricted multiplicative method where the fixed effects were not scaled. Both methods were tested with two different variance models, one with a herd-year and the other with a herd-year-month random effect. The simulation study was built on two field data sets from Swedish Red dairy cattle herds. For both data sets, 200 herds with test-day observations over a 12-year period were sampled. For one data set, herds were sampled randomly, while for the other, each herd was required to have at least 10 first-calving cows per year. The simulations supported the applicability of both methods and models, but the multiplicative mixed model was more sensitive in the case of small strata sizes. Estimation of variance components for the variance models resulted in different parameter estimates, depending on the applied heterogeneous variance adjustment method and variance model combination. Our analyses showed that the assumption of a first-order autoregressive correlation structure between random-effect levels is reasonable when within-herd heterogeneity is modelled by year classes, but less appropriate for within-herd heterogeneity by month classes. Of the studied alternatives, the multiplicative method and a variance model with a random herd-year effect were found most suitable for the Nordic test-day model for dairy cattle evaluation.
ERIC Educational Resources Information Center
Pakenham, Kenneth I.; Samios, Christina; Sofronoff, Kate
2005-01-01
The present study examined the applicability of the double ABCX model of family adjustment in explaining maternal adjustment to caring for a child diagnosed with Asperger syndrome. Forty-seven mothers completed questionnaires at a university clinic while their children were participating in an anxiety intervention. The children were aged between…
[Clinical research XX. From clinical judgment to multiple logistic regression model].
Berea-Baltierra, Ricardo; Rivas-Ruiz, Rodolfo; Pérez-Rodríguez, Marcela; Palacios-Cruz, Lino; Moreno, Jorge; Talavera, Juan O
2014-01-01
The complexity of the causality phenomenon in clinical practice implies that the result of a maneuver is not solely caused by the maneuver, but by the interaction among the maneuver and other baseline factors or variables occurring during the maneuver. This requires methodological designs that allow the evaluation of these variables. When the outcome is a binary variable, we use the multiple logistic regression model (MLRM). This multivariate model is useful when we want to predict or explain, adjusting due to the effect of several risk factors, the effect of a maneuver or exposition over the outcome. In order to perform an MLRM, the outcome or dependent variable must be a binary variable and both categories must mutually exclude each other (i.e. live/death, healthy/ill); on the other hand, independent variables or risk factors may be either qualitative or quantitative. The effect measure obtained from this model is the odds ratio (OR) with 95 % confidence intervals (CI), from which we can estimate the proportion of the outcome's variability explained through the risk factors. For these reasons, the MLRM is used in clinical research, since one of the main objectives in clinical practice comprises the ability to predict or explain an event where different risk or prognostic factors are taken into account.
Asquith, William H.; Roussel, Meghan C.
2009-01-01
Annual peak-streamflow frequency estimates are needed for flood-plain management; for objective assessment of flood risk; for cost-effective design of dams, levees, and other flood-control structures; and for design of roads, bridges, and culverts. Annual peak-streamflow frequency represents the peak streamflow for nine recurrence intervals of 2, 5, 10, 25, 50, 100, 200, 250, and 500 years. Common methods for estimation of peak-streamflow frequency for ungaged or unmonitored watersheds are regression equations for each recurrence interval developed for one or more regions; such regional equations are the subject of this report. The method is based on analysis of annual peak-streamflow data from U.S. Geological Survey streamflow-gaging stations (stations). Beginning in 2007, the U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, began a 3-year investigation concerning the development of regional equations to estimate annual peak-streamflow frequency for undeveloped watersheds in Texas. The investigation focuses primarily on 638 stations with 8 or more years of data from undeveloped watersheds and other criteria. The general approach is explicitly limited to the use of L-moment statistics, which are used in conjunction with a technique of multi-linear regression referred to as PRESS minimization. The approach used to develop the regional equations, which was refined during the investigation, is referred to as the 'L-moment-based, PRESS-minimized, residual-adjusted approach'. For the approach, seven unique distributions are fit to the sample L-moments of the data for each of 638 stations and trimmed means of the seven results of the distributions for each recurrence interval are used to define the station specific, peak-streamflow frequency. As a first iteration of regression, nine weighted-least-squares, PRESS-minimized, multi-linear regression equations are computed using the watershed
Applying land use regression model to estimate spatial variation of PM₂.₅ in Beijing, China.
Wu, Jiansheng; Li, Jiacheng; Peng, Jian; Li, Weifeng; Xu, Guang; Dong, Chengcheng
2015-05-01
Fine particulate matter (PM2.5) is the major air pollutant in Beijing, posing serious threats to human health. Land use regression (LUR) has been widely used in predicting spatiotemporal variation of ambient air-pollutant concentrations, though restricted to the European and North American context. We aimed to estimate spatiotemporal variations of PM2.5 by building separate LUR models in Beijing. Hourly routine PM2.5 measurements were collected at 35 sites from 4th March 2013 to 5th March 2014. Seventy-seven predictor variables were generated in GIS, including street network, land cover, population density, catering services distribution, bus stop density, intersection density, and others. Eight LUR models were developed on annual, seasonal, peak/non-peak, and incremental concentration subsets. The annual mean concentration across all sites is 90.7 μg/m(3) (SD = 13.7). PM2.5 shows more temporal variation than spatial variation, indicating the necessity of building different models to capture spatiotemporal trends. The adjusted R (2) of these models range between 0.43 and 0.65. Most LUR models are driven by significant predictors including major road length, vegetation, and water land use. Annual outdoor exposure in Beijing is as high as 96.5 μg/m(3). This is among the first LUR studies implemented in a seriously air-polluted Chinese context, which generally produce acceptable results and reliable spatial air-pollution maps. Apart from the models for winter and incremental concentration, LUR models are driven by similar variables, suggesting that the spatial variations of PM2.5 remain steady for most of the time. Temporal variations are explained by the intercepts, and spatial variations in the measurements determine the strength of variable coefficients in our models. PMID:25487555
A sub-neighborhood scale land use regression model for predicting NO(2).
Mavko, Matthew E; Tang, Brian; George, Linda A
2008-07-15
This study set out to develop a land use regression model at sub-neighborhood scale (0.01-1 km) for Portland, Oregon using passive measurements of NO(2) at 77 locations. Variables used to develop the model included road and railroad density, traffic volume, and land use with buffers of 50 to 750 m surrounding each measurement site. An initial regression model was able to predict 66% of the variation in NO(2). Including wind direction in the regression model increased predictive power by 15%. Iterative random exclusion of 11 sites during model calibration resulted in a 3% variation in predictive power. The regression model was applied to the Portland metropolitan area using 10 m gridded land use layers. This study further validates land use regression for use in North America, and identifies important considerations for their use, such as inclusion of railways, open spaces and meteorological patterns.
Development and Evaluation of Land-Use Regression Models Using Modeled Air Quality Concentrations
Abstract Land-use regression (LUR) models have emerged as a preferred methodology for estimating individual exposure to ambient air pollution in epidemiologic studies in absence of subject-specific measurements. Although there is a growing literature focused on LUR evaluation, fu...
Beta Regression Finite Mixture Models of Polarization and Priming
ERIC Educational Resources Information Center
Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay
2011-01-01
This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…
A modified GM-estimation for robust fitting of mixture regression models
NASA Astrophysics Data System (ADS)
Booppasiri, Slun; Srisodaphol, Wuttichai
2015-02-01
In the mixture regression models, the regression parameters are estimated by maximum likelihood estimation (MLE) via EM algorithm. Generally, maximum likelihood estimation is sensitive to outliers and heavy tailed error distribution. The robust method, M-estimation can handle outliers existing on dependent variable only for estimating regression coefficients in regression models. Moreover, GM-estimation can handle outliers existing on dependent variable and independent variables. In this study, the modified GM-estimations for estimating the regression coefficients in the mixture regression models are proposed. A Monte Carlo simulation is used to evaluate the efficiency of the proposed methods. The results show that the proposed modified GM-estimations approximate to MLE when there are no outliers and the error is normally distributed. Furthermore, our proposed methods are more efficient than the MLE, when there are leverage points.
Lamont, Andrea E; Vermunt, Jeroen K; Van Horn, M Lee
2016-01-01
Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we tested the effects of violating an implicit assumption often made in these models; that is, independent variables in the model are not directly related to latent classes. Results indicate that the major risk of failing to model the relationship between predictor and latent class was an increase in the probability of selecting additional latent classes and biased class proportions. In addition, we tested whether regression mixture models can detect a piecewise relationship between a predictor and outcome. Results suggest that these models are able to detect piecewise relations but only when the relationship between the latent class and the predictor is included in model estimation. We illustrate the implications of making this assumption through a reanalysis of applied data examining heterogeneity in the effects of family resources on academic achievement. We compare previous results (which assumed no relation between independent variables and latent class) to the model where this assumption is lifted. Implications and analytic suggestions for conducting regression mixture based on these findings are noted.
Bakhtiyari, Mahmood; Mehmandar, Mohammad Reza; Mirbagheri, Babak; Hariri, Gholam Reza; Delpisheh, Ali; Soori, Hamid
2014-01-01
Risk factors of human-related traffic crashes are the most important and preventable challenges for community health due to their noteworthy burden in developing countries in particular. The present study aims to investigate the role of human risk factors of road traffic crashes in Iran. Through a cross-sectional study using the COM 114 data collection forms, the police records of almost 600,000 crashes occurred in 2010 are investigated. The binary logistic regression and proportional odds regression models are used. The odds ratio for each risk factor is calculated. These models are adjusted for known confounding factors including age, sex and driving time. The traffic crash reports of 537,688 men (90.8%) and 54,480 women (9.2%) are analysed. The mean age is 34.1 ± 14 years. Not maintaining eyes on the road (53.7%) and losing control of the vehicle (21.4%) are the main causes of drivers' deaths in traffic crashes within cities. Not maintaining eyes on the road is also the most frequent human risk factor for road traffic crashes out of cities. Sudden lane excursion (OR = 9.9, 95% CI: 8.2-11.9) and seat belt non-compliance (OR = 8.7, CI: 6.7-10.1), exceeding authorised speed (OR = 17.9, CI: 12.7-25.1) and exceeding safe speed (OR = 9.7, CI: 7.2-13.2) are the most significant human risk factors for traffic crashes in Iran. The high mortality rate of 39 people for every 100,000 population emphasises on the importance of traffic crashes in Iran. Considering the important role of human risk factors in traffic crashes, struggling efforts are required to control dangerous driving behaviours such as exceeding speed, illegal overtaking and not maintaining eyes on the road.
A comparison of several regression models for analysing cost of CABG surgery.
Austin, Peter C; Ghali, William A; Tu, Jack V
2003-09-15
Investigators in clinical research are often interested in determining the association between patient characteristics and cost of medical or surgical treatment. However, there is no uniformly agreed upon regression model with which to analyse cost data. The objective of the current study was to compare the performance of linear regression, linear regression with log-transformed cost, generalized linear models with Poisson, negative binomial and gamma distributions, median regression, and proportional hazards models for analysing costs in a cohort of patients undergoing CABG surgery. The study was performed on data comprising 1959 patients who underwent CABG surgery in Calgary, Alberta, between June 1994 and March 1998. Ten of 21 patient characteristics were significantly associated with cost of surgery in all seven models. Eight variables were not significantly associated with cost of surgery in all seven models. Using mean squared prediction error as a loss function, proportional hazards regression and the three generalized linear models were best able to predict cost in independent validation data. Using mean absolute error, linear regression with log-transformed cost, proportional hazards regression, and median regression to predict median cost, were best able to predict cost in independent validation data. Since the models demonstrated good consistency in identifying factors associated with increased cost of CABG surgery, any of the seven models can be used for identifying factors associated with increased cost of surgery. However, the magnitude of, and the interpretation of, the coefficients vary across models. Researchers are encouraged to consider a variety of candidate models, including those better known in the econometrics literature, rather than begin data analysis with one regression model selected a priori. The final choice of regression model should be made after a careful assessment of how best to assess predictive ability and should be tailored to
A comparison of several regression models for analysing cost of CABG surgery.
Austin, Peter C; Ghali, William A; Tu, Jack V
2003-09-15
Investigators in clinical research are often interested in determining the association between patient characteristics and cost of medical or surgical treatment. However, there is no uniformly agreed upon regression model with which to analyse cost data. The objective of the current study was to compare the performance of linear regression, linear regression with log-transformed cost, generalized linear models with Poisson, negative binomial and gamma distributions, median regression, and proportional hazards models for analysing costs in a cohort of patients undergoing CABG surgery. The study was performed on data comprising 1959 patients who underwent CABG surgery in Calgary, Alberta, between June 1994 and March 1998. Ten of 21 patient characteristics were significantly associated with cost of surgery in all seven models. Eight variables were not significantly associated with cost of surgery in all seven models. Using mean squared prediction error as a loss function, proportional hazards regression and the three generalized linear models were best able to predict cost in independent validation data. Using mean absolute error, linear regression with log-transformed cost, proportional hazards regression, and median regression to predict median cost, were best able to predict cost in independent validation data. Since the models demonstrated good consistency in identifying factors associated with increased cost of CABG surgery, any of the seven models can be used for identifying factors associated with increased cost of surgery. However, the magnitude of, and the interpretation of, the coefficients vary across models. Researchers are encouraged to consider a variety of candidate models, including those better known in the econometrics literature, rather than begin data analysis with one regression model selected a priori. The final choice of regression model should be made after a careful assessment of how best to assess predictive ability and should be tailored to
NASA Astrophysics Data System (ADS)
Wheeler, David C.; Calder, Catherine A.
2007-06-01
The realization in the statistical and geographical sciences that a relationship between an explanatory variable and a response variable in a linear regression model is not always constant across a study area has led to the development of regression models that allow for spatially varying coefficients. Two competing models of this type are geographically weighted regression (GWR) and Bayesian regression models with spatially varying coefficient processes (SVCP). In the application of these spatially varying coefficient models, marginal inference on the regression coefficient spatial processes is typically of primary interest. In light of this fact, there is a need to assess the validity of such marginal inferences, since these inferences may be misleading in the presence of explanatory variable collinearity. In this paper, we present the results of a simulation study designed to evaluate the sensitivity of the spatially varying coefficients in the competing models to various levels of collinearity. The simulation study results show that the Bayesian regression model produces more accurate inferences on the regression coefficients than does GWR. In addition, the Bayesian regression model is overall fairly robust in terms of marginal coefficient inference to moderate levels of collinearity, and degrades less substantially than GWR with strong collinearity.
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local
A Bayesian approach for the multiplicative binomial regression model
NASA Astrophysics Data System (ADS)
Paraíba, Carolina C. M.; Diniz, Carlos A. R.; Pires, Rubiane M.
2012-10-01
In the present paper, we focus our attention on Altham's multiplicative binomial model under the Bayesian perspective, modeling both the probability of success and the dispersion parameters. We present results based on a simulated data set to access the quality of Bayesian estimates and Bayesian diagnostic for model assessment.
NASA Astrophysics Data System (ADS)
Suhartono, Lee, Muhammad Hisyam; Prastyo, Dedy Dwi
2015-12-01
The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two levels ARIMAX and regression methods. Two levels ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as case study. In general, two levels of calendar variation model yields two models, namely the first model to reconstruct the sales pattern that already occurred, and the second model to forecast the effect of increasing sales due to Eid ul-Fitr that affected sales at the same and the previous months. The results show that the proposed two level calendar variation model based on ARIMAX and regression methods yields better forecast compared to the seasonal ARIMA model and Neural Networks.
Antioch, K M; Walsh, M K
2002-01-01
Under Australian casemix funding arrangements that use Diagnosis-Related Groups (DRGs) the average price is policy based, not benchmarked. Cost weights are too low for State-wide chronic disease services. Risk-adjusted Capitation Funding Models (RACFM) are feasible alternatives. A RACFM was developed for public patients with cystic fibrosis treated by an Australian Health Maintenance Organization (AHMO). Adverse selection is of limited concern since patients pay solidarity contributions via Medicare levy with no premium contributions to the AHMO. Sponsors paying premium subsidies are the State of Victoria and the Federal Government. Cost per patient is the dependent variable in the multiple regression. Data on DRG 173 (cystic fibrosis) patients were assessed for heteroskedasticity, multicollinearity, structural stability and functional form. Stepwise linear regression excluded non-significant variables. Significant variables were 'emergency' (1276.9), 'outlier' (6377.1), 'complexity' (3043.5), 'procedures' (317.4) and the constant (4492.7) (R(2)=0.21, SE=3598.3, F=14.39, Prob<0.0001. Regression coefficients represent the additional per patient costs summed to the base payment (constant). The model explained 21% of the variance in cost per patient. The payment rate is adjusted by a best practice annual admission rate per patient. The model is a blended RACFM for in-patient, out-patient, Hospital In The Home, Fee-For-Service Federal payments for drugs and medical services; lump sum lung transplant payments and risk sharing through cost (loss) outlier payments. State and Federally funded home and palliative services are 'carved out'. The model, which has national application via Coordinated Care Trials and by Australian States for RACFMs may be instructive for Germany, which plans to use Australian DRGs for casemix funding. The capitation alternative for chronic disease can improve equity, allocative efficiency and distributional justice. The use of Diagnostic Cost
Effect of air pollution on lung cancer: a Poisson regression model based on vital statistics.
Tango, T
1994-01-01
This article describes a Poisson regression model for time trends of mortality to detect the long-term effects of common levels of air pollution on lung cancer, in which the adjustment for cigarette smoking is not always necessary. The main hypothesis to be tested in the model is that if the long-term and common-level air pollution had an effect on lung cancer, the death rate from lung cancer could be expected to increase gradually at a higher rate in the region with relatively high levels of air pollution than in the region with low levels, and that this trend would not be expected for other control diseases in which cigarette smoking is a risk factor. Using this approach, we analyzed the trend of mortality in females aged 40 to 79, from lung cancer and two control diseases, ischemic heart disease and cerebrovascular disease, based on vital statistics in 23 wards of the Tokyo metropolitan area for 1972 to 1988. Ward-specific mean levels per day of SO2 and NO2 from 1974 through 1976 estimated by Makino (1978) were used as the ward-specific exposure measure of air pollution. No data on tobacco consumption in each ward is available. Our analysis supported the existence of long-term effects of air pollution on lung cancer. PMID:7851329
A Spectral Graph Regression Model for Learning Brain Connectivity of Alzheimer’s Disease
Hu, Chenhui; Cheng, Lin; Sepulcre, Jorge; Johnson, Keith A.; Fakhri, Georges E.; Lu, Yue M.; Li, Quanzheng
2015-01-01
Understanding network features of brain pathology is essential to reveal underpinnings of neurodegenerative diseases. In this paper, we introduce a novel graph regression model (GRM) for learning structural brain connectivity of Alzheimer's disease (AD) measured by amyloid-β deposits. The proposed GRM regards 11C-labeled Pittsburgh Compound-B (PiB) positron emission tomography (PET) imaging data as smooth signals defined on an unknown graph. This graph is then estimated through an optimization framework, which fits the graph to the data with an adjustable level of uniformity of the connection weights. Under the assumed data model, results based on simulated data illustrate that our approach can accurately reconstruct the underlying network, often with better reconstruction than those obtained by both sample correlation and ℓ1-regularized partial correlation estimation. Evaluations performed upon PiB-PET imaging data of 30 AD and 40 elderly normal control (NC) subjects demonstrate that the connectivity patterns revealed by the GRM are easy to interpret and consistent with known pathology. Moreover, the hubs of the reconstructed networks match the cortical hubs given by functional MRI. The discriminative network features including both global connectivity measurements and degree statistics of specific nodes discovered from the AD and NC amyloid-beta networks provide new potential biomarkers for preclinical and clinical AD. PMID:26024224
A Negative Binomial Regression Model for Accuracy Tests
ERIC Educational Resources Information Center
Hung, Lai-Fa
2012-01-01
Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…
Incremental logistic regression for customizing automatic diagnostic models.
Tortajada, Salvador; Robles, Montserrat; García-Gómez, Juan Miguel
2015-01-01
In the last decades, and following the new trends in medicine, statistical learning techniques have been used for developing automatic diagnostic models for aiding the clinical experts throughout the use of Clinical Decision Support Systems. The development of these models requires a large, representative amount of data, which is commonly obtained from one hospital or a group of hospitals after an expensive and time-consuming gathering, preprocess, and validation of cases. After the model development, it has to overcome an external validation that is often carried out in a different hospital or health center. The experience is that the models show underperformed expectations. Furthermore, patient data needs ethical approval and patient consent to send and store data. For these reasons, we introduce an incremental learning algorithm base on the Bayesian inference approach that may allow us to build an initial model with a smaller number of cases and update it incrementally when new data are collected or even perform a new calibration of a model from a different center by using a reduced number of cases. The performance of our algorithm is demonstrated by employing different benchmark datasets and a real brain tumor dataset; and we compare its performance to a previous incremental algorithm and a non-incremental Bayesian model, showing that the algorithm is independent of the data model, iterative, and has a good convergence. PMID:25417079
A Noncentral "t" Regression Model for Meta-Analysis
ERIC Educational Resources Information Center
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
NASA Astrophysics Data System (ADS)
Preobrazhenskii, M. P.; Rudakov, O. B.
2016-01-01
A regression model for calculating the boiling point isobars of tetrachloromethane-organic solvent binary homogeneous systems is proposed. The parameters of the model proposed were calculated for a series of solutions. The correlation between the nonadditivity parameter of the regression model and the hydrophobicity criterion of the organic solvent is established. The parameter value of the proposed model is shown to allow prediction of the potential formation of azeotropic mixtures of solvents with tetrachloromethane.
Determination of airplane model structure from flight data by using modified stepwise regression
NASA Technical Reports Server (NTRS)
Klein, V.; Batterson, J. G.; Murphy, P. C.
1981-01-01
The linear and stepwise regressions are briefly introduced, then the problem of determining airplane model structure is addressed. The MSR was constructed to force a linear model for the aerodynamic coefficient first, then add significant nonlinear terms and delete nonsignificant terms from the model. In addition to the statistical criteria in the stepwise regression, the prediction sum of squares (PRESS) criterion and the analysis of residuals were examined for the selection of an adequate model. The procedure is used in examples with simulated and real flight data. It is shown that the MSR performs better than the ordinary stepwise regression and that the technique can also be applied to the large amplitude maneuvers.
Bertipaglia, T S; Carreño, L O D; Aspilcueta-Borquis, R R; Boligon, A A; Farah, M M; Gomes, F J; Machado, C H C; Rey, F S B; da Fonseca, R
2015-08-01
Random regression models (RRM) and multitrait models (MTM) were used to estimate genetic parameters for growth traits in Brazilian Brahman cattle and to compare the estimated breeding values obtained by these 2 methodologies. For RRM, 78,641 weight records taken between 60 and 550 d of age from 16,204 cattle were analyzed, and for MTM, the analysis consisted of 17,385 weight records taken at the same ages from 12,925 cattle. All models included the fixed effects of contemporary group and the additive genetic, maternal genetic, and animal permanent environmental effects and the quadratic effect of age at calving (AAC) as covariate. For RRM, the AAC was nested in the animal's age class. The best RRM considered cubic polynomials and the residual variance heterogeneity (5 levels). For MTM, the weights were adjusted for standard ages. For RRM, additive heritability estimates ranged from 0.42 to 0.75, and for MTM, the estimates ranged from 0.44 to 0.72 for both models at 60, 120, 205, 365, and 550 d of age. The maximum maternal heritability estimate (0.08) was at 140 d for RRM, but for MTM, it was highest at weaning (0.09). The magnitude of the genetic correlations was generally from moderate to high. The RRM adequately modeled changes in variance or covariance with age, and provided there was sufficient number of samples, increased accuracy in the estimation of the genetic parameters can be expected. Correlation of bull classifications were different in both methods and at all the ages evaluated, especially at high selection intensities, which could affect the response to selection. PMID:26440161
Propensity score-based diagnostics for categorical response regression models
Park, Sung Kyun; Vokonas, Pantel S.; Mukherjee, Bhramar
2013-01-01
For binary or categorical response models, most goodness-of-fit statistics are based on the notion of partitioning the subjects into groups or regions and comparing the observed and predicted responses in these regions by a suitable chi-squared distribution. Existing strategies create this partition based on the predicted response probabilities, or propensity scores, from the fitted model. In this paper, we follow a retrospective approach, borrowing the notion of balancing scores used in causal inference to inspect the conditional distribution of the predictors, given the propensity scores, in each category of the response to assess model adequacy. This diagnostic can be used under both prospective and retrospective sampling designs and may ascertain general forms of misspecification. We first present simple graphical and numerical summaries that can be used in a binary logistic model. We then generalize the tools to propose model diagnostics for the proportional odds model. We illustrate the methods with simulation studies and two data examples (i) a case-control study of the association between cumulative lead exposure and Parkinson’s Disease in the Boston, Massachusetts area and (ii) and a cohort study of biomarkers possibly associated with diabetes, from the VA Normative Aging Study. PMID:23934948
ERIC Educational Resources Information Center
Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette
2011-01-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…
ERIC Educational Resources Information Center
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
Advantages of normalization regression estimation over ridge regression estimation are demonstrated by reference to Bloom's model of school learning. Theoretical concern centered on the structure of scholastic achievement at grade 10 in Canadian high schools. Data on 886 students were randomly sampled from the Carnegie Human Resources Data Bank.…
KUPPER, Lawrence L.
2012-01-01
A common goal in environmental epidemiologic studies is to undertake logistic regression modeling to associate a continuous measure of exposure with binary disease status, adjusting for covariates. A frequent complication is that exposure may only be measurable indirectly, through a collection of subject-specific variables assumed associated with it. Motivated by a specific study to investigate the association between lung function and exposure to metal working fluids, we focus on a multiplicative-lognormal structural measurement error scenario and approaches to address it when external validation data are available. Conceptually, we emphasize the case in which true untransformed exposure is of interest in modeling disease status, but measurement error is additive on the log scale and thus multiplicative on the raw scale. Methodologically, we favor a pseudo-likelihood (PL) approach that exhibits fewer computational problems than direct full maximum likelihood (ML) yet maintains consistency under the assumed models without necessitating small exposure effects and/or small measurement error assumptions. Such assumptions are required by computationally convenient alternative methods like regression calibration (RC) and ML based on probit approximations. We summarize simulations demonstrating considerable potential for bias in the latter two approaches, while supporting the use of PL across a variety of scenarios. We also provide accessible strategies for obtaining adjusted standard errors to accompany RC and PL estimates. PMID:24027381
NASA Technical Reports Server (NTRS)
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Mohammed, Mohammed A; Manktelow, Bradley N; Hofer, Timothy P
2016-04-01
There is interest in deriving case-mix adjusted standardised mortality ratios so that comparisons between healthcare providers, such as hospitals, can be undertaken in the controversial belief that variability in standardised mortality ratios reflects quality of care. Typically standardised mortality ratios are derived using a fixed effects logistic regression model, without a hospital term in the model. This fails to account for the hierarchical structure of the data - patients nested within hospitals - and so a hierarchical logistic regression model is more appropriate. However, four methods have been advocated for deriving standardised mortality ratios from a hierarchical logistic regression model, but their agreement is not known and neither do we know which is to be preferred. We found significant differences between the four types of standardised mortality ratios because they reflect a range of underlying conceptual issues. The most subtle issue is the distinction between asking how an average patient fares in different hospitals versus how patients at a given hospital fare at an average hospital. Since the answers to these questions are not the same and since the choice between these two approaches is not obvious, the extent to which profiling hospitals on mortality can be undertaken safely and reliably, without resolving these methodological issues, remains questionable.
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules…
Regression Models for Demand Reduction based on Cluster Analysis of Load Profiles
Yamaguchi, Nobuyuki; Han, Junqiao; Ghatikar, Girish; Piette, Mary Ann; Asano, Hiroshi; Kiliccote, Sila
2009-06-28
This paper provides new regression models for demand reduction of Demand Response programs for the purpose of ex ante evaluation of the programs and screening for recruiting customer enrollment into the programs. The proposed regression models employ load sensitivity to outside air temperature and representative load pattern derived from cluster analysis of customer baseline load as explanatory variables. The proposed models examined their performances from the viewpoint of validity of explanatory variables and fitness of regressions, using actual load profile data of Pacific Gas and Electric Company's commercial and industrial customers who participated in the 2008 Critical Peak Pricing program including Manual and Automated Demand Response.
Nationwide regression models for predicting urban runoff water quality at unmonitored sites
Tasker, Gary D.; Driver, N.E.
1988-01-01
Regression models are presented that can be used to estimate mean loads for chemical oxygen demand, suspended solids, dissolved solids, total nitrogen, total ammonia plus nitrogen, total phosphorous, dissolved phosphorous, total copper, total lead, and total zinc at unmonitored sites in urban areas. Explanatory variables include drainage area, imperviousness of drainage basin to infiltration, mean annual rainfall, a land-use indicator variable, and mean minimum January temperature. Model parameters are estimated by a generalized-least-squares regression method that accounts for cross correlation and differences in reliability of sample estimates between sites. The regression models account for 20 to 65 percent of the total variation in observed loads.
Bayesian regression model for seasonal forecast of precipitation over Korea
NASA Astrophysics Data System (ADS)
Jo, Seongil; Lim, Yaeji; Lee, Jaeyong; Kang, Hyun-Suk; Oh, Hee-Seok
2012-08-01
In this paper, we apply three different Bayesian methods to the seasonal forecasting of the precipitation in a region around Korea (32.5°N-42.5°N, 122.5°E-132.5°E). We focus on the precipitation of summer season (June-July-August; JJA) for the period of 1979-2007 using the precipitation produced by the Global Data Assimilation and Prediction System (GDAPS) as predictors. Through cross-validation, we demonstrate improvement for seasonal forecast of precipitation in terms of root mean squared error (RMSE) and linear error in probability space score (LEPS). The proposed methods yield RMSE of 1.09 and LEPS of 0.31 between the predicted and observed precipitations, while the prediction using GDAPS output only produces RMSE of 1.20 and LEPS of 0.33 for CPC Merged Analyzed Precipitation (CMAP) data. For station-measured precipitation data, the RMSE and LEPS of the proposed Bayesian methods are 0.53 and 0.29, while GDAPS output is 0.66 and 0.33, respectively. The methods seem to capture the spatial pattern of the observed precipitation. The Bayesian paradigm incorporates the model uncertainty as an integral part of modeling in a natural way. We provide a probabilistic forecast integrating model uncertainty.
Aboveground biomass and carbon stocks modelling using non-linear regression model
NASA Astrophysics Data System (ADS)
Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd
2016-06-01
Aboveground biomass (AGB) is an important source of uncertainty in the carbon estimation for the tropical forest due to the variation biodiversity of species and the complex structure of tropical rain forest. Nevertheless, the tropical rainforest holds the most extensive forest in the world with the vast diversity of tree with layered canopies. With the usage of optical sensor integrate with empirical models is a common way to assess the AGB. Using the regression, the linkage between remote sensing and a biophysical parameter of the forest may be made. Therefore, this paper exemplifies the accuracy of non-linear regression equation of quadratic function to estimate the AGB and carbon stocks for the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameter field plots with the remotely-sensed data using nonlinear regression model. The result showed that there is a good relationship between crown projection area (CPA) and carbon stocks (CS) with Pearson Correlation (p < 0.01), the coefficient of correlation (r) is 0.671. The study concluded that the integration of Worldview-3 imagery with the canopy height model (CHM) raster based LiDAR were useful in order to quantify the AGB and carbon stocks for a larger sample area of the lowland Dipterocarp forest.
NASA Astrophysics Data System (ADS)
Sykas, Dimitris; Karathanassi, Vassilia
2015-06-01
This paper presents a new method for automatically determining the optimum regression model, which enable the estimation of a parameter. The concept lies on the combination of k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated to the desired parameter. Initially a pre-processing algorithm uses as input a single spectral signature and transforms it according to the SPPA function. A k-step combination of SPPAs uses k preprocessing algorithms serially. The result of each SPPA is used as input to the next SPPA, and so on until the k desired pre-processed signatures are reached. These signatures are then used as input to three different regression methods: the Normalized band Difference Regression (NDR), the Multiple Linear Regression (MLR) and the Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs) are used, one for each regression method, for the selection of the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation not only indicates the selection of the optimum SPPA combination but also the regression method that produces the optimum prediction model. The proposed method was applied on soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy while NDR's accuracy was satisfactory compared to its complexity. MLR method showed severe drawbacks due to the presence of noise in terms of collinearity at the spectral bands. Most of the regression methods required a 3-step combination of SPPAs for achieving the highest performance. The selected preprocessing algorithms were different for each regression method since each regression method handles with a different way the explanatory variables.
Regression models tolerant to massively missing data: a case study in solar-radiation nowcasting
NASA Astrophysics Data System (ADS)
Žliobaitė, I.; Hollmén, J.; Junninen, H.
2014-12-01
Statistical models for environmental monitoring strongly rely on automatic data acquisition systems that use various physical sensors. Often, sensor readings are missing for extended periods of time, while model outputs need to be continuously available in real time. With a case study in solar-radiation nowcasting, we investigate how to deal with massively missing data (around 50% of the time some data are unavailable) in such situations. Our goal is to analyze characteristics of missing data and recommend a strategy for deploying regression models which would be robust to missing data in situations where data are massively missing. We are after one model that performs well at all times, with and without data gaps. Due to the need to provide instantaneous outputs with minimum energy consumption for computing in the data streaming setting, we dismiss computationally demanding data imputation methods and resort to a mean replacement, accompanied with a robust regression model. We use an established strategy for assessing different regression models and for determining how many missing sensor readings can be tolerated before model outputs become obsolete. We experimentally analyze the accuracies and robustness to missing data of seven linear regression models. We recommend using the regularized PCA regression with our established guideline in training regression models, which themselves are robust to missing data.
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
Robertson, D.M.; Saad, D.A.; Heisey, D.M.
2006-01-01
Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. ?? 2006 Springer Science+Business Media, Inc.
An interface model for dosage adjustment connects hematotoxicity to pharmacokinetics.
Meille, C; Iliadis, A; Barbolosi, D; Frances, N; Freyer, G
2008-12-01
When modeling is required to describe pharmacokinetics and pharmacodynamics simultaneously, it is difficult to link time-concentration profiles and drug effects. When patients are under chemotherapy, despite the huge amount of blood monitoring numerations, there is a lack of exposure variables to describe hematotoxicity linked with the circulating drug blood levels. We developed an interface model that transforms circulating pharmacokinetic concentrations to adequate exposures, destined to be inputs of the pharmacodynamic process. The model is materialized by a nonlinear differential equation involving three parameters. The relevance of the interface model for dosage adjustment is illustrated by numerous simulations. In particular, the interface model is incorporated into a complex system including pharmacokinetics and neutropenia induced by docetaxel and by cisplatin. Emphasis is placed on the sensitivity of neutropenia with respect to the variations of the drug amount. This complex system including pharmacokinetic, interface, and pharmacodynamic hematotoxicity models is an interesting tool for analysis of hematotoxicity induced by anticancer agents. The model could be a new basis for further improvements aimed at incorporating new experimental features. PMID:19107581
Modeling absolute differences in life expectancy with a censored skew-normal regression approach
Clough-Gorr, Kerri; Zwahlen, Marcel
2015-01-01
Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negative skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544
Evaluation of Land use Regression Models for NO2 in El Paso, Texas, USA
Developing suitable exposure estimates for air pollution health studies is problematic due to spatial and temporal variation in concentrations and often limited monitoring data. Though land use regression models (LURs) are often used for this purpose, their applicability to later...
Technology Transfer Automated Retrieval System (TEKTRAN)
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in heat production, or energy expenditure (EE). Multivariate adaptive regression splines (MARS), is a nonparametric method that estimates complex nonlinear relationships by a seri...
NASA Astrophysics Data System (ADS)
Bae, Gihyun; Huh, Hoon; Park, Sungho
This paper deals with a regression model for light weight and crashworthiness enhancement design of automotive parts in frontal car crash. The ULSAB-AVC model is employed for the crash analysis and effective parts are selected based on the amount of energy absorption during the crash behavior. Finite element analyses are carried out for designated design cases in order to investigate the crashworthiness and weight according to the material and thickness of main energy absorption parts. Based on simulations results, a regression analysis is performed to construct a regression model utilized for light weight and crashworthiness enhancement design of automotive parts. An example for weight reduction of main energy absorption parts demonstrates the validity of a regression model constructed.
MULTIPLE REGRESSION MODELS FOR HINDCASTING AND FORECASTING MIDSUMMER HYPOXIA IN THE GULF OF MEXICO
A new suite of multiple regression models were developed that describe the relationship between the area of bottom water hypoxia along the northern Gulf of Mexico and Mississippi-Atchafalaya River nitrate concentration, total phosphorus (TP) concentration, and discharge. Variabil...
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
SCI model structure determination program (OSR) user's guide. [optimal subset regression
NASA Technical Reports Server (NTRS)
1979-01-01
The computer program, OSR (Optimal Subset Regression) which estimates models for rotorcraft body and rotor force and moment coefficients is described. The technique used is based on the subset regression algorithm. Given time histories of aerodynamic coefficients, aerodynamic variables, and control inputs, the program computes correlation between various time histories. The model structure determination is based on these correlations. Inputs and outputs of the program are given.
Cooley, R.L.
1982-01-01
Prior information on the parameters of a groundwater flow model can be used to improve parameter estimates obtained from nonlinear regression solution of a modeling problem. Two scales of prior information can be available: 1) prior information having known reliability (that is, bias and random error structure), and 2) prior information consisting of best available estimates of unknown reliability. It is shown that if both scales of prior information are available, then a combined regression analysis may be made. -from Author
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Mispricing in the medicare advantage risk adjustment model.
Chen, Jing; Ellis, Randall P; Toro, Katherine H; Ash, Arlene S
2015-01-01
The Centers for Medicare and Medicaid Services (CMS) implemented hierarchical condition category (HCC) models in 2004 to adjust payments to Medicare Advantage (MA) plans to reflect enrollees' expected health care costs. We use Verisk Health's diagnostic cost group (DxCG) Medicare models, refined "descendants" of the same HCC framework with 189 comprehensive clinical categories available to CMS in 2004, to reveal 2 mispricing errors resulting from CMS' implementation. One comes from ignoring all diagnostic information for "new enrollees" (those with less than 12 months of prior claims). Another comes from continuing to use the simplified models that were originally adopted in response to assertions from some capitated health plans that submitting the claims-like data that facilitate richer models was too burdensome. Even the main CMS model being used in 2014 recognizes only 79 condition categories, excluding many diagnoses and merging conditions with somewhat heterogeneous costs. Omitted conditions are typically lower cost or "vague" and not easily audited from simplified data submissions. In contrast, DxCG Medicare models use a comprehensive, 394-HCC classification system. Applying both models to Medicare's 2010-2011 fee-for-service 5% sample, we find mispricing and lower predictive accuracy for the CMS implementation. For example, in 2010, 13% of beneficiaries had at least 1 higher cost DxCG-recognized condition but no CMS-recognized condition; their 2011 actual costs averaged US$6628, almost one-third more than the CMS model prediction. As MA plans must now supply encounter data, CMS should consider using more refined and comprehensive (DxCG-like) models.
Thomas, Michael S C; Knowland, Victoria C P; Karmiloff-Smith, Annette
2011-10-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by overaggressive synaptic pruning and identifying the mechanisms involved. We used a novel population-modeling technique to investigate developmental deficits, in which both neurocomputational parameters and the learning environment were varied across a large number of simulated individuals. Regression was generated by the atypical setting of a single pruning-related parameter. We observed a probabilistic relationship between the atypical pruning parameter and the presence of regression, as well as variability in the onset, severity, behavioral specificity, and recovery from regression. Other neurocomputational parameters that varied across the population modulated the risk that an individual would show regression. We considered a further hypothesis that behavioral regression may index an underlying anomaly characterizing the broader autism phenotype. If this is the case, we show how the model also accounts for several additional findings: shared gene variants between autism and language impairment (Vernes et al., 2008); larger brain size in autism but only in early development (Redcay & Courchesne, 2005); and the possibility of quasi-autism, caused by extreme environmental deprivation (Rutter et al., 1999). We make a novel prediction that the earliest developmental symptoms in the emergence of autism should be sensory and motor rather than social and review empirical data offering preliminary support for this prediction.
Nagel-Alne, G E; Krontveit, R; Bohlin, J; Valle, P S; Skjerve, E; Sølverød, L S
2014-07-01
In 2001, the Norwegian Goat Health Service initiated the Healthier Goats program (HG), with the aim of eradicating caprine arthritis encephalitis, caseous lymphadenitis, and Johne's disease (caprine paratuberculosis) in Norwegian goat herds. The aim of the present study was to explore how control and eradication of the above-mentioned diseases by enrolling in HG affected milk yield by comparison with herds not enrolled in HG. Lactation curves were modeled using a multilevel cubic spline regression model where farm, goat, and lactation were included as random effect parameters. The data material contained 135,446 registrations of daily milk yield from 28,829 lactations in 43 herds. The multilevel cubic spline regression model was applied to 4 categories of data: enrolled early, control early, enrolled late, and control late. For enrolled herds, the early and late notations refer to the situation before and after enrolling in HG; for nonenrolled herds (controls), they refer to development over time, independent of HG. Total milk yield increased in the enrolled herds after eradication: the total milk yields in the fourth lactation were 634.2 and 873.3 kg in enrolled early and enrolled late herds, respectively, and 613.2 and 701.4 kg in the control early and control late herds, respectively. Day of peak yield differed between enrolled and control herds. The day of peak yield came on d 6 of lactation for the control early category for parities 2, 3, and 4, indicating an inability of the goats to further increase their milk yield from the initial level. For enrolled herds, on the other hand, peak yield came between d 49 and 56, indicating a gradual increase in milk yield after kidding. Our results indicate that enrollment in the HG disease eradication program improved the milk yield of dairy goats considerably, and that the multilevel cubic spline regression was a suitable model for exploring effects of disease control and eradication on milk yield.
Random regression models using different functions to model milk flow in dairy cows.
Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G
2014-01-01
We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milking by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using a single-trait Random Regression Models that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and linear and quadratic effects of cow age at calving were included as fixed effects. Fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. The heritability in milk flow estimated by the most parsimonious model was of moderate to high magnitude.
Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali
2014-05-01
Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting the dissolved oxygen levels. Initial features were selected using nonlinear approach. Nonlinearity in the data was tested using BDS statistics, which revealed the data with nonlinear structure. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using the cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high dimensional feature space using the kernel function. Performance of all the kernel-based modeling methods used here were comparable both in terms of predictive and generalization abilities. Values of the performance criteria parameters suggested for the adequacy of the constructed models to fit the nonlinear data and their good predictive capabilities. PMID:24338099
Aguilera, Inmaculada; Foraster, Maria; Basagaña, Xavier; Corradi, Elisabetta; Deltell, Alexandre; Morelli, Xavier; Phuleria, Harish C; Ragettli, Martina S; Rivera, Marcela; Thomasson, Alexandre; Slama, Rémy; Künzli, Nino
2015-01-01
Noise prediction models and noise maps are used to estimate the exposure to road traffic noise, but their availability and the quality of the noise estimates is sometimes limited. This paper explores the application of land use regression (LUR) modelling to assess the long-term intraurban spatial variability of road traffic noise in three European cities. Short-term measurements of road traffic noise taken in Basel, Switzerland (n=60), Girona, Spain (n=40), and Grenoble, France (n=41), were used to develop two LUR models: (a) a "GIS-only" model, which considered only predictor variables derived with Geographic Information Systems; and (b) a "Best" model, which in addition considered the variables collected while visiting the measurement sites. Both noise measurements and noise estimates from LUR models were compared with noise estimates from standard noise models developed for each city by the local authorities. Model performance (adjusted R(2)) was 0.66-0.87 for "GIS-only" models, and 0.70-0.89 for "Best" models. Short-term noise measurements showed a high correlation (r=0.62-0.78) with noise estimates from the standard noise models. LUR noise estimates did not show any systematic differences in the spatial patterns when compared with those from standard noise models. LUR modelling with accurate GIS source data can be a promising tool for noise exposure assessment with applications in epidemiological studies.
Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Volden, Thomas R.
2012-01-01
An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert M.
2013-01-01
A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new search algorithm.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis
NASA Astrophysics Data System (ADS)
Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia
2015-03-01
The citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method and the magneto-optical retardance property is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in ferrofluids are executed by combining Taguchi method and Excel. From the nine tests for four parameters, including pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature, influence sequence and excellent program are found. Multiple regression analysis and F-test on the significance of regression equation are performed. It is found that the model F value is much larger than Fcritical and significance level P <0.0001. So it can be concluded that the regression model has statistically significant predictive ability. Substituting excellent program into equation, retardance is obtained as 32.703°, higher than the highest value in tests by 11.4%.
Towards accurate observation and modelling of Antarctic glacial isostatic adjustment
NASA Astrophysics Data System (ADS)
King, M.
2012-04-01
The response of the solid Earth to glacial mass changes, known as glacial isostatic adjustment (GIA), has received renewed attention in the recent decade thanks to the Gravity Recovery and Climate Experiment (GRACE) satellite mission. GRACE measures Earth's gravity field every 30 days, but cannot partition surface mass changes, such as present-day cryospheric or hydrological change, from changes within the solid Earth, notably due to GIA. If GIA cannot be accurately modelled in a particular region the accuracy of GRACE estimates of ice mass balance for that region is compromised. This lecture will focus on Antarctica, where models of GIA are hugely uncertain due to weak constraints on ice loading history and Earth structure. Over the last years, however, there has been a step-change in our ability to measure GIA uplift with the Global Positioning System (GPS), including widespread deployments of permanent GPS receivers as part of the International Polar Year (IPY) POLENET project. I will particularly focus on the Antarctic GPS velocity field and the confounding effect of elastic rebound due to present-day ice mass changes, and then describe the construction and calibration of a new Antarctic GIA model for application to GRACE data, as well as highlighting areas where further critical developments are required.
Regression models based on new local strategies for near infrared spectroscopic data.
Allegrini, F; Fernández Pierna, J A; Fragoso, W D; Olivieri, A C; Baeten, V; Dardenne, P
2016-08-24
In this work, a comparative study of two novel algorithms to perform sample selection in local regression based on Partial Least Squares Regression (PLS) is presented. These methodologies were applied for Near Infrared Spectroscopy (NIRS) quantification of five major constituents in corn seeds and are compared and contrasted with global PLS calibrations. Validation results show a significant improvement in the prediction quality when local models implemented by the proposed algorithms are applied to large data bases. PMID:27496996
Disaster Hits Home: A Model of Displaced Family Adjustment after Hurricane Katrina
ERIC Educational Resources Information Center
Peek, Lori; Morrissey, Bridget; Marlatt, Holly
2011-01-01
The authors explored individual and family adjustment processes among parents (n = 30) and children (n = 55) who were displaced to Colorado after Hurricane Katrina. Drawing on in-depth interviews with 23 families, this article offers an inductive model of displaced family adjustment. Four stages of family adjustment are presented in the model: (a)…
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
ERIC Educational Resources Information Center
Tay, Louis; Drasgow, Fritz
2012-01-01
Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X[superscript 2]/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…
Guo, Chao-Yu; Chen, Yu-Jing; Chen, Yi-Hau
2014-07-01
One of the greatest challenges in genetic studies is the determination of gene-environment interactions due to underlying complications and inadequate statistical power. With the increased sample size gained by using case-parent trios and unrelated cases and controls, the performance may be much improved. Focusing on a dichotomous trait, a two-stage approach was previously proposed to deal with gene-environment interaction when utilizing mixed study samples. Theoretically, the two-stage association analysis uses likelihood functions such that the computational algorithms may not converge in the maximum likelihood estimation with small study samples. In an effort to avoid such convergence issues, we propose a logistic regression framework model, based on the combined haplotype relative risk (CHRR) method, which intuitively pools the case-parent trios and unrelated subjects in a two by two table. A positive feature of the logistic regression model is the effortless adjustment for either discrete or continuous covariates. According to computer simulations, under the circumstances in which the two-stage test converges in larger sample sizes, we discovered that the performances of the two tests were quite similar; the two-stage test is more powerful under the dominant and additive disease models, but the extended CHRR is more powerful under the recessive disease model. PMID:24766627
Preisser, J. S.; Phillips, C.; Perin, J.; Schwartz, T. A.
2011-01-01
Objectives The article reviews proportional and partial proportional odds regression for ordered categorical outcomes, such as patient-reported measures, that are frequently used in clinical research in dentistry. Methods The proportional odds regression model for ordinal data is a generalization of ordinary logistic regression for dichotomous responses. When the proportional odds assumption holds for some but not all of the covariates, the lesser known partial proportional odds model is shown to provide a useful extension. Results The ordinal data models are illustrated for the analysis of repeated ordinal outcomes to determine whether the burden associated with sensory alteration following a bilateral sagittal split osteotomy procedure differed for those patients who were given opening exercises only following surgery and those who received sensory retraining exercises in conjunction with standard opening exercises. Conclusions Proportional and partial proportional odds models are broadly applicable to the analysis of cross-sectional and longitudinal ordinal data in dental research. PMID:21070317
Regression model estimation of early season crop proportions: North Dakota, some preliminary results
NASA Technical Reports Server (NTRS)
Lin, K. K. (Principal Investigator)
1982-01-01
To estimate crop proportions early in the season, an approach is proposed based on: use of a regression-based prediction equation to obtain an a priori estimate for specific major crop groups; modification of this estimate using current-year LANDSAT and weather data; and a breakdown of the major crop groups into specific crops by regression models. Results from the development and evaluation of appropriate regression models for the first portion of the proposed approach are presented. The results show that the model predicts 1980 crop proportions very well at both county and crop reporting district levels. In terms of planted acreage, the model underpredicted 9.1 percent of the 1980 published data on planted acreage at the county level. It predicted almost exactly the 1980 published data on planted acreage at the crop reporting district level and overpredicted the planted acreage by just 0.92 percent.
Nonlinear logistic regression model for outcomes after endourologic procedures: a novel predictor.
Kadlec, Adam O; Ohlander, Samuel; Hotaling, James; Hannick, Jessica; Niederberger, Craig; Turk, Thomas M
2014-08-01
The purpose of this study was to design a thorough and practical nonlinear logistic regression model that can be used for outcome prediction after various forms of endourologic intervention. Input variables and outcome data from 382 renal units endourologically treated at a single institution were used to build and cross-validate an independently designed nonlinear logistic regression model. Model outcomes were stone-free status and need for a secondary procedure. The model predicted stone-free status with sensitivity 75.3% and specificity 60.4%, yielding a positive predictive value (PPV) of 75.3% and negative predictive value (NPV) of 60.4%, with classification accuracy of 69.6%. Receiver operating characteristic area under the curve (ROC AUC) was 0.749. The model predicted the need for a secondary procedure with sensitivity 30% and specificity 98.3%, yielding a PPV of 60% and NPV of 94.2%. ROC AUC was 0.863. The model had equivalent predictive value to a traditional logistic regression model for the secondary procedure outcome. This study is proof-of-concept that a nonlinear regression model adequately predicts key clinical outcomes after shockwave lithotripsy, ureteroscopic lithotripsy, and percutaneous nephrolithotomy. This model holds promise for further optimization via dataset expansion, preferably with multi-institutional data, and could be developed into a predictive nomogram in the future.
Alley, William M.
1986-01-01
Problems involving the combined use of contaminant transport models and nonlinear optimization schemes can be very expensive to solve. This paper explores the use of transport models with ordinary regression and regression on ranks to develop approximate response functions of concentrations at critical locations as a function of pumping and recharge at decision wells. These response functions combined with other constraints can often be solved very easily and may suggest reasonable starting points for combined simulation-management modeling or even relatively efficient operating schemes in themselves.
Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis
NASA Technical Reports Server (NTRS)
Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)
1979-01-01
The advantages and disadvantages of the casual (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling for grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first generation wheat-yield model is given. Topics discussed include truncation, trend variable, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.
Adjusting the Census of 1990: The Smoothing Model.
ERIC Educational Resources Information Center
Freedman, David A.; And Others
1993-01-01
Techniques for adjusting census figures are discussed, with a focus on sampling error, uncertainty of estimates resulting from the luck of sample choice. Computer simulations illustrate the ways in which the smoothing algorithm may make adjustments less, rather than more, accurate. (SLD)
Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.
2015-01-01
In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
ERIC Educational Resources Information Center
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
ERIC Educational Resources Information Center
Echternacht, Gary; Swinton, Spencer
Title I evaluations using the RMC Model C design depend for their interpretation on the assumption that the regression of posttest on pretest is linear across the cut score level when there is no treatment; but there are many instances where nonlinearities may occur. If one applies the analysis of covariance, or model C analysis, large errors may…
An Additional Measure of Overall Effect Size for Logistic Regression Models
ERIC Educational Resources Information Center
Allen, Jeff; Le, Huy
2008-01-01
Users of logistic regression models often need to describe the overall predictive strength, or effect size, of the model's predictors. Analogs of R[superscript 2] have been developed, but none of these measures are interpretable on the same scale as effects of individual predictors. Furthermore, R[superscript 2] analogs are not invariant to the…
ERIC Educational Resources Information Center
Carney, Michelle Mohr; Buttell, Frederick P.; Muldoon, John
2006-01-01
The purpose of this study was to (1) create a predictive model that would correctly identify men at greatest risk of dropping out of a court-mandated, batterer intervention program; and, (2) explore the creation of such a logistic regression model using a set of instruments that were different from those used in previous research. Method: The…
Comparing Predictors in Multivariate Regression Models: An Extension of Dominance Analysis
ERIC Educational Resources Information Center
Azen, Razia; Budescu, David V.
2006-01-01
Dominance analysis (DA) is a method used to compare the relative importance of predictors in multiple regression. DA determines the dominance of one predictor over another by comparing their additional R[squared] contributions across all subset models. In this article DA is extended to multivariate models by identifying a minimal set of criteria…
Linard, Joshua I.
2013-01-01
Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
The role of a murine transplantation model of atherosclerosis regression in drug discovery
Feig, Jonathan E; Quick, John S
2015-01-01
Atherosclerosis is the leading cause of death worldwide. To date, the use of statins to lower LDL levels has been the major intervention used to delay or halt disease progression. These drugs have an incomplete impact on plaque burden and risk, however, as evidenced by the substantial rates of myocardial infarctions that occur in large-scale clinical trials of statins. Thus, it is hoped that by understanding the factors that lead to plaque regression, better approaches to treating atherosclerosis may be developed. A transplantation-based mouse model of atherosclerosis regression has been developed by allowing plaques to form in a model of human atherosclerosis, the apoE-deficient mouse, and then placing these plaques into recipient mice with a normolipidemic plasma environment. Under these conditions, the depletion of foam cells occurs. Interestingly, the disappearance of foam cells was primarily due to migration in a CCR7-dependent manner to regional and systemic lymph nodes after 3 days in the normolipidemic (regression) environment. Further studies using this transplant model demonstrated that liver X receptor and HDL are other factors likely to be involved in plaque regression. In conclusion, through the use of this transplant model, the process of uncovering the pathways regulating atherosclerosis regression has begun, which will ultimately lead to the identification of new therapeutic targets. PMID:19333880
Arora, Amarpreet S; Reddy, Akepati S
2014-01-01
Stormwater management at urban sub-watershed level has been envisioned to include stormwater collection, treatment, and disposal of treated stormwater through groundwater recharging. Sizing, operation and control of the stormwater management systems require information on the quantities and characteristics of the stormwater generated. Stormwater characteristics depend upon dry spell between two successive rainfall events, intensity of rainfall and watershed characteristics. However, sampling and analysis of stormwater, spanning only few rainfall events, provides insufficient information on the characteristics. An attempt has been made in the present study to assess the stormwater characteristics through regression modeling. Stormwater of five sub-watersheds of Patiala city were sampled and analyzed. The results obtained were related with the antecedent dry periods and with the intensity of the rainfall event through regression modeling. Obtained regression models were used to assess the stormwater quality for various antecedent dry periods and rainfall event intensities.
Larson, S.J.; Gilliom, R.J.
2001-01-01
Regression models were developed for estimating stream concentrations of the herbicides alachlor, atrazine, cyanazine, metolachlor, and trifluralin from use-intensity data and watershed characteristics. Concentrations were determined from samples collected from 45 streams throughout the United States during 1993 to 1995 as part of the U.S. Geological Survey's National Water-Quality Assessment (NAWQA). Separate regression models were developed for each of six percentiles (10th, 25th, 50th, 75th, 90th, 95th) of the annual distribution of stream concentrations and for the annual time-weighted mean concentration. Estimates for the individual percentiles can be combined to provide an estimate of the annual distribution of concentrations for a given stream. Agricultural use of the herbicide in the watershed was a significant predictor in nearly all of the models. Several hydrologic and soil parameters also were useful in explaining the variability in concentrations of herbicides among the streams. Most of the regression models developed for estimation of concentration percentiles and annual mean concentrations accounted for 50 percent to 90 percent of the variability among streams. Predicted concentrations were nearly always within an order of magnitude of the measured concentrations for the model-development streams, and predicted concentration distributions reasonably matched the actual distributions in most cases. Results from application of the models to streams not included in the model development data set are encouraging, but further validation of the regression approach described in this paper is needed.
Austin, Peter C; Steyerberg, Ewout W
2014-02-10
Predicting the probability of the occurrence of a binary outcome or condition is important in biomedical research. While assessing discrimination is an essential issue in developing and validating binary prediction models, less attention has been paid to methods for assessing model calibration. Calibration refers to the degree of agreement between observed and predicted probabilities and is often assessed by testing for lack-of-fit. The objective of our study was to examine the ability of graphical methods to assess the calibration of logistic regression models. We examined lack of internal calibration, which was related to misspecification of the logistic regression model, and external calibration, which was related to an overfit model or to shrinkage of the linear predictor. We conducted an extensive set of Monte Carlo simulations with a locally weighted least squares regression smoother (i.e., the loess algorithm) to examine the ability of graphical methods to assess model calibration. We found that loess-based methods were able to provide evidence of moderate departures from linearity and indicate omission of a moderately strong interaction. Misspecification of the link function was harder to detect. Visual patterns were clearer with higher sample sizes, higher incidence of the outcome, or higher discrimination. Loess-based methods were also able to identify the lack of calibration in external validation samples when an overfit regression model had been used. In conclusion, loess-based smoothing methods are adequate tools to graphically assess calibration and merit wider application.
A General Semiparametric Hazards Regression Model: Efficient Estimation and Structure Selection
Tong, Xingwei; Zhu, Liang; Leng, Chenlei; Leisenring, Wendy; Robison, Leslie L.
2014-01-01
We consider a general semiparametric hazards regression model that encompasses Cox’s proportional hazards model and the accelerated failure time model for survival analysis. To overcome the nonexistence of the maximum likelihood, we derive a kernel-smoothed profile likelihood function, and prove that the resulting estimates of the regression parameters are consistent and achieve semiparametric efficiency. In addition, we develop penalized structure selection techniques to determine which covariates constitute the accelerate failure time model and which covariates constitute the proportional hazards model. The proposed method is able to estimate the model structure consistently and model parameters efficiently. Furthermore, variance estimation is straightforward. The proposed estimation performs well in simulation studies and is applied to the analysis of a real data set. Copyright PMID:23824784
Nagai, Mika; Konno, Yoshihiro; Satsukawa, Masahiro; Yamashita, Shinji; Yoshinari, Kouichi
2016-08-01
Drug-drug interactions (DDIs) via cytochrome P450 (P450) induction are one clinical problem leading to increased risk of adverse effects and the need for dosage adjustments and additional therapeutic monitoring. In silico models for predicting P450 induction are useful for avoiding DDI risk. In this study, we have established regression models for CYP3A4 and CYP2B6 induction in human hepatocytes using several physicochemical parameters for a set of azole compounds with different P450 induction as characteristics as model compounds. To obtain a well-correlated regression model, the compounds for CYP3A4 or CYP2B6 induction were independently selected from the tested azole compounds using principal component analysis with fold-induction data. Both of the multiple linear regression models obtained for CYP3A4 and CYP2B6 induction are represented by different sets of physicochemical parameters. The adjusted coefficients of determination for these models were of 0.8 and 0.9, respectively. The fold-induction of the validation compounds, another set of 12 azole-containing compounds, were predicted within twofold limits for both CYP3A4 and CYP2B6. The concordance for the prediction of CYP3A4 induction was 87% with another validation set, 23 marketed drugs. However, the prediction of CYP2B6 induction tended to be overestimated for these marketed drugs. The regression models show that lipophilicity mostly contributes to CYP3A4 induction, whereas not only the lipophilicity but also the molecular polarity is important for CYP2B6 induction. Our regression models, especially that for CYP3A4 induction, might provide useful methods to avoid potent CYP3A4 or CYP2B6 inducers during the lead optimization stage without performing induction assays in human hepatocytes.
Comparing regression methods for the two-stage clonal expansion model of carcinogenesis.
Kaiser, J C; Heidenreich, W F
2004-11-15
In the statistical analysis of cohort data with risk estimation models, both Poisson and individual likelihood regressions are widely used methods of parameter estimation. In this paper, their performance has been tested with the biologically motivated two-stage clonal expansion (TSCE) model of carcinogenesis. To exclude inevitable uncertainties of existing data, cohorts with simple individual exposure history have been created by Monte Carlo simulation. To generate some similar properties of atomic bomb survivors and radon-exposed mine workers, both acute and protracted exposure patterns have been generated. Then the capacity of the two regression methods has been compared to retrieve a priori known model parameters from the simulated cohort data. For simple models with smooth hazard functions, the parameter estimates from both methods come close to their true values. However, for models with strongly discontinuous functions which are generated by the cell mutation process of transformation, the Poisson regression method fails to produce reliable estimates. This behaviour is explained by the construction of class averages during data stratification. Thereby, some indispensable information on the individual exposure history was destroyed. It could not be repaired by countermeasures such as the refinement of Poisson classes or a more adequate choice of Poisson groups. Although this choice might still exist we were unable to discover it. In contrast to this, the individual likelihood regression technique was found to work reliably for all considered versions of the TSCE model. PMID:15490436
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred
2013-01-01
A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.
NASA Astrophysics Data System (ADS)
Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat
2015-04-01
Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue considered as a great and international concern that primary attributed from different fossil fuels. In this paper, regression model is used for analyzing the causal relationship among CO2 emissions based on the energy consumption in Malaysia using time series data for the period of 1980-2010. The equations were developed using regression model based on the eight major sources that contribute to the CO2 emissions such as non energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The related data partly used for predict the regression model (1980-2000) and partly used for validate the regression model (2001-2010). The results of the prediction model with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.
Liang, Hua
2008-01-01
Differential equation (DE) models are widely used in many scientific fields that include engineering, physics and biomedical sciences. The so-called “forward problem”, the problem of simulations and predictions of state variables for given parameter values in the DE models, has been extensively studied by mathematicians, physicists, engineers and other scientists. However, the “inverse problem”, the problem of parameter estimation based on the measurements of output variables, has not been well explored using modern statistical methods, although some least squares-based approaches have been proposed and studied. In this paper, we propose parameter estimation methods for ordinary differential equation models (ODE) based on the local smoothing approach and a pseudo-least squares (PsLS) principle under a framework of measurement error in regression models. The asymptotic properties of the proposed PsLS estimator are established. We also compare the PsLS method to the corresponding SIMEX method and evaluate their finite sample performances via simulation studies. We illustrate the proposed approach using an application example from an HIV dynamic study. PMID:19956350
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.
Accounting for spatial effects in land use regression for urban air pollution modeling.
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
2015-01-01
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models.
Accounting for spatial effects in land use regression for urban air pollution modeling.
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
2015-01-01
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. PMID:26530819
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
Regression based modeling of vegetation and climate variables for the Amazon rainforests
NASA Astrophysics Data System (ADS)
Kodali, A.; Khandelwal, A.; Ganguly, S.; Bongard, J.; Das, K.
2015-12-01
Both short-term (weather) and long-term (climate) variations in the atmosphere directly impact various ecosystems on earth. Forest ecosystems, especially tropical forests, are crucial as they are the largest reserves of terrestrial carbon sink. For example, the Amazon forests are a critical component of global carbon cycle storing about 100 billion tons of carbon in its woody biomass. There is a growing concern that these forests could succumb to precipitation reduction in a progressively warming climate, leading to release of significant amount of carbon in the atmosphere. Therefore, there is a need to accurately quantify the dependence of vegetation growth on different climate variables and obtain better estimates of drought-induced changes to atmospheric CO2. The availability of globally consistent climate and earth observation datasets have allowed global scale monitoring of various climate and vegetation variables such as precipitation, radiation, surface greenness, etc. Using these diverse datasets, we aim to quantify the magnitude and extent of ecosystem exposure, sensitivity and resilience to droughts in forests. The Amazon rainforests have undergone severe droughts twice in last decade (2005 and 2010), which makes them an ideal candidate for the regional scale analysis. Current studies on vegetation and climate relationships have mostly explored linear dependence due to computational and domain knowledge constraints. We explore a modeling technique called symbolic regression based on evolutionary computation that allows discovery of the dependency structure without any prior assumptions. In symbolic regression the population of possible solutions is defined via trees structures. Each tree represents a mathematical expression that includes pre-defined functions (mathematical operators) and terminal sets (independent variables from data). Selection of these sets is critical to computational efficiency and model accuracy. In this work we investigate
Significance tests to determine the direction of effects in linear regression models.
Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander
2015-02-01
Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. PMID:24620829
NASA Astrophysics Data System (ADS)
Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho
2008-07-01
SummaryFactorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between response variable NO3- and explanatory variables Ca 2+, Cl -, SO42-, depth to water, aquifer media and land use. Substituting Cl - by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area, when horticulture is replaced by other land use with much lower fertilization and irrigation rates.
NASA Astrophysics Data System (ADS)
Hill, D.; Bell, K. R. W.; McMillan, D.; Infield, D.
2014-05-01
The growth of wind power production in the electricity portfolio is striving to meet ambitious targets set, for example by the EU, to reduce greenhouse gas emissions by 20% by 2020. Huge investments are now being made in new offshore wind farms around UK coastal waters that will have a major impact on the GB electrical supply. Representations of the UK wind field in syntheses which capture the inherent structure and correlations between different locations including offshore sites are required. Here, Vector Auto-Regressive (VAR) models are presented and extended in a novel way to incorporate offshore time series from a pan-European meteorological model called COSMO, with onshore wind speeds from the MIDAS dataset provided by the British Atmospheric Data Centre. Forecasting ability onshore is shown to be improved with the inclusion of the offshore sites with improvements of up to 25% in RMS error at 6 h ahead. In addition, the VAR model is used to synthesise time series of wind at each offshore site, which are then used to estimate wind farm capacity factors at the sites in question. These are then compared with estimates of capacity factors derived from the work of Hawkins et al. (2011). A good degree of agreement is established indicating that this synthesis tool should be useful in power system impact studies.
León, Larry F; Cai, Tianxi
2012-04-01
In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reects model misspecification or natural variation. We illustrate the methods with a well known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power of detecting misspecification while at the same time controlling the size of the test.
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. PMID:26720396
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.
The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.
ERIC Educational Resources Information Center
Fanning, Fred; Newman, Isadore
Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regressions models were developed to test the statistical significance between slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…
ERIC Educational Resources Information Center
Richter, Tobias
2006-01-01
Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
An INAR(1) Negative Multinomial Regression Model for Longitudinal Count Data.
ERIC Educational Resources Information Center
Bockenholt, Ulf
1999-01-01
Discusses a regression model for the analysis of longitudinal count data in a panel study by adapting an integer-valued first-order autoregressive (INAR(1)) Poisson process to represent time-dependent correlation between counts. Derives a new negative multinomial distribution by combining INAR(1) representation with a random effects approach.…
Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
Not Quite Normal: Consequences of Violating the Assumption of Normality in Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Smith, Jessalyn; Fagan, Abigail A.; Jaki, Thomas; Feaster, Daniel J.; Masyn, Katherine; Hawkins, J. David; Howe, George
2012-01-01
Regression mixture models, which have only recently begun to be used in applied research, are a new approach for finding differential effects. This approach comes at the cost of the assumption that error terms are normally distributed within classes. This study uses Monte Carlo simulations to explore the effects of relatively minor violations of…
ERIC Educational Resources Information Center
Chen, Chau-Kuang; Hughes. John, Jr.
2004-01-01
The ordinal regression method was used to model the relationship between the ordinal outcome variable, e.g., different levels of student satisfaction regarding the overall college experience, and the explanatory variables concerning demographics and student learning environment in a predominantly minority health sciences center. The outcome…
The prediction of intelligence in preschool children using alternative models to regression.
Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E
2011-12-01
Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.
REGRESSION MODELS THAT RELATE STREAMS TO WATERSHEDS: COPING WITH NUMEROUS, COLLINEAR PEDICTORS
GIS efforts can produce a very large number of watershed variables (climate, land use/land cover and topography, all defined for multiple areas of influence) that could serve as candidate predictors in a regression model of reach-scale stream features. Invariably, many of these ...
ERIC Educational Resources Information Center
Cason, Gerald J.; Cason, Carolyn L.
A more familiar and efficient method for estimating the parameters of Cason and Cason's model was examined. Using a two-step analysis based on linear regression, rather than the direct search interative procedure, gave about equally good results while providing a 33 to 1 computer processing time advantage, across 14 cohorts of junior medical…
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
NASA Astrophysics Data System (ADS)
Gaeta, Alessandra; Cattani, Giorgio; Di Menno di Bucchianico, Alessandro; De Santis, Antonella; Cesaroni, Giulia; Badaloni, Chiara; Ancona, Carla; Forastiere, Francesco; Sozzi, Roberto; Bolignano, Andrea; Sacco, Fabrizio
2016-04-01
The aim of this study was to evaluate the small scale spatial variability of nitrogen dioxide (NO2) and selected VOCs (benzene, toluene, acrolein and formaldehyde) concentrations using Land Use Regression models (LURs) in a complex multi sources domain (64 km2), containing a mid-size airport: the Ciampino Airport, located in Ciampino, Rome, Italy. 46 diffusion tube samplers were deployed within a domain centred in the airport over two 2-weekly periods (June 2011-January 2012). GIS-derived predictor variables, with varying buffer size, were evaluated to model spatial variation of NO2, benzene, toluene, formaldehyde and acrolein annual average concentrations. The airport apportionment to air quality was investigated using a Lagrangian dispersion model (SPRAY). A stepwise selection procedure was used to develop the linear regression models. The models were validated using leave one out cross validation (LOOCV) method. In this study, the use of LURs was found to be effective to explain spatial variability of NO2 (adjusted-R2 = 0.72), benzene (adjusted-R2 = 0.53), toluene (adjusted-R2 = 0.50) and acrolein (adjusted-R2 = 0.51), while limited power was achieved with the formaldehyde modeling (adjusted-R2 = 0.24). For all pollutants LURs output showed that the small scale spatial variability was mainly explained by local traffic. The airport contribution to the observed spatial variability was adequately quantified only for acrolein (0.43 (±0.69) μg/m3 in an area of about 6 km2, SW located to the airport runway), while for NO2 and formaldehyde, only a little portion of the spatial variability in a limited portion of the study domain was attributable to airport related emissions.
Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data
Mi, Gu; Di, Yanming; Schafer, Daniel W.
2015-01-01
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models. PMID:25787144
Evaluation and application of regional turbidity-sediment regression models in Virginia
Hyer, Kenneth; Jastram, John D.; Moyer, Douglas; Webber, James; Chanat, Jeffrey G.
2015-01-01
Conventional thinking has long held that turbidity-sediment surrogate-regression equations are site specific and that regression equations developed at a single monitoring station should not be applied to another station; however, few studies have evaluated this issue in a rigorous manner. If robust regional turbidity-sediment models can be developed successfully, their applications could greatly expand the usage of these methods. Suspended sediment load estimation could occur as soon as flow and turbidity monitoring commence at a site, suspended sediment sampling frequencies for various projects potentially could be reduced, and special-project applications (sediment monitoring following dam removal, for example) could be significantly enhanced. The objective of this effort was to investigate the turbidity-suspended sediment concentration (SSC) relations at all available USGS monitoring sites within Virginia to determine whether meaningful turbidity-sediment regression models can be developed by combining the data from multiple monitoring stations into a single model, known as a “regional” model. Following the development of the regional model, additional objectives included a comparison of predicted SSCs between the regional model and commonly used site-specific models, as well as an evaluation of why specific monitoring stations did not fit the regional model.
Regressions by leaps and bounds and biased estimation techniques in yield modeling
NASA Technical Reports Server (NTRS)
Marquina, N. E. (Principal Investigator)
1979-01-01
The author has identified the following significant results. It was observed that OLS was not adequate as an estimation procedure when the independent or regressor variables were involved in multicollinearities. This was shown to cause the presence of small eigenvalues of the extended correlation matrix A'A. It was demonstrated that the biased estimation techniques and the all-possible subset regression could help in finding a suitable model for predicting yield. Latent root regression was an excellent tool that found how many predictive and nonpredictive multicollinearities there were.
Neural Network and Regression Soft Model Extended for PAX-300 Aircraft Engine
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Hopkins, Dale A.
2002-01-01
In fiscal year 2001, the neural network and regression capabilities of NASA Glenn Research Center's COMETBOARDS design optimization testbed were extended to generate approximate models for the PAX-300 aircraft engine. The analytical model of the engine is defined through nine variables: the fan efficiency factor, the low pressure of the compressor, the high pressure of the compressor, the high pressure of the turbine, the low pressure of the turbine, the operating pressure, and three critical temperatures (T(sub 4), T(sub vane), and T(sub metal)). Numerical Propulsion System Simulation (NPSS) calculations of the specific fuel consumption (TSFC), as a function of the variables can become time consuming, and numerical instabilities can occur during these design calculations. "Soft" models can alleviate both deficiencies. These approximate models are generated from a set of high-fidelity input-output pairs obtained from the NPSS code and a design of the experiment strategy. A neural network and a regression model with 45 weight factors were trained for the input/output pairs. Then, the trained models were validated through a comparison with the original NPSS code. Comparisons of TSFC versus the operating pressure and of TSFC versus the three temperatures (T(sub 4), T(sub vane), and T(sub metal)) are depicted in the figures. The overall performance was satisfactory for both the regression and the neural network model. The regression model required fewer calculations than the neural network model, and it produced marginally superior results. Training the approximate methods is time consuming. Once trained, the approximate methods generated the solution with only a trivial computational effort, reducing the solution time from hours to less than a minute.
Fatigue design of a cellular phone folder using regression model-based multi-objective optimization
NASA Astrophysics Data System (ADS)
Kim, Young Gyun; Lee, Jongsoo
2016-08-01
In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
A Multilevel Regression Model for Geographical Studies in Sets of Non-Adjacent Cities
Marí-Dell’Olmo, Marc; Martínez-Beneito, Miguel Ángel
2015-01-01
In recent years, small-area-based ecological regression analyses have been published that study the association between a health outcome and a covariate in several cities. These analyses have usually been performed independently for each city and have therefore yielded unrelated estimates for the cities considered, even though the same process has been studied in all of them. In this study, we propose a joint ecological regression model for multiple cities that accounts for spatial structure both within and between cities and explore the advantages of this model. The proposed model merges both disease mapping and geostatistical ideas. Our proposal is compared with two alternatives, one that models the association for each city as fixed effects and another that treats them as independent and identically distributed random effects. The proposed model allows us to estimate the association (and assess its significance) at locations with no available data. Our proposal is illustrated by an example of the association between unemployment (as a deprivation surrogate) and lung cancer mortality among men in 31 Spanish cities. In this example, the associations found were far more accurate for the proposed model than those from the fixed effects model. Our main conclusion is that ecological regression analyses can be markedly improved by performing joint analyses at several locations that share information among them. This finding should be taken into consideration in the design of future epidemiological studies. PMID:26308613
NASA Astrophysics Data System (ADS)
Urrutia, J. D.; Bautista, L. A.; Baccay, E. B.
2014-04-01
The aim of this study was to develop mathematical models for estimating earthquake casualties such as death, number of injured persons, affected families and total cost of damage. To quantify the direct damages from earthquakes to human beings and properties given the magnitude, intensity, depth of focus, location of epicentre and time duration, the regression models were made. The researchers formulated models through regression analysis using matrices and used α = 0.01. The study considered thirty destructive earthquakes that hit the Philippines from the inclusive years 1968 to 2012. Relevant data about these said earthquakes were obtained from Philippine Institute of Volcanology and Seismology. Data on damages and casualties were gathered from the records of National Disaster Risk Reduction and Management Council. The mathematical models made are as follows: This study will be of great value in emergency planning, initiating and updating programs for earthquake hazard reductionin the Philippines, which is an earthquake-prone country.
Age estimation based on pelvic ossification using regression models from conventional radiography.
Zhang, Kui; Dong, Xiao-Ai; Fan, Fei; Deng, Zhen-Hua
2016-07-01
To establish regression models for age estimation from the combination of the ossification of iliac crest and ischial tuberosity. One thousand three hundred and seventy-nine conventional pelvic radiographs at the West China Hospital of Sichuan University between January 2010 and June 2012 were evaluated retrospectively. The receiver operating characteristic analysis was performed to measure the value of estimation of 18 years of age with the classification scheme for the iliac crest and ischial tuberosity. Regression analysis was performed, and formulas for calculating approximate chronological age according to the combination developmental status of the ossification for the iliac crest and ischial tuberosity were developed. The areas under the receiver operating characteristic (ROC) curves were above 0.9 (p < 0.001), indicating a good prediction of the grading systems, and the cubic regression model was found to have the highest R-square value (R (2) = 0.744 for female and R (2) = 0.753 for male). The present classification scheme for apophyseal iliac crest ossification and the ischial tuberosity may be used for age estimation. And the present established cubic regression model according to the combination developmental status of the ossification for the iliac crest and ischial tuberosity can be used for age estimation. PMID:27169673
Application of Dynamic Grey-Linear Auto-regressive Model in Time Scale Calculation
NASA Astrophysics Data System (ADS)
Yuan, H. T.; Don, S. W.
2009-01-01
Because of the influence of different noise and the other factors, the running of an atomic clock is very complex. In order to forecast the velocity of an atomic clock accurately, it is necessary to study and design a model to calculate its velocity in the near future. By using the velocity, the clock could be used in the calculation of local atomic time and the steering of local universal time. In this paper, a new forecast model called dynamic grey-liner auto-regressive model is studied, and the precision of the new model is given. By the real data of National Time Service Center, the new model is tested.
ERIC Educational Resources Information Center
Fong, Duncan K. H.; Ebbes, Peter; DeSarbo, Wayne S.
2012-01-01
Multiple regression is frequently used across the various social sciences to analyze cross-sectional data. However, it can often times be challenging to justify the assumption of common regression coefficients across all respondents. This manuscript presents a heterogeneous Bayesian regression model that enables the estimation of…
Multiple Regression (MR) and Artificial Neural Network (ANN) models for prediction of soil suction
NASA Astrophysics Data System (ADS)
Erzin, Yusuf; Yilmaz, Isik
2010-05-01
This article presents a comparison of multiple regression (MR) and artificial neural network (ANN) model for prediction of soil suction of clayey soils. The results of the soil suction tests utilizing thermocouple psychrometers on statically compacted specimens of Bentonite-Kaolinite clay mixtures with varying soil properties were used to develope the models. The results obtained from both models were then compared with the experimental results. The performance indices such as coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and variance account for (VAF) were used to control the performance of the prediction capacity of the models developed in this study. ANN model has shown higher prediction performance than regression model according to the performance indices. It is shown that ANN models provide significant improvements in prediction accuracy over statistical models. The potential benefits of soft computing models extend beyond the high computation rates. Higher performances of the soft computing models were sourced from greater degree of robustness and fault tolerance than traditional statistical models because there are many more processing neurons, each with primarily local connections. It appears that there is a possibility of estimating soil suction by using the proposed empirical relationships and soft computing models. The population of the analyzed data is relatively limited in this study. Therefore, the practical outcome of the proposed equations and models could be used, with acceptable accuracy.
Positive Psychology in the Personal Adjustment Course: A Salutogenic Model.
ERIC Educational Resources Information Center
Hymel, Glenn M.; Etherton, Joseph L.
This paper proposes embedding various positive psychology themes in the context of an undergraduate course on the psychology of personal adjustment. The specific positive psychology constructs considered include those of hope, optimism, perseverance, humility, forgiveness, and spirituality. These themes are related to appropriate course content…
Bellavia, Andrea; Discacciati, Andrea; Bottai, Matteo; Wolk, Alicja; Orsini, Nicola
2015-08-01
Increasingly often in epidemiologic research, associations between survival time and predictors of interest are measured by differences between distribution functions rather than hazard functions. For example, differences in percentiles of survival time, expressed in absolute time units (e.g., weeks), may complement the popular risk ratios, which are unitless measures. When analyzing time to an event of interest (e.g., death) in prospective cohort studies, the time scale can be set to start at birth or at study entry. The advantages of one time origin over the other have been thoroughly explored for the estimation of risks but not for the estimation of survival percentiles. In this paper, we analyze the use of different time scales in the estimation of survival percentiles with Laplace regression. Using this regression method, investigators can estimate percentiles of survival time over levels of an exposure of interest while adjusting for potential confounders. Our findings may help to improve modeling strategies and ease interpretation in the estimation of survival percentiles in prospective cohort studies.
Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B
2016-09-01
Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data have become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h(2) = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that selection for body weight at all ages can be used as a selection criteria. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that
Cross-validation pitfalls when selecting and assessing regression and classification models
2014-01-01
Background We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. Methods We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. Results We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. Conclusions We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error. PMID:24678909
Chen, Liang-Hsuan; Hsueh, Chan-Ching
2007-06-01
Fuzzy regression models are useful to investigate the relationship between explanatory and response variables with fuzzy observations. Different from previous studies, this correspondence proposes a mathematical programming method to construct a fuzzy regression model based on a distance criterion. The objective of the mathematical programming is to minimize the sum of distances between the estimated and observed responses on the X axis, such that the fuzzy regression model constructed has the minimal total estimation error in distance. Only several alpha-cuts of fuzzy observations are needed as inputs to the mathematical programming model; therefore, the applications are not restricted to triangular fuzzy numbers. Three examples, adopted in the previous studies, and a larger example, modified from the crisp case, are used to illustrate the performance of the proposed approach. The results indicate that the proposed model has better performance than those in the previous studies based on either distance criterion or Kim and Bishu's criterion. In addition, the efficiency and effectiveness for solving the larger example by the proposed model are also satisfactory.
Modeling data for pancreatitis in presence of a duodenal diverticula using logistic regression
NASA Astrophysics Data System (ADS)
Dineva, S.; Prodanova, K.; Mlachkova, D.
2013-12-01
The presence of a periampullary duodenal diverticulum (PDD) is often observed during upper digestive tract barium meal studies and endoscopic retrograde cholangiopancreatography (ERCP). A few papers reported that the diverticulum had something to do with the incidence of pancreatitis. The aim of this study is to investigate if the presence of duodenal diverticula predisposes to the development of a pancreatic disease. A total 3966 patients who had undergone ERCP were studied retrospectively. They were divided into 2 groups-with and without PDD. Patients with a duodenal diverticula had a higher rate of acute pancreatitis. The duodenal diverticula is a risk factor for acute idiopathic pancreatitis. A multiple logistic regression to obtain adjusted estimate of odds and to identify if a PDD is a predictor of acute or chronic pancreatitis was performed. The software package STATISTICA 10.0 was used for analyzing the real data.
2011-01-01
Background Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. Conclusions HC
Model-wise and point-wise random sample consensus for robust regression and outlier detection.
El-Melegy, Moumen T
2014-11-01
Popular regression techniques often suffer at the presence of data outliers. Most previous efforts to solve this problem have focused on using an estimation algorithm that minimizes a robust M-estimator based error criterion instead of the usual non-robust mean squared error. However the robustness gained from M-estimators is still low. This paper addresses robust regression and outlier detection in a random sample consensus (RANSAC) framework. It studies the classical RANSAC framework and highlights its model-wise nature for processing the data. Furthermore, it introduces for the first time a point-wise strategy of RANSAC. New estimation algorithms are developed following both the model-wise and point-wise RANSAC concepts. The proposed algorithms' theoretical robustness and breakdown points are investigated in a novel probabilistic setting. While the proposed concepts and algorithms are generic and general enough to adopt many regression machineries, the paper focuses on multilayered feed-forward neural networks in solving regression problems. The algorithms are evaluated on synthetic and real data, contaminated with high degrees of outliers, and compared to existing neural network training algorithms. Furthermore, to improve the time performance, parallel implementations of the two algorithms are developed and assessed to utilize the multiple CPU cores available on nowadays computers. PMID:25047916
A regression-kriging model for estimation of rainfall in the Laohahe basin
NASA Astrophysics Data System (ADS)
Wang, Hong; Ren, Li L.; Liu, Gao H.
2009-10-01
This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors of latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges from the Laohahe basis in northeast China during 1986-2005 . Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis with three Principal Components (PCs) extracted. The rainfall data were then fitted using step-wise regression and residuals interpolated using SK. The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). For correlated topographic factors are taken into account, RK improves the efficiency of predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no stations nearby and where topography has a major influence on rainfall.
Jackman, Patrick; Sun, Da-Wen; Elmasry, Gamal
2012-08-01
A new algorithm for the conversion of device dependent RGB colour data into device independent L*a*b* colour data without introducing noticeable error has been developed. By combining a linear colour space transform and advanced multiple regression methodologies it was possible to predict L*a*b* colour data with less than 2.2 colour units of error (CIE 1976). By transforming the red, green and blue colour components into new variables that better reflect the structure of the L*a*b* colour space, a low colour calibration error was immediately achieved (ΔE(CAL) = 14.1). Application of a range of regression models on the data further reduced the colour calibration error substantially (multilinear regression ΔE(CAL) = 5.4; response surface ΔE(CAL) = 2.9; PLSR ΔE(CAL) = 2.6; LASSO regression ΔE(CAL) = 2.1). Only the PLSR models deteriorated substantially under cross validation. The algorithm is adaptable and can be easily recalibrated to any working computer vision system. The algorithm was tested on a typical working laboratory computer vision system and delivered only a very marginal loss of colour information ΔE(CAL) = 2.35. Colour features derived on this system were able to safely discriminate between three classes of ham with 100% correct classification whereas colour features measured on a conventional colourimeter were not.
NASA Astrophysics Data System (ADS)
Asavaskulkiet, Krissada
2014-01-01
This paper proposes a novel face super-resolution reconstruction (hallucination) technique for YCbCr color space. The underlying idea is to learn with an error regression model and multi-linear principal component analysis (MPCA). From hallucination framework, many color face images are explained in YCbCr space. To reduce the time complexity of color face hallucination, we can be naturally described the color face imaged as tensors or multi-linear arrays. In addition, the error regression analysis is used to find the error estimation which can be obtained from the existing LR in tensor space. In learning process is from the mistakes in reconstruct face images of the training dataset by MPCA, then finding the relationship between input and error by regression analysis. In hallucinating process uses normal method by backprojection of MPCA, after that the result is corrected with the error estimation. In this contribution we show that our hallucination technique can be suitable for color face images both in RGB and YCbCr space. By using the MPCA subspace with error regression model, we can generate photorealistic color face images. Our approach is demonstrated by extensive experiments with high-quality hallucinated color faces. Comparison with existing algorithms shows the effectiveness of the proposed method.
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-01-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection etc.) as the traditional frequentist Logistic Regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. PMID:23562651
NASA Astrophysics Data System (ADS)
Takeda, Katsunori; Hattori, Tetsuo; Kawano, Hiromichi
In real time analysis and forecasting of time series data, it is important to detect the structural change as immediately, correctly, and simply as possible. And it is necessary for rebuilding the next prediction model after the change point as soon as possible. For this kind of time series data analysis, in general, multiple linear regression models are used. In this paper, we present two methods, i.e., Sequential Probability Ratio Test (SPRT) and Chow Test that is well-known in economics, and describe those experimental evaluations of the effectiveness in the change detection using the multiple regression models. Moreover, we extend the definition of the detected change point in the SPRT method, and show the improvement of the change detection accuracy.
NASA Astrophysics Data System (ADS)
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may be occur due to recording errors, sudden short events, sampling under abnormal conditions etc. The existence of these data points "outliers" in the data set cause lot of problems in the research results and the conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in the both of the response and explanatory variables of the simple circular regression model. Our proposed statistic is robust circular distance RCDxy and it is justified by the three robust measurements such as proportion of detection outliers, masking and swamping rates.
Zheng, Shimin; Rao, Uma; Bartolucci, Alfred A.; Singh, Karan P.
2011-01-01
Bartolucci et al.(2003) extended the distribution assumption from the normal (Lyles et al., 2000) to the elliptical contoured distribution (ECD) for random regression models used in analysis of longitudinal data accounting for both undetectable values and informative drop-outs. In this paper, the random regression models are constructed on the multivariate skew ECD. A real data set is used to illustrate that the skew ECDs can fit some unimodal continuous data better than the Gaussian distributions or more general continuous symmetric distributions when the symmetric distribution assumption is violated. Also, a simulation study is done for illustrating the model fitness from a variety of skew ECDs. The software we used is SAS/STAT, V. 9.13. PMID:21637734
Bentamapimod (JNK Inhibitor AS602801) Induces Regression of Endometriotic Lesions in Animal Models.
Palmer, Stephen S; Altan, Melis; Denis, Deborah; Tos, Enrico Gillio; Gotteland, Jean-Pierre; Osteen, Kevin G; Bruner-Tran, Kaylon L; Nataraja, Selvaraj G
2016-01-01
Endometriosis is an estrogen (ER)-dependent gynecological disease caused by the growth of endometrial tissue at extrauterine sites. Current endocrine therapies address the estrogenic aspect of disease and offer some relief from pain but are associated with significant side effects. Immune dysfunction is also widely believed to be an underlying contributor to the pathogenesis of this disease. This study evaluated an inhibitor of c-Jun N-terminal kinase, bentamapimod (AS602801), which interrupts immune pathways, in 2 rodent endometriosis models. Treatment of nude mice bearing xenografts biopsied from women with endometriosis (BWE) with 30 mg/kg AS602801 caused 29% regression of lesion. Medroxyprogesterone acetate (MPA) or progesterone (PR) alone did not cause regression of BWE lesions, but combining 10 mg/kg AS602801 with MPA caused 38% lesion regression. In human endometrial organ cultures (from healthy women), treatment with AS602801 or MPA reduced matrix metalloproteinase-3 (MMP-3) release into culture medium. In organ cultures established with BWE, PR or MPA failed to inhibit MMP-3 secretion, whereas AS602801 alone or MPA + AS602801 suppressed MMP-3 production. In an autologous rat endometriosis model, AS602801 caused 48% regression of lesions compared to GnRH antagonist Antide (84%). AS602801 reduced inflammatory cytokines in endometriotic lesions, while levels of cytokines in ipsilateral horns were unaffected. Furthermore, AS602801 enhanced natural killer cell activity, without apparent negative effects on uterus. These results indicate that bentamapimod induced regression of endometriotic lesions in endometriosis rodent animal models without suppressing ER action. c-Jun N-terminal kinase inhibition mediated a comprehensive reduction in cytokine secretion and moreover was able to overcome PR resistance. PMID:26335175
Predictive Regression Models of Monthly Seismic Energy Emissions Induced by Longwall Mining
NASA Astrophysics Data System (ADS)
Jakubowski, Jacek; Tajduś, Antoni
2014-10-01
This article presents the development and validation of predictive regression models of longwall mining-induced seismicity, based on observations in 63 longwalls, in 12 seams, in the Bielszowice colliery in the Upper Silesian Coal Basin, which took place between 1992 and 2012. A predicted variable is the logarithm of the monthly sum of seismic energy induced in a longwall area. The set of predictors include seven quantitative and qualitative variables describing some mining and geological conditions and earlier seismicity in longwalls. Two machine learning methods have been used to develop the models: boosted regression trees and neural networks. Two types of model validation have been applied: on a random validation sample and on a time-based validation sample. The set of a few selected variables enabled nonlinear regression models to be built which gave relatively small prediction errors, taking the complex and strongly stochastic nature of the phenomenon into account. The article presents both the models of periodic forecasting for the following month as well as long-term forecasting.
Stiglic, Gregor; Povalej Brzan, Petra; Fijacko, Nino; Wang, Fei; Delibasic, Boris; Kalousis, Alexandros; Obradovic, Zoran
2015-01-01
Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755-0.771) to 0.769 (95% CI: 0.761-0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression.
NASA Astrophysics Data System (ADS)
Kannan, S.; Ghosh, Subimal
2013-03-01
Hydrologic impacts of global climate change are usually assessed by downscaling large-scale climate variables, simulated by general circulation models (GCMs), to local-scale hydrometeorological variables. Conventional multisite statistical downscaling techniques often fail to capture spatial dependence of rainfall amounts as well as hydrometeorological extremes. To overcome these limitations, a downscaling algorithm is proposed, which first simulates the rainfall state of an entire study area/river basin, from large-scale climate variables, with classification and regression trees, and then projects multisite rainfall amounts using a nonparametric kernel regression estimator, conditioned on the estimated rainfall state. The concept of a common rainfall state for the entire study area, using it as an input for projections of rainfall amount, is found to be advantageous in capturing the cross correlation between rainfalls at different downscaling locations. Temporal variability and extremities of rainfall are captured in downscaling with multivariate kernel regression. The proposed model is applied for downscaling daily monsoon precipitation at eight locations in the Mahanadi River basin of eastern India. The model performance is compared, with a recently developed conditional random field based as well as with established multisite downscaling models, and is found to be superior. Analysis of future rainfall scenarios, projected with the developed downscaling model, reveals considerable changes in rainfall intensity and dry and wet spell lengths, among other things, at different locations. An increasing trend of rainfall is projected for the lower (southern) Mahanadi River basin, and a decreasing trend is observed in the upper (northern) Mahanadi River basin.
A Stepwise Time Series Regression Procedure for Water Demand Model Identification
NASA Astrophysics Data System (ADS)
Miaou, Shaw-Pin
1990-09-01
Annual time series water demand has traditionally been studied through multiple linear regression analysis. Four associated model specification problems have long been recognized: (1) the length of the available time series data is relatively short, (2) a large set of candidate explanatory or "input" variables needs to be considered, (3) input variables can be highly correlated with each other (multicollinearity problem), and (4) model error series are often highly autocorrelated or even nonstationary. A step wise time series regression identification procedure is proposed to alleviate these problems. The proposed procedure adopts the sequential input variable selection concept of stepwise regression and the "three-step" time series model building strategy of Box and Jenkins. Autocorrelated model error is assumed to follow an autoregressive integrated moving average (ARIMA) process. The stepwise selection procedure begins with a univariate time series demand model with no input variables. Subsequently, input variables are selected and inserted into the equation one at a time until the last entered variable is found to be statistically insignificant. The order of insertion is determined by a statistical measure called between-variable partial correlation. This correlation measure is free from the contamination of serial autocorrelation. Three data sets from previous studies are employed to illustrate the proposed procedure. The results are then compared with those from their original studies.
ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.
Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won
2016-07-01
In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). The machine learning methods for multicategory classification were applied to a rat model in acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage in our previous study. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolaemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of support vector regression and MLR models with relative values by predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide possibility of the future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values.
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM) 10 levels. These levels are often increased over urban areas becoming one of the main air pollution concerns. This is the case on the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7-year historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative-log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity-and qualitative-working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation. Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following the Sawa's Bayesian Information Criteria (INT-BIC). Each type of model is calculated selecting two different periods: the training (it consists of 6 years) and the testing dataset (it consists of 1 year). The results of each type of model show that the INT-BIC-based model (R(2) = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 at seasonally means) with respect to shorter periods.
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM) 10 levels. These levels are often increased over urban areas becoming one of the main air pollution concerns. This is the case on the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7-year historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative-log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity-and qualitative-working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation. Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following the Sawa's Bayesian Information Criteria (INT-BIC). Each type of model is calculated selecting two different periods: the training (it consists of 6 years) and the testing dataset (it consists of 1 year). The results of each type of model show that the INT-BIC-based model (R(2) = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 at seasonally means) with respect to shorter periods. PMID:23247520
Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T
2013-12-11
The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test day milk yield. On the test day, we analyzed 20,710 records of milk yield of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models had combinations of distinct fitting orders for polynomials (2-5), random genetic (1-7), and permanent environmental (1-7) fixed curves and a number of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. A random regression model using the best Legendre orthogonal polynomial for genetic evaluation of milk yield on the test day of Alpine goats considered a fixed curve of order 4, curve of genetic additive effects of order 2, curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance because it was the most economical model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components in relation to the production peak and persistence. It is very important that the evaluation utilizes the best combination of fixed, genetic additive and permanent environmental regressions, and number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the estimates of parameters and prediction of genetic values.
High dimensional linear regression models under long memory dependence and measurement error
NASA Astrophysics Data System (ADS)
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case, where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p> n) where p can be increasing exponentially with n. Finally, we show the consistency, n½ --d-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement errors models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the
Nonlinear regression modeling of nutrient loads in streams: A Bayesian approach
Qian, S.S.; Reckhow, K.H.; Zhai, J.; McMahon, G.
2005-01-01
A Bayesian nonlinear regression modeling method is introduced and compared with the least squares method for modeling nutrient loads in stream networks. The objective of the study is to better model spatial correlation in river basin hydrology and land use for improving the model as a forecasting tool. The Bayesian modeling approach is introduced in three steps, each with a more complicated model and data error structure. The approach is illustrated using a data set from three large river basins in eastern North Carolina. Results indicate that the Bayesian model better accounts for model and data uncertainties than does the conventional least squares approach. Applications of the Bayesian models for ambient water quality standards compliance and TMDL assessment are discussed. Copyright 2005 by the American Geophysical Union.
Revisiting Gaussian Process Regression Modeling for Localization in Wireless Sensor Networks.
Richter, Philipp; Toledano-Ayala, Manuel
2015-01-01
Signal strength-based positioning in wireless sensor networks is a key technology for seamless, ubiquitous localization, especially in areas where Global Navigation Satellite System (GNSS) signals propagate poorly. To enable wireless local area network (WLAN) location fingerprinting in larger areas while maintaining accuracy, methods to reduce the effort of radio map creation must be consolidated and automatized. Gaussian process regression has been applied to overcome this issue, also with auspicious results, but the fit of the model was never thoroughly assessed. Instead, most studies trained a readily available model, relying on the zero mean and squared exponential covariance function, without further scrutinization. This paper studies the Gaussian process regression model selection for WLAN fingerprinting in indoor and outdoor environments. We train several models for indoor/outdoor- and combined areas; we evaluate them quantitatively and compare them by means of adequate model measures, hence assessing the fit of these models directly. To illuminate the quality of the model fit, the residuals of the proposed model are investigated, as well. Comparative experiments on the positioning performance verify and conclude the model selection. In this way, we show that the standard model is not the most appropriate, discuss alternatives and present our best candidate. PMID:26370996
Revisiting Gaussian Process Regression Modeling for Localization in Wireless Sensor Networks
Richter, Philipp; Toledano-Ayala, Manuel
2015-01-01
Signal strength-based positioning in wireless sensor networks is a key technology for seamless, ubiquitous localization, especially in areas where Global Navigation Satellite System (GNSS) signals propagate poorly. To enable wireless local area network (WLAN) location fingerprinting in larger areas while maintaining accuracy, methods to reduce the effort of radio map creation must be consolidated and automatized. Gaussian process regression has been applied to overcome this issue, also with auspicious results, but the fit of the model was never thoroughly assessed. Instead, most studies trained a readily available model, relying on the zero mean and squared exponential covariance function, without further scrutinization. This paper studies the Gaussian process regression model selection for WLAN fingerprinting in indoor and outdoor environments. We train several models for indoor/outdoor- and combined areas; we evaluate them quantitatively and compare them by means of adequate model measures, hence assessing the fit of these models directly. To illuminate the quality of the model fit, the residuals of the proposed model are investigated, as well. Comparative experiments on the positioning performance verify and conclude the model selection. In this way, we show that the standard model is not the most appropriate, discuss alternatives and present our best candidate. PMID:26370996
NASA Astrophysics Data System (ADS)
Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza
2014-10-01
The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of
Lee, Saro
2004-08-01
For landslide susceptibility mapping, this study applied and verified a Bayesian probability model, a likelihood ratio and statistical model, and logistic regression to Janghung, Korea, using a Geographic Information System (GIS). Landslide locations were identified in the study area from interpretation of IRS satellite imagery and field surveys; and a spatial database was constructed from topographic maps, soil type, forest cover, geology and land cover. The factors that influence landslide occurrence, such as slope gradient, slope aspect, and curvature of topography, were calculated from the topographic database. Soil texture, material, drainage, and effective depth were extracted from the soil database, while forest type, diameter, and density were extracted from the forest database. Land cover was classified from Landsat TM satellite imagery using unsupervised classification. The likelihood ratio and logistic regression coefficient were overlaid to determine each factor's rating for landslide susceptibility mapping. Then the landslide susceptibility map was verified and compared with known landslide locations. The logistic regression model had higher prediction accuracy than the likelihood ratio model. The method can be used to reduce hazards associated with landslides and to land cover planning.
Cluster regression model and level fluctuation features of Van Lake, Turkey
NASA Astrophysics Data System (ADS)
Sen, Z.; Kadioglu, M.; Batur, E.
1999-02-01
Lake water levels change under the influences of natural and/or anthropogenic environmental conditions. Among these influences are the climate change, greenhouse effects and ozone layer depletions which are reflected in the hydrological cycle features over the lake drainage basins. Lake levels are among the most significant hydrological variables that are influenced by different atmospheric and environmental conditions. Consequently, lake level time series in many parts of the world include nonstationarity components such as shifts in the mean value, apparent or hidden periodicities. On the other hand, many lake level modeling techniques have a stationarity assumption. The main purpose of this work is to develop a cluster regression model for dealing with nonstationarity especially in the form of shifting means. The basis of this model is the combination of transition probability and classical regression technique. Both parts of the model are applied to monthly level fluctuations of Lake Van in eastern Turkey. It is observed that the cluster regression procedure does preserve the statistical properties and the transitional probabilities that are indistinguishable from the original data.
Regression models for near-infrared measurement of subcutaneous adipose tissue thickness.
Wang, Yu; Hao, Dongmei; Shi, Jingbin; Yang, Zeqiang; Jin, Liu; Zhang, Song; Yang, Yimin; Bin, Guangyu; Zeng, Yanjun; Zheng, Dingchang
2016-07-01
Obesity is often associated with the risks of diabetes and cardiovascular disease, and there is a need to measure subcutaneous adipose tissue (SAT) thickness for acquiring the distribution of body fat. The present study aimed to develop and evaluate different model-based methods for SAT thickness measurement using an SATmeter developed in our laboratory. Near-infrared signals backscattered from the body surfaces from 40 subjects at 20 body sites each were recorded. Linear regression (LR) and support vector regression (SVR) models were established to predict SAT thickness on different body sites. The measurement accuracy was evaluated by ultrasound, and compared with results from a mechanical skinfold caliper (MSC) and a body composition balance monitor (BCBM). The results showed that both LR- and SVR-based measurement produced better accuracy than MSC and BCBM. It was also concluded that by using regression models specifically designed for certain parts of human body, higher measurement accuracy could be achieved than using a general model for the whole body. Our results demonstrated that the SATmeter is a feasible method, which can be applied at home and in the community due to its portability and convenience. PMID:27243599
Regression models for near-infrared measurement of subcutaneous adipose tissue thickness.
Wang, Yu; Hao, Dongmei; Shi, Jingbin; Yang, Zeqiang; Jin, Liu; Zhang, Song; Yang, Yimin; Bin, Guangyu; Zeng, Yanjun; Zheng, Dingchang
2016-07-01
Obesity is often associated with the risks of diabetes and cardiovascular disease, and there is a need to measure subcutaneous adipose tissue (SAT) thickness for acquiring the distribution of body fat. The present study aimed to develop and evaluate different model-based methods for SAT thickness measurement using an SATmeter developed in our laboratory. Near-infrared signals backscattered from the body surfaces from 40 subjects at 20 body sites each were recorded. Linear regression (LR) and support vector regression (SVR) models were established to predict SAT thickness on different body sites. The measurement accuracy was evaluated by ultrasound, and compared with results from a mechanical skinfold caliper (MSC) and a body composition balance monitor (BCBM). The results showed that both LR- and SVR-based measurement produced better accuracy than MSC and BCBM. It was also concluded that by using regression models specifically designed for certain parts of human body, higher measurement accuracy could be achieved than using a general model for the whole body. Our results demonstrated that the SATmeter is a feasible method, which can be applied at home and in the community due to its portability and convenience.
NASA Astrophysics Data System (ADS)
martin, manuel; Lacarce, Eva; Meersmans, Jeroen; Orton, Thomas; Saby, Nicolas; Paroissien, Jean-Baptiste; Jolivet, Claudy; Boulonne, Line; Arrouays, Dominique
2013-04-01
Soil organic carbon (SOC) plays a major role in the global carbon budget. It can act as a source or a sink of atmospheric carbon, thereby possibly influencing the course of climate change. Improving the tools that model the spatial distributions of SOC stocks at national scales is a priority, both for monitoring changes in SOC and as an input for global carbon cycles studies. In this paper, first, we considered several increasingly complex boosted regression trees (BRT), a convenient and efficient multiple regression model from the statistical learning field. Further, we considered and a robust geostatistical approach coupled to the BRT models. Testing the different approaches was performed on the dataset from the French Soil Monitoring Network, with a consistent cross-validation procedure. We showed that the BRT models, given its ease of use and its predictive performance, could be preferred to geostatistical models for SOC mapping at the national scale, and if possible be joined with geostatistical models. This conclusion is valid provided that care is exercised in model fitting and validating, that the dataset does not allow for modeling local spatial autocorrelations, as it is the case for many national systematic sampling schemes, and when good quality data about SOC drivers included in the models is available.
NASA Astrophysics Data System (ADS)
Magar, R. B.; Jothiprakash, V.
2011-12-01
In this study, multi-linear regression (MLR) approach is used to construct intermittent reservoir daily inflow forecasting system. To illustrate the applicability and effect of using lumped and distributed input data in MLR approach, Koyna river watershed in Maharashtra, India is chosen as a case study. The results are also compared with autoregressive integrated moving average (ARIMA) models. MLR attempts to model the relationship between two or more independent variables over a dependent variable by fitting a linear regression equation. The main aim of the present study is to see the consequences of development and applicability of simple models, when sufficient data length is available. Out of 47 years of daily historical rainfall and reservoir inflow data, 33 years of data is used for building the model and 14 years of data is used for validating the model. Based on the observed daily rainfall and reservoir inflow, various types of time-series, cause-effect and combined models are developed using lumped and distributed input data. Model performance was evaluated using various performance criteria and it was found that as in the present case, of well correlated input data, both lumped and distributed MLR models perform equally well. For the present case study considered, both MLR and ARIMA models performed equally sound due to availability of large dataset.
Selection of Higher Order Regression Models in the Analysis of Multi-Factorial Transcription Data
Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W.; Mansmann, Ulrich
2014-01-01
Introduction Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. Results We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. Conclusions We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data. PMID:24658540
Lee, Soo Min; Lee, Jae-Won
2014-11-01
In this study, the optimal conditions for biomass torrefaction were determined by comparing the gain of energy content to the weight loss of biomass from the final products. Torrefaction experiments were performed at temperatures ranging from 220 to 280°C using 20-80min reaction times. Polynomial regression models ranging from the 1st to the 3rd order were used to determine a relationship between the severity factor (SF) and calorific value or weight loss. The intersection of two regression models for calorific value and weight loss was determined and assumed to be the optimized SF. The optimized SFs on each biomass ranged from 6.056 to 6.372. Optimized torrefaction conditions were determined at various reaction times of 15, 30, and 60min. The average optimized temperature was 248.55°C in the studied biomass when torrefaction was performed for 60min.
Forecasting Model for IPTV Service in Korea Using Bootstrap Ridge Regression Analysis
NASA Astrophysics Data System (ADS)
Lee, Byoung Chul; Kee, Seho; Kim, Jae Bum; Kim, Yun Bae
The telecom firms in Korea are taking new step to prepare for the next generation of convergence services, IPTV. In this paper we described our analysis on the effective method for demand forecasting about IPTV broadcasting. We have tried according to 3 types of scenarios based on some aspects of IPTV potential market and made a comparison among the results. The forecasting method used in this paper is the multi generation substitution model with bootstrap ridge regression analysis.
Development and comparison of regression models for the uptake of metals into various field crops.
Novotná, Markéta; Mikeš, Ondřej; Komprdová, Klára
2015-12-01
Field crops represent one of the highest contributions to dietary metal exposure. The aim of this study was to develop specific regression models for the uptake of metals into various field crops and to compare the usability of other available models. We analysed samples of potato, hop, maize, barley, wheat, rape seed, and grass from 66 agricultural sites. The influence of measured soil concentrations and soil factors (pH, organic carbon, content of silt and clay) on the plant concentrations of Cd, Cr, Cu, Mo, Ni, Pb and Zn was evaluated. Bioconcentration factors (BCF) and plant-specific metal models (PSMM) developed from multivariate regressions were calculated. The explained variability of the models was from 19 to 64% and correlations between measured and predicted concentrations were between 0.43 and 0.90. The developed hop and rapeseed models are new in this field. Available models from literature showed inaccurate results, except for Cd; the modelling efficiency was mostly around zero. The use of interaction terms between parameters can significantly improve plant-specific models.
Development and comparison of regression models for the uptake of metals into various field crops.
Novotná, Markéta; Mikeš, Ondřej; Komprdová, Klára
2015-12-01
Field crops represent one of the highest contributions to dietary metal exposure. The aim of this study was to develop specific regression models for the uptake of metals into various field crops and to compare the usability of other available models. We analysed samples of potato, hop, maize, barley, wheat, rape seed, and grass from 66 agricultural sites. The influence of measured soil concentrations and soil factors (pH, organic carbon, content of silt and clay) on the plant concentrations of Cd, Cr, Cu, Mo, Ni, Pb and Zn was evaluated. Bioconcentration factors (BCF) and plant-specific metal models (PSMM) developed from multivariate regressions were calculated. The explained variability of the models was from 19 to 64% and correlations between measured and predicted concentrations were between 0.43 and 0.90. The developed hop and rapeseed models are new in this field. Available models from literature showed inaccurate results, except for Cd; the modelling efficiency was mostly around zero. The use of interaction terms between parameters can significantly improve plant-specific models. PMID:26448504
NASA Astrophysics Data System (ADS)
Ben Alaya, M. A.; Chebana, F.; Ouarda, T. B. M. J.
2016-09-01
Statistical downscaling techniques are required to refine atmosphere-ocean global climate data and provide reliable meteorological information such as a realistic temporal variability and relationships between sites and variables in a changing climate. To this end, the present paper introduces a modular structure combining two statistical tools of increasing interest during the last years: (1) Gaussian copula and (2) quantile regression. The quantile regression tool is employed to specify the entire conditional distribution of downscaled variables and to address the limitations of traditional regression-based approaches whereas the Gaussian copula is performed to describe and preserve the dependence between both variables and sites. A case study based on precipitation and maximum and minimum temperatures from the province of Quebec, Canada, is used to evaluate the performance of the proposed model. Obtained results suggest that this approach is capable of generating series with realistic correlation structures and temporal variability. Furthermore, the proposed model performed better than a classical multisite multivariate statistical downscaling model for most evaluation criteria.
Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall
NASA Astrophysics Data System (ADS)
Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik
2016-02-01
Rainfall is one of the climatic elements with high diversity and has many negative impacts especially extreme rainfall. Therefore, there are several methods that required to minimize the damage that may occur. So far, Global circulation models (GCM) are the best method to forecast global climate changes include extreme rainfall. Statistical downscaling (SD) is a technique to develop the relationship between GCM output as a global-scale independent variables and rainfall as a local- scale response variable. Using GCM method will have many difficulties when assessed against observations because GCM has high dimension and multicollinearity between the variables. The common method that used to handle this problem is principal components analysis (PCA) and partial least squares regression. The new method that can be used is lasso. Lasso has advantages in simultaneuosly controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall in dry and wet extreme. Objective of this study is modeling SD using quantile regression with lasso to predict extreme rainfall in Indramayu. The results showed that the estimation of extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at quantile 90th.
Oil and gas pipeline construction cost analysis and developing regression models for cost estimation
NASA Astrophysics Data System (ADS)
Thaduri, Ravi Kiran
In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, Labor, ROW and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In a pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The Compressor stations are analyzed based on the capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor station, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor station for various capacities and locations.
Climate change implications on maximum monthly stream flow in Cyprus using fuzzy regression models
NASA Astrophysics Data System (ADS)
Loukas, A.; Spiliotopoulos, M.
2010-09-01
Maximum stream flow data collected from Cyprus Water Development Department and outputs of global circulation models (General Circulation Models, GCM) are used in this study, to develop statistical downscaling techniques in order to investigate the impact of climate change on stream flow at Yermasoyia watershed, Cyprus. The Yermasoyia watershed is located in the southern side of mountain Troodos, northeast of Limassol city and it drains into Yermasoyia reservoir. The watershed area is about 157 km2 and its altitude ranges from 70 m up to 1400 m, above mean sea level. The watershed is constituted mainly by igneous rocks, degraded basalt and handholds. The mean annual precipitation is 638 mm while the mean annual flow is estimated in 22,5 millions m3. The reservoir water surface is 110 hectares and has maximum capacity of 13,6 million m3. Earlier studies have shown that the development of downscaling methodologies using multiple linear fuzzy regression models can give quite satisfactory results. In this study, the outputs of SRES A2 and SRES B2 scenarios of the second version of the Canadian Coupled Global Climate Model (CGCM2) are utilized. This model is based on the earlier CGCM1 (Flato et al. (2000), but with some improvements to address shortcomings identified in the first version. Fuzzy regression is used for the downscaling of maximum monthly stream flow. The methodology is validated by independent historical data and used for the estimation of future maximum stream flow time series. From the 30 years of observed data representing the current climate, the first 25 years (1968-1993) are considered for calibrating the downscaling model while the remaining 5 years (1994-1998) are used in order to validate the model. The model was first developed using the logarithm of observed maximum monthly streamflow as the depended variable and 36 output parameters of GCM as the candidate independent variables. Then, five (5) independent GCM parameters were selected, namely
Gad, R S; Parab, J S; Naik, G M
2010-11-01
Multivariate system spectroscopic model plays important role in understanding chemometrics of ensemble under study. Here in this manuscript we discuss various approaches of modeling of spectroscopic system and demonstrate how Lorentz oscillator can be used to model any general spectroscopic system. Chemometric studies require customized templates design for the corresponding variants participating in ensemble, which generates the characteristic matrix of the ensemble under study. The typical biological system that resembles human blood tissue consisting of five major constituents i.e., alanine, urea, lactate, glucose, ascorbate; has been tested on the model. The model was validated using three approaches, namely, root mean square error (RMSE) analysis in the range of ±5% confidence interval, clerk gird error plot, and RMSE versus percent noise level study. Also the model was tested across various template sizes (consisting of samples ranging from 10 up to 1000) to ascertain the validity of partial least squares regression. The model has potential in understanding the chemometrics of proteomics pathways.
NASA Astrophysics Data System (ADS)
Gad, R. S.; Parab, J. S.; Naik, G. M.
2010-11-01
Multivariate system spectroscopic model plays important role in understanding chemometrics of ensemble under study. Here in this manuscript we discuss various approaches of modeling of spectroscopic system and demonstrate how Lorentz oscillator can be used to model any general spectroscopic system. Chemometric studies require customized templates design for the corresponding variants participating in ensemble, which generates the characteristic matrix of the ensemble under study. The typical biological system that resembles human blood tissue consisting of five major constituents i.e., alanine, urea, lactate, glucose, ascorbate; has been tested on the model. The model was validated using three approaches, namely, root mean square error (RMSE) analysis in the range of ±5% confidence interval, clerk gird error plot, and RMSE versus percent noise level study. Also the model was tested across various template sizes (consisting of samples ranging from 10 up to 1000) to ascertain the validity of partial least squares regression. The model has potential in understanding the chemometrics of proteomics pathways.
Gulliver, John; de Hoogh, Kees; Hansell, Anna; Vienneau, Danielle
2013-07-16
Modeling historic air pollution exposures is often restricted by availability of monitored concentration data. We evaluated back-extrapolation of land use regression (LUR) models for annual mean NO2 concentrations in Great Britain for up to 18 years earlier. LUR variables were created in a geographic information system (GIS) using land cover and road network data summarized within buffers, site coordinates, and altitude. Four models were developed for 2009 and 2001 using 75% of monitoring sites (in different groupings) and evaluated on the remaining 25%. Variables selected were generally stable between models. Within year, hold-out validation yielded mean-squared-error-based R(2) (MSE-R(2)) (i.e., fit around the 1:1 line) values of 0.25-0.63 and 0.51-0.65 for 2001 and 2009, respectively. Back-extrapolation was conducted for 2009 and 2001 models to 1991 and for 2009 models to 2001, adjusting to the year using two background NO2 monitoring sites. Evaluation of back-extrapolated predictions used 100% of sites from an historic national NO2 diffusion tube network (n = 451) for 1991 and 70 independent sites from automatic monitoring in 2001. Values of MSE-R(2) for back-extrapolation to 1991 were 0.42-0.45 and 0.52-0.55 for 2001 and 2009 models, respectively, but model performance varied by region. Back-extrapolation of LUR models appears valid for exposure assessment for NO2 back to 1991 for Great Britain. PMID:23763440
A marginalized zero-inflated Poisson regression model with overall exposure effects.
Long, D Leann; Preisser, John S; Herring, Amy H; Golin, Carol E
2014-12-20
The zero-inflated Poisson (ZIP) regression model is often employed in public health research to examine the relationships between exposures of interest and a count outcome exhibiting many zeros, in excess of the amount expected under sampling from a Poisson distribution. The regression coefficients of the ZIP model have latent class interpretations, which correspond to a susceptible subpopulation at risk for the condition with counts generated from a Poisson distribution and a non-susceptible subpopulation that provides the extra or excess zeros. The ZIP model parameters, however, are not well suited for inference targeted at marginal means, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. We develop a marginalized ZIP model approach for independent responses to model the population mean count directly, allowing straightforward inference for overall exposure effects and empirical robust variance estimation for overall log-incidence density ratios. Through simulation studies, the performance of maximum likelihood estimation of the marginalized ZIP model is assessed and compared with other methods of estimating overall exposure effects. The marginalized ZIP model is applied to a recent study of a motivational interviewing-based safer sex counseling intervention, designed to reduce unprotected sexual act counts. PMID:25220537
Shi, Yuan; Lau, Kevin Ka-Lun; Ng, Edward
2016-08-01
Monitoring street-level particulates is essential to air quality management but challenging in high-density Hong Kong due to limitations in local monitoring network and the complexities of street environment. By employing vehicle-based mobile measurements, land use regression (LUR) models were developed to estimate the spatial variation of PM2.5 and PM10 in the downtown area of Hong Kong. Sampling runs were conducted along routes measuring a total of 30 km during a selected measurement period of total 14 days. In total, 321 independent variables were examined to develop LUR models by using stepwise regression with PM2.5 and PM10 as dependent variables. Approximately, 10% increases in the model adjusted R(2) were achieved by integrating urban/building morphology as independent variables into the LUR models. Resultant LUR models show that the most decisive factors on street-level air quality in Hong Kong are frontal area index, an urban/building morphological parameter, and road network line density and traffic volume, two parameters of road traffic. The adjusted R(2) of the final LUR models of PM2.5 and PM10 are 0.633 and 0.707, respectively. These results indicate that urban morphology is more decisive to the street-level air quality in high-density cities than other cities. Air pollution hotspots were also identified based on the LUR mapping. PMID:27381187
Shi, Yuan; Lau, Kevin Ka-Lun; Ng, Edward
2016-08-01
Monitoring street-level particulates is essential to air quality management but challenging in high-density Hong Kong due to limitations in local monitoring network and the complexities of street environment. By employing vehicle-based mobile measurements, land use regression (LUR) models were developed to estimate the spatial variation of PM2.5 and PM10 in the downtown area of Hong Kong. Sampling runs were conducted along routes measuring a total of 30 km during a selected measurement period of total 14 days. In total, 321 independent variables were examined to develop LUR models by using stepwise regression with PM2.5 and PM10 as dependent variables. Approximately, 10% increases in the model adjusted R(2) were achieved by integrating urban/building morphology as independent variables into the LUR models. Resultant LUR models show that the most decisive factors on street-level air quality in Hong Kong are frontal area index, an urban/building morphological parameter, and road network line density and traffic volume, two parameters of road traffic. The adjusted R(2) of the final LUR models of PM2.5 and PM10 are 0.633 and 0.707, respectively. These results indicate that urban morphology is more decisive to the street-level air quality in high-density cities than other cities. Air pollution hotspots were also identified based on the LUR mapping.
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
NASA Astrophysics Data System (ADS)
Rosli, A. D.; Hashim, H.; Khairuzzaman, N. A.; Mohd Sampian, A. F.; Baharudin, R.; Abdullah, N. E.; Sulaiman, M. S.; Kamaru'zzaman, M.
2015-11-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clone 2002, 2008, 2014 and 3001. Conventionally, the rubber tree clones identification are based on observation towards tree features such as shape of leaf, trunk, branching habit and pattern of seeds texture. The former method requires expert persons and very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to measure different clones from latex samples. Hence, with a hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via Capacitance Comparison Bridge (known as capacitance sensor) to measure an output voltage of different latex samples. The proposed sensor is initially tested with 30ml of latex sample prior to gradually addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using Simple Linear Regression (SLR) model. This work outcome infers that latex clone of 2002 has produced the highest and reliable linear regression line with determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if it is diluted with higher volume of water.
Notes on power of normality tests of error terms in regression models
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make a not misleading inferences which explains a necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
Quantitative Regression Models for the Prediction of Chemical Properties by an Efficient Workflow.
Yin, Yongmin; Xu, Congying; Gu, Shikai; Li, Weihua; Liu, Guixia; Tang, Yun
2015-10-01
Rapid safety assessment is more and more needed for the increasing chemicals both in chemical industries and regulators around the world. The traditional experimental methods couldn't meet the current demand any more. With the development of the information technology and the growth of experimental data, in silico modeling has become a practical and rapid alternative for the assessment of chemical properties, especially for the toxicity prediction of organic chemicals. In this study, a quantitative regression workflow was built by KNIME to predict chemical properties. With this regression workflow, quantitative values of chemical properties can be obtained, which is different from the binary-classification model or multi-classification models that can only give qualitative results. To illustrate the usage of the workflow, two predictive models were constructed based on datasets of Tetrahymena pyriformis toxicity and Aqueous solubility. The qcv (2) and qtest (2) of 5-fold cross validation and external validation for both types of models were greater than 0.7, which implies that our models are robust and reliable, and the workflow is very convenient and efficient in prediction of various chemical properties. PMID:27490968
A national fine spatial scale land-use regression model for ozone.
Kerckhoffs, Jules; Wang, Meng; Meliefste, Kees; Malmqvist, Ebba; Fischer, Paul; Janssen, Nicole A H; Beelen, Rob; Hoek, Gerard
2015-07-01
Uncertainty about health effects of long-term ozone exposure remains. Land use regression (LUR) models have been used successfully for modeling fine scale spatial variation of primary pollutants but very limited for ozone. Our objective was to assess the feasibility of developing a national LUR model for ozone at a fine spatial scale. Ozone concentrations were measured with passive samplers at 90 locations across the Netherlands (19 regional background, 36 urban background, 35 traffic). All sites were measured simultaneously during four 2-weekly campaigns spread over the seasons. LUR models were developed for the summer average as the primary exposure and annual average using predictor variables obtained with Geographic Information Systems. Summer average ozone concentrations varied between 32 and 61 µg/m(3). Ozone concentrations at traffic sites were on average 9 µg/m(3) lower compared to regional background sites. Ozone correlated highly negatively with nitrogen dioxide and moderately with fine particles. A LUR model including small-scale traffic, large-scale address density, urban green and a region indicator explained 71% of the spatial variation in summer average ozone concentrations. Land use regression modeling is a promising method to assess ozone spatial variation, but the high correlation with NO2 limits application in epidemiology.
Groundwater depth prediction in a shallow aquifer in north China by a quantile regression model
NASA Astrophysics Data System (ADS)
Li, Fawen; Wei, Wan; Zhao, Yong; Qiao, Jiale
2016-09-01
There is a close relationship between groundwater level in a shallow aquifer and the surface ecological environment; hence, it is important to accurately simulate and predict the groundwater level in eco-environmental construction projects. The multiple linear regression (MLR) model is one of the most useful methods to predict groundwater level (depth); however, the predicted values by this model only reflect the mean distribution of the observations and cannot effectively fit the extreme distribution data (outliers). The study reported here builds a prediction model of groundwater-depth dynamics in a shallow aquifer using the quantile regression (QR) method on the basis of the observed data of groundwater depth and related factors. The proposed approach was applied to five sites in Tianjin city, north China, and the groundwater depth was calculated in different quantiles, from which the optimal quantile was screened out according to the box plot method and compared to the values predicted by the MLR model. The results showed that the related factors in the five sites did not follow the standard normal distribution and that there were outliers in the precipitation and last-month (initial state) groundwater-depth factors because the basic assumptions of the MLR model could not be achieved, thereby causing errors. Nevertheless, these conditions had no effect on the QR model, as it could more effectively describe the distribution of original data and had a higher precision in fitting the outliers.
Flexible regression model selection for survival probabilities: with application to AIDS.
DiRienzo, A Gregory
2009-12-01
Clinicians are often interested in the effect of covariates on survival probabilities at prespecified study times. Because different factors can be associated with the risk of short- and long-term failure, a flexible modeling strategy is pursued. Given a set of multiple candidate working models, an objective methodology is proposed that aims to construct consistent and asymptotically normal estimators of regression coefficients and average prediction error for each working model, that are free from the nuisance censoring variable. It requires the conditional distribution of censoring given covariates to be modeled. The model selection strategy uses stepup or stepdown multiple hypothesis testing procedures that control either the proportion of false positives or generalized familywise error rate when comparing models based on estimates of average prediction error. The context can actually be cast as a missing data problem, where augmented inverse probability weighted complete case estimators of regression coefficients and prediction error can be used (Tsiatis, 2006, Semiparametric Theory and Missing Data). A simulation study and an interesting analysis of a recent AIDS trial are provided. PMID:19173693
An hourly regression model for ultrafine particles in a near-highway urban area
Patton, Allison P.; Collins, Caitlin; Naumova, Elena N.; Zamore, Wig; Brugge, Doug; Durant, John L.
2015-01-01
Estimating ultrafine particle number concentrations (PNC) near highways for exposure assessment in chronic health studies requires models capable of capturing PNC spatial and temporal variations over the course of a full year. The objectives of this work were to describe the relationship between near-highway PNC and potential predictors, and to build and validate hourly log-linear regression models. PNC was measured near Interstate 93 (I-93) in Somerville, MA (USA) using a mobile monitoring platform driven for 234 hours on 43 days between August 2009 and September 2010. Compared to urban background, PNC levels were consistently elevated within 100–200 m of I-93, with gradients impacted by meteorological and traffic conditions. Temporal and spatial variables including wind speed and direction, temperature, highway traffic, and distance to I-93 and major roads contributed significantly to the full regression model. Cross-validated model R2 values ranged from 0.38–0.47, with higher values achieved (0.43–0.53) when short-duration PNC spikes were removed. The model predicts highest PNC near major roads and on cold days with low wind speeds. The model allows estimation of hourly ambient PNC at 20-m resolution in a near-highway neighborhood. PMID:24559198
NASA Astrophysics Data System (ADS)
Ko, K.; Cheong, B.; Koh, D.
2010-12-01
Groundwater has been used a main source to provide a drinking water in a rural area with no regional potable water supply system in Korea. More than 50 percent of rural area residents depend on groundwater as drinking water. Thus, research on predicting groundwater pollution for the sustainable groundwater usage and protection from potential pollutants was demanded. This study was carried out to know the vulnerability of groundwater nitrate contamination reflecting the effect of land use in Nonsan city of a representative rural area of South Korea. About 47% of the study area is occupied by cultivated land with high vulnerable area to groundwater nitrate contamination because it has higher nitrogen fertilizer input of 62.3 tons/km2 than that of country’s average of 44.0 tons/km2. The two vulnerability assessment methods, logistic regression and DRASTIC model, were tested and compared to know more suitable techniques for the assessment of groundwater nitrate contamination in Nonsan area. The groundwater quality data were acquired from the collection of analyses of 111 samples of small potable supply system in the study area. The analyzed values of nitrate were classified by land use such as resident, upland, paddy, and field area. One dependent and two independent variables were addressed for logistic regression analysis. One dependent variable was a binary categorical data with 0 or 1 whether or not nitrate exceeding thresholds of 1 through 10 mg/L. The independent variables were one continuous data of slope indicating topography and multiple categorical data of land use which are classified by resident, upland, paddy, and field area. The results of the Levene’s test and T-test for slope and land use were showed the significant difference of mean values among groups in 95% confidence level. From the logistic regression, we could know the negative correlation between slope and nitrate which was caused by the decrease of contaminants inputs into groundwater with
Design Sensitivity for a Subsonic Aircraft Predicted by Neural Network and Regression Models
NASA Technical Reports Server (NTRS)
Hopkins, Dale A.; Patnaik, Surya N.
2005-01-01
A preliminary methodology was obtained for the design optimization of a subsonic aircraft by coupling NASA Langley Research Center s Flight Optimization System (FLOPS) with NASA Glenn Research Center s design optimization testbed (COMETBOARDS with regression and neural network analysis approximators). The aircraft modeled can carry 200 passengers at a cruise speed of Mach 0.85 over a range of 2500 n mi and can operate on standard 6000-ft takeoff and landing runways. The design simulation was extended to evaluate the optimal airframe and engine parameters for the subsonic aircraft to operate on nonstandard runways. Regression and neural network approximators were used to examine aircraft operation on runways ranging in length from 4500 to 7500 ft.
On the impact of covariate measurement error on spatial regression modelling
Huque, Md Hamidul; Bondell, Howard; Ryan, Louise
2015-01-01
Summary Spatial regression models have grown in popularity in response to rapid advances in GIS (Geographic Information Systems) technology that allows epidemiologists to incorporate geographically indexed data into their studies. However, it turns out that there are some subtle pitfalls in the use of these models. We show that presence of covariate measurement error can lead to significant sensitivity of parameter estimation to the choice of spatial correlation structure. We quantify the effect of measurement error on parameter estimates, and then suggest two different ways to produce consistent estimates. We evaluate the methods through a simulation study. These methods are then applied to data on Ischemic Heart Disease (IHD). PMID:25729267
Comparing Spatial and Multilevel Regression Models for Binary Outcomes in Neighborhood Studies
Xu, Hongwei
2013-01-01
The standard multilevel regressions that are widely used in neighborhood research typically ignore potential between-neighborhood correlations due to underlying spatial processes, and hence produce inappropriate inferences about neighborhood effects. In contrast, spatial models make estimations and predictions across areas by explicitly modeling the spatial correlations among observations in different locations. A better understanding of the strengths and limitations of spatial models as compared to the standard multilevel model is needed to improve the research on neighborhood and spatial effects. This research systematically compares model estimations and predictions for binary outcomes between (distance- and lattice-based) spatial and the standard multilevel models in the presence of both within- and between-neighborhood correlations, through simulations. Results from simulation analysis reveal that the standard multilevel and spatial models produce similar estimates of fixed effects, but different estimates of random effects variances. Both the standard multilevel and pure spatial models tend to overestimate the corresponding random effects variances, compared to hybrid models when both non-spatial within neighborhood and spatial between-neighborhood effects exist. Spatial models also outperform the standard multilevel model by a narrow margin in case of fully out-of-sample predictions. Distance-based spatial models provide extra spatial information and have stronger predictive power than lattice-based models under certain circumstances. These merits of spatial modeling are exhibited in an empirical analysis of the child mortality data from 1880 Newark, New Jersey. PMID:25284905
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy. PMID:26687087
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
Harlim, John; Mahdi, Adam; Majda, Andrew J.
2014-01-15
A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.
Ribeiro, Manuel Castro; Sousa, António Jorge; Pereira, Maria João
2016-05-01
The geographical distribution of health outcomes is influenced by socio-economic and environmental factors operating on different spatial scales. Geographical variations in relationships can be revealed with semi-parametric Geographically Weighted Poisson Regression (sGWPR), a model that can combine both geographically varying and geographically constant parameters. To decide whether a parameter should vary geographically, two models are compared: one in which all parameters are allowed to vary geographically and one in which all except the parameter being evaluated are allowed to vary geographically. The model with the lower corrected Akaike Information Criterion (AICc) is selected. Delivering model selection exclusively according to the AICc might hide important details in spatial variations of associations. We propose assisting the decision by using a Linear Model of Coregionalization (LMC). Here we show how LMC can refine sGWPR on ecological associations between socio-economic and environmental variables and low birth weight outcomes in the west-north-central region of Portugal.
Analysis of time-dependent covariates in a regressive relative survival model.
Giorgi, Roch; Gouvernet, Joanny
2005-12-30
Relative survival is a method for assessing prognostic factors for disease-specific mortality. However, most relative survival models assume that the effect of covariate on disease-specific mortality is fixed-in-time, which may not hold in some studies and requires adapted modelling. We propose an extension of the Esteve et al. regressive relative survival model that uses the counting process approach to accommodate time-dependent effect of a predictor's on disease-specific mortality. This approach had shown its robustness, and the properties of the counting process give a simple and attractive computational solution to model time-dependent covariates. Our approach is illustrated with the data from the Stanford Heart Transplant Study and with data from a hospital-based study on invasive breast cancer. Advantages of modelling time-dependent covariates in relative survival analysis are discussed.
Selecting Spatial Scale of Covariates in Regression Models of Environmental Exposures
Grant, Lauren P.; Gennings, Chris; Wheeler, David C.
2015-01-01
Environmental factors or socioeconomic status variables used in regression models to explain environmental chemical exposures or health outcomes are often in practice modeled at the same buffer distance or spatial scale. In this paper, we present four model selection algorithms that select the best spatial scale for each buffer-based or area-level covariate. Contamination of drinking water by nitrate is a growing problem in agricultural areas of the United States, as ingested nitrate can lead to the endogenous formation of N-nitroso compounds, which are potent carcinogens. We applied our methods to model nitrate levels in private wells in Iowa. We found that environmental variables were selected at different spatial scales and that a model allowing spatial scale to vary across covariates provided the best goodness of fit. Our methods can be applied to investigate the association between environmental risk factors available at multiple spatial scales or buffer distances and measures of disease, including cancers. PMID:25983543
Prediction of Wind Speeds Based on Digital Elevation Models Using Boosted Regression Trees
NASA Astrophysics Data System (ADS)
Fischer, P.; Etienne, C.; Tian, J.; Krauß, T.
2015-12-01
In this paper a new approach is presented to predict maximum wind speeds using Gradient Boosted Regression Trees (GBRT). GBRT are a non-parametric regression technique used in various applications, suitable to make predictions without having an in-depth a-priori knowledge about the functional dependancies between the predictors and the response variables. Our aim is to predict maximum wind speeds based on predictors, which are derived from a digital elevation model (DEM). The predictors describe the orography of the Area-of-Interest (AoI) by various means like first and second order derivatives of the DEM, but also higher sophisticated classifications describing exposure and shelterness of the terrain to wind flux. In order to take the different scales into account which probably influence the streams and turbulences of wind flow over complex terrain, the predictors are computed on different spatial resolutions ranging from 30 m up to 2000 m. The geographic area used for examination of the approach is Switzerland, a mountainious region in the heart of europe, dominated by the alps, but also covering large valleys. The full workflow is described in this paper, which consists of data preparation using image processing techniques, model training using a state-of-the-art machine learning algorithm, in-depth analysis of the trained model, validation of the model and application of the model to generate a wind speed map.
Forecasting peak asthma admissions in London: an application of quantile regression models.
Soyiri, Ireneous N; Reidpath, Daniel D; Sarran, Christophe
2013-07-01
Asthma is a chronic condition of great public health concern globally. The associated morbidity, mortality and healthcare utilisation place an enormous burden on healthcare infrastructure and services. This study demonstrates a multistage quantile regression approach to predicting excess demand for health care services in the form of asthma daily admissions in London, using retrospective data from the Hospital Episode Statistics, weather and air quality. Trivariate quantile regression models (QRM) of asthma daily admissions were fitted to a 14-day range of lags of environmental factors, accounting for seasonality in a hold-in sample of the data. Representative lags were pooled to form multivariate predictive models, selected through a systematic backward stepwise reduction approach. Models were cross-validated using a hold-out sample of the data, and their respective root mean square error measures, sensitivity, specificity and predictive values compared. Two of the predictive models were able to detect extreme number of daily asthma admissions at sensitivity levels of 76 % and 62 %, as well as specificities of 66 % and 76 %. Their positive predictive values were slightly higher for the hold-out sample (29 % and 28 %) than for the hold-in model development sample (16 % and 18 %). QRMs can be used in multistage to select suitable variables to forecast extreme asthma events. The associations between asthma and environmental factors, including temperature, ozone and carbon monoxide can be exploited in predicting future events using QRMs.
Forecasting peak asthma admissions in London: an application of quantile regression models
NASA Astrophysics Data System (ADS)
Soyiri, Ireneous N.; Reidpath, Daniel D.; Sarran, Christophe
2013-07-01
Asthma is a chronic condition of great public health concern globally. The associated morbidity, mortality and healthcare utilisation place an enormous burden on healthcare infrastructure and services. This study demonstrates a multistage quantile regression approach to predicting excess demand for health care services in the form of asthma daily admissions in London, using retrospective data from the Hospital Episode Statistics, weather and air quality. Trivariate quantile regression models (QRM) of asthma daily admissions were fitted to a 14-day range of lags of environmental factors, accounting for seasonality in a hold-in sample of the data. Representative lags were pooled to form multivariate predictive models, selected through a systematic backward stepwise reduction approach. Models were cross-validated using a hold-out sample of the data, and their respective root mean square error measures, sensitivity, specificity and predictive values compared. Two of the predictive models were able to detect extreme number of daily asthma admissions at sensitivity levels of 76 % and 62 %, as well as specificities of 66 % and 76 %. Their positive predictive values were slightly higher for the hold-out sample (29 % and 28 %) than for the hold-in model development sample (16 % and 18 %). QRMs can be used in multistage to select suitable variables to forecast extreme asthma events. The associations between asthma and environmental factors, including temperature, ozone and carbon monoxide can be exploited in predicting future events using QRMs.
Inhibition and regression of tumors in hamster DMBA model following laser microvascular targeting
NASA Astrophysics Data System (ADS)
McMillan, Kathleen; Wang, Zhi; Shapshay, Stanley M.
1998-07-01
Vascular targeting is a recent approach to cancer therapy that aims at damaging tumor vasculature to induce tumor cell hypoxia and subsequent cell death. Squamous cell cancer arises in the superficial mucosal and cutaneous epithelial layers, and tumor microvasculature therefore may be particularly well suited for targeting by selective photothermolysis. An initial evaluation of the effect of selective eradication of microvasculature on tumor development was undertaken here using the chemically-induced hamster cheek pouch model and a 585 nm pulsed dye laser. In a first group of 6 hamsters, progression of premalignant mucosal lesions was compared between control and laser treatment groups, and laser-induced regression of established tumors was evaluated. In a second group of 12 hamsters, the number of laser treatments required to produce complete regression of tumors of the buccal mucosa was determined. The effect of the laser on tumors appearing on the skin in these animals was also investigated. These experiments showed that laser treatment inhibited tumor development and caused complete regression of established tumors 10 mm3 or smaller. Photothermal microvascular targeting may be useful in treating dyplasia and early tumors of the upper aerodigestive tract and skin, with fewer adverse sequelae than existing modalities.
Using the jackknife for estimation in log link Bernoulli regression models.
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Arriaga, Alex; Sinha, Debajyoti; Gawande, Atul A
2015-02-10
Bernoulli (or binomial) regression using a generalized linear model with a log link function, where the exponentiated regression parameters have interpretation as relative risks, is often more appropriate than logistic regression for prospective studies with common outcomes. In particular, many researchers regard relative risks to be more intuitively interpretable than odds ratios. However, for the log link, when the outcome is very prevalent, the likelihood may not have a unique maximum. To circumvent this problem, a 'COPY method' has been proposed, which is equivalent to creating for each subject an additional observation with the same covariates except the response variable has the outcome values interchanged (1's changed to 0's and 0's changed to 1's). The original response is given weight close to 1, while the new observation is given a positive weight close to 0; this approach always leads to convergence of the maximum likelihood algorithm, except for problems with convergence due to multicollinearity among covariates. Even though this method produces a unique maximum, when the outcome is very prevalent, and/or the sample size is relatively small, the COPY method can yield biased estimates. Here, we propose using the jackknife as a bias-reduction approach for the COPY method. The proposed method is motivated by a study of patients undergoing colorectal cancer surgery.
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research. PMID:23156922
Creating a non-linear total sediment load formula using polynomial best subset regression model
NASA Astrophysics Data System (ADS)
Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali
2016-08-01
The aim of this study is to derive a new total sediment load formula which is more accurate and which has less application constraints than the well-known formulae of the literature. 5 most known stream power concept sediment formulae which are approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach. The new approach is called Polynomial Best subset regression (PBSR) analysis. The aim of the PBRS analysis is fitting and testing all possible combinations of the input variables and selecting the best subset. Whole the input variables with their second and third powers are included in the regression to test the possible relation between the explanatory variables and the dependent variable. While selecting the best subset a multistep approach is used that depends on significance values and also the multicollinearity degrees of inputs. The new formula is compared to others in a holdout dataset and detailed performance investigations are conducted for field and lab datasets within this holdout data. Different goodness of fit statistics are used as they represent different perspectives of the model accuracy. After the detailed comparisons are carried out we figured out the most accurate equation that is also applicable on both flume and river data. Especially, on field dataset the prediction performance of the proposed formula outperformed the benchmark formulations.
Comparing tests appear in model-check for normal regression with spatially correlated observations
NASA Astrophysics Data System (ADS)
Somayasa, Wayan; Wibawa, Gusti A.
2016-06-01
The problem of investigating the appropriateness of an assumed model in regression analysis was traditionally handled by means of F test under independent observations. In this work we propose a more modern method based on the so-called set-indexed partial sums processes of the least squares residuals of the observations. We consider throughout this work univariate and multivariate regression models with spatially correlated observations, which are frequently encountered in the statistical modelling in geosciences as well as in mining. The decision is drawn by performing asymptotic test of statistical hypothesis based on the Kolmogorov-Smirnov and Cramér-von Misses functionals of the processes. We compare the two tests by investigating the power functions of the test. The finite sample size behavior of the tests are studied by simulating the empirical probability of rejections of H 0. It is shown that for univariate model the KS test seems to be more powerful. Conversely the Cramér-von Mises test tends to be more powerful than the KS test in the multivariate case.
NASA Astrophysics Data System (ADS)
Deglint, Jason; Kazemzadeh, Farnoud; Wong, Alexander; Clausi, David A.
2015-09-01
One method to acquire multispectral images is to sequentially capture a series of images where each image contains information from a different bandwidth of light. Another method is to use a series of beamsplitters and dichroic filters to guide different bandwidths of light onto different cameras. However, these methods are very time consuming and expensive and perform poorly in dynamic scenes or when observing transient phenomena. An alternative strategy to capturing multispectral data is to infer this data using sparse spectral reflectance measurements captured using an imaging device with overlapping bandpass filters, such as a consumer digital camera using a Bayer filter pattern. Currently the only method of inferring dense reflectance spectra is the Wiener adaptive filter, which makes Gaussian assumptions about the data. However, these assumptions may not always hold true for all data. We propose a new technique to infer dense reflectance spectra from sparse spectral measurements through the use of a non-linear regression model. The non-linear regression model used in this technique is the random forest model, which is an ensemble of decision trees and trained via the spectral characterization of the optical imaging system and spectral data pair generation. This model is then evaluated by spectrally characterizing different patches on the Macbeth color chart, as well as by reconstructing inferred multispectral images. Results show that the proposed technique can produce inferred dense reflectance spectra that correlate well with the true dense reflectance spectra, which illustrates the merits of the technique.
NASA Astrophysics Data System (ADS)
Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.
2016-02-01
Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.
NASA Astrophysics Data System (ADS)
Jokar Arsanjani, Jamal; Helbich, Marco; Kainz, Wolfgang; Darvishi Boloorani, Ali
2013-04-01
This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-economic variables dealing with urban sprawl were operationalised to create a probability surface of spatiotemporal states of built-up land use for the years 2006, 2016, and 2026. For validation, the model was evaluated by means of relative operating characteristic values for different sets of variables. The approach was calibrated for 2006 by cross comparing of actual and simulated land use maps. The achieved outcomes represent a match of 89% between simulated and actual maps of 2006, which was satisfactory to approve the calibration process. Thereafter, the calibrated hybrid approach was implemented for forthcoming years. Finally, future land use maps for 2016 and 2026 were predicted by means of this hybrid approach. The simulated maps illustrate a new wave of suburban development in the vicinity of Tehran at the western border of the metropolis during the next decades.
NASA Astrophysics Data System (ADS)
Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.
2016-02-01
The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can actually influence in the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01.The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB) resulting to a total of 96 observations for each variable. The data have undergone a logarithmic transformation particularly the Dependent Variable (y) to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable namely: Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), hence, can actually predict Real GDP (y). The regression analysis displays that 98.7% (coefficient of determination) of the Independent Variables can actually predict the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
Martínez-Flórez, Guillermo; Bolfarine, Heleno; Gómez, Héctor W
2013-03-01
We develop regression models for limited and censored data based on the mixture between the log-power-normal and Bernoulli-type distributions. A likelihood-based approach is implemented for parameter estimation and a small-scale simulation study is conducted to evaluate parameter recovery, with emphasis on bias estimation. The main conclusion is that the approach is very much satisfactory for moderate and large sample sizes. A real data example, the safety and immunogenecity study of measles vaccine in Haiti, is presented to illustrate how different models can be used to fit this type of data. As shown, the asymmetric models considered seem to present the best fit for the data set under study, revealing significance of the explanatory variable sex, which is not found significant with the log-normal model.
Chuang, Chun-Ling; Chang, Peng-Chan; Lin, Rong-Ho
2011-10-01
As changes in the medical environment and policies on national health insurance coverage have triggered tremendous impacts on the business performance and financial management of medical institutions, effective management becomes increasingly crucial for hospitals to enhance competitiveness and to strive for sustainable development. The study accordingly aims at evaluating hospital operational efficiency for better resource allocation and cost effectiveness. Several data envelopment analysis (DEA)-based models were first compared, and the DEA-artificial neural network (ANN) model was identified as more capable than the DEA and DEA-assurance region (AR) models of measuring operational efficiency and recognizing the best-performing hospital. The classification and regression tree (CART) efficiency model was then utilized to extract rules for improving resource allocation of medical institutions. PMID:20878210
Modeling and Simulation of Road Traffic Noise Using Artificial Neural Network and Regression.
Honarmand, M; Mousavi, S M
2014-04-01
Modeling and simulation of noise pollution has been done in a large city, where the population is over 2 millions. Two models of artificial neural network and regression were developed to predict in-city road traffic noise pollution with using the data of noise measurements and vehicle counts at three points of the city for a period of 12 hours. The MATLAB and DATAFIT softwares were used for simulation. The predicted results of noise level were compared with the measured noise levels in three stations. The values of normalized bias, sum of squared errors, mean of squared errors, root mean of squared errors, and squared correlation coefficient calculated for each model show the results of two models are suitable, and the predictions of artificial neural network are closer to the experimental data.
Stone, Wesley W.; Gilliom, Robert J.
2012-01-01
Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, are improved for application to the United States (U.S.) Corn Belt region by developing region-specific models that include watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. The WARP-CB models accounted for 53 to 62% of the variability in the various concentration statistics among the model-development sites. Model predictions were within a factor of 5 of the observed concentration statistic for over 90% of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. Although atrazine-use intensity is the most important explanatory variable in the National WARP models, it is not a significant variable in the WARP-CB models. The WARP-CB models provide improved predictions for Corn Belt streams draining watersheds with atrazine-use intensities of 17 kg/km2 of watershed area or greater.
Estimating riparian understory vegetation cover with beta regression and copula models
Eskelson, Bianca N.I.; Madsen, Lisa; Hagar, Joan C.; Temesgen, Hailemariam
2011-01-01
Understory vegetation communities are critical components of forest ecosystems. As a result, the importance of modeling understory vegetation characteristics in forested landscapes has become more apparent. Abundance measures such as shrub cover are bounded between 0 and 1, exhibit heteroscedastic error variance, and are often subject to spatial dependence. These distributional features tend to be ignored when shrub cover data are analyzed. The beta distribution has been used successfully to describe the frequency distribution of vegetation cover. Beta regression models ignoring spatial dependence (BR) and accounting for spatial dependence (BRdep) were used to estimate percent shrub cover as a function of topographic conditions and overstory vegetation structure in riparian zones in western Oregon. The BR models showed poor explanatory power (pseudo-R2 ≤ 0.34) but outperformed ordinary least-squares (OLS) and generalized least-squares (GLS) regression models with logit-transformed response in terms of mean square prediction error and absolute bias. We introduce a copula (COP) model that is based on the beta distribution and accounts for spatial dependence. A simulation study was designed to illustrate the effects of incorrectly assuming normality, equal variance, and spatial independence. It showed that BR, BRdep, and COP models provide unbiased parameter estimates, whereas OLS and GLS models result in slightly biased estimates for two of the three parameters. On the basis of the simulation study, 93–97% of the GLS, BRdep, and COP confidence intervals covered the true parameters, whereas OLS and BR only resulted in 84–88% coverage, which demonstrated the superiority of GLS, BRdep, and COP over OLS and BR models in providing standard errors for the parameter estimates in the presence of spatial dependence.
Application of regression and neural models to predict competitive swimming performance.
Maszczyk, Adam; Roczniok, Robert; Waśkiewicz, Zbigniew; Czuba, Miłosz; Mikołajec, Kazimierz; Zajac, Adam; Stanula, Arkadiusz
2012-04-01
This research problem was indirectly but closely connected with the optimization of an athlete-selection process, based on predictions viewed as determinants of future successes. The research project involved a group of 249 competitive swimmers (age 12 yr., SD = 0.5) who trained and competed for four years. Measures involving fitness (e.g., lung capacity), strength (e.g., standing long jump), swimming technique (turn, glide, distance per stroke cycle), anthropometric variables (e.g., hand and foot size), as well as specific swimming measures (speeds in particular distances), were used. The participants (n = 189) trained from May 2008 to May 2009, which involved five days of swimming workouts per week, and three additional 45-min. sessions devoted to measurements necessary for this study. In June 2009, data from two groups of 30 swimmers each (n = 60) were used to identify predictor variables. Models were then constructed from these variables to predict final swimming performance in the 50 meter and 800 meter crawl events. Nonlinear regression models and neural models were built for the dependent variable of sport results (performance at 50m and 800m). In May 2010, the swimmers' actual race times for these events were compared to the predictions created a year prior to the beginning of the experiment. Results for the nonlinear regression models and perceptron networks structured as 8-4-1 and 4-3-1 indicated that the neural models overall more accurately predicted final swimming performance from initial training, strength, fitness, and body measurements. Differences in the sum of absolute error values were 4:11.96 (n = 30 for 800m) and 20.39 (n = 30 for 50m), for models structured as 8-4-1 and 4-3-1, respectively, with the neural models being more accurate. It seems possible that such models can be used to predict future performance, as well as in the process of recruiting athletes for specific styles and distances in swimming. PMID:22755464
Modeling animal-vehicle collisions using diagonal inflated bivariate Poisson regression.
Lao, Yunteng; Wu, Yao-Jan; Corey, Jonathan; Wang, Yinhai
2011-01-01
Two types of animal-vehicle collision (AVC) data are commonly adopted for AVC-related risk analysis research: reported AVC data and carcass removal data. One issue with these two data sets is that they were found to have significant discrepancies by previous studies. In order to model these two types of data together and provide a better understanding of highway AVCs, this study adopts a diagonal inflated bivariate Poisson regression method, an inflated version of bivariate Poisson regression model, to fit the reported AVC and carcass removal data sets collected in Washington State during 2002-2006. The diagonal inflated bivariate Poisson model not only can model paired data with correlation, but also handle under- or over-dispersed data sets as well. Compared with three other types of models, double Poisson, bivariate Poisson, and zero-inflated double Poisson, the diagonal inflated bivariate Poisson model demonstrates its capability of fitting two data sets with remarkable overlapping portions resulting from the same stochastic process. Therefore, the diagonal inflated bivariate Poisson model provides researchers a new approach to investigating AVCs from a different perspective involving the three distribution parameters (λ(1), λ(2) and λ(3)). The modeling results show the impacts of traffic elements, geometric design and geographic characteristics on the occurrences of both reported AVC and carcass removal data. It is found that the increase of some associated factors, such as speed limit, annual average daily traffic, and shoulder width, will increase the numbers of reported AVCs and carcass removals. Conversely, the presence of some geometric factors, such as rolling and mountainous terrain, will decrease the number of reported AVCs.
Model-Based Evaluation of Spontaneous Tumor Regression in Pilocytic Astrocytoma.
Buder, Thomas; Deutsch, Andreas; Klink, Barbara; Voss-Böhme, Anja
2015-12-01
Pilocytic astrocytoma (PA) is the most common brain tumor in children. This tumor is usually benign and has a good prognosis. Total resection is the treatment of choice and will cure the majority of patients. However, often only partial resection is possible due to the location of the tumor. In that case, spontaneous regression, regrowth, or progression to a more aggressive form have been observed. The dependency between the residual tumor size and spontaneous regression is not understood yet. Therefore, the prognosis is largely unpredictable and there is controversy regarding the management of patients for whom complete resection cannot be achieved. Strategies span from pure observation (wait and see) to combinations of surgery, adjuvant chemotherapy, and radiotherapy. Here, we introduce a mathematical model to investigate the growth and progression behavior of PA. In particular, we propose a Markov chain model incorporating cell proliferation and death as well as mutations. Our model analysis shows that the tumor behavior after partial resection is essentially determined by a risk coefficient γ, which can be deduced from epidemiological data about PA. Our results quantitatively predict the regression probability of a partially resected benign PA given the residual tumor size and lead to the hypothesis that this dependency is linear, implying that removing any amount of tumor mass will improve prognosis. This finding stands in contrast to diffuse malignant glioma where an extent of resection threshold has been experimentally shown, below which no benefit for survival is expected. These results have important implications for future therapeutic studies in PA that should include residual tumor volume as a prognostic factor. PMID:26658166
Turkson, Anthony Joe; Otchey, James Eric
2015-01-01
Introduction: Various psychosocial studies on health related lifestyles lay emphasis on the fact that the perception one has of himself as being at risk of HIV/AIDS infection was a necessary condition for preventive behaviors to be adopted. Hierarchical Multiple Regression models was used to examine the relationship between eight independent variables and one dependent variable to isolate predictors which have significant influence on behavior and sexual practices. Methods: A Cross-sectional design was used for the study. Structured close-ended interviewer-administered questionnaire was used to collect primary data. Multistage stratified technique was used to sample views from 380 students from Takoradi Polytechnic, Ghana. A Hierarchical multiple regression model was used to ascertain the significance of certain predictors of sexual behavior and practices. Results: The variables that were extracted from the multiple regression were; for the constant; β=14.202, t=2.279, p=0.023, variable is significant; for the marital status; β=0.092, t=1.996, p<0.05, variable is significant; for the knowledge on AIDs; β= 0.090, t=1.996, p<0.05, variable is significant; for the attitude towards HIV/AIDs; β=0.486, t=10.575, p<0.001, variable is highly significant. Thus, the best fitting model for predicting behavior and sexual practices was a linear combination of the constant, one’s marital status, knowledge on HIV/AIDs and Attitude towards HIV/AIDs., Y (Behavior and sexual practices) = β0 + β1 (Marital status) + β2 (Knowledge on HIV AIDs issues) + β3 (Attitude towards HIV AIDs issues) β0, β1, β2 and β3 are respectively 14.201, 2.038, 0.148 and 0.486; the higher the better. Conclusions: Attitude and behavior change education on HIV/AIDs should be intensified in the institution so that students could adopt better lifestyles. PMID:25946917
Fulton, Barry A; Meyer, Joseph S
2014-08-01
The water effect ratio (WER) procedure developed by the US Environmental Protection Agency is commonly used to derive site-specific criteria for point-source metal discharges into perennial waters. However, experience is limited with this method in the ephemeral and intermittent systems typical of arid climates. The present study presents a regression model to develop WER-based site-specific criteria for a network of ephemeral and intermittent streams influenced by nonpoint sources of Cu in the southwestern United States. Acute (48-h) Cu toxicity tests were performed concurrently with Daphnia magna in site water samples and hardness-matched laboratory waters. Median effect concentrations (EC50s) for Cu in site water samples (n=17) varied by more than 12-fold, and the range of calculated WER values was similar. Statistically significant (α=0.05) univariate predictors of site-specific Cu toxicity included (in sequence of decreasing significance) dissolved organic carbon (DOC), hardness/alkalinity ratio, alkalinity, K, and total dissolved solids. A multiple-regression model developed from a combination of DOC and alkalinity explained 85% of the toxicity variability in site water samples, providing a strong predictive tool that can be used in the WER framework when site-specific criteria values are derived. The biotic ligand model (BLM) underpredicted toxicity in site waters by more than 2-fold. Adjustments to the default BLM parameters improved the model's performance but did not provide a better predictive tool compared with the regression model developed from DOC and alkalinity.
Wang, Bing; Shen, Hao; Fang, Aiqin; Huang, De-Shuang; Jiang, Changjun; Zhang, Jun; Chen, Peng
2016-06-17
Comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC/TOF-MS) system has become a key analytical technology in high-throughput analysis. Retention index has been approved to be helpful for compound identification in one-dimensional gas chromatography, which is also true for two-dimensional gas chromatography. In this work, a novel regression model was proposed for calculating the second dimension retention index of target components where n-alkanes were used as reference compounds. This model was developed to depict the relationship among adjusted second dimension retention time, temperature of the second dimension column and carbon number of n-alkanes by an exponential nonlinear function with only five parameters. Three different criteria were introduced to find the optimal values of parameters. The performance of this model was evaluated using experimental data of n-alkanes (C7-C31) at 24 temperatures which can cover all 0-6s adjusted retention time area. The experimental results show that the mean relative error between predicted adjusted retention time and experimental data of n-alkanes was only 2%. Furthermore, our proposed model demonstrates a good extrapolation capability for predicting adjusted retention time of target compounds which located out of the range of the reference compounds in the second dimension adjusted retention time space. Our work shows the deviation was less than 9 retention index units (iu) while the number of alkanes were added up to 5. The performance of our proposed model has also been demonstrated by analyzing a mixture of compounds in temperature programmed experiments. PMID:27208985
Regression Models for Aquifer Vulnerability to Nitrate Pollution in Osona (NE Spain)
NASA Astrophysics Data System (ADS)
Boy Roura, M.; Nolan, B. T.; Menció Domingo, A.; Mas-Pla, J.
2012-12-01
Regression models were developed at a local scale in the Osona region (1,260 square kilometers) to predict nitrate concentrations in groundwater. Osona is a semi-arid region in northeast Spain, where livestock and agricultural activities are very intensive, and therefore, it is vulnerable to nitrate pollution from agricultural sources (European Nitrate Directive (91/676/EEC)). Nitrate concentrations in groundwater are commonly above 50 mg/L as nitrate, reaching up to 500 mg/L in some of the sampled wells. Regression models were based on explanatory variables such as geology, land use, and nitrogen inputs, which control the fate, transport and attenuation of nitrate in groundwater. Regression has been widely used to determine aquifer vulnerability to nitrate in groundwater at large spatial scales. We developed models with and without site-specific groundwater chemistry data to see the extent to which the latter improved the models. Although chemistry data could explain additional variation in groundwater nitrate concentration, such data were available only at the well locations and therefore were less amenable for spatial extrapolation. The data set consisted of nitrate data from 63 sampled wells and the following explanatory variables: 1) soils data consisting of texture and other physical properties; 2) geology indicating presence or absence of aquifers in the region, and their type (unconfined, leaky or confined); 3) land use (agricultural, urban, forested); 4) nitrogen input as manure; 5) occurrence of irrigated crops; 6) estimates of nitrogen uptake developed for 10 different crops; 7) slope; 8) population density, and 9) groundwater chemistry data comprising major ions and trace elements. Variables 1 and 2 were compiled as point data because their polygons were much larger than the well buffers which represented contributing areas to the sampled wells. Variables 3 to 8 were compiled within a 500-meter radius buffer around wells using a GIS-based weighted
Optimal use of regression models in genome-wide association studies.
Powell, J E; Kranis, A; Floyd, J; Dekkers, J C M; Knott, S; Haley, C S
2012-04-01
The performance of linear regression models in genome-wide association studies is influenced by how marker information is parameterized in the model. Considering the impact of parameterization is especially important when using information from multiple markers to test for association. Properties of the population, such as linkage disequilibrium (LD) and allele frequencies, will also affect the ability of a model to provide statistical support for an underlying quantitative trait locus (QTL). Thus, for a given location in the genome, the relationship between population properties and model parameterization is expected to influence the performance of the model in providing evidence for the position of a QTL. As LD and allele frequencies vary throughout the genome and between populations, understanding the relationship between these properties and model parameterization is of considerable importance in order to make optimal use of available genomic data. Here, we evaluate the performance of regression-based association models using genotype and haplotype information across the full spectrum of allele frequency and LD scenarios. Genetic marker data from 200 broiler chickens were used to simulate genomic conditions by selecting individual markers to act as surrogate QTL (sQTL) and then investigating the ability of surrounding markers to estimate sQTL genotypes and provide statistical support for their location. The LD and allele frequencies of markers and sQTL are shown to have a strong effect on the performance of models relative to one another. Our results provide an indication of the best choice of model parameterization given certain scenarios of marker and QTL LD and allele frequencies. We demonstrate a clear advantage of haplotype-based models, which account for phase uncertainty over other models tested, particularly for QTL with low minor allele frequencies. We show that the greatest advantage of haplotype models over single-marker models occurs when LD between
Brakebill, J.W.; Preston, S.D.
2003-01-01
The U.S. Geological Survey has developed a methodology for statistically relating nutrient sources and land-surface characteristics to nutrient loads of streams. The methodology is referred to as SPAtially Referenced Regressions On Watershed attributes (SPARROW), and relates measured stream nutrient loads to nutrient sources using nonlinear statistical regression models. A spatially detailed digital hydrologic network of stream reaches, stream-reach characteristics such as mean streamflow, water velocity, reach length, and travel time, and their associated watersheds supports the regression models. This network serves as the primary framework for spatially referencing potential nutrient source information such as atmospheric deposition, septic systems, point-sources, land use, land cover, and agricultural sources and land-surface characteristics such as land use, land cover, average-annual precipitation and temperature, slope, and soil permeability. In the Chesapeake Bay watershed that covers parts of Delaware, Maryland, Pennsylvania, New York, Virginia, West Virginia, and Washington D.C., SPARROW was used to generate models estimating loads of total nitrogen and total phosphorus representing 1987 and 1992 land-surface conditions. The 1987 models used a hydrologic network derived from an enhanced version of the U.S. Environmental Protection Agency's digital River Reach File, and course resolution Digital Elevation Models (DEMs). A new hydrologic network was created to support the 1992 models by generating stream reaches representing surface-water pathways defined by flow direction and flow accumulation algorithms from higher resolution DEMs. On a reach-by-reach basis, stream reach characteristics essential to the modeling were transferred to the newly generated pathways or reaches from the enhanced River Reach File used to support the 1987 models. To complete the new network, watersheds for each reach were generated using the direction of surface-water flow derived
Non-linear regression model for spatial variation in precipitation chemistry for South India
NASA Astrophysics Data System (ADS)
Siva Soumya, B.; Sekhar, M.; Riotte, J.; Braun, Jean-Jacques
Chemical composition of rainwater changes from sea to inland under the influence of several major factors - topographic location of area, its distance from sea, annual rainfall. A model is developed here to quantify the variation in precipitation chemistry under the influence of inland distance and rainfall amount. Various sites in India categorized as 'urban', 'suburban' and 'rural' have been considered for model development. pH, HCO 3, NO 3 and Mg do not change much from coast to inland while, SO 4 and Ca change is subjected to local emissions. Cl and Na originate solely from sea salinity and are the chemistry parameters in the model. Non-linear multiple regressions performed for the various categories revealed that both rainfall amount and precipitation chemistry obeyed a power law reduction with distance from sea. Cl and Na decrease rapidly for the first 100 km distance from sea, then decrease marginally for the next 100 km, and later stabilize. Regression parameters estimated for different cases were found to be consistent ( R2 ˜ 0.8). Variation in one of the parameters accounted for urbanization. Model was validated using data points from the southern peninsular region of the country. Estimates are found to be within 99.9% confidence interval. Finally, this relationship between the three parameters - rainfall amount, coastline distance, and concentration (in terms of Cl and Na) was validated with experiments conducted in a small experimental watershed in the south-west India. Chemistry estimated using the model was in good correlation with observed values with a relative error of ˜5%. Monthly variation in the chemistry is predicted from a downscaling model and then compared with the observed data. Hence, the model developed for rain chemistry is useful in estimating the concentrations at different spatio-temporal scales and is especially applicable for south-west region of India.
NASA Astrophysics Data System (ADS)
Melo, Raquel; Vieira, Gonçalo; Caselli, Alberto; Ramos, Miguel
2010-05-01
Field surveying during the austral summer of 2007/08 and the analysis of a QuickBird satellite image, resulted on the production of a detailed geomorphological map of the Irizar and Crater Lake area in Deception Island (South Shetlands, Maritime Antarctic - 1:10 000) and allowed its analysis and spatial modelling of the geomorphological phenomena. The present study focus on the analysis of the spatial distribution and characteristics of hummocky terrains, lag surfaces and nivation hollows, complemented by GIS spatial modelling intending to identify relevant controlling geographical factors. Models of the susceptibility of occurrence of these phenomena were created using two statistical methods: logistical regression, as a multivariate method; and the informative value as a bivariate method. Success and prediction rate curves were used for model validation. The Area Under the Curve (AUC) was used to quantify the level of performance and prediction of the models and to allow the comparison between the two methods. Regarding the logistic regression method, the AUC showed a success rate of 71% for the lag surfaces, 81% for the hummocky terrains and 78% for the nivation hollows. The prediction rate was 72%, 68% and 71%, respectively. Concerning the informative value method, the success rate was 69% for the lag surfaces, 84% for the hummocky terrains and 78% for the nivation hollows, and with a correspondingly prediction of 71%, 66% and 69%. The results were of very good quality and demonstrate the potential of the models to predict the influence of independent variables in the occurrence of the geomorphological phenomena and also the reliability of the data. Key-words: present-day geomorphological dynamics, detailed geomorphological mapping, GIS, spatial modelling, Deception Island, Antarctic.
NASA Astrophysics Data System (ADS)
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.
Hema, M; Srinivasan, K
2011-07-01
Nickel removal efficiency of powered activated carbons of coconut oilcake, neem oilcake and commercial carbon was investigated by using artificial neural network. The effective parameters for the removal of nickel (%R) by adsorption process, which included the pH, contact time (T), distinctiveness of activated carbon (Cn), amount of activated carbon (Cw) and initial concentration of nickel (Co) were investigated. Levenberg-Marquardt (LM) Back-propagation algorithm is used to train the network. The network topology was optimized by varying number of hidden layer and number of neurons in hidden layer. The model was developed in terms of training; validation and testing of experimental data, the test subsets that each of them contains 60%, 20% and 20% of total experimental data, respectively. Multiple regression equation was developed for nickel adsorption system and the output was compared with both simulated and experimental outputs. Standard deviation (SD) with respect to experimental output was quite higher in the case of regression model when compared with ANN model. The obtained experimental data best fitted with the artificial neural network. PMID:23029923
Ayuso, Mercedes; Bermúdez, Lluís; Santolino, Miguel
2016-04-01
The analysis of factors influencing the severity of the personal injuries suffered by victims of motor accidents is an issue of major interest. Yet, most of the extant literature has tended to address this question by focusing on either the severity of temporary disability or the severity of permanent injury. In this paper, a bivariate copula-based regression model for temporary disability and permanent injury severities is introduced for the joint analysis of the relationship with the set of factors that might influence both categories of injury. Using a motor insurance database with 21,361 observations, the copula-based regression model is shown to give a better performance than that of a model based on the assumption of independence. The inclusion of the dependence structure in the analysis has a higher impact on the variance estimates of the injury severities than it does on the point estimates. By taking into account the dependence between temporary and permanent severities a more extensive factor analysis can be conducted. We illustrate that the conditional distribution functions of injury severities may be estimated, thus, providing decision makers with valuable information.
A New Climate Adjustment Tool: An update to EPA’s Storm Water Management Model
The US EPA’s newest tool, the Stormwater Management Model (SWMM) – Climate Adjustment Tool (CAT) is meant to help municipal stormwater utilities better address potential climate change impacts affecting their operations.
Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Kim, Moon S; Chao, Kuanglin; Qin, Jianwei; Fu, Xiaping; Baek, Insuck; Cho, Byoung-Kwan
2016-05-01
Illegal use of nitrogen-rich melamine (C3H6N6) to boost perceived protein content of food products such as milk, infant formula, frozen yogurt, pet food, biscuits, and coffee drinks has caused serious food safety problems. Conventional methods to detect melamine in foods, such as Enzyme-linked immunosorbent assay (ELISA), High-performance liquid chromatography (HPLC), and Gas chromatography-mass spectrometry (GC-MS), are sensitive but they are time-consuming, expensive, and labor-intensive. In this research, near-infrared (NIR) hyperspectral imaging technique combined with regression coefficient of partial least squares regression (PLSR) model was used to detect melamine particles in milk powders easily and quickly. NIR hyperspectral reflectance imaging data in the spectral range of 990-1700nm were acquired from melamine-milk powder mixture samples prepared at various concentrations ranging from 0.02% to 1%. PLSR models were developed to correlate the spectral data (independent variables) with melamine concentration (dependent variables) in melamine-milk powder mixture samples. PLSR models applying various pretreatment methods were used to reconstruct the two-dimensional PLS images. PLS images were converted to the binary images to detect the suspected melamine pixels in milk powder. As the melamine concentration was increased, the numbers of suspected melamine pixels of binary images were also increased. These results suggested that NIR hyperspectral imaging technique and the PLSR model can be regarded as an effective tool to detect melamine particles in milk powders. PMID:26946026
Yang, Aileen; Hoek, Gerard; Montagne, Denise; Leseman, Daan L A C; Hellack, Bryan; Kuhlbusch, Thomas A J; Cassee, Flemming R; Brunekreef, Bert; Janssen, Nicole A H
2015-07-01
Oxidative potential (OP) of ambient particulate matter (PM) has been suggested as a health-relevant exposure metric. In order to use OP for exposure assessment, information is needed about how well central site OP measurements and modeled average OP at the home address reflect temporal and spatial variation of personal OP. We collected 96-hour personal, home outdoor and indoor PM2.5 samples from 15 volunteers living either at traffic, urban or regional background locations in Utrecht, the Netherlands. OP was also measured at one central reference site to account for temporal variations. OP was assessed using electron spin resonance (OP(ESR)) and dithiothreitol (OP(DTT)). Spatial variation of average OP at the home address was modeled using land use regression (LUR) models. For both OP(ESR) and OP(DTT), temporal correlations of central site measurements with home outdoor measurements were high (R>0.75), and moderate to high (R=0.49-0.70) with personal measurements. The LUR model predictions for OP correlated significantly with the home outdoor concentrations for OP(DTT) and OP(ESR) (R=0.65 and 0.62, respectively). LUR model predictions were moderately correlated with personal OP(DTT) measurements (R=0.50). Adjustment for indoor sources, such as vacuum cleaning and absence of fume-hood, improved the temporal and spatial agreement with measured personal exposure for OP(ESR). OP(DTT) was not associated with any indoor sources. Our study results support the use of central site OP for exposure assessment of epidemiological studies focusing on short-term health effects. PMID:25942578
Polynomial order selection in random regression models via penalizing adaptively the likelihood.
Corrales, J D; Munilla, S; Cantet, R J C
2015-08-01
Orthogonal Legendre polynomials (LP) are used to model the shape of additive genetic and permanent environmental effects in random regression models (RRM). Frequently, the Akaike (AIC) and the Bayesian (BIC) information criteria are employed to select LP order. However, it has been theoretically shown that neither AIC nor BIC is simultaneously optimal in terms of consistency and efficiency. Thus, the goal was to introduce a method, 'penalizing adaptively the likelihood' (PAL), as a criterion to select LP order in RRM. Four simulated data sets and real data (60,513 records, 6675 Colombian Holstein cows) were employed. Nested models were fitted to the data, and AIC, BIC and PAL were calculated for all of them. Results showed that PAL and BIC identified with probability of one the true LP order for the additive genetic and permanent environmental effects, but AIC tended to favour over parameterized models. Conversely, when the true model was unknown, PAL selected the best model with higher probability than AIC. In the latter case, BIC never favoured the best model. To summarize, PAL selected a correct model order regardless of whether the 'true' model was within the set of candidates.
Daniels, Bryan C.; Nemenman, Ilya
2015-01-01
The nonlinearity of dynamics in systems biology makes it hard to infer them from experimental data. Simple linear models are computationally efficient, but cannot incorporate these important nonlinearities. An adaptive method based on the S-system formalism, which is a sensible representation of nonlinear mass-action kinetics typically found in cellular dynamics, maintains the efficiency of linear regression. We combine this approach with adaptive model selection to obtain efficient and parsimonious representations of cellular dynamics. The approach is tested by inferring the dynamics of yeast glycolysis from simulated data. With little computing time, it produces dynamical models with high predictive power and with structural complexity adapted to the difficulty of the inference problem. PMID:25806510
PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes
Liverani, Silvia; Hastie, David I.; Azizi, Lamiae; Papathomas, Michail; Richardson, Sylvia
2016-01-01
PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and continuous response, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection. PMID:27307779
Modeling and Simulation of Road Traffic Noise Using Artificial Neural Network and Regression.
Honarmand, M; Mousavi, S M
2014-01-01
The effect of traffic composition on the noise pollution has been investigated in a large city, where the population is over 2 millions. Noise measurements and vehicle counts were performed at three points of the city for a period of 12 hours. Two models of artificial neural network and regression were applied to predict in-city road traffic noise pollution. The MATLAB and DATAFIT softwares were used for simulation. The predicted results of noise level were compared with the measured noise levels in three stations. The values of normalized bias, standard squared error, mean-squared error, root-mean-squared error, and squared correlation coefficient calculated for each model showed that the results of two models are suitable, and the predictions of artificial neural network are closer to experimental data.
Using regression heteroscedasticity to model trends in the mean and variance of floods
NASA Astrophysics Data System (ADS)
Hecht, Jory; Vogel, Richard
2015-04-01
Changes in the frequency of extreme floods have been observed and anticipated in many hydrological settings in response to numerous drivers of environmental change, including climate, land cover, and infrastructure. To help decision-makers design flood control infrastructure in settings with non-stationary hydrological regimes, a parsimonious approach for detecting and modeling trends in extreme floods is needed. An approach using ordinary least squares (OLS) to fit a heteroscedastic regression model can accommodate nonstationarity in both the mean and variance of flood series while simultaneously offering a means of (i) analytically evaluating type I and type II trend detection errors, (ii) analytically generating expressions of uncertainty, such as confidence and prediction intervals, (iii) providing updated estimates of the frequency of floods exceeding the flood of record, (iv) accommodating a wide range of non-linear functions through ladder of powers transformations, and (v) communicating hydrological changes in a single graphical image. Previous research has shown that the two-parameter lognormal distribution can adequately model the annual maximum flood distribution of both stationary and non-stationary hydrological regimes in many regions of the United States. A simple logarithmic transformation of annual maximum flood series enables an OLS heteroscedastic regression modeling approach to be especially suitable for creating a non-stationary flood frequency distribution with parameters that are conditional upon time or physically meaningful covariates. While heteroscedasticity is often viewed as an impediment, we document how detecting and modeling heteroscedasticity presents an opportunity for characterizing both the conditional mean and variance of annual maximum floods. We introduce an approach through which variance trend models can be analytically derived from the behavior of residuals of the conditional mean flood model. Through case studies of
Rodríguez-Ramilo, Silvia Teresa; García-Cortés, Luis Alberto; González-Recio, Óscar
2014-01-01
Genome-enhanced genotypic evaluations are becoming popular in several livestock species. For this purpose, the combination of the pedigree-based relationship matrix with a genomic similarities matrix between individuals is a common approach. However, the weight placed on each matrix has been so far established with ad hoc procedures, without formal estimation thereof. In addition, when using marker- and pedigree-based relationship matrices together, the resulting combined relationship matrix needs to be adjusted to the same scale in reference to the base population. This study proposes a semi-parametric Bayesian method for combining marker- and pedigree-based information on genome-enabled predictions. A kernel matrix from a reproducing kernel Hilbert spaces regression model was used to combine genomic and genealogical information in a semi-parametric scenario, avoiding inversion and adjustment complications. In addition, the weights on marker- versus pedigree-based information were inferred from a Bayesian model with Markov chain Monte Carlo. The proposed method was assessed involving a large number of SNPs and a large reference population. Five phenotypes, including production and type traits of dairy cattle were evaluated. The reliability of the genome-based predictions was assessed using the correlation, regression coefficient and mean squared error between the predicted and observed values. The results indicated that when a larger weight was given to the pedigree-based relationship matrix the correlation coefficient was lower than in situations where more weight was given to genomic information. Importantly, the posterior means of the inferred weight were near the maximum of 1. The behavior of the regression coefficient and the mean squared error was similar to the performance of the correlation, that is, more weight to the genomic information provided a regression coefficient closer to one and a smaller mean squared error. Our results also indicated a greater
Burger, Divan Aristo; Schall, Robert
2015-01-01
Trials of the early bactericidal activity (EBA) of tuberculosis (TB) treatments assess the decline, during the first few days to weeks of treatment, in colony forming unit (CFU) count of Mycobacterium tuberculosis in the sputum of patients with smear-microscopy-positive pulmonary TB. Profiles over time of CFU data have conventionally been modeled using linear, bilinear, or bi-exponential regression. We propose a new biphasic nonlinear regression model for CFU data that comprises linear and bilinear regression models as special cases and is more flexible than bi-exponential regression models. A Bayesian nonlinear mixed-effects (NLME) regression model is fitted jointly to the data of all patients from a trial, and statistical inference about the mean EBA of TB treatments is based on the Bayesian NLME regression model. The posterior predictive distribution of relevant slope parameters of the Bayesian NLME regression model provides insight into the nature of the EBA of TB treatments; specifically, the posterior predictive distribution allows one to judge whether treatments are associated with monolinear or bilinear decline of log(CFU) count, and whether CFU count initially decreases fast, followed by a slower rate of decrease, or vice versa. PMID:25322214
Modeling Group Size and Scalar Stress by Logistic Regression from an Archaeological Perspective
Alberti, Gianmarco
2014-01-01
Johnson’s scalar stress theory, describing the mechanics of (and the remedies to) the increase in in-group conflictuality that parallels the increase in groups’ size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout). Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson’s theory and on Dunbar’s findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a) confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b) allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c) allows locating a critical scalar stress threshold at community size 127 (95% CI: 122–132), while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147–170). The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion. PMID:24626241
Modeling group size and scalar stress by logistic regression from an archaeological perspective.
Alberti, Gianmarco
2014-01-01
Johnson's scalar stress theory, describing the mechanics of (and the remedies to) the increase in in-group conflictuality that parallels the increase in groups' size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout). Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson's theory and on Dunbar's findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a) confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b) allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c) allows locating a critical scalar stress threshold at community size 127 (95% CI: 122-132), while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147-170). The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion.
[Application of Land-use Regression Models in Spatial-temporal Differentiation of Air Pollution].
Wu, Jian-sheng; Xie, Wu-dan; Li, Jia-cheng
2016-02-15
With the rapid development of urbanization, industrialization and motorization, air pollution has become one of the most serious environmental problems in our country, which has negative impacts on public health and ecological environment. LUR model is one of the common methods simulating spatial-temporal differentiation of air pollution at city scale. It has broad application in Europe and North America, but not really in China. Based on many studies at home and abroad, this study started with the main steps to develop LUR model, including obtaining the monitoring data, generating variables, developing models, model validation and regression mapping. Then a conclusion was drawn on the progress of LUR models in spatial-temporal differentiation of air pollution. Furthermore, the research focus and orientation in the future were prospected, including highlighting spatial-temporal differentiation, increasing classes of model variables and improving the methods of model development. This paper was aimed to popularize the application of LUR model in China, and provide a methodological basis for human exposure, epidemiologic study and health risk assessment.
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separate from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind- off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Development of land use regression models for particle composition in twenty study areas in Europe.
de Hoogh, Kees; Wang, Meng; Adam, Martin; Badaloni, Chiara; Beelen, Rob; Birk, Matthias; Cesaroni, Giulia; Cirach, Marta; Declercq, Christophe; Dėdelė, Audrius; Dons, Evi; de Nazelle, Audrey; Eeftens, Marloes; Eriksen, Kirsten; Eriksson, Charlotta; Fischer, Paul; Gražulevičienė, Regina; Gryparis, Alexandros; Hoffmann, Barbara; Jerrett, Michael; Katsouyanni, Klea; Iakovides, Minas; Lanki, Timo; Lindley, Sarah; Madsen, Christian; Mölter, Anna; Mosler, Gioia; Nádor, Gizella; Nieuwenhuijsen, Mark; Pershagen, Göran; Peters, Annette; Phuleria, Harisch; Probst-Hensch, Nicole; Raaschou-Nielsen, Ole; Quass, Ulrich; Ranzi, Andrea; Stephanou, Euripides; Sugiri, Dorothea; Schwarze, Per; Tsai, Ming-Yi; Yli-Tuomi, Tarja; Varró, Mihály J; Vienneau, Danielle; Weinmayr, Gudrun; Brunekreef, Bert; Hoek, Gerard
2013-06-01
Land Use Regression (LUR) models have been used to describe and model spatial variability of annual mean concentrations of traffic related pollutants such as nitrogen dioxide (NO2), nitrogen oxides (NOx) and particulate matter (PM). No models have yet been published of elemental composition. As part of the ESCAPE project, we measured the elemental composition in both the PM10 and PM2.5 fraction sizes at 20 sites in each of 20 study areas across Europe. LUR models for eight a priori selected elements (copper (Cu), iron (Fe), potassium (K), nickel (Ni), sulfur (S), silicon (Si), vanadium (V), and zinc (Zn)) were developed. Good models were developed for Cu, Fe, and Zn in both fractions (PM10 and PM2.5) explaining on average between 67 and 79% of the concentration variance (R(2)) with a large variability between areas. Traffic variables were the dominant predictors, reflecting nontailpipe emissions. Models for V and S in the PM10 and PM2.5 fractions and Si, Ni, and K in the PM10 fraction performed moderately with R(2) ranging from 50 to 61%. Si, NI, and K models for PM2.5 performed poorest with R(2) under 50%. The LUR models are used to estimate exposures to elemental composition in the health studies involved in ESCAPE.
[Application of Land-use Regression Models in Spatial-temporal Differentiation of Air Pollution].
Wu, Jian-sheng; Xie, Wu-dan; Li, Jia-cheng
2016-02-15
With the rapid development of urbanization, industrialization and motorization, air pollution has become one of the most serious environmental problems in our country, which has negative impacts on public health and ecological environment. LUR model is one of the common methods simulating spatial-temporal differentiation of air pollution at city scale. It has broad application in Europe and North America, but not really in China. Based on many studies at home and abroad, this study started with the main steps to develop LUR model, including obtaining the monitoring data, generating variables, developing models, model validation and regression mapping. Then a conclusion was drawn on the progress of LUR models in spatial-temporal differentiation of air pollution. Furthermore, the research focus and orientation in the future were prospected, including highlighting spatial-temporal differentiation, increasing classes of model variables and improving the methods of model development. This paper was aimed to popularize the application of LUR model in China, and provide a methodological basis for human exposure, epidemiologic study and health risk assessment. PMID:27363125
Lillehammer, Marie; Odegård, Jørgen; Meuwissen, Theo H E
2009-03-19
The combination of a sire model and a random regression term describing genotype by environment interactions may lead to biased estimates of genetic variance components because of heterogeneous residual variance. In order to test different models, simulated data with genotype by environment interactions, and dairy cattle data assumed to contain such interactions, were analyzed. Two animal models were compared to four sire models. Models differed in their ability to handle heterogeneous variance from different sources. Including an individual effect with a (co)variance matrix restricted to three times the sire (co)variance matrix permitted the modeling of the additive genetic variance not covered by the sire effect. This made the ability of sire models to handle heterogeneous genetic variance approximately equivalent to that of animal models. When residual variance was heterogeneous, a different approach to account for the heterogeneity of variance was needed, for example when using dairy cattle data in order to prevent overestimation of genetic heterogeneity of variance. Including environmental classes can be used to account for heterogeneous residual variance.
A stochastic regression model for general trend analysis of longitudinal continuous data.
Chao, Wei-Hsiung; Chen, Su-Hua
2009-08-01
A predictive continuous time model is developed for continuous panel data to assess the effect of time-varying covariates on the general direction of the movement of a continuous response that fluctuates over time. This is accomplished by reparameterizing the infinitesimal mean of an Ornstein-Uhlenbeck processes in terms of its equilibrium mean and a drift parameter, which assesses the rate that the process reverts to its equilibrium mean. The equilibrium mean is modeled as a linear predictor of covariates. This model can be viewed as a continuous time first-order autoregressive regression model with time-varying lag effects of covariates and the response, which is more appropriate for unequally spaced panel data than its discrete time analog. Both maximum likelihood and quasi-likelihood approaches are considered for estimating the model parameters and their performances are compared through simulation studies. The simpler quasi-likelihood approach is suggested because it yields an estimator that is of high efficiency relative to the maximum likelihood estimator and it yields a variance estimator that is robust to the diffusion assumption of the model. To illustrate the proposed model, an application to diastolic blood pressure data from a follow-up study on cardiovascular diseases is presented. Missing observations are handled naturally with this model.
Modeling of an Adjustable Beam Solid State Light Project
NASA Technical Reports Server (NTRS)
Clark, Toni
2015-01-01
This proposal is for the development of a computational model of a prototype variable beam light source using optical modeling software, Zemax Optics Studio. The variable beam light source would be designed to generate flood, spot, and directional beam patterns, while maintaining the same average power usage. The optical model would demonstrate the possibility of such a light source and its ability to address several issues: commonality of design, human task variability, and light source design process improvements. An adaptive lighting solution that utilizes the same electronics footprint and power constraints while addressing variability of lighting needed for the range of exploration tasks can save costs and allow for the development of common avionics for lighting controls.
NASA Astrophysics Data System (ADS)
Liberman, Neomi; Ben-David Kolikant, Yifat; Beeri, Catriel
2012-09-01
Due to a program reform in Israel, experienced CS high-school teachers faced the need to master and teach a new programming paradigm. This situation served as an opportunity to explore the relationship between teachers' content knowledge (CK) and their pedagogical content knowledge (PCK). This article focuses on three case studies, with emphasis on one of them. Using observations and interviews, we examine how the teachers, we observed taught and what development of their teaching occurred as a result of their teaching experience, if at all. Our findings suggest that this situation creates a new hybrid state of teachers, which we term "regressed experts." These teachers incorporate in their professional practice some elements typical of novices and some typical of experts. We also found that these teachers' experience, although established when teaching a different CK, serve as a leverage to improve their knowledge and understanding of aspects of the new content.
Zero-inflated models for regression analysis of count data: a study of growth and development.
Cheung, Yin Bin
2002-05-30
Poisson regression is widely used in medical studies, and can be extended to negative binomial regression to allow for heterogeneity. When there is an excess number of zero counts, a useful approach is to used a mixture model with a proportion P of subjects not at risk, and a proportion of 1--P at-risk subjects who take on outcome values following a Poisson or negative binomial distribution. Covariate effects can be incorporated into both components of the models. In child assessment, fine motor development is often measured by test items that involve a process of imitation and a process of fine motor exercise. One such developmental milestone is 'building a tower of cubes'. This study analyses the impact of foetal growth and postnatal somatic growth on this milestone, operationalized as the number of cubes and measured around the age of 22 months. It is shown that the two aspects of early growth may have different implications for imitation and fine motor dexterity. The usual approach of recording and analysing the milestone as a binary outcome, such as whether the child can build a tower of three cubes, may leave out important information.
Combining regression analysis and air quality modelling to predict benzene concentration levels
NASA Astrophysics Data System (ADS)
Vlachokostas, Ch.; Achillas, Ch.; Chourdakis, E.; Moussiopoulos, N.
2011-05-01
State of the art epidemiological research has found consistent associations between traffic-related air pollution and various outcomes, such as respiratory symptoms and premature mortality. However, many urban areas are characterised by the absence of the necessary monitoring infrastructure, especially for benzene (C 6H 6), which is a known human carcinogen. The use of environmental statistics combined with air quality modelling can be of vital importance in order to assess air quality levels of traffic-related pollutants in an urban area in the case where there are no available measurements. This paper aims at developing and presenting a reliable approach, in order to forecast C 6H 6 levels in urban environments, demonstrated for Thessaloniki, Greece. Multiple stepwise regression analysis is used and a strong statistical relationship is detected between C 6H 6 and CO. The adopted regression model is validated in order to depict its applicability and representativeness. The presented results demonstrate that the adopted approach is capable of capturing C 6H 6 concentration trends and should be considered as complementary to air quality monitoring.
Modeling HTL of industrial workers using multiple regression and path analytic techniques.
Smith, C R; Seitz, M R; Borton, T E; Kleinstein, R N; Wilmoth, J N
1984-04-01
This study compared path analytic with multiple regression analyses of hearing threshold levels (HTLs) on 258 adult textile workers evenly divided into low- and high-noise exposure groups. Demographic variables common in HTL studies were examined, with the addition of iris color, as well as selected two-way interactions. Variables of interest were similarly distributed in both groups. The results indicated that (1) different statistical procedures can lead to different conclusions even with the same HTL data for the same Ss; (2) conflicting conclusions may be artifacts of the analytic methodologies employed for data analysis; (3) a well-formulated theory under which path analytic techniques are employed may clarify somewhat the way a variable affects HTL values through its correlational connections with other antecedent variables included in the theoretical model; (4) multicollinearity among independent variables on which HTL is regressed usually presents a problem in unraveling exactly how each variable influences noise-induced hearing loss; and (5) because of the contradictory nature of its direct and indirect effects on HTL, iris color provides little, if any, explanatory assistance for modeling HTL.
Villain, Jonathan; Lozano, Sylvain; Halm-Lemeille, Marie-Pierre; Durrieu, Gilles; Bureau, Ronan
2014-12-01
The potential of quantile regression (QR) and quantile support vector machine regression (QSVMR) was analyzed for the definitions of quantitative structure-activity relationship (QSAR) models associated with a diverse set of chemicals toward a particular endpoint. This study focused on a specific sensitive endpoint (acute toxicity to algae) for which even a narcosis QSAR model is not actually clear. An initial dataset including more than 401 ecotoxicological data for one species of algae (Selenastrum capricornutum) was defined. This set corresponds to a large sample of chemicals ranging from classical organic chemicals to pesticides. From this original data set, the selection of the different subsets was made in terms of the notion of toxic ratio (TR), a parameter based on the ratio between predicted and experimental values. The robustness of QR and QSVMR to outliers was clearly observed, thus demonstrating that this approach represents a major interest for QSAR associated with a diverse set of chemicals. We focused particularly on descriptors related to molecular surface properties.
Knight, Rodney R.; Gain, W. Scott; Wolfe, William J.
2011-01-01
Predictive equations were developed using stepbackward regression for 19 ecologically relevant streamflow characteristics grouped in five major classes (magnitude, ratio, frequency, variability, and date) for use in the Tennessee and Cumberland River watersheds. Basin characteristics explain 50 percent or more of the variation for 10 of the 19 equations. Independent variables identified through stepbackward regression were statistically significant in 81 of 304 coefficients tested across 19 models (⬚ < 0.0001) and represent four major groups: climate, physical landscape features, regional indicators, and land use. The most influential variables for determining hydrologic response were in the land-use and climate groups: daily temperature range, percent agricultural land use, and monthly mean precipitation. These three variables were major explanatory factors in 17, 15, and 13 models, respectively. The equations and independent datasets were used to explore the broad relation between basin properties and streamflow and its implications for the study of ecological flow requirements. Key results include a high degree of hydrologic variability among least disturbed Blue Ridge streams, similar hydrologic behavior for watersheds with widely varying degrees of forest cover, and distinct hydrologic profiles for streams in different geographic regions.
Error analysis of leaf area estimates made from allometric regression models
NASA Technical Reports Server (NTRS)
Feiveson, A. H.; Chhikara, R. S.
1986-01-01
Biological net productivity, measured in terms of the change in biomass with time, affects global productivity and the quality of life through biochemical and hydrological cycles and by its effect on the overall energy balance. Estimating leaf area for large ecosystems is one of the more important means of monitoring this productivity. For a particular forest plot, the leaf area is often estimated by a two-stage process. In the first stage, known as dimension analysis, a small number of trees are felled so that their areas can be measured as accurately as possible. These leaf areas are then related to non-destructive, easily-measured features such as bole diameter and tree height, by using a regression model. In the second stage, the non-destructive features are measured for all or for a sample of trees in the plots and then used as input into the regression model to estimate the total leaf area. Because both stages of the estimation process are subject to error, it is difficult to evaluate the accuracy of the final plot leaf area estimates. This paper illustrates how a complete error analysis can be made, using an example from a study made on aspen trees in northern Minnesota. The study was a joint effort by NASA and the University of California at Santa Barbara known as COVER (Characterization of Vegetation with Remote Sensing).
NASA Astrophysics Data System (ADS)
Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav; Reeves, Geoffrey D.; Clilverd, Mark
2016-04-01
The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the prediction of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). A path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current (Dst), AE, and wave activity.
Zhao, Lue Ping; Bolouri, Hamid
2016-04-01
Maturing omics technologies enable researchers to generate high dimension omics data (HDOD) routinely in translational clinical studies. In the field of oncology, The Cancer Genome Atlas (TCGA) provided funding support to researchers to generate different types of omics data on a common set of biospecimens with accompanying clinical data and has made the data available for the research community to mine. One important application, and the focus of this manuscript, is to build predictive models for prognostic outcomes based on HDOD. To complement prevailing regression-based approaches, we propose to use an object-oriented regression (OOR) methodology to identify exemplars specified by HDOD patterns and to assess their associations with prognostic outcome. Through computing patient's similarities to these exemplars, the OOR-based predictive model produces a risk estimate using a patient's HDOD. The primary advantages of OOR are twofold: reducing the penalty of high dimensionality and retaining the interpretability to clinical practitioners. To illustrate its utility, we apply OOR to gene expression data from non-small cell lung cancer patients in TCGA and build a predictive model for prognostic survivorship among stage I patients, i.e., we stratify these patients by their prognostic survival risks beyond histological classifications. Identification of these high-risk patients helps oncologists to develop effective treatment protocols and post-treatment disease management plans. Using the TCGA data, the total sample is divided into training and validation data sets. After building up a predictive model in the training set, we compute risk scores from the predictive model, and validate associations of risk scores with prognostic outcome in the validation data (P-value=0.015). PMID:26972839
A Pearson-type goodness-of-fit test for stationary and time-continuous Markov regression models.
Aguirre-Hernández, R; Farewell, V T
2002-07-15
Markov regression models describe the way in which a categorical response variable changes over time for subjects with different explanatory variables. Frequently it is difficult to measure the response variable on equally spaced discrete time intervals. Here we propose a Pearson-type goodness-of-fit test for stationary Markov regression models fitted to panel data. A parametric bootstrap algorithm is used to study the distribution of the test statistic. The proposed technique is applied to examine the fit of a Markov regression model used to identify markers for disease progression in psoriatic arthritis.
NASA Astrophysics Data System (ADS)
Heckmann, Tobias; Gegg, Katharina; Becht, Michael
2013-04-01
Statistical approaches to landslide susceptibility modelling on the catchment and regional scale are used very frequently compared to heuristic and physically based approaches. In the present study, we deal with the problem of the optimal sample size for a logistic regression model. More specifically, a stepwise approach has been chosen in order to select those independent variables (from a number of derivatives of a digital elevation model and landcover data) that explain best the spatial distribution of debris flow initiation zones in two neighbouring central alpine catchments in Austria (used mutually for model calculation and validation). In order to minimise problems arising from spatial autocorrelation, we sample a single raster cell from each debris flow initiation zone within an inventory. In addition, as suggested by previous work using the "rare events logistic regression" approach, we take a sample of the remaining "non-event" raster cells. The recommendations given in the literature on the size of this sample appear to be motivated by practical considerations, e.g. the time and cost of acquiring data for non-event cases, which do not apply to the case of spatial data. In our study, we aim at finding empirically an "optimal" sample size in order to avoid two problems: First, a sample too large will violate the independent sample assumption as the independent variables are spatially autocorrelated; hence, a variogram analysis leads to a sample size threshold above which the average distance between sampled cells falls below the autocorrelation range of the independent variables. Second, if the sample is too small, repeated sampling will lead to very different results, i.e. the independent variables and hence the result of a single model calculation will be extremely dependent on the choice of non-event cells. Using a Monte-Carlo analysis with stepwise logistic regression, 1000 models are calculated for a wide range of sample sizes. For each sample size
Genetic analyses of stillbirth in relation to litter size using random regression models.
Chen, C Y; Misztal, I; Tsuruta, S; Herring, W O; Holl, J; Culbertson, M
2010-12-01
Estimates of genetic parameters for number of stillborns (NSB) in relation to litter size (LS) were obtained with random regression models (RRM). Data were collected from 4 purebred Duroc nucleus farms between 2004 and 2008. Two data sets with 6,575 litters for the first parity (P1) and 6,259 litters for the second to fifth parity (P2-5) with a total of 8,217 and 5,066 animals in the pedigree were analyzed separately. Number of stillborns was studied as a trait on sow level. Fixed effects were contemporary groups (farm-year-season) and fixed cubic regression coefficients on LS with Legendre polynomials. Models for P2-5 included the fixed effect of parity. Random effects were additive genetic effects for both data sets with permanent environmental effects included for P2-5. Random effects modeled with Legendre polynomials (RRM-L), linear splines (RRM-S), and degree 0 B-splines (RRM-BS) with regressions on LS were used. For P1, the order of polynomial, the number of knots, and the number of intervals used for respective models were quadratic, 3, and 3, respectively. For P2-5, the same parameters were linear, 2, and 2, respectively. Heterogeneous residual variances were considered in the models. For P1, estimates of heritability were 12 to 15%, 5 to 6%, and 6 to 7% in LS 5, 9, and 13, respectively. For P2-5, estimates were 15 to 17%, 4 to 5%, and 4 to 6% in LS 6, 9, and 12, respectively. For P1, average estimates of genetic correlations between LS 5 to 9, 5 to 13, and 9 to 13 were 0.53, -0.29, and 0.65, respectively. For P2-5, same estimates averaged for RRM-L and RRM-S were 0.75, -0.21, and 0.50, respectively. For RRM-BS with 2 intervals, the correlation was 0.66 between LS 5 to 7 and 8 to 13. Parameters obtained by 3 RRM revealed the nonlinear relationship between additive genetic effect of NSB and the environmental deviation of LS. The negative correlations between the 2 extreme LS might possibly indicate different genetic bases on incidence of stillbirth.
NASA Astrophysics Data System (ADS)
Creaco, E.; Berardi, L.; Sun, Siao; Giustolisi, O.; Savic, D.
2016-04-01
The growing availability of field data, from information and communication technologies (ICTs) in "smart" urban infrastructures, allows data modeling to understand complex phenomena and to support management decisions. Among the analyzed phenomena, those related to storm water quality modeling have recently been gaining interest in the scientific literature. Nonetheless, the large amount of available data poses the problem of selecting relevant variables to describe a phenomenon and enable robust data modeling. This paper presents a procedure for the selection of relevant input variables using the multiobjective evolutionary polynomial regression (EPR-MOGA) paradigm. The procedure is based on scrutinizing the explanatory variables that appear inside the set of EPR-MOGA symbolic model expressions of increasing complexity and goodness of fit to target output. The strategy also enables the selection to be validated by engineering judgement. In such context, the multiple case study extension of EPR-MOGA, called MCS-EPR-MOGA, is adopted. The application of the proposed procedure to modeling storm water quality parameters in two French catchments shows that it was able to significantly reduce the number of explanatory variables for successive analyses. Finally, the EPR-MOGA models obtained after the input selection are compared with those obtained by using the same technique without benefitting from input selection and with those obtained in previous works where other data-modeling techniques were used on the same data. The comparison highlights the effectiveness of both EPR-MOGA and the input selection procedure.
Zeng, Fangfang; Li, Zhongtao; Yu, Xiaoling; Zhou, Linuo
2013-01-01
Background This study aimed to develop the artificial neural network (ANN) and multivariable logistic regression (LR) analyses for prediction modeling of cardiovascular autonomic (CA) dysfunction in the general population, and compare the prediction models using the two approaches. Methods and Materials We analyzed a previous dataset based on a Chinese population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis, and were tested in the validation set. Performances of these prediction models were then compared. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver-operating curve was 0.758 (95% CI 0.724–0.793) for LR and 0.762 (95% CI 0.732–0.793) for ANN analysis, but noninferiority result was found (P<0.001). The similar results were found in comparisons of sensitivity, specificity, and predictive values in the prediction models between the LR and ANN analyses. Conclusion The prediction models for CA dysfunction were developed using ANN and LR. ANN and LR are two effective tools for developing prediction models based on our dataset. PMID:23940593
A factor analysis-multiple regression model for source apportionment of suspended particulate matter
NASA Astrophysics Data System (ADS)
Okamoto, Shin'ichi; Hayashi, Masayuki; Nakajima, Masaomi; Kainuma, Yasutaka; Shiozawa, Kiyoshige
A factor analysis-multiple regression (FA-MR) model has been used for a source apportionment study in the Tokyo metropolitan area. By a varimax rotated factor analysis, five source types could be identified: refuse incineration, soil and automobile, secondary particles, sea salt and steel mill. Quantitative estimations using the FA-MR model corresponded to the calculated contributing concentrations determined by using a weighted least-squares CMB model. However, the source type of refuse incineration identified by the FA-MR model was similar to that of biomass burning, rather than that produced by an incineration plant. The estimated contributions of sea salt and steel mill by the FA-MR model contained those of other sources, which have the same temporal variation of contributing concentrations. This symptom was caused by a multicollinearity problem. Although this result shows the limitation of the multivariate receptor model, it gives useful information concerning source types and their distribution by comparing with the results of the CMB model. In the Tokyo metropolitan area, the contributions from soil (including road dust), automobile, secondary particles and refuse incineration (biomass burning) were larger than industrial contributions: fuel oil combustion and steel mill. However, since vanadium is highly correlated with SO 42- and other secondary particle related elements, a major portion of secondary particles is considered to be related to fuel oil combustion.
Wang, Xiaojing; Chen, Ming-Hui; Yan, Jun
2013-07-01
Cox models with time-varying coefficients offer great flexibility in capturing the temporal dynamics of covariate effects on event times, which could be hidden from a Cox proportional hazards model. Methodology development for varying coefficient Cox models, however, has been largely limited to right censored data; only limited work on interval censored data has been done. In most existing methods for varying coefficient models, analysts need to specify which covariate coefficients are time-varying and which are not at the time of fitting. We propose a dynamic Cox regression model for interval censored data in a Bayesian framework, where the coefficient curves are piecewise constant but the number of pieces and the jump points are covariate specific and estimated from the data. The model automatically determines the extent to which the temporal dynamics is needed for each covariate, resulting in smoother and more stable curve estimates. The posterior computation is carried out via an efficient reversible jump Markov chain Monte Carlo algorithm. Inference of each coefficient is based on an average of models with different number of pieces and jump points. A simulation study with three covariates, each with a coefficient of different degree in temporal dynamics, confirmed that the dynamic model is preferred to the existing time-varying model in terms of model comparison criteria through conditional predictive ordinate. When applied to a dental health data of children with age between 7 and 12 years, the dynamic model reveals that the relative risk of emergence of permanent tooth 24 between children with and without an infected primary predecessor is the highest at around age 7.5, and that it gradually reduces to one after age 11. These findings were not seen from the existing studies with Cox proportional hazards models. PMID:23389549
Development of a charge adjustment model for cardiac catheterization.
Brennan, Andrew; Gauvreau, Kimberlee; Connor, Jean; O'Connell, Cheryl; David, Sthuthi; Almodovar, Melvin; DiNardo, James; Banka, Puja; Mayer, John E; Marshall, Audrey C; Bergersen, Lisa
2015-02-01
A methodology that would allow for comparison of charges across institutions has not been developed for catheterization in congenital heart disease. A single institution catheterization database with prospectively collected case characteristics was linked to hospital charges related and limited to an episode of care in the catheterization laboratory for fiscal years 2008-2010. Catheterization charge categories (CCC) were developed to group types of catheterization procedures using a combination of empiric data and expert consensus. A multivariable model with outcome charges was created using CCC and additional patient and procedural characteristics. In 3 fiscal years, 3,839 cases were available for analysis. Forty catheterization procedure types were categorized into 7 CCC yielding a grouper variable with an R (2) explanatory value of 72.6%. In the final CCC, the largest proportion of cases was in CCC 2 (34%), which included diagnostic cases without intervention. Biopsy cases were isolated in CCC 1 (12%), and percutaneous pulmonary valve placement alone made up CCC 7 (2%). The final model included CCC, number of interventions, and cardiac diagnosis (R (2) = 74.2%). Additionally, current financial metrics such as APR-DRG severity of illness and case mix index demonstrated a lack of correlation with CCC. We have developed a catheterization procedure type financial grouper that accounts for the diverse case population encountered in catheterization for congenital heart disease. CCC and our multivariable model could be used to understand financial characteristics of a population at a single point in time, longitudinally, and to compare populations.
Development of a charge adjustment model for cardiac catheterization.
Brennan, Andrew; Gauvreau, Kimberlee; Connor, Jean; O'Connell, Cheryl; David, Sthuthi; Almodovar, Melvin; DiNardo, James; Banka, Puja; Mayer, John E; Marshall, Audrey C; Bergersen, Lisa
2015-02-01
A methodology that would allow for comparison of charges across institutions has not been developed for catheterization in congenital heart disease. A single institution catheterization database with prospectively collected case characteristics was linked to hospital charges related and limited to an episode of care in the catheterization laboratory for fiscal years 2008-2010. Catheterization charge categories (CCC) were developed to group types of catheterization procedures using a combination of empiric data and expert consensus. A multivariable model with outcome charges was created using CCC and additional patient and procedural characteristics. In 3 fiscal years, 3,839 cases were available for analysis. Forty catheterization procedure types were categorized into 7 CCC yielding a grouper variable with an R (2) explanatory value of 72.6%. In the final CCC, the largest proportion of cases was in CCC 2 (34%), which included diagnostic cases without intervention. Biopsy cases were isolated in CCC 1 (12%), and percutaneous pulmonary valve placement alone made up CCC 7 (2%). The final model included CCC, number of interventions, and cardiac diagnosis (R (2) = 74.2%). Additionally, current financial metrics such as APR-DRG severity of illness and case mix index demonstrated a lack of correlation with CCC. We have developed a catheterization procedure type financial grouper that accounts for the diverse case population encountered in catheterization for congenital heart disease. CCC and our multivariable model could be used to understand financial characteristics of a population at a single point in time, longitudinally, and to compare populations. PMID:25113520
Prediction of Filamentous Sludge Bulking using a State-based Gaussian Processes Regression Model.
Liu, Yiqi; Guo, Jianhua; Wang, Qilin; Huang, Daoping
2016-01-01
Activated sludge process has been widely adopted to remove pollutants in wastewater treatment plants (WWTPs). However, stable operation of activated sludge process is often compromised by the occurrence of filamentous bulking. The aim of this study is to build a proper model for timely diagnosis and prediction of filamentous sludge bulking in an activated sludge process. This study developed a state-based Gaussian Process Regression (GPR) model to monitor the filamentous sludge bulking related parameter, sludge volume index (SVI), in such a way that the evolution of SVI can be predicted over multi-step ahead. This methodology was validated with SVI data collected from one full-scale WWTP. Online diagnosis and prediction of filamentous bulking sludge with real-time SVI prediction was tested through a simulation study. The results showed that the proposed methodology was capable of predicting future SVIs with good accuracy, thus providing sufficient time for predicting and controlling filamentous sludge bulking. PMID:27498888
NASA Astrophysics Data System (ADS)
Chelariu, Romeu; Suditu, Gabriel Dan; Mareci, Daniel; Bolat, Georgiana; Cimpoesu, Nicanor; Leon, Florin; Curteanu, Silvia
2015-04-01
The aim of this study is to investigate the electrochemical behavior of some dental metallic materials in artificial saliva for different pH (5.6 and 3.4), NaF content (500 ppm, 1000 ppm, and 2000 ppm), and with albumin protein addition (0.6 wt.%) for pH 3.4. The corrosion resistance of the alloys was quantitatively evaluated by polarization resistance, estimated by electrochemical impedance spectroscopy method. An adaptive k-nearest-neighbor regression method was applied for evaluating the corrosion resistance of the alloys by simulation, depending on the operation conditions. The predictions provided by the model are useful for experimental practice, as they can replace or, at least, help to plan the experiments. The accurate results obtained prove that the developed model is reliable and efficient.
Prediction of Filamentous Sludge Bulking using a State-based Gaussian Processes Regression Model
Liu, Yiqi; Guo, Jianhua; Wang, Qilin; Huang, Daoping
2016-01-01
Activated sludge process has been widely adopted to remove pollutants in wastewater treatment plants (WWTPs). However, stable operation of activated sludge process is often compromised by the occurrence of filamentous bulking. The aim of this study is to build a proper model for timely diagnosis and prediction of filamentous sludge bulking in an activated sludge process. This study developed a state-based Gaussian Process Regression (GPR) model to monitor the filamentous sludge bulking related parameter, sludge volume index (SVI), in such a way that the evolution of SVI can be predicted over multi-step ahead. This methodology was validated with SVI data collected from one full-scale WWTP. Online diagnosis and prediction of filamentous bulking sludge with real-time SVI prediction was tested through a simulation study. The results showed that the proposed methodology was capable of predicting future SVIs with good accuracy, thus providing sufficient time for predicting and controlling filamentous sludge bulking. PMID:27498888
A Bayesian approach to a logistic regression model with incomplete information.
Choi, Taeryon; Schervish, Mark J; Schmitt, Ketra A; Small, Mitchell J
2008-06-01
We consider a set of independent Bernoulli trials with possibly different success probabilities that depend on covariate values. However, the available data consist only of aggregate numbers of successes among subsets of the trials along with all of the covariate values. We still wish to estimate the parameters of a modeled relationship between the covariates and the success probabilities, e.g., a logistic regression model. In this article, estimation of the parameters is made from a Bayesian perspective by using a Markov chain Monte Carlo algorithm based only on the available data. The proposed methodology is applied to both simulation studies and real data from a dose-response study of a toxic chemical, perchlorate.
County level population estimation using knowledge-based image classification and regression models
NASA Astrophysics Data System (ADS)
Nepali, Anjeev
This paper presents methods and results of county-level population estimation using Landsat Thematic Mapper (TM) images of Denton County and Collin County in Texas. Landsat TM images acquired in March 2000 were classified into residential and non-residential classes using maximum likelihood classification and knowledge-based classification methods. Accuracy assessment results from the classified image produced using knowledge-based classification and traditional supervised classification (maximum likelihood classification) methods suggest that knowledge-based classification is more effective than traditional supervised classification methods. Furthermore, using randomly selected samples of census block groups, ordinary least squares (OLS) and geographically weighted regression (GWR) models were created for total population estimation. The overall accuracy of the models is over 96% at the county level. The results also suggest that underestimation normally occurs in block groups with high population density, whereas overestimation occurs in block groups with low population density.
Maniquiz, Marla C; Lee, Soyoung; Kim, Lee-Hyung
2010-01-01
Rainfall is an important factor in estimating the event mean concentration (EMC) which is used to quantify the washed-off pollutant concentrations from non-point sources (NPSs). Pollutant loads could also be calculated using rainfall, catchment area and runoff coefficient. In this study, runoff quantity and quality data gathered from a 28-month monitoring conducted on the road and parking lot sites in Korea were evaluated using multiple linear regression (MLR) to develop equations for estimating pollutant loads and EMCs as a function of rainfall variables. The results revealed that total event rainfall and average rainfall intensity are possible predictors of pollutant loads. Overall, the models are indicators of the high uncertainties of NPSs; perhaps estimation of EMCs and loads could be accurately obtained by means of water quality sampling or a long-term monitoring is needed to gather more data that can be used for the development of estimation models.
Wood, Jeffrey J.; Lynne, Sarah D.; Langer, David A.; Wood, Patricia A.; Clark, Shaunna L.; Eddy, J. Mark; Ialongo, Nicholas
2011-01-01
This study tests a model of reciprocal influences between absenteeism and youth psychopathology using three longitudinal datasets (Ns= 20745, 2311, and 671). Participants in 1st through 12th grades were interviewed annually or bi-annually. Measures of psychopathology include self-, parent-, and teacher-report questionnaires. Structural cross-lagged regression models were tested. In a nationally representative dataset (Add Health), middle school students with relatively greater absenteeism at study year 1 tended towards increased depression and conduct problems in study year 2, over and above the effects of autoregressive associations and demographic covariates. The opposite direction of effects was found for both middle and high school students. Analyses with two regionally representative datasets were also partially supportive. Longitudinal links were more evident in adolescence than in childhood. PMID:22188462
Mapping soil organic carbon stocks by robust geostatistical and boosted regression models
NASA Astrophysics Data System (ADS)
Nussbaum, Madlene; Papritz, Andreas; Baltensweiler, Andri; Walthert, Lorenz
2013-04-01
Carbon (C) sequestration in forests offsets greenhouse gas emissions. Therefore, quantifying C stocks and fluxes in forest ecosystems is of interest for greenhouse gas reporting according to the Kyoto protocol. In Switzerland, the National Forest Inventory offers comprehensive data to quantify the aboveground forest biomass and its change in time. Estimating stocks of soil organic C (SOC) in forests is more difficult because the variables needed to quantify stocks vary strongly in space and precise quantification of some of them is very costly. Based on data from 1'033 plots we modeled SOC stocks of the organic layer and the mineral soil to depths of 30 cm and 100 cm for the Swiss forested area. For the statistical modeling a broad range of covariates were available: Climate data (e. g. precipitation, temperature), two elevation models (resolutions 25 and 2 m) with respective terrain attributes and spectral reflectance data representing vegetation. Furthermore, the main mapping units of an overview soil map and a coarse scale geological map were used to coarsely represent the parent material of the soils. The selection of important covariates for SOC stocks modeling out of a large set was a major challenge for the statistical modeling. We used two approaches to deal with this problem: 1) A robust restricted maximum likelihood method to fit linear regression model with spatially correlated errors. The large number of covariates was first reduced by LASSO (Least Absolute Shrinkage and Selection Operator) and then further narrowed down to a parsimonious set of important covariates by cross-validation of the robustly fitted model. To account for nonlinear dependencies of the response on the covariates interaction terms of the latter were included in model if this improved the fit. 2) A boosted structured regression model with componentwise linear least squares or componentwise smoothing splines as base procedures. The selection of important covariates was done by the
Caliari, Steven R; Perepelyuk, Maryna; Soulas, Elizabeth M; Lee, Gi Yun; Wells, Rebecca G; Burdick, Jason A
2016-06-13
The extracellular matrix (ECM) presents an evolving set of mechanical cues to resident cells. We developed methacrylated hyaluronic acid (MeHA) hydrogels containing both stable and hydrolytically degradable crosslinks to provide cells with a gradually softening (but not fully degradable) milieu, mimicking physiological events such as fibrosis regression. To demonstrate the utility of this cell culture system, we studied the phenotype of rat hepatic stellate cells, the major liver precursors of fibrogenic myofibroblasts, within this softening environment. Stellate cells that were mechanically primed on tissue culture plastic attained a myofibroblast phenotype, which persisted when seeded onto stiff (∼20 kPa) hydrogels. However, mechanically primed stellate cells on stiff-to-soft (∼20 to ∼3 kPa) hydrogels showed reversion of the myofibroblast phenotype over 14 days, with reductions in cell area, expression of the myofibroblast marker alpha-smooth muscle actin (α-SMA), and Yes-associated protein/Transcriptional coactivator with PDZ-binding motif (YAP/TAZ) nuclear localization when compared to stellate cells on stiff hydrogels. Cells on stiff-to-soft hydrogels did not fully revert, however. They displayed reduced expression of glial fibrillary acidic protein (GFAP), and underwent abnormally rapid re-activation to myofibroblasts in response to re-stiffening of the hydrogels through introduction of additional crosslinks. These features are typical of stellate cells with an intermediate phenotype, reported to occur in vivo with fibrosis regression and re-injury. Together, these data suggest that mechanics play an important role in fibrosis regression and that integrating dynamic mechanical cues into model systems helps capture cell behaviors observed in vivo.
Cuffney, T.F.; Kashuba, R.; Qian, S.S.; Alameddine, I.; Cha, Y.K.; Lee, B.; Coles, J.F.; McMahon, G.
2011-01-01
Multilevel hierarchical regression was used to examine regional patterns in the responses of benthic macroinvertebrates and algae to urbanization across 9 metropolitan areas of the conterminous USA. Linear regressions established that responses (intercepts and slopes) to urbanization of invertebrates and algae varied among metropolitan areas. Multilevel hierarchical regression models were able to explain these differences on the basis of region-scale predictors. Regional differences in the type of land cover (agriculture or forest) being converted to urban and climatic factors (precipitation and air temperature) accounted for the differences in the response of macroinvertebrates to urbanization based on ordination scores, total richness, Ephemeroptera, Plecoptera, Trichoptera richness, and average tolerance. Regional differences in climate and antecedent agriculture also accounted for differences in the responses of salt-tolerant diatoms, but differences in the responses of other diatom metrics (% eutraphenic, % sensitive, and % silt tolerant) were best explained by regional differences in soils (mean % clay soils). The effects of urbanization were most readily detected in regions where forest lands were being converted to urban land because agricultural development significantly degraded assemblages before urbanization and made detection of urban effects difficult. The effects of climatic factors (temperature, precipitation) on background conditions (biogeographic differences) and rates of response to urbanization were most apparent after accounting for the effects of agricultural development. The effects of climate and land cover on responses to urbanization provide strong evidence that monitoring, mitigation, and restoration efforts must be tailored for specific regions and that attainment goals (background conditions) may not be possible in regions with high levels of prior disturbance (e.g., agricultural development). ?? 2011 by The North American
A note on modeling of tumor regression for estimation of radiobiological parameters
Zhong, Hualiang Chetty, Indrin
2014-08-15
Purpose: Accurate calculation of radiobiological parameters is crucial to predicting radiation treatment response. Modeling differences may have a significant impact on derived parameters. In this study, the authors have integrated two existing models with kinetic differential equations to formulate a new tumor regression model for estimation of radiobiological parameters for individual patients. Methods: A system of differential equations that characterizes the birth-and-death process of tumor cells in radiation treatment was analytically solved. The solution of this system was used to construct an iterative model (Z-model). The model consists of three parameters: tumor doubling time T{sub d}, half-life of dead cells T{sub r}, and cell survival fraction SF{sub D} under dose D. The Jacobian determinant of this model was proposed as a constraint to optimize the three parameters for six head and neck cancer patients. The derived parameters were compared with those generated from the two existing models: Chvetsov's model (C-model) and Lim's model (L-model). The C-model and L-model were optimized with the parameter T{sub d} fixed. Results: With the Jacobian-constrained Z-model, the mean of the optimized cell survival fractions is 0.43 ± 0.08, and the half-life of dead cells averaged over the six patients is 17.5 ± 3.2 days. The parameters T{sub r} and SF{sub D} optimized with the Z-model differ by 1.2% and 20.3% from those optimized with the T{sub d}-fixed C-model, and by 32.1% and 112.3% from those optimized with the T{sub d}-fixed L-model, respectively. Conclusions: The Z-model was analytically constructed from the differential equations of cell populations that describe changes in the number of different tumor cells during the course of radiation treatment. The Jacobian constraints were proposed to optimize the three radiobiological parameters. The generated model and its optimization method may help develop high-quality treatment regimens for individual patients.
NASA Astrophysics Data System (ADS)
Li, Weixuan; Lin, Guang; Li, Bing
2016-09-01
Many uncertainty quantification (UQ) approaches suffer from the curse of dimensionality, that is, their computational costs become intractable for problems involving a large number of uncertainty parameters. In these situations, the classic Monte Carlo often remains the preferred method of choice because its convergence rate O (n - 1 / 2), where n is the required number of model simulations, does not depend on the dimension of the problem. However, many high-dimensional UQ problems are intrinsically low-dimensional, because the variation of the quantity of interest (QoI) is often caused by only a few latent parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace in the statistics literature. Motivated by this observation, we propose two inverse regression-based UQ algorithms (IRUQ) for high-dimensional problems. Both algorithms use inverse regression to convert the original high-dimensional problem to a low-dimensional one, which is then efficiently solved by building a response surface for the reduced model, for example via the polynomial chaos expansion. The first algorithm, which is for the situations where an exact SDR subspace exists, is proved to converge at rate O (n-1), hence much faster than MC. The second algorithm, which doesn't require an exact SDR, employs the reduced model as a control variate to reduce the error of the MC estimate. The accuracy gain could still be significant, depending on how well the reduced model approximates the original high-dimensional one. IRUQ also provides several additional practical advantages: it is non-intrusive; it does not require computing the high-dimensional gradient of the QoI; and it reports an error bar so the user knows how reliable the result is.
NASA Astrophysics Data System (ADS)
Eghnam, Karam M.; Sheta, Alaa F.
2008-06-01
Development of accurate models is necessary in critical applications such as prediction. In this paper, a solution to the stock prediction problem of the Barents Sea capelin is introduced using Artificial Neural Network (ANN) and Multiple Linear model Regression (MLR) models. The Capelin stock in the Barents Sea is one of the largest in the world. It normally maintained a fishery with annual catches of up to 3 million tons. The Capelin stock problem has an impact in the fish stock development. The proposed prediction model was developed using an ANNs with their weights adapted using Genetic Algorithm (GA). The proposed model was compared to traditional linear model the MLR. The results showed that the ANN-GA model produced an overall accuracy of 21% better than the MLR model.
A land use regression model for ultrafine particles in Vancouver, Canada.
Abernethy, Rebecca C; Allen, Ryan W; McKendry, Ian G; Brauer, Michael
2013-05-21
Methods to characterize chronic exposure to ultrafine particles (UFP) can help to clarify potential health effects. Since UFP are not routinely monitored in North America, spatiotemporal models are one potential exposure assessment methodology. Portable condensation particle counters were used to measure particle number concentrations (PNC) to develop a land use regression (LUR) model. PNC, wind speed and direction were measured for sixty minutes at eighty locations during a two-week sampling campaign. We conducted continuous monitoring at four additional locations to assess temporal variation. LUR modeling utilized 135 potential geographic predictors including: road length, vehicle density, restaurant density, population density, land use and others. A novel approach incorporated meteorological data through wind roses as alternates to traditional circular buffers. The range of measured (sixty-minute median) PNC across locations varied seventy-fold (1500-105000 particles/cm(3), mean [SD] = 18200 [15900] particles/cm(3)). Correlations between PNC and concurrently measured two-week average NOX concentrations were 0.6-0.7. A PNC LUR model (R(2) = 0.48, leave-one-out cross validation R(2) = 0.32) including truck route length within 50 m, restaurant density within 200 m, and ln-distance to the port represents the first UFP LUR model in North America. Models incorporating wind roses did not explain more variability in measured PNC.
NASA Astrophysics Data System (ADS)
Ahangar-Asr, A.; Faramarzi, A.; Mottaghifard, N.; Javadi, A. A.
2011-11-01
This paper presents a new approach, based on evolutionary polynomial regression (EPR), for prediction of permeability ( K), maximum dry density (MDD), and optimum moisture content (OMC) as functions of some physical properties of soil. EPR is a data-driven method based on evolutionary computing aimed to search for polynomial structures representing a system. In this technique, a combination of the genetic algorithm (GA) and the least-squares method is used to find feasible structures and the appropriate parameters of those structures. EPR models are developed based on results from a series of classification, compaction, and permeability tests from the literature. The tests included standard Proctor tests, constant head permeability tests, and falling head permeability tests conducted on soils made of four components, bentonite, limestone dust, sand, and gravel, mixed in different proportions. The results of the EPR model predictions are compared with those of a neural network model, a correlation equation from the literature, and the experimental data. Comparison of the results shows that the proposed models are highly accurate and robust in predicting permeability and compaction characteristics of soils. Results from sensitivity analysis indicate that the models trained from experimental data have been able to capture many physical relationships between soil parameters. The proposed models are also able to represent the degree to which individual contributing parameters affect the maximum dry density, optimum moisture content, and permeability.
Nickerson, David M; Madsen, Brooks C
2005-06-01
Continuous monitoring of precipitation in East Central Florida has occurred since 1978 at a sampling site located on the University of Central Florida (UCF) campus. Monthly volume-weighted average (VWA) concentration for several major analytes that are present in precipitation samples was calculated from samples collected daily. Monthly VWA concentration and wet deposition of H(+), NH(4)(+), Ca(2+), Mg(2+), NO(3)(-), Cl(-) and SO(4)(2-) were evaluated by a nonlinear regression (NLR) model that considered 10-year data (from 1978 to 1987) and 20-year data (from 1978 to 1997). Little change in the NLR parameter estimates was indicated among the 10-year and 20-year evaluations except for general decreases in the predicted trends from the 10-year to the 20-year fits. Box-Jenkins autoregressive integrated moving average (ARIMA) models with linear trend were considered as an alternative to the NLR models for these data. The NLR and ARIMA model forecasts for 1998 were compared to the actual 1998 data. For monthly VWA concentration values, the two models gave similar results. For the wet deposition values, the ARIMA models performed considerably better.
NASA Technical Reports Server (NTRS)
Tomberlin, T. J.
1985-01-01
Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
Regression Model-Based Walking Speed Estimation Using Wrist-Worn Inertial Sensor
Park, Edward J.
2016-01-01
Walking speed is widely used to study human health status. Wearable inertial measurement units (IMU) are promising tools for the ambulatory measurement of walking speed. Among wearable inertial sensors, the ones worn on the wrist, such as a watch or band, have relatively higher potential to be easily incorporated into daily lifestyle. Using the arm swing motion in walking, this paper proposes a regression model-based method for longitudinal walking speed estimation using a wrist-worn IMU. A novel kinematic variable is proposed, which finds the wrist acceleration in the principal axis (i.e. the direction of the arm swing). This variable (called pca-acc) is obtained by applying sensor fusion on IMU data to find the orientation followed by the use of principal component analysis. An experimental evaluation was performed on 15 healthy young subjects during free walking trials. The experimental results show that the use of the proposed pca-acc variable can significantly improve the walking speed estimation accuracy when compared to the use of raw acceleration information (p<0.01). When Gaussian process regression is used, the resulting walking speed estimation accuracy and precision is about 5.9% and 4.7%, respectively. PMID:27764231
Embryonic Gut Anomalies in a Mouse Model of Retinoic Acid-Induced Caudal Regression Syndrome
Pitera, Jolanta E.; Smith, Virpi V.; Woolf, Adrian S.; Milla, Peter J.
2001-01-01
Vitamin A and its derivatives such as retinoic acid (RA) are important signaling molecules for morphogenesis of vertebrate embryos. Little is known, however, about morphogenetic factors controlling the development of the gastrointestinal tract and RA is likely to be involved. In the mouse, teratogenic doses of RA cause truncation of the embryonic caudal body axis that parallel the caudal regression syndrome as described in humans. These changes are often associated with anomalies of the lower digestive tract. Overlapping spatiotemporal expression of retinoic acid receptor-β (RARβ) and cellular retinol-binding protein I, CRBPI, with Hoxb5 and c-ret in the gut mesoderm imply possible cooperation required for proper neuromuscular development. To determine susceptibility and responsiveness of the developing gut and its neuromusculature to exogenous retinoids we used a mouse model of RA-induced caudal regression syndrome. The results showed that stage-specific RA treatment both in vivo and in vitro affected gut looping/rotation morphogenesis and growth of asymmetrical structures such as the cecum together with delayed differentiation of the gut mesoderm and colonization of the postcecal gut by neural crest-derived enteric neuronal precursors. These observations demonstrate that RA has a direct effect on gut morphogenesis and innervation. PMID:11733381
An innovative land use regression model incorporating meteorology for exposure analysis.
Su, Jason G; Brauer, Michael; Ainslie, Bruce; Steyn, Douw; Larson, Timothy; Buzzelli, Michael
2008-02-15
The advent of spatial analysis and geographic information systems (GIS) has led to studies of chronic exposure and health effects based on the rationale that intra-urban variations in ambient air pollution concentrations are as great as inter-urban differences. Such studies typically rely on local spatial covariates (e.g., traffic, land use type) derived from circular areas (buffers) to predict concentrations/exposures at receptor sites, as a means of averaging the annual net effect of meteorological influences (i.e., wind speed, wind direction and insolation). This is the approach taken in the now popular land use regression (LUR) method. However spatial studies of chronic exposures and temporal studies of acute exposures have not been adequately integrated. This paper presents an innovative LUR method implemented in a GIS environment that reflects both temporal and spatial variability and considers the role of meteorology. The new source area LUR integrates wind speed, wind direction and cloud cover/insolation to estimate hourly nitric oxide (NO) and nitrogen dioxide (NO(2)) concentrations from land use types (i.e., road network, commercial land use) and these concentrations are then used as covariates to regress against NO and NO(2) measurements at various receptor sites across the Vancouver region and compared directly with estimates from a regular LUR. The results show that, when variability in seasonal concentration measurements is present, the source area LUR or SA-LUR model is a better option for concentration estimation.
Mathur, Praveen; Sharma, Sarita; Soni, Bhupendra
2010-01-01
In the present work, an attempt is made to formulate multiple regression equations using all possible regressions method for groundwater quality assessment of Ajmer-Pushkar railway line region in pre- and post-monsoon seasons. Correlation studies revealed the existence of linear relationships (r 0.7) for electrical conductivity (EC), total hardness (TH) and total dissolved solids (TDS) with other water quality parameters. The highest correlation was found between EC and TDS (r = 0.973). EC showed highly significant positive correlation with Na, K, Cl, TDS and total solids (TS). TH showed highest correlation with Ca and Mg. TDS showed significant correlation with Na, K, SO4, PO4 and Cl. The study indicated that most of the contamination present was water soluble or ionic in nature. Mg was present as MgCl2; K mainly as KCl and K2SO4, and Na was present as the salts of Cl, SO4 and PO4. On the other hand, F and NO3 showed no significant correlations. The r2 values and F values (at 95% confidence limit, alpha = 0.05) for the modelled equations indicated high degree of linearity among independent and dependent variables. Also the error % between calculated and experimental values was contained within +/- 15% limit.
A History of Regression and Related Model-Fitting in the Earth Sciences (1636?-2000)
Howarth, Richard J.
2001-12-15
its roots in meeting the evident need for improved estimators in spatial interpolation. Technical advances in regression analysis during the 1970s embraced the development of regression diagnostics and consequent attention to outliers; the recognition of problems caused by correlated predictors, and the subsequent introduction of ridge regression to overcome them; and techniques for fitting errors-in-variables and mixture models. Improvements in computational power have enabled ever more computer-intensive methods to be applied. These include algorithms which are robust in the presence of outliers, for example Rousseeuw's 1984 Least Median Squares; nonparametric smoothing methods, such as kernel-functions, splines and Cleveland's 1979 LOcally WEighted Scatterplot Smoother (LOWESS); and the Classification and Regression Tree (CART) technique of Breiman and others in 1984. Despite a continuing improvement in the rate of technology-transfer from the statistical to the earth-science community, despite an abrupt drop to a time-lag of about 10 years following the introduction of digital computers, these more recent developments are only just beginning to penetrate beyond the research community of earth scientists. Examples of applications to problem-solving in the earth sciences are given.
Spatial analysis studies have included application of land use regression models (LURs) for health and air quality assessments. Recent LUR studies have collected nitrogen dioxide (NO2) and volatile organic compounds (VOCs) using passive samplers at urban air monitoring networks ...
NASA Astrophysics Data System (ADS)
Shu, Yuqin; Lam, Nina S. N.
2011-01-01
Detailed estimates of carbon dioxide emissions at fine spatial scales are critical to both modelers and decision makers dealing with global warming and climate change. Globally, traffic-related emissions of carbon dioxide are growing rapidly. This paper presents a new method based on a multiple linear regression model to disaggregate traffic-related CO 2 emission estimates from the parish-level scale to a 1 × 1 km grid scale. Considering the allocation factors (population density, urban area, income, road density) together, we used a correlation and regression analysis to determine the relationship between these factors and traffic-related CO 2 emissions, and developed the best-fit model. The method was applied to downscale the traffic-related CO 2 emission values by parish (i.e. county) for the State of Louisiana into 1-km 2 grid cells. In the four highest parishes in traffic-related CO 2 emissions, the biggest area that has above average CO 2 emissions is found in East Baton Rouge, and the smallest area with no CO 2 emissions is also in East Baton Rouge, but Orleans has the most CO 2 emissions per unit area. The result reveals that high CO 2 emissions are concentrated in dense road network of urban areas with high population density and low CO 2 emissions are distributed in rural areas with low population density, sparse road network. The proposed method can be used to identify the emission "hot spots" at fine scale and is considered more accurate and less time-consuming than the previous methods.
Weichenthal, Scott; Van Ryswyk, Keith; Goldstein, Alon; Shekarrizfard, Maryam; Hatzopoulou, Marianne
2016-01-01
Exposure models are needed to evaluate the chronic health effects of ambient ultrafine particles (<0.1 μm) (UFPs). We developed a land use regression model for ambient UFPs in Toronto, Canada using mobile monitoring data collected during summer/winter 2010-2011. In total, 405 road segments were included in the analysis. The final model explained 67% of the spatial variation in mean UFPs and included terms for the logarithm of distances to highways, major roads, the central business district, Pearson airport, and bus routes as well as variables for the number of on-street trees, parks, open space, and the length of bus routes within a 100 m buffer. There was no systematic difference between measured and predicted values when the model was evaluated in an external dataset, although the R(2) value decreased (R(2) = 50%). This model will be used to evaluate the chronic health effects of UFPs using population-based cohorts in the Toronto area.
Modeling Source Water TOC Using Hydroclimate Variables and Local Polynomial Regression.
Samson, Carleigh C; Rajagopalan, Balaji; Summers, R Scott
2016-04-19
To control disinfection byproduct (DBP) formation in drinking water, an understanding of the source water total organic carbon (TOC) concentration variability can be critical. Previously, TOC concentrations in water treatment plant source waters have been modeled using streamflow data. However, the lack of streamflow data or unimpaired flow scenarios makes it difficult to model TOC. In addition, TOC variability under climate change further exacerbates the problem. Here we proposed a modeling approach based on local polynomial regression that uses climate, e.g. temperature, and land surface, e.g., soil moisture, variables as predictors of TOC concentration, obviating the need for streamflow. The local polynomial approach has the ability to capture non-Gaussian and nonlinear features that might be present in the relationships. The utility of the methodology is demonstrated using source water quality and climate data in three case study locations with surface source waters including river and reservoir sources. The models show good predictive skill in general at these locations, with lower skills at locations with the most anthropogenic influences in their streams. Source water TOC predictive models can provide water treatment utilities important information for making treatment decisions for DBP regulation compliance under future climate scenarios.
Modeling Source Water TOC Using Hydroclimate Variables and Local Polynomial Regression.
Samson, Carleigh C; Rajagopalan, Balaji; Summers, R Scott
2016-04-19
To control disinfection byproduct (DBP) formation in drinking water, an understanding of the source water total organic carbon (TOC) concentration variability can be critical. Previously, TOC concentrations in water treatment plant source waters have been modeled using streamflow data. However, the lack of streamflow data or unimpaired flow scenarios makes it difficult to model TOC. In addition, TOC variability under climate change further exacerbates the problem. Here we proposed a modeling approach based on local polynomial regression that uses climate, e.g. temperature, and land surface, e.g., soil moisture, variables as predictors of TOC concentration, obviating the need for streamflow. The local polynomial approach has the ability to capture non-Gaussian and nonlinear features that might be present in the relationships. The utility of the methodology is demonstrated using source water quality and climate data in three case study locations with surface source waters including river and reservoir sources. The models show good predictive skill in general at these locations, with lower skills at locations with the most anthropogenic influences in their streams. Source water TOC predictive models can provide water treatment utilities important information for making treatment decisions for DBP regulation compliance under future climate scenarios. PMID:26998784
Weichenthal, Scott; Van Ryswyk, Keith; Goldstein, Alon; Shekarrizfard, Maryam; Hatzopoulou, Marianne
2016-01-01
Exposure models are needed to evaluate the chronic health effects of ambient ultrafine particles (<0.1 μm) (UFPs). We developed a land use regression model for ambient UFPs in Toronto, Canada using mobile monitoring data collected during summer/winter 2010-2011. In total, 405 road segments were included in the analysis. The final model explained 67% of the spatial variation in mean UFPs and included terms for the logarithm of distances to highways, major roads, the central business district, Pearson airport, and bus routes as well as variables for the number of on-street trees, parks, open space, and the length of bus routes within a 100 m buffer. There was no systematic difference between measured and predicted values when the model was evaluated in an external dataset, although the R(2) value decreased (R(2) = 50%). This model will be used to evaluate the chronic health effects of UFPs using population-based cohorts in the Toronto area. PMID:25935348
Land-use regression panel models of NO2 concentrations in Seoul, Korea
NASA Astrophysics Data System (ADS)
Kim, Youngkook; Guldmann, Jean-Michel
2015-04-01
Transportation and land-use activities are major air pollution contributors. Since their shares of emissions vary across space and time, so do air pollution concentrations. Despite these variations, panel data have rarely been used in land-use regression (LUR) modeling of air pollution. In addition, the complex interactions between traffic flows, land uses, and meteorological variables, have not been satisfactorily investigated in LUR models. The purpose of this research is to develop and estimate nitrogen dioxide (NO2) panel models based on the LUR framework with data for Seoul, Korea, accounting for the impacts of these variables, and their interactions with spatial and temporal dummy variables. The panel data vary over several scales: daily (24 h), seasonally (4), and spatially (34 intra-urban measurement locations). To enhance model explanatory power, wind direction and distance decay effects are accounted for. The results show that vehicle-kilometers-traveled (VKT) and solar radiation have statistically strong positive and negative impacts on NO2 concentrations across the four seasonal models. In addition, there are significant interactions with the dummy variables, pointing to VKT and solar radiation effects on NO2 concentrations that vary with time and intra-urban location. The results also show that residential, commercial, and industrial land uses, and wind speed, temperature, and humidity, all impact NO2 concentrations. The R2 vary between 0.95 and 0.98.
Milly, P.C.D.; Dunne, K.A.
2011-01-01
Hydrologic models often are applied to adjust projections of hydroclimatic change that come from climate models. Such adjustment includes climate-bias correction, spatial refinement ("downscaling"), and consideration of the roles of hydrologic processes that were neglected in the climate model. Described herein is a quantitative analysis of the effects of hydrologic adjustment on the projections of runoff change associated with projected twenty-first-century climate change. In a case study including three climate models and 10 river basins in the contiguous United States, the authors find that relative (i.e., fractional or percentage) runoff change computed with hydrologic adjustment more often than not was less positive (or, equivalently, more negative) than what was projected by the climate models. The dominant contributor to this decrease in runoff was a ubiquitous change in runoff (median 211%) caused by the hydrologic model's apparent amplification of the climate-model-implied growth in potential evapotranspiration. Analysis suggests that the hydrologic model, on the basis of the empirical, temperature-based modified Jensen-Haise formula, calculates a change in potential evapotranspiration that is typically 3 times the change implied by the climate models, which explicitly track surface energy budgets. In comparison with the amplification of potential evapotranspiration, central tendencies of other contributions from hydrologic adjustment (spatial refinement, climate-bias adjustment, and process refinement) were relatively small. The authors' findings highlight the need for caution when projecting changes in potential evapotranspiration for use in hydrologic models or drought indices to evaluate climatechange impacts on water. Copyright ?? 2011, Paper 15-001; 35,952 words, 3 Figures, 0 Animations, 1 Tables.
Watts, Kenneth R.
1995-01-01
regression models. These models also include an autoregressive term to account for serial correlation in the residuals. The adjusted coefficient of determination (Ra2) for the 46 regression models range from 0.08 to 0.89, and the standard errors of estimate range from 0.034 to 2.483 feet. The regression models of monthly water- level change can be used to evaluate whether post-1985 monthly water-level change values at the selected observation wells are within the 95-percent confidence limits of predicted monthly water-level change.
Performance of Multi-City Land Use Regression Models for Nitrogen Dioxide and Fine Particles
Beelen, Rob; Bellander, Tom; Birk, Matthias; Cesaroni, Giulia; Cirach, Marta; Cyrys, Josef; de Hoogh, Kees; Declercq, Christophe; Dimakopoulou, Konstantina; Eeftens, Marloes; Eriksen, Kirsten T.; Forastiere, Francesco; Galassi, Claudia; Grivas, Georgios; Heinrich, Joachim; Hoffmann, Barbara; Ineichen, Alex; Korek, Michal; Lanki, Timo; Lindley, Sarah; Modig, Lars; Mölter, Anna; Nafstad, Per; Nieuwenhuijsen, Mark J.; Nystad, Wenche; Olsson, David; Raaschou-Nielsen, Ole; Ragettli, Martina; Ranzi, Andrea; Stempfelet, Morgane; Sugiri, Dorothea; Tsai, Ming-Yi; Udvardy, Orsolya; Varró, Mihaly J.; Vienneau, Danielle; Weinmayr, Gudrun; Wolf, Kathrin; Yli-Tuomi, Tarja; Hoek, Gerard; Brunekreef, Bert
2014-01-01
Background: Land use regression (LUR) models have been developed mostly to explain intraurban variations in air pollution based on often small local monitoring campaigns. Transferability of LUR models from city to city has been investigated, but little is known about the performance of models based on large numbers of monitoring sites covering a large area. Objectives: We aimed to develop European and regional LUR models and to examine their transferability to areas not used for model development. Methods: We evaluated LUR models for nitrogen dioxide (NO2) and particulate matter (PM; PM2.5, PM2.5 absorbance) by combining standardized measurement data from 17 (PM) and 23 (NO2) ESCAPE (European Study of Cohorts for Air Pollution Effects) study areas across 14 European countries for PM and NO2. Models were evaluated with cross-validation (CV) and hold-out validation (HV). We investigated the transferability of the models by successively excluding each study area from model building. Results: The European model explained 56% of the concentration variability across all sites for NO2, 86% for PM2.5, and 70% for PM2.5 absorbance. The HV R2s were only slightly lower than the model R2 (NO2, 54%; PM2.5, 80%; PM2.5 absorbance, 70%). The European NO2, PM2.5, and PM2.5 absorbance models explained a median of 59%, 48%, and 70% of within-area variability in individual areas. The transferred models predicted a modest-to-large fraction of variability in areas that were excluded from model building (median R2: NO2, 59%; PM2.5, 42%; PM2.5 absorbance, 67%). Conclusions: Using a large data set from 23 European study areas, we were able to develop LUR models for NO2 and PM metrics that predicted measurements made at independent sites and areas reasonably well. This finding is useful for assessing exposure in health studies conducted in areas where no measurements were conducted. Citation: Wang M, Beelen R, Bellander T, Birk M, Cesaroni G, Cirach M, Cyrys J, de Hoogh K, Declercq C
Statistical modelling for thoracic surgery using a nomogram based on logistic regression
Liu, Run-Zhong; Zhao, Ze-Rui
2016-01-01
A well-developed clinical nomogram is a popular decision-tool, which can be used to predict the outcome of an individual, bringing benefits to both clinicians and patients. With just a few steps on a user-friendly interface, the approximate clinical outcome of patients can easily be estimated based on their clinical and laboratory characteristics. Therefore, nomograms have recently been developed to predict the different outcomes or even the survival rate at a specific time point for patients with different diseases. However, on the establishment and application of nomograms, there is still a lot of confusion that may mislead researchers. The objective of this paper is to provide a brief introduction on the history, definition, and application of nomograms and then to illustrate simple procedures to develop a nomogram with an example based on a multivariate logistic regression model in thoracic surgery. In addition, validation strategies and common pitfalls have been highlighted. PMID:27621910
Statistical modelling for thoracic surgery using a nomogram based on logistic regression.
Liu, Run-Zhong; Zhao, Ze-Rui; Ng, Calvin S H
2016-08-01
A well-developed clinical nomogram is a popular decision-tool, which can be used to predict the outcome of an individual, bringing benefits to both clinicians and patients. With just a few steps on a user-friendly interface, the approximate clinical outcome of patients can easily be estimated based on their clinical and laboratory characteristics. Therefore, nomograms have recently been developed to predict the different outcomes or even the survival rate at a specific time point for patients with different diseases. However, on the establishment and application of nomograms, there is still a lot of confusion that may mislead researchers. The objective of this paper is to provide a brief introduction on the history, definition, and application of nomograms and then to illustrate simple procedures to develop a nomogram with an example based on a multivariate logistic regression model in thoracic surgery. In addition, validation strategies and common pitfalls have been highlighted. PMID:27621910
Statistical modelling for thoracic surgery using a nomogram based on logistic regression
Liu, Run-Zhong; Zhao, Ze-Rui
2016-01-01
A well-developed clinical nomogram is a popular decision-tool, which can be used to predict the outcome of an individual, bringing benefits to both clinicians and patients. With just a few steps on a user-friendly interface, the approximate clinical outcome of patients can easily be estimated based on their clinical and laboratory characteristics. Therefore, nomograms have recently been developed to predict the different outcomes or even the survival rate at a specific time point for patients with different diseases. However, on the establishment and application of nomograms, there is still a lot of confusion that may mislead researchers. The objective of this paper is to provide a brief introduction on the history, definition, and application of nomograms and then to illustrate simple procedures to develop a nomogram with an example based on a multivariate logistic regression model in thoracic surgery. In addition, validation strategies and common pitfalls have been highlighted.
Linear regression models and k-means clustering for statistical analysis of fNIRS data.
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-02-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.
Linear regression models and k-means clustering for statistical analysis of fNIRS data
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-01-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets. PMID:25780751
Fienen, Michael N.; Selbig, William R.
2012-01-01
A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.
Giacomo, Della Riccia; Stefania, Del Zotto
2013-12-15
Fumonisins are mycotoxins produced by Fusarium species that commonly live in maize. Whereas fungi damage plants, fumonisins cause disease both to cattle breedings and human beings. Law limits set fumonisins tolerable daily intake with respect to several maize based feed and food. Chemical techniques assure the most reliable and accurate measurements, but they are expensive and time consuming. A method based on Near Infrared spectroscopy and multivariate statistical regression is described as a simpler, cheaper and faster alternative. We apply Partial Least Squares with full cross validation. Two models are described, having high correlation of calibration (0.995, 0.998) and of validation (0.908, 0.909), respectively. Description of observed phenomenon is accurate and overfitting is avoided. Screening of contaminated maize with respect to European legal limit of 4 mg kg(-1) should be assured.
Binary logistic regression modelling: Measuring the probability of relapse cases among drug addict
NASA Astrophysics Data System (ADS)
Ismail, Mohd Tahir; Alias, Siti Nor Shadila
2014-07-01
For many years Malaysia faced the drug addiction issues. The most serious case is relapse phenomenon among treated drug addict (drug addict who have under gone the rehabilitation programme at Narcotic Addiction Rehabilitation Centre, PUSPEN). Thus, the main objective of this study is to find the most significant factor that contributes to relapse to happen. The binary logistic regression analysis was employed to model the relationship between independent variables (predictors) and dependent variable. The dependent variable is the status of the drug addict either relapse, (Yes coded as 1) or not, (No coded as 0). Meanwhile the predictors involved are age, age at first taking drug, family history, education level, family crisis, community support and self motivation. The total of the sample is 200 which the data are provided by AADK (National Antidrug Agency). The finding of the study revealed that age and self motivation are statistically significant towards the relapse cases..
Statistical modelling for thoracic surgery using a nomogram based on logistic regression.
Liu, Run-Zhong; Zhao, Ze-Rui; Ng, Calvin S H
2016-08-01
A well-developed clinical nomogram is a popular decision-tool, which can be used to predict the outcome of an individual, bringing benefits to both clinicians and patients. With just a few steps on a user-friendly interface, the approximate clinical outcome of patients can easily be estimated based on their clinical and laboratory characteristics. Therefore, nomograms have recently been developed to predict the different outcomes or even the survival rate at a specific time point for patients with different diseases. However, on the establishment and application of nomograms, there is still a lot of confusion that may mislead researchers. The objective of this paper is to provide a brief introduction on the history, definition, and application of nomograms and then to illustrate simple procedures to develop a nomogram with an example based on a multivariate logistic regression model in thoracic surgery. In addition, validation strategies and common pitfalls have been highlighted.
Milly, Paul C.D.; Dunne, Krista A.
2011-01-01
Hydrologic models often are applied to adjust projections of hydroclimatic change that come from climate models. Such adjustment includes climate-bias correction, spatial refinement ("downscaling"), and consideration of the roles of hydrologic processes that were neglected in the climate model. Described herein is a quantitative analysis of the effects of hydrologic adjustment on the projections of runoff change associated with projected twenty-first-century climate change. In a case study including three climate models and 10 river basins in the contiguous United States, the authors find that relative (i.e., fractional or percentage) runoff change computed with hydrologic adjustment more often than not was less positive (or, equivalently, more negative) than what was projected by the climate models. The dominant contributor to this decrease in runoff was a ubiquitous change in runoff (median -11%) caused by the hydrologic model’s apparent amplification of the climate-model-implied growth in potential evapotranspiration. Analysis suggests that the hydrologic model, on the basis of the empirical, temperature-based modified Jensen–Haise formula, calculates a change in potential evapotranspiration that is typically 3 times the change implied by the climate models, which explicitly track surface energy budgets. In comparison with the amplification of potential evapotranspiration, central tendencies of other contributions from hydrologic adjustment (spatial refinement, climate-bias adjustment, and process refinement) were relatively small. The authors’ findings highlight the need for caution when projecting changes in potential evapotranspiration for use in hydrologic models or drought indices to evaluate climate-change impacts on water.
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The
NASA Astrophysics Data System (ADS)
Kwon, Y.
2013-12-01
As evidence of global warming continue to increase, being able to predict forest response to climate changes, such as expected rise of temperature and precipitation, will be vital for maintaining the sustainability and productivity of forests. To map forest species redistribution by climate change scenario has been successful, however, most species redistribution maps lack mechanistic understanding to explain why trees grow under the novel conditions of chaining climate. Distributional map is only capable of predicting under the equilibrium assumption that the communities would exist following a prolonged period under the new climate. In this context, forest NPP as a surrogate for growth rate, the most important facet that determines stand dynamics, can lead to valid prediction on the transition stage to new vegetation-climate equilibrium as it represents changes in structure of forest reflecting site conditions and climate factors. The objective of this study is to develop forest growth map using regression tree analysis by extracting large-scale non-linear structures from both field-based FIA and remotely sensed MODIS data set. The major issue addressed in this approach is non-linear spatial patterns of forest attributes. Forest inventory data showed complex spatial patterns that reflect environmental states and processes that originate at different spatial scales. At broad scales, non-linear spatial trends in forest attributes and mixture of continuous and discrete types of environmental variables make traditional statistical (multivariate regression) and geostatistical (kriging) models inefficient. It calls into question some traditional underlying assumptions of spatial trends that uncritically accepted in forest data. To solve the controversy surrounding the suitability of forest data, regression tree analysis are performed using Software See5 and Cubist. Four publicly available data sets were obtained: First, field-based Forest Inventory and Analysis (USDA
NASA Astrophysics Data System (ADS)
Zhao, Hong; Li, Changjun; Li, Hongping; Lv, Kebo; Zhao, Qinghui
2016-06-01
The sea surface salinity (SSS) is a key parameter in monitoring ocean states. Observing SSS can promote the understanding of global water cycle. This paper provides a new approach for retrieving sea surface salinity from Soil Moisture and Ocean Salinity (SMOS) satellite data. Based on the principal component regression (PCR) model, SSS can also be retrieved from the brightness temperature data of SMOS L2 measurements and Auxiliary data. 26 pair matchup data is used in model validation for the South China Sea (in the area of 4°-25°N, 105°-125°E). The RMSE value of PCR model retrieved SSS reaches 0.37 psu (practical salinity units) and the RMSE of SMOS SSS1 is 1.65 psu when compared with in-situ SSS. The corresponding Argo daily salinity data during April to June 2013 is also used in our validation with RMSE value 0.46 psu compared to 1.82 psu for daily averaged SMOS L2 products. This indicates that the PCR model is valid and may provide us with a good approach for retrieving SSS from SMOS satellite data.
NASA Astrophysics Data System (ADS)
Rajab, Jasim Mohammed; Jafri, Mohd. Zubir Mat; Lim, Hwee San; Abdullah, Khiruddin
2012-10-01
This study encompasses air surface temperature (AST) modeling in the lower atmosphere. Data of four atmosphere pollutant gases (CO, O3, CH4, and H2O) dataset, retrieved from the National Aeronautics and Space Administration Atmospheric Infrared Sounder (AIRS), from 2003 to 2008 was employed to develop a model to predict AST value in the Malaysian peninsula using the multiple regression method. For the entire period, the pollutants were highly correlated (R=0.821) with predicted AST. Comparisons among five stations in 2009 showed close agreement between the predicted AST and the observed AST from AIRS, especially in the southwest monsoon (SWM) season, within 1.3 K, and for in situ data, within 1 to 2 K. The validation results of AST with AST from AIRS showed high correlation coefficient (R=0.845 to 0.918), indicating the model's efficiency and accuracy. Statistical analysis in terms of β showed that H2O (0.565 to 1.746) tended to contribute significantly to high AST values during the northeast monsoon season. Generally, these results clearly indicate the advantage of using the satellite AIRS data and a correlation analysis study to investigate the impact of atmospheric greenhouse gases on AST over the Malaysian peninsula. A model was developed that is capable of retrieving the Malaysian peninsulan AST in all weather conditions, with total uncertainties ranging between 1 and 2 K.
Hu, Qinghua; Zhang, Shiguang; Xie, Zongxia; Mi, Jusheng; Wan, Jie
2014-09-01
Support vector regression (SVR) techniques are aimed at discovering a linear or nonlinear structure hidden in sample data. Most existing regression techniques take the assumption that the error distribution is Gaussian. However, it was observed that the noise in some real-world applications, such as wind power forecasting and direction of the arrival estimation problem, does not satisfy Gaussian distribution, but a beta distribution, Laplacian distribution, or other models. In these cases the current regression techniques are not optimal. According to the Bayesian approach, we derive a general loss function and develop a technique of the uniform model of ν-support vector regression for the general noise model (N-SVR). The Augmented Lagrange Multiplier method is introduced to solve N-SVR. Numerical experiments on artificial data sets, UCI data and short-term wind speed prediction are conducted. The results show the effectiveness of the proposed technique.
Chernev, K; Isa, S; Bakalov, V; Aleksiev, Ch
1990-01-01
The multivariant approach offers best possibilities for assessment of liver function. The role of the different clinical, clinico-laboratory and combined clinical and clinicochemical indices in the prognosis of liver cirrhosis was studied in patient in ambulatory conditions. A step regressive mathematical model with the help of the program 2R of the statistical package BMDP was used. The regression of the clinical indices by 5 steps of the mathematical model showed that of greatest importance for the survival are the following indices: ascites, months since its onset, collaterals, anorexia and vascular nevi. By 4 steps of the regressive model of the clinico-chemical indices the following indices were chosen: prothrombin time, albumin, total bilirubin, cholesterol and alkaline phosphatase. The regression of the combined clinical and clinico-chemical indices pointed out as basic factors 3 clinical indices (ascites, months since its onset, collaterals) and 3 clinico-chemical indices related to the disturbed liver function (prothrombin time, total bilirubin, albumin).
NASA Astrophysics Data System (ADS)
Chu, A.; Zhuang, J.
2015-12-01
Wells and Coppersmith (1994) have used fault data to fit simple linear regression (SLR) models to explain linear relations between moment magnitude and logarithms of fault measurements such as rupture length, rupture width, rupture area and surface displacement. Our work extends their analyses to multiple linear regression (MLR) models by considering two or more predictors with updated data. Treating the quantitative variables (rupture length, rupture width, rupture area and surface displacement) as predictors to fit linear regression models on magnitude, we have discovered that the two-predictor model using rupture area and maximum displacement fits the best. The next best alternative predictors are surface length and rupture area. Neither slip type nor slip direction is a significant predictor by fitting of analysis of variance (ANOVA) and analysis of covariance (ANCOVA) models. Corrected Akaike information criterion (Burnham and Anderson, 2002) is used as a model assessment criterion. Comparisons between simple linear regression models of Wells and Coppersmith (1994) and our multiple linear regression models are presented. Our work is done using fault data from Wells and Coppersmith (1994) and new data from Ellswort (2000), Hanks and Bakun (2002, 2008), Shaw (2013), and Finite-Source Rupture Model Database (http://equake-rc.info/SRCMOD/, 2015).
Optimization of end-members used in multiple linear regression geochemical mixing models
NASA Astrophysics Data System (ADS)
Dunlea, Ann G.; Murray, Richard W.
2015-11-01
Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).
Richardson, David B; Laurier, Dominique; Schubauer-Berigan, Mary K; Tchetgen Tchetgen, Eric; Cole, Stephen R
2014-11-01
Workers' smoking histories are not measured in many occupational cohort studies. Here we discuss the use of negative control outcomes to detect and adjust for confounding in analyses that lack information on smoking. We clarify the assumptions necessary to detect confounding by smoking and the additional assumptions necessary to indirectly adjust for such bias. We illustrate these methods using data from 2 studies of radiation and lung cancer: the Colorado Plateau cohort study (1950-2005) of underground uranium miners (in which smoking was measured) and a French cohort study (1950-2004) of nuclear industry workers (in which smoking was unmeasured). A cause-specific relative hazards model is proposed for estimation of indirectly adjusted associations. Among the miners, the proposed method suggests no confounding by smoking of the association between radon and lung cancer--a conclusion supported by adjustment for measured smoking. Among the nuclear workers, the proposed method suggests substantial confounding by smoking of the association between radiation and lung cancer. Indirect adjustment for confounding by smoking resulted in an 18% decrease in the adjusted estimated hazard ratio, yet this cannot be verified because smoking was unmeasured. Assumptions underlying this method are described, and a cause-specific proportional hazards model that allows easy implementation using standard software is presented.
Richardson, David B.; Laurier, Dominique; Schubauer-Berigan, Mary K.; Tchetgen, Eric Tchetgen; Cole, Stephen R.
2014-01-01
Workers' smoking histories are not measured in many occupational cohort studies. Here we discuss the use of negative control outcomes to detect and adjust for confounding in analyses that lack information on smoking. We clarify the assumptions necessary to detect confounding by smoking and the additional assumptions necessary to indirectly adjust for such bias. We illustrate these methods using data from 2 studies of radiation and lung cancer: the Colorado Plateau cohort study (1950–2005) of underground uranium miners (in which smoking was measured) and a French cohort study (1950–2004) of nuclear industry workers (in which smoking was unmeasured). A cause-specific relative hazards model is proposed for estimation of indirectly adjusted associations. Among the miners, the proposed method suggests no confounding by smoking of the association between radon and lung cancer—a conclusion supported by adjustment for measured smoking. Among the nuclear workers, the proposed method suggests substantial confounding by smoking of the association between radiation and lung cancer. Indirect adjustment for confounding by smoking resulted in an 18% decrease in the adjusted estimated hazard ratio, yet this cannot be verified because smoking was unmeasured. Assumptions underlying this method are described, and a cause-specific proportional hazards model that allows easy implementation using standard software is presented. PMID:25245043
Evaluation of the Stress Adjustment and Adaptation Model among Families Reporting Economic Pressure
ERIC Educational Resources Information Center
Vandsburger, Etty; Biggerstaff, Marilyn A.
2004-01-01
This research evaluates the Stress Adjustment and Adaptation Model (double ABCX model) examining the effects resiliency resources on family functioning when families experience economic pressure. Families (N = 128) with incomes at or below the poverty line from a rural area of a southern state completed measures of perceived economic pressure,…
A Model of Divorce Adjustment for Use in Family Service Agencies.
ERIC Educational Resources Information Center
Faust, Ruth Griffith
1987-01-01
Presents a combined educationally and therapeutically oriented model of treatment to (1) control and lessen disruptive experiences associated with divorce; (2) enable individuals to improve their skill in coping with adjustment reactions to divorce; and (3) modify the pressures and response of single parenthood. Describes the model's four-session…
Modeling Quality-Adjusted Life Expectancy Loss Resulting from Tobacco Use in the United States
ERIC Educational Resources Information Center
Kaplan, Robert M.; Anderson, John P.; Kaplan, Cameron M.
2007-01-01
Purpose: To describe the development of a model for estimating the effects of tobacco use upon Quality Adjusted Life Years (QALYs) and to estimate the impact of tobacco use on health outcomes for the United States (US) population using the model. Method: We obtained estimates of tobacco consumption from 6 years of the National Health Interview…
Brandstätter, Christian; Laner, David; Prantl, Roman; Fellner, Johann
2014-12-01
Municipal solid waste landfills pose a threat on environment and human health, especially old landfills which lack facilities for collection and treatment of landfill gas and leachate. Consequently, missing information about emission flows prevent site-specific environmental risk assessments. To overcome this gap, the combination of waste sampling and analysis with statistical modeling is one option for estimating present and future emission potentials. Optimizing the tradeoff between investigation costs and reliable results requires knowledge about both: the number of samples to be taken and variables to be analyzed. This article aims to identify the optimized number of waste samples and variables in order to predict a larger set of variables. Therefore, we introduce a multivariate linear regression model and tested the applicability by usage of two case studies. Landfill A was used to set up and calibrate the model based on 50 waste samples and twelve variables. The calibrated model was applied to Landfill B including 36 waste samples and twelve variables with four predictor variables. The case study results are twofold: first, the reliable and accurate prediction of the twelve variables can be achieved with the knowledge of four predictor variables (Loi, EC, pH and Cl). For the second Landfill B, only ten full measurements would be needed for a reliable prediction of most response variables. The four predictor variables would exhibit comparably low analytical costs in comparison to the full set of measurements. This cost reduction could be used to increase the number of samples yielding an improved understanding of the spatial waste heterogeneity in landfills. Concluding, the future application of the developed model potentially improves the reliability of predicted emission potentials. The model could become a standard screening tool for old landfills if its applicability and reliability would be tested in additional case studies.
Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA.
Mair, Alan; El-Kadi, Aly I
2013-10-01
Capture zone analysis combined with a subjective susceptibility index is currently used in Hawaii to assess vulnerability to contamination of drinking water sources derived from groundwater. In this study, we developed an alternative objective approach that combines well capture zones with multiple-variable logistic regression (LR) modeling and applied it to the highly-utilized Pearl Harbor and Honolulu aquifers on the island of Oahu, Hawaii. Input for the LR models utilized explanatory variables based on hydrogeology, land use, and well geometry/location. A suite of 11 target contaminants detected in the region, including elevated nitrate (>1 mg/L), four chlorinated solvents, four agricultural fumigants, and two pesticides, was used to develop the models. We then tested the ability of the new approach to accurately separate groups of wells with low and high vulnerability, and the suitability of nitrate as an indicator of other types of contamination. Our results produced contaminant-specific LR models that accurately identified groups of wells with the lowest/highest reported detections and the lowest/highest nitrate concentrations. Current and former agricultural land uses were identified as significant explanatory variables for eight of the 11 target contaminants, while elevated nitrate was a significant variable for five contaminants. The utility of the combined approach is contingent on the availability of hydrologic and chemical monitoring data for calibrating groundwater and LR models. Application of the approach using a reference site with sufficient data could help identify key variables in areas with similar hydrogeology and land use but limited data. In addition, elevated nitrate may also be a suitable indicator of groundwater contamination in areas with limited data. The objective LR modeling approach developed in this study is flexible enough to address a wide range of contaminants and represents a suitable addition to the current subjective approach.
Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA
NASA Astrophysics Data System (ADS)
Mair, Alan; El-Kadi, Aly I.
2013-10-01
Capture zone analysis combined with a subjective susceptibility index is currently used in Hawaii to assess vulnerability to contamination of drinking water sources derived from groundwater. In this study, we developed an alternative objective approach that combines well capture zones with multiple-variable logistic regression (LR) modeling and applied it to the highly-utilized Pearl Harbor and Honolulu aquifers on the island of Oahu, Hawaii. Input for the LR models utilized explanatory variables based on hydrogeology, land use, and well geometry/location. A suite of 11 target contaminants detected in the region, including elevated nitrate (> 1 mg/L), four chlorinated solvents, four agricultural fumigants, and two pesticides, was used to develop the models. We then tested the ability of the new approach to accurately separate groups of wells with low and high vulnerability, and the suitability of nitrate as an indicator of other types of contamination. Our results produced contaminant-specific LR models that accurately identified groups of wells with the lowest/highest reported detections and the lowest/highest nitrate concentrations. Current and former agricultural land uses were identified as significant explanatory variables for eight of the 11 target contaminants, while elevated nitrate was a significant variable for five contaminants. The utility of the combined approach is contingent on the availability of hydrologic and chemical monitoring data for calibrating groundwater and LR models. Application of the approach using a reference site with sufficient data could help identify key variables in areas with similar hydrogeology and land use but limited data. In addition, elevated nitrate may also be a suitable indicator of groundwater contamination in areas with limited data. The objective LR modeling approach developed in this study is flexible enough to address a wide range of contaminants and represents a suitable addition to the current subjective approach.
Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.
Hansen, John H L; Williams, Keri; Bořil, Hynek
2015-08-01
Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error as well a combined fusion of the two systems using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve a highly competitive performance to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggest a considerable estimation error decrease compared to past efforts.
Brakebill, J.W.; Wolock, D.M.; Terziotti, S.E.
2011-01-01
Digital hydrologic networks depicting surface-water pathways and their associated drainage catchments provide a key component to hydrologic analysis and modeling. Collectively, they form common spatial units that can be used to frame the descriptions of aquatic and watershed processes. In addition, they provide the ability to simulate and route the movement of water and associated constituents throughout the landscape. Digital hydrologic networks have evolved from derivatives of mapping products to detailed, interconnected, spatially referenced networks of water pathways, drainage areas, and stream and watershed characteristics. These properties are important because they enhance the ability to spatially evaluate factors that affect the sources and transport of water-quality constituents at various scales. SPAtially Referenced Regressions On Watershed attributes (SPARROW), a process-based/statistical model, relies on a digital hydrologic network in order to establish relations between quantities of monitored contaminant flux, contaminant sources, and the associated physical characteristics affecting contaminant transport. Digital hydrologic networks modified from the River Reach File (RF1) and National Hydrography Dataset (NHD) geospatial datasets provided frameworks for SPARROW in six regions of the conterminous United States. In addition, characteristics of the modified RF1 were used to update estimates of mean-annual streamflow. This produced more current flow estimates for use in SPARROW modeling. ?? 2011 American Water Resources Association. This article is a U.S. Government work and is in the public domain in the USA.
Brakebill, John W.; Wolock, David M.; Terziotti, Silvia
2011-01-01
Digital hydrologic networks depicting surface-water pathways and their associated drainage catchments provide a key component to hydrologic analysis and modeling. Collectively, they form common spatial units that can be used to frame the descriptions of aquatic and watershed processes. In addition, they provide the ability to simulate and route the movement of water and associated constituents throughout the landscape. Digital hydrologic networks have evolved from derivatives of mapping products to detailed, interconnected, spatially referenced networks of water pathways, drainage areas, and stream and watershed characteristics. These properties are important because they enhance the ability to spatially evaluate factors that affect the sources and transport of water-quality constituents at various scales. SPAtially Referenced Regressions On Watershed attributes (SPARROW), a process-based ⁄ statistical model, relies on a digital hydrologic network in order to establish relations between quantities of monitored contaminant flux, contaminant sources, and the associated physical characteristics affecting contaminant transport. Digital hydrologic networks modified from the River Reach File (RF1) and National Hydrography Dataset (NHD) geospatial datasets provided frameworks for SPARROW in six regions of the conterminous United States. In addition, characteristics of the modified RF1 were used to update estimates of mean-annual streamflow. This produced more current flow estimates for use in SPARROW modeling.
Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes
NASA Astrophysics Data System (ADS)
Faranda, Davide; Pons, Flavio Maria Emanuele; Dubrulle, Bérengère; Daviaud, François; Saint-Michel, Brice; Herbert, Éric; Cortet, Pierre-Philippe
2014-10-01
We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the Υ is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system.
NASA Astrophysics Data System (ADS)
Winahju, W. S.; Mukarromah, A.; Putri, S.
2015-03-01
Leprosy is a chronic infectious disease caused by bacteria of leprosy (Mycobacterium leprae). Leprosy has become an important thing in Indonesia because its morbidity is quite high. Based on WHO data in 2014, in 2012 Indonesia has the highest number of new leprosy patients after India and Brazil with a contribution of 18.994 people (8.7% of the world). This number makes Indonesia automatically placed as the country with the highest number of leprosy morbidity of ASEAN countries. The province that most contributes to the number of leprosy patients in Indonesia is East Java. There are two kind of leprosy. They consist of pausibacillary and multibacillary. The morbidity of multibacillary leprosy is higher than pausibacillary leprosy. This paper will discuss modeling both of the number of multibacillary and pausibacillary leprosy patients as responses variables. These responses are count variables, so modeling will be conducted by using bivariate poisson regression method. Unit experiment used is in East Java, and predictors involved are: environment, demography, and poverty. The model uses data in 2012, and the result indicates that all predictors influence significantly.
Kernel-based logistic regression model for protein sequence without vectorialization.
Fong, Youyi; Datta, Saheli; Georgiev, Ivelin S; Kwong, Peter D; Tomaras, Georgia D
2015-07-01
Protein sequence data arise more and more often in vaccine and infectious disease research. These types of data are discrete, high-dimensional, and complex. We propose to study the impact of protein sequences on binary outcomes using a kernel-based logistic regression model, which models the effect of protein through a random effect whose variance-covariance matrix is mostly determined by a kernel function. We propose a novel, biologically motivated, profile hidden Markov model (HMM)-based mutual information (MI) kernel. Hypothesis testing can be carried out using the maximum of the score statistics and a parametric bootstrap procedure. To improve the power of testing, we propose intuitive modifications to the test statistic. We show through simulation studies that the profile HMM-based MI kernel can be substantially more powerful than competing kernels, and that the modified test statistics bring incremental gains in power. We use these proposed methods to investigate two problems from HIV-1 vaccine research: (1) identifying segments of HIV-1 envelope (Env) protein that confer resistance to neutralizing antibody and (2) identifying segments of Env that are associated with attenuation of protective vaccine effect by antibodies of isotype A in the RV144 vaccine trial.
Brakebill, JW; Wolock, DM; Terziotti, SE
2011-01-01
Abstract Digital hydrologic networks depicting surface-water pathways and their associated drainage catchments provide a key component to hydrologic analysis and modeling. Collectively, they form common spatial units that can be used to frame the descriptions of aquatic and watershed processes. In addition, they provide the ability to simulate and route the movement of water and associated constituents throughout the landscape. Digital hydrologic networks have evolved from derivatives of mapping products to detailed, interconnected, spatially referenced networks of water pathways, drainage areas, and stream and watershed characteristics. These properties are important because they enhance the ability to spatially evaluate factors that affect the sources and transport of water-quality constituents at various scales. SPAtially Referenced Regressions On Watershed attributes (SPARROW), a process-based/statistical model, relies on a digital hydrologic network in order to establish relations between quantities of monitored contaminant flux, contaminant sources, and the associated physical characteristics affecting contaminant transport. Digital hydrologic networks modified from the River Reach File (RF1) and National Hydrography Dataset (NHD) geospatial datasets provided frameworks for SPARROW in six regions of the conterminous United States. In addition, characteristics of the modified RF1 were used to update estimates of mean-annual streamflow. This produced more current flow estimates for use in SPARROW modeling. PMID:22457575
Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models.
Hansen, John H L; Williams, Keri; Bořil, Hynek
2015-08-01
Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error as well a combined fusion of the two systems using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve a highly competitive performance to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggest a considerable estimation error decrease compared to past efforts. PMID:26328721
Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes
Faranda, Davide Dubrulle, Bérengère; Daviaud, François; Pons, Flavio Maria Emanuele; Saint-Michel, Brice; Herbert, Éric; Cortet, Pierre-Philippe
2014-10-15
We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the Υ is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system.
Canaza-Cayo, Ali William; Lopes, Paulo Sávio; da Silva, Marcos Vinicius Gualberto Barbosa; de Almeida Torres, Robledo; Martins, Marta Fonseca; Arbex, Wagner Antonio; Cobuci, Jaime Araujo
2015-01-01
A total of 32,817 test-day milk yield (TDMY) records of the first lactation of 4,056 Girolando cows daughters of 276 sires, collected from 118 herds between 2000 and 2011 were utilized to estimate the genetic parameters for TDMY via random regression models (RRM) using Legendre’s polynomial functions whose orders varied from 3 to 5. In addition, nine measures of persistency in milk yield (PSi) and the genetic trend of 305-day milk yield (305MY) were evaluated. The fit quality criteria used indicated RRM employing the Legendre’s polynomial of orders 3 and 5 for fitting the genetic additive and permanent environment effects, respectively, as the best model. The heritability and genetic correlation for TDMY throughout the lactation, obtained with the best model, varied from 0.18 to 0.23 and from −0.03 to 1.00, respectively. The heritability and genetic correlation for persistency and 305MY varied from 0.10 to 0.33 and from −0.98 to 1.00, respectively. The use of PS7 would be the most suitable option for the evaluation of Girolando cattle. The estimated breeding values for 305MY of sires and cows showed significant and positive genetic trends. Thus, the use of selection indices would be indicated in the genetic evaluation of Girolando cattle for both traits. PMID:26323397