Sample records for estimate regression parameters

  1. Application of nonlinear least-squares regression to ground-water flow modeling, west-central Florida

    USGS Publications Warehouse

    Yobbi, D.K.

    2000-01-01

    A nonlinear least-squares regression technique for estimation of ground-water flow model parameters was applied to an existing model of the regional aquifer system underlying west-central Florida. The regression technique minimizes the differences between measured and simulated water levels. Regression statistics, including parameter sensitivities and correlations, were calculated for reported parameter values in the existing model. Optimal parameter values for selected hydrologic variables of interest are estimated by nonlinear regression. Optimal estimates of parameter values are about 140 times greater than and about 0.01 times less than reported values. Independently estimating all parameters by nonlinear regression was impossible, given the existing zonation structure and number of observations, because of parameter insensitivity and correlation. Although the model yields parameter values similar to those estimated by other methods and reproduces the measured water levels reasonably accurately, a simpler parameter structure should be considered. Some possible ways of improving model calibration are to: (1) modify the defined parameter-zonation structure by omitting and/or combining parameters to be estimated; (2) carefully eliminate observation data based on evidence that they are likely to be biased; (3) collect additional water-level data; (4) assign values to insensitive parameters, and (5) estimate the most sensitive parameters first, then, using the optimized values for these parameters, estimate the entire data set.

  2. Two-dimensional advective transport in ground-water flow parameter estimation

    USGS Publications Warehouse

    Anderman, E.R.; Hill, M.C.; Poeter, E.P.

    1996-01-01

    Nonlinear regression is useful in ground-water flow parameter estimation, but problems of parameter insensitivity and correlation often exist given commonly available hydraulic-head and head-dependent flow (for example, stream and lake gain or loss) observations. To address this problem, advective-transport observations are added to the ground-water flow, parameter-estimation model MODFLOWP using particle-tracking methods. The resulting model is used to investigate the importance of advective-transport observations relative to head-dependent flow observations when either or both are used in conjunction with hydraulic-head observations in a simulation of the sewage-discharge plume at Otis Air Force Base, Cape Cod, Massachusetts, USA. The analysis procedure for evaluating the probable effect of new observations on the regression results consists of two steps: (1) parameter sensitivities and correlations calculated at initial parameter values are used to assess the model parameterization and expected relative contributions of different types of observations to the regression; and (2) optimal parameter values are estimated by nonlinear regression and evaluated. In the Cape Cod parameter-estimation model, advective-transport observations did not significantly increase the overall parameter sensitivity; however: (1) inclusion of advective-transport observations decreased parameter correlation enough for more unique parameter values to be estimated by the regression; (2) realistic uncertainties in advective-transport observations had a small effect on parameter estimates relative to the precision with which the parameters were estimated; and (3) the regression results and sensitivity analysis provided insight into the dynamics of the ground-water flow system, especially the importance of accurate boundary conditions. In this work, advective-transport observations improved the calibration of the model and the estimation of ground-water flow parameters, and use of regression and related techniques produced significant insight into the physical system.

  3. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.

  4. Fuzzy multinomial logistic regression analysis: A multi-objective programming approach

    NASA Astrophysics Data System (ADS)

    Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan

    2017-05-01

    Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, specially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus Maximum likelihood (ML) approach. Results show that the new proposed model outperforms ML in cases of small datasets.

  5. Incorporation of prior information on parameters into nonlinear regression groundwater flow models: 2. Applications

    USGS Publications Warehouse

    Cooley, Richard L.

    1983-01-01

    This paper investigates factors influencing the degree of improvement in estimates of parameters of a nonlinear regression groundwater flow model by incorporating prior information of unknown reliability. Consideration of expected behavior of the regression solutions and results of a hypothetical modeling problem lead to several general conclusions. First, if the parameters are properly scaled, linearized expressions for the mean square error (MSE) in parameter estimates of a nonlinear model will often behave very nearly as if the model were linear. Second, by using prior information, the MSE in properly scaled parameters can be reduced greatly over the MSE of ordinary least squares estimates of parameters. Third, plots of estimated MSE and the estimated standard deviation of MSE versus an auxiliary parameter (the ridge parameter) specifying the degree of influence of the prior information on regression results can help determine the potential for improvement of parameter estimates. Fourth, proposed criteria can be used to make appropriate choices for the ridge parameter and another parameter expressing degree of overall bias in the prior information. Results of a case study of Truckee Meadows, Reno-Sparks area, Washoe County, Nevada, conform closely to the results of the hypothetical problem. In the Truckee Meadows case, incorporation of prior information did not greatly change the parameter estimates from those obtained by ordinary least squares. However, the analysis showed that both sets of estimates are more reliable than suggested by the standard errors from ordinary least squares.

  6. Incorporation of prior information on parameters into nonlinear regression groundwater flow models: 1. Theory

    USGS Publications Warehouse

    Cooley, Richard L.

    1982-01-01

    Prior information on the parameters of a groundwater flow model can be used to improve parameter estimates obtained from nonlinear regression solution of a modeling problem. Two scales of prior information can be available: (1) prior information having known reliability (that is, bias and random error structure) and (2) prior information consisting of best available estimates of unknown reliability. A regression method that incorporates the second scale of prior information assumes the prior information to be fixed for any particular analysis to produce improved, although biased, parameter estimates. Approximate optimization of two auxiliary parameters of the formulation is used to help minimize the bias, which is almost always much smaller than that resulting from standard ridge regression. It is shown that if both scales of prior information are available, then a combined regression analysis may be made.

  7. Estimation of Ordinary Differential Equation Parameters Using Constrained Local Polynomial Regression.

    PubMed

    Ding, A Adam; Wu, Hulin

    2014-10-01

    We propose a new method to use a constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models with a goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters in the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy with a small price of computational cost. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method.

  8. Estimation of Ordinary Differential Equation Parameters Using Constrained Local Polynomial Regression

    PubMed Central

    Ding, A. Adam; Wu, Hulin

    2015-01-01

    We propose a new method to use a constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models with a goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters in the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy with a small price of computational cost. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method. PMID:26401093

  9. A note on variance estimation in random effects meta-regression.

    PubMed

    Sidik, Kurex; Jonkman, Jeffrey N

    2005-01-01

    For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.

  10. Independent contrasts and PGLS regression estimators are equivalent.

    PubMed

    Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary

    2012-05-01

    We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.

  11. A covariance correction that accounts for correlation estimation to improve finite-sample inference with generalized estimating equations: A study on its applicability with structured correlation matrices

    PubMed Central

    Westgate, Philip M.

    2016-01-01

    When generalized estimating equations (GEE) incorporate an unstructured working correlation matrix, the variances of regression parameter estimates can inflate due to the estimation of the correlation parameters. In previous work, an approximation for this inflation that results in a corrected version of the sandwich formula for the covariance matrix of regression parameter estimates was derived. Use of this correction for correlation structure selection also reduces the over-selection of the unstructured working correlation matrix. In this manuscript, we conduct a simulation study to demonstrate that an increase in variances of regression parameter estimates can occur when GEE incorporates structured working correlation matrices as well. Correspondingly, we show the ability of the corrected version of the sandwich formula to improve the validity of inference and correlation structure selection. We also study the relative influences of two popular corrections to a different source of bias in the empirical sandwich covariance estimator. PMID:27818539

  12. A covariance correction that accounts for correlation estimation to improve finite-sample inference with generalized estimating equations: A study on its applicability with structured correlation matrices.

    PubMed

    Westgate, Philip M

    2016-01-01

    When generalized estimating equations (GEE) incorporate an unstructured working correlation matrix, the variances of regression parameter estimates can inflate due to the estimation of the correlation parameters. In previous work, an approximation for this inflation that results in a corrected version of the sandwich formula for the covariance matrix of regression parameter estimates was derived. Use of this correction for correlation structure selection also reduces the over-selection of the unstructured working correlation matrix. In this manuscript, we conduct a simulation study to demonstrate that an increase in variances of regression parameter estimates can occur when GEE incorporates structured working correlation matrices as well. Correspondingly, we show the ability of the corrected version of the sandwich formula to improve the validity of inference and correlation structure selection. We also study the relative influences of two popular corrections to a different source of bias in the empirical sandwich covariance estimator.

  13. Lateral-Directional Parameter Estimation on the X-48B Aircraft Using an Abstracted, Multi-Objective Effector Model

    NASA Technical Reports Server (NTRS)

    Ratnayake, Nalin A.; Waggoner, Erin R.; Taylor, Brian R.

    2011-01-01

    The problem of parameter estimation on hybrid-wing-body aircraft is complicated by the fact that many design candidates for such aircraft involve a large number of aerodynamic control effectors that act in coplanar motion. This adds to the complexity already present in the parameter estimation problem for any aircraft with a closed-loop control system. Decorrelation of flight and simulation data must be performed in order to ascertain individual surface derivatives with any sort of mathematical confidence. Non-standard control surface configurations, such as clamshell surfaces and drag-rudder modes, further complicate the modeling task. In this paper, time-decorrelation techniques are applied to a model structure selected through stepwise regression for simulated and flight-generated lateral-directional parameter estimation data. A virtual effector model that uses mathematical abstractions to describe the multi-axis effects of clamshell surfaces is developed and applied. Comparisons are made between time history reconstructions and observed data in order to assess the accuracy of the regression model. The Cram r-Rao lower bounds of the estimated parameters are used to assess the uncertainty of the regression model relative to alternative models. Stepwise regression was found to be a useful technique for lateral-directional model design for hybrid-wing-body aircraft, as suggested by available flight data. Based on the results of this study, linear regression parameter estimation methods using abstracted effectors are expected to perform well for hybrid-wing-body aircraft properly equipped for the task.

  14. Comparison of Regression Analysis and Transfer Function in Estimating the Parameters of Central Pulse Waves from Brachial Pulse Wave.

    PubMed

    Chai, Rui; Xu, Li-Sheng; Yao, Yang; Hao, Li-Ling; Qi, Lin

    2017-01-01

    This study analyzed ascending branch slope (A_slope), dicrotic notch height (Hn), diastolic area (Ad) and systolic area (As) diastolic blood pressure (DBP), systolic blood pressure (SBP), pulse pressure (PP), subendocardial viability ratio (SEVR), waveform parameter (k), stroke volume (SV), cardiac output (CO), and peripheral resistance (RS) of central pulse wave invasively and non-invasively measured. Invasively measured parameters were compared with parameters measured from brachial pulse waves by regression model and transfer function model. Accuracy of parameters estimated by regression and transfer function model, was compared too. Findings showed that k value, central pulse wave and brachial pulse wave parameters invasively measured, correlated positively. Regression model parameters including A_slope, DBP, SEVR, and transfer function model parameters had good consistency with parameters invasively measured. They had same effect of consistency. SBP, PP, SV, and CO could be calculated through the regression model, but their accuracies were worse than that of transfer function model.

  15. Further comments on sensitivities, parameter estimation, and sampling design in one-dimensional analysis of solute transport in porous media

    USGS Publications Warehouse

    Knopman, Debra S.; Voss, Clifford I.

    1988-01-01

    Sensitivities of solute concentration to parameters associated with first-order chemical decay, boundary conditions, initial conditions, and multilayer transport are examined in one-dimensional analytical models of transient solute transport in porous media. A sensitivity is a change in solute concentration resulting from a change in a model parameter. Sensitivity analysis is important because minimum information required in regression on chemical data for the estimation of model parameters by regression is expressed in terms of sensitivities. Nonlinear regression models of solute transport were tested on sets of noiseless observations from known models that exceeded the minimum sensitivity information requirements. Results demonstrate that the regression models consistently converged to the correct parameters when the initial sets of parameter values substantially deviated from the correct parameters. On the basis of the sensitivity analysis, several statements may be made about design of sampling for parameter estimation for the models examined: (1) estimation of parameters associated with solute transport in the individual layers of a multilayer system is possible even when solute concentrations in the individual layers are mixed in an observation well; (2) when estimating parameters in a decaying upstream boundary condition, observations are best made late in the passage of the front near a time chosen by adding the inverse of an hypothesized value of the source decay parameter to the estimated mean travel time at a given downstream location; (3) estimation of a first-order chemical decay parameter requires observations to be made late in the passage of the front, preferably near a location corresponding to a travel time of √2 times the half-life of the solute; and (4) estimation of a parameter relating to spatial variability in an initial condition requires observations to be made early in time relative to passage of the solute front.

  16. a Comparison Between Two Ols-Based Approaches to Estimating Urban Multifractal Parameters

    NASA Astrophysics Data System (ADS)

    Huang, Lin-Shan; Chen, Yan-Guang

    Multifractal theory provides a new spatial analytical tool for urban studies, but many basic problems remain to be solved. Among various pending issues, the most significant one is how to obtain proper multifractal dimension spectrums. If an algorithm is improperly used, the parameter spectrums will be abnormal. This paper is devoted to investigating two ordinary least squares (OLS)-based approaches for estimating urban multifractal parameters. Using empirical study and comparative analysis, we demonstrate how to utilize the adequate linear regression to calculate multifractal parameters. The OLS regression analysis has two different approaches. One is that the intercept is fixed to zero, and the other is that the intercept is not limited. The results of comparative study show that the zero-intercept regression yields proper multifractal parameter spectrums within certain scale range of moment order, while the common regression method often leads to abnormal multifractal parameter values. A conclusion can be reached that fixing the intercept to zero is a more advisable regression method for multifractal parameters estimation, and the shapes of spectral curves and value ranges of fractal parameters can be employed to diagnose urban problems. This research is helpful for scientists to understand multifractal models and apply a more reasonable technique to multifractal parameter calculations.

  17. Detecting influential observations in nonlinear regression modeling of groundwater flow

    USGS Publications Warehouse

    Yager, Richard M.

    1998-01-01

    Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.

  18. Deletion Diagnostics for Alternating Logistic Regressions

    PubMed Central

    Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.

    2013-01-01

    Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulations studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960

  19. Moderation analysis using a two-level regression model.

    PubMed

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.

  20. Estimation of octanol/water partition coefficients using LSER parameters

    USGS Publications Warehouse

    Luehrs, Dean C.; Hickey, James P.; Godbole, Kalpana A.; Rogers, Tony N.

    1998-01-01

    The logarithms of octanol/water partition coefficients, logKow, were regressed against the linear solvation energy relationship (LSER) parameters for a training set of 981 diverse organic chemicals. The standard deviation for logKow was 0.49. The regression equation was then used to estimate logKow for a test of 146 chemicals which included pesticides and other diverse polyfunctional compounds. Thus the octanol/water partition coefficient may be estimated by LSER parameters without elaborate software but only moderate accuracy should be expected.

  1. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    NASA Astrophysics Data System (ADS)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.

  2. Censored Hurdle Negative Binomial Regression (Case Study: Neonatorum Tetanus Case in Indonesia)

    NASA Astrophysics Data System (ADS)

    Yuli Rusdiana, Riza; Zain, Ismaini; Wulan Purnami, Santi

    2017-06-01

    Hurdle negative binomial model regression is a method that can be used for discreate dependent variable, excess zero and under- and overdispersion. It uses two parts approach. The first part estimates zero elements from dependent variable is zero hurdle model and the second part estimates not zero elements (non-negative integer) from dependent variable is called truncated negative binomial models. The discrete dependent variable in such cases is censored for some values. The type of censor that will be studied in this research is right censored. This study aims to obtain the parameter estimator hurdle negative binomial regression for right censored dependent variable. In the assessment of parameter estimation methods used Maximum Likelihood Estimator (MLE). Hurdle negative binomial model regression for right censored dependent variable is applied on the number of neonatorum tetanus cases in Indonesia. The type data is count data which contains zero values in some observations and other variety value. This study also aims to obtain the parameter estimator and test statistic censored hurdle negative binomial model. Based on the regression results, the factors that influence neonatorum tetanus case in Indonesia is the percentage of baby health care coverage and neonatal visits.

  3. A robust ridge regression approach in the presence of both multicollinearity and outliers in the data

    NASA Astrophysics Data System (ADS)

    Shariff, Nurul Sima Mohamad; Ferdaos, Nur Aqilah

    2017-08-01

    Multicollinearity often leads to inconsistent and unreliable parameter estimates in regression analysis. This situation will be more severe in the presence of outliers it will cause fatter tails in the error distributions than the normal distributions. The well-known procedure that is robust to multicollinearity problem is the ridge regression method. This method however is expected to be affected by the presence of outliers due to some assumptions imposed in the modeling procedure. Thus, the robust version of existing ridge method with some modification in the inverse matrix and the estimated response value is introduced. The performance of the proposed method is discussed and comparisons are made with several existing estimators namely, Ordinary Least Squares (OLS), ridge regression and robust ridge regression based on GM-estimates. The finding of this study is able to produce reliable parameter estimates in the presence of both multicollinearity and outliers in the data.

  4. Regression analysis and transfer function in estimating the parameters of central pulse waves from brachial pulse wave.

    PubMed

    Chai Rui; Li Si-Man; Xu Li-Sheng; Yao Yang; Hao Li-Ling

    2017-07-01

    This study mainly analyzed the parameters such as ascending branch slope (A_slope), dicrotic notch height (Hn), diastolic area (Ad) and systolic area (As) diastolic blood pressure (DBP), systolic blood pressure (SBP), pulse pressure (PP), subendocardial viability ratio (SEVR), waveform parameter (k), stroke volume (SV), cardiac output (CO) and peripheral resistance (RS) of central pulse wave invasively and non-invasively measured. These parameters extracted from the central pulse wave invasively measured were compared with the parameters measured from the brachial pulse waves by a regression model and a transfer function model. The accuracy of the parameters which were estimated by the regression model and the transfer function model was compared too. Our findings showed that in addition to the k value, the above parameters of the central pulse wave and the brachial pulse wave invasively measured had positive correlation. Both the regression model parameters including A_slope, DBP, SEVR and the transfer function model parameters had good consistency with the parameters invasively measured, and they had the same effect of consistency. The regression equations of the three parameters were expressed by Y'=a+bx. The SBP, PP, SV, CO of central pulse wave could be calculated through the regression model, but their accuracies were worse than that of transfer function model.

  5. Heritability estimations for inner muscular fat in Hereford cattle using random regressions

    USDA-ARS?s Scientific Manuscript database

    Random regressions make possible to make genetic predictions and parameters estimation across a gradient of environments, allowing a more accurate and beneficial use of animals as breeders in specific environments. The objective of this study was to use random regression models to estimate heritabil...

  6. Estimating parameters for tree basal area growth with a system of equations and seemingly unrelated regressions

    Treesearch

    Charles E. Rose; Thomas B. Lynch

    2001-01-01

    A method was developed for estimating parameters in an individual tree basal area growth model using a system of equations based on dbh rank classes. The estimation method developed is a compromise between an individual tree and a stand level basal area growth model that accounts for the correlation between trees within a plot by using seemingly unrelated regression (...

  7. Quantum State Tomography via Linear Regression Estimation

    PubMed Central

    Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan

    2013-01-01

    A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d4) where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519

  8. The effect of high leverage points on the logistic ridge regression estimator having multicollinearity

    NASA Astrophysics Data System (ADS)

    Ariffin, Syaiba Balqish; Midi, Habshah

    2014-06-01

    This article is concerned with the performance of logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity which cause regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach for handling multicollinearity. The effect of high leverage points are then investigated on the performance of the logistic ridge regression estimator through real data set and simulation study. The findings signify that logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.

  9. A statistical methodology for estimating transport parameters: Theory and applications to one-dimensional advectivec-dispersive systems

    USGS Publications Warehouse

    Wagner, Brian J.; Gorelick, Steven M.

    1986-01-01

    A simulation nonlinear multiple-regression methodology for estimating parameters that characterize the transport of contaminants is developed and demonstrated. Finite difference contaminant transport simulation is combined with a nonlinear weighted least squares multiple-regression procedure. The technique provides optimal parameter estimates and gives statistics for assessing the reliability of these estimates under certain general assumptions about the distributions of the random measurement errors. Monte Carlo analysis is used to estimate parameter reliability for a hypothetical homogeneous soil column for which concentration data contain large random measurement errors. The value of data collected spatially versus data collected temporally was investigated for estimation of velocity, dispersion coefficient, effective porosity, first-order decay rate, and zero-order production. The use of spatial data gave estimates that were 2–3 times more reliable than estimates based on temporal data for all parameters except velocity. Comparison of estimated linear and nonlinear confidence intervals based upon Monte Carlo analysis showed that the linear approximation is poor for dispersion coefficient and zero-order production coefficient when data are collected over time. In addition, examples demonstrate transport parameter estimation for two real one-dimensional systems. First, the longitudinal dispersivity and effective porosity of an unsaturated soil are estimated using laboratory column data. We compare the reliability of estimates based upon data from individual laboratory experiments versus estimates based upon pooled data from several experiments. Second, the simulation nonlinear regression procedure is extended to include an additional governing equation that describes delayed storage during contaminant transport. The model is applied to analyze the trends, variability, and interrelationship of parameters in a mourtain stream in northern California.

  10. Exact and Approximate Statistical Inference for Nonlinear Regression and the Estimating Equation Approach.

    PubMed

    Demidenko, Eugene

    2017-09-01

    The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike other approximations previously suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.

  11. Logistic regression for circular data

    NASA Astrophysics Data System (ADS)

    Al-Daffaie, Kadhem; Khan, Shahjahan

    2017-05-01

    This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.

  12. Weighted regression analysis and interval estimators

    Treesearch

    Donald W. Seegrist

    1974-01-01

    A method for deriving the weighted least squares estimators for the parameters of a multiple regression model. Confidence intervals for expected values, and prediction intervals for the means of future samples are given.

  13. Robust ridge regression estimators for nonlinear models with applications to high throughput screening assay data.

    PubMed

    Lim, Changwon

    2015-03-30

    Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. Copyright © 2014 John Wiley & Sons, Ltd.

  14. Use of empirical likelihood to calibrate auxiliary information in partly linear monotone regression models.

    PubMed

    Chen, Baojiang; Qin, Jing

    2014-05-10

    In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.

  15. Estimation of distributional parameters for censored trace level water quality data: 1. Estimation techniques

    USGS Publications Warehouse

    Gilliom, Robert J.; Helsel, Dennis R.

    1986-01-01

    A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.

  16. Estimation of distributional parameters for censored trace level water quality data. 1. Estimation Techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilliom, R.J.; Helsel, D.R.

    1986-02-01

    A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensoredmore » observations, for determining the best performing parameter estimation method for any particular data det. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.« less

  17. Estimation of regression laws for ground motion parameters using as case of study the Amatrice earthquake

    NASA Astrophysics Data System (ADS)

    Tiberi, Lara; Costa, Giovanni

    2017-04-01

    The possibility to directly associate the damages to the ground motion parameters is always a great challenge, in particular for civil protections. Indeed a ground motion parameter, estimated in near real time that can express the damages occurred after an earthquake, is fundamental to arrange the first assistance after an event. The aim of this work is to contribute to the estimation of the ground motion parameter that better describes the observed intensity, immediately after an event. This can be done calculating for each ground motion parameter estimated in a near real time mode a regression law which correlates the above-mentioned parameter to the observed macro-seismic intensity. This estimation is done collecting high quality accelerometric data in near field, filtering them at different frequency steps. The regression laws are calculated using two different techniques: the non linear least-squares (NLLS) Marquardt-Levenberg algorithm and the orthogonal distance methodology (ODR). The limits of the first methodology are the needed of initial values for the parameters a and b (set 1.0 in this study), and the constraint that the independent variable must be known with greater accuracy than the dependent variable. While the second algorithm is based on the estimation of the errors perpendicular to the line, rather than just vertically. The vertical errors are just the errors in the 'y' direction, so only for the dependent variable whereas the perpendicular errors take into account errors for both the variables, the dependent and the independent. This makes possible also to directly invert the relation, so the a and b values can be used also to express the gmps as function of I. For each law the standard deviation and R2 value are estimated in order to test the quality and the reliability of the found relation. The Amatrice earthquake of 24th August of 2016 is used as case of study to test the goodness of the calculated regression laws.

  18. Logistic regression for dichotomized counts.

    PubMed

    Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

    2016-12-01

    Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.

  19. Estimation of distributional parameters for censored trace-level water-quality data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilliom, R.J.; Helsel, D.R.

    1984-01-01

    A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water-sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations,more » for determining the best-performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least-squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification. 6 figs., 6 tabs.« less

  20. An analysis of input errors in precipitation-runoff models using regression with errors in the independent variables

    USGS Publications Warehouse

    Troutman, Brent M.

    1982-01-01

    Errors in runoff prediction caused by input data errors are analyzed by treating precipitation-runoff models as regression (conditional expectation) models. Independent variables of the regression consist of precipitation and other input measurements; the dependent variable is runoff. In models using erroneous input data, prediction errors are inflated and estimates of expected storm runoff for given observed input variables are biased. This bias in expected runoff estimation results in biased parameter estimates if these parameter estimates are obtained by a least squares fit of predicted to observed runoff values. The problems of error inflation and bias are examined in detail for a simple linear regression of runoff on rainfall and for a nonlinear U.S. Geological Survey precipitation-runoff model. Some implications for flood frequency analysis are considered. A case study using a set of data from Turtle Creek near Dallas, Texas illustrates the problems of model input errors.

  1. Estimation of end point foot clearance points from inertial sensor data.

    PubMed

    Santhiranayagam, Braveena K; Lai, Daniel T H; Begg, Rezaul K; Palaniswami, Marimuthu

    2011-01-01

    Foot clearance parameters provide useful insight into tripping risks during walking. This paper proposes a technique for the estimate of key foot clearance parameters using inertial sensor (accelerometers and gyroscopes) data. Fifteen features were extracted from raw inertial sensor measurements, and a regression model was used to estimate two key foot clearance parameters: First maximum vertical clearance (m x 1) after toe-off and the Minimum Toe Clearance (MTC) of the swing foot. Comparisons are made against measurements obtained using an optoelectronic motion capture system (Optotrak), at 4 different walking speeds. General Regression Neural Networks (GRNN) were used to estimate the desired parameters from the sensor features. Eight subjects foot clearance data were examined and a Leave-one-subject-out (LOSO) method was used to select the best model. The best average Root Mean Square Errors (RMSE) across all subjects obtained using all sensor features at the maximum speed for m x 1 was 5.32 mm and for MTC was 4.04 mm. Further application of a hill-climbing feature selection technique resulted in 0.54-21.93% improvement in RMSE and required fewer input features. The results demonstrated that using raw inertial sensor data with regression models and feature selection could accurately estimate key foot clearance parameters.

  2. Graphical Evaluation of the Ridge-Type Robust Regression Estimators in Mixture Experiments

    PubMed Central

    Erkoc, Ali; Emiroglu, Esra

    2014-01-01

    In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in the cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for selection of biasing parameter, we use fraction of design space plots for evaluating the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on Hald cement data set. PMID:25202738

  3. Graphical evaluation of the ridge-type robust regression estimators in mixture experiments.

    PubMed

    Erkoc, Ali; Emiroglu, Esra; Akay, Kadri Ulas

    2014-01-01

    In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in the cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for selection of biasing parameter, we use fraction of design space plots for evaluating the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on Hald cement data set.

  4. Estimating a Logistic Discrimination Functions When One of the Training Samples Is Subject to Misclassification: A Maximum Likelihood Approach.

    PubMed

    Nagelkerke, Nico; Fidler, Vaclav

    2015-01-01

    The problem of discrimination and classification is central to much of epidemiology. Here we consider the estimation of a logistic regression/discrimination function from training samples, when one of the training samples is subject to misclassification or mislabeling, e.g. diseased individuals are incorrectly classified/labeled as healthy controls. We show that this leads to zero-inflated binomial model with a defective logistic regression or discrimination function, whose parameters can be estimated using standard statistical methods such as maximum likelihood. These parameters can be used to estimate the probability of true group membership among those, possibly erroneously, classified as controls. Two examples are analyzed and discussed. A simulation study explores properties of the maximum likelihood parameter estimates and the estimates of the number of mislabeled observations.

  5. Accounting for measurement error in log regression models with applications to accelerated testing.

    PubMed

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  6. A different approach to estimate nonlinear regression model using numerical methods

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.

    2017-11-01

    This research paper concerns with the computational methods namely the Gauss-Newton method, Gradient algorithm methods (Newton-Raphson method, Steepest Descent or Steepest Ascent algorithm method, the Method of Scoring, the Method of Quadratic Hill-Climbing) based on numerical analysis to estimate parameters of nonlinear regression model in a very different way. Principles of matrix calculus have been used to discuss the Gradient-Algorithm methods. Yonathan Bard [1] discussed a comparison of gradient methods for the solution of nonlinear parameter estimation problems. However this article discusses an analytical approach to the gradient algorithm methods in a different way. This paper describes a new iterative technique namely Gauss-Newton method which differs from the iterative technique proposed by Gorden K. Smyth [2]. Hans Georg Bock et.al [10] proposed numerical methods for parameter estimation in DAE’s (Differential algebraic equation). Isabel Reis Dos Santos et al [11], Introduced weighted least squares procedure for estimating the unknown parameters of a nonlinear regression metamodel. For large-scale non smooth convex minimization the Hager and Zhang (HZ) conjugate gradient Method and the modified HZ (MHZ) method were presented by Gonglin Yuan et al [12].

  7. Aerodynamic parameters of High-Angle-of attack Research Vehicle (HARV) estimated from flight data

    NASA Technical Reports Server (NTRS)

    Klein, Vladislav; Ratvasky, Thomas R.; Cobleigh, Brent R.

    1990-01-01

    Aerodynamic parameters of the High-Angle-of-Attack Research Aircraft (HARV) were estimated from flight data at different values of the angle of attack between 10 degrees and 50 degrees. The main part of the data was obtained from small amplitude longitudinal and lateral maneuvers. A small number of large amplitude maneuvers was also used in the estimation. The measured data were first checked for their compatibility. It was found that the accuracy of air data was degraded by unexplained bias errors. Then, the data were analyzed by a stepwise regression method for obtaining a structure of aerodynamic model equations and least squares parameter estimates. Because of high data collinearity in several maneuvers, some of the longitudinal and all lateral maneuvers were reanalyzed by using two biased estimation techniques, the principal components regression and mixed estimation. The estimated parameters in the form of stability and control derivatives, and aerodynamic coefficients were plotted against the angle of attack and compared with the wind tunnel measurements. The influential parameters are, in general, estimated with acceptable accuracy and most of them are in agreement with wind tunnel results. The simulated responses of the aircraft showed good prediction capabilities of the resulting model.

  8. Simple method for quick estimation of aquifer hydrogeological parameters

    NASA Astrophysics Data System (ADS)

    Ma, C.; Li, Y. Y.

    2017-08-01

    Development of simple and accurate methods to determine the aquifer hydrogeological parameters was of importance for groundwater resources assessment and management. Aiming at the present issue of estimating aquifer parameters based on some data of the unsteady pumping test, a fitting function of Theis well function was proposed using fitting optimization method and then a unitary linear regression equation was established. The aquifer parameters could be obtained by solving coefficients of the regression equation. The application of the proposed method was illustrated, using two published data sets. By the error statistics and analysis on the pumping drawdown, it showed that the method proposed in this paper yielded quick and accurate estimates of the aquifer parameters. The proposed method could reliably identify the aquifer parameters from long distance observed drawdowns and early drawdowns. It was hoped that the proposed method in this paper would be helpful for practicing hydrogeologists and hydrologists.

  9. Assessing Interval Estimation Methods for Hill Model Parameters in a High-Throughput Screening Context (SOT)

    EPA Science Inventory

    The Hill model of concentration-response is ubiquitous in toxicology, perhaps because its parameters directly relate to biologically significant metrics of toxicity such as efficacy and potency. Point estimates of these parameters obtained through least squares regression or maxi...

  10. Assessing Interval Estimation Methods for Hill Model Parameters in a High-Throughput Screening Context (IVIVE meeting)

    EPA Science Inventory

    The Hill model of concentration-response is ubiquitous in toxicology, perhaps because its parameters directly relate to biologically significant metrics of toxicity such as efficacy and potency. Point estimates of these parameters obtained through least squares regression or maxi...

  11. Reliability analysis of structural ceramic components using a three-parameter Weibull distribution

    NASA Technical Reports Server (NTRS)

    Duffy, Stephen F.; Powers, Lynn M.; Starlinger, Alois

    1992-01-01

    Described here are nonlinear regression estimators for the three-Weibull distribution. Issues relating to the bias and invariance associated with these estimators are examined numerically using Monte Carlo simulation methods. The estimators were used to extract parameters from sintered silicon nitride failure data. A reliability analysis was performed on a turbopump blade utilizing the three-parameter Weibull distribution and the estimates from the sintered silicon nitride data.

  12. Multi-Axis Identifiability Using Single-Surface Parameter Estimation Maneuvers on the X-48B Blended Wing Body

    NASA Technical Reports Server (NTRS)

    Ratnayake, Nalin A.; Koshimoto, Ed T.; Taylor, Brian R.

    2011-01-01

    The problem of parameter estimation on hybrid-wing-body type aircraft is complicated by the fact that many design candidates for such aircraft involve a large number of aero- dynamic control effectors that act in coplanar motion. This fact adds to the complexity already present in the parameter estimation problem for any aircraft with a closed-loop control system. Decorrelation of system inputs must be performed in order to ascertain individual surface derivatives with any sort of mathematical confidence. Non-standard control surface configurations, such as clamshell surfaces and drag-rudder modes, further complicate the modeling task. In this paper, asymmetric, single-surface maneuvers are used to excite multiple axes of aircraft motion simultaneously. Time history reconstructions of the moment coefficients computed by the solved regression models are then compared to each other in order to assess relative model accuracy. The reduced flight-test time required for inner surface parameter estimation using multi-axis methods was found to come at the cost of slightly reduced accuracy and statistical confidence for linear regression methods. Since the multi-axis maneuvers captured parameter estimates similar to both longitudinal and lateral-directional maneuvers combined, the number of test points required for the inner, aileron-like surfaces could in theory have been reduced by 50%. While trends were similar, however, individual parameters as estimated by a multi-axis model were typically different by an average absolute difference of roughly 15-20%, with decreased statistical significance, than those estimated by a single-axis model. The multi-axis model exhibited an increase in overall fit error of roughly 1-5% for the linear regression estimates with respect to the single-axis model, when applied to flight data designed for each, respectively.

  13. A computer program (MODFLOWP) for estimating parameters of a transient, three-dimensional ground-water flow model using nonlinear regression

    USGS Publications Warehouse

    Hill, Mary Catherine

    1992-01-01

    This report documents a new version of the U.S. Geological Survey modular, three-dimensional, finite-difference, ground-water flow model (MODFLOW) which, with the new Parameter-Estimation Package that also is documented in this report, can be used to estimate parameters by nonlinear regression. The new version of MODFLOW is called MODFLOWP (pronounced MOD-FLOW*P), and functions nearly identically to MODFLOW when the ParameterEstimation Package is not used. Parameters are estimated by minimizing a weighted least-squares objective function by the modified Gauss-Newton method or by a conjugate-direction method. Parameters used to calculate the following MODFLOW model inputs can be estimated: Transmissivity and storage coefficient of confined layers; hydraulic conductivity and specific yield of unconfined layers; vertical leakance; vertical anisotropy (used to calculate vertical leakance); horizontal anisotropy; hydraulic conductance of the River, Streamflow-Routing, General-Head Boundary, and Drain Packages; areal recharge rates; maximum evapotranspiration; pumpage rates; and the hydraulic head at constant-head boundaries. Any spatial variation in parameters can be defined by the user. Data used to estimate parameters can include existing independent estimates of parameter values, observed hydraulic heads or temporal changes in hydraulic heads, and observed gains and losses along head-dependent boundaries (such as streams). Model output includes statistics for analyzing the parameter estimates and the model; these statistics can be used to quantify the reliability of the resulting model, to suggest changes in model construction, and to compare results of models constructed in different ways.

  14. Modeling absolute differences in life expectancy with a censored skew-normal regression approach

    PubMed Central

    Clough-Gorr, Kerri; Zwahlen, Marcel

    2015-01-01

    Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negative skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544

  15. Stature estimation from the lengths of the growing foot-a study on North Indian adolescents.

    PubMed

    Krishan, Kewal; Kanchan, Tanuj; Passi, Neelam; DiMaggio, John A

    2012-12-01

    Stature estimation is considered as one of the basic parameters of the investigation process in unknown and commingled human remains in medico-legal case work. Race, age and sex are the other parameters which help in this process. Stature estimation is of the utmost importance as it completes the biological profile of a person along with the other three parameters of identification. The present research is intended to formulate standards for stature estimation from foot dimensions in adolescent males from North India and study the pattern of foot growth during the growing years. 154 male adolescents from the Northern part of India were included in the study. Besides stature, five anthropometric measurements that included the length of the foot from each toe (T1, T2, T3, T4, and T5 respectively) to pternion were measured on each foot. The data was analyzed statistically using Student's t-test, Pearson's correlation, linear and multiple regression analysis for estimation of stature and growth of foot during ages 13-18 years. Correlation coefficients between stature and all the foot measurements were found to be highly significant and positively correlated. Linear regression models and multiple regression models (with age as a co-variable) were derived for estimation of stature from the different measurements of the foot. Multiple regression models (with age as a co-variable) estimate stature with greater accuracy than the regression models for 13-18 years age group. The study shows the growth pattern of feet in North Indian adolescents and indicates that anthropometric measurements of the foot and its segments are valuable in estimation of stature in growing individuals of that population. Copyright © 2012 Elsevier Ltd. All rights reserved.

  16. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    NASA Astrophysics Data System (ADS)

    Gorgees, HazimMansoor; Mahdi, FatimahAssim

    2018-05-01

    This article concerns with comparing the performance of different types of ordinary ridge regression estimators that have been already proposed to estimate the regression parameters when the near exact linear relationships among the explanatory variables is presented. For this situations we employ the data obtained from tagi gas filling company during the period (2008-2010). The main result we reached is that the method based on the condition number performs better than other methods since it has smaller mean square error (MSE) than the other stated methods.

  17. Time series modeling by a regression approach based on a latent process.

    PubMed

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  18. Penalized spline estimation for functional coefficient regression models.

    PubMed

    Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan

    2010-04-01

    The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.

  19. Adjusting for overdispersion in piecewise exponential regression models to estimate excess mortality rate in population-based research.

    PubMed

    Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard

    2016-10-01

    In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x i are equal is strong and may fail to account for overdispersion given the variability of the rate parameter (the variance exceeds the mean). Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion including a quasi-likelihood, robust standard errors estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of significant inherent overdispersion (p-value <0.001). However, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2 versus 21.3) for non-flexible piecewise exponential models. We showed that there were no major differences between methods. However, using a flexible piecewise regression modelling, with either a quasi-likelihood or robust standard errors, was the best approach as it deals with both, overdispersion due to model misspecification and true or inherent overdispersion.

  20. Regression equations for estimating flood flows for the 2-, 10-, 25-, 50-, 100-, and 500-Year recurrence intervals in Connecticut

    USGS Publications Warehouse

    Ahearn, Elizabeth A.

    2004-01-01

    Multiple linear-regression equations were developed to estimate the magnitudes of floods in Connecticut for recurrence intervals ranging from 2 to 500 years. The equations can be used for nonurban, unregulated stream sites in Connecticut with drainage areas ranging from about 2 to 715 square miles. Flood-frequency data and hydrologic characteristics from 70 streamflow-gaging stations and the upstream drainage basins were used to develop the equations. The hydrologic characteristics?drainage area, mean basin elevation, and 24-hour rainfall?are used in the equations to estimate the magnitude of floods. Average standard errors of prediction for the equations are 31.8, 32.7, 34.4, 35.9, 37.6 and 45.0 percent for the 2-, 10-, 25-, 50-, 100-, and 500-year recurrence intervals, respectively. Simplified equations using only one hydrologic characteristic?drainage area?also were developed. The regression analysis is based on generalized least-squares regression techniques. Observed flows (log-Pearson Type III analysis of the annual maximum flows) from five streamflow-gaging stations in urban basins in Connecticut were compared to flows estimated from national three-parameter and seven-parameter urban regression equations. The comparison shows that the three- and seven- parameter equations used in conjunction with the new statewide equations generally provide reasonable estimates of flood flows for urban sites in Connecticut, although a national urban flood-frequency study indicated that the three-parameter equations significantly underestimated flood flows in many regions of the country. Verification of the accuracy of the three-parameter or seven-parameter national regression equations using new data from Connecticut stations was beyond the scope of this study. A technique for calculating flood flows at streamflow-gaging stations using a weighted average also is described. Two estimates of flood flows?one estimate based on the log-Pearson Type III analyses of the annual maximum flows at the gaging station, and the other estimate from the regression equation?are weighted together based on the years of record at the gaging station and the equivalent years of record value determined from the regression. Weighted averages of flood flows for the 2-, 10-, 25-, 50-, 100-, and 500-year recurrence intervals are tabulated for the 70 streamflow-gaging stations used in the regression analysis. Generally, weighted averages give the most accurate estimate of flood flows at gaging stations. An evaluation of the Connecticut's streamflow-gaging network was performed to determine whether the spatial coverage and range of geographic and hydrologic conditions are adequately represented for transferring flood characteristics from gaged to ungaged sites. Fifty-one of 54 stations in the current (2004) network support one or more flood needs of federal, state, and local agencies. Twenty-five of 54 stations in the current network are considered high-priority stations by the U.S. Geological Survey because of their contribution to the longterm understanding of floods, and their application for regionalflood analysis. Enhancements to the network to improve overall effectiveness for regionalization can be made by increasing the spatial coverage of gaging stations, establishing stations in regions of the state that are not well-represented, and adding stations in basins with drainage area sizes not represented. Additionally, the usefulness of the network for characterizing floods can be maintained and improved by continuing operation at the current stations because flood flows can be more accurately estimated at stations with continuous, long-term record.

  1. Quantile regression models of animal habitat relationships

    USGS Publications Warehouse

    Cade, Brian S.

    2003-01-01

    Typically, all factors that limit an organism are not measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates performance of quantile rankscore tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop in dispersion, F-ratio like permutation test for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal was confounded with the effect of another unmeasured variable (spatially and not spatially structured). Depending on whether interactions of the measured habitat and unmeasured variable were negative (interference interactions) or positive (facilitation interactions), either upper (τ > 0.5) or lower (τ < 0.5) quantile regression parameters were less biased than mean rate parameters. Sampling (n = 20 - 300) simulations demonstrated that confidence intervals constructed by inverting rankscore tests provided valid coverage of these biased parameters. Quantile regression was used to estimate effects of physical habitat resources on a bivalve mussel (Macomona liliana) in a New Zealand harbor by modeling the spatial trend surface as a cubic polynomial of location coordinates.

  2. Linear regression analysis of survival data with missing censoring indicators.

    PubMed

    Wang, Qihua; Dinse, Gregg E

    2011-04-01

    Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.

  3. Fisher Scoring Method for Parameter Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    NASA Astrophysics Data System (ADS)

    Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia

    2017-06-01

    GWOLR model combines geographically weighted regression (GWR) and (ordinal logistic reression) OLR models. Its parameter estimation employs maximum likelihood estimation. Such parameter estimation, however, yields difficult-to-solve system of nonlinear equations, and therefore numerical approximation approach is required. The iterative approximation approach, in general, uses Newton-Raphson (NR) method. The NR method has a disadvantage—its Hessian matrix is always the second derivatives of each iteration so it does not always produce converging results. With regard to this matter, NR model is modified by substituting its Hessian matrix into Fisher information matrix, which is termed Fisher scoring (FS). The present research seeks to determine GWOLR model parameter estimation using Fisher scoring method and apply the estimation on data of the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities give the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of the sufferers, IR category of DHF in both villages can be determined.

  4. Comparison of stability and control parameters for a light, single-engine, high-winged aircraft using different flight test and parameter estimation techniques

    NASA Technical Reports Server (NTRS)

    Suit, W. T.; Cannaday, R. L.

    1979-01-01

    The longitudinal and lateral stability and control parameters for a high wing, general aviation, airplane are examined. Estimations using flight data obtained at various flight conditions within the normal range of the aircraft are presented. The estimations techniques, an output error technique (maximum likelihood) and an equation error technique (linear regression), are presented. The longitudinal static parameters are estimated from climbing, descending, and quasi steady state flight data. The lateral excitations involve a combination of rudder and ailerons. The sensitivity of the aircraft modes of motion to variations in the parameter estimates are discussed.

  5. Estimation of variance in Cox's regression model with shared gamma frailties.

    PubMed

    Andersen, P K; Klein, J P; Knudsen, K M; Tabanera y Palacios, R

    1997-12-01

    The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times. Estimation in this model when the frailties are assumed to follow a gamma distribution is reviewed, and we address the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix. A number of examples are given comparing this approach with fully parametric inference in models with piecewise constant baseline hazards.

  6. Geographically weighted regression and multicollinearity: dispelling the myth

    NASA Astrophysics Data System (ADS)

    Fotheringham, A. Stewart; Oshan, Taylor M.

    2016-10-01

    Geographically weighted regression (GWR) extends the familiar regression framework by estimating a set of parameters for any number of locations within a study area, rather than producing a single parameter estimate for each relationship specified in the model. Recent literature has suggested that GWR is highly susceptible to the effects of multicollinearity between explanatory variables and has proposed a series of local measures of multicollinearity as an indicator of potential problems. In this paper, we employ a controlled simulation to demonstrate that GWR is in fact very robust to the effects of multicollinearity. Consequently, the contention that GWR is highly susceptible to multicollinearity issues needs rethinking.

  7. Elbow joint angle and elbow movement velocity estimation using NARX-multiple layer perceptron neural network model with surface EMG time domain parameters.

    PubMed

    Raj, Retheep; Sivanandan, K S

    2017-01-01

    Estimation of elbow dynamics has been the object of numerous investigations. In this work a solution is proposed for estimating elbow movement velocity and elbow joint angle from Surface Electromyography (SEMG) signals. Here the Surface Electromyography signals are acquired from the biceps brachii muscle of human hand. Two time-domain parameters, Integrated EMG (IEMG) and Zero Crossing (ZC), are extracted from the Surface Electromyography signal. The relationship between the time domain parameters, IEMG and ZC with elbow angular displacement and elbow angular velocity during extension and flexion of the elbow are studied. A multiple input-multiple output model is derived for identifying the kinematics of elbow. A Nonlinear Auto Regressive with eXogenous inputs (NARX) structure based multiple layer perceptron neural network (MLPNN) model is proposed for the estimation of elbow joint angle and elbow angular velocity. The proposed NARX MLPNN model is trained using Levenberg-marquardt based algorithm. The proposed model is estimating the elbow joint angle and elbow movement angular velocity with appreciable accuracy. The model is validated using regression coefficient value (R). The average regression coefficient value (R) obtained for elbow angular displacement prediction is 0.9641 and for the elbow anglular velocity prediction is 0.9347. The Nonlinear Auto Regressive with eXogenous inputs (NARX) structure based multiple layer perceptron neural networks (MLPNN) model can be used for the estimation of angular displacement and movement angular velocity of the elbow with good accuracy.

  8. Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation

    NASA Astrophysics Data System (ADS)

    Reis, D. S.; Stedinger, J. R.; Martins, E. S.

    2005-10-01

    This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.

  9. UCODE, a computer code for universal inverse modeling

    USGS Publications Warehouse

    Poeter, E.P.; Hill, M.C.

    1999-01-01

    This article presents the US Geological Survey computer program UCODE, which was developed in collaboration with the US Army Corps of Engineers Waterways Experiment Station and the International Ground Water Modeling Center of the Colorado School of Mines. UCODE performs inverse modeling, posed as a parameter-estimation problem, using nonlinear regression. Any application model or set of models can be used; the only requirement is that they have numerical (ASCII or text only) input and output files and that the numbers in these files have sufficient significant digits. Application models can include preprocessors and postprocessors as well as models related to the processes of interest (physical, chemical and so on), making UCODE extremely powerful for model calibration. Estimated parameters can be defined flexibly with user-specified functions. Observations to be matched in the regression can be any quantity for which a simulated equivalent value can be produced, thus simulated equivalent values are calculated using values that appear in the application model output files and can be manipulated with additive and multiplicative functions, if necessary. Prior, or direct, information on estimated parameters also can be included in the regression. The nonlinear regression problem is solved by minimizing a weighted least-squares objective function with respect to the parameter values using a modified Gauss-Newton method. Sensitivities needed for the method are calculated approximately by forward or central differences and problems and solutions related to this approximation are discussed. Statistics are calculated and printed for use in (1) diagnosing inadequate data or identifying parameters that probably cannot be estimated with the available data, (2) evaluating estimated parameter values, (3) evaluating the model representation of the actual processes and (4) quantifying the uncertainty of model simulated values. UCODE is intended for use on any computer operating system: it consists of algorithms programmed in perl, a freeware language designed for text manipulation and Fortran90, which efficiently performs numerical calculations.

  10. Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation

    NASA Astrophysics Data System (ADS)

    Saylor, Rick D.; Edgerton, Eric S.; Hartsell, Benjamin E.

    A variety of linear regression techniques and simple slope estimators are evaluated for use in the elemental carbon (EC) tracer method of secondary organic carbon (OC) estimation. Linear regression techniques based on ordinary least squares are not suitable for situations where measurement uncertainties exist in both regressed variables. In the past, regression based on the method of Deming [1943. Statistical Adjustment of Data. Wiley, London] has been the preferred choice for EC tracer method parameter estimation. In agreement with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], we find that in the limited case where primary non-combustion OC (OC non-comb) is assumed to be zero, the ratio of averages (ROA) approach provides a stable and reliable estimate of the primary OC-EC ratio, (OC/EC) pri. In contrast with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], however, we find that the optimal use of Deming regression (and the more general York et al. [2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics 72, 367-375] regression) provides excellent results as well. For the more typical case where OC non-comb is allowed to obtain a non-zero value, we find that regression based on the method of York is the preferred choice for EC tracer method parameter estimation. In the York regression technique, detailed information on uncertainties in the measurement of OC and EC is used to improve the linear best fit to the given data. If only limited information is available on the relative uncertainties of OC and EC, then Deming regression should be used. On the other hand, use of ROA in the estimation of secondary OC, and thus the assumption of a zero OC non-comb value, generally leads to an overestimation of the contribution of secondary OC to total measured OC.

  11. Does Bootstrap Procedure Provide Biased Estimates? An Empirical Examination for a Case of Multiple Regression.

    ERIC Educational Resources Information Center

    Fan, Xitao

    This paper empirically and systematically assessed the performance of bootstrap resampling procedure as it was applied to a regression model. Parameter estimates from Monte Carlo experiments (repeated sampling from population) and bootstrap experiments (repeated resampling from one original bootstrap sample) were generated and compared. Sample…

  12. Computing mammographic density from a multiple regression model constructed with image-acquisition parameters from a full-field digital mammographic unit

    PubMed Central

    Lu, Lee-Jane W.; Nishino, Thomas K.; Khamapirad, Tuenchit; Grady, James J; Leonard, Morton H.; Brunder, Donald G.

    2009-01-01

    Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R2=0.93) and %density (R2=0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively, automatically bypassing the HSM step, and could greatly facilitate breast cancer research studies. PMID:17671343

  13. A simulation study on Bayesian Ridge regression models for several collinearity levels

    NASA Astrophysics Data System (ADS)

    Efendi, Achmad; Effrihan

    2017-12-01

    When analyzing data with multiple regression model if there are collinearities, then one or several predictor variables are usually omitted from the model. However, there sometimes some reasons, for instance medical or economic reasons, the predictors are all important and should be included in the model. Ridge regression model is not uncommon in some researches to use to cope with collinearity. Through this modeling, weights for predictor variables are used for estimating parameters. The next estimation process could follow the concept of likelihood. Furthermore, for the estimation nowadays the Bayesian version could be an alternative. This estimation method does not match likelihood one in terms of popularity due to some difficulties; computation and so forth. Nevertheless, with the growing improvement of computational methodology recently, this caveat should not at the moment become a problem. This paper discusses about simulation process for evaluating the characteristic of Bayesian Ridge regression parameter estimates. There are several simulation settings based on variety of collinearity levels and sample sizes. The results show that Bayesian method gives better performance for relatively small sample sizes, and for other settings the method does perform relatively similar to the likelihood method.

  14. Deriving percentage study weights in multi-parameter meta-analysis models: with application to meta-regression, network meta-analysis and one-stage individual participant data models.

    PubMed

    Riley, Richard D; Ensor, Joie; Jackson, Dan; Burke, Danielle L

    2017-01-01

    Many meta-analysis models contain multiple parameters, for example due to multiple outcomes, multiple treatments or multiple regression coefficients. In particular, meta-regression models may contain multiple study-level covariates, and one-stage individual participant data meta-analysis models may contain multiple patient-level covariates and interactions. Here, we propose how to derive percentage study weights for such situations, in order to reveal the (otherwise hidden) contribution of each study toward the parameter estimates of interest. We assume that studies are independent, and utilise a decomposition of Fisher's information matrix to decompose the total variance matrix of parameter estimates into study-specific contributions, from which percentage weights are derived. This approach generalises how percentage weights are calculated in a traditional, single parameter meta-analysis model. Application is made to one- and two-stage individual participant data meta-analyses, meta-regression and network (multivariate) meta-analysis of multiple treatments. These reveal percentage study weights toward clinically important estimates, such as summary treatment effects and treatment-covariate interactions, and are especially useful when some studies are potential outliers or at high risk of bias. We also derive percentage study weights toward methodologically interesting measures, such as the magnitude of ecological bias (difference between within-study and across-study associations) and the amount of inconsistency (difference between direct and indirect evidence in a network meta-analysis).

  15. A method for nonlinear exponential regression analysis

    NASA Technical Reports Server (NTRS)

    Junkin, B. G.

    1971-01-01

    A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.

  16. Flood characteristics of urban watersheds in the United States

    USGS Publications Warehouse

    Sauer, Vernon B.; Thomas, W.O.; Stricker, V.A.; Wilson, K.V.

    1983-01-01

    A nationwide study of flood magnitude and frequency in urban areas was made for the purpose of reviewing available literature, compiling an urban flood data base, and developing methods of estimating urban floodflow characteristics in ungaged areas. The literature review contains synopses of 128 recent publications related to urban floodflow. A data base of 269 gaged basins in 56 cities and 31 States, including Hawaii, contains a wide variety of topographic and climatic characteristics, land-use variables, indices of urbanization, and flood-frequency estimates. Three sets of regression equations were developed to estimate flood discharges for ungaged sites for recurrence intervals of 2, 5, 10, 25, 50, 100, and 500 years. Two sets of regression equations are based on seven independent parameters and the third is based on three independent parameters. The only difference in the two sets of seven-parameter equations is the use of basin lag time in one and lake and reservoir storage in the other. Of primary importance in these equations is an independent estimate of the equivalent rural discharge for the ungaged basin. The equations adjust the equivalent rural discharge to an urban condition. The primary adjustment factor, or index of urbanization, is the basin development factor, a measure of the extent of development of the drainage system in the basin. This measure includes evaluations of storm drains (sewers), channel improvements, and curb-and-gutter streets. The basin development factor is statistically very significant and offers a simple and effective way of accounting for drainage development and runoff response in urban areas. Percentage of impervious area is also included in the seven-parameter equations as an additional measure of urbanization and apparently accounts for increased runoff volumes. This factor is not highly significant for large floods, which supports the generally held concept that imperviousness is not a dominant factor when soils become more saturated during large storms. Other parameters in the seven-parameter equations include drainage area size, channel slope, rainfall intensity, lake and reservoir storage, and basin lag time. These factors are all statistically significant and provide logical indices of basin conditions. The three-parameter equations include only the three most significant parameters: rural discharge, basin-development factor, and drainage area size. All three sets of regression equations provide unbiased estimates of urban flood frequency. The seven-parameter regression equations without basin lag time have average standard errors of regression varying from ? 37 percent for the 5-year flood to ? 44 percent for the 100-year flood and ? 49 percent for the 500-year flood. The other two sets of regression equations have similar accuracy. Several tests for bias, sensitivity, and hydrologic consistency are included which support the conclusion that the equations are useful throughout the United States. All estimating equations were developed from data collected on drainage basins where temporary in-channel storage, due to highway embankments, was not significant. Consequently, estimates made with these equations do not account for the reducing effect of this temporary detention storage.

  17. Estimating Dbh of Trees Employing Multiple Linear Regression of the best Lidar-Derived Parameter Combination Automated in Python in a Natural Broadleaf Forest in the Philippines

    NASA Astrophysics Data System (ADS)

    Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.

    2016-06-01

    Diameter-at-Breast-Height Estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR Technology has a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of point cloud unique in different forest classes. Extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and types of species were identified to compare to LiDAR derivatives. Multiple linear regression was used to get LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20m, 10m, and 5m grid resolutions. To know the best combination of parameters in DBH Estimation, all possible combinations of parameters were generated and automated using python scripts and additional regression related libraries such as Numpy, Scipy, and Scikit learn were used. The combination that yields the highest r-squared or coefficient of determination and lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The equation is at its best using 11 parameters at 10mgrid size and at of 0.604 r-squared, 154.04 AIC and 175.08 BIC. Combination of parameters may differ among forest classes for further studies. Additional statistical tests can be supplemented to help determine the correlation among parameters such as Kaiser- Meyer-Olkin (KMO) Coefficient and the Barlett's Test for Spherecity (BTS).

  18. SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES

    PubMed Central

    Zhu, Liping; Huang, Mian; Li, Runze

    2012-01-01

    This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536

  19. Estimating standard errors in feature network models.

    PubMed

    Frank, Laurence E; Heiser, Willem J

    2007-05-01

    Feature network models are graphical structures that represent proximity data in a discrete space while using the same formalism that is the basis of least squares methods employed in multidimensional scaling. Existing methods to derive a network model from empirical data only give the best-fitting network and yield no standard errors for the parameter estimates. The additivity properties of networks make it possible to consider the model as a univariate (multiple) linear regression problem with positivity restrictions on the parameters. In the present study, both theoretical and empirical standard errors are obtained for the constrained regression parameters of a network model with known features. The performance of both types of standard error is evaluated using Monte Carlo techniques.

  20. The use of generalized estimating equations in the analysis of motor vehicle crash data.

    PubMed

    Hutchings, Caroline B; Knight, Stacey; Reading, James C

    2003-01-01

    The purpose of this study was to determine if it is necessary to use generalized estimating equations (GEEs) in the analysis of seat belt effectiveness in preventing injuries in motor vehicle crashes. The 1992 Utah crash dataset was used, excluding crash participants where seat belt use was not appropriate (n=93,633). The model used in the 1996 Report to Congress [Report to congress on benefits of safety belts and motorcycle helmets, based on data from the Crash Outcome Data Evaluation System (CODES). National Center for Statistics and Analysis, NHTSA, Washington, DC, February 1996] was analyzed for all occupants with logistic regression, one level of nesting (occupants within crashes), and two levels of nesting (occupants within vehicles within crashes) to compare the use of GEEs with logistic regression. When using one level of nesting compared to logistic regression, 13 of 16 variance estimates changed more than 10%, and eight of 16 parameter estimates changed more than 10%. In addition, three of the independent variables changed from significant to insignificant (alpha=0.05). With the use of two levels of nesting, two of 16 variance estimates and three of 16 parameter estimates changed more than 10% from the variance and parameter estimates in one level of nesting. One of the independent variables changed from insignificant to significant (alpha=0.05) in the two levels of nesting model; therefore, only two of the independent variables changed from significant to insignificant when the logistic regression model was compared to the two levels of nesting model. The odds ratio of seat belt effectiveness in preventing injuries was 12% lower when a one-level nested model was used. Based on these results, we stress the need to use a nested model and GEEs when analyzing motor vehicle crash data.

  1. Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.

    PubMed

    Xie, Yanmei; Zhang, Biao

    2017-04-20

    Missing covariate data occurs often in regression analysis, which frequently arises in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).

  2. Exposure reconstruction for the TCDD-exposed NIOSH cohort using a concentration- and age-dependent model of elimination.

    PubMed

    Aylward, Lesa L; Brunet, Robert C; Starr, Thomas B; Carrier, Gaétan; Delzell, Elizabeth; Cheng, Hong; Beall, Colleen

    2005-08-01

    Recent studies demonstrating a concentration dependence of elimination of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) suggest that previous estimates of exposure for occupationally exposed cohorts may have underestimated actual exposure, resulting in a potential overestimate of the carcinogenic potency of TCDD in humans based on the mortality data for these cohorts. Using a database on U.S. chemical manufacturing workers potentially exposed to TCDD compiled by the National Institute for Occupational Safety and Health (NIOSH), we evaluated the impact of using a concentration- and age-dependent elimination model (CADM) (Aylward et al., 2005) on estimates of serum lipid area under the curve (AUC) for the NIOSH cohort. These data were used previously by Steenland et al. (2001) in combination with a first-order elimination model with an 8.7-year half-life to estimate cumulative serum lipid concentration (equivalent to AUC) for these workers for use in cancer dose-response assessment. Serum lipid TCDD measurements taken in 1988 for a subset of the cohort were combined with the NIOSH job exposure matrix and work histories to estimate dose rates per unit of exposure score. We evaluated the effect of choices in regression model (regression on untransformed vs. ln-transformed data and inclusion of a nonzero regression intercept) as well as the impact of choices of elimination models and parameters on estimated AUCs for the cohort. Central estimates for dose rate parameters derived from the serum-sampled subcohort were applied with the elimination models to time-specific exposure scores for the entire cohort to generate AUC estimates for all cohort members. Use of the CADM resulted in improved model fits to the serum sampling data compared to the first-order models. Dose rates varied by a factor of 50 among different combinations of elimination model, parameter sets, and regression models. Use of a CADM results in increases of up to five-fold in AUC estimates for the more highly exposed members of the cohort compared to estimates obtained using the first-order model with 8.7-year half-life. This degree of variation in the AUC estimates for this cohort would affect substantially the cancer potency estimates derived from the mortality data from this cohort. Such variability and uncertainty in the reconstructed serum lipid AUC estimates for this cohort, depending on elimination model, parameter set, and regression model, have not been described previously and are critical components in evaluating the dose-response data from the occupationally exposed populations.

  3. Multiplication factor versus regression analysis in stature estimation from hand and foot dimensions.

    PubMed

    Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha

    2012-05-01

    Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study is aimed to compare the reliability and accuracy of stature estimation and to demonstrate the variability in estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements; hand length, hand breadth, foot length and foot breadth taken on the left side in each subject were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formula were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from regression analysis method is less than that of multiplication factor method thus, confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  4. The Breslow estimator of the nonparametric baseline survivor function in Cox's regression model: some heuristics.

    PubMed

    Hanley, James A

    2008-01-01

    Most survival analysis textbooks explain how the hazard ratio parameters in Cox's life table regression model are estimated. Fewer explain how the components of the nonparametric baseline survivor function are derived. Those that do often relegate the explanation to an "advanced" section and merely present the components as algebraic or iterative solutions to estimating equations. None comment on the structure of these estimators. This note brings out a heuristic representation that may help to de-mystify the structure.

  5. Development of an Agent-Based Model (ABM) to Simulate the Immune System and Integration of a Regression Method to Estimate the Key ABM Parameters by Fitting the Experimental Data

    PubMed Central

    Tong, Xuming; Chen, Jinghang; Miao, Hongyu; Li, Tingting; Zhang, Le

    2015-01-01

    Agent-based models (ABM) and differential equations (DE) are two commonly used methods for immune system simulation. However, it is difficult for ABM to estimate key parameters of the model by incorporating experimental data, whereas the differential equation model is incapable of describing the complicated immune system in detail. To overcome these problems, we developed an integrated ABM regression model (IABMR). It can combine the advantages of ABM and DE by employing ABM to mimic the multi-scale immune system with various phenotypes and types of cells as well as using the input and output of ABM to build up the Loess regression for key parameter estimation. Next, we employed the greedy algorithm to estimate the key parameters of the ABM with respect to the same experimental data set and used ABM to describe a 3D immune system similar to previous studies that employed the DE model. These results indicate that IABMR not only has the potential to simulate the immune system at various scales, phenotypes and cell types, but can also accurately infer the key parameters like DE model. Therefore, this study innovatively developed a complex system development mechanism that could simulate the complicated immune system in detail like ABM and validate the reliability and efficiency of model like DE by fitting the experimental data. PMID:26535589

  6. High dimensional linear regression models under long memory dependence and measurement error

    NASA Astrophysics Data System (ADS)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case, where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p> n) where p can be increasing exponentially with n. Finally, we show the consistency, n½ --d-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement errors models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation, that hold with asymptotic probability 1, thereby providing the ℓ1-consistency of the proposed estimator. We also establish the model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.

  7. Estimating effects of limiting factors with regression quantiles

    USGS Publications Warehouse

    Cade, B.S.; Terrell, J.W.; Schroeder, R.L.

    1999-01-01

    In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.

  8. Quantitative assessment of cervical vertebral maturation using cone beam computed tomography in Korean girls.

    PubMed

    Byun, Bo-Ram; Kim, Yong-Il; Yamaguchi, Tetsutaro; Maki, Koutaro; Son, Woo-Sung

    2015-01-01

    This study was aimed to examine the correlation between skeletal maturation status and parameters from the odontoid process/body of the second vertebra and the bodies of third and fourth cervical vertebrae and simultaneously build multiple regression models to be able to estimate skeletal maturation status in Korean girls. Hand-wrist radiographs and cone beam computed tomography (CBCT) images were obtained from 74 Korean girls (6-18 years of age). CBCT-generated cervical vertebral maturation (CVM) was used to demarcate the odontoid process and the body of the second cervical vertebra, based on the dentocentral synchondrosis. Correlation coefficient analysis and multiple linear regression analysis were used for each parameter of the cervical vertebrae (P < 0.05). Forty-seven of 64 parameters from CBCT-generated CVM (independent variables) exhibited statistically significant correlations (P < 0.05). The multiple regression model with the greatest R (2) had six parameters (PH2/W2, UW2/W2, (OH+AH2)/LW2, UW3/LW3, D3, and H4/W4) as independent variables with a variance inflation factor (VIF) of <2. CBCT-generated CVM was able to include parameters from the second cervical vertebral body and odontoid process, respectively, for the multiple regression models. This suggests that quantitative analysis might be used to estimate skeletal maturation status.

  9. Space shuttle propulsion parameter estimation using optional estimation techniques

    NASA Technical Reports Server (NTRS)

    1983-01-01

    A regression analyses on tabular aerodynamic data provided. A representative aerodynamic model for coefficient estimation. It also reduced the storage requirements for the "normal' model used to check out the estimation algorithms. The results of the regression analyses are presented. The computer routines for the filter portion of the estimation algorithm and the :"bringing-up' of the SRB predictive program on the computer was developed. For the filter program, approximately 54 routines were developed. The routines were highly subsegmented to facilitate overlaying program segments within the partitioned storage space on the computer.

  10. On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis

    ERIC Educational Resources Information Center

    Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas

    2011-01-01

    The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…

  11. Regional regression of flood characteristics employing historical information

    USGS Publications Warehouse

    Tasker, Gary D.; Stedinger, J.R.

    1987-01-01

    Streamflow gauging networks provide hydrologic information for use in estimating the parameters of regional regression models. The regional regression models can be used to estimate flood statistics, such as the 100 yr peak, at ungauged sites as functions of drainage basin characteristics. A recent innovation in regional regression is the use of a generalized least squares (GLS) estimator that accounts for unequal station record lengths and sample cross correlation among the flows. However, this technique does not account for historical flood information. A method is proposed here to adjust this generalized least squares estimator to account for possible information about historical floods available at some stations in a region. The historical information is assumed to be in the form of observations of all peaks above a threshold during a long period outside the systematic record period. A Monte Carlo simulation experiment was performed to compare the GLS estimator adjusted for historical floods with the unadjusted GLS estimator and the ordinary least squares estimator. Results indicate that using the GLS estimator adjusted for historical information significantly improves the regression model. ?? 1987.

  12. Stochastic Approximation Methods for Latent Regression Item Response Models

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2010-01-01

    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  13. Adding a Parameter Increases the Variance of an Estimated Regression Function

    ERIC Educational Resources Information Center

    Withers, Christopher S.; Nadarajah, Saralees

    2011-01-01

    The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…

  14. Genetic Parameter Estimates for Metabolizing Two Common Pharmaceuticals in Swine.

    PubMed

    Howard, Jeremy T; Ashwell, Melissa S; Baynes, Ronald E; Brooks, James D; Yeatts, James L; Maltecca, Christian

    2018-01-01

    In livestock, the regulation of drugs used to treat livestock has received increased attention and it is currently unknown how much of the phenotypic variation in drug metabolism is due to the genetics of an animal. Therefore, the objective of the study was to determine the amount of phenotypic variation in fenbendazole and flunixin meglumine drug metabolism due to genetics. The population consisted of crossbred female and castrated male nursery pigs ( n = 198) that were sired by boars represented by four breeds. The animals were spread across nine batches. Drugs were administered intravenously and blood collected a minimum of 10 times over a 48 h period. Genetic parameters for the parent drug and metabolite concentration within each drug were estimated based on pharmacokinetics (PK) parameters or concentrations across time utilizing a random regression model. The PK parameters were estimated using a non-compartmental analysis. The PK model included fixed effects of sex and breed of sire along with random sire and batch effects. The random regression model utilized Legendre polynomials and included a fixed population concentration curve, sex, and breed of sire effects along with a random sire deviation from the population curve and batch effect. The sire effect included the intercept for all models except for the fenbendazole metabolite (i.e., intercept and slope). The mean heritability across PK parameters for the fenbendazole and flunixin meglumine parent drug (metabolite) was 0.15 (0.18) and 0.31 (0.40), respectively. For the parent drug (metabolite), the mean heritability across time was 0.27 (0.60) and 0.14 (0.44) for fenbendazole and flunixin meglumine, respectively. The errors surrounding the heritability estimates for the random regression model were smaller compared to estimates obtained from PK parameters. Across both the PK and plasma drug concentration across model, a moderate heritability was estimated. The model that utilized the plasma drug concentration across time resulted in estimates with a smaller standard error compared to models that utilized PK parameters. The current study found a low to moderate proportion of the phenotypic variation in metabolizing fenbendazole and flunixin meglumine that was explained by genetics in the current study.

  15. Genetic Parameter Estimates for Metabolizing Two Common Pharmaceuticals in Swine

    PubMed Central

    Howard, Jeremy T.; Ashwell, Melissa S.; Baynes, Ronald E.; Brooks, James D.; Yeatts, James L.; Maltecca, Christian

    2018-01-01

    In livestock, the regulation of drugs used to treat livestock has received increased attention and it is currently unknown how much of the phenotypic variation in drug metabolism is due to the genetics of an animal. Therefore, the objective of the study was to determine the amount of phenotypic variation in fenbendazole and flunixin meglumine drug metabolism due to genetics. The population consisted of crossbred female and castrated male nursery pigs (n = 198) that were sired by boars represented by four breeds. The animals were spread across nine batches. Drugs were administered intravenously and blood collected a minimum of 10 times over a 48 h period. Genetic parameters for the parent drug and metabolite concentration within each drug were estimated based on pharmacokinetics (PK) parameters or concentrations across time utilizing a random regression model. The PK parameters were estimated using a non-compartmental analysis. The PK model included fixed effects of sex and breed of sire along with random sire and batch effects. The random regression model utilized Legendre polynomials and included a fixed population concentration curve, sex, and breed of sire effects along with a random sire deviation from the population curve and batch effect. The sire effect included the intercept for all models except for the fenbendazole metabolite (i.e., intercept and slope). The mean heritability across PK parameters for the fenbendazole and flunixin meglumine parent drug (metabolite) was 0.15 (0.18) and 0.31 (0.40), respectively. For the parent drug (metabolite), the mean heritability across time was 0.27 (0.60) and 0.14 (0.44) for fenbendazole and flunixin meglumine, respectively. The errors surrounding the heritability estimates for the random regression model were smaller compared to estimates obtained from PK parameters. Across both the PK and plasma drug concentration across model, a moderate heritability was estimated. The model that utilized the plasma drug concentration across time resulted in estimates with a smaller standard error compared to models that utilized PK parameters. The current study found a low to moderate proportion of the phenotypic variation in metabolizing fenbendazole and flunixin meglumine that was explained by genetics in the current study. PMID:29487615

  16. Determining Sample Size for Accurate Estimation of the Squared Multiple Correlation Coefficient.

    ERIC Educational Resources Information Center

    Algina, James; Olejnik, Stephen

    2000-01-01

    Discusses determining sample size for estimation of the squared multiple correlation coefficient and presents regression equations that permit determination of the sample size for estimating this parameter for up to 20 predictor variables. (SLD)

  17. Properties of added variable plots in Cox's regression model.

    PubMed

    Lindkvist, M

    2000-03-01

    The added variable plot is useful for examining the effect of a covariate in regression models. The plot provides information regarding the inclusion of a covariate, and is useful in identifying influential observations on the parameter estimates. Hall et al. (1996) proposed a plot for Cox's proportional hazards model derived by regarding the Cox model as a generalized linear model. This paper proves and discusses properties of this plot. These properties make the plot a valuable tool in model evaluation. Quantities considered include parameter estimates, residuals, leverage, case influence measures and correspondence to previously proposed residuals and diagnostics.

  18. Application of a parameter-estimation technique to modeling the regional aquifer underlying the eastern Snake River plain, Idaho

    USGS Publications Warehouse

    Garabedian, Stephen P.

    1986-01-01

    A nonlinear, least-squares regression technique for the estimation of ground-water flow model parameters was applied to the regional aquifer underlying the eastern Snake River Plain, Idaho. The technique uses a computer program to simulate two-dimensional, steady-state ground-water flow. Hydrologic data for the 1980 water year were used to calculate recharge rates, boundary fluxes, and spring discharges. Ground-water use was estimated from irrigated land maps and crop consumptive-use figures. These estimates of ground-water withdrawal, recharge rates, and boundary flux, along with leakance, were used as known values in the model calibration of transmissivity. Leakance values were adjusted between regression solutions by comparing model-calculated to measured spring discharges. In other simulations, recharge and leakance also were calibrated as prior-information regression parameters, which limits the variation of these parameters using a normalized standard error of estimate. Results from a best-fit model indicate a wide areal range in transmissivity from about 0.05 to 44 feet squared per second and in leakance from about 2.2x10 -9 to 6.0 x 10 -8 feet per second per foot. Along with parameter values, model statistics also were calculated, including the coefficient of correlation between calculated and observed head (0.996), the standard error of the estimates for head (40 feet), and the parameter coefficients of variation (about 10-40 percent). Additional boundary flux was added in some areas during calibration to achieve proper fit to ground-water flow directions. Model fit improved significantly when areas that violated model assumptions were removed. It also improved slightly when y-direction (northwest-southeast) transmissivity values were larger than x-direction (northeast-southwest) transmissivity values. The model was most sensitive to changes in recharge, and in some areas, to changes in transmissivity, particularly near the spring discharge area from Milner Dam to King Hill.

  19. Regression to fuzziness method for estimation of remaining useful life in power plant components

    NASA Astrophysics Data System (ADS)

    Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.

    2014-10-01

    Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value range. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then synergistically used with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electrical generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data is not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against the data-based simple linear regression model used for predictions which was shown to perform equal or worse than the presented methodology. Furthermore, methodology comparison highlighted the improvement in estimation offered by the adoption of appropriate of fuzzy sets for parameter representation.

  20. A parameter estimation subroutine package

    NASA Technical Reports Server (NTRS)

    Bierman, G. J.; Nead, W. M.

    1977-01-01

    Linear least squares estimation and regression analyses continue to play a major role in orbit determination and related areas. FORTRAN subroutines have been developed to facilitate analyses of a variety of parameter estimation problems. Easy to use multipurpose sets of algorithms are reported that are reasonably efficient and which use a minimal amount of computer storage. Subroutine inputs, outputs, usage and listings are given, along with examples of how these routines can be used.

  1. Regression dilution in the proportional hazards model.

    PubMed

    Hughes, M D

    1993-12-01

    The problem of regression dilution arising from covariate measurement error is investigated for survival data using the proportional hazards model. The naive approach to parameter estimation is considered whereby observed covariate values are used, inappropriately, in the usual analysis instead of the underlying covariate values. A relationship between the estimated parameter in large samples and the true parameter is obtained showing that the bias does not depend on the form of the baseline hazard function when the errors are normally distributed. With high censorship, adjustment of the naive estimate by the factor 1 + lambda, where lambda is the ratio of within-person variability about an underlying mean level to the variability of these levels in the population sampled, removes the bias. As censorship increases, the adjustment required increases and when there is no censorship is markedly higher than 1 + lambda and depends also on the true risk relationship.

  2. Estimates of genetic parameters in turkeys. 3. Sexual dimorphism and its implications in selection procedures.

    PubMed

    Toelle, V D; Havenstein, G B; Nestor, K E; Bacon, W L

    1990-10-01

    Live, carcass, and skeletal data taken at 16 wk of age on 504 female and 584 male turkeys from 34 sires and 168 dams were utilized to evaluate sex differences in genetic parameter estimates. Data were transformed to common mean and variance to evaluate possible scaling effects. Genetic parameters were estimated from transformed and untransformed data. Further analyses were conducted with a model that included sire by sex and dams within sire by sex interactions, and the variance estimates were used to calculate genetic correlations between the sexes and genetic regression parameters. Heritability estimates from transformed and untransformed data were similar, indicating that sex differences were present in the genetic parameters, but scaling effects were not an important factor. Genetic correlation estimates from paternal (PHS) and maternal (MHS) half-sib estimates were close to unity for BW (1.14, PHS; 1.09, MHS), shank width (.99, PHS; .93, MHS), breast muscle weight (1.23, PHS; 1.04, MHS), and shank length (1.09, PHS; .97, MHS). However, abdominal fat (.79, PHS; .59 MHS), total drumstick muscle weight (.75, PHS; 1.14, MHS), rough cleaned shank weight (.78, PHS; not estimatable, MHS), and shank bone density (1.00, PHS; .53, MHS) estimates were somewhat lower. The estimates suggest that the measurement of these latter "traits" at the same age in the two sexes may, in fact, be measuring different genetic effects and that selection procedures in turkeys need to take these correlations into account in order to make optimum progress. The genetic regression parameters indicated that more intense selection in the sex that has the smaller genetic variation could be practiced to make greater gains in the opposite sex.

  3. Least-squares sequential parameter and state estimation for large space structures

    NASA Technical Reports Server (NTRS)

    Thau, F. E.; Eliazov, T.; Montgomery, R. C.

    1982-01-01

    This paper presents the formulation of simultaneous state and parameter estimation problems for flexible structures in terms of least-squares minimization problems. The approach combines an on-line order determination algorithm, with least-squares algorithms for finding estimates of modal approximation functions, modal amplitudes, and modal parameters. The approach combines previous results on separable nonlinear least squares estimation with a regression analysis formulation of the state estimation problem. The technique makes use of sequential Householder transformations. This allows for sequential accumulation of matrices required during the identification process. The technique is used to identify the modal prameters of a flexible beam.

  4. A weighted least squares estimation of the polynomial regression model on paddy production in the area of Kedah and Perlis

    NASA Astrophysics Data System (ADS)

    Musa, Rosliza; Ali, Zalila; Baharum, Adam; Nor, Norlida Mohd

    2017-08-01

    The linear regression model assumes that all random error components are identically and independently distributed with constant variance. Hence, each data point provides equally precise information about the deterministic part of the total variation. In other words, the standard deviations of the error terms are constant over all values of the predictor variables. When the assumption of constant variance is violated, the ordinary least squares estimator of regression coefficient lost its property of minimum variance in the class of linear and unbiased estimators. Weighted least squares estimation are often used to maximize the efficiency of parameter estimation. A procedure that treats all of the data equally would give less precisely measured points more influence than they should have and would give highly precise points too little influence. Optimizing the weighted fitting criterion to find the parameter estimates allows the weights to determine the contribution of each observation to the final parameter estimates. This study used polynomial model with weighted least squares estimation to investigate paddy production of different paddy lots based on paddy cultivation characteristics and environmental characteristics in the area of Kedah and Perlis. The results indicated that factors affecting paddy production are mixture fertilizer application cycle, average temperature, the squared effect of average rainfall, the squared effect of pest and disease, the interaction between acreage with amount of mixture fertilizer, the interaction between paddy variety and NPK fertilizer application cycle and the interaction between pest and disease and NPK fertilizer application cycle.

  5. Detecting Anomalies in Process Control Networks

    NASA Astrophysics Data System (ADS)

    Rrushi, Julian; Kang, Kyoung-Don

    This paper presents the estimation-inspection algorithm, a statistical algorithm for anomaly detection in process control networks. The algorithm determines if the payload of a network packet that is about to be processed by a control system is normal or abnormal based on the effect that the packet will have on a variable stored in control system memory. The estimation part of the algorithm uses logistic regression integrated with maximum likelihood estimation in an inductive machine learning process to estimate a series of statistical parameters; these parameters are used in conjunction with logistic regression formulas to form a probability mass function for each variable stored in control system memory. The inspection part of the algorithm uses the probability mass functions to estimate the normalcy probability of a specific value that a network packet writes to a variable. Experimental results demonstrate that the algorithm is very effective at detecting anomalies in process control networks.

  6. A Solution to Separation and Multicollinearity in Multiple Logistic Regression

    PubMed Central

    Shen, Jianzhao; Gao, Sujuan

    2010-01-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study. PMID:20376286

  7. A Solution to Separation and Multicollinearity in Multiple Logistic Regression.

    PubMed

    Shen, Jianzhao; Gao, Sujuan

    2008-10-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.

  8. Accurate motion parameter estimation for colonoscopy tracking using a regression method

    NASA Astrophysics Data System (ADS)

    Liu, Jianfei; Subramanian, Kalpathi R.; Yoo, Terry S.

    2010-03-01

    Co-located optical and virtual colonoscopy images have the potential to provide important clinical information during routine colonoscopy procedures. In our earlier work, we presented an optical flow based algorithm to compute egomotion from live colonoscopy video, permitting navigation and visualization of the corresponding patient anatomy. In the original algorithm, motion parameters were estimated using the traditional Least Sum of squares(LS) procedure which can be unstable in the context of optical flow vectors with large errors. In the improved algorithm, we use the Least Median of Squares (LMS) method, a robust regression method for motion parameter estimation. Using the LMS method, we iteratively analyze and converge toward the main distribution of the flow vectors, while disregarding outliers. We show through three experiments the improvement in tracking results obtained using the LMS method, in comparison to the LS estimator. The first experiment demonstrates better spatial accuracy in positioning the virtual camera in the sigmoid colon. The second and third experiments demonstrate the robustness of this estimator, resulting in longer tracked sequences: from 300 to 1310 in the ascending colon, and 410 to 1316 in the transverse colon.

  9. Efficient parameter estimation in longitudinal data analysis using a hybrid GEE method.

    PubMed

    Leung, Denis H Y; Wang, You-Gan; Zhu, Min

    2009-07-01

    The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (Liang and Zeger, 1986). However, the efficiency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (Qin and Lawless, 1994). Analyses show that this hybrid method is more efficient than a GEE using a misspecified working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efficient parameter estimates. In simulations, the hybrid method's finite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efficient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.

  10. Estimating leaf area and leaf biomass of open-grown deciduous urban trees

    Treesearch

    David J. Nowak

    1996-01-01

    Logarithmic regression equations were developed to predict leaf area and leaf biomass for open-grown deciduous urban trees based on stem diameter and crown parameters. Equations based on crown parameters produced more reliable estimates. The equations can be used to help quantify forest structure and functions, particularly in urbanizing and urban/suburban areas.

  11. Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment

    ERIC Educational Resources Information Center

    Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos

    2013-01-01

    In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…

  12. Estimating the Prevalence of Atrial Fibrillation from A Three-Class Mixture Model for Repeated Diagnoses

    PubMed Central

    Li, Liang; Mao, Huzhang; Ishwaran, Hemant; Rajeswaran, Jeevanantham; Ehrlinger, John; Blackstone, Eugene H.

    2016-01-01

    Atrial fibrillation (AF) is an abnormal heart rhythm characterized by rapid and irregular heart beat, with or without perceivable symptoms. In clinical practice, the electrocardiogram (ECG) is often used for diagnosis of AF. Since the AF often arrives as recurrent episodes of varying frequency and duration and only the episodes that occur at the time of ECG can be detected, the AF is often underdiagnosed when a limited number of repeated ECGs are used. In studies evaluating the efficacy of AF ablation surgery, each patient undergo multiple ECGs and the AF status at the time of ECG is recorded. The objective of this paper is to estimate the marginal proportions of patients with or without AF in a population, which are important measures of the efficacy of the treatment. The underdiagnosis problem is addressed by a three-class mixture regression model in which a patient’s probability of having no AF, paroxysmal AF, and permanent AF is modeled by auxiliary baseline covariates in a nested logistic regression. A binomial regression model is specified conditional on a subject being in the paroxysmal AF group. The model parameters are estimated by the EM algorithm. These parameters are themselves nuisance parameters for the purpose of this research, but the estimators of the marginal proportions of interest can be expressed as functions of the data and these nuisance parameters and their variances can be estimated by the sandwich method. We examine the performance of the proposed methodology in simulations and two real data applications. PMID:27983754

  13. Estimating the prevalence of atrial fibrillation from a three-class mixture model for repeated diagnoses.

    PubMed

    Li, Liang; Mao, Huzhang; Ishwaran, Hemant; Rajeswaran, Jeevanantham; Ehrlinger, John; Blackstone, Eugene H

    2017-03-01

    Atrial fibrillation (AF) is an abnormal heart rhythm characterized by rapid and irregular heartbeat, with or without perceivable symptoms. In clinical practice, the electrocardiogram (ECG) is often used for diagnosis of AF. Since the AF often arrives as recurrent episodes of varying frequency and duration and only the episodes that occur at the time of ECG can be detected, the AF is often underdiagnosed when a limited number of repeated ECGs are used. In studies evaluating the efficacy of AF ablation surgery, each patient undergoes multiple ECGs and the AF status at the time of ECG is recorded. The objective of this paper is to estimate the marginal proportions of patients with or without AF in a population, which are important measures of the efficacy of the treatment. The underdiagnosis problem is addressed by a three-class mixture regression model in which a patient's probability of having no AF, paroxysmal AF, and permanent AF is modeled by auxiliary baseline covariates in a nested logistic regression. A binomial regression model is specified conditional on a subject being in the paroxysmal AF group. The model parameters are estimated by the Expectation-Maximization (EM) algorithm. These parameters are themselves nuisance parameters for the purpose of this research, but the estimators of the marginal proportions of interest can be expressed as functions of the data and these nuisance parameters and their variances can be estimated by the sandwich method. We examine the performance of the proposed methodology in simulations and two real data applications. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Logistic regression of family data from retrospective study designs.

    PubMed

    Whittemore, Alice S; Halpern, Jerry

    2003-11-01

    We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters beta(RE) in the random effects model and the parameters beta(M) in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate beta(RE) for beta(RE) and a consistent estimate for the covariance matrix of beta(RE). Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate for beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates. Copyright 2003 Wiley-Liss, Inc.

  15. Estimation of Covariance Matrix on Bi-Response Longitudinal Data Analysis with Penalized Spline Regression

    NASA Astrophysics Data System (ADS)

    Islamiyati, A.; Fatmawati; Chamidah, N.

    2018-03-01

    The correlation assumption of the longitudinal data with bi-response occurs on the measurement between the subjects of observation and the response. It causes the auto-correlation of error, and this can be overcome by using a covariance matrix. In this article, we estimate the covariance matrix based on the penalized spline regression model. Penalized spline involves knot points and smoothing parameters simultaneously in controlling the smoothness of the curve. Based on our simulation study, the estimated regression model of the weighted penalized spline with covariance matrix gives a smaller error value compared to the error of the model without covariance matrix.

  16. Quantitative Assessment of Cervical Vertebral Maturation Using Cone Beam Computed Tomography in Korean Girls

    PubMed Central

    Byun, Bo-Ram; Kim, Yong-Il; Maki, Koutaro; Son, Woo-Sung

    2015-01-01

    This study was aimed to examine the correlation between skeletal maturation status and parameters from the odontoid process/body of the second vertebra and the bodies of third and fourth cervical vertebrae and simultaneously build multiple regression models to be able to estimate skeletal maturation status in Korean girls. Hand-wrist radiographs and cone beam computed tomography (CBCT) images were obtained from 74 Korean girls (6–18 years of age). CBCT-generated cervical vertebral maturation (CVM) was used to demarcate the odontoid process and the body of the second cervical vertebra, based on the dentocentral synchondrosis. Correlation coefficient analysis and multiple linear regression analysis were used for each parameter of the cervical vertebrae (P < 0.05). Forty-seven of 64 parameters from CBCT-generated CVM (independent variables) exhibited statistically significant correlations (P < 0.05). The multiple regression model with the greatest R 2 had six parameters (PH2/W2, UW2/W2, (OH+AH2)/LW2, UW3/LW3, D3, and H4/W4) as independent variables with a variance inflation factor (VIF) of <2. CBCT-generated CVM was able to include parameters from the second cervical vertebral body and odontoid process, respectively, for the multiple regression models. This suggests that quantitative analysis might be used to estimate skeletal maturation status. PMID:25878721

  17. Estimating linear temporal trends from aggregated environmental monitoring data

    USGS Publications Warehouse

    Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.

    2017-01-01

    Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs is often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variations to be separated. We used simulated time-series to compare linear trend estimations from three state-space models, a simple linear regression model, and an auto-regressive model. We also compared the performance of these five models to estimate trends from a long term monitoring program. We specifically estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the given models because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregression and state-space models when used to analyze aggregated environmental monitoring data.

  18. Iterative integral parameter identification of a respiratory mechanics model.

    PubMed

    Schranz, Christoph; Docherty, Paul D; Chiew, Yeong Shiong; Möller, Knut; Chase, J Geoffrey

    2012-07-18

    Patient-specific respiratory mechanics models can support the evaluation of optimal lung protective ventilator settings during ventilation therapy. Clinical application requires that the individual's model parameter values must be identified with information available at the bedside. Multiple linear regression or gradient-based parameter identification methods are highly sensitive to noise and initial parameter estimates. Thus, they are difficult to apply at the bedside to support therapeutic decisions. An iterative integral parameter identification method is applied to a second order respiratory mechanics model. The method is compared to the commonly used regression methods and error-mapping approaches using simulated and clinical data. The clinical potential of the method was evaluated on data from 13 Acute Respiratory Distress Syndrome (ARDS) patients. The iterative integral method converged to error minima 350 times faster than the Simplex Search Method using simulation data sets and 50 times faster using clinical data sets. Established regression methods reported erroneous results due to sensitivity to noise. In contrast, the iterative integral method was effective independent of initial parameter estimations, and converged successfully in each case tested. These investigations reveal that the iterative integral method is beneficial with respect to computing time, operator independence and robustness, and thus applicable at the bedside for this clinical application.

  19. Estimation of peak-discharge frequency of urban streams in Jefferson County, Kentucky

    USGS Publications Warehouse

    Martin, Gary R.; Ruhl, Kevin J.; Moore, Brian L.; Rose, Martin F.

    1997-01-01

    An investigation of flood-hydrograph characteristics for streams in urban Jefferson County, Kentucky, was made to obtain hydrologic information needed for waterresources management. Equations for estimating peak-discharge frequencies for ungaged streams in the county were developed by combining (1) long-term annual peakdischarge data and rainfall-runoff data collected from 1991 to 1995 in 13 urban basins and (2) long-term annual peak-discharge data in four rural basins located in hydrologically similar areas of neighboring counties. The basins ranged in size from 1.36 to 64.0 square miles. The U.S. Geological Survey Rainfall- Runoff Model (RRM) was calibrated for each of the urban basins. The calibrated models were used with long-term, historical rainfall and pan-evaporation data to simulate 79 years of annual peak-discharge data. Peak-discharge frequencies were estimated by fitting the logarithms of the annual peak discharges to a Pearson-Type III frequency distribution. The simulated peak-discharge frequencies were adjusted for improved reliability by application of bias-correction factors derived from peakdischarge frequencies based on local, observed annual peak discharges. The three-parameter and the preferred seven-parameter nationwide urban-peak-discharge regression equations previously developed by USGS investigators provided biased (high) estimates for the urban basins studied. Generalized-least-square regression procedures were used to relate peakdischarge frequency to selected basin characteristics. Regression equations were developed to estimate peak-discharge frequency by adjusting peak-dischargefrequency estimates made by use of the threeparameter nationwide urban regression equations. The regression equations are presented in equivalent forms as functions of contributing drainage area, main-channel slope, and basin development factor, which is an index for measuring the efficiency of the basin drainage system. Estimates of peak discharges for streams in the county can be made for the 2-, 5-, 10-, 25-, 50-, and 100-year recurrence intervals by use of the regression equations. The average standard errors of prediction of the regression equations ranges from ? 34 to ? 45 percent. The regression equations are applicable to ungaged streams in the county having a specific range of basin characteristics.

  20. Item Response Theory Modeling of the Philadelphia Naming Test.

    PubMed

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

    2015-06-01

    In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.

  1. [Atmospheric parameter estimation for LAMOST/GUOSHOUJING spectra].

    PubMed

    Lu, Yu; Li, Xiang-Ru; Yang, Tan

    2014-11-01

    It is a key task to estimate the atmospheric parameters from the observed stellar spectra in exploring the nature of stars and universe. With our Large Sky Area Multi-Object Fiber Spectroscopy Telescope (LAMOST) which begun its formal Sky Survey in September 2012, we are obtaining a mass of stellar spectra in an unprecedented speed. It has brought a new opportunity and a challenge for the research of galaxies. Due to the complexity of the observing system, the noise in the spectrum is relatively large. At the same time, the preprocessing procedures of spectrum are also not ideal, such as the wavelength calibration and the flow calibration. Therefore, there is a slight distortion of the spectrum. They result in the high difficulty of estimating the atmospheric parameters for the measured stellar spectra. It is one of the important issues to estimate the atmospheric parameters for the massive stellar spectra of LAMOST. The key of this study is how to eliminate noise and improve the accuracy and robustness of estimating the atmospheric parameters for the measured stellar spectra. We propose a regression model for estimating the atmospheric parameters of LAMOST stellar(SVM(lasso)). The basic idea of this model is: First, we use the Haar wavelet to filter spectrum, suppress the adverse effects of the spectral noise and retain the most discrimination information of spectrum. Secondly, We use the lasso algorithm for feature selection and extract the features of strongly correlating with the atmospheric parameters. Finally, the features are input to the support vector regression model for estimating the parameters. Because the model has better tolerance to the slight distortion and the noise of the spectrum, the accuracy of the measurement is improved. To evaluate the feasibility of the above scheme, we conduct experiments extensively on the 33 963 pilot surveys spectrums by LAMOST. The accuracy of three atmospheric parameters is log Teff: 0.006 8 dex, log g: 0.155 1 dex, [Fe/H]: 0.104 0 dex.

  2. SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.

    PubMed

    Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman

    2017-03-01

    We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).

  3. An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations

    NASA Astrophysics Data System (ADS)

    Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza

    2018-03-01

    In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.

  4. Non-ignorable missingness in logistic regression.

    PubMed

    Wang, Joanna J J; Bartlett, Mark; Ryan, Louise

    2017-08-30

    Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  5. Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2009-01-01

    This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

  6. Parameter estimation in Cox models with missing failure indicators and the OPPERA study.

    PubMed

    Brownstein, Naomi C; Cai, Jianwen; Slade, Gary D; Bair, Eric

    2015-12-30

    In a prospective cohort study, examining all participants for incidence of the condition of interest may be prohibitively expensive. For example, the "gold standard" for diagnosing temporomandibular disorder (TMD) is a physical examination by a trained clinician. In large studies, examining all participants in this manner is infeasible. Instead, it is common to use questionnaires to screen for incidence of TMD and perform the "gold standard" examination only on participants who screen positively. Unfortunately, some participants may leave the study before receiving the "gold standard" examination. Within the framework of survival analysis, this results in missing failure indicators. Motivated by the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study, a large cohort study of TMD, we propose a method for parameter estimation in survival models with missing failure indicators. We estimate the probability of being an incident case for those lacking a "gold standard" examination using logistic regression. These estimated probabilities are used to generate multiple imputations of case status for each missing examination that are combined with observed data in appropriate regression models. The variance introduced by the procedure is estimated using multiple imputation. The method can be used to estimate both regression coefficients in Cox proportional hazard models as well as incidence rates using Poisson regression. We simulate data with missing failure indicators and show that our method performs as well as or better than competing methods. Finally, we apply the proposed method to data from the OPPERA study. Copyright © 2015 John Wiley & Sons, Ltd.

  7. Two biased estimation techniques in linear regression: Application to aircraft

    NASA Technical Reports Server (NTRS)

    Klein, Vladislav

    1988-01-01

    Several ways for detection and assessment of collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit a damaging effect of collinearity are presented. These two techniques, the principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. The eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.

  8. Estimating Commute Distances of U.S. Army Reservists by Regional and Unit Characteristics

    DTIC Science & Technology

    1990-09-01

    multiple regression equation is used to estimate the parameters of the commute distance distribution as a function of reserve center and market ...used to estimate the parameters of the commute distance distribution as a function of reserve center and market characteristics. The results of the...recruiting personnel to meet unit fill rates. An important objective of the USAREC is to identify market areas that will support new reserve units [Ref. 2:p

  9. Semiparametric Estimation of the Impacts of Longitudinal Interventions on Adolescent Obesity using Targeted Maximum-Likelihood: Accessible Estimation with the ltmle Package

    PubMed Central

    Decker, Anna L.; Hubbard, Alan; Crespi, Catherine M.; Seto, Edmund Y.W.; Wang, May C.

    2015-01-01

    While child and adolescent obesity is a serious public health concern, few studies have utilized parameters based on the causal inference literature to examine the potential impacts of early intervention. The purpose of this analysis was to estimate the causal effects of early interventions to improve physical activity and diet during adolescence on body mass index (BMI), a measure of adiposity, using improved techniques. The most widespread statistical method in studies of child and adolescent obesity is multi-variable regression, with the parameter of interest being the coefficient on the variable of interest. This approach does not appropriately adjust for time-dependent confounding, and the modeling assumptions may not always be met. An alternative parameter to estimate is one motivated by the causal inference literature, which can be interpreted as the mean change in the outcome under interventions to set the exposure of interest. The underlying data-generating distribution, upon which the estimator is based, can be estimated via a parametric or semi-parametric approach. Using data from the National Heart, Lung, and Blood Institute Growth and Health Study, a 10-year prospective cohort study of adolescent girls, we estimated the longitudinal impact of physical activity and diet interventions on 10-year BMI z-scores via a parameter motivated by the causal inference literature, using both parametric and semi-parametric estimation approaches. The parameters of interest were estimated with a recently released R package, ltmle, for estimating means based upon general longitudinal treatment regimes. We found that early, sustained intervention on total calories had a greater impact than a physical activity intervention or non-sustained interventions. Multivariable linear regression yielded inflated effect estimates compared to estimates based on targeted maximum-likelihood estimation and data-adaptive super learning. Our analysis demonstrates that sophisticated, optimal semiparametric estimation of longitudinal treatment-specific means via ltmle provides an incredibly powerful, yet easy-to-use tool, removing impediments for putting theory into practice. PMID:26046009

  10. Stature Estimation from Lower Limb Anthropometry using Linear Regression Analysis: A Study on the Malaysian Population.

    PubMed

    Abu Bakar, S N; Aspalilah, A; AbdelNasser, I; Nurliza, A; Hairuliza, M J; Swarhib, M; Das, S; Mohd Nor, F

    2017-01-01

    Stature is one of the characteristics that could be used to identify human, besides age, sex and racial affiliation. This is useful when the body found is either dismembered, mutilated or even decomposed, and helps in narrowing down the missing person's identity. The main aim of the present study was to construct regression functions for stature estimation by using lower limb bones in the Malaysian population. The sample comprised 87 adult individuals (81 males, 6 females) aged between 20 to 79 years. The parameters such as thigh length, lower leg length, leg length, foot length, foot height and foot breadth were measured. They were measured by a ruler and measuring tape. Statistical analysis involved independent t-test to analyse the difference between lower limbs in male and female. The Pearson's correlation test was used to analyse correlations between lower limb parameters and stature, and the linear regressions were used to form equations. The paired t-test was used to compare between actual stature and estimated stature by using the equations formed. Using independent t-test, there was a significant difference (p< 0.05) in the measurement between males and females with regard to leg length, thigh length, lower leg length, foot length and foot breadth. The thigh length, leg length and foot length were observed to have strong correlations with stature with p= 0.75, p= 0.81 and p= 0.69, respectively. Linear regressions were formulated for stature estimation. Paired t-test showed no significant difference between actual stature and estimated stature. It is concluded that regression functions can be used to estimate stature to identify skeletal remains in the Malaysia population.

  11. Estimating riparian understory vegetation cover with beta regression and copula models

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Madsen, Lisa; Hagar, Joan C.; Temesgen, Hailemariam

    2011-01-01

    Understory vegetation communities are critical components of forest ecosystems. As a result, the importance of modeling understory vegetation characteristics in forested landscapes has become more apparent. Abundance measures such as shrub cover are bounded between 0 and 1, exhibit heteroscedastic error variance, and are often subject to spatial dependence. These distributional features tend to be ignored when shrub cover data are analyzed. The beta distribution has been used successfully to describe the frequency distribution of vegetation cover. Beta regression models ignoring spatial dependence (BR) and accounting for spatial dependence (BRdep) were used to estimate percent shrub cover as a function of topographic conditions and overstory vegetation structure in riparian zones in western Oregon. The BR models showed poor explanatory power (pseudo-R2 ≤ 0.34) but outperformed ordinary least-squares (OLS) and generalized least-squares (GLS) regression models with logit-transformed response in terms of mean square prediction error and absolute bias. We introduce a copula (COP) model that is based on the beta distribution and accounts for spatial dependence. A simulation study was designed to illustrate the effects of incorrectly assuming normality, equal variance, and spatial independence. It showed that BR, BRdep, and COP models provide unbiased parameter estimates, whereas OLS and GLS models result in slightly biased estimates for two of the three parameters. On the basis of the simulation study, 93–97% of the GLS, BRdep, and COP confidence intervals covered the true parameters, whereas OLS and BR only resulted in 84–88% coverage, which demonstrated the superiority of GLS, BRdep, and COP over OLS and BR models in providing standard errors for the parameter estimates in the presence of spatial dependence.

  12. A fuzzy adaptive network approach to parameter estimation in cases where independent variables come from an exponential distribution

    NASA Astrophysics Data System (ADS)

    Dalkilic, Turkan Erbay; Apaydin, Aysen

    2009-11-01

    In a regression analysis, it is assumed that the observations come from a single class in a data cluster and the simple functional relationship between the dependent and independent variables can be expressed using the general model; Y=f(X)+[epsilon]. However; a data cluster may consist of a combination of observations that have different distributions that are derived from different clusters. When faced with issues of estimating a regression model for fuzzy inputs that have been derived from different distributions, this regression model has been termed the [`]switching regression model' and it is expressed with . Here li indicates the class number of each independent variable and p is indicative of the number of independent variables [J.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transaction on Systems, Man and Cybernetics 23 (3) (1993) 665-685; M. Michel, Fuzzy clustering and switching regression models using ambiguity and distance rejects, Fuzzy Sets and Systems 122 (2001) 363-399; E.Q. Richard, A new approach to estimating switching regressions, Journal of the American Statistical Association 67 (338) (1972) 306-310]. In this study, adaptive networks have been used to construct a model that has been formed by gathering obtained models. There are methods that suggest the class numbers of independent variables heuristically. Alternatively, in defining the optimal class number of independent variables, the use of suggested validity criterion for fuzzy clustering has been aimed. In the case that independent variables have an exponential distribution, an algorithm has been suggested for defining the unknown parameter of the switching regression model and for obtaining the estimated values after obtaining an optimal membership function, which is suitable for exponential distribution.

  13. Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales

    NASA Astrophysics Data System (ADS)

    Kristoufek, Ladislav

    2015-02-01

    We propose a framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and it is designed to estimate regression parameters at different scales and under potential nonstationarity and power-law correlations. The former feature allows for distinguishing between effects for a pair of variables from different temporal perspectives. The latter ones make the method a significant improvement over the standard least squares estimation. Theoretical claims are supported by Monte Carlo simulations. The method is then applied on selected examples from physics, finance, environmental science, and epidemiology. For most of the studied cases, the relationship between variables of interest varies strongly across scales.

  14. Matched samples logistic regression in case-control studies with missing values: when to break the matches.

    PubMed

    Hansson, Lisbeth; Khamis, Harry J

    2008-12-01

    Simulated data sets are used to evaluate conditional and unconditional maximum likelihood estimation in an individual case-control design with continuous covariates when there are different rates of excluded cases and different levels of other design parameters. The effectiveness of the estimation procedures is measured by method bias, variance of the estimators, root mean square error (RMSE) for logistic regression and the percentage of explained variation. Conditional estimation leads to higher RMSE than unconditional estimation in the presence of missing observations, especially for 1:1 matching. The RMSE is higher for the smaller stratum size, especially for the 1:1 matching. The percentage of explained variation appears to be insensitive to missing data, but is generally higher for the conditional estimation than for the unconditional estimation. It is particularly good for the 1:2 matching design. For minimizing RMSE, a high matching ratio is recommended; in this case, conditional and unconditional logistic regression models yield comparable levels of effectiveness. For maximizing the percentage of explained variation, the 1:2 matching design with the conditional logistic regression model is recommended.

  15. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  16. Estimating the Probability of Rare Events Occurring Using a Local Model Averaging.

    PubMed

    Chen, Jin-Hua; Chen, Chun-Shu; Huang, Meng-Fan; Lin, Hung-Chih

    2016-10-01

    In statistical applications, logistic regression is a popular method for analyzing binary data accompanied by explanatory variables. But when one of the two outcomes is rare, the estimation of model parameters has been shown to be severely biased and hence estimating the probability of rare events occurring based on a logistic regression model would be inaccurate. In this article, we focus on estimating the probability of rare events occurring based on logistic regression models. Instead of selecting a best model, we propose a local model averaging procedure based on a data perturbation technique applied to different information criteria to obtain different probability estimates of rare events occurring. Then an approximately unbiased estimator of Kullback-Leibler loss is used to choose the best one among them. We design complete simulations to show the effectiveness of our approach. For illustration, a necrotizing enterocolitis (NEC) data set is analyzed. © 2016 Society for Risk Analysis.

  17. Implementations of geographically weighted lasso in spatial data with multicollinearity (Case study: Poverty modeling of Java Island)

    NASA Astrophysics Data System (ADS)

    Setiyorini, Anis; Suprijadi, Jadi; Handoko, Budhi

    2017-03-01

    Geographically Weighted Regression (GWR) is a regression model that takes into account the spatial heterogeneity effect. In the application of the GWR, inference on regression coefficients is often of interest, as is estimation and prediction of the response variable. Empirical research and studies have demonstrated that local correlation between explanatory variables can lead to estimated regression coefficients in GWR that are strongly correlated, a condition named multicollinearity. It later results on a large standard error on estimated regression coefficients, and, hence, problematic for inference on relationships between variables. Geographically Weighted Lasso (GWL) is a method which capable to deal with spatial heterogeneity and local multicollinearity in spatial data sets. GWL is a further development of GWR method, which adds a LASSO (Least Absolute Shrinkage and Selection Operator) constraint in parameter estimation. In this study, GWL will be applied by using fixed exponential kernel weights matrix to establish a poverty modeling of Java Island, Indonesia. The results of applying the GWL to poverty datasets show that this method stabilizes regression coefficients in the presence of multicollinearity and produces lower prediction and estimation error of the response variable than GWR does.

  18. Techniques for estimating peak-streamflow frequency for unregulated streams and streams regulated by small floodwater retarding structures in Oklahoma

    USGS Publications Warehouse

    Tortorelli, Robert L.

    1997-01-01

    Statewide regression equations for Oklahoma were determined for estimating peak discharge and flood frequency for selected recurrence intervals from 2 to 500 years for ungaged sites on natural unregulated streams. The most significant independent variables required to estimate peak-streamflow frequency for natural unregulated streams in Oklahoma are contributing drainage area, main-channel slope, and mean-annual precipitation. The regression equations are applicable for watersheds with drainage areas less than 2,510 square miles that are not affected by regulation from manmade works. Limitations on the use of the regression relations and the reliability of regression estimates for natural unregulated streams are discussed. Log-Pearson Type III analysis information, basin and climatic characteristics, and the peak-stream-flow frequency estimates for 251 gaging stations in Oklahoma and adjacent states are listed. Techniques are presented to make a peak-streamflow frequency estimate for gaged sites on natural unregulated streams and to use this result to estimate a nearby ungaged site on the same stream. For ungaged sites on urban streams, an adjustment of the statewide regression equations for natural unregulated streams can be used to estimate peak-streamflow frequency. For ungaged sites on streams regulated by small floodwater retarding structures, an adjustment of the statewide regression equations for natural unregulated streams can be used to estimate peak-streamflow frequency. The statewide regression equations are adjusted by substituting the drainage area below the floodwater retarding structures, or drainage area that represents the percentage of the unregulated basin, in the contributing drainage area parameter to obtain peak-streamflow frequency estimates.

  19. GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA

    PubMed Central

    Zheng, Qi; Peng, Limin; He, Xuming

    2015-01-01

    Quantile regression has become a valuable tool to analyze heterogeneous covaraite-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high dimensional setting. We employ adaptive L1 penalties, and more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantiles levels, enhancing the flexibility and robustness of the existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424

  20. Runoff load estimation of particulate and dissolved nitrogen in Lake Inba watershed using continuous monitoring data on turbidity and electric conductivity.

    PubMed

    Kim, J; Nagano, Y; Furumai, H

    2012-01-01

    Easy-to-measure surrogate parameters for water quality indicators are needed for real time monitoring as well as for generating data for model calibration and validation. In this study, a novel linear regression model for estimating total nitrogen (TN) based on two surrogate parameters is proposed based on evaluation of pollutant loads flowing into a eutrophic lake. Based on their runoff characteristics during wet weather, electric conductivity (EC) and turbidity were selected as surrogates for particulate nitrogen (PN) and dissolved nitrogen (DN), respectively. Strong linear relationships were established between PN and turbidity and DN and EC, and both models subsequently combined for estimation of TN. This model was evaluated by comparison of estimated and observed TN runoff loads during rainfall events. This analysis showed that turbidity and EC are viable surrogates for PN and DN, respectively, and that the linear regression model for TN concentration was successful in estimating TN runoff loads during rainfall events and also under dry weather conditions.

  1. Gaussian Process Regression Model in Spatial Logistic Regression

    NASA Astrophysics Data System (ADS)

    Sofro, A.; Oktaviarina, A.

    2018-01-01

    Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.

  2. Semiparametric temporal process regression of survival-out-of-hospital.

    PubMed

    Zhan, Tianyu; Schaubel, Douglas E

    2018-05-23

    The recurrent/terminal event data structure has undergone considerable methodological development in the last 10-15 years. An example of the data structure that has arisen with increasing frequency involves the recurrent event being hospitalization and the terminal event being death. We consider the response Survival-Out-of-Hospital, defined as a temporal process (indicator function) taking the value 1 when the subject is currently alive and not hospitalized, and 0 otherwise. Survival-Out-of-Hospital is a useful alternative strategy for the analysis of hospitalization/survival in the chronic disease setting, with the response variate representing a refinement to survival time through the incorporation of an objective quality-of-life component. The semiparametric model we consider assumes multiplicative covariate effects and leaves unspecified the baseline probability of being alive-and-out-of-hospital. Using zero-mean estimating equations, the proposed regression parameter estimator can be computed without estimating the unspecified baseline probability process, although baseline probabilities can subsequently be estimated for any time point within the support of the censoring distribution. We demonstrate that the regression parameter estimator is asymptotically normal, and that the baseline probability function estimator converges to a Gaussian process. Simulation studies are performed to show that our estimating procedures have satisfactory finite sample performances. The proposed methods are applied to the Dialysis Outcomes and Practice Patterns Study (DOPPS), an international end-stage renal disease study.

  3. Mixed geographically weighted regression (MGWR) model with weighted adaptive bi-square for case of dengue hemorrhagic fever (DHF) in Surakarta

    NASA Astrophysics Data System (ADS)

    Astuti, H. N.; Saputro, D. R. S.; Susanti, Y.

    2017-06-01

    MGWR model is combination of linear regression model and geographically weighted regression (GWR) model, therefore, MGWR model could produce parameter estimation that had global parameter estimation, and other parameter that had local parameter in accordance with its observation location. The linkage between locations of the observations expressed in specific weighting that is adaptive bi-square. In this research, we applied MGWR model with weighted adaptive bi-square for case of DHF in Surakarta based on 10 factors (variables) that is supposed to influence the number of people with DHF. The observation unit in the research is 51 urban villages and the variables are number of inhabitants, number of houses, house index, many public places, number of healthy homes, number of Posyandu, area width, level population density, welfare of the family, and high-region. Based on this research, we obtained 51 MGWR models. The MGWR model were divided into 4 groups with significant variable is house index as a global variable, an area width as a local variable and the remaining variables vary in each. Global variables are variables that significantly affect all locations, while local variables are variables that significantly affect a specific location.

  4. Estimation of distributional parameters for censored trace level water quality data: 2. Verification and applications

    USGS Publications Warehouse

    Helsel, Dennis R.; Gilliom, Robert J.

    1986-01-01

    Estimates of distributional parameters (mean, standard deviation, median, interquartile range) are often desired for data sets containing censored observations. Eight methods for estimating these parameters have been evaluated by R. J. Gilliom and D. R. Helsel (this issue) using Monte Carlo simulations. To verify those findings, the same methods are now applied to actual water quality data. The best method (lowest root-mean-squared error (rmse)) over all parameters, sample sizes, and censoring levels is log probability regression (LR), the method found best in the Monte Carlo simulations. Best methods for estimating moment or percentile parameters separately are also identical to the simulations. Reliability of these estimates can be expressed as confidence intervals using rmse and bias values taken from the simulation results. Finally, a new simulation study shows that best methods for estimating uncensored sample statistics from censored data sets are identical to those for estimating population parameters. Thus this study and the companion study by Gilliom and Helsel form the basis for making the best possible estimates of either population parameters or sample statistics from censored water quality data, and for assessments of their reliability.

  5. Estimation of Compaction Parameters Based on Soil Classification

    NASA Astrophysics Data System (ADS)

    Lubis, A. S.; Muis, Z. A.; Hastuty, I. P.; Siregar, I. M.

    2018-02-01

    Factors that must be considered in compaction of the soil works were the type of soil material, field control, maintenance and availability of funds. Those problems then raised the idea of how to estimate the density of the soil with a proper implementation system, fast, and economical. This study aims to estimate the compaction parameter i.e. the maximum dry unit weight (γ dmax) and optimum water content (Wopt) based on soil classification. Each of 30 samples were being tested for its properties index and compaction test. All of the data’s from the laboratory test results, were used to estimate the compaction parameter values by using linear regression and Goswami Model. From the research result, the soil types were A4, A-6, and A-7 according to AASHTO and SC, SC-SM, and CL based on USCS. By linear regression, the equation for estimation of the maximum dry unit weight (γdmax *)=1,862-0,005*FINES- 0,003*LL and estimation of the optimum water content (wopt *)=- 0,607+0,362*FINES+0,161*LL. By Goswami Model (with equation Y=mLogG+k), for estimation of the maximum dry unit weight (γdmax *) with m=-0,376 and k=2,482, for estimation of the optimum water content (wopt *) with m=21,265 and k=-32,421. For both of these equations a 95% confidence interval was obtained.

  6. Estimation of diffusion coefficients from voltammetric signals by support vector and gaussian process regression

    PubMed Central

    2014-01-01

    Background Support vector regression (SVR) and Gaussian process regression (GPR) were used for the analysis of electroanalytical experimental data to estimate diffusion coefficients. Results For simulated cyclic voltammograms based on the EC, Eqr, and EqrC mechanisms these regression algorithms in combination with nonlinear kernel/covariance functions yielded diffusion coefficients with higher accuracy as compared to the standard approach of calculating diffusion coefficients relying on the Nicholson-Shain equation. The level of accuracy achieved by SVR and GPR is virtually independent of the rate constants governing the respective reaction steps. Further, the reduction of high-dimensional voltammetric signals by manual selection of typical voltammetric peak features decreased the performance of both regression algorithms compared to a reduction by downsampling or principal component analysis. After training on simulated data sets, diffusion coefficients were estimated by the regression algorithms for experimental data comprising voltammetric signals for three organometallic complexes. Conclusions Estimated diffusion coefficients closely matched the values determined by the parameter fitting method, but reduced the required computational time considerably for one of the reaction mechanisms. The automated processing of voltammograms according to the regression algorithms yields better results than the conventional analysis of peak-related data. PMID:24987463

  7. The application of parameter estimation to flight measurements to obtain lateral-directional stability derivatives of an augmented jet-flap STOL airplane

    NASA Technical Reports Server (NTRS)

    Stephenson, J. D.

    1983-01-01

    Flight experiments with an augmented jet flap STOL aircraft provided data from which the lateral directional stability and control derivatives were calculated by applying a linear regression parameter estimation procedure. The tests, which were conducted with the jet flaps set at a 65 deg deflection, covered a large range of angles of attack and engine power settings. The effect of changing the angle of the jet thrust vector was also investigated. Test results are compared with stability derivatives that had been predicted. The roll damping derived from the tests was significantly larger than had been predicted, whereas the other derivatives were generally in agreement with the predictions. Results obtained using a maximum likelihood estimation procedure are compared with those from the linear regression solutions.

  8. An application of robust ridge regression model in the presence of outliers to real data problem

    NASA Astrophysics Data System (ADS)

    Shariff, N. S. Md.; Ferdaos, N. A.

    2017-09-01

    Multicollinearity and outliers are often leads to inconsistent and unreliable parameter estimates in regression analysis. The well-known procedure that is robust to multicollinearity problem is the ridge regression method. This method however is believed are affected by the presence of outlier. The combination of GM-estimation and ridge parameter that is robust towards both problems is on interest in this study. As such, both techniques are employed to investigate the relationship between stock market price and macroeconomic variables in Malaysia due to curiosity of involving the multicollinearity and outlier problem in the data set. There are four macroeconomic factors selected for this study which are Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The results demonstrate that the proposed procedure is able to produce reliable results towards the presence of multicollinearity and outliers in the real data.

  9. Marginal regression approach for additive hazards models with clustered current status data.

    PubMed

    Su, Pei-Fang; Chi, Yunchan

    2014-01-15

    Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics and demographic and sociology studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of survival function converges at a rate of n(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive the test statistics, whose distributions converge at a rate of n(1/2) , for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated. Copyright © 2013 John Wiley & Sons, Ltd.

  10. Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages.

    PubMed

    Choi, Youn-Kyung; Kim, Jinmi; Yamaguchi, Tetsutaro; Maki, Koutaro; Ko, Ching-Chang; Kim, Yong-Il

    2016-01-01

    This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5-18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanation power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level.

  11. Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages

    PubMed Central

    Choi, Youn-Kyung; Kim, Jinmi; Maki, Koutaro; Ko, Ching-Chang

    2016-01-01

    This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5–18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanation power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level. PMID:27340668

  12. Linking Parameter Estimates Derived from an Item Response Model through Separate Calibrations. Research Report. ETS RR-09-40

    ERIC Educational Resources Information Center

    Haberman, Shelby J.

    2009-01-01

    A regression procedure is developed to link simultaneously a very large number of item response theory (IRT) parameter estimates obtained from a large number of test forms, where each form has been separately calibrated and where forms can be linked on a pairwise basis by means of common items. An application is made to forms in which a…

  13. A novel method of estimation of lipophilicity using distance-based topological indices: dominating role of equalized electronegativity.

    PubMed

    Agrawal, Vijay K; Gupta, Madhu; Singh, Jyoti; Khadikar, Padmakar V

    2005-03-15

    Attempt is made to propose yet another method of estimating lipophilicity of a heterogeneous set of 223 compounds. The method is based on the use of equalized electronegativity along with topological indices. It was observed that excellent results are obtained in multiparametric regression upon introduction of indicator parameters. The results are discussed critically on the basis various statistical parameters.

  14. Estimate the contribution of incubation parameters influence egg hatchability using multiple linear regression analysis

    PubMed Central

    Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma

    2016-01-01

    Aim: This research was conducted to determine the most affecting parameters on hatchability of indigenous and improved local chickens’ eggs. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influencing one on hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. Alexandria strain has the highest significant commercial hatchability (80.70%). Regarding the studied strains, highly significant differences in hatching chick weight among strains were observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666

  15. Addressing data privacy in matched studies via virtual pooling.

    PubMed

    Saha-Chaudhuri, P; Weinberg, C R

    2017-09-07

    Data confidentiality and shared use of research data are two desirable but sometimes conflicting goals in research with multi-center studies and distributed data. While ideal for straightforward analysis, confidentiality restrictions forbid creation of a single dataset that includes covariate information of all participants. Current approaches such as aggregate data sharing, distributed regression, meta-analysis and score-based methods can have important limitations. We propose a novel application of an existing epidemiologic tool, specimen pooling, to enable confidentiality-preserving analysis of data arising from a matched case-control, multi-center design. Instead of pooling specimens prior to assay, we apply the methodology to virtually pool (aggregate) covariates within nodes. Such virtual pooling retains most of the information used in an analysis with individual data and since individual participant data is not shared externally, within-node virtual pooling preserves data confidentiality. We show that aggregated covariate levels can be used in a conditional logistic regression model to estimate individual-level odds ratios of interest. The parameter estimates from the standard conditional logistic regression are compared to the estimates based on a conditional logistic regression model with aggregated data. The parameter estimates are shown to be similar to those without pooling and to have comparable standard errors and confidence interval coverage. Virtual data pooling can be used to maintain confidentiality of data from multi-center study and can be particularly useful in research with large-scale distributed data.

  16. Satellite rainfall retrieval by logistic regression

    NASA Technical Reports Server (NTRS)

    Chiu, Long S.

    1986-01-01

    The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors.The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistical model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variant can be estimated. A logistical model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistical model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistical model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.

  17. Multivariate random-parameters zero-inflated negative binomial regression model: an application to estimate crash frequencies at intersections.

    PubMed

    Dong, Chunjiao; Clarke, David B; Yan, Xuedong; Khattak, Asad; Huang, Baoshan

    2014-09-01

    Crash data are collected through police reports and integrated with road inventory data for further analysis. Integrated police reports and inventory data yield correlated multivariate data for roadway entities (e.g., segments or intersections). Analysis of such data reveals important relationships that can help focus on high-risk situations and coming up with safety countermeasures. To understand relationships between crash frequencies and associated variables, while taking full advantage of the available data, multivariate random-parameters models are appropriate since they can simultaneously consider the correlation among the specific crash types and account for unobserved heterogeneity. However, a key issue that arises with correlated multivariate data is the number of crash-free samples increases, as crash counts have many categories. In this paper, we describe a multivariate random-parameters zero-inflated negative binomial (MRZINB) regression model for jointly modeling crash counts. The full Bayesian method is employed to estimate the model parameters. Crash frequencies at urban signalized intersections in Tennessee are analyzed. The paper investigates the performance of MZINB and MRZINB regression models in establishing the relationship between crash frequencies, pavement conditions, traffic factors, and geometric design features of roadway intersections. Compared to the MZINB model, the MRZINB model identifies additional statistically significant factors and provides better goodness of fit in developing the relationships. The empirical results show that MRZINB model possesses most of the desirable statistical properties in terms of its ability to accommodate unobserved heterogeneity and excess zero counts in correlated data. Notably, in the random-parameters MZINB model, the estimated parameters vary significantly across intersections for different crash types. Copyright © 2014 Elsevier Ltd. All rights reserved.

  18. Spatial distribution of water supply in the coterminous United States

    Treesearch

    Thomas C. Brown; Michael T. Hobbins; Jorge A. Ramirez

    2008-01-01

    Available water supply across the contiguous 48 states was estimated as precipitation minus evapotranspiration using data for the period 1953-1994. Precipitation estimates were taken from the Parameter- Elevation Regressions on Independent Slopes Model (PRISM). Evapotranspiration was estimated using two models, the Advection-Aridity model and the Zhang model. The...

  19. A computational model for biosonar echoes from foliage

    PubMed Central

    Gupta, Anupam Kumar; Lu, Ruijin; Zhu, Hongxiao

    2017-01-01

    Since many bat species thrive in densely vegetated habitats, echoes from foliage are likely to be of prime importance to the animals’ sensory ecology, be it as clutter that masks prey echoes or as sources of information about the environment. To better understand the characteristics of foliage echoes, a new model for the process that generates these signals has been developed. This model takes leaf size and orientation into account by representing the leaves as circular disks of varying diameter. The two added leaf parameters are of potential importance to the sensory ecology of bats, e.g., with respect to landmark recognition and flight guidance along vegetation contours. The full model is specified by a total of three parameters: leaf density, average leaf size, and average leaf orientation. It assumes that all leaf parameters are independently and identically distributed. Leaf positions were drawn from a uniform probability density function, sizes and orientations each from a Gaussian probability function. The model was found to reproduce the first-order amplitude statistics of measured example echoes and showed time-variant echo properties that depended on foliage parameters. Parameter estimation experiments using lasso regression have demonstrated that a single foliage parameter can be estimated with high accuracy if the other two parameters are known a priori. If only one parameter is known a priori, the other two can still be estimated, but with a reduced accuracy. Lasso regression did not support simultaneous estimation of all three parameters. Nevertheless, these results demonstrate that foliage echoes contain accessible information on foliage type and orientation that could play a role in supporting sensory tasks such as landmark identification and contour following in echolocating bats. PMID:28817631

  20. A computational model for biosonar echoes from foliage.

    PubMed

    Ming, Chen; Gupta, Anupam Kumar; Lu, Ruijin; Zhu, Hongxiao; Müller, Rolf

    2017-01-01

    Since many bat species thrive in densely vegetated habitats, echoes from foliage are likely to be of prime importance to the animals' sensory ecology, be it as clutter that masks prey echoes or as sources of information about the environment. To better understand the characteristics of foliage echoes, a new model for the process that generates these signals has been developed. This model takes leaf size and orientation into account by representing the leaves as circular disks of varying diameter. The two added leaf parameters are of potential importance to the sensory ecology of bats, e.g., with respect to landmark recognition and flight guidance along vegetation contours. The full model is specified by a total of three parameters: leaf density, average leaf size, and average leaf orientation. It assumes that all leaf parameters are independently and identically distributed. Leaf positions were drawn from a uniform probability density function, sizes and orientations each from a Gaussian probability function. The model was found to reproduce the first-order amplitude statistics of measured example echoes and showed time-variant echo properties that depended on foliage parameters. Parameter estimation experiments using lasso regression have demonstrated that a single foliage parameter can be estimated with high accuracy if the other two parameters are known a priori. If only one parameter is known a priori, the other two can still be estimated, but with a reduced accuracy. Lasso regression did not support simultaneous estimation of all three parameters. Nevertheless, these results demonstrate that foliage echoes contain accessible information on foliage type and orientation that could play a role in supporting sensory tasks such as landmark identification and contour following in echolocating bats.

  1. Methods for estimating confidence intervals in interrupted time series analyses of health interventions.

    PubMed

    Zhang, Fang; Wagner, Anita K; Soumerai, Stephen B; Ross-Degnan, Dennis

    2009-02-01

    Interrupted time series (ITS) is a strong quasi-experimental research design, which is increasingly applied to estimate the effects of health services and policy interventions. We describe and illustrate two methods for estimating confidence intervals (CIs) around absolute and relative changes in outcomes calculated from segmented regression parameter estimates. We used multivariate delta and bootstrapping methods (BMs) to construct CIs around relative changes in level and trend, and around absolute changes in outcome based on segmented linear regression analyses of time series data corrected for autocorrelated errors. Using previously published time series data, we estimated CIs around the effect of prescription alerts for interacting medications with warfarin on the rate of prescriptions per 10,000 warfarin users per month. Both the multivariate delta method (MDM) and the BM produced similar results. BM is preferred for calculating CIs of relative changes in outcomes of time series studies, because it does not require large sample sizes when parameter estimates are obtained correctly from the model. Caution is needed when sample size is small.

  2. Using Bayesian regression to test hypotheses about relationships between parameters and covariates in cognitive models.

    PubMed

    Boehm, Udo; Steingroever, Helen; Wagenmakers, Eric-Jan

    2018-06-01

    An important tool in the advancement of cognitive science are quantitative models that represent different cognitive variables in terms of model parameters. To evaluate such models, their parameters are typically tested for relationships with behavioral and physiological variables that are thought to reflect specific cognitive processes. However, many models do not come equipped with the statistical framework needed to relate model parameters to covariates. Instead, researchers often revert to classifying participants into groups depending on their values on the covariates, and subsequently comparing the estimated model parameters between these groups. Here we develop a comprehensive solution to the covariate problem in the form of a Bayesian regression framework. Our framework can be easily added to existing cognitive models and allows researchers to quantify the evidential support for relationships between covariates and model parameters using Bayes factors. Moreover, we present a simulation study that demonstrates the superiority of the Bayesian regression framework to the conventional classification-based approach.

  3. Hydrology and trout populations of cold-water rivers of Michigan and Wisconsin

    USGS Publications Warehouse

    Hendrickson, G.E.; Knutilla, R.L.

    1974-01-01

    Statistical multiple-regression analyses showed significant relationships between trout populations and hydrologic parameters. Parameters showing the higher levels of significance were temperature, hardness of water, percentage of gravel bottom, percentage of bottom vegetation, variability of streamflow, and discharge per unit drainage area. Trout populations increase with lower levels of annual maximum water temperatures, with increase in water hardness, and with increase in percentage of gravel and bottom vegetation. Trout populations also increase with decrease in variability of streamflow, and with increase in discharge per unit drainage area. Most hydrologic parameters were significant when evaluated collectively, but no parameter, by itself, showed a high degree of correlation with trout populations in regression analyses that included all the streams sampled. Regression analyses of stream segments that were restricted to certain limits of hardness, temperature, or percentage of gravel bottom showed improvements in correlation. Analyses of trout populations, in pounds per acre and pounds per mile and hydrologic parameters resulted in regression equations from which trout populations could be estimated with standard errors of 89 and 84 per cent, respectively.

  4. Estimation of genetic parameters related to eggshell strength using random regression models.

    PubMed

    Guo, J; Ma, M; Qu, L; Shen, M; Dou, T; Wang, K

    2015-01-01

    This study examined the changes in eggshell strength and the genetic parameters related to this trait throughout a hen's laying life using random regression. The data were collected from a crossbred population between 2011 and 2014, where the eggshell strength was determined repeatedly for 2260 hens. Using random regression models (RRMs), several Legendre polynomials were employed to estimate the fixed, direct genetic and permanent environment effects. The residual effects were treated as independently distributed with heterogeneous variance for each test week. The direct genetic variance was included with second-order Legendre polynomials and the permanent environment with third-order Legendre polynomials. The heritability of eggshell strength ranged from 0.26 to 0.43, the repeatability ranged between 0.47 and 0.69, and the estimated genetic correlations between test weeks was high at > 0.67. The first eigenvalue of the genetic covariance matrix accounted for about 97% of the sum of all the eigenvalues. The flexibility and statistical power of RRM suggest that this model could be an effective method to improve eggshell quality and to reduce losses due to cracked eggs in a breeding plan.

  5. Linear regression metamodeling as a tool to summarize and present simulation model results.

    PubMed

    Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M

    2013-10-01

    Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.

  6. Effects of wing modification on an aircraft's aerodynamic parameters as determined from flight data

    NASA Technical Reports Server (NTRS)

    Hess, R. A.

    1986-01-01

    A study of the effects of four wing-leading-edge modifications on a general aviation aircraft's stability and control parameters is presented. Flight data from the basic aircraft configuration and configurations with wing modifications are analyzed to determine each wing geometry's stability and control parameters. The parameter estimates and aerodynamic model forms are obtained using the stepwise regression and maximum likelihood techniques. The resulting parameter estimates and aerodynamic models are verified using vortex-lattice theory and by analysis of each model's ability to predict aircraft behavior. Comparisons of the stability and control derivative estimates from the basic wing and the four leading-edge modifications are accomplished so that the effects of each modification on aircraft stability and control derivatives can be determined.

  7. Parameter estimation of Monod model by the Least-Squares method for microalgae Botryococcus Braunii sp

    NASA Astrophysics Data System (ADS)

    See, J. J.; Jamaian, S. S.; Salleh, R. M.; Nor, M. E.; Aman, F.

    2018-04-01

    This research aims to estimate the parameters of Monod model of microalgae Botryococcus Braunii sp growth by the Least-Squares method. Monod equation is a non-linear equation which can be transformed into a linear equation form and it is solved by implementing the Least-Squares linear regression method. Meanwhile, Gauss-Newton method is an alternative method to solve the non-linear Least-Squares problem with the aim to obtain the parameters value of Monod model by minimizing the sum of square error ( SSE). As the result, the parameters of the Monod model for microalgae Botryococcus Braunii sp can be estimated by the Least-Squares method. However, the estimated parameters value obtained by the non-linear Least-Squares method are more accurate compared to the linear Least-Squares method since the SSE of the non-linear Least-Squares method is less than the linear Least-Squares method.

  8. Robust support vector regression networks for function approximation with outliers.

    PubMed

    Chuang, Chen-Chia; Su, Shun-Feng; Jeng, Jin-Tsong; Hsiao, Chih-Ching

    2002-01-01

    Support vector regression (SVR) employs the support vector machine (SVM) to tackle problems of function approximation and regression estimation. SVR has been shown to have good robust properties against noise. When the parameters used in SVR are improperly selected, overfitting phenomena may still occur. However, the selection of various parameters is not straightforward. Besides, in SVR, outliers may also possibly be taken as support vectors. Such an inclusion of outliers in support vectors may lead to seriously overfitting phenomena. In this paper, a novel regression approach, termed as the robust support vector regression (RSVR) network, is proposed to enhance the robust capability of SVR. In the approach, traditional robust learning approaches are employed to improve the learning performance for any selected parameters. From the simulation results, our RSVR can always improve the performance of the learned systems for all cases. Besides, it can be found that even the training lasted for a long period, the testing errors would not go up. In other words, the overfitting phenomenon is indeed suppressed.

  9. Regression equations for estimation of annual peak-streamflow frequency for undeveloped watersheds in Texas using an L-moment-based, PRESS-minimized, residual-adjusted approach

    USGS Publications Warehouse

    Asquith, William H.; Roussel, Meghan C.

    2009-01-01

    Annual peak-streamflow frequency estimates are needed for flood-plain management; for objective assessment of flood risk; for cost-effective design of dams, levees, and other flood-control structures; and for design of roads, bridges, and culverts. Annual peak-streamflow frequency represents the peak streamflow for nine recurrence intervals of 2, 5, 10, 25, 50, 100, 200, 250, and 500 years. Common methods for estimation of peak-streamflow frequency for ungaged or unmonitored watersheds are regression equations for each recurrence interval developed for one or more regions; such regional equations are the subject of this report. The method is based on analysis of annual peak-streamflow data from U.S. Geological Survey streamflow-gaging stations (stations). Beginning in 2007, the U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, began a 3-year investigation concerning the development of regional equations to estimate annual peak-streamflow frequency for undeveloped watersheds in Texas. The investigation focuses primarily on 638 stations with 8 or more years of data from undeveloped watersheds and other criteria. The general approach is explicitly limited to the use of L-moment statistics, which are used in conjunction with a technique of multi-linear regression referred to as PRESS minimization. The approach used to develop the regional equations, which was refined during the investigation, is referred to as the 'L-moment-based, PRESS-minimized, residual-adjusted approach'. For the approach, seven unique distributions are fit to the sample L-moments of the data for each of 638 stations and trimmed means of the seven results of the distributions for each recurrence interval are used to define the station specific, peak-streamflow frequency. As a first iteration of regression, nine weighted-least-squares, PRESS-minimized, multi-linear regression equations are computed using the watershed characteristics of drainage area, dimensionless main-channel slope, and mean annual precipitation. The residuals of the nine equations are spatially mapped, and residuals for the 10-year recurrence interval are selected for generalization to 1-degree latitude and longitude quadrangles. The generalized residual is referred to as the OmegaEM parameter and represents a generalized terrain and climate index that expresses peak-streamflow potential not otherwise represented in the three watershed characteristics. The OmegaEM parameter was assigned to each station, and using OmegaEM, nine additional regression equations are computed. Because of favorable diagnostics, the OmegaEM equations are expected to be generally reliable estimators of peak-streamflow frequency for undeveloped and ungaged stream locations in Texas. The mean residual standard error, adjusted R-squared, and percentage reduction of PRESS by use of OmegaEM are 0.30log10, 0.86, and -21 percent, respectively. Inclusion of the OmegaEM parameter provides a substantial reduction in the PRESS statistic of the regression equations and removes considerable spatial dependency in regression residuals. Although the OmegaEM parameter requires interpretation on the part of analysts and the potential exists that different analysts could estimate different values for a given watershed, the authors suggest that typical uncertainty in the OmegaEM estimate might be about +or-0.1010. Finally, given the two ensembles of equations reported herein and those in previous reports, hydrologic design engineers and other analysts have several different methods, which represent different analytical tracks, to make comparisons of peak-streamflow frequency estimates for ungaged watersheds in the study area.

  10. Evaluating Machine Learning Regression Algorithms for Operational Retrieval of Biophysical Parameters: Opportunities for Sentinel

    NASA Astrophysics Data System (ADS)

    Verrelst, Jochem; Rivera, J. P.; Alonso, L.; Guanter, L.; Moreno, J.

    2012-04-01

    ESA’s upcoming satellites Sentinel-2 (S2) and Sentinel-3 (S3) aim to ensure continuity for Landsat 5/7, SPOT- 5, SPOT-Vegetation and Envisat MERIS observations by providing superspectral images of high spatial and temporal resolution. S2 and S3 will deliver near real-time operational products with a high accuracy for land monitoring. This unprecedented data availability leads to an urgent need for developing robust and accurate retrieval methods. Machine learning regression algorithms could be powerful candidates for the estimation of biophysical parameters from satellite reflectance measurements because of their ability to perform adaptive, nonlinear data fitting. By using data from the ESA-led field campaign SPARC (Barrax, Spain), it was recently found [1] that Gaussian processes regression (GPR) outperformed competitive machine learning algorithms such as neural networks, support vector regression) and kernel ridge regression both in terms of accuracy and computational speed. For various Sentinel configurations (S2-10m, S2- 20m, S2-60m and S3-300m) three important biophysical parameters were estimated: leaf chlorophyll content (Chl), leaf area index (LAI) and fractional vegetation cover (FVC). GPR was the only method that reached the 10% precision required by end users in the estimation of Chl. In view of implementing the regressor into operational monitoring applications, here the portability of locally trained GPR models to other images was evaluated. The associated confidence maps proved to be a good indicator for evaluating the robustness of the trained models. Consistent retrievals were obtained across the different images, particularly over agricultural sites. To make the method suitable for operational use, however, the poorer confidences over bare soil areas suggest that the training dataset should be expanded with inputs from various land cover types.

  11. Hyperspectral signature analysis of skin parameters

    NASA Astrophysics Data System (ADS)

    Vyas, Saurabh; Banerjee, Amit; Garza, Luis; Kang, Sewon; Burlina, Philippe

    2013-02-01

    The temporal analysis of changes in biological skin parameters, including melanosome concentration, collagen concentration and blood oxygenation, may serve as a valuable tool in diagnosing the progression of malignant skin cancers and in understanding the pathophysiology of cancerous tumors. Quantitative knowledge of these parameters can also be useful in applications such as wound assessment, and point-of-care diagnostics, amongst others. We propose an approach to estimate in vivo skin parameters using a forward computational model based on Kubelka-Munk theory and the Fresnel Equations. We use this model to map the skin parameters to their corresponding hyperspectral signature. We then use machine learning based regression to develop an inverse map from hyperspectral signatures to skin parameters. In particular, we employ support vector machine based regression to estimate the in vivo skin parameters given their corresponding hyperspectral signature. We build on our work from SPIE 2012, and validate our methodology on an in vivo dataset. This dataset consists of 241 signatures collected from in vivo hyperspectral imaging of patients of both genders and Caucasian, Asian and African American ethnicities. In addition, we also extend our methodology past the visible region and through the short-wave infrared region of the electromagnetic spectrum. We find promising results when comparing the estimated skin parameters to the ground truth, demonstrating good agreement with well-established physiological precepts. This methodology can have potential use in non-invasive skin anomaly detection and for developing minimally invasive pre-screening tools.

  12. Estimating sunspot number

    NASA Technical Reports Server (NTRS)

    Wilson, R. M.; Reichmann, E. J.; Teuber, D. L.

    1984-01-01

    An empirical method is developed to predict certain parameters of future solar activity cycles. Sunspot cycle statistics are examined, and curve fitting and linear regression analysis techniques are utilized.

  13. Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions

    PubMed Central

    Fernandes, Bruno J. T.; Roque, Alexandre

    2018-01-01

    Height and weight are measurements explored to tracking nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, it can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations which coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than that obtained with conventional linear regressions. In all the cases, the predictions are non-sensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366

  14. Physiologic noise regression, motion regression, and TOAST dynamic field correction in complex-valued fMRI time series.

    PubMed

    Hahn, Andrew D; Rowe, Daniel B

    2012-02-01

    As more evidence is presented suggesting that the phase, as well as the magnitude, of functional MRI (fMRI) time series may contain important information and that there are theoretical drawbacks to modeling functional response in the magnitude alone, removing noise in the phase is becoming more important. Previous studies have shown that retrospective correction of noise from physiologic sources can remove significant phase variance and that dynamic main magnetic field correction and regression of estimated motion parameters also remove significant phase fluctuations. In this work, we investigate the performance of physiologic noise regression in a framework along with correction for dynamic main field fluctuations and motion regression. Our findings suggest that including physiologic regressors provides some benefit in terms of reduction in phase noise power, but it is small compared to the benefit of dynamic field corrections and use of estimated motion parameters as nuisance regressors. Additionally, we show that the use of all three techniques reduces phase variance substantially, removes undesirable spatial phase correlations and improves detection of the functional response in magnitude and phase. Copyright © 2011 Elsevier Inc. All rights reserved.

  15. Genetic parameters for test day milk yields of first lactation Holstein cows by random regression models.

    PubMed

    de Melo, C M R; Packer, I U; Costa, C N; Machado, P F

    2007-03-01

    Covariance components for test day milk yield using 263 390 first lactation records of 32 448 Holstein cows were estimated using random regression animal models by restricted maximum likelihood. Three functions were used to adjust the lactation curve: the five-parameter logarithmic Ali and Schaeffer function (AS), the three-parameter exponential Wilmink function in its standard form (W) and in a modified form (W*), by reducing the range of covariate, and the combination of Legendre polynomial and W (LEG+W). Heterogeneous residual variance (RV) for different classes (4 and 29) of days in milk was considered in adjusting the functions. Estimates of RV were quite similar, rating from 4.15 to 5.29 kg2. Heritability estimates for AS (0.29 to 0.42), LEG+W (0.28 to 0.42) and W* (0.33 to 0.40) were similar, but heritability estimates used W (0.25 to 0.65) were highest than those estimated by the other functions, particularly at the end of lactation. Genetic correlations between milk yield on consecutive test days were close to unity, but decreased as the interval between test days increased. The AS function with homogeneous RV model had the best fit among those evaluated.

  16. Kinetic parameter estimation model for anaerobic co-digestion of waste activated sludge and microalgae.

    PubMed

    Lee, Eunyoung; Cumberbatch, Jewel; Wang, Meng; Zhang, Qiong

    2017-03-01

    Anaerobic co-digestion has a potential to improve biogas production, but limited kinetic information is available for co-digestion. This study introduced regression-based models to estimate the kinetic parameters for the co-digestion of microalgae and Waste Activated Sludge (WAS). The models were developed using the ratios of co-substrates and the kinetic parameters for the single substrate as indicators. The models were applied to the modified first-order kinetics and Monod model to determine the rate of hydrolysis and methanogenesis for the co-digestion. The results showed that the model using a hyperbola function was better for the estimation of the first-order kinetic coefficients, while the model using inverse tangent function closely estimated the Monod kinetic parameters. The models can be used for estimating kinetic parameters for not only microalgae-WAS co-digestion but also other substrates' co-digestion such as microalgae-swine manure and WAS-aquatic plants. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. One-step global parameter estimation of kinetic inactivation parameters for Bacillus sporothermodurans spores under static and dynamic thermal processes.

    PubMed

    Cattani, F; Dolan, K D; Oliveira, S D; Mishra, D K; Ferreira, C A S; Periago, P M; Aznar, A; Fernandez, P S; Valdramidis, V P

    2016-11-01

    Bacillus sporothermodurans produces highly heat-resistant endospores, that can survive under ultra-high temperature. High heat-resistant sporeforming bacteria are one of the main causes for spoilage and safety of low-acid foods. They can be used as indicators or surrogates to establish the minimum requirements for heat processes, but it is necessary to understand their thermal inactivation kinetics. The aim of the present work was to study the inactivation kinetics under both static and dynamic conditions in a vegetable soup. Ordinary least squares one-step regression and sequential procedures were applied for estimating these parameters. Results showed that multiple dynamic heating profiles, when analyzed simultaneously, can be used to accurately estimate the kinetic parameters while significantly reducing estimation errors and data collection. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. A Spline Regression Model for Latent Variables

    ERIC Educational Resources Information Center

    Harring, Jeffrey R.

    2014-01-01

    Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…

  19. Numerically accurate computational techniques for optimal estimator analyses of multi-parameter models

    NASA Astrophysics Data System (ADS)

    Berger, Lukas; Kleinheinz, Konstantin; Attili, Antonio; Bisetti, Fabrizio; Pitsch, Heinz; Mueller, Michael E.

    2018-05-01

    Modelling unclosed terms in partial differential equations typically involves two steps: First, a set of known quantities needs to be specified as input parameters for a model, and second, a specific functional form needs to be defined to model the unclosed terms by the input parameters. Both steps involve a certain modelling error, with the former known as the irreducible error and the latter referred to as the functional error. Typically, only the total modelling error, which is the sum of functional and irreducible error, is assessed, but the concept of the optimal estimator enables the separate analysis of the total and the irreducible errors, yielding a systematic modelling error decomposition. In this work, attention is paid to the techniques themselves required for the practical computation of irreducible errors. Typically, histograms are used for optimal estimator analyses, but this technique is found to add a non-negligible spurious contribution to the irreducible error if models with multiple input parameters are assessed. Thus, the error decomposition of an optimal estimator analysis becomes inaccurate, and misleading conclusions concerning modelling errors may be drawn. In this work, numerically accurate techniques for optimal estimator analyses are identified and a suitable evaluation of irreducible errors is presented. Four different computational techniques are considered: a histogram technique, artificial neural networks, multivariate adaptive regression splines, and an additive model based on a kernel method. For multiple input parameter models, only artificial neural networks and multivariate adaptive regression splines are found to yield satisfactorily accurate results. Beyond a certain number of input parameters, the assessment of models in an optimal estimator analysis even becomes practically infeasible if histograms are used. The optimal estimator analysis in this paper is applied to modelling the filtered soot intermittency in large eddy simulations using a dataset of a direct numerical simulation of a non-premixed sooting turbulent flame.

  20. A novel Gaussian process regression model for state-of-health estimation of lithium-ion battery using charging curve

    NASA Astrophysics Data System (ADS)

    Yang, Duo; Zhang, Xu; Pan, Rui; Wang, Yujie; Chen, Zonghai

    2018-04-01

    The state-of-health (SOH) estimation is always a crucial issue for lithium-ion batteries. In order to provide an accurate and reliable SOH estimation, a novel Gaussian process regression (GPR) model based on charging curve is proposed in this paper. Different from other researches where SOH is commonly estimated by cycle life, in this work four specific parameters extracted from charging curves are used as inputs of the GPR model instead of cycle numbers. These parameters can reflect the battery aging phenomenon from different angles. The grey relational analysis method is applied to analyze the relational grade between selected features and SOH. On the other hand, some adjustments are made in the proposed GPR model. Covariance function design and the similarity measurement of input variables are modified so as to improve the SOH estimate accuracy and adapt to the case of multidimensional input. Several aging data from NASA data repository are used for demonstrating the estimation effect by the proposed method. Results show that the proposed method has high SOH estimation accuracy. Besides, a battery with dynamic discharging profile is used to verify the robustness and reliability of this method.

  1. Development of an anaerobic threshold (HRLT, HRVT) estimation equation using the heart rate threshold (HRT) during the treadmill incremental exercise test

    PubMed Central

    Ham, Joo-ho; Park, Hun-Young; Kim, Youn-ho; Bae, Sang-kon; Ko, Byung-hoon

    2017-01-01

    [Purpose] The purpose of this study was to develop a regression model to estimate the heart rate at the lactate threshold (HRLT) and the heart rate at the ventilatory threshold (HRVT) using the heart rate threshold (HRT), and to test the validity of the regression model. [Methods] We performed a graded exercise test with a treadmill in 220 normal individuals (men: 112, women: 108) aged 20–59 years. HRT, HRLT, and HRVT were measured in all subjects. A regression model was developed to estimate HRLT and HRVT using HRT with 70% of the data (men: 79, women: 76) through randomization (7:3), with the Bernoulli trial. The validity of the regression model developed with the remaining 30% of the data (men: 33, women: 32) was also examined. [Results] Based on the regression coefficient, we found that the independent variable HRT was a significant variable in all regression models. The adjusted R2 of the developed regression models averaged about 70%, and the standard error of estimation of the validity test results was 11 bpm, which is similar to that of the developed model. [Conclusion] These results suggest that HRT is a useful parameter for predicting HRLT and HRVT. PMID:29036765

  2. Development of an anaerobic threshold (HRLT, HRVT) estimation equation using the heart rate threshold (HRT) during the treadmill incremental exercise test.

    PubMed

    Ham, Joo-Ho; Park, Hun-Young; Kim, Youn-Ho; Bae, Sang-Kon; Ko, Byung-Hoon; Nam, Sang-Seok

    2017-09-30

    The purpose of this study was to develop a regression model to estimate the heart rate at the lactate threshold (HRLT) and the heart rate at the ventilatory threshold (HRVT) using the heart rate threshold (HRT), and to test the validity of the regression model. We performed a graded exercise test with a treadmill in 220 normal individuals (men: 112, women: 108) aged 20-59 years. HRT, HRLT, and HRVT were measured in all subjects. A regression model was developed to estimate HRLT and HRVT using HRT with 70% of the data (men: 79, women: 76) through randomization (7:3), with the Bernoulli trial. The validity of the regression model developed with the remaining 30% of the data (men: 33, women: 32) was also examined. Based on the regression coefficient, we found that the independent variable HRT was a significant variable in all regression models. The adjusted R2 of the developed regression models averaged about 70%, and the standard error of estimation of the validity test results was 11 bpm, which is similar to that of the developed model. These results suggest that HRT is a useful parameter for predicting HRLT and HRVT. ©2017 The Korean Society for Exercise Nutrition

  3. Detecting isotopic ratio outliers

    NASA Astrophysics Data System (ADS)

    Bayne, C. K.; Smith, D. H.

    An alternative method is proposed for improving isotopic ratio estimates. This method mathematically models pulse-count data and uses iterative reweighted Poisson regression to estimate model parameters to calculate the isotopic ratios. This computer-oriented approach provides theoretically better methods than conventional techniques to establish error limits and to identify outliers.

  4. Developing a Novel Parameter Estimation Method for Agent-Based Model in Immune System Simulation under the Framework of History Matching: A Case Study on Influenza A Virus Infection

    PubMed Central

    Li, Tingting; Cheng, Zhengguo; Zhang, Le

    2017-01-01

    Since they can provide a natural and flexible description of nonlinear dynamic behavior of complex system, Agent-based models (ABM) have been commonly used for immune system simulation. However, it is crucial for ABM to obtain an appropriate estimation for the key parameters of the model by incorporating experimental data. In this paper, a systematic procedure for immune system simulation by integrating the ABM and regression method under the framework of history matching is developed. A novel parameter estimation method by incorporating the experiment data for the simulator ABM during the procedure is proposed. First, we employ ABM as simulator to simulate the immune system. Then, the dimension-reduced type generalized additive model (GAM) is employed to train a statistical regression model by using the input and output data of ABM and play a role as an emulator during history matching. Next, we reduce the input space of parameters by introducing an implausible measure to discard the implausible input values. At last, the estimation of model parameters is obtained using the particle swarm optimization algorithm (PSO) by fitting the experiment data among the non-implausible input values. The real Influeza A Virus (IAV) data set is employed to demonstrate the performance of our proposed method, and the results show that the proposed method not only has good fitting and predicting accuracy, but it also owns favorable computational efficiency. PMID:29194393

  5. Developing a Novel Parameter Estimation Method for Agent-Based Model in Immune System Simulation under the Framework of History Matching: A Case Study on Influenza A Virus Infection.

    PubMed

    Li, Tingting; Cheng, Zhengguo; Zhang, Le

    2017-12-01

    Since they can provide a natural and flexible description of nonlinear dynamic behavior of complex system, Agent-based models (ABM) have been commonly used for immune system simulation. However, it is crucial for ABM to obtain an appropriate estimation for the key parameters of the model by incorporating experimental data. In this paper, a systematic procedure for immune system simulation by integrating the ABM and regression method under the framework of history matching is developed. A novel parameter estimation method by incorporating the experiment data for the simulator ABM during the procedure is proposed. First, we employ ABM as simulator to simulate the immune system. Then, the dimension-reduced type generalized additive model (GAM) is employed to train a statistical regression model by using the input and output data of ABM and play a role as an emulator during history matching. Next, we reduce the input space of parameters by introducing an implausible measure to discard the implausible input values. At last, the estimation of model parameters is obtained using the particle swarm optimization algorithm (PSO) by fitting the experiment data among the non-implausible input values. The real Influeza A Virus (IAV) data set is employed to demonstrate the performance of our proposed method, and the results show that the proposed method not only has good fitting and predicting accuracy, but it also owns favorable computational efficiency.

  6. Nonlinear-regression groundwater flow modeling of a deep regional aquifer system

    USGS Publications Warehouse

    Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.

    1986-01-01

    A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.

  7. Nonlinear-Regression Groundwater Flow Modeling of a Deep Regional Aquifer System

    NASA Astrophysics Data System (ADS)

    Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.

    1986-12-01

    A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.

  8. Model averaging and muddled multimodel inferences.

    PubMed

    Cade, Brian S

    2015-09-01

    Three flawed practices associated with model averaging coefficients for predictor variables in regression models commonly occur when making multimodel inferences in analyses of ecological data. Model-averaged regression coefficients based on Akaike information criterion (AIC) weights have been recommended for addressing model uncertainty but they are not valid, interpretable estimates of partial effects for individual predictors when there is multicollinearity among the predictor variables. Multicollinearity implies that the scaling of units in the denominators of the regression coefficients may change across models such that neither the parameters nor their estimates have common scales, therefore averaging them makes no sense. The associated sums of AIC model weights recommended to assess relative importance of individual predictors are really a measure of relative importance of models, with little information about contributions by individual predictors compared to other measures of relative importance based on effects size or variance reduction. Sometimes the model-averaged regression coefficients for predictor variables are incorrectly used to make model-averaged predictions of the response variable when the models are not linear in the parameters. I demonstrate the issues with the first two practices using the college grade point average example extensively analyzed by Burnham and Anderson. I show how partial standard deviations of the predictor variables can be used to detect changing scales of their estimates with multicollinearity. Standardizing estimates based on partial standard deviations for their variables can be used to make the scaling of the estimates commensurate across models, a necessary but not sufficient condition for model averaging of the estimates to be sensible. A unimodal distribution of estimates and valid interpretation of individual parameters are additional requisite conditions. The standardized estimates or equivalently the t statistics on unstandardized estimates also can be used to provide more informative measures of relative importance than sums of AIC weights. Finally, I illustrate how seriously compromised statistical interpretations and predictions can be for all three of these flawed practices by critiquing their use in a recent species distribution modeling technique developed for predicting Greater Sage-Grouse (Centrocercus urophasianus) distribution in Colorado, USA. These model averaging issues are common in other ecological literature and ought to be discontinued if we are to make effective scientific contributions to ecological knowledge and conservation of natural resources.

  9. Model averaging and muddled multimodel inferences

    USGS Publications Warehouse

    Cade, Brian S.

    2015-01-01

    Three flawed practices associated with model averaging coefficients for predictor variables in regression models commonly occur when making multimodel inferences in analyses of ecological data. Model-averaged regression coefficients based on Akaike information criterion (AIC) weights have been recommended for addressing model uncertainty but they are not valid, interpretable estimates of partial effects for individual predictors when there is multicollinearity among the predictor variables. Multicollinearity implies that the scaling of units in the denominators of the regression coefficients may change across models such that neither the parameters nor their estimates have common scales, therefore averaging them makes no sense. The associated sums of AIC model weights recommended to assess relative importance of individual predictors are really a measure of relative importance of models, with little information about contributions by individual predictors compared to other measures of relative importance based on effects size or variance reduction. Sometimes the model-averaged regression coefficients for predictor variables are incorrectly used to make model-averaged predictions of the response variable when the models are not linear in the parameters. I demonstrate the issues with the first two practices using the college grade point average example extensively analyzed by Burnham and Anderson. I show how partial standard deviations of the predictor variables can be used to detect changing scales of their estimates with multicollinearity. Standardizing estimates based on partial standard deviations for their variables can be used to make the scaling of the estimates commensurate across models, a necessary but not sufficient condition for model averaging of the estimates to be sensible. A unimodal distribution of estimates and valid interpretation of individual parameters are additional requisite conditions. The standardized estimates or equivalently the tstatistics on unstandardized estimates also can be used to provide more informative measures of relative importance than sums of AIC weights. Finally, I illustrate how seriously compromised statistical interpretations and predictions can be for all three of these flawed practices by critiquing their use in a recent species distribution modeling technique developed for predicting Greater Sage-Grouse (Centrocercus urophasianus) distribution in Colorado, USA. These model averaging issues are common in other ecological literature and ought to be discontinued if we are to make effective scientific contributions to ecological knowledge and conservation of natural resources.

  10. Random Regression Models Using Legendre Polynomials to Estimate Genetic Parameters for Test-day Milk Protein Yields in Iranian Holstein Dairy Cattle.

    PubMed

    Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan

    2016-12-01

    The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran.

  11. Random Regression Models Using Legendre Polynomials to Estimate Genetic Parameters for Test-day Milk Protein Yields in Iranian Holstein Dairy Cattle

    PubMed Central

    Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan

    2016-01-01

    The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran. PMID:26954192

  12. Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

    NASA Astrophysics Data System (ADS)

    Lusiana, Evellin Dewi

    2017-12-01

    The parameters of binary probit regression model are commonly estimated by using Maximum Likelihood Estimation (MLE) method. However, MLE method has limitation if the binary data contains separation. Separation is the condition where there are one or several independent variables that exactly grouped the categories in binary response. It will result the estimators of MLE method become non-convergent, so that they cannot be used in modeling. One of the effort to resolve the separation is using Firths approach instead. This research has two aims. First, to identify the chance of separation occurrence in binary probit regression model between MLE method and Firths approach. Second, to compare the performance of binary probit regression model estimator that obtained by MLE method and Firths approach using RMSE criteria. Those are performed using simulation method and under different sample size. The results showed that the chance of separation occurrence in MLE method for small sample size is higher than Firths approach. On the other hand, for larger sample size, the probability decreased and relatively identic between MLE method and Firths approach. Meanwhile, Firths estimators have smaller RMSE than MLEs especially for smaller sample sizes. But for larger sample sizes, the RMSEs are not much different. It means that Firths estimators outperformed MLE estimator.

  13. Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment

    PubMed Central

    DeBlasio, Dan

    2013-01-01

    Abstract We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for “feature-based accuracy estimator”), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality. PMID:23489379

  14. Estimation of actual evapotranspiration in the Nagqu river basin of the Tibetan Plateau

    NASA Astrophysics Data System (ADS)

    Zou, Mijun; Zhong, Lei; Ma, Yaoming; Hu, Yuanyuan; Feng, Lu

    2018-05-01

    As a critical component of the energy and water cycle, terrestrial actual evapotranspiration (ET) can be influenced by many factors. This study was mainly devoted to providing accurate and continuous estimations of actual ET for the Tibetan Plateau (TP) and analyzing the effects of its impact factors. In this study, summer observational data from the Coordinated Enhanced Observing Period (CEOP) Asia-Australia Monsoon Project (CAMP) on the Tibetan Plateau (CAMP/Tibet) for 2003 to 2004 was selected to determine actual ET and investigate its relationship with energy, hydrological, and dynamical parameters. Multiple-layer air temperature, relative humidity, net radiation flux, wind speed, precipitation, and soil moisture were used to estimate actual ET. The regression model simulation results were validated with independent data retrieved using the combinatory method. The results suggested that significant correlations exist between actual ET and hydro-meteorological parameters in the surface layer of the Nagqu river basin, among which the most important factors are energy-related elements (net radiation flux and air temperature). The results also suggested that how ET is eventually affected by precipitation and two-layer wind speed difference depends on whether their positive or negative feedback processes have a more important role. The multivariate linear regression method provided reliable estimations of actual ET; thus, 6-parameter simplified schemes and 14-parameter regular schemes were established.

  15. Background stratified Poisson regression analysis of cohort data.

    PubMed

    Richardson, David B; Langholz, Bryan

    2012-03-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.

  16. Estimating procedure times for surgeries by determining location parameters for the lognormal model.

    PubMed

    Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H

    2004-05-01

    We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.

  17. EFFECTS OF IMPROVED PRECIPITATION ESTIMATES ON AUTOMATED RUNOFF MAPPING: EASTERN UNITED STATES

    EPA Science Inventory

    We evaluated maps of runoff created by means of two automated procedures. We implemented each procedure using precipitation estimates of both 5-km and 10-km resolution from PRISM (Parameter-elevation Regressions on Independent Slopes Model). Our goal was to determine if using the...

  18. Effects of Employing Ridge Regression in Structural Equation Models.

    ERIC Educational Resources Information Center

    McQuitty, Shaun

    1997-01-01

    LISREL 8 invokes a ridge option when maximum likelihood or generalized least squares are used to estimate a structural equation model with a nonpositive definite covariance or correlation matrix. Implications of the ridge option for model fit, parameter estimates, and standard errors are explored through two examples. (SLD)

  19. X-31 aerodynamic characteristics determined from flight data

    NASA Technical Reports Server (NTRS)

    Kokolios, Alex

    1993-01-01

    The lateral aerodynamic characteristics of the X-31 were determined at angles of attack ranging from 20 to 45 deg. Estimates of the lateral stability and control parameters were obtained by applying two parameter estimation techniques, linear regression, and the extended Kalman filter to flight test data. An attempt to apply maximum likelihood to extract parameters from the flight data was also made but failed for the reasons presented. An overview of the System Identification process is given. The overview includes a listing of the more important properties of all three estimation techniques that were applied to the data. A comparison is given of results obtained from flight test data and wind tunnel data for four important lateral parameters. Finally, future research to be conducted in this area is discussed.

  20. Epidemiologic Evaluation of Measurement Data in the Presence of Detection Limits

    PubMed Central

    Lubin, Jay H.; Colt, Joanne S.; Camann, David; Davis, Scott; Cerhan, James R.; Severson, Richard K.; Bernstein, Leslie; Hartge, Patricia

    2004-01-01

    Quantitative measurements of environmental factors greatly improve the quality of epidemiologic studies but can pose challenges because of the presence of upper or lower detection limits or interfering compounds, which do not allow for precise measured values. We consider the regression of an environmental measurement (dependent variable) on several covariates (independent variables). Various strategies are commonly employed to impute values for interval-measured data, including assignment of one-half the detection limit to nondetected values or of “fill-in” values randomly selected from an appropriate distribution. On the basis of a limited simulation study, we found that the former approach can be biased unless the percentage of measurements below detection limits is small (5–10%). The fill-in approach generally produces unbiased parameter estimates but may produce biased variance estimates and thereby distort inference when 30% or more of the data are below detection limits. Truncated data methods (e.g., Tobit regression) and multiple imputation offer two unbiased approaches for analyzing measurement data with detection limits. If interest resides solely on regression parameters, then Tobit regression can be used. If individualized values for measurements below detection limits are needed for additional analysis, such as relative risk regression or graphical display, then multiple imputation produces unbiased estimates and nominal confidence intervals unless the proportion of missing data is extreme. We illustrate various approaches using measurements of pesticide residues in carpet dust in control subjects from a case–control study of non-Hodgkin lymphoma. PMID:15579415

  1. Impact of multicollinearity on small sample hydrologic regression models

    NASA Astrophysics Data System (ADS)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.

  2. Parameter estimation techniques based on optimizing goodness-of-fit statistics for structural reliability

    NASA Technical Reports Server (NTRS)

    Starlinger, Alois; Duffy, Stephen F.; Palko, Joseph L.

    1993-01-01

    New methods are presented that utilize the optimization of goodness-of-fit statistics in order to estimate Weibull parameters from failure data. It is assumed that the underlying population is characterized by a three-parameter Weibull distribution. Goodness-of-fit tests are based on the empirical distribution function (EDF). The EDF is a step function, calculated using failure data, and represents an approximation of the cumulative distribution function for the underlying population. Statistics (such as the Kolmogorov-Smirnov statistic and the Anderson-Darling statistic) measure the discrepancy between the EDF and the cumulative distribution function (CDF). These statistics are minimized with respect to the three Weibull parameters. Due to nonlinearities encountered in the minimization process, Powell's numerical optimization procedure is applied to obtain the optimum value of the EDF. Numerical examples show the applicability of these new estimation methods. The results are compared to the estimates obtained with Cooper's nonlinear regression algorithm.

  3. GSTAR-SUR Modeling With Calendar Variations And Intervention To Forecast Outflow Of Currencies In Java Indonesia

    NASA Astrophysics Data System (ADS)

    Akbar, M. S.; Setiawan; Suhartono; Ruchjana, B. N.; Riyadi, M. A. A.

    2018-03-01

    Ordinary Least Squares (OLS) is general method to estimates Generalized Space Time Autoregressive (GSTAR) parameters. But in some cases, the residuals of GSTAR are correlated between location. If OLS is applied to this case, then the estimators are inefficient. Generalized Least Squares (GLS) is a method used in Seemingly Unrelated Regression (SUR) model. This method estimated parameters of some models with residuals between equations are correlated. Simulation study shows that GSTAR with GLS method for estimating parameters (GSTAR-SUR) is more efficient than GSTAR-OLS method. The purpose of this research is to apply GSTAR-SUR with calendar variation and intervention as exogenous variable (GSTARX-SUR) for forecast outflow of currency in Java, Indonesia. As a result, GSTARX-SUR provides better performance than GSTARX-OLS.

  4. Gradient descent for robust kernel-based regression

    NASA Astrophysics Data System (ADS)

    Guo, Zheng-Chu; Hu, Ting; Shi, Lei

    2018-06-01

    In this paper, we study the gradient descent algorithm generated by a robust loss function over a reproducing kernel Hilbert space (RKHS). The loss function is defined by a windowing function G and a scale parameter σ, which can include a wide range of commonly used robust losses for regression. There is still a gap between theoretical analysis and optimization process of empirical risk minimization based on loss: the estimator needs to be global optimal in the theoretical analysis while the optimization method can not ensure the global optimality of its solutions. In this paper, we aim to fill this gap by developing a novel theoretical analysis on the performance of estimators generated by the gradient descent algorithm. We demonstrate that with an appropriately chosen scale parameter σ, the gradient update with early stopping rules can approximate the regression function. Our elegant error analysis can lead to convergence in the standard L 2 norm and the strong RKHS norm, both of which are optimal in the mini-max sense. We show that the scale parameter σ plays an important role in providing robustness as well as fast convergence. The numerical experiments implemented on synthetic examples and real data set also support our theoretical results.

  5. Analyzing Multilevel Data: An Empirical Comparison of Parameter Estimates of Hierarchical Linear Modeling and Ordinary Least Squares Regression

    ERIC Educational Resources Information Center

    Rocconi, Louis M.

    2011-01-01

    Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…

  6. Assessing Interval Estimation Methods for Hill Model ...

    EPA Pesticide Factsheets

    The Hill model of concentration-response is ubiquitous in toxicology, perhaps because its parameters directly relate to biologically significant metrics of toxicity such as efficacy and potency. Point estimates of these parameters obtained through least squares regression or maximum likelihood are commonly used in high-throughput risk assessment, but such estimates typically fail to include reliable information concerning confidence in (or precision of) the estimates. To address this issue, we examined methods for assessing uncertainty in Hill model parameter estimates derived from concentration-response data. In particular, using a sample of ToxCast concentration-response data sets, we applied four methods for obtaining interval estimates that are based on asymptotic theory, bootstrapping (two varieties), and Bayesian parameter estimation, and then compared the results. These interval estimation methods generally did not agree, so we devised a simulation study to assess their relative performance. We generated simulated data by constructing four statistical error models capable of producing concentration-response data sets comparable to those observed in ToxCast. We then applied the four interval estimation methods to the simulated data and compared the actual coverage of the interval estimates to the nominal coverage (e.g., 95%) in order to quantify performance of each of the methods in a variety of cases (i.e., different values of the true Hill model paramet

  7. Simultaneous estimation of local-scale and flow path-scale dual-domain mass transfer parameters using geoelectrical monitoring

    USGS Publications Warehouse

    Briggs, Martin A.; Day-Lewis, Frederick D.; Ong, John B.; Curtis, Gary P.; Lane, John W.

    2013-01-01

    Anomalous solute transport, modeled as rate-limited mass transfer, has an observable geoelectrical signature that can be exploited to infer the controlling parameters. Previous experiments indicate the combination of time-lapse geoelectrical and fluid conductivity measurements collected during ionic tracer experiments provides valuable insight into the exchange of solute between mobile and immobile porosity. Here, we use geoelectrical measurements to monitor tracer experiments at a former uranium mill tailings site in Naturita, Colorado. We use nonlinear regression to calibrate dual-domain mass transfer solute-transport models to field data. This method differs from previous approaches by calibrating the model simultaneously to observed fluid conductivity and geoelectrical tracer signals using two parameter scales: effective parameters for the flow path upgradient of the monitoring point and the parameters local to the monitoring point. We use regression statistics to rigorously evaluate the information content and sensitivity of fluid conductivity and geophysical data, demonstrating multiple scales of mass transfer parameters can simultaneously be estimated. Our results show, for the first time, field-scale spatial variability of mass transfer parameters (i.e., exchange-rate coefficient, porosity) between local and upgradient effective parameters; hence our approach provides insight into spatial variability and scaling behavior. Additional synthetic modeling is used to evaluate the scope of applicability of our approach, indicating greater range than earlier work using temporal moments and a Lagrangian-based Damköhler number. The introduced Eulerian-based Damköhler is useful for estimating tracer injection duration needed to evaluate mass transfer exchange rates that range over several orders of magnitude.

  8. Simple estimation procedures for regression analysis of interval-censored failure time data under the proportional hazards model.

    PubMed

    Sun, Jianguo; Feng, Yanqin; Zhao, Hui

    2015-01-01

    Interval-censored failure time data occur in many fields including epidemiological and medical studies as well as financial and sociological studies, and many authors have investigated their analysis (Sun, The statistical analysis of interval-censored failure time data, 2006; Zhang, Stat Modeling 9:321-343, 2009). In particular, a number of procedures have been developed for regression analysis of interval-censored data arising from the proportional hazards model (Finkelstein, Biometrics 42:845-854, 1986; Huang, Ann Stat 24:540-568, 1996; Pan, Biometrics 56:199-203, 2000). For most of these procedures, however, one drawback is that they involve estimation of both regression parameters and baseline cumulative hazard function. In this paper, we propose two simple estimation approaches that do not need estimation of the baseline cumulative hazard function. The asymptotic properties of the resulting estimates are given, and an extensive simulation study is conducted and indicates that they work well for practical situations.

  9. Conditional Poisson models: a flexible alternative to conditional logistic case cross-over analysis.

    PubMed

    Armstrong, Ben G; Gasparrini, Antonio; Tobias, Aurelio

    2014-11-24

    The time stratified case cross-over approach is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes. These are almost always analyzed using conditional logistic regression on data expanded to case-control (case crossover) format, but this has some limitations. In particular adjusting for overdispersion and auto-correlation in the counts is not possible. It has been established that a Poisson model for counts with stratum indicators gives identical estimates to those from conditional logistic regression and does not have these limitations, but it is little used, probably because of the overheads in estimating many stratum parameters. The conditional Poisson model avoids estimating stratum parameters by conditioning on the total event count in each stratum, thus simplifying the computing and increasing the number of strata for which fitting is feasible compared with the standard unconditional Poisson model. Unlike the conditional logistic model, the conditional Poisson model does not require expanding the data, and can adjust for overdispersion and auto-correlation. It is available in Stata, R, and other packages. By applying to some real data and using simulations, we demonstrate that conditional Poisson models were simpler to code and shorter to run than are conditional logistic analyses and can be fitted to larger data sets than possible with standard Poisson models. Allowing for overdispersion or autocorrelation was possible with the conditional Poisson model but when not required this model gave identical estimates to those from conditional logistic regression. Conditional Poisson regression models provide an alternative to case crossover analysis of stratified time series data with some advantages. The conditional Poisson model can also be used in other contexts in which primary control for confounding is by fine stratification.

  10. TSS concentration in sewers estimated from turbidity measurements by means of linear regression accounting for uncertainties in both variables.

    PubMed

    Bertrand-Krajewski, J L

    2004-01-01

    In order to replace traditional sampling and analysis techniques, turbidimeters can be used to estimate TSS concentration in sewers, by means of sensor and site specific empirical equations established by linear regression of on-site turbidity Tvalues with TSS concentrations C measured in corresponding samples. As the ordinary least-squares method is not able to account for measurement uncertainties in both T and C variables, an appropriate regression method is used to solve this difficulty and to evaluate correctly the uncertainty in TSS concentrations estimated from measured turbidity. The regression method is described, including detailed calculations of variances and covariance in the regression parameters. An example of application is given for a calibrated turbidimeter used in a combined sewer system, with data collected during three dry weather days. In order to show how the established regression could be used, an independent 24 hours long dry weather turbidity data series recorded at 2 min time interval is used, transformed into estimated TSS concentrations, and compared to TSS concentrations measured in samples. The comparison appears as satisfactory and suggests that turbidity measurements could replace traditional samples. Further developments, including wet weather periods and other types of sensors, are suggested.

  11. Global distribution of urban parameters derived from high-resolution global datasets for weather modelling

    NASA Astrophysics Data System (ADS)

    Kawano, N.; Varquez, A. C. G.; Dong, Y.; Kanda, M.

    2016-12-01

    Numerical model such as Weather Research and Forecasting model coupled with single-layer Urban Canopy Model (WRF-UCM) is one of the powerful tools to investigate urban heat island. Urban parameters such as average building height (Have), plain area index (λp) and frontal area index (λf), are necessary inputs for the model. In general, these parameters are uniformly assumed in WRF-UCM but this leads to unrealistic urban representation. Distributed urban parameters can also be incorporated into WRF-UCM to consider a detail urban effect. The problem is that distributed building information is not readily available for most megacities especially in developing countries. Furthermore, acquiring real building parameters often require huge amount of time and money. In this study, we investigated the potential of using globally available satellite-captured datasets for the estimation of the parameters, Have, λp, and λf. Global datasets comprised of high spatial resolution population dataset (LandScan by Oak Ridge National Laboratory), nighttime lights (NOAA), and vegetation fraction (NASA). True samples of Have, λp, and λf were acquired from actual building footprints from satellite images and 3D building database of Tokyo, New York, Paris, Melbourne, Istanbul, Jakarta and so on. Regression equations were then derived from the block-averaging of spatial pairs of real parameters and global datasets. Results show that two regression curves to estimate Have and λf from the combination of population and nightlight are necessary depending on the city's level of development. An index which can be used to decide which equation to use for a city is the Gross Domestic Product (GDP). On the other hand, λphas less dependence on GDP but indicated a negative relationship to vegetation fraction. Finally, a simplified but precise approximation of urban parameters through readily-available, high-resolution global datasets and our derived regressions can be utilized to estimate a global distribution of urban parameters for later incorporation into a weather model, thus allowing us to acquire a global understanding of urban climate (Global Urban Climatology). Acknowledgment: This research was supported by the Environment Research and Technology Development Fund (S-14) of the Ministry of the Environment, Japan.

  12. A Method for Assessing the Quality of Model-Based Estimates of Ground Temperature and Atmospheric Moisture Using Satellite Data

    NASA Technical Reports Server (NTRS)

    Wu, Man Li C.; Schubert, Siegfried; Lin, Ching I.; Stajner, Ivanka; Einaudi, Franco (Technical Monitor)

    2000-01-01

    A method is developed for validating model-based estimates of atmospheric moisture and ground temperature using satellite data. The approach relates errors in estimates of clear-sky longwave fluxes at the top of the Earth-atmosphere system to errors in geophysical parameters. The fluxes include clear-sky outgoing longwave radiation (CLR) and radiative flux in the window region between 8 and 12 microns (RadWn). The approach capitalizes on the availability of satellite estimates of CLR and RadWn and other auxiliary satellite data, and multiple global four-dimensional data assimilation (4-DDA) products. The basic methodology employs off-line forward radiative transfer calculations to generate synthetic clear-sky longwave fluxes from two different 4-DDA data sets. Simple linear regression is used to relate the clear-sky longwave flux discrepancies to discrepancies in ground temperature ((delta)T(sub g)) and broad-layer integrated atmospheric precipitable water ((delta)pw). The slopes of the regression lines define sensitivity parameters which can be exploited to help interpret mismatches between satellite observations and model-based estimates of clear-sky longwave fluxes. For illustration we analyze the discrepancies in the clear-sky longwave fluxes between an early implementation of the Goddard Earth Observing System Data Assimilation System (GEOS2) and a recent operational version of the European Centre for Medium-Range Weather Forecasts data assimilation system. The analysis of the synthetic clear-sky flux data shows that simple linear regression employing (delta)T(sub g)) and broad layer (delta)pw provides a good approximation to the full radiative transfer calculations, typically explaining more thin 90% of the 6 hourly variance in the flux differences. These simple regression relations can be inverted to "retrieve" the errors in the geophysical parameters, Uncertainties (normalized by standard deviation) in the monthly mean retrieved parameters range from 7% for (delta)T(sub g) to approx. 20% for the lower tropospheric moisture between 500 hPa and surface. The regression relationships developed from the synthetic flux data, together with CLR and RadWn observed with the Clouds and Earth Radiant Energy System instrument, ire used to assess the quality of the GEOS2 T(sub g) and pw. Results showed that the GEOS2 T(sub g) is too cold over land, and pw in upper layers is too high over the tropical oceans and too low in the lower atmosphere.

  13. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.

  14. Techniques for estimating flood peak discharges for unregulated streams and streams regulated by small floodwater retarding structures in Oklahoma

    USGS Publications Warehouse

    Tortorelli, R.L.; Bergman, D.L.

    1985-01-01

    Statewide regression relations for Oklahoma were determined for estimating peak discharge of floods for selected recurrence intervals from 2 to 500 years. The independent variables required for estimating flood discharge for rural streams are contributing drainage area and mean annual precipitation. Main-channel slope, a variable used in previous reports, was found to contribute very little to the accuracy of the relations and was not used. The regression equations are applicable for watersheds with drainage areas less than 2,500 square miles that are not significantly affected by regulation from manmade works. These relations are presented in graphical form for easy application. Limitations on the use of the regression relations and the reliability of regression estimates for rural unregulated streams are discussed. Basin and climatic characteristics, log-Pearson Type III statistics and the flood-frequency relations for 226 gaging stations in Oklahoma and adjacent states are presented. Regression relations are investigated for estimating flood magnitude and frequency for watersheds affected by regulation from small FRS (floodwater retarding structures) built by the U.S. Soil Conservation Service in their watershed protection and flood prevention program. Gaging-station data from nine FRS regulated sites in Oklahoma and one FRS regulated site in Kansas are used. For sites regulated by FRS, an adjustment of the statewide rural regression relations can be used to estimate flood magnitude and frequency. The statewide regression equations are used by substituting the drainage area below the FRS, or drainage area that represents the percent of the basin unregulated, in the contributing drainage area parameter to obtain flood-frequency estimates. Flood-frequency curves and flow-duration curves are presented for five gaged sites to illustrate the effects of FRS regulation on peak discharge.

  15. Oracle estimation of parametric models under boundary constraints.

    PubMed

    Wong, Kin Yau; Goldberg, Yair; Fine, Jason P

    2016-12-01

    In many classical estimation problems, the parameter space has a boundary. In most cases, the standard asymptotic properties of the estimator do not hold when some of the underlying true parameters lie on the boundary. However, without knowledge of the true parameter values, confidence intervals constructed assuming that the parameters lie in the interior are generally over-conservative. A penalized estimation method is proposed in this article to address this issue. An adaptive lasso procedure is employed to shrink the parameters to the boundary, yielding oracle inference which adapt to whether or not the true parameters are on the boundary. When the true parameters are on the boundary, the inference is equivalent to that which would be achieved with a priori knowledge of the boundary, while if the converse is true, the inference is equivalent to that which is obtained in the interior of the parameter space. The method is demonstrated under two practical scenarios, namely the frailty survival model and linear regression with order-restricted parameters. Simulation studies and real data analyses show that the method performs well with realistic sample sizes and exhibits certain advantages over standard methods. © 2016, The International Biometric Society.

  16. Identification of dominant interactions between climatic seasonality, catchment characteristics and agricultural activities on Budyko-type equation parameter estimation

    NASA Astrophysics Data System (ADS)

    Xing, Wanqiu; Wang, Weiguang; Shao, Quanxi; Yong, Bin

    2018-01-01

    Quantifying precipitation (P) partition into evapotranspiration (E) and runoff (Q) is of great importance for global and regional water availability assessment. Budyko framework serves as a powerful tool to make simple and transparent estimation for the partition, using a single parameter, to characterize the shape of the Budyko curve for a "specific basin", where the single parameter reflects the overall effect by not only climatic seasonality, catchment characteristics (e.g., soil, topography and vegetation) but also agricultural activities (e.g., cultivation and irrigation). At the regional scale, these influencing factors are interconnected, and the interactions between them can also affect the single parameter of Budyko-type equations' estimating. Here we employ the multivariate adaptive regression splines (MARS) model to estimate the Budyko curve shape parameter (n in the Choudhury's equation, one form of the Budyko framework) of the selected 96 catchments across China using a data set of long-term averages for climatic seasonality, catchment characteristics and agricultural activities. Results show average storm depth (ASD), vegetation coverage (M), and seasonality index of precipitation (SI) are three statistically significant factors affecting the Budyko parameter. More importantly, four pairs of interactions are recognized by the MARS model as: The interaction between CA (percentage of cultivated land area to total catchment area) and ASD shows that the cultivation can weaken the reducing effect of high ASD (>46.78 mm) on the Budyko parameter estimating. Drought (represented by the value of Palmer drought severity index < -0.74) and uneven distribution of annual rainfall (represented by the value of coefficient of variation of precipitation > 0.23) tend to enhance the Budyko parameter reduction by large SI (>0.797). Low vegetation coverage (34.56%) is likely to intensify the rising effect on evapotranspiration ratio by IA (percentage of irrigation area to total catchment area). The Budyko n values estimated by the MARS model reproduce the calculated ones by the observation well for the selected 96 catchments (with R = 0.817, MAE = 4.09). Compared to the multiple stepwise regression model estimating the parameter n taken the influencing factors as independent inputs, the MARS model enhances the capability of the Budyko framework for assessing water availability at regional scale using readily available data.

  17. Multivariate generalized hidden Markov regression models with random covariates: Physical exercise in an elderly population.

    PubMed

    Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello

    2018-04-22

    A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which is the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent from the covariates distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovering of the clusters in the data with respect to a fixed covariates paradigm. The hidden Markov regression models with random covariates class is defined focusing on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data. Copyright © 2018 John Wiley & Sons, Ltd.

  18. Competing risks regression for clustered data

    PubMed Central

    Zhou, Bingqing; Fine, Jason; Latouche, Aurelien; Labopin, Myriam

    2012-01-01

    A population average regression model is proposed to assess the marginal effects of covariates on the cumulative incidence function when there is dependence across individuals within a cluster in the competing risks setting. This method extends the Fine–Gray proportional hazards model for the subdistribution to situations, where individuals within a cluster may be correlated due to unobserved shared factors. Estimators of the regression parameters in the marginal model are developed under an independence working assumption where the correlation across individuals within a cluster is completely unspecified. The estimators are consistent and asymptotically normal, and variance estimation may be achieved without specifying the form of the dependence across individuals. A simulation study evidences that the inferential procedures perform well with realistic sample sizes. The practical utility of the methods is illustrated with data from the European Bone Marrow Transplant Registry. PMID:22045910

  19. Nonlinear regression method for estimating neutral wind and temperature from Fabry-Perot interferometer data.

    PubMed

    Harding, Brian J; Gehrels, Thomas W; Makela, Jonathan J

    2014-02-01

    The Earth's thermosphere plays a critical role in driving electrodynamic processes in the ionosphere and in transferring solar energy to the atmosphere, yet measurements of thermospheric state parameters, such as wind and temperature, are sparse. One of the most popular techniques for measuring these parameters is to use a Fabry-Perot interferometer to monitor the Doppler width and breadth of naturally occurring airglow emissions in the thermosphere. In this work, we present a technique for estimating upper-atmospheric winds and temperatures from images of Fabry-Perot fringes captured by a CCD detector. We estimate instrument parameters from fringe patterns of a frequency-stabilized laser, and we use these parameters to estimate winds and temperatures from airglow fringe patterns. A unique feature of this technique is the model used for the laser and airglow fringe patterns, which fits all fringes simultaneously and attempts to model the effects of optical defects. This technique yields accurate estimates for winds, temperatures, and the associated uncertainties in these parameters, as we show with a Monte Carlo simulation.

  20. Complementary nonparametric analysis of covariance for logistic regression in a randomized clinical trial setting.

    PubMed

    Tangen, C M; Koch, G G

    1999-03-01

    In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is a (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.

  1. Regularized quantile regression for SNP marker estimation of pig growth curves.

    PubMed

    Barroso, L M A; Nascimento, M; Nascimento, A C C; Silva, F F; Serão, N V L; Cruz, C D; Resende, M D V; Silva, F L; Azevedo, C F; Lopes, P S; Guimarães, S E F

    2017-01-01

    Genomic growth curves are generally defined only in terms of population mean; an alternative approach that has not yet been exploited in genomic analyses of growth curves is the Quantile Regression (QR). This methodology allows for the estimation of marker effects at different levels of the variable of interest. We aimed to propose and evaluate a regularized quantile regression for SNP marker effect estimation of pig growth curves, as well as to identify the chromosome regions of the most relevant markers and to estimate the genetic individual weight trajectory over time (genomic growth curve) under different quantiles (levels). The regularized quantile regression (RQR) enabled the discovery, at different levels of interest (quantiles), of the most relevant markers allowing for the identification of QTL regions. We found the same relevant markers simultaneously affecting different growth curve parameters (mature weight and maturity rate): two (ALGA0096701 and ALGA0029483) for RQR(0.2), one (ALGA0096701) for RQR(0.5), and one (ALGA0003761) for RQR(0.8). Three average genomic growth curves were obtained and the behavior was explained by the curve in quantile 0.2, which differed from the others. RQR allowed for the construction of genomic growth curves, which is the key to identifying and selecting the most desirable animals for breeding purposes. Furthermore, the proposed model enabled us to find, at different levels of interest (quantiles), the most relevant markers for each trait (growth curve parameter estimates) and their respective chromosomal positions (identification of new QTL regions for growth curves in pigs). These markers can be exploited under the context of marker assisted selection while aiming to change the shape of pig growth curves.

  2. Estimating the Expected Value of Sample Information Using the Probabilistic Sensitivity Analysis Sample

    PubMed Central

    Oakley, Jeremy E.; Brennan, Alan; Breeze, Penny

    2015-01-01

    Health economic decision-analytic models are used to estimate the expected net benefits of competing decision options. The true values of the input parameters of such models are rarely known with certainty, and it is often useful to quantify the value to the decision maker of reducing uncertainty through collecting new data. In the context of a particular decision problem, the value of a proposed research design can be quantified by its expected value of sample information (EVSI). EVSI is commonly estimated via a 2-level Monte Carlo procedure in which plausible data sets are generated in an outer loop, and then, conditional on these, the parameters of the decision model are updated via Bayes rule and sampled in an inner loop. At each iteration of the inner loop, the decision model is evaluated. This is computationally demanding and may be difficult if the posterior distribution of the model parameters conditional on sampled data is hard to sample from. We describe a fast nonparametric regression-based method for estimating per-patient EVSI that requires only the probabilistic sensitivity analysis sample (i.e., the set of samples drawn from the joint distribution of the parameters and the corresponding net benefits). The method avoids the need to sample from the posterior distributions of the parameters and avoids the need to rerun the model. The only requirement is that sample data sets can be generated. The method is applicable with a model of any complexity and with any specification of model parameter distribution. We demonstrate in a case study the superior efficiency of the regression method over the 2-level Monte Carlo method. PMID:25810269

  3. Establishing endangered species recovery criteria using predictive simulation modeling

    USGS Publications Warehouse

    McGowan, Conor P.; Catlin, Daniel H.; Shaffer, Terry L.; Gratto-Trevor, Cheri L.; Aron, Carol

    2014-01-01

    Listing a species under the Endangered Species Act (ESA) and developing a recovery plan requires U.S. Fish and Wildlife Service to establish specific and measurable criteria for delisting. Generally, species are listed because they face (or are perceived to face) elevated risk of extinction due to issues such as habitat loss, invasive species, or other factors. Recovery plans identify recovery criteria that reduce extinction risk to an acceptable level. It logically follows that the recovery criteria, the defined conditions for removing a species from ESA protections, need to be closely related to extinction risk. Extinction probability is a population parameter estimated with a model that uses current demographic information to project the population into the future over a number of replicates, calculating the proportion of replicated populations that go extinct. We simulated extinction probabilities of piping plovers in the Great Plains and estimated the relationship between extinction probability and various demographic parameters. We tested the fit of regression models linking initial abundance, productivity, or population growth rate to extinction risk, and then, using the regression parameter estimates, determined the conditions required to reduce extinction probability to some pre-defined acceptable threshold. Binomial regression models with mean population growth rate and the natural log of initial abundance were the best predictors of extinction probability 50 years into the future. For example, based on our regression models, an initial abundance of approximately 2400 females with an expected mean population growth rate of 1.0 will limit extinction risk for piping plovers in the Great Plains to less than 0.048. Our method provides a straightforward way of developing specific and measurable recovery criteria linked directly to the core issue of extinction risk. Published by Elsevier Ltd.

  4. A Generalized Least Squares Regression Approach for Computing Effect Sizes in Single-Case Research: Application Examples

    ERIC Educational Resources Information Center

    Maggin, Daniel M.; Swaminathan, Hariharan; Rogers, Helen J.; O'Keeffe, Breda V.; Sugai, George; Horner, Robert H.

    2011-01-01

    A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of…

  5. Multiple-trait random regression models for the estimation of genetic parameters for milk, fat, and protein yield in buffaloes.

    PubMed

    Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto

    2013-09-01

    In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten-month classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated using the regression analyses by Bayesian inference applying an animal model by Gibbs sampling. The contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, as well as the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment, respectively, the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (from 0.21-0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that the selection for any trait test day will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) compared with single-trait random regression. This difference may be because to the greater amount of information available per animal. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  6. Genetic analysis of milk production traits of Tunisian Holsteins using random regression test-day model with Legendre polynomials

    PubMed Central

    2018-01-01

    Objective The objective of this study was to estimate genetic parameters of milk, fat, and protein yields within and across lactations in Tunisian Holsteins using a random regression test-day (TD) model. Methods A random regression multiple trait multiple lactation TD model was used to estimate genetic parameters in the Tunisian dairy cattle population. Data were TD yields of milk, fat, and protein from the first three lactations. Random regressions were modeled with third-order Legendre polynomials for the additive genetic, and permanent environment effects. Heritabilities, and genetic correlations were estimated by Bayesian techniques using the Gibbs sampler. Results All variance components tended to be high in the beginning and the end of lactations. Additive genetic variances for milk, fat, and protein yields were the lowest and were the least variable compared to permanent variances. Heritability values tended to increase with parity. Estimates of heritabilities for 305-d yield-traits were low to moderate, 0.14 to 0.2, 0.12 to 0.17, and 0.13 to 0.18 for milk, fat, and protein yields, respectively. Within-parity, genetic correlations among traits were up to 0.74. Genetic correlations among lactations for the yield traits were relatively high and ranged from 0.78±0.01 to 0.82±0.03, between the first and second parities, from 0.73±0.03 to 0.8±0.04 between the first and third parities, and from 0.82±0.02 to 0.84±0.04 between the second and third parities. Conclusion These results are comparable to previously reported estimates on the same population, indicating that the adoption of a random regression TD model as the official genetic evaluation for production traits in Tunisia, as developed by most Interbull countries, is possible in the Tunisian Holsteins. PMID:28823122

  7. Genetic analysis of milk production traits of Tunisian Holsteins using random regression test-day model with Legendre polynomials.

    PubMed

    Ben Zaabza, Hafedh; Ben Gara, Abderrahmen; Rekik, Boulbaba

    2018-05-01

    The objective of this study was to estimate genetic parameters of milk, fat, and protein yields within and across lactations in Tunisian Holsteins using a random regression test-day (TD) model. A random regression multiple trait multiple lactation TD model was used to estimate genetic parameters in the Tunisian dairy cattle population. Data were TD yields of milk, fat, and protein from the first three lactations. Random regressions were modeled with third-order Legendre polynomials for the additive genetic, and permanent environment effects. Heritabilities, and genetic correlations were estimated by Bayesian techniques using the Gibbs sampler. All variance components tended to be high in the beginning and the end of lactations. Additive genetic variances for milk, fat, and protein yields were the lowest and were the least variable compared to permanent variances. Heritability values tended to increase with parity. Estimates of heritabilities for 305-d yield-traits were low to moderate, 0.14 to 0.2, 0.12 to 0.17, and 0.13 to 0.18 for milk, fat, and protein yields, respectively. Within-parity, genetic correlations among traits were up to 0.74. Genetic correlations among lactations for the yield traits were relatively high and ranged from 0.78±0.01 to 0.82±0.03, between the first and second parities, from 0.73±0.03 to 0.8±0.04 between the first and third parities, and from 0.82±0.02 to 0.84±0.04 between the second and third parities. These results are comparable to previously reported estimates on the same population, indicating that the adoption of a random regression TD model as the official genetic evaluation for production traits in Tunisia, as developed by most Interbull countries, is possible in the Tunisian Holsteins.

  8. Regression without truth with Markov chain Monte-Carlo

    NASA Astrophysics Data System (ADS)

    Madan, Hennadii; Pernuš, Franjo; Likar, Boštjan; Å piclin, Žiga

    2017-03-01

    Regression without truth (RWT) is a statistical technique for estimating error model parameters of each method in a group of methods used for measurement of a certain quantity. A very attractive aspect of RWT is that it does not rely on a reference method or "gold standard" data, which is otherwise difficult RWT was used for a reference-free performance comparison of several methods for measuring left ventricular ejection fraction (EF), i.e. a percentage of blood leaving the ventricle each time the heart contracts, and has since been applied for various other quantitative imaging biomarkerss (QIBs). Herein, we show how Markov chain Monte-Carlo (MCMC), a computational technique for drawing samples from a statistical distribution with probability density function known only up to a normalizing coefficient, can be used to augment RWT to gain a number of important benefits compared to the original approach based on iterative optimization. For instance, the proposed MCMC-based RWT enables the estimation of joint posterior distribution of the parameters of the error model, straightforward quantification of uncertainty of the estimates, estimation of true value of the measurand and corresponding credible intervals (CIs), does not require a finite support for prior distribution of the measureand generally has a much improved robustness against convergence to non-global maxima. The proposed approach is validated using synthetic data that emulate the EF data for 45 patients measured with 8 different methods. The obtained results show that 90% CI of the corresponding parameter estimates contain the true values of all error model parameters and the measurand. A potential real-world application is to take measurements of a certain QIB several different methods and then use the proposed framework to compute the estimates of the true values and their uncertainty, a vital information for diagnosis based on QIB.

  9. Impact of an equality constraint on the class-specific residual variances in regression mixtures: A Monte Carlo simulation study

    PubMed Central

    Kim, Minjung; Lamont, Andrea E.; Jaki, Thomas; Feaster, Daniel; Howe, George; Van Horn, M. Lee

    2015-01-01

    Regression mixture models are a novel approach for modeling heterogeneous effects of predictors on an outcome. In the model building process residual variances are often disregarded and simplifying assumptions made without thorough examination of the consequences. This simulation study investigated the impact of an equality constraint on the residual variances across latent classes. We examine the consequence of constraining the residual variances on class enumeration (finding the true number of latent classes) and parameter estimates under a number of different simulation conditions meant to reflect the type of heterogeneity likely to exist in applied analyses. Results showed that bias in class enumeration increased as the difference in residual variances between the classes increased. Also, an inappropriate equality constraint on the residual variances greatly impacted estimated class sizes and showed the potential to greatly impact parameter estimates in each class. Results suggest that it is important to make assumptions about residual variances with care and to carefully report what assumptions were made. PMID:26139512

  10. Bayesian Probability Theory

    NASA Astrophysics Data System (ADS)

    von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo

    2014-06-01

    Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatrics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.

  11. R programming for parameters estimation of geographically weighted ordinal logistic regression (GWOLR) model based on Newton Raphson

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Saputro, Dewi Retno Sari

    2017-03-01

    GWOLR model used for represent relationship between dependent variable has categories and scale of category is ordinal with independent variable influenced the geographical location of the observation site. Parameters estimation of GWOLR model use maximum likelihood provide system of nonlinear equations and hard to be found the result in analytic resolution. By finishing it, it means determine the maximum completion, this thing associated with optimizing problem. The completion nonlinear system of equations optimize use numerical approximation, which one is Newton Raphson method. The purpose of this research is to make iteration algorithm Newton Raphson and program using R software to estimate GWOLR model. Based on the research obtained that program in R can be used to estimate the parameters of GWOLR model by forming a syntax program with command "while".

  12. Spatial design and strength of spatial signal: Effects on covariance estimation

    USGS Publications Warehouse

    Irvine, Kathryn M.; Gitelman, Alix I.; Hoeting, Jennifer A.

    2007-01-01

    In a spatial regression context, scientists are often interested in a physical interpretation of components of the parametric covariance function. For example, spatial covariance parameter estimates in ecological settings have been interpreted to describe spatial heterogeneity or “patchiness” in a landscape that cannot be explained by measured covariates. In this article, we investigate the influence of the strength of spatial dependence on maximum likelihood (ML) and restricted maximum likelihood (REML) estimates of covariance parameters in an exponential-with-nugget model, and we also examine these influences under different sampling designs—specifically, lattice designs and more realistic random and cluster designs—at differing intensities of sampling (n=144 and 361). We find that neither ML nor REML estimates perform well when the range parameter and/or the nugget-to-sill ratio is large—ML tends to underestimate the autocorrelation function and REML produces highly variable estimates of the autocorrelation function. The best estimates of both the covariance parameters and the autocorrelation function come under the cluster sampling design and large sample sizes. As a motivating example, we consider a spatial model for stream sulfate concentration.

  13. Large biases in regression-based constituent flux estimates: causes and diagnostic tools

    USGS Publications Warehouse

    Hirsch, Robert M.

    2014-01-01

    It has been documented in the literature that, in some cases, widely used regression-based models can produce severely biased estimates of long-term mean river fluxes of various constituents. These models, estimated using sample values of concentration, discharge, and date, are used to compute estimated fluxes for a multiyear period at a daily time step. This study compares results of the LOADEST seven-parameter model, LOADEST five-parameter model, and the Weighted Regressions on Time, Discharge, and Season (WRTDS) model using subsampling of six very large datasets to better understand this bias problem. This analysis considers sample datasets for dissolved nitrate and total phosphorus. The results show that LOADEST-7 and LOADEST-5, although they often produce very nearly unbiased results, can produce highly biased results. This study identifies three conditions that can give rise to these severe biases: (1) lack of fit of the log of concentration vs. log discharge relationship, (2) substantial differences in the shape of this relationship across seasons, and (3) severely heteroscedastic residuals. The WRTDS model is more resistant to the bias problem than the LOADEST models but is not immune to them. Understanding the causes of the bias problem is crucial to selecting an appropriate method for flux computations. Diagnostic tools for identifying the potential for bias problems are introduced, and strategies for resolving bias problems are described.

  14. Influence plots for LASSO

    DOE PAGES

    Jang, Dae -Heung; Anderson-Cook, Christine Michaela

    2016-11-22

    With many predictors in regression, fitting the full model can induce multicollinearity problems. Least Absolute Shrinkage and Selection Operation (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. Here, this paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential pointsmore » and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.« less

  15. Influence plots for LASSO

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jang, Dae -Heung; Anderson-Cook, Christine Michaela

    With many predictors in regression, fitting the full model can induce multicollinearity problems. Least Absolute Shrinkage and Selection Operation (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. Here, this paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential pointsmore » and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.« less

  16. Robust inference in the negative binomial regression model with an application to falls data.

    PubMed

    Aeberhard, William H; Cantoni, Eva; Heritier, Stephane

    2014-12-01

    A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimating methods are well-known to be sensitive to model misspecifications, taking the form of patients falling much more than expected in such intervention studies where the NB regression model is used. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, and this for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinsons disease to illustrate the diagnostic use of such robust procedures and their need for reliable inference. © 2014, The International Biometric Society.

  17. The use of copulas to practical estimation of multivariate stochastic differential equation mixed effects models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rupšys, P.

    A system of stochastic differential equations (SDE) with mixed-effects parameters and multivariate normal copula density function were used to develop tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used and computational guidelines are given. After fitting the conditional probability density functions to outside bark diameter at breast height, and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared to the regression tree height equations. The results are implemented in the symbolic computational language MAPLE.

  18. [Aboveground biomass of three conifers in Qianyanzhou plantation].

    PubMed

    Li, Xuanran; Liu, Qijing; Chen, Yongrui; Hu, Lile; Yang, Fengting

    2006-08-01

    In this paper, the regressive models of the aboveground biomass of Pinus elliottii, P. massoniana and Cunninghamia lanceolata in Qianyanzhou of subtropical China were established, and the regression analysis on the dry weight of leaf biomass and total biomass against branch diameter (d), branch length (L), d3 and d2L was conducted with linear, power and exponent functions. Power equation with single parameter (d) was proved to be better than the rests for P. massoniana and C. lanceolata, and linear equation with parameter (d3) was better for P. elliottii. The canopy biomass was derived by the regression equations for all branches. These equations were also used to fit the relationships of total tree biomass, branch biomass and foliage biomass with tree diameter at breast height (D), tree height (H), D3 and D2H, respectively. D2H was found to be the best parameter for estimating total biomass. For foliage-and branch biomass, both parameters and equation forms showed some differences among species. Correlations were highly significant (P <0.001) for foliage-, branch-and total biomass, with the highest for total biomass. By these equations, the aboveground biomass and its allocation were estimated, with the aboveground biomass of P. massoniana, P. elliottii, and C. lanceolata forests being 83.6, 72. 1 and 59 t x hm(-2), respectively, and more stem biomass than foliage-and branch biomass. According to the previous studies, the underground biomass of these three forests was estimated to be 10.44, 9.42 and 11.48 t x hm(-2), and the amount of fixed carbon was 47.94, 45.14 and 37.52 t x hm(-2), respectively.

  19. A bias correction for covariance estimators to improve inference with generalized estimating equations that use an unstructured correlation matrix.

    PubMed

    Westgate, Philip M

    2013-07-20

    Generalized estimating equations (GEEs) are routinely used for the marginal analysis of correlated data. The efficiency of GEE depends on how closely the working covariance structure resembles the true structure, and therefore accurate modeling of the working correlation of the data is important. A popular approach is the use of an unstructured working correlation matrix, as it is not as restrictive as simpler structures such as exchangeable and AR-1 and thus can theoretically improve efficiency. However, because of the potential for having to estimate a large number of correlation parameters, variances of regression parameter estimates can be larger than theoretically expected when utilizing the unstructured working correlation matrix. Therefore, standard error estimates can be negatively biased. To account for this additional finite-sample variability, we derive a bias correction that can be applied to typical estimators of the covariance matrix of parameter estimates. Via simulation and in application to a longitudinal study, we show that our proposed correction improves standard error estimation and statistical inference. Copyright © 2012 John Wiley & Sons, Ltd.

  20. Spatial interpolation schemes of daily precipitation for hydrologic modeling

    USGS Publications Warehouse

    Hwang, Y.; Clark, M.R.; Rajagopalan, B.; Leavesley, G.

    2012-01-01

    Distributed hydrologic models typically require spatial estimates of precipitation interpolated from sparsely located observational points to the specific grid points. We compare and contrast the performance of regression-based statistical methods for the spatial estimation of precipitation in two hydrologically different basins and confirmed that widely used regression-based estimation schemes fail to describe the realistic spatial variability of daily precipitation field. The methods assessed are: (1) inverse distance weighted average; (2) multiple linear regression (MLR); (3) climatological MLR; and (4) locally weighted polynomial regression (LWP). In order to improve the performance of the interpolations, the authors propose a two-step regression technique for effective daily precipitation estimation. In this simple two-step estimation process, precipitation occurrence is first generated via a logistic regression model before estimate the amount of precipitation separately on wet days. This process generated the precipitation occurrence, amount, and spatial correlation effectively. A distributed hydrologic model (PRMS) was used for the impact analysis in daily time step simulation. Multiple simulations suggested noticeable differences between the input alternatives generated by three different interpolation schemes. Differences are shown in overall simulation error against the observations, degree of explained variability, and seasonal volumes. Simulated streamflows also showed different characteristics in mean, maximum, minimum, and peak flows. Given the same parameter optimization technique, LWP input showed least streamflow error in Alapaha basin and CMLR input showed least error (still very close to LWP) in Animas basin. All of the two-step interpolation inputs resulted in lower streamflow error compared to the directly interpolated inputs. ?? 2011 Springer-Verlag.

  1. Fast function-on-scalar regression with penalized basis expansions.

    PubMed

    Reiss, Philip T; Huang, Lei; Mennes, Maarten

    2010-01-01

    Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990 s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.

  2. Aircraft Engine Thrust Estimator Design Based on GSA-LSSVM

    NASA Astrophysics Data System (ADS)

    Sheng, Hanlin; Zhang, Tianhong

    2017-08-01

    In view of the necessity of highly precise and reliable thrust estimator to achieve direct thrust control of aircraft engine, based on support vector regression (SVR), as well as least square support vector machine (LSSVM) and a new optimization algorithm - gravitational search algorithm (GSA), by performing integrated modelling and parameter optimization, a GSA-LSSVM-based thrust estimator design solution is proposed. The results show that compared to particle swarm optimization (PSO) algorithm, GSA can find unknown optimization parameter better and enables the model developed with better prediction and generalization ability. The model can better predict aircraft engine thrust and thus fulfills the need of direct thrust control of aircraft engine.

  3. Decoding and modelling of time series count data using Poisson hidden Markov model and Markov ordinal logistic regression models.

    PubMed

    Sebastian, Tunny; Jeyaseelan, Visalakshi; Jeyaseelan, Lakshmanan; Anandan, Shalini; George, Sebastian; Bangdiwala, Shrikant I

    2018-01-01

    Hidden Markov models are stochastic models in which the observations are assumed to follow a mixture distribution, but the parameters of the components are governed by a Markov chain which is unobservable. The issues related to the estimation of Poisson-hidden Markov models in which the observations are coming from mixture of Poisson distributions and the parameters of the component Poisson distributions are governed by an m-state Markov chain with an unknown transition probability matrix are explained here. These methods were applied to the data on Vibrio cholerae counts reported every month for 11-year span at Christian Medical College, Vellore, India. Using Viterbi algorithm, the best estimate of the state sequence was obtained and hence the transition probability matrix. The mean passage time between the states were estimated. The 95% confidence interval for the mean passage time was estimated via Monte Carlo simulation. The three hidden states of the estimated Markov chain are labelled as 'Low', 'Moderate' and 'High' with the mean counts of 1.4, 6.6 and 20.2 and the estimated average duration of stay of 3, 3 and 4 months, respectively. Environmental risk factors were studied using Markov ordinal logistic regression analysis. No significant association was found between disease severity levels and climate components.

  4. Parameter Heterogeneity In Breast Cancer Cost Regressions – Evidence From Five European Countries

    PubMed Central

    Banks, Helen; Campbell, Harry; Douglas, Anne; Fletcher, Eilidh; McCallum, Alison; Moger, Tron Anders; Peltola, Mikko; Sveréus, Sofia; Wild, Sarah; Williams, Linda J.; Forbes, John

    2015-01-01

    Abstract We investigate parameter heterogeneity in breast cancer 1‐year cumulative hospital costs across five European countries as part of the EuroHOPE project. The paper aims to explore whether conditional mean effects provide a suitable representation of the national variation in hospital costs. A cohort of patients with a primary diagnosis of invasive breast cancer (ICD‐9 codes 174 and ICD‐10 C50 codes) is derived using routinely collected individual breast cancer data from Finland, the metropolitan area of Turin (Italy), Norway, Scotland and Sweden. Conditional mean effects are estimated by ordinary least squares for each country, and quantile regressions are used to explore heterogeneity across the conditional quantile distribution. Point estimates based on conditional mean effects provide a good approximation of treatment response for some key demographic and diagnostic specific variables (e.g. age and ICD‐10 diagnosis) across the conditional quantile distribution. For many policy variables of interest, however, there is considerable evidence of parameter heterogeneity that is concealed if decisions are based solely on conditional mean results. The use of quantile regression methods reinforce the need to consider beyond an average effect given the greater recognition that breast cancer is a complex disease reflecting patient heterogeneity. © 2015 The Authors. Health Economics Published by John Wiley & Sons Ltd. PMID:26633866

  5. A Fast Solution of the Lindley Equations for the M-Group Regression Problem. Technical Report 78-3, October 1977 through May 1978.

    ERIC Educational Resources Information Center

    Molenaar, Ivo W.

    The technical problems involved in obtaining Bayesian model estimates for the regression parameters in m similar groups are studied. The available computer programs, BPREP (BASIC), and BAYREG, both written in FORTRAN, require an amount of computer processing that does not encourage regular use. These programs are analyzed so that the performance…

  6. Parameter estimation method and updating of regional prediction equations for ungaged sites in the desert region of California

    USGS Publications Warehouse

    Barth, Nancy A.; Veilleux, Andrea G.

    2012-01-01

    The U.S. Geological Survey (USGS) is currently updating at-site flood frequency estimates for USGS streamflow-gaging stations in the desert region of California. The at-site flood-frequency analysis is complicated by short record lengths (less than 20 years is common) and numerous zero flows/low outliers at many sites. Estimates of the three parameters (mean, standard deviation, and skew) required for fitting the log Pearson Type 3 (LP3) distribution are likely to be highly unreliable based on the limited and heavily censored at-site data. In a generalization of the recommendations in Bulletin 17B, a regional analysis was used to develop regional estimates of all three parameters (mean, standard deviation, and skew) of the LP3 distribution. A regional skew value of zero from a previously published report was used with a new estimated mean squared error (MSE) of 0.20. A weighted least squares (WLS) regression method was used to develop both a regional standard deviation and a mean model based on annual peak-discharge data for 33 USGS stations throughout California’s desert region. At-site standard deviation and mean values were determined by using an expected moments algorithm (EMA) method for fitting the LP3 distribution to the logarithms of annual peak-discharge data. Additionally, a multiple Grubbs-Beck (MGB) test, a generalization of the test recommended in Bulletin 17B, was used for detecting multiple potentially influential low outliers in a flood series. The WLS regression found that no basin characteristics could explain the variability of standard deviation. Consequently, a constant regional standard deviation model was selected, resulting in a log-space value of 0.91 with a MSE of 0.03 log units. Yet drainage area was found to be statistically significant at explaining the site-to-site variability in mean. The linear WLS regional mean model based on drainage area had a Pseudo- 2 R of 51 percent and a MSE of 0.32 log units. The regional parameter estimates were then used to develop a set of equations for estimating flows with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities for ungaged basins. The final equations are functions of drainage area.Average standard errors of prediction for these regression equations range from 214.2 to 856.2 percent.

  7. Multiple concurrent recursive least squares identification with application to on-line spacecraft mass-property identification

    NASA Technical Reports Server (NTRS)

    Wilson, Edward (Inventor)

    2006-01-01

    The present invention is a method for identifying unknown parameters in a system having a set of governing equations describing its behavior that cannot be put into regression form with the unknown parameters linearly represented. In this method, the vector of unknown parameters is segmented into a plurality of groups where each individual group of unknown parameters may be isolated linearly by manipulation of said equations. Multiple concurrent and independent recursive least squares identification of each said group run, treating other unknown parameters appearing in their regression equation as if they were known perfectly, with said values provided by recursive least squares estimation from the other groups, thereby enabling the use of fast, compact, efficient linear algorithms to solve problems that would otherwise require nonlinear solution approaches. This invention is presented with application to identification of mass and thruster properties for a thruster-controlled spacecraft.

  8. Application of mathematical model methods for optimization tasks in construction materials technology

    NASA Astrophysics Data System (ADS)

    Fomina, E. V.; Kozhukhova, N. I.; Sverguzova, S. V.; Fomin, A. E.

    2018-05-01

    In this paper, the regression equations method for design of construction material was studied. Regression and polynomial equations representing the correlation between the studied parameters were proposed. The logic design and software interface of the regression equations method focused on parameter optimization to provide the energy saving effect at the stage of autoclave aerated concrete design considering the replacement of traditionally used quartz sand by coal mining by-product such as argillite. The mathematical model represented by a quadric polynomial for the design of experiment was obtained using calculated and experimental data. This allowed the estimation of relationship between the composition and final properties of the aerated concrete. The surface response graphically presented in a nomogram allowed the estimation of concrete properties in response to variation of composition within the x-space. The optimal range of argillite content was obtained leading to a reduction of raw materials demand, development of target plastic strength of aerated concrete as well as a reduction of curing time before autoclave treatment. Generally, this method allows the design of autoclave aerated concrete with required performance without additional resource and time costs.

  9. Comparison of machine-learning methods for above-ground biomass estimation based on Landsat imagery

    NASA Astrophysics Data System (ADS)

    Wu, Chaofan; Shen, Huanhuan; Shen, Aihua; Deng, Jinsong; Gan, Muye; Zhu, Jinxia; Xu, Hongwei; Wang, Ke

    2016-07-01

    Biomass is one significant biophysical parameter of a forest ecosystem, and accurate biomass estimation on the regional scale provides important information for carbon-cycle investigation and sustainable forest management. In this study, Landsat satellite imagery data combined with field-based measurements were integrated through comparisons of five regression approaches [stepwise linear regression, K-nearest neighbor, support vector regression, random forest (RF), and stochastic gradient boosting] with two different candidate variable strategies to implement the optimal spatial above-ground biomass (AGB) estimation. The results suggested that RF algorithm exhibited the best performance by 10-fold cross-validation with respect to R2 (0.63) and root-mean-square error (26.44 ton/ha). Consequently, the map of estimated AGB was generated with a mean value of 89.34 ton/ha in northwestern Zhejiang Province, China, with a similar pattern to the distribution mode of local forest species. This research indicates that machine-learning approaches associated with Landsat imagery provide an economical way for biomass estimation. Moreover, ensemble methods using all candidate variables, especially for Landsat images, provide an alternative for regional biomass simulation.

  10. Techniques for estimating flood-peak discharges of rural, unregulated streams in Ohio

    USGS Publications Warehouse

    Koltun, G.F.; Roberts, J.W.

    1990-01-01

    Multiple-regression equations are presented for estimating flood-peak discharges having recurrence intervals of 2, 5, 10, 25, 50, and 100 years at ungaged sites on rural, unregulated streams in Ohio. The average standard errors of prediction for the equations range from 33.4% to 41.4%. Peak discharge estimates determined by log-Pearson Type III analysis using data collected through the 1987 water year are reported for 275 streamflow-gaging stations. Ordinary least-squares multiple-regression techniques were used to divide the State into three regions and to identify a set of basin characteristics that help explain station-to- station variation in the log-Pearson estimates. Contributing drainage area, main-channel slope, and storage area were identified as suitable explanatory variables. Generalized least-square procedures, which include historical flow data and account for differences in the variance of flows at different gaging stations, spatial correlation among gaging station records, and variable lengths of station record were used to estimate the regression parameters. Weighted peak-discharge estimates computed as a function of the log-Pearson Type III and regression estimates are reported for each station. A method is provided to adjust regression estimates for ungaged sites by use of weighted and regression estimates for a gaged site located on the same stream. Limitations and shortcomings cited in an earlier report on the magnitude and frequency of floods in Ohio are addressed in this study. Geographic bias is no longer evident for the Maumee River basin of northwestern Ohio. No bias is found to be associated with the forested-area characteristic for the range used in the regression analysis (0.0 to 99.0%), nor is this characteristic significant in explaining peak discharges. Surface-mined area likewise is not significant in explaining peak discharges, and the regression equations are not biased when applied to basins having approximately 30% or less surface-mined area. Analyses of residuals indicate that the equations tend to overestimate flood-peak discharges for basins having approximately 30% or more surface-mined area. (USGS)

  11. Using ridge regression in systematic pointing error corrections

    NASA Technical Reports Server (NTRS)

    Guiar, C. N.

    1988-01-01

    A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.

  12. Estimating physiological skin parameters from hyperspectral signatures

    NASA Astrophysics Data System (ADS)

    Vyas, Saurabh; Banerjee, Amit; Burlina, Philippe

    2013-05-01

    We describe an approach for estimating human skin parameters, such as melanosome concentration, collagen concentration, oxygen saturation, and blood volume, using hyperspectral radiometric measurements (signatures) obtained from in vivo skin. We use a computational model based on Kubelka-Munk theory and the Fresnel equations. This model forward maps the skin parameters to a corresponding multiband reflectance spectra. Machine-learning-based regression is used to generate the inverse map, and hence estimate skin parameters from hyperspectral signatures. We test our methods using synthetic and in vivo skin signatures obtained in the visible through the short wave infrared domains from 24 patients of both genders and Caucasian, Asian, and African American ethnicities. Performance validation shows promising results: good agreement with the ground truth and well-established physiological precepts. These methods have potential use in the characterization of skin abnormalities and in minimally-invasive prescreening of malignant skin cancers.

  13. System Identification Applied to Dynamic CFD Simulation and Wind Tunnel Data

    NASA Technical Reports Server (NTRS)

    Murphy, Patrick C.; Klein, Vladislav; Frink, Neal T.; Vicroy, Dan D.

    2011-01-01

    Demanding aerodynamic modeling requirements for military and civilian aircraft have provided impetus for researchers to improve computational and experimental techniques. Model validation is a key component for these research endeavors so this study is an initial effort to extend conventional time history comparisons by comparing model parameter estimates and their standard errors using system identification methods. An aerodynamic model of an aircraft performing one-degree-of-freedom roll oscillatory motion about its body axes is developed. The model includes linear aerodynamics and deficiency function parameters characterizing an unsteady effect. For estimation of unknown parameters two techniques, harmonic analysis and two-step linear regression, were applied to roll-oscillatory wind tunnel data and to computational fluid dynamics (CFD) simulated data. The model used for this study is a highly swept wing unmanned aerial combat vehicle. Differences in response prediction, parameters estimates, and standard errors are compared and discussed

  14. Multiple regression for physiological data analysis: the problem of multicollinearity.

    PubMed

    Slinker, B K; Glantz, S A

    1985-07-01

    Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.

  15. Effect of correlated observation error on parameters, predictions, and uncertainty

    USGS Publications Warehouse

    Tiedeman, Claire; Green, Christopher T.

    2013-01-01

    Correlations among observation errors are typically omitted when calculating observation weights for model calibration by inverse methods. We explore the effects of omitting these correlations on estimates of parameters, predictions, and uncertainties. First, we develop a new analytical expression for the difference in parameter variance estimated with and without error correlations for a simple one-parameter two-observation inverse model. Results indicate that omitting error correlations from both the weight matrix and the variance calculation can either increase or decrease the parameter variance, depending on the values of error correlation (ρ) and the ratio of dimensionless scaled sensitivities (rdss). For small ρ, the difference in variance is always small, but for large ρ, the difference varies widely depending on the sign and magnitude of rdss. Next, we consider a groundwater reactive transport model of denitrification with four parameters and correlated geochemical observation errors that are computed by an error-propagation approach that is new for hydrogeologic studies. We compare parameter estimates, predictions, and uncertainties obtained with and without the error correlations. Omitting the correlations modestly to substantially changes parameter estimates, and causes both increases and decreases of parameter variances, consistent with the analytical expression. Differences in predictions for the models calibrated with and without error correlations can be greater than parameter differences when both are considered relative to their respective confidence intervals. These results indicate that including observation error correlations in weighting for nonlinear regression can have important effects on parameter estimates, predictions, and their respective uncertainties.

  16. Measurement error and outcome distributions: Methodological issues in regression analyses of behavioral coding data.

    PubMed

    Holsclaw, Tracy; Hallgren, Kevin A; Steyvers, Mark; Smyth, Padhraic; Atkins, David C

    2015-12-01

    Behavioral coding is increasingly used for studying mechanisms of change in psychosocial treatments for substance use disorders (SUDs). However, behavioral coding data typically include features that can be problematic in regression analyses, including measurement error in independent variables, non normal distributions of count outcome variables, and conflation of predictor and outcome variables with third variables, such as session length. Methodological research in econometrics has shown that these issues can lead to biased parameter estimates, inaccurate standard errors, and increased Type I and Type II error rates, yet these statistical issues are not widely known within SUD treatment research, or more generally, within psychotherapy coding research. Using minimally technical language intended for a broad audience of SUD treatment researchers, the present paper illustrates the nature in which these data issues are problematic. We draw on real-world data and simulation-based examples to illustrate how these data features can bias estimation of parameters and interpretation of models. A weighted negative binomial regression is introduced as an alternative to ordinary linear regression that appropriately addresses the data characteristics common to SUD treatment behavioral coding data. We conclude by demonstrating how to use and interpret these models with data from a study of motivational interviewing. SPSS and R syntax for weighted negative binomial regression models is included in online supplemental materials. (c) 2016 APA, all rights reserved).

  17. Measurement error and outcome distributions: Methodological issues in regression analyses of behavioral coding data

    PubMed Central

    Holsclaw, Tracy; Hallgren, Kevin A.; Steyvers, Mark; Smyth, Padhraic; Atkins, David C.

    2015-01-01

    Behavioral coding is increasingly used for studying mechanisms of change in psychosocial treatments for substance use disorders (SUDs). However, behavioral coding data typically include features that can be problematic in regression analyses, including measurement error in independent variables, non-normal distributions of count outcome variables, and conflation of predictor and outcome variables with third variables, such as session length. Methodological research in econometrics has shown that these issues can lead to biased parameter estimates, inaccurate standard errors, and increased type-I and type-II error rates, yet these statistical issues are not widely known within SUD treatment research, or more generally, within psychotherapy coding research. Using minimally-technical language intended for a broad audience of SUD treatment researchers, the present paper illustrates the nature in which these data issues are problematic. We draw on real-world data and simulation-based examples to illustrate how these data features can bias estimation of parameters and interpretation of models. A weighted negative binomial regression is introduced as an alternative to ordinary linear regression that appropriately addresses the data characteristics common to SUD treatment behavioral coding data. We conclude by demonstrating how to use and interpret these models with data from a study of motivational interviewing. SPSS and R syntax for weighted negative binomial regression models is included in supplementary materials. PMID:26098126

  18. Regional regression models of watershed suspended-sediment discharge for the eastern United States

    NASA Astrophysics Data System (ADS)

    Roman, David C.; Vogel, Richard M.; Schwarz, Gregory E.

    2012-11-01

    SummaryEstimates of mean annual watershed sediment discharge, derived from long-term measurements of suspended-sediment concentration and streamflow, often are not available at locations of interest. The goal of this study was to develop multivariate regression models to enable prediction of mean annual suspended-sediment discharge from available basin characteristics useful for most ungaged river locations in the eastern United States. The models are based on long-term mean sediment discharge estimates and explanatory variables obtained from a combined dataset of 1201 US Geological Survey (USGS) stations derived from a SPAtially Referenced Regression on Watershed attributes (SPARROW) study and the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES) database. The resulting regional regression models summarized for major US water resources regions 1-8, exhibited prediction R2 values ranging from 76.9% to 92.7% and corresponding average model prediction errors ranging from 56.5% to 124.3%. Results from cross-validation experiments suggest that a majority of the models will perform similarly to calibration runs. The 36-parameter regional regression models also outperformed a 16-parameter national SPARROW model of suspended-sediment discharge and indicate that mean annual sediment loads in the eastern United States generally correlates with a combination of basin area, land use patterns, seasonal precipitation, soil composition, hydrologic modification, and to a lesser extent, topography.

  19. Regional regression models of watershed suspended-sediment discharge for the eastern United States

    USGS Publications Warehouse

    Roman, David C.; Vogel, Richard M.; Schwarz, Gregory E.

    2012-01-01

    Estimates of mean annual watershed sediment discharge, derived from long-term measurements of suspended-sediment concentration and streamflow, often are not available at locations of interest. The goal of this study was to develop multivariate regression models to enable prediction of mean annual suspended-sediment discharge from available basin characteristics useful for most ungaged river locations in the eastern United States. The models are based on long-term mean sediment discharge estimates and explanatory variables obtained from a combined dataset of 1201 US Geological Survey (USGS) stations derived from a SPAtially Referenced Regression on Watershed attributes (SPARROW) study and the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES) database. The resulting regional regression models summarized for major US water resources regions 1–8, exhibited prediction R2 values ranging from 76.9% to 92.7% and corresponding average model prediction errors ranging from 56.5% to 124.3%. Results from cross-validation experiments suggest that a majority of the models will perform similarly to calibration runs. The 36-parameter regional regression models also outperformed a 16-parameter national SPARROW model of suspended-sediment discharge and indicate that mean annual sediment loads in the eastern United States generally correlates with a combination of basin area, land use patterns, seasonal precipitation, soil composition, hydrologic modification, and to a lesser extent, topography.

  20. Estimation of Surface Seawater Fugacity of Carbon Dioxide Using Satellite Data and Machine Learning

    NASA Astrophysics Data System (ADS)

    Jang, E.; Im, J.; Park, G.; Park, Y.

    2016-12-01

    The ocean controls the climate of Earth by absorbing and releasing CO2 through the carbon cycle. The amount of CO2 in the ocean has increased since the industrial revolution. High CO2 concentration in the ocean has a negative influence to marine organisms and reduces the ability of absorbing CO2 in the ocean. This study estimated surface seawater fugacity of CO2 (fCO2) in the East Sea of Korea using Geostationary Ocean Color Imager (GOCI) and Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data, and Hybrid Coordinate Ocean Model (HYCOM) reanalysis data. GOCI is the world first geostationary ocean color observation satellite sensor, and it provides 8 images with 8 bands hourly per day from 9 am to 4 pm at 500m resolution. Two machine learning approaches (i.e., random forest and support vector regression) were used to model fCO2 in this study. While most of the existing studies used multiple linear regression to estimate the pressure of CO2 in the ocean, machine learning may handle more complex relationship between surface seawater fCO2 and ocean parameters in a dynamic spatiotemporal environment. Five ocean related parameters, colored dissolved organic matter (CDOM), chlorophyll-a (chla), sea surface temperature (SST), sea surface salinity (SSS), and mixed layer depth (MLD), were used as input variables. This study examined two schemes, one with GOCI-derived products and the other with MODIS-derived ones. Results show that random forest performed better than support vector regression regardless of satellite data used. The accuracy of GOCI-based estimation was higher than MODIS-based one, possibly thanks to the better spatiotemporal resolution of GOCI data. MLD was identified the most contributing parameter in estimating surface seawater fCO2 among the five ocean related parameters, which might be related with an active deep convection in the East Sea. The surface seawater fCO2 in summer was higher in general with some spatial variation than the other seasons because of higher SST.

  1. Random regression analyses using B-splines functions to model growth from birth to adult age in Canchim cattle.

    PubMed

    Baldi, F; Alencar, M M; Albuquerque, L G

    2010-12-01

    The objective of this work was to estimate covariance functions using random regression models on B-splines functions of animal age, for weights from birth to adult age in Canchim cattle. Data comprised 49,011 records on 2435 females. The model of analysis included fixed effects of contemporary groups, age of dam as quadratic covariable and the population mean trend taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were modelled through a step function with four classes. The direct and maternal additive genetic effects, and animal and maternal permanent environmental effects were included as random effects in the model. A total of seventeen analyses, considering linear, quadratic and cubic B-splines functions and up to seven knots, were carried out. B-spline functions of the same order were considered for all random effects. Random regression models on B-splines functions were compared to a random regression model on Legendre polynomials and with a multitrait model. Results from different models of analyses were compared using the REML form of the Akaike Information criterion and Schwarz' Bayesian Information criterion. In addition, the variance components and genetic parameters estimated for each random regression model were also used as criteria to choose the most adequate model to describe the covariance structure of the data. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most adequate to describe the covariance structure of the data. Random regression models using B-spline functions as base functions fitted the data better than Legendre polynomials, especially at mature ages, but higher number of parameters need to be estimated with B-splines functions. © 2010 Blackwell Verlag GmbH.

  2. Prediction models for clustered data: comparison of a random intercept and standard regression model

    PubMed Central

    2013-01-01

    Background When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Methods Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. Results The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. Conclusion The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters. PMID:23414436

  3. Prediction models for clustered data: comparison of a random intercept and standard regression model.

    PubMed

    Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne

    2013-02-15

    When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.

  4. Missing-value estimation using linear and non-linear regression with Bayesian gene selection.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

    2003-11-22

    Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).

  5. Spatiotemporal Bayesian analysis of Lyme disease in New York state, 1990-2000.

    PubMed

    Chen, Haiyan; Stratton, Howard H; Caraco, Thomas B; White, Dennis J

    2006-07-01

    Mapping ordinarily increases our understanding of nontrivial spatial and temporal heterogeneities in disease rates. However, the large number of parameters required by the corresponding statistical models often complicates detailed analysis. This study investigates the feasibility of a fully Bayesian hierarchical regression approach to the problem and identifies how it outperforms two more popular methods: crude rate estimates (CRE) and empirical Bayes standardization (EBS). In particular, we apply a fully Bayesian approach to the spatiotemporal analysis of Lyme disease incidence in New York state for the period 1990-2000. These results are compared with those obtained by CRE and EBS in Chen et al. (2005). We show that the fully Bayesian regression model not only gives more reliable estimates of disease rates than the other two approaches but also allows for tractable models that can accommodate more numerous sources of variation and unknown parameters.

  6. Novel point estimation from a semiparametric ratio estimator (SPRE): long-term health outcomes from short-term linear data, with application to weight loss in obesity.

    PubMed

    Weissman-Miller, Deborah

    2013-11-02

    Point estimation is particularly important in predicting weight loss in individuals or small groups. In this analysis, a new health response function is based on a model of human response over time to estimate long-term health outcomes from a change point in short-term linear regression. This important estimation capability is addressed for small groups and single-subject designs in pilot studies for clinical trials, medical and therapeutic clinical practice. These estimations are based on a change point given by parameters derived from short-term participant data in ordinary least squares (OLS) regression. The development of the change point in initial OLS data and the point estimations are given in a new semiparametric ratio estimator (SPRE) model. The new response function is taken as a ratio of two-parameter Weibull distributions times a prior outcome value that steps estimated outcomes forward in time, where the shape and scale parameters are estimated at the change point. The Weibull distributions used in this ratio are derived from a Kelvin model in mechanics taken here to represent human beings. A distinct feature of the SPRE model in this article is that initial treatment response for a small group or a single subject is reflected in long-term response to treatment. This model is applied to weight loss in obesity in a secondary analysis of data from a classic weight loss study, which has been selected due to the dramatic increase in obesity in the United States over the past 20 years. A very small relative error of estimated to test data is shown for obesity treatment with the weight loss medication phentermine or placebo for the test dataset. An application of SPRE in clinical medicine or occupational therapy is to estimate long-term weight loss for a single subject or a small group near the beginning of treatment.

  7. Inverse models: A necessary next step in ground-water modeling

    USGS Publications Warehouse

    Poeter, E.P.; Hill, M.C.

    1997-01-01

    Inverse models using, for example, nonlinear least-squares regression, provide capabilities that help modelers take full advantage of the insight available from ground-water models. However, lack of information about the requirements and benefits of inverse models is an obstacle to their widespread use. This paper presents a simple ground-water flow problem to illustrate the requirements and benefits of the nonlinear least-squares repression method of inverse modeling and discusses how these attributes apply to field problems. The benefits of inverse modeling include: (1) expedited determination of best fit parameter values; (2) quantification of the (a) quality of calibration, (b) data shortcomings and needs, and (c) confidence limits on parameter estimates and predictions; and (3) identification of issues that are easily overlooked during nonautomated calibration.Inverse models using, for example, nonlinear least-squares regression, provide capabilities that help modelers take full advantage of the insight available from ground-water models. However, lack of information about the requirements and benefits of inverse models is an obstacle to their widespread use. This paper presents a simple ground-water flow problem to illustrate the requirements and benefits of the nonlinear least-squares regression method of inverse modeling and discusses how these attributes apply to field problems. The benefits of inverse modeling include: (1) expedited determination of best fit parameter values; (2) quantification of the (a) quality of calibration, (b) data shortcomings and needs, and (c) confidence limits on parameter estimates and predictions; and (3) identification of issues that are easily overlooked during nonautomated calibration.

  8. Random regression models using different functions to model test-day milk yield of Brazilian Holstein cows.

    PubMed

    Bignardi, A B; El Faro, L; Torres Júnior, R A A; Cardoso, V L; Machado, P F; Albuquerque, L G

    2011-10-31

    We analyzed 152,145 test-day records from 7317 first lactations of Holstein cows recorded from 1995 to 2003. Our objective was to model variations in test-day milk yield during the first lactation of Holstein cows by random regression model (RRM), using various functions in order to obtain adequate and parsimonious models for the estimation of genetic parameters. Test-day milk yields were grouped into weekly classes of days in milk, ranging from 1 to 44 weeks. The contemporary groups were defined as herd-test-day. The analyses were performed using a single-trait RRM, including the direct additive, permanent environmental and residual random effects. In addition, contemporary group and linear and quadratic effects of the age of cow at calving were included as fixed effects. The mean trend of milk yield was modeled with a fourth-order orthogonal Legendre polynomial. The additive genetic and permanent environmental covariance functions were estimated by random regression on two parametric functions, Ali and Schaeffer and Wilmink, and on B-spline functions of days in milk. The covariance components and the genetic parameters were estimated by the restricted maximum likelihood method. Results from RRM parametric and B-spline functions were compared to RRM on Legendre polynomials and with a multi-trait analysis, using the same data set. Heritability estimates presented similar trends during mid-lactation (13 to 31 weeks) and between week 37 and the end of lactation, for all RRM. Heritabilities obtained by multi-trait analysis were of a lower magnitude than those estimated by RRM. The RRMs with a higher number of parameters were more useful to describe the genetic variation of test-day milk yield throughout the lactation. RRM using B-spline and Legendre polynomials as base functions appears to be the most adequate to describe the covariance structure of the data.

  9. Deciphering factors controlling groundwater arsenic spatial variability in Bangladesh

    NASA Astrophysics Data System (ADS)

    Tan, Z.; Yang, Q.; Zheng, C.; Zheng, Y.

    2017-12-01

    Elevated concentrations of geogenic arsenic in groundwater have been found in many countries to exceed 10 μg/L, the WHO's guideline value for drinking water. A common yet unexplained characteristic of groundwater arsenic spatial distribution is the extensive variability at various spatial scales. This study investigates factors influencing the spatial variability of groundwater arsenic in Bangladesh to improve the accuracy of models predicting arsenic exceedance rate spatially. A novel boosted regression tree method is used to establish a weak-learning ensemble model, which is compared to a linear model using a conventional stepwise logistic regression method. The boosted regression tree models offer the advantage of parametric interaction when big datasets are analyzed in comparison to the logistic regression. The point data set (n=3,538) of groundwater hydrochemistry with 19 parameters was obtained by the British Geological Survey in 2001. The spatial data sets of geological parameters (n=13) were from the Consortium for Spatial Information, Technical University of Denmark, University of East Anglia and the FAO, while the soil parameters (n=42) were from the Harmonized World Soil Database. The aforementioned parameters were regressed to categorical groundwater arsenic concentrations below or above three thresholds: 5 μg/L, 10 μg/L and 50 μg/L to identify respective controlling factors. Boosted regression tree method outperformed logistic regression methods in all three threshold levels in terms of accuracy, specificity and sensitivity, resulting in an improvement of spatial distribution map of probability of groundwater arsenic exceeding all three thresholds when compared to disjunctive-kriging interpolated spatial arsenic map using the same groundwater arsenic dataset. Boosted regression tree models also show that the most important controlling factors of groundwater arsenic distribution include groundwater iron content and well depth for all three thresholds. The probability of a well with iron content higher than 5mg/L to contain greater than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be more than 91%, 85% and 51%, respectively, while the probability of a well from depth more than 160m to contain more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be less than 38%, 25% and 14%, respectively.

  10. Estimation of crown closure from AVIRIS data using regression analysis

    NASA Technical Reports Server (NTRS)

    Staenz, K.; Williams, D. J.; Truchon, M.; Fritz, R.

    1993-01-01

    Crown closure is one of the input parameters used for forest growth and yield modelling. Preliminary work by Staenz et al. indicates that imaging spectrometer data acquired with sensors such as the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) have some potential for estimating crown closure on a stand level. The objectives of this paper are: (1) to establish a relationship between AVIRIS data and the crown closure derived from aerial photography of a forested test site within the Interior Douglas Fir biogeoclimatic zone in British Columbia, Canada; (2) to investigate the impact of atmospheric effects and the forest background on the correlation between AVIRIS data and crown closure estimates; and (3) to improve this relationship using multiple regression analysis.

  11. Divergent estimation error in portfolio optimization and in linear regression

    NASA Astrophysics Data System (ADS)

    Kondor, I.; Varga-Haszonits, I.

    2008-08-01

    The problem of estimation error in portfolio optimization is discussed, in the limit where the portfolio size N and the sample size T go to infinity such that their ratio is fixed. The estimation error strongly depends on the ratio N/T and diverges for a critical value of this parameter. This divergence is the manifestation of an algorithmic phase transition, it is accompanied by a number of critical phenomena, and displays universality. As the structure of a large number of multidimensional regression and modelling problems is very similar to portfolio optimization, the scope of the above observations extends far beyond finance, and covers a large number of problems in operations research, machine learning, bioinformatics, medical science, economics, and technology.

  12. MIXOR: a computer program for mixed-effects ordinal regression analysis.

    PubMed

    Hedeker, D; Gibbons, R D

    1996-03-01

    MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.

  13. Optimal estimation of suspended-sediment concentrations in streams

    USGS Publications Warehouse

    Holtschlag, D.J.

    2001-01-01

    Optimal estimators are developed for computation of suspended-sediment concentrations in streams. The estimators are a function of parameters, computed by use of generalized least squares, which simultaneously account for effects of streamflow, seasonal variations in average sediment concentrations, a dynamic error component, and the uncertainty in concentration measurements. The parameters are used in a Kalman filter for on-line estimation and an associated smoother for off-line estimation of suspended-sediment concentrations. The accuracies of the optimal estimators are compared with alternative time-averaging interpolators and flow-weighting regression estimators by use of long-term daily-mean suspended-sediment concentration and streamflow data from 10 sites within the United States. For sampling intervals from 3 to 48 days, the standard errors of on-line and off-line optimal estimators ranged from 52.7 to 107%, and from 39.5 to 93.0%, respectively. The corresponding standard errors of linear and cubic-spline interpolators ranged from 48.8 to 158%, and from 50.6 to 176%, respectively. The standard errors of simple and multiple regression estimators, which did not vary with the sampling interval, were 124 and 105%, respectively. Thus, the optimal off-line estimator (Kalman smoother) had the lowest error characteristics of those evaluated. Because suspended-sediment concentrations are typically measured at less than 3-day intervals, use of optimal estimators will likely result in significant improvements in the accuracy of continuous suspended-sediment concentration records. Additional research on the integration of direct suspended-sediment concentration measurements and optimal estimators applied at hourly or shorter intervals is needed.

  14. A Bayesian methodological framework for accommodating interannual variability of nutrient loading with the SPARROW model

    NASA Astrophysics Data System (ADS)

    Wellen, Christopher; Arhonditsis, George B.; Labencki, Tanya; Boyd, Duncan

    2012-10-01

    Regression-type, hybrid empirical/process-based models (e.g., SPARROW, PolFlow) have assumed a prominent role in efforts to estimate the sources and transport of nutrient pollution at river basin scales. However, almost no attempts have been made to explicitly accommodate interannual nutrient loading variability in their structure, despite empirical and theoretical evidence indicating that the associated source/sink processes are quite variable at annual timescales. In this study, we present two methodological approaches to accommodate interannual variability with the Spatially Referenced Regressions on Watershed attributes (SPARROW) nonlinear regression model. The first strategy uses the SPARROW model to estimate a static baseline load and climatic variables (e.g., precipitation) to drive the interannual variability. The second approach allows the source/sink processes within the SPARROW model to vary at annual timescales using dynamic parameter estimation techniques akin to those used in dynamic linear models. Model parameterization is founded upon Bayesian inference techniques that explicitly consider calibration data and model uncertainty. Our case study is the Hamilton Harbor watershed, a mixed agricultural and urban residential area located at the western end of Lake Ontario, Canada. Our analysis suggests that dynamic parameter estimation is the more parsimonious of the two strategies tested and can offer insights into the temporal structural changes associated with watershed functioning. Consistent with empirical and theoretical work, model estimated annual in-stream attenuation rates varied inversely with annual discharge. Estimated phosphorus source areas were concentrated near the receiving water body during years of high in-stream attenuation and dispersed along the main stems of the streams during years of low attenuation, suggesting that nutrient source areas are subject to interannual variability.

  15. Tropical forest plantation biomass estimation using RADARSAT-SAR and TM data of south china

    NASA Astrophysics Data System (ADS)

    Wang, Chenli; Niu, Zheng; Gu, Xiaoping; Guo, Zhixing; Cong, Pifu

    2005-10-01

    Forest biomass is one of the most important parameters for global carbon stock model yet can only be estimated with great uncertainties. Remote sensing, especially SAR data can offers the possibility of providing relatively accurate forest biomass estimations at a lower cost than inventory in study tropical forest. The goal of this research was to compare the sensitivity of forest biomass to Landsat TM and RADARSAT-SAR data and to assess the efficiency of NDVI, EVI and other vegetation indices in study forest biomass based on the field survey date and GIS in south china. Based on vegetation indices and factor analysis, multiple regression and neural networks were developed for biomass estimation for each species of the plantation. For each species, the better relationships between the biomass predicted and that measured from field survey was obtained with a neural network developed for the species. The relationship between predicted and measured biomass derived from vegetation indices differed between species. This study concludes that single band and many vegetation indices are weakly correlated with selected forest biomass. RADARSAT-SAR Backscatter coefficient has a relatively good logarithmic correlation with forest biomass, but neither TM spectral bands nor vegetation indices alone are sufficient to establish an efficient model for biomass estimation due to the saturation of bands and vegetation indices, multiple regression models that consist of spectral and environment variables improve biomass estimation performance. Comparing with TM, a relatively well estimation result can be achieved by RADARSAT-SAR, but all had limitations in tropical forest biomass estimation. The estimation results obtained are not accurate enough for forest management purposes at the forest stand level. However, the approximate volume estimates derived by the method can be useful in areas where no other forest information is available. Therefore, this paper provides a better understanding of relationships of remote sensing data and forest stand parameters used in forest parameter estimation models.

  16. Behavior of sensitivities in the one-dimensional advection-dispersion equation: Implications for parameter estimation and sampling design

    USGS Publications Warehouse

    Knopman, Debra S.; Voss, Clifford I.

    1987-01-01

    The spatial and temporal variability of sensitivities has a significant impact on parameter estimation and sampling design for studies of solute transport in porous media. Physical insight into the behavior of sensitivities is offered through an analysis of analytically derived sensitivities for the one-dimensional form of the advection-dispersion equation. When parameters are estimated in regression models of one-dimensional transport, the spatial and temporal variability in sensitivities influences variance and covariance of parameter estimates. Several principles account for the observed influence of sensitivities on parameter uncertainty. (1) Information about a physical parameter may be most accurately gained at points in space and time with a high sensitivity to the parameter. (2) As the distance of observation points from the upstream boundary increases, maximum sensitivity to velocity during passage of the solute front increases and the consequent estimate of velocity tends to have lower variance. (3) The frequency of sampling must be “in phase” with the S shape of the dispersion sensitivity curve to yield the most information on dispersion. (4) The sensitivity to the dispersion coefficient is usually at least an order of magnitude less than the sensitivity to velocity. (5) The assumed probability distribution of random error in observations of solute concentration determines the form of the sensitivities. (6) If variance in random error in observations is large, trends in sensitivities of observation points may be obscured by noise and thus have limited value in predicting variance in parameter estimates among designs. (7) Designs that minimize the variance of one parameter may not necessarily minimize the variance of other parameters. (8) The time and space interval over which an observation point is sensitive to a given parameter depends on the actual values of the parameters in the underlying physical system.

  17. Performance evaluation of spectral vegetation indices using a statistical sensitivity function

    USGS Publications Warehouse

    Ji, Lei; Peters, Albert J.

    2007-01-01

    A great number of spectral vegetation indices (VIs) have been developed to estimate biophysical parameters of vegetation. Traditional techniques for evaluating the performance of VIs are regression-based statistics, such as the coefficient of determination and root mean square error. These statistics, however, are not capable of quantifying the detailed relationship between VIs and biophysical parameters because the sensitivity of a VI is usually a function of the biophysical parameter instead of a constant. To better quantify this relationship, we developed a “sensitivity function” for measuring the sensitivity of a VI to biophysical parameters. The sensitivity function is defined as the first derivative of the regression function, divided by the standard error of the dependent variable prediction. The function elucidates the change in sensitivity over the range of the biophysical parameter. The Student's t- or z-statistic can be used to test the significance of VI sensitivity. Additionally, we developed a “relative sensitivity function” that compares the sensitivities of two VIs when the biophysical parameters are unavailable.

  18. Functional interaction-based nonlinear models with application to multiplatform genomics data.

    PubMed

    Davenport, Clemontina A; Maity, Arnab; Baladandayuthapani, Veerabhadran

    2018-05-07

    Functional regression allows for a scalar response to be dependent on a functional predictor; however, not much work has been done when a scalar exposure that interacts with the functional covariate is introduced. In this paper, we present 2 functional regression models that account for this interaction and propose 2 novel estimation procedures for the parameters in these models. These estimation methods allow for a noisy and/or sparsely observed functional covariate and are easily extended to generalized exponential family responses. We compute standard errors of our estimators, which allows for further statistical inference and hypothesis testing. We compare the performance of the proposed estimators to each other and to one found in the literature via simulation and demonstrate our methods using a real data example. Copyright © 2018 John Wiley & Sons, Ltd.

  19. Unmanned Aerial System (UAS)-based phenotyping of soybean using multi-sensor data fusion and extreme learning machine

    NASA Astrophysics Data System (ADS)

    Maimaitijiang, Maitiniyazi; Ghulam, Abduwasit; Sidike, Paheding; Hartling, Sean; Maimaitiyiming, Matthew; Peterson, Kyle; Shavers, Ethan; Fishman, Jack; Peterson, Jim; Kadam, Suhas; Burken, Joel; Fritschi, Felix

    2017-12-01

    Estimating crop biophysical and biochemical parameters with high accuracy at low-cost is imperative for high-throughput phenotyping in precision agriculture. Although fusion of data from multiple sensors is a common application in remote sensing, less is known on the contribution of low-cost RGB, multispectral and thermal sensors to rapid crop phenotyping. This is due to the fact that (1) simultaneous collection of multi-sensor data using satellites are rare and (2) multi-sensor data collected during a single flight have not been accessible until recent developments in Unmanned Aerial Systems (UASs) and UAS-friendly sensors that allow efficient information fusion. The objective of this study was to evaluate the power of high spatial resolution RGB, multispectral and thermal data fusion to estimate soybean (Glycine max) biochemical parameters including chlorophyll content and nitrogen concentration, and biophysical parameters including Leaf Area Index (LAI), above ground fresh and dry biomass. Multiple low-cost sensors integrated on UASs were used to collect RGB, multispectral, and thermal images throughout the growing season at a site established near Columbia, Missouri, USA. From these images, vegetation indices were extracted, a Crop Surface Model (CSM) was advanced, and a model to extract the vegetation fraction was developed. Then, spectral indices/features were combined to model and predict crop biophysical and biochemical parameters using Partial Least Squares Regression (PLSR), Support Vector Regression (SVR), and Extreme Learning Machine based Regression (ELR) techniques. Results showed that: (1) For biochemical variable estimation, multispectral and thermal data fusion provided the best estimate for nitrogen concentration and chlorophyll (Chl) a content (RMSE of 9.9% and 17.1%, respectively) and RGB color information based indices and multispectral data fusion exhibited the largest RMSE 22.6%; the highest accuracy for Chl a + b content estimation was obtained by fusion of information from all three sensors with an RMSE of 11.6%. (2) Among the plant biophysical variables, LAI was best predicted by RGB and thermal data fusion while multispectral and thermal data fusion was found to be best for biomass estimation. (3) For estimation of the above mentioned plant traits of soybean from multi-sensor data fusion, ELR yields promising results compared to PLSR and SVR in this study. This research indicates that fusion of low-cost multiple sensor data within a machine learning framework can provide relatively accurate estimation of plant traits and provide valuable insight for high spatial precision in agriculture and plant stress assessment.

  20. The association of very-low-density lipoprotein with ankle-brachial index in peritoneal dialysis patients with controlled serum low-density lipoprotein cholesterol level

    PubMed Central

    2013-01-01

    Background Peripheral artery disease (PAD) represents atherosclerotic disease and is a risk factor for death in peritoneal dialysis (PD) patients, who tend to show an atherogenic lipid profile. In this study, we investigated the relationship between lipid profile and ankle-brachial index (ABI) as an index of atherosclerosis in PD patients with controlled serum low-density lipoprotein (LDL) cholesterol level. Methods Thirty-five PD patients, whose serum LDL cholesterol level was controlled at less than 120mg/dl, were enrolled in this cross-sectional study in Japan. The proportions of cholesterol level to total cholesterol level (cholesterol proportion) in 20 lipoprotein fractions and the mean size of lipoprotein particles were measured using an improved method, namely, high-performance gel permeation chromatography. Multivariate linear regression analysis was adjusted for diabetes mellitus and cardiovascular and/or cerebrovascular diseases. Results The mean (standard deviation) age was 61.6 (10.5) years; PD vintage, 38.5 (28.1) months; ABI, 1.07 (0.22). A low ABI (0.9 or lower) was observed in 7 patients (low-ABI group). The low-ABI group showed significantly higher cholesterol proportions in the chylomicron fraction and large very-low-density lipoproteins (VLDLs) (Fractions 3–5) than the high-ABI group (ABI>0.9). Adjusted multivariate linear regression analysis showed that ABI was negatively associated with serum VLDL cholesterol level (parameter estimate=-0.00566, p=0.0074); the cholesterol proportions in large VLDLs (Fraction 4, parameter estimate=-3.82, p=0.038; Fraction 5, parameter estimate=-3.62, p=0.0039) and medium VLDL (Fraction 6, parameter estimate=-3.25, p=0.014); and the size of VLDL particles (parameter estimate=-0.0352, p=0.032). Conclusions This study showed that the characteristics of VLDL particles were associated with ABI among PD patients. Lowering serum VLDL level may be an effective therapy against atherosclerosis in PD patients after the control of serum LDL cholesterol level. PMID:24093487

  1. Impact of an equality constraint on the class-specific residual variances in regression mixtures: A Monte Carlo simulation study.

    PubMed

    Kim, Minjung; Lamont, Andrea E; Jaki, Thomas; Feaster, Daniel; Howe, George; Van Horn, M Lee

    2016-06-01

    Regression mixture models are a novel approach to modeling the heterogeneous effects of predictors on an outcome. In the model-building process, often residual variances are disregarded and simplifying assumptions are made without thorough examination of the consequences. In this simulation study, we investigated the impact of an equality constraint on the residual variances across latent classes. We examined the consequences of constraining the residual variances on class enumeration (finding the true number of latent classes) and on the parameter estimates, under a number of different simulation conditions meant to reflect the types of heterogeneity likely to exist in applied analyses. The results showed that bias in class enumeration increased as the difference in residual variances between the classes increased. Also, an inappropriate equality constraint on the residual variances greatly impacted on the estimated class sizes and showed the potential to greatly affect the parameter estimates in each class. These results suggest that it is important to make assumptions about residual variances with care and to carefully report what assumptions are made.

  2. VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA

    PubMed Central

    Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu

    2009-01-01

    We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation penalty (SCAD) and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. Particularly, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial is presented to illustrate the proposed methodology. PMID:20336190

  3. Estimating suspended sediment using acoustics in a fine-grained riverine system, Kickapoo Creek at Bloomington, Illinois

    USGS Publications Warehouse

    Manaster, Amanda D.; Domanski, Marian M.; Straub, Timothy D.; Boldt, Justin A.

    2016-08-18

    Acoustic technologies have the potential to be used as a surrogate for measuring suspended-sediment concentration (SSC). This potential was examined in a fine-grained (97-100 percent fines) riverine system in central Illinois by way of installation of an acoustic instrument. Acoustic data were collected continuously over the span of 5.5 years. Acoustic parameters were regressed against SSC data to determine the accuracy of using acoustic technology as a surrogate for measuring SSC in a fine-grained riverine system. The resulting regressions for SSC and sediment acoustic parameters had coefficients of determination ranging from 0.75 to 0.97 for various events and configurations. The overall Nash-Sutcliffe model-fit efficiency was 0.95 for the 132 observed and predicted SSC values determined using the sediment acoustic parameter regressions. The study of using acoustic technologies as a surrogate for measuring SSC in fine-grained riverine systems is ongoing. The results at this site are promising in the realm of surrogate technology.

  4. Automatic reconstruction of surge deposit thicknesses. Applications to some Italian volcanoes

    NASA Astrophysics Data System (ADS)

    Armienti, P.; Pareschi, M. T.

    1987-04-01

    The energy cone concept has been adopted to describe some kinds of surge deposits. The energy cone parameters (height and slope) are evaluated through a regression technique which utilizes deposit thicknesses and the correspondent quotes and heights of the energy cone. The regression also allows to evaluate a coefficient of proportionality linking the deposit thickness to the distance between topographic surface and energy line for a given eruption. Moreover, if an accurate topography is available (in this case a reconstruction of a digitalized topography of the Phlegrean Fields and of the Vesuvius), the energy cone parameters, obtained by the backfitted technique, can be used to evaluate the order of magnitude of the deposit volumes. The hazard map for a surge localized at the Solfatara (Phlegraean Fields, Naples) has been computed. The values of the energy cone parameters and the volume have been assumed to be equal to those estimated with the regression technique applied to a past surge eruption in the same area.

  5. Hyper-Spectral Image Analysis With Partially Latent Regression and Spatial Markov Dependencies

    NASA Astrophysics Data System (ADS)

    Deleforge, Antoine; Forbes, Florence; Ba, Sileye; Horaud, Radu

    2015-09-01

    Hyper-spectral data can be analyzed to recover physical properties at large planetary scales. This involves resolving inverse problems which can be addressed within machine learning, with the advantage that, once a relationship between physical parameters and spectra has been established in a data-driven fashion, the learned relationship can be used to estimate physical parameters for new hyper-spectral observations. Within this framework, we propose a spatially-constrained and partially-latent regression method which maps high-dimensional inputs (hyper-spectral images) onto low-dimensional responses (physical parameters such as the local chemical composition of the soil). The proposed regression model comprises two key features. Firstly, it combines a Gaussian mixture of locally-linear mappings (GLLiM) with a partially-latent response model. While the former makes high-dimensional regression tractable, the latter enables to deal with physical parameters that cannot be observed or, more generally, with data contaminated by experimental artifacts that cannot be explained with noise models. Secondly, spatial constraints are introduced in the model through a Markov random field (MRF) prior which provides a spatial structure to the Gaussian-mixture hidden variables. Experiments conducted on a database composed of remotely sensed observations collected from the Mars planet by the Mars Express orbiter demonstrate the effectiveness of the proposed model.

  6. Estimating equivalence with quantile regression

    USGS Publications Warehouse

    Cade, B.S.

    2011-01-01

    Equivalence testing and corresponding confidence interval estimates are used to provide more enlightened statistical statements about parameter estimates by relating them to intervals of effect sizes deemed to be of scientific or practical importance rather than just to an effect size of zero. Equivalence tests and confidence interval estimates are based on a null hypothesis that a parameter estimate is either outside (inequivalence hypothesis) or inside (equivalence hypothesis) an equivalence region, depending on the question of interest and assignment of risk. The former approach, often referred to as bioequivalence testing, is often used in regulatory settings because it reverses the burden of proof compared to a standard test of significance, following a precautionary principle for environmental protection. Unfortunately, many applications of equivalence testing focus on establishing average equivalence by estimating differences in means of distributions that do not have homogeneous variances. I discuss how to compare equivalence across quantiles of distributions using confidence intervals on quantile regression estimates that detect differences in heterogeneous distributions missed by focusing on means. I used one-tailed confidence intervals based on inequivalence hypotheses in a two-group treatment-control design for estimating bioequivalence of arsenic concentrations in soils at an old ammunition testing site and bioequivalence of vegetation biomass at a reclaimed mining site. Two-tailed confidence intervals based both on inequivalence and equivalence hypotheses were used to examine quantile equivalence for negligible trends over time for a continuous exponential model of amphibian abundance. ?? 2011 by the Ecological Society of America.

  7. Misspecification in Latent Change Score Models: Consequences for Parameter Estimation, Model Evaluation, and Predicting Change.

    PubMed

    Clark, D Angus; Nuttall, Amy K; Bowles, Ryan P

    2018-01-01

    Latent change score models (LCS) are conceptually powerful tools for analyzing longitudinal data (McArdle & Hamagami, 2001). However, applications of these models typically include constraints on key parameters over time. Although practically useful, strict invariance over time in these parameters is unlikely in real data. This study investigates the robustness of LCS when invariance over time is incorrectly imposed on key change-related parameters. Monte Carlo simulation methods were used to explore the impact of misspecification on parameter estimation, predicted trajectories of change, and model fit in the dual change score model, the foundational LCS. When constraints were incorrectly applied, several parameters, most notably the slope (i.e., constant change) factor mean and autoproportion coefficient, were severely and consistently biased, as were regression paths to the slope factor when external predictors of change were included. Standard fit indices indicated that the misspecified models fit well, partly because mean level trajectories over time were accurately captured. Loosening constraint improved the accuracy of parameter estimates, but estimates were more unstable, and models frequently failed to converge. Results suggest that potentially common sources of misspecification in LCS can produce distorted impressions of developmental processes, and that identifying and rectifying the situation is a challenge.

  8. Poster — Thur Eve — 44: Linearization of Compartmental Models for More Robust Estimates of Regional Hemodynamic, Metabolic and Functional Parameters using DCE-CT/PET Imaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blais, AR; Dekaban, M; Lee, T-Y

    2014-08-15

    Quantitative analysis of dynamic positron emission tomography (PET) data usually involves minimizing a cost function with nonlinear regression, wherein the choice of starting parameter values and the presence of local minima affect the bias and variability of the estimated kinetic parameters. These nonlinear methods can also require lengthy computation time, making them unsuitable for use in clinical settings. Kinetic modeling of PET aims to estimate the rate parameter k{sub 3}, which is the binding affinity of the tracer to a biological process of interest and is highly susceptible to noise inherent in PET image acquisition. We have developed linearized kineticmore » models for kinetic analysis of dynamic contrast enhanced computed tomography (DCE-CT)/PET imaging, including a 2-compartment model for DCE-CT and a 3-compartment model for PET. Use of kinetic parameters estimated from DCE-CT can stabilize the kinetic analysis of dynamic PET data, allowing for more robust estimation of k{sub 3}. Furthermore, these linearized models are solved with a non-negative least squares algorithm and together they provide other advantages including: 1) only one possible solution and they do not require a choice of starting parameter values, 2) parameter estimates are comparable in accuracy to those from nonlinear models, 3) significantly reduced computational time. Our simulated data show that when blood volume and permeability are estimated with DCE-CT, the bias of k{sub 3} estimation with our linearized model is 1.97 ± 38.5% for 1,000 runs with a signal-to-noise ratio of 10. In summary, we have developed a computationally efficient technique for accurate estimation of k{sub 3} from noisy dynamic PET data.« less

  9. CALIBRATING NON-CONVEX PENALIZED REGRESSION IN ULTRA-HIGH DIMENSION.

    PubMed

    Wang, Lan; Kim, Yongdai; Li, Runze

    2013-10-01

    We investigate high-dimensional non-convex penalized regression, where the number of covariates may grow at an exponential rate. Although recent asymptotic theory established that there exists a local minimum possessing the oracle property under general conditions, it is still largely an open problem how to identify the oracle estimator among potentially multiple local minima. There are two main obstacles: (1) due to the presence of multiple minima, the solution path is nonunique and is not guaranteed to contain the oracle estimator; (2) even if a solution path is known to contain the oracle estimator, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-calculate calibrated CCCP algorithm produces a consistent solution path which contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that it can be applied to the solution path to select the optimal tuning parameter which asymptotically identifies the oracle estimator. The theory for a general class of non-convex penalties in the ultra-high dimensional setup is established when the random errors follow the sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC has desirable performance in identifying the underlying sparsity pattern for high-dimensional data analysis.

  10. CALIBRATING NON-CONVEX PENALIZED REGRESSION IN ULTRA-HIGH DIMENSION

    PubMed Central

    Wang, Lan; Kim, Yongdai; Li, Runze

    2014-01-01

    We investigate high-dimensional non-convex penalized regression, where the number of covariates may grow at an exponential rate. Although recent asymptotic theory established that there exists a local minimum possessing the oracle property under general conditions, it is still largely an open problem how to identify the oracle estimator among potentially multiple local minima. There are two main obstacles: (1) due to the presence of multiple minima, the solution path is nonunique and is not guaranteed to contain the oracle estimator; (2) even if a solution path is known to contain the oracle estimator, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-calculate calibrated CCCP algorithm produces a consistent solution path which contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that it can be applied to the solution path to select the optimal tuning parameter which asymptotically identifies the oracle estimator. The theory for a general class of non-convex penalties in the ultra-high dimensional setup is established when the random errors follow the sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC has desirable performance in identifying the underlying sparsity pattern for high-dimensional data analysis. PMID:24948843

  11. LAI-2000 Accuracy, Precision, and Application to Visual Estimation of Leaf Area Index of Loblolly Pine

    Treesearch

    Jason A. Gatch; Timothy B. Harrington; James P. Castleberry

    2002-01-01

    Leaf area index (LAI) is an important parameter of forest stand productivity that has been used to diagnose stand vigor and potential fertilizer response of southern pines. The LAI-2000 was tested for its ability to provide accurate and precise estimates of LAI of loblolly pine (Pinus taeda L.). To test instrument accuracy, regression was used to...

  12. Comparing the analytical performances of Micro-NIR and FT-NIR spectrometers in the evaluation of acerola fruit quality, using PLS and SVM regression algorithms.

    PubMed

    Malegori, Cristina; Nascimento Marques, Emanuel José; de Freitas, Sergio Tonetto; Pimentel, Maria Fernanda; Pasquini, Celio; Casiraghi, Ernestina

    2017-04-01

    The main goal of this study was to investigate the analytical performances of a state-of-the-art device, one of the smallest dispersion NIR spectrometers on the market (MicroNIR 1700), making a critical comparison with a benchtop FT-NIR spectrometer in the evaluation of the prediction accuracy. In particular, the aim of this study was to estimate in a non-destructive manner, titratable acidity and ascorbic acid content in acerola fruit during ripening, in a view of direct applicability in field of this new miniaturised handheld device. Acerola (Malpighia emarginata DC.) is a super-fruit characterised by a considerable amount of ascorbic acid, ranging from 1.0% to 4.5%. However, during ripening, acerola colour changes and the fruit may lose as much as half of its ascorbic acid content. Because the variability of chemical parameters followed a non-strictly linear profile, two different regression algorithms were compared: PLS and SVM. Regression models obtained with Micro-NIR spectra give better results using SVM algorithm, for both ascorbic acid and titratable acidity estimation. FT-NIR data give comparable results using both SVM and PLS algorithms, with lower errors for SVM regression. The prediction ability of the two instruments was statistically compared using the Passing-Bablok regression algorithm; the outcomes are critically discussed together with the regression models, showing the suitability of the portable Micro-NIR for in field monitoring of chemical parameters of interest in acerola fruits. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Parametric correlation functions to model the structure of permanent environmental (co)variances in milk yield random regression models.

    PubMed

    Bignardi, A B; El Faro, L; Cardoso, V L; Machado, P F; Albuquerque, L G

    2009-09-01

    The objective of the present study was to estimate milk yield genetic parameters applying random regression models and parametric correlation functions combined with a variance function to model animal permanent environmental effects. A total of 152,145 test-day milk yields from 7,317 first lactations of Holstein cows belonging to herds located in the southeastern region of Brazil were analyzed. Test-day milk yields were divided into 44 weekly classes of days in milk. Contemporary groups were defined by herd-test-day comprising a total of 2,539 classes. The model included direct additive genetic, permanent environmental, and residual random effects. The following fixed effects were considered: contemporary group, age of cow at calving (linear and quadratic regressions), and the population average lactation curve modeled by fourth-order orthogonal Legendre polynomial. Additive genetic effects were modeled by random regression on orthogonal Legendre polynomials of days in milk, whereas permanent environmental effects were estimated using a stationary or nonstationary parametric correlation function combined with a variance function of different orders. The structure of residual variances was modeled using a step function containing 6 variance classes. The genetic parameter estimates obtained with the model using a stationary correlation function associated with a variance function to model permanent environmental effects were similar to those obtained with models employing orthogonal Legendre polynomials for the same effect. A model using a sixth-order polynomial for additive effects and a stationary parametric correlation function associated with a seventh-order variance function to model permanent environmental effects would be sufficient for data fitting.

  14. Random regression models on Legendre polynomials to estimate genetic parameters for weights from birth to adult age in Canchim cattle.

    PubMed

    Baldi, F; Albuquerque, L G; Alencar, M M

    2010-08-01

    The objective of this work was to estimate covariance functions for direct and maternal genetic effects, animal and maternal permanent environmental effects, and subsequently, to derive relevant genetic parameters for growth traits in Canchim cattle. Data comprised 49,011 weight records on 2435 females from birth to adult age. The model of analysis included fixed effects of contemporary groups (year and month of birth and at weighing) and age of dam as quadratic covariable. Mean trends were taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were allowed to vary and were modelled by a step function with 1, 4 or 11 classes based on animal's age. The model fitting four classes of residual variances was the best. A total of 12 random regression models from second to seventh order were used to model direct and maternal genetic effects, animal and maternal permanent environmental effects. The model with direct and maternal genetic effects, animal and maternal permanent environmental effects fitted by quadric, cubic, quintic and linear Legendre polynomials, respectively, was the most adequate to describe the covariance structure of the data. Estimates of direct and maternal heritability obtained by multi-trait (seven traits) and random regression models were very similar. Selection for higher weight at any age, especially after weaning, will produce an increase in mature cow weight. The possibility to modify the growth curve in Canchim cattle to obtain animals with rapid growth at early ages and moderate to low mature cow weight is limited.

  15. Estimated allele substitution effects underlying genomic evaluation models depend on the scaling of allele counts.

    PubMed

    Bouwman, Aniek C; Hayes, Ben J; Calus, Mario P L

    2017-10-30

    Genomic evaluation is used to predict direct genomic values (DGV) for selection candidates in breeding programs, but also to estimate allele substitution effects (ASE) of single nucleotide polymorphisms (SNPs). Scaling of allele counts influences the estimated ASE, because scaling of allele counts results in less shrinkage towards the mean for low minor allele frequency (MAF) variants. Scaling may become relevant for estimating ASE as more low MAF variants will be used in genomic evaluations. We show the impact of scaling on estimates of ASE using real data and a theoretical framework, and in terms of power, model fit and predictive performance. In a dairy cattle dataset with 630 K SNP genotypes, the correlation between DGV for stature from a random regression model using centered allele counts (RRc) and centered and scaled allele counts (RRcs) was 0.9988, whereas the overall correlation between ASE using RRc and RRcs was 0.27. The main difference in ASE between both methods was found for SNPs with a MAF lower than 0.01. Both the ratio (ASE from RRcs/ASE from RRc) and the regression coefficient (regression of ASE from RRcs on ASE from RRc) were much higher than 1 for low MAF SNPs. Derived equations showed that scenarios with a high heritability, a large number of individuals and a small number of variants have lower ratios between ASE from RRc and RRcs. We also investigated the optimal scaling parameter [from - 1 (RRcs) to 0 (RRc) in steps of 0.1] in the bovine stature dataset. We found that the log-likelihood was maximized with a scaling parameter of - 0.8, while the mean squared error of prediction was minimized with a scaling parameter of - 1, i.e., RRcs. Large differences in estimated ASE were observed for low MAF SNPs when allele counts were scaled or not scaled because there is less shrinkage towards the mean for scaled allele counts. We derived a theoretical framework that shows that the difference in ASE due to shrinkage is heavily influenced by the power of the data. Increasing the power results in smaller differences in ASE whether allele counts are scaled or not.

  16. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    PubMed

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  17. Robust Methods for Moderation Analysis with a Two-Level Regression Model.

    PubMed

    Yang, Miao; Yuan, Ke-Hai

    2016-01-01

    Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.

  18. SEPARABLE FACTOR ANALYSIS WITH APPLICATIONS TO MORTALITY DATA

    PubMed Central

    Fosdick, Bailey K.; Hoff, Peter D.

    2014-01-01

    Human mortality data sets can be expressed as multiway data arrays, the dimensions of which correspond to categories by which mortality rates are reported, such as age, sex, country and year. Regression models for such data typically assume an independent error distribution or an error model that allows for dependence along at most one or two dimensions of the data array. However, failing to account for other dependencies can lead to inefficient estimates of regression parameters, inaccurate standard errors and poor predictions. An alternative to assuming independent errors is to allow for dependence along each dimension of the array using a separable covariance model. However, the number of parameters in this model increases rapidly with the dimensions of the array and, for many arrays, maximum likelihood estimates of the covariance parameters do not exist. In this paper, we propose a submodel of the separable covariance model that estimates the covariance matrix for each dimension as having factor analytic structure. This model can be viewed as an extension of factor analysis to array-valued data, as it uses a factor model to estimate the covariance along each dimension of the array. We discuss properties of this model as they relate to ordinary factor analysis, describe maximum likelihood and Bayesian estimation methods, and provide a likelihood ratio testing procedure for selecting the factor model ranks. We apply this methodology to the analysis of data from the Human Mortality Database, and show in a cross-validation experiment how it outperforms simpler methods. Additionally, we use this model to impute mortality rates for countries that have no mortality data for several years. Unlike other approaches, our methodology is able to estimate similarities between the mortality rates of countries, time periods and sexes, and use this information to assist with the imputations. PMID:25489353

  19. Importance of Preserving Cross-correlation in developing Statistically Downscaled Climate Forcings and in estimating Land-surface Fluxes and States

    NASA Astrophysics Data System (ADS)

    Das Bhowmik, R.; Arumugam, S.

    2015-12-01

    Multivariate downscaling techniques exhibited superiority over univariate regression schemes in terms of preserving cross-correlations between multiple variables- precipitation and temperature - from GCMs. This study focuses on two aspects: (a) develop an analytical solutions on estimating biases in cross-correlations from univariate downscaling approaches and (b) quantify the uncertainty in land-surface states and fluxes due to biases in cross-correlations in downscaled climate forcings. Both these aspects are evaluated using climate forcings available from both historical climate simulations and CMIP5 hindcasts over the entire US. The analytical solution basically relates the univariate regression parameters, co-efficient of determination of regression and the co-variance ratio between GCM and downscaled values. The analytical solutions are compared with the downscaled univariate forcings by choosing the desired p-value (Type-1 error) in preserving the observed cross-correlation. . For quantifying the impacts of biases on cross-correlation on estimating streamflow and groundwater, we corrupt the downscaled climate forcings with different cross-correlation structure.

  20. Additive hazards regression and partial likelihood estimation for ecological monitoring data across space.

    PubMed

    Lin, Feng-Chang; Zhu, Jun

    2012-01-01

    We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.

  1. A proportional hazards regression model for the subdistribution with right-censored and left-truncated competing risks data

    PubMed Central

    Zhang, Xu; Zhang, Mei-Jie; Fine, Jason

    2012-01-01

    With competing risks failure time data, one often needs to assess the covariate effects on the cumulative incidence probabilities. Fine and Gray proposed a proportional hazards regression model to directly model the subdistribution of a competing risk. They developed the estimating procedure for right-censored competing risks data, based on the inverse probability of censoring weighting. Right-censored and left-truncated competing risks data sometimes occur in biomedical researches. In this paper, we study the proportional hazards regression model for the subdistribution of a competing risk with right-censored and left-truncated data. We adopt a new weighting technique to estimate the parameters in this model. We have derived the large sample properties of the proposed estimators. To illustrate the application of the new method, we analyze the failure time data for children with acute leukemia. In this example, the failure times for children who had bone marrow transplants were left truncated. PMID:21557288

  2. Quantile Regression Models for Current Status Data

    PubMed Central

    Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen

    2016-01-01

    Current status data arise frequently in demography, epidemiology, and econometrics where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator is developed for parameter estimation which is computed using the concave-convex procedure and its confidence intervals are constructed using a subsampling method. Asymptotic properties for the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging. PMID:27994307

  3. A baseline-free procedure for transformation models under interval censorship.

    PubMed

    Gu, Ming Gao; Sun, Liuquan; Zuo, Guoxin

    2005-12-01

    An important property of Cox regression model is that the estimation of regression parameters using the partial likelihood procedure does not depend on its baseline survival function. We call such a procedure baseline-free. Using marginal likelihood, we show that an baseline-free procedure can be derived for a class of general transformation models under interval censoring framework. The baseline-free procedure results a simplified and stable computation algorithm for some complicated and important semiparametric models, such as frailty models and heteroscedastic hazard/rank regression models, where the estimation procedures so far available involve estimation of the infinite dimensional baseline function. A detailed computational algorithm using Markov Chain Monte Carlo stochastic approximation is presented. The proposed procedure is demonstrated through extensive simulation studies, showing the validity of asymptotic consistency and normality. We also illustrate the procedure with a real data set from a study of breast cancer. A heuristic argument showing that the score function is a mean zero martingale is provided.

  4. On the adequacy of identified Cole Cole models

    NASA Astrophysics Data System (ADS)

    Xiang, Jianping; Cheng, Daizhan; Schlindwein, F. S.; Jones, N. B.

    2003-06-01

    The Cole-Cole model has been widely used to interpret electrical geophysical data. Normally an iterative computer program is used to invert the frequency domain complex impedance data and simple error estimation is obtained from the squared difference of the measured (field) and calculated values over the full frequency range. Recently a new direct inversion algorithm was proposed for the 'optimal' estimation of the Cole-Cole parameters, which differs from existing inversion algorithms in that the estimated parameters are direct solutions of a set of equations without the need for an initial guess for initialisation. This paper first briefly investigates the advantages and disadvantages of the new algorithm compared to the standard Levenberg-Marquardt "ridge regression" algorithm. Then, and more importantly, we address the adequacy of the models resulting from both the "ridge regression" and the new algorithm, using two different statistical tests and we give objective statistical criteria for acceptance or rejection of the estimated models. The first is the standard χ2 technique. The second is a parameter-accuracy based test that uses a joint multi-normal distribution. Numerical results that illustrate the performance of both testing methods are given. The main goals of this paper are (i) to provide the source code for the new ''direct inversion'' algorithm in Matlab and (ii) to introduce and demonstrate two methods to determine the reliability of a set of data before data processing, i.e., to consider the adequacy of the resulting Cole-Cole model.

  5. Genetic analyses of stillbirth in relation to litter size using random regression models.

    PubMed

    Chen, C Y; Misztal, I; Tsuruta, S; Herring, W O; Holl, J; Culbertson, M

    2010-12-01

    Estimates of genetic parameters for number of stillborns (NSB) in relation to litter size (LS) were obtained with random regression models (RRM). Data were collected from 4 purebred Duroc nucleus farms between 2004 and 2008. Two data sets with 6,575 litters for the first parity (P1) and 6,259 litters for the second to fifth parity (P2-5) with a total of 8,217 and 5,066 animals in the pedigree were analyzed separately. Number of stillborns was studied as a trait on sow level. Fixed effects were contemporary groups (farm-year-season) and fixed cubic regression coefficients on LS with Legendre polynomials. Models for P2-5 included the fixed effect of parity. Random effects were additive genetic effects for both data sets with permanent environmental effects included for P2-5. Random effects modeled with Legendre polynomials (RRM-L), linear splines (RRM-S), and degree 0 B-splines (RRM-BS) with regressions on LS were used. For P1, the order of polynomial, the number of knots, and the number of intervals used for respective models were quadratic, 3, and 3, respectively. For P2-5, the same parameters were linear, 2, and 2, respectively. Heterogeneous residual variances were considered in the models. For P1, estimates of heritability were 12 to 15%, 5 to 6%, and 6 to 7% in LS 5, 9, and 13, respectively. For P2-5, estimates were 15 to 17%, 4 to 5%, and 4 to 6% in LS 6, 9, and 12, respectively. For P1, average estimates of genetic correlations between LS 5 to 9, 5 to 13, and 9 to 13 were 0.53, -0.29, and 0.65, respectively. For P2-5, same estimates averaged for RRM-L and RRM-S were 0.75, -0.21, and 0.50, respectively. For RRM-BS with 2 intervals, the correlation was 0.66 between LS 5 to 7 and 8 to 13. Parameters obtained by 3 RRM revealed the nonlinear relationship between additive genetic effect of NSB and the environmental deviation of LS. The negative correlations between the 2 extreme LS might possibly indicate different genetic bases on incidence of stillbirth.

  6. Approximate median regression for complex survey data with skewed response.

    PubMed

    Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett M; Pan, Yi

    2016-12-01

    The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)'based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. © 2016, The International Biometric Society.

  7. Approximate Median Regression for Complex Survey Data with Skewed Response

    PubMed Central

    Fraser, Raphael André; Lipsitz, Stuart R.; Sinha, Debajyoti; Fitzmaurice, Garrett M.; Pan, Yi

    2016-01-01

    Summary The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling and weighting. In this paper, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS) based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. PMID:27062562

  8. Three methods to construct predictive models using logistic regression and likelihood ratios to facilitate adjustment for pretest probability give similar results.

    PubMed

    Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les

    2008-01-01

    To compare three predictive models based on logistic regression to estimate adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models; and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependency of tests errors. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.

  9. Online quantitative analysis of multispectral images of human body tissues

    NASA Astrophysics Data System (ADS)

    Lisenko, S. A.

    2013-08-01

    A method is developed for online monitoring of structural and morphological parameters of biological tissues (haemoglobin concentration, degree of blood oxygenation, average diameter of capillaries and the parameter characterising the average size of tissue scatterers), which involves multispectral tissue imaging, image normalisation to one of its spectral layers and determination of unknown parameters based on their stable regression relation with the spectral characteristics of the normalised image. Regression is obtained by simulating numerically the diffuse reflectance spectrum of the tissue by the Monte Carlo method at a wide variation of model parameters. The correctness of the model calculations is confirmed by the good agreement with the experimental data. The error of the method is estimated under conditions of general variability of structural and morphological parameters of the tissue. The method developed is compared with the traditional methods of interpretation of multispectral images of biological tissues, based on the solution of the inverse problem for each pixel of the image in the approximation of different analytical models.

  10. Predicting root zone soil moisture with soil properties and satellite near-surface moisture data across the conterminous United States

    NASA Astrophysics Data System (ADS)

    Baldwin, D.; Manfreda, S.; Keller, K.; Smithwick, E. A. H.

    2017-03-01

    Satellite-based near-surface (0-2 cm) soil moisture estimates have global coverage, but do not capture variations of soil moisture in the root zone (up to 100 cm depth) and may be biased with respect to ground-based soil moisture measurements. Here, we present an ensemble Kalman filter (EnKF) hydrologic data assimilation system that predicts bias in satellite soil moisture data to support the physically based Soil Moisture Analytical Relationship (SMAR) infiltration model, which estimates root zone soil moisture with satellite soil moisture data. The SMAR-EnKF model estimates a regional-scale bias parameter using available in situ data. The regional bias parameter is added to satellite soil moisture retrievals before their use in the SMAR model, and the bias parameter is updated continuously over time with the EnKF algorithm. In this study, the SMAR-EnKF assimilates in situ soil moisture at 43 Soil Climate Analysis Network (SCAN) monitoring locations across the conterminous U.S. Multivariate regression models are developed to estimate SMAR parameters using soil physical properties and the moderate resolution imaging spectroradiometer (MODIS) evapotranspiration data product as covariates. SMAR-EnKF root zone soil moisture predictions are in relatively close agreement with in situ observations when using optimal model parameters, with root mean square errors averaging 0.051 [cm3 cm-3] (standard error, s.e. = 0.005). The average root mean square error associated with a 20-fold cross-validation analysis with permuted SMAR parameter regression models increases moderately (0.082 [cm3 cm-3], s.e. = 0.004). The expected regional-scale satellite correction bias is negative in four out of six ecoregions studied (mean = -0.12 [-], s.e. = 0.002), excluding the Great Plains and Eastern Temperate Forests (0.053 [-], s.e. = 0.001). With its capability of estimating regional-scale satellite bias, the SMAR-EnKF system can predict root zone soil moisture over broad extents and has applications in drought predictions and other operational hydrologic modeling purposes.

  11. Marginal regression analysis of recurrent events with coarsened censoring times.

    PubMed

    Hu, X Joan; Rosychuk, Rhonda J

    2016-12-01

    Motivated by an ongoing pediatric mental health care (PMHC) study, this article presents weakly structured methods for analyzing doubly censored recurrent event data where only coarsened information on censoring is available. The study extracted administrative records of emergency department visits from provincial health administrative databases. The available information of each individual subject is limited to a subject-specific time window determined up to concealed data. To evaluate time-dependent effect of exposures, we adapt the local linear estimation with right censored survival times under the Cox regression model with time-varying coefficients (cf. Cai and Sun, Scandinavian Journal of Statistics 2003, 30, 93-111). We establish the pointwise consistency and asymptotic normality of the regression parameter estimator, and examine its performance by simulation. The PMHC study illustrates the proposed approach throughout the article. © 2016, The International Biometric Society.

  12. Bootstrap Methods: A Very Leisurely Look.

    ERIC Educational Resources Information Center

    Hinkle, Dennis E.; Winstead, Wayland H.

    The Bootstrap method, a computer-intensive statistical method of estimation, is illustrated using a simple and efficient Statistical Analysis System (SAS) routine. The utility of the method for generating unknown parameters, including standard errors for simple statistics, regression coefficients, discriminant function coefficients, and factor…

  13. Real-time soil sensing based on fiber optics and spectroscopy

    NASA Astrophysics Data System (ADS)

    Li, Minzan

    2005-08-01

    Using NIR spectroscopic techniques, correlation analysis and regression analysis for soil parameter estimation was conducted with raw soil samples collected in a cornfield and a forage field. Soil parameters analyzed were soil moisture, soil organic matter, nitrate nitrogen, soil electrical conductivity and pH. Results showed that all soil parameters could be evaluated by NIR spectral reflectance. For soil moisture, a linear regression model was available at low moisture contents below 30 % db, while an exponential model can be used in a wide range of moisture content up to 100 % db. Nitrate nitrogen estimation required a multi-spectral exponential model and electrical conductivity could be evaluated by a single spectral regression. According to the result above mentioned, a real time soil sensor system based on fiber optics and spectroscopy was developed. The sensor system was composed of a soil subsoiler with four optical fiber probes, a spectrometer, and a control unit. Two optical fiber probes were used for illumination and the other two optical fiber probes for collecting soil reflectance from visible to NIR wavebands at depths around 30 cm. The spectrometer was used to obtain the spectra of reflected lights. The control unit consisted of a data logging device, a personal computer, and a pulse generator. The experiment showed that clear photo-spectral reflectance was obtained from the underground soil. The soil reflectance was equal to that obtained by the desktop spectrophotometer in laboratory tests. Using the spectral reflectance, the soil parameters, such as soil moisture, pH, EC and SOM, were evaluated.

  14. Automation of workplace lifting hazard assessment for musculoskeletal injury prevention.

    PubMed

    Spector, June T; Lieblich, Max; Bao, Stephen; McQuade, Kevin; Hughes, Margaret

    2014-01-01

    Existing methods for practically evaluating musculoskeletal exposures such as posture and repetition in workplace settings have limitations. We aimed to automate the estimation of parameters in the revised United States National Institute for Occupational Safety and Health (NIOSH) lifting equation, a standard manual observational tool used to evaluate back injury risk related to lifting in workplace settings, using depth camera (Microsoft Kinect) and skeleton algorithm technology. A large dataset (approximately 22,000 frames, derived from six subjects) of simultaneous lifting and other motions recorded in a laboratory setting using the Kinect (Microsoft Corporation, Redmond, Washington, United States) and a standard optical motion capture system (Qualysis, Qualysis Motion Capture Systems, Qualysis AB, Sweden) was assembled. Error-correction regression models were developed to improve the accuracy of NIOSH lifting equation parameters estimated from the Kinect skeleton. Kinect-Qualysis errors were modelled using gradient boosted regression trees with a Huber loss function. Models were trained on data from all but one subject and tested on the excluded subject. Finally, models were tested on three lifting trials performed by subjects not involved in the generation of the model-building dataset. Error-correction appears to produce estimates for NIOSH lifting equation parameters that are more accurate than those derived from the Microsoft Kinect algorithm alone. Our error-correction models substantially decreased the variance of parameter errors. In general, the Kinect underestimated parameters, and modelling reduced this bias, particularly for more biased estimates. Use of the raw Kinect skeleton model tended to result in falsely high safe recommended weight limits of loads, whereas error-corrected models gave more conservative, protective estimates. Our results suggest that it may be possible to produce reasonable estimates of posture and temporal elements of tasks such as task frequency in an automated fashion, although these findings should be confirmed in a larger study. Further work is needed to incorporate force assessments and address workplace feasibility challenges. We anticipate that this approach could ultimately be used to perform large-scale musculoskeletal exposure assessment not only for research but also to provide real-time feedback to workers and employers during work method improvement activities and employee training.

  15. Multiple Regression Analysis to Estimate the Unit Price of Hanwoo (Bos taurus coreanae) Beef

    PubMed Central

    Hur, Sun-Jin

    2017-01-01

    This study were estimated the contribution of carcass traits to unit price, to analyze the marbling score as a categorical variable rather than a numerical variable, and to develop an optimal model that also includes the holiday effect and the raising period. The data for this study were acquired from the Quality Evaluation of the Korea Institute for Animal Products, and consisted of the trading records of 1,613,699 heads at 12 wholesale markets from 2010 to 2014. The unit price of a cow was estimated from the following parameters: −52.50 Won/mm, 8.93 Won/cm2, 7.20 Won/kg, and −1.04 Won/day for backfat thickness, eye muscle area, carcass weight, and raising period, respectively. Parameters for the dummy variables of marbling scores varied from 0 to 8328.74 Won/kg, which means that each marbling score grade had a different price value. The unit price of a steer was estimated from the following parameters: −92.12 Won/mm, 20.22 Won/cm2, 1.30 Won/kg, and −1.72 Won/day for backfat thickness, eye muscle area, carcass weights, and raising period, respectively. Parameters for dummy variables of marbling scores varied from 0 to 7338.80 Won/kg, which means that the grades of each marbling score had different price values. The unit price of sales during traditional holidays was significantly higher (827.71 Won/kg for cows, and 645.15 Won/kg for steers) than during non-holidays.We conclude that the use of categorical values for marbling scores would be needed to evaluate the price of Hanwoo beef using multiple regression analysis based on carcass traits and environmental factors. PMID:29147089

  16. Statistical guides to estimating the number of undiscovered mineral deposits: an example with porphyry copper deposits

    USGS Publications Warehouse

    Singer, Donald A.; Menzie, W.D.; Cheng, Qiuming; Bonham-Carter, G. F.

    2005-01-01

    Estimating numbers of undiscovered mineral deposits is a fundamental part of assessing mineral resources. Some statistical tools can act as guides to low variance, unbiased estimates of the number of deposits. The primary guide is that the estimates must be consistent with the grade and tonnage models. Another statistical guide is the deposit density (i.e., the number of deposits per unit area of permissive rock in well-explored control areas). Preliminary estimates and confidence limits of the number of undiscovered deposits in a tract of given area may be calculated using linear regression and refined using frequency distributions with appropriate parameters. A Poisson distribution leads to estimates having lower relative variances than the regression estimates and implies a random distribution of deposits. Coefficients of variation are used to compare uncertainties of negative binomial, Poisson, or MARK3 empirical distributions that have the same expected number of deposits as the deposit density. Statistical guides presented here allow simple yet robust estimation of the number of undiscovered deposits in permissive terranes. 

  17. Construction cost estimation of spherical storage tanks: artificial neural networks and hybrid regression—GA algorithms

    NASA Astrophysics Data System (ADS)

    Arabzadeh, Vida; Niaki, S. T. A.; Arabzadeh, Vahid

    2017-10-01

    One of the most important processes in the early stages of construction projects is to estimate the cost involved. This process involves a wide range of uncertainties, which make it a challenging task. Because of unknown issues, using the experience of the experts or looking for similar cases are the conventional methods to deal with cost estimation. The current study presents data-driven methods for cost estimation based on the application of artificial neural network (ANN) and regression models. The learning algorithms of the ANN are the Levenberg-Marquardt and the Bayesian regulated. Moreover, regression models are hybridized with a genetic algorithm to obtain better estimates of the coefficients. The methods are applied in a real case, where the input parameters of the models are assigned based on the key issues involved in a spherical tank construction. The results reveal that while a high correlation between the estimated cost and the real cost exists; both ANNs could perform better than the hybridized regression models. In addition, the ANN with the Levenberg-Marquardt learning algorithm (LMNN) obtains a better estimation than the ANN with the Bayesian-regulated learning algorithm (BRNN). The correlation between real data and estimated values is over 90%, while the mean square error is achieved around 0.4. The proposed LMNN model can be effective to reduce uncertainty and complexity in the early stages of the construction project.

  18. Parametric system identification of catamaran for improving controller design

    NASA Astrophysics Data System (ADS)

    Timpitak, Surasak; Prempraneerach, Pradya; Pengwang, Eakkachai

    2018-01-01

    This paper presents an estimation of simplified dynamic model for only surge- and yaw- motions of catamaran by using system identification (SI) techniques to determine associated unknown parameters. These methods will enhance the performance of designing processes for the motion control system of Unmanned Surface Vehicle (USV). The simulation results demonstrate an effective way to solve for damping forces and to determine added masses by applying least-square and AutoRegressive Exogenous (ARX) methods. Both methods are then evaluated according to estimated parametric errors from the vehicle’s dynamic model. The ARX method, which yields better estimated accuracy, can then be applied to identify unknown parameters as well as to help improving a controller design of a real unmanned catamaran.

  19. Parameter estimation of an ARMA model for river flow forecasting using goal programming

    NASA Astrophysics Data System (ADS)

    Mohammadi, Kourosh; Eslami, H. R.; Kahawita, Rene

    2006-11-01

    SummaryRiver flow forecasting constitutes one of the most important applications in hydrology. Several methods have been developed for this purpose and one of the most famous techniques is the Auto regressive moving average (ARMA) model. In the research reported here, the goal was to minimize the error for a specific season of the year as well as for the complete series. Goal programming (GP) was used to estimate the ARMA model parameters. Shaloo Bridge station on the Karun River with 68 years of observed stream flow data was selected to evaluate the performance of the proposed method. The results when compared with the usual method of maximum likelihood estimation were favorable with respect to the new proposed algorithm.

  20. Evaluation of AUC(0-4) predictive methods for cyclosporine in kidney transplant patients.

    PubMed

    Aoyama, Takahiko; Matsumoto, Yoshiaki; Shimizu, Makiko; Fukuoka, Masamichi; Kimura, Toshimi; Kokubun, Hideya; Yoshida, Kazunari; Yago, Kazuo

    2005-05-01

    Cyclosporine (CyA) is the most commonly used immunosuppressive agent in patients who undergo kidney transplantation. Dosage adjustment of CyA is usually based on trough levels. Recently, trough levels have been replacing the area under the concentration-time curve during the first 4 h after CyA administration (AUC(0-4)). The aim of this study was to compare the predictive values obtained using three different methods of AUC(0-4) monitoring. AUC(0-4) was calculated from 0 to 4 h in early and stable renal transplant patients using the trapezoidal rule. The predicted AUC(0-4) was calculated using three different methods: the multiple regression equation reported by Uchida et al.; Bayesian estimation for modified population pharmacokinetic parameters reported by Yoshida et al.; and modified population pharmacokinetic parameters reported by Cremers et al. The predicted AUC(0-4) was assessed on the basis of predictive bias, precision, and correlation coefficient. The predicted AUC(0-4) values obtained using three methods through measurement of three blood samples showed small differences in predictive bias, precision, and correlation coefficient. In the prediction of AUC(0-4) measurement of one blood sample from stable renal transplant patients, the performance of the regression equation reported by Uchida depended on sampling time. On the other hand, the performance of Bayesian estimation with modified pharmacokinetic parameters reported by Yoshida through measurement of one blood sample, which is not dependent on sampling time, showed a small difference in the correlation coefficient. The prediction of AUC(0-4) using a regression equation required accurate sampling time. In this study, the prediction of AUC(0-4) using Bayesian estimation did not require accurate sampling time in the AUC(0-4) monitoring of CyA. Thus Bayesian estimation is assumed to be clinically useful in the dosage adjustment of CyA.

  1. Estimation of Filling and Afterload Conditions by Pump Intrinsic Parameters in a Pulsatile Total Artificial Heart.

    PubMed

    Cuenca-Navalon, Elena; Laumen, Marco; Finocchiaro, Thomas; Steinseifer, Ulrich

    2016-07-01

    A physiological control algorithm is being developed to ensure an optimal physiological interaction between the ReinHeart total artificial heart (TAH) and the circulatory system. A key factor for that is the long-term, accurate determination of the hemodynamic state of the cardiovascular system. This study presents a method to determine estimation models for predicting hemodynamic parameters (pump chamber filling and afterload) from both left and right cardiovascular circulations. The estimation models are based on linear regression models that correlate filling and afterload values with pump intrinsic parameters derived from measured values of motor current and piston position. Predictions for filling lie in average within 5% from actual values, predictions for systemic afterload (AoPmean , AoPsys ) and mean pulmonary afterload (PAPmean ) lie in average within 9% from actual values. Predictions for systolic pulmonary afterload (PAPsys ) present an average deviation of 14%. The estimation models show satisfactory prediction and confidence intervals and are thus suitable to estimate hemodynamic parameters. This method and derived estimation models are a valuable alternative to implanted sensors and are an essential step for the development of a physiological control algorithm for a fully implantable TAH. Copyright © 2015 International Center for Artificial Organs and Transplantation and Wiley Periodicals, Inc.

  2. Estimates of the atmospheric parameters of M-type stars: a machine-learning perspective

    NASA Astrophysics Data System (ADS)

    Sarro, L. M.; Ordieres-Meré, J.; Bello-García, A.; González-Marcos, A.; Solano, E.

    2018-05-01

    Estimating the atmospheric parameters of M-type stars has been a difficult task due to the lack of simple diagnostics in the stellar spectra. We aim at uncovering good sets of predictive features of stellar atmospheric parameters (Teff, log (g), [M/H]) in spectra of M-type stars. We define two types of potential features (equivalent widths and integrated flux ratios) able to explain the atmospheric physical parameters. We search the space of feature sets using a genetic algorithm that evaluates solutions by their prediction performance in the framework of the BT-Settl library of stellar spectra. Thereafter, we construct eight regression models using different machine-learning techniques and compare their performances with those obtained using the classical χ2 approach and independent component analysis (ICA) coefficients. Finally, we validate the various alternatives using two sets of real spectra from the NASA Infrared Telescope Facility (IRTF) and Dwarf Archives collections. We find that the cross-validation errors are poor measures of the performance of regression models in the context of physical parameter prediction in M-type stars. For R ˜ 2000 spectra with signal-to-noise ratios typical of the IRTF and Dwarf Archives, feature selection with genetic algorithms or alternative techniques produces only marginal advantages with respect to representation spaces that are unconstrained in wavelength (full spectrum or ICA). We make available the atmospheric parameters for the two collections of observed spectra as online material.

  3. Estimation of trabecular bone parameters in children from multisequence MRI using texture-based regression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lekadir, Karim, E-mail: karim.lekadir@upf.edu; Hoogendoorn, Corné; Armitage, Paul

    Purpose: This paper presents a statistical approach for the prediction of trabecular bone parameters from low-resolution multisequence magnetic resonance imaging (MRI) in children, thus addressing the limitations of high-resolution modalities such as HR-pQCT, including the significant exposure of young patients to radiation and the limited applicability of such modalities to peripheral bones in vivo. Methods: A statistical predictive model is constructed from a database of MRI and HR-pQCT datasets, to relate the low-resolution MRI appearance in the cancellous bone to the trabecular parameters extracted from the high-resolution images. The description of the MRI appearance is achieved between subjects by usingmore » a collection of feature descriptors, which describe the texture properties inside the cancellous bone, and which are invariant to the geometry and size of the trabecular areas. The predictive model is built by fitting to the training data a nonlinear partial least square regression between the input MRI features and the output trabecular parameters. Results: Detailed validation based on a sample of 96 datasets shows correlations >0.7 between the trabecular parameters predicted from low-resolution multisequence MRI based on the proposed statistical model and the values extracted from high-resolution HRp-QCT. Conclusions: The obtained results indicate the promise of the proposed predictive technique for the estimation of trabecular parameters in children from multisequence MRI, thus reducing the need for high-resolution radiation-based scans for a fragile population that is under development and growth.« less

  4. Marginalized zero-inflated negative binomial regression with application to dental caries

    PubMed Central

    Preisser, John S.; Das, Kalyan; Long, D. Leann; Divaris, Kimon

    2015-01-01

    The zero-inflated negative binomial regression model (ZINB) is often employed in diverse fields such as dentistry, health care utilization, highway safety, and medicine to examine relationships between exposures of interest and overdispersed count outcomes exhibiting many zeros. The regression coefficients of ZINB have latent class interpretations for a susceptible subpopulation at risk for the disease/condition under study with counts generated from a negative binomial distribution and for a non-susceptible subpopulation that provides only zero counts. The ZINB parameters, however, are not well-suited for estimating overall exposure effects, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. In this paper, a marginalized zero-inflated negative binomial regression (MZINB) model for independent responses is proposed to model the population marginal mean count directly, providing straightforward inference for overall exposure effects based on maximum likelihood estimation. Through simulation studies, the finite sample performance of MZINB is compared to marginalized zero-inflated Poisson, Poisson, and negative binomial regression. The MZINB model is applied in the evaluation of a school-based fluoride mouthrinse program on dental caries in 677 children. PMID:26568034

  5. First-order kinetic gas generation model parameters for wet landfills.

    PubMed

    Faour, Ayman A; Reinhart, Debra R; You, Huaxin

    2007-01-01

    Landfill gas collection data from wet landfill cells were analyzed and first-order gas generation model parameters were estimated for the US EPA landfill gas emissions model (LandGEM). Parameters were determined through statistical comparison of predicted and actual gas collection. The US EPA LandGEM model appeared to fit the data well, provided it is preceded by a lag phase, which on average was 1.5 years. The first-order reaction rate constant, k, and the methane generation potential, L(o), were estimated for a set of landfills with short-term waste placement and long-term gas collection data. Mean and 95% confidence parameter estimates for these data sets were found using mixed-effects model regression followed by bootstrap analysis. The mean values for the specific methane volume produced during the lag phase (V(sto)), L(o), and k were 33 m(3)/Megagrams (Mg), 76 m(3)/Mg, and 0.28 year(-1), respectively. Parameters were also estimated for three full scale wet landfills where waste was placed over many years. The k and L(o) estimated for these landfills were 0.21 year(-1), 115 m(3)/Mg, 0.11 year(-1), 95 m(3)/Mg, and 0.12 year(-1) and 87 m(3)/Mg, respectively. A group of data points from wet landfills cells with short-term data were also analyzed. A conservative set of parameter estimates was suggested based on the upper 95% confidence interval parameters as a k of 0.3 year(-1) and a L(o) of 100 m(3)/Mg if design is optimized and the lag is minimized.

  6. Multispectral imaging of absorption and scattering properties of in vivo exposed rat brain using a digital red-green-blue camera.

    PubMed

    Yoshida, Keiichiro; Nishidate, Izumi; Ishizuka, Tomohiro; Kawauchi, Satoko; Sato, Shunichi; Sato, Manabu

    2015-05-01

    In order to estimate multispectral images of the absorption and scattering properties in the cerebral cortex of in vivo rat brain, we investigated spectral reflectance images estimated by the Wiener estimation method using a digital RGB camera. A Monte Carlo simulation-based multiple regression analysis for the corresponding spectral absorbance images at nine wavelengths (500, 520, 540, 560, 570, 580, 600, 730, and 760 nm) was then used to specify the absorption and scattering parameters of brain tissue. In this analysis, the concentrations of oxygenated hemoglobin and that of deoxygenated hemoglobin were estimated as the absorption parameters, whereas the coefficient a and the exponent b of the reduced scattering coefficient spectrum approximated by a power law function were estimated as the scattering parameters. The spectra of absorption and reduced scattering coefficients were reconstructed from the absorption and scattering parameters, and the spectral images of absorption and reduced scattering coefficients were then estimated. In order to confirm the feasibility of this method, we performed in vivo experiments on exposed rat brain. The estimated images of the absorption coefficients were dominated by the spectral characteristics of hemoglobin. The estimated spectral images of the reduced scattering coefficients had a broad scattering spectrum, exhibiting a larger magnitude at shorter wavelengths, corresponding to the typical spectrum of brain tissue published in the literature. The changes in the estimated absorption and scattering parameters during normoxia, hyperoxia, and anoxia indicate the potential applicability of the method by which to evaluate the pathophysiological conditions of in vivo brain due to the loss of tissue viability.

  7. A support vector regression-firefly algorithm-based model for limiting velocity prediction in sewer pipes.

    PubMed

    Ebtehaj, Isa; Bonakdari, Hossein

    2016-01-01

    Sediment transport without deposition is an essential consideration in the optimum design of sewer pipes. In this study, a novel method based on a combination of support vector regression (SVR) and the firefly algorithm (FFA) is proposed to predict the minimum velocity required to avoid sediment settling in pipe channels, which is expressed as the densimetric Froude number (Fr). The efficiency of support vector machine (SVM) models depends on the suitable selection of SVM parameters. In this particular study, FFA is used by determining these SVM parameters. The actual effective parameters on Fr calculation are generally identified by employing dimensional analysis. The different dimensionless variables along with the models are introduced. The best performance is attributed to the model that employs the sediment volumetric concentration (C(V)), ratio of relative median diameter of particles to hydraulic radius (d/R), dimensionless particle number (D(gr)) and overall sediment friction factor (λ(s)) parameters to estimate Fr. The performance of the SVR-FFA model is compared with genetic programming, artificial neural network and existing regression-based equations. The results indicate the superior performance of SVR-FFA (mean absolute percentage error = 2.123%; root mean square error =0.116) compared with other methods.

  8. Losses to single-family housing from ground motions in the 1994 Northridge, California, earthquake

    USGS Publications Warehouse

    Wesson, R.L.; Perkins, D.M.; Leyendecker, E.V.; Roth, R.J.; Petersen, M.D.

    2004-01-01

    The distributions of insured losses to single-family housing following the 1994 Northridge, California, earthquake for 234 ZIP codes can be satisfactorily modeled with gamma distributions. Regressions of the parameters in the gamma distribution on estimates of ground motion, derived from ShakeMap estimates or from interpolated observations, provide a basis for developing curves of conditional probability of loss given a ground motion. Comparison of the resulting estimates of aggregate loss with the actual aggregate loss gives satisfactory agreement for several different ground-motion parameters. Estimates of loss based on a deterministic spatial model of the earthquake ground motion, using standard attenuation relationships and NEHRP soil factors, give satisfactory results for some ground-motion parameters if the input ground motions are increased about one and one-half standard deviations above the median, reflecting the fact that the ground motions for the Northridge earthquake tended to be higher than the median ground motion for other earthquakes with similar magnitude. The results give promise for making estimates of insured losses to a similar building stock under future earthquake loading. ?? 2004, Earthquake Engineering Research Institute.

  9. Analysis and selection of magnitude relations for the Working Group on Utah Earthquake Probabilities

    USGS Publications Warehouse

    Duross, Christopher; Olig, Susan; Schwartz, David

    2015-01-01

    Prior to calculating time-independent and -dependent earthquake probabilities for faults in the Wasatch Front region, the Working Group on Utah Earthquake Probabilities (WGUEP) updated a seismic-source model for the region (Wong and others, 2014) and evaluated 19 historical regressions on earthquake magnitude (M). These regressions relate M to fault parameters for historical surface-faulting earthquakes, including linear fault length (e.g., surface-rupture length [SRL] or segment length), average displacement, maximum displacement, rupture area, seismic moment (Mo ), and slip rate. These regressions show that significant epistemic uncertainties complicate the determination of characteristic magnitude for fault sources in the Basin and Range Province (BRP). For example, we found that M estimates (as a function of SRL) span about 0.3–0.4 units (figure 1) owing to differences in the fault parameter used; age, quality, and size of historical earthquake databases; and fault type and region considered.

  10. Consistent model identification of varying coefficient quantile regression with BIC tuning parameter selection

    PubMed Central

    Zheng, Qi; Peng, Limin

    2016-01-01

    Quantile regression provides a flexible platform for evaluating covariate effects on different segments of the conditional distribution of response. As the effects of covariates may change with quantile level, contemporaneously examining a spectrum of quantiles is expected to have a better capacity to identify variables with either partial or full effects on the response distribution, as compared to focusing on a single quantile. Under this motivation, we study a general adaptively weighted LASSO penalization strategy in the quantile regression setting, where a continuum of quantile index is considered and coefficients are allowed to vary with quantile index. We establish the oracle properties of the resulting estimator of coefficient function. Furthermore, we formally investigate a BIC-type uniform tuning parameter selector and show that it can ensure consistent model selection. Our numerical studies confirm the theoretical findings and illustrate an application of the new variable selection procedure. PMID:28008212

  11. Preliminary Estimation of Kappa Parameter in Croatia

    NASA Astrophysics Data System (ADS)

    Stanko, Davor; Markušić, Snježana; Ivančić, Ines; Mario, Gazdek; Gülerce, Zeynep

    2017-12-01

    Spectral parameter kappa κ is used to describe spectral amplitude decay “crash syndrome” at high frequencies. The purpose of this research is to estimate spectral parameter kappa for the first time in Croatia based on small and moderate earthquakes. Recordings of local earthquakes with magnitudes higher than 3, epicentre distances less than 150 km, and focal depths less than 30 km from seismological stations in Croatia are used. The value of kappa was estimated from the acceleration amplitude spectrum of shear waves from the slope of the high-frequency part where the spectrum starts to decay rapidly to a noise floor. Kappa models as a function of a site and distance were derived from a standard linear regression of kappa-distance dependence. Site kappa was determined from the extrapolation of the regression line to a zero distance. The preliminary results of site kappa across Croatia are promising. In this research, these results are compared with local site condition parameters for each station, e.g. shear wave velocity in the upper 30 m from geophysical measurements and with existing global shear wave velocity - site kappa values. Spatial distribution of individual kappa’s is compared with the azimuthal distribution of earthquake epicentres. These results are significant for a couple of reasons: to extend the knowledge of the attenuation of near-surface crust layers of the Dinarides and to provide additional information on the local earthquake parameters for updating seismic hazard maps of studied area. Site kappa can be used in the re-creation, and re-calibration of attenuation of peak horizontal and/or vertical acceleration in the Dinarides area since information on the local site conditions were not included in the previous studies.

  12. Robust inference under the beta regression model with application to health care studies.

    PubMed

    Ghosh, Abhik

    2017-01-01

    Data on rates, percentages, or proportions arise frequently in many different applied disciplines like medical biology, health care, psychology, and several others. In this paper, we develop a robust inference procedure for the beta regression model, which is used to describe such response variables taking values in (0, 1) through some related explanatory variables. In relation to the beta regression model, the issue of robustness has been largely ignored in the literature so far. The existing maximum likelihood-based inference has serious lack of robustness against outliers in data and generate drastically different (erroneous) inference in the presence of data contamination. Here, we develop the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model along with several applications. We derive their asymptotic properties and describe their robustness theoretically through the influence function analyses. Finite sample performances of the proposed estimators and tests are examined through suitable simulation studies and real data applications in the context of health care and psychology. Although we primarily focus on the beta regression models with a fixed dispersion parameter, some indications are also provided for extension to the variable dispersion beta regression models with an application.

  13. Construction of multiple linear regression models using blood biomarkers for selecting against abdominal fat traits in broilers.

    PubMed

    Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H

    2018-01-01

    Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content. and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P < 0.01), and correlated significantly with AF traits (P < 0.05). The best multiple linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.

  14. A new method for the production of social fragility functions and the result of its use in worldwide fatality loss estimation for earthquakes

    NASA Astrophysics Data System (ADS)

    Daniell, James; Wenzel, Friedemann

    2014-05-01

    A review of over 200 fatality models over the past 50 years for earthquake loss estimation from various authors has identified key parameters that influence fatality estimation in each of these models. These are often very specific and cannot be readily adapted globally. In the doctoral dissertation of the author, a new method is used for regression of fatalities to intensity using loss functions based not only on fatalities, but also using population models and other socioeconomic parameters created through time for every country worldwide for the period 1900-2013. A calibration of functions was undertaken from 1900-2008, and each individual quake analysed from 2009-2013 in real-time, in conjunction with www.earthquake-report.com. Using the CATDAT Damaging Earthquakes Database containing socioeconomic loss information for 7208 damaging earthquake events from 1900-2013 including disaggregation of secondary effects, fatality estimates for over 2035 events have been re-examined from 1900-2013. In addition, 99 of these events have detailed data for the individual cities and towns or have been reconstructed to create a death rate as a percentage of population. Many historical isoseismal maps and macroseismic intensity datapoint surveys collected globally, have been digitised and modelled covering around 1353 of these 2035 fatal events, to include an estimate of population, occupancy and socioeconomic climate at the time of the event at each intensity bracket. In addition, 1651 events without fatalities but causing damage have also been examined in this way. The production of socioeconomic and engineering indices such as HDI and building vulnerability has been undertaken on a country-level and state/province-level leading to a dataset allowing regressions not only using a static view of risk, but also allowing for the change in the socioeconomic climate between the earthquake events to be undertaken. This means that a year 1920 event in a country, will not simply be regressed against a year 2000 event, but normalised. A global human development index (HDI) (life expectancy, education and income) was developed and collected for the first time from 1900-2013 globally on a country and province level allowing for a very useful parameter in the regression. In addition, the occupancy rate from the time of day that the event occurred, as well as population density and individual earthquake attributes like the existence of a foreshock were also examined for the 3004 events in the regression analysis. Where an event has not occurred in a country previously, a regionalisation strategy based on building typologies, seismic code index, building practice, climate, earthquake history and socioeconomic climate is proposed. The result is a set of "social fragility functions" calculating fatalities for use in any country worldwide using the parameters of macroseismic intensity, population, HDI, time of day and occupancy, that provide a robust accurate method, which has not only been calibrated to country level data but to town and city data through time. The estimates will continue to be used in conjunction with Earthquake Report, a non-profit worldwide earthquake reporting website and has shown very promising results from 2010-2013 for rapid estimates of fatalities globally.

  15. Identification of internal properties of fibers and micro-swimmers

    NASA Astrophysics Data System (ADS)

    Plouraboue, Franck; Thiam, Ibrahima; Delmotte, Blaise; Climent, Eric; PSC Collaboration

    2016-11-01

    In this presentation we discuss the identifiability of constitutive parameters of passive or active micro-swimmers. We first present a general framework for describing fibers or micro-swimmers using a bead-model description. Using a kinematic constraint formulation to describe fibers, flagellum or cilia, we find explicit linear relationship between elastic constitutive parameters and generalised velocities from computing contact forces. This linear formulation then permits to address explicitly identifiability conditions and solve for parameter identification. We show that both active forcing and passive parameters are both identifiable independently but not simultaneously. We also provide unbiased estimators for elastic parameters as well as active ones in the presence of Langevin-like forcing with Gaussian noise using normal linear regression models and maximum likelihood method. These theoretical results are illustrated in various configurations of relaxed or actuated passives fibers, and active filament of known passive properties, showing the efficiency of the proposed approach for direct parameter identification. The convergence of the proposed estimators is successfully tested numerically.

  16. Pseudo and conditional score approach to joint analysis of current count and current status data.

    PubMed

    Wen, Chi-Chung; Chen, Yi-Hau

    2018-04-17

    We develop a joint analysis approach for recurrent and nonrecurrent event processes subject to case I interval censorship, which are also known in literature as current count and current status data, respectively. We use a shared frailty to link the recurrent and nonrecurrent event processes, while leaving the distribution of the frailty fully unspecified. Conditional on the frailty, the recurrent event is assumed to follow a nonhomogeneous Poisson process, and the mean function of the recurrent event and the survival function of the nonrecurrent event are assumed to follow some general form of semiparametric transformation models. Estimation of the models is based on the pseudo-likelihood and the conditional score techniques. The resulting estimators for the regression parameters and the unspecified baseline functions are shown to be consistent with rates of square and cubic roots of the sample size, respectively. Asymptotic normality with closed-form asymptotic variance is derived for the estimator of the regression parameters. We apply the proposed method to a fracture-osteoporosis survey data to identify risk factors jointly for fracture and osteoporosis in elders, while accounting for association between the two events within a subject. © 2018, The International Biometric Society.

  17. A Calibration to Predict the Concentrations of Impurities in Plutonium Oxide by Prompt Gamma Analysis Revision 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Narlesky, Joshua Edward; Kelly, Elizabeth J.

    2015-09-10

    This report documents the new PG calibration regression equation. These calibration equations incorporate new data that have become available since revision 1 of “A Calibration to Predict the Concentrations of Impurities in Plutonium Oxide by Prompt Gamma Analysis” was issued [3] The calibration equations are based on a weighted least squares (WLS) approach for the regression. The WLS method gives each data point its proper amount of influence over the parameter estimates. This gives two big advantages, more precise parameter estimates and better and more defensible estimates of uncertainties. The WLS approach makes sense both statistically and experimentally because themore » variances increase with concentration, and there are physical reasons that the higher measurements are less reliable and should be less influential. The new magnesium calibration includes a correction for sodium and separate calibration equation for items with and without chlorine. These additional calibration equations allow for better predictions and smaller uncertainties for sodium in materials with and without chlorine. Chlorine and sodium have separate equations for RICH materials. Again, these equations give better predictions and smaller uncertainties chlorine and sodium for RICH materials.« less

  18. Bayesian logistic regression approaches to predict incorrect DRG assignment.

    PubMed

    Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural

    2018-05-07

    Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification per- formance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already lead to improved audit efficiency in an operational capacity.

  19. Estimation of genetic parameters and breeding values across challenged environments to select for robust pigs.

    PubMed

    Herrero-Medrano, J M; Mathur, P K; ten Napel, J; Rashidi, H; Alexandri, P; Knol, E F; Mulder, H A

    2015-04-01

    Robustness is an important issue in the pig production industry. Since pigs from international breeding organizations have to withstand a variety of environmental challenges, selection of pigs with the inherent ability to sustain their productivity in diverse environments may be an economically feasible approach in the livestock industry. The objective of this study was to estimate genetic parameters and breeding values across different levels of environmental challenge load. The challenge load (CL) was estimated as the reduction in reproductive performance during different weeks of a year using 925,711 farrowing records from farms distributed worldwide. A wide range of levels of challenge, from favorable to unfavorable environments, was observed among farms with high CL values being associated with confirmed situations of unfavorable environment. Genetic parameters and breeding values were estimated in high- and low-challenge environments using a bivariate analysis, as well as across increasing levels of challenge with a random regression model using Legendre polynomials. Although heritability estimates of number of pigs born alive were slightly higher in environments with extreme CL than in those with intermediate levels of CL, the heritabilities of number of piglet losses increased progressively as CL increased. Genetic correlations among environments with different levels of CL suggest that selection in environments with extremes of low or high CL would result in low response to selection. Therefore, selection programs of breeding organizations that are commonly conducted under favorable environments could have low response to selection in commercial farms that have unfavorable environmental conditions. Sows that had experienced high levels of challenge at least once during their productive life were ranked according to their EBV. The selection of pigs using EBV ignoring environmental challenges or on the basis of records from only favorable environments resulted in a sharp decline in productivity as the level of challenge increased. In contrast, selection using the random regression approach resulted in limited change in productivity with increasing levels of challenge. Hence, we demonstrate that the use of a quantitative measure of environmental CL and a random regression approach can be comprehensively combined for genetic selection of pigs with enhanced ability to maintain high productivity in harsh environments.

  20. Numerical discretization-based estimation methods for ordinary differential equation models via penalized spline smoothing with applications in biomedical research.

    PubMed

    Wu, Hulin; Xue, Hongqi; Kumar, Arun

    2012-06-01

    Differential equations are extensively used for modeling dynamics of physical processes in many scientific fields such as engineering, physics, and biomedical sciences. Parameter estimation of differential equation models is a challenging problem because of high computational cost and high-dimensional parameter space. In this article, we propose a novel class of methods for estimating parameters in ordinary differential equation (ODE) models, which is motivated by HIV dynamics modeling. The new methods exploit the form of numerical discretization algorithms for an ODE solver to formulate estimating equations. First, a penalized-spline approach is employed to estimate the state variables and the estimated state variables are then plugged in a discretization formula of an ODE solver to obtain the ODE parameter estimates via a regression approach. We consider three different order of discretization methods, Euler's method, trapezoidal rule, and Runge-Kutta method. A higher-order numerical algorithm reduces numerical error in the approximation of the derivative, which produces a more accurate estimate, but its computational cost is higher. To balance the computational cost and estimation accuracy, we demonstrate, via simulation studies, that the trapezoidal discretization-based estimate is the best and is recommended for practical use. The asymptotic properties for the proposed numerical discretization-based estimators are established. Comparisons between the proposed methods and existing methods show a clear benefit of the proposed methods in regards to the trade-off between computational cost and estimation accuracy. We apply the proposed methods t an HIV study to further illustrate the usefulness of the proposed approaches. © 2012, The International Biometric Society.

  1. Multidimensional stochastic approximation using locally contractive functions

    NASA Technical Reports Server (NTRS)

    Lawton, W. M.

    1975-01-01

    A Robbins-Monro type multidimensional stochastic approximation algorithm which converges in mean square and with probability one to the fixed point of a locally contractive regression function is developed. The algorithm is applied to obtain maximum likelihood estimates of the parameters for a mixture of multivariate normal distributions.

  2. Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study

    PubMed Central

    Gascuel, Olivier

    2017-01-01

    Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987

  3. Random regression models using Legendre orthogonal polynomials to evaluate the milk production of Alpine goats.

    PubMed

    Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T

    2013-12-11

    The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test day milk yield. On the test day, we analyzed 20,710 records of milk yield of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models had combinations of distinct fitting orders for polynomials (2-5), random genetic (1-7), and permanent environmental (1-7) fixed curves and a number of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. A random regression model using the best Legendre orthogonal polynomial for genetic evaluation of milk yield on the test day of Alpine goats considered a fixed curve of order 4, curve of genetic additive effects of order 2, curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance because it was the most economical model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components in relation to the production peak and persistence. It is very important that the evaluation utilizes the best combination of fixed, genetic additive and permanent environmental regressions, and number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the estimates of parameters and prediction of genetic values.

  4. Inverse sampling regression for pooled data.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Eskridge, Kent; Crossa, José

    2017-06-01

    Because pools are tested instead of individuals in group testing, this technique is helpful for estimating prevalence in a population or for classifying a large number of individuals into two groups at a low cost. For this reason, group testing is a well-known means of saving costs and producing precise estimates. In this paper, we developed a mixed-effect group testing regression that is useful when the data-collecting process is performed using inverse sampling. This model allows including covariate information at the individual level to incorporate heterogeneity among individuals and identify which covariates are associated with positive individuals. We present an approach to fit this model using maximum likelihood and we performed a simulation study to evaluate the quality of the estimates. Based on the simulation study, we found that the proposed regression method for inverse sampling with group testing produces parameter estimates with low bias when the pre-specified number of positive pools (r) to stop the sampling process is at least 10 and the number of clusters in the sample is also at least 10. We performed an application with real data and we provide an NLMIXED code that researchers can use to implement this method.

  5. Application of Multi-task Lasso Regression in the Parametrization of Stellar Spectra

    NASA Astrophysics Data System (ADS)

    Chang, Li-Na; Zhang, Pei-Ai

    2015-07-01

    The multi-task learning approaches have attracted the increasing attention in the fields of machine learning, computer vision, and artificial intelligence. By utilizing the correlations in tasks, learning multiple related tasks simultaneously is better than learning each task independently. An efficient multi-task Lasso (Least Absolute Shrinkage Selection and Operator) regression algorithm is proposed in this paper to estimate the physical parameters of stellar spectra. It not only can obtain the information about the common features of the different physical parameters, but also can preserve effectively their own peculiar features. Experiments were done based on the ELODIE synthetic spectral data simulated with the stellar atmospheric model, and on the SDSS data released by the American large-scale survey Sloan. The estimation precision of our model is better than those of the methods in the related literature, especially for the estimates of the gravitational acceleration (lg g) and the chemical abundance ([Fe/H]). In the experiments we changed the spectral resolution, and applied the noises with different signal-to-noise ratios (SNRs) to the spectral data, so as to illustrate the stability of the model. The results show that the model is influenced by both the resolution and the noise. But the influence of the noise is larger than that of the resolution. In general, the multi-task Lasso regression algorithm is easy to operate, it has a strong stability, and can also improve the overall prediction accuracy of the model.

  6. [Detecting the moisture content of forest surface soil based on the microwave remote sensing technology.

    PubMed

    Li, Ming Ze; Gao, Yuan Ke; Di, Xue Ying; Fan, Wen Yi

    2016-03-01

    The moisture content of forest surface soil is an important parameter in forest ecosystems. It is practically significant for forest ecosystem related research to use microwave remote sensing technology for rapid and accurate estimation of the moisture content of forest surface soil. With the aid of TDR-300 soil moisture content measuring instrument, the moisture contents of forest surface soils of 120 sample plots at Tahe Forestry Bureau of Daxing'anling region in Heilongjiang Province were measured. Taking the moisture content of forest surface soil as the dependent variable and the polarization decomposition parameters of C band Quad-pol SAR data as independent variables, two types of quantitative estimation models (multilinear regression model and BP-neural network model) for predicting moisture content of forest surface soils were developed. The spatial distribution of moisture content of forest surface soil on the regional scale was then derived with model inversion. Results showed that the model precision was 86.0% and 89.4% with RMSE of 3.0% and 2.7% for the multilinear regression model and the BP-neural network model, respectively. It indicated that the BP-neural network model had a better performance than the multilinear regression model in quantitative estimation of the moisture content of forest surface soil. The spatial distribution of forest surface soil moisture content in the study area was then obtained by using the BP neural network model simulation with the Quad-pol SAR data.

  7. Adaptive and Personalized Plasma Insulin Concentration Estimation for Artificial Pancreas Systems.

    PubMed

    Hajizadeh, Iman; Rashid, Mudassir; Samadi, Sediqeh; Feng, Jianyuan; Sevil, Mert; Hobbs, Nicole; Lazaro, Caterina; Maloney, Zacharie; Brandt, Rachel; Yu, Xia; Turksoy, Kamuran; Littlejohn, Elizabeth; Cengiz, Eda; Cinar, Ali

    2018-05-01

    The artificial pancreas (AP) system, a technology that automatically administers exogenous insulin in people with type 1 diabetes mellitus (T1DM) to regulate their blood glucose concentrations, necessitates the estimation of the amount of active insulin already present in the body to avoid overdosing. An adaptive and personalized plasma insulin concentration (PIC) estimator is designed in this work to accurately quantify the insulin present in the bloodstream. The proposed PIC estimation approach incorporates Hovorka's glucose-insulin model with the unscented Kalman filtering algorithm. Methods for the personalized initialization of the time-varying model parameters to individual patients for improved estimator convergence are developed. Data from 20 three-days-long closed-loop clinical experiments conducted involving subjects with T1DM are used to evaluate the proposed PIC estimation approach. The proposed methods are applied to the clinical data containing significant disturbances, such as unannounced meals and exercise, and the results demonstrate the accurate real-time estimation of the PIC with the root mean square error of 7.15 and 9.25 mU/L for the optimization-based fitted parameters and partial least squares regression-based testing parameters, respectively. The accurate real-time estimation of PIC will benefit the AP systems by preventing overdelivery of insulin when significant insulin is present in the bloodstream.

  8. Development of property-transfer models for estimating the hydraulic properties of deep sediments at the Idaho National Engineering and Environmental Laboratory, Idaho

    USGS Publications Warehouse

    Winfield, Kari A.

    2005-01-01

    Because characterizing the unsaturated hydraulic properties of sediments over large areas or depths is costly and time consuming, development of models that predict these properties from more easily measured bulk-physical properties is desirable. At the Idaho National Engineering and Environmental Laboratory, the unsaturated zone is composed of thick basalt flow sequences interbedded with thinner sedimentary layers. Determining the unsaturated hydraulic properties of sedimentary layers is one step in understanding water flow and solute transport processes through this complex unsaturated system. Multiple linear regression was used to construct simple property-transfer models for estimating the water-retention curve and saturated hydraulic conductivity of deep sediments at the Idaho National Engineering and Environmental Laboratory. The regression models were developed from 109 core sample subsets with laboratory measurements of hydraulic and bulk-physical properties. The core samples were collected at depths of 9 to 175 meters at two facilities within the southwestern portion of the Idaho National Engineering and Environmental Laboratory-the Radioactive Waste Management Complex, and the Vadose Zone Research Park southwest of the Idaho Nuclear Technology and Engineering Center. Four regression models were developed using bulk-physical property measurements (bulk density, particle density, and particle size) as the potential explanatory variables. Three representations of the particle-size distribution were compared: (1) textural-class percentages (gravel, sand, silt, and clay), (2) geometric statistics (mean and standard deviation), and (3) graphical statistics (median and uniformity coefficient). The four response variables, estimated from linear combinations of the bulk-physical properties, included saturated hydraulic conductivity and three parameters that define the water-retention curve. For each core sample,values of each water-retention parameter were estimated from the appropriate regression equation and used to calculate an estimated water-retention curve. The degree to which the estimated curve approximated the measured curve was quantified using a goodness-of-fit indicator, the root-mean-square error. Comparison of the root-mean-square-error distributions for each alternative particle-size model showed that the estimated water-retention curves were insensitive to the way the particle-size distribution was represented. Bulk density, the median particle diameter, and the uniformity coefficient were chosen as input parameters for the final models. The property-transfer models developed in this study allow easy determination of hydraulic properties without need for their direct measurement. Additionally, the models provide the basis for development of theoretical models that rely on physical relationships between the pore-size distribution and the bulk-physical properties of the media. With this adaptation, the property-transfer models should have greater application throughout the Idaho National Engineering and Environmental Laboratory and other geographic locations.

  9. A Functional Varying-Coefficient Single-Index Model for Functional Response Data

    PubMed Central

    Li, Jialiang; Huang, Chao; Zhu, Hongtu

    2016-01-01

    Motivated by the analysis of imaging data, we propose a novel functional varying-coefficient single index model (FVCSIM) to carry out the regression analysis of functional response data on a set of covariates of interest. FVCSIM represents a new extension of varying-coefficient single index models for scalar responses collected from cross-sectional and longitudinal studies. An efficient estimation procedure is developed to iteratively estimate varying coefficient functions, link functions, index parameter vectors, and the covariance function of individual functions. We systematically examine the asymptotic properties of all estimators including the weak convergence of the estimated varying coefficient functions, the asymptotic distribution of the estimated index parameter vectors, and the uniform convergence rate of the estimated covariance function and their spectrum. Simulation studies are carried out to assess the finite-sample performance of the proposed procedure. We apply FVCSIM to investigating the development of white matter diffusivities along the corpus callosum skeleton obtained from Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. PMID:29200540

  10. A Functional Varying-Coefficient Single-Index Model for Functional Response Data.

    PubMed

    Li, Jialiang; Huang, Chao; Zhu, Hongtu

    2017-01-01

    Motivated by the analysis of imaging data, we propose a novel functional varying-coefficient single index model (FVCSIM) to carry out the regression analysis of functional response data on a set of covariates of interest. FVCSIM represents a new extension of varying-coefficient single index models for scalar responses collected from cross-sectional and longitudinal studies. An efficient estimation procedure is developed to iteratively estimate varying coefficient functions, link functions, index parameter vectors, and the covariance function of individual functions. We systematically examine the asymptotic properties of all estimators including the weak convergence of the estimated varying coefficient functions, the asymptotic distribution of the estimated index parameter vectors, and the uniform convergence rate of the estimated covariance function and their spectrum. Simulation studies are carried out to assess the finite-sample performance of the proposed procedure. We apply FVCSIM to investigating the development of white matter diffusivities along the corpus callosum skeleton obtained from Alzheimer's Disease Neuroimaging Initiative (ADNI) study.

  11. Estimating overall exposure effects for the clustered and censored outcome using random effect Tobit regression models.

    PubMed

    Wang, Wei; Griswold, Michael E

    2016-11-30

    The random effect Tobit model is a regression model that accommodates both left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference of overall exposure effects on the original outcome scale. Marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for the clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal space and boundary components of the censored response to estimate overall exposure effects at population level. We also extend the 'Average Predicted Value' method to estimate the model-predicted marginal means for each person under different exposure status in a designated reference group by integrating over the random effects and then use the calculated difference to assess the overall exposure effect. The maximum likelihood estimation is proposed utilizing a quasi-Newton optimization algorithm with Gauss-Hermite quadrature to approximate the integration of the random effects. We use these methods to carefully analyze two real datasets. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  12. Auto Regressive Moving Average (ARMA) Modeling Method for Gyro Random Noise Using a Robust Kalman Filter

    PubMed Central

    Huang, Lei

    2015-01-01

    To solve the problem in which the conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using a robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to achieve the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of a rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409

  13. Fast estimation of diffusion tensors under Rician noise by the EM algorithm.

    PubMed

    Liu, Jia; Gasbarra, Dario; Railavo, Juha

    2016-01-15

    Diffusion tensor imaging (DTI) is widely used to characterize, in vivo, the white matter of the central nerve system (CNS). This biological tissue contains much anatomic, structural and orientational information of fibers in human brain. Spectral data from the displacement distribution of water molecules located in the brain tissue are collected by a magnetic resonance scanner and acquired in the Fourier domain. After the Fourier inversion, the noise distribution is Gaussian in both real and imaginary parts and, as a consequence, the recorded magnitude data are corrupted by Rician noise. Statistical estimation of diffusion leads a non-linear regression problem. In this paper, we present a fast computational method for maximum likelihood estimation (MLE) of diffusivities under the Rician noise model based on the expectation maximization (EM) algorithm. By using data augmentation, we are able to transform a non-linear regression problem into the generalized linear modeling framework, reducing dramatically the computational cost. The Fisher-scoring method is used for achieving fast convergence of the tensor parameter. The new method is implemented and applied using both synthetic and real data in a wide range of b-amplitudes up to 14,000s/mm(2). Higher accuracy and precision of the Rician estimates are achieved compared with other log-normal based methods. In addition, we extend the maximum likelihood (ML) framework to the maximum a posteriori (MAP) estimation in DTI under the aforementioned scheme by specifying the priors. We will describe how close numerically are the estimators of model parameters obtained through MLE and MAP estimation. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Bayesian LASSO, scale space and decision making in association genetics.

    PubMed

    Pasanen, Leena; Holmström, Lasse; Sillanpää, Mikko J

    2015-01-01

    LASSO is a penalized regression method that facilitates model fitting in situations where there are as many, or even more explanatory variables than observations, and only a few variables are relevant in explaining the data. We focus on the Bayesian version of LASSO and consider four problems that need special attention: (i) controlling false positives, (ii) multiple comparisons, (iii) collinearity among explanatory variables, and (iv) the choice of the tuning parameter that controls the amount of shrinkage and the sparsity of the estimates. The particular application considered is association genetics, where LASSO regression can be used to find links between chromosome locations and phenotypic traits in a biological organism. However, the proposed techniques are relevant also in other contexts where LASSO is used for variable selection. We separate the true associations from false positives using the posterior distribution of the effects (regression coefficients) provided by Bayesian LASSO. We propose to solve the multiple comparisons problem by using simultaneous inference based on the joint posterior distribution of the effects. Bayesian LASSO also tends to distribute an effect among collinear variables, making detection of an association difficult. We propose to solve this problem by considering not only individual effects but also their functionals (i.e. sums and differences). Finally, whereas in Bayesian LASSO the tuning parameter is often regarded as a random variable, we adopt a scale space view and consider a whole range of fixed tuning parameters, instead. The effect estimates and the associated inference are considered for all tuning parameters in the selected range and the results are visualized with color maps that provide useful insights into data and the association problem considered. The methods are illustrated using two sets of artificial data and one real data set, all representing typical settings in association genetics.

  15. Urinary inorganic arsenic concentrations and semen quality of male partners of subfertile couples in Tokyo.

    PubMed

    Oguri, Tomoko; Yoshinaga, Jun; Toshima, Hiroki; Mizumoto, Yoshifumi; Hatakeyama, Shota; Tokuoka, Susumu

    2016-01-01

    Inorganic arsenic (iAs) has been known as a testicular toxicant in experimental rodents. Possible association between iAs exposure and semen quality (semen volume, sperm concentration, and sperm motility) was explored in male partners of couples (n = 42) who visited a gynecology clinic in Tokyo for infertility consultation. Semen parameters were measured according to WHO guideline at the clinic, and urinary iAs and methylarsonic acid (MMA), and dimethylarsinic acid concentrations were determined by liquid chromatography-hydride generation-ICP mass spectrometry. Biological attributes, dietary habits, and exposure levels to other chemicals with known effects on semen parameters were taken into consideration as covariates. Multiple regression analyses and logistic regression analyses did not find iAs exposure as significant contributor to semen parameters. Lower exposure level of subjects (estimated to be 0.5 μg kg(-1) day(-1)) was considered a reason of the absence of adverse effects on semen parameters, which were seen in rodents dosed with 4-7.5 mg kg(-1).

  16. Noninvasive diagnostics of skin microphysical parameters based on spatially resolved diffuse reflectance spectroscopy

    NASA Astrophysics Data System (ADS)

    Lisenko, S. A.; Kugeiko, M. M.

    2013-01-01

    The ability to determine noninvasively microphysical parameters (MPPs) of skin characteristic of malignant melanoma was demonstrated. The MPPs were the melanin content in dermis, saturation of tissue with blood vessels, and concentration and effective size of tissue scatterers. The proposed method was based on spatially resolved spectral measurements of skin diffuse reflectance and multiple regressions between linearly independent measurement components and skin MPPs. The regressions were established by modeling radiation transfer in skin with a wide variation of its MPPs. Errors in the determination of skin MPPs were estimated using fiber-optic measurements of its diffuse reflectance at wavelengths of commercially available semiconductor diode lasers (578, 625, 660, 760, and 806 nm) at source-detector separations of 0.23-1.38 mm.

  17. Quantitative analysis of aircraft multispectral-scanner data and mapping of water-quality parameters in the James River in Virginia

    NASA Technical Reports Server (NTRS)

    Johnson, R. W.; Bahn, G. S.

    1977-01-01

    Statistical analysis techniques were applied to develop quantitative relationships between in situ river measurements and the remotely sensed data that were obtained over the James River in Virginia on 28 May 1974. The remotely sensed data were collected with a multispectral scanner and with photographs taken from an aircraft platform. Concentration differences among water quality parameters such as suspended sediment, chlorophyll a, and nutrients indicated significant spectral variations. Calibrated equations from the multiple regression analysis were used to develop maps that indicated the quantitative distributions of water quality parameters and the dispersion characteristics of a pollutant plume entering the turbid river system. Results from further analyses that use only three preselected multispectral scanner bands of data indicated that regression coefficients and standard errors of estimate were not appreciably degraded compared with results from the 10-band analysis.

  18. Scale of association: hierarchical linear models and the measurement of ecological systems

    Treesearch

    Sean M. McMahon; Jeffrey M. Diez

    2007-01-01

    A fundamental challenge to understanding patterns in ecological systems lies in employing methods that can analyse, test and draw inference from measured associations between variables across scales. Hierarchical linear models (HLM) use advanced estimation algorithms to measure regression relationships and variance-covariance parameters in hierarchically structured...

  19. A provisional effective evaluation when errors are present in independent variables

    NASA Technical Reports Server (NTRS)

    Gurin, L. S.

    1983-01-01

    Algorithms are examined for evaluating the parameters of a regression model when there are errors in the independent variables. The algorithms are fast and the estimates they yield are stable with respect to the correlation of errors and measurements of both the dependent variable and the independent variables.

  20. System identification principles in studies of forest dynamics.

    Treesearch

    Rolfe A. Leary

    1970-01-01

    Shows how it is possible to obtain governing equation parameter estimates on the basis of observed system states. The approach used represents a constructive alternative to regression techniques for models expressed as differential equations. This approach allows scientists to more completely quantify knowledge of forest development processes, to express theories in...

  1. Individual tree growth models for natural even-aged shortleaf pine

    Treesearch

    Chakra B. Budhathoki; Thomas B. Lynch; James M. Guldin

    2006-01-01

    Shortleaf pine (Pinus echinata Mill.) measurements were available from permanent plots established in even-aged stands of the Ouachita Mountains for studying growth. Annual basal area growth was modeled with a least-squares nonlinear regression method utilizing three measurements. The analysis showed that the parameter estimates were in agreement...

  2. Use of random regression to estimate genetic parameters of temperament across an age continuum in a crossbred cattle population.

    PubMed

    Littlejohn, B P; Riley, D G; Welsh, T H; Randel, R D; Willard, S T; Vann, R C

    2018-05-12

    The objective was to estimate genetic parameters of temperament in beef cattle across an age continuum. The population consisted predominantly of Brahman-British crossbred cattle. Temperament was quantified by: 1) pen score (PS), the reaction of a calf to a single experienced evaluator on a scale of 1 to 5 (1 = calm, 5 = excitable); 2) exit velocity (EV), the rate (m/sec) at which a calf traveled 1.83 m upon exiting a squeeze chute; and 3) temperament score (TS), the numerical average of PS and EV. Covariates included days of age and proportion of Bos indicus in the calf and dam. Random regression models included the fixed effects determined from the repeated measures models, except for calf age. Likelihood ratio tests were used to determine the most appropriate random structures. In repeated measures models, the proportion of Bos indicus in the calf was positively related with each calf temperament trait (0.41 ± 0.20, 0.85 ± 0.21, and 0.57 ± 0.18 for PS, EV, and TS, respectively; P < 0.01). There was an effect of contemporary group (combinations of season, year of birth, and management group) and dam age (P < 0.001) in all models. From repeated records analyses, estimates of heritability (h2) were 0.34 ± 0.04, 0.31 ± 0.04, and 0.39 ± 0.04, while estimates of permanent environmental variance as a proportion of the phenotypic variance (c2) were 0.30 ± 0.04, 0.31 ± 0.03, and 0.34 ± 0.04 for PS, EV, and TS, respectively. Quadratic additive genetic random regressions on Legendre polynomials of age were significant for all traits. Quadratic permanent environmental random regressions were significant for PS and TS, but linear permanent environmental random regressions were significant for EV. Random regression results suggested that these components change across the age dimension of these data. There appeared to be an increasing influence of permanent environmental effects and decreasing influence of additive genetic effects corresponding to increasing calf age for EV, and to a lesser extent for TS. Inherited temperament may be overcome by accumulating environmental stimuli with increases in age, especially after weaning.

  3. Regression analysis of case K interval-censored failure time data in the presence of informative censoring.

    PubMed

    Wang, Peijie; Zhao, Hui; Sun, Jianguo

    2016-12-01

    Interval-censored failure time data occur in many fields such as demography, economics, medical research, and reliability and many inference procedures on them have been developed (Sun, 2006; Chen, Sun, and Peace, 2012). However, most of the existing approaches assume that the mechanism that yields interval censoring is independent of the failure time of interest and it is clear that this may not be true in practice (Zhang et al., 2007; Ma, Hu, and Sun, 2015). In this article, we consider regression analysis of case K interval-censored failure time data when the censoring mechanism may be related to the failure time of interest. For the problem, an estimated sieve maximum-likelihood approach is proposed for the data arising from the proportional hazards frailty model and for estimation, a two-step procedure is presented. In the addition, the asymptotic properties of the proposed estimators of regression parameters are established and an extensive simulation study suggests that the method works well. Finally, we apply the method to a set of real interval-censored data that motivated this study. © 2016, The International Biometric Society.

  4. Image interpolation via regularized local linear regression.

    PubMed

    Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang

    2011-12-01

    The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting with the linear regression model where we replace the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l(2)-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE

  5. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    PubMed

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  6. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    PubMed Central

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  7. [Stature estimation for Sichuan Han nationality female based on X-ray technology with measurement of lumbar vertebrae].

    PubMed

    Qing, Si-han; Chang, Yun-feng; Dong, Xiao-ai; Li, Yuan; Chen, Xiao-gang; Shu, Yong-kang; Deng, Zhen-hua

    2013-10-01

    To establish the mathematical models of stature estimation for Sichuan Han female with measurement of lumbar vertebrae by X-ray to provide essential data for forensic anthropology research. The samples, 206 Sichuan Han females, were divided into three groups including group A, B and C according to the ages. Group A (206 samples) consisted of all ages, group B (116 samples) were 20-45 years old and 90 samples over 45 years old were group C. All the samples were examined lumbar vertebrae through CR technology, including the parameters of five centrums (L1-L5) as anterior border, posterior border and central heights (x1-x15), total central height of lumbar spine (x16), and the real height of every sample. The linear regression analysis was produced using the parameters to establish the mathematical models of stature estimation. Sixty-two trained subjects were tested to verify the accuracy of the mathematical models. The established mathematical models by hypothesis test of linear regression equation model were statistically significant (P<0.05). The standard errors of the equation were 2.982-5.004 cm, while correlation coefficients were 0.370-0.779 and multiple correlation coefficients were 0.533-0.834. The return tests of the highest correlation coefficient and multiple correlation coefficient of each group showed that the highest accuracy of the multiple regression equation, y = 100.33 + 1.489 x3 - 0.548 x6 + 0.772 x9 + 0.058 x12 + 0.645 x15, in group A were 80.6% (+/- lSE) and 100% (+/- 2SE). The established mathematical models in this study could be applied for the stature estimation for Sichuan Han females.

  8. A CNN Regression Approach for Real-Time 2D/3D Registration.

    PubMed

    Shun Miao; Wang, Z Jane; Rui Liao

    2016-05-01

    In this paper, we present a Convolutional Neural Network (CNN) regression approach to address the two major limitations of existing intensity-based 2-D/3-D registration technology: 1) slow computation and 2) small capture range. Different from optimization-based methods, which iteratively optimize the transformation parameters over a scalar-valued metric function representing the quality of the registration, the proposed method exploits the information embedded in the appearances of the digitally reconstructed radiograph and X-ray images, and employs CNN regressors to directly estimate the transformation parameters. An automatic feature extraction step is introduced to calculate 3-D pose-indexed features that are sensitive to the variables to be regressed while robust to other factors. The CNN regressors are then trained for local zones and applied in a hierarchical manner to break down the complex regression task into multiple simpler sub-tasks that can be learned separately. Weight sharing is furthermore employed in the CNN regression model to reduce the memory footprint. The proposed approach has been quantitatively evaluated on 3 potential clinical applications, demonstrating its significant advantage in providing highly accurate real-time 2-D/3-D registration with a significantly enlarged capture range when compared to intensity-based methods.

  9. A Two-Stage Method to Determine Optimal Product Sampling considering Dynamic Potential Market

    PubMed Central

    Hu, Zhineng; Lu, Wei; Han, Bing

    2015-01-01

    This paper develops an optimization model for the diffusion effects of free samples under dynamic changes in potential market based on the characteristics of independent product and presents a two-stage method to figure out the sampling level. The impact analysis of the key factors on the sampling level shows that the increase of the external coefficient or internal coefficient has a negative influence on the sampling level. And the changing rate of the potential market has no significant influence on the sampling level whereas the repeat purchase has a positive one. Using logistic analysis and regression analysis, the global sensitivity analysis gives a whole analysis of the interaction of all parameters, which provides a two-stage method to estimate the impact of the relevant parameters in the case of inaccuracy of the parameters and to be able to construct a 95% confidence interval for the predicted sampling level. Finally, the paper provides the operational steps to improve the accuracy of the parameter estimation and an innovational way to estimate the sampling level. PMID:25821847

  10. Inverse modelling for real-time estimation of radiological consequences in the early stage of an accidental radioactivity release.

    PubMed

    Pecha, Petr; Šmídl, Václav

    2016-11-01

    A stepwise sequential assimilation algorithm is proposed based on an optimisation approach for recursive parameter estimation and tracking of radioactive plume propagation in the early stage of a radiation accident. Predictions of the radiological situation in each time step of the plume propagation are driven by an existing short-term meteorological forecast and the assimilation procedure manipulates the model parameters to match the observations incoming concurrently from the terrain. Mathematically, the task is a typical ill-posed inverse problem of estimating the parameters of the release. The proposed method is designated as a stepwise re-estimation of the source term release dynamics and an improvement of several input model parameters. It results in a more precise determination of the adversely affected areas in the terrain. The nonlinear least-squares regression methodology is applied for estimation of the unknowns. The fast and adequately accurate segmented Gaussian plume model (SGPM) is used in the first stage of direct (forward) modelling. The subsequent inverse procedure infers (re-estimates) the values of important model parameters from the actual observations. Accuracy and sensitivity of the proposed method for real-time forecasting of the accident propagation is studied. First, a twin experiment generating noiseless simulated "artificial" observations is studied to verify the minimisation algorithm. Second, the impact of the measurement noise on the re-estimated source release rate is examined. In addition, the presented method can be used as a proposal for more advanced statistical techniques using, e.g., importance sampling. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. On the Asymptotic Relative Efficiency of Planned Missingness Designs.

    PubMed

    Rhemtulla, Mijke; Savalei, Victoria; Little, Todd D

    2016-03-01

    In planned missingness (PM) designs, certain data are set a priori to be missing. PM designs can increase validity and reduce cost; however, little is known about the loss of efficiency that accompanies these designs. The present paper compares PM designs to reduced sample (RN) designs that have the same total number of data points concentrated in fewer participants. In 4 studies, we consider models for both observed and latent variables, designs that do or do not include an "X set" of variables with complete data, and a full range of between- and within-set correlation values. All results are obtained using asymptotic relative efficiency formulas, and thus no data are generated; this novel approach allows us to examine whether PM designs have theoretical advantages over RN designs removing the impact of sampling error. Our primary findings are that (a) in manifest variable regression models, estimates of regression coefficients have much lower relative efficiency in PM designs as compared to RN designs, (b) relative efficiency of factor correlation or latent regression coefficient estimates is maximized when the indicators of each latent variable come from different sets, and (c) the addition of an X set improves efficiency in manifest variable regression models only for the parameters that directly involve the X-set variables, but it substantially improves efficiency of most parameters in latent variable models. We conclude that PM designs can be beneficial when the model of interest is a latent variable model; recommendations are made for how to optimize such a design.

  12. Estimation of Unsteady Aerodynamic Models from Dynamic Wind Tunnel Data

    NASA Technical Reports Server (NTRS)

    Murphy, Patrick; Klein, Vladislav

    2011-01-01

    Demanding aerodynamic modelling requirements for military and civilian aircraft have motivated researchers to improve computational and experimental techniques and to pursue closer collaboration in these areas. Model identification and validation techniques are key components for this research. This paper presents mathematical model structures and identification techniques that have been used successfully to model more general aerodynamic behaviours in single-degree-of-freedom dynamic testing. Model parameters, characterizing aerodynamic properties, are estimated using linear and nonlinear regression methods in both time and frequency domains. Steps in identification including model structure determination, parameter estimation, and model validation, are addressed in this paper with examples using data from one-degree-of-freedom dynamic wind tunnel and water tunnel experiments. These techniques offer a methodology for expanding the utility of computational methods in application to flight dynamics, stability, and control problems. Since flight test is not always an option for early model validation, time history comparisons are commonly made between computational and experimental results and model adequacy is inferred by corroborating results. An extension is offered to this conventional approach where more general model parameter estimates and their standard errors are compared.

  13. The area-time-integral technique to estimate convective rain volumes over areas applied to satellite data - A preliminary investigation

    NASA Technical Reports Server (NTRS)

    Doneaud, Andre A.; Miller, James R., Jr.; Johnson, L. Ronald; Vonder Haar, Thomas H.; Laybe, Patrick

    1987-01-01

    The use of the area-time-integral (ATI) technique, based only on satellite data, to estimate convective rain volume over a moving target is examined. The technique is based on the correlation between the radar echo area coverage integrated over the lifetime of the storm and the radar estimated rain volume. The processing of the GOES and radar data collected in 1981 is described. The radar and satellite parameters for six convective clusters from storm events occurring on June 12 and July 2, 1981 are analyzed and compared in terms of time steps and cluster lifetimes. Rain volume is calculated by first using the regression analysis to generate the regression equation used to obtain the ATI; the ATI versus rain volume relation is then employed to compute rain volume. The data reveal that the ATI technique using satellite data is applicable to the calculation of rain volume.

  14. Deep learning ensemble with asymptotic techniques for oscillometric blood pressure estimation.

    PubMed

    Lee, Soojeong; Chang, Joon-Hyuk

    2017-11-01

    This paper proposes a deep learning based ensemble regression estimator with asymptotic techniques, and offers a method that can decrease uncertainty for oscillometric blood pressure (BP) measurements using the bootstrap and Monte-Carlo approach. While the former is used to estimate SBP and DBP, the latter attempts to determine confidence intervals (CIs) for SBP and DBP based on oscillometric BP measurements. This work originally employs deep belief networks (DBN)-deep neural networks (DNN) to effectively estimate BPs based on oscillometric measurements. However, there are some inherent problems with these methods. First, it is not easy to determine the best DBN-DNN estimator, and worthy information might be omitted when selecting one DBN-DNN estimator and discarding the others. Additionally, our input feature vectors, obtained from only five measurements per subject, represent a very small sample size; this is a critical weakness when using the DBN-DNN technique and can cause overfitting or underfitting, depending on the structure of the algorithm. To address these problems, an ensemble with an asymptotic approach (based on combining the bootstrap with the DBN-DNN technique) is utilized to generate the pseudo features needed to estimate the SBP and DBP. In the first stage, the bootstrap-aggregation technique is used to create ensemble parameters. Afterward, the AdaBoost approach is employed for the second-stage SBP and DBP estimation. We then use the bootstrap and Monte-Carlo techniques in order to determine the CIs based on the target BP estimated using the DBN-DNN ensemble regression estimator with the asymptotic technique in the third stage. The proposed method can mitigate the estimation uncertainty such as large the standard deviation of error (SDE) on comparing the proposed DBN-DNN ensemble regression estimator with the DBN-DNN single regression estimator, we identify that the SDEs of the SBP and DBP are reduced by 0.58 and 0.57  mmHg, respectively. These indicate that the proposed method actually enhances the performance by 9.18% and 10.88% compared with the DBN-DNN single estimator. The proposed methodology improves the accuracy of BP estimation and reduces the uncertainty for BP estimation. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. A general approach to double-moment normalization of drop size distributions

    NASA Astrophysics Data System (ADS)

    Lee, G. W.; Sempere-Torres, D.; Uijlenhoet, R.; Zawadzki, I.

    2003-04-01

    Normalization of drop size distributions (DSDs) is re-examined here. First, we present an extension of scaling normalization using one moment of the DSD as a parameter (as introduced by Sempere-Torres et al, 1994) to a scaling normalization using two moments as parameters of the normalization. It is shown that the normalization of Testud et al. (2001) is a particular case of the two-moment scaling normalization. Thus, a unified vision of the question of DSDs normalization and a good model representation of DSDs is given. Data analysis shows that from the point of view of moment estimation least square regression is slightly more effective than moment estimation from the normalized average DSD.

  16. The comparison of robust partial least squares regression with robust principal component regression on a real

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increase of the variance of these parameters. Hence, in case of multicollinearity presents, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are then performed. SIMPLS algorithm is the leading PLSR algorithm because of its speed, efficiency and results are easier to interpret. However, both of the CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) have been presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, firstly, a robust Principal Component Analysis (PCA) method for high-dimensional data on the independent variables is applied, then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of RPCR and RSIMPLS methods on an econometric data set, hence, making a comparison of two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and Robust Component Selection (RCS) statistic.

  17. Characterizing the performance of the Conway-Maxwell Poisson generalized linear model.

    PubMed

    Francis, Royce A; Geedipally, Srinivas Reddy; Guikema, Seth D; Dhavala, Soma Sekhar; Lord, Dominique; LaRocca, Sarah

    2012-01-01

    Count data are pervasive in many areas of risk analysis; deaths, adverse health outcomes, infrastructure system failures, and traffic accidents are all recorded as count events, for example. Risk analysts often wish to estimate the probability distribution for the number of discrete events as part of doing a risk assessment. Traditional count data regression models of the type often used in risk assessment for this problem suffer from limitations due to the assumed variance structure. A more flexible model based on the Conway-Maxwell Poisson (COM-Poisson) distribution was recently proposed, a model that has the potential to overcome the limitations of the traditional model. However, the statistical performance of this new model has not yet been fully characterized. This article assesses the performance of a maximum likelihood estimation method for fitting the COM-Poisson generalized linear model (GLM). The objectives of this article are to (1) characterize the parameter estimation accuracy of the MLE implementation of the COM-Poisson GLM, and (2) estimate the prediction accuracy of the COM-Poisson GLM using simulated data sets. The results of the study indicate that the COM-Poisson GLM is flexible enough to model under-, equi-, and overdispersed data sets with different sample mean values. The results also show that the COM-Poisson GLM yields accurate parameter estimates. The COM-Poisson GLM provides a promising and flexible approach for performing count data regression. © 2011 Society for Risk Analysis.

  18. Evaluation of multivariate linear regression and artificial neural networks in prediction of water quality parameters

    PubMed Central

    2014-01-01

    This paper examined the efficiency of multivariate linear regression (MLR) and artificial neural network (ANN) models in prediction of two major water quality parameters in a wastewater treatment plant. Biochemical oxygen demand (BOD) and chemical oxygen demand (COD) as well as indirect indicators of organic matters are representative parameters for sewer water quality. Performance of the ANN models was evaluated using coefficient of correlation (r), root mean square error (RMSE) and bias values. The computed values of BOD and COD by model, ANN method and regression analysis were in close agreement with their respective measured values. Results showed that the ANN performance model was better than the MLR model. Comparative indices of the optimized ANN with input values of temperature (T), pH, total suspended solid (TSS) and total suspended (TS) for prediction of BOD was RMSE = 25.1 mg/L, r = 0.83 and for prediction of COD was RMSE = 49.4 mg/L, r = 0.81. It was found that the ANN model could be employed successfully in estimating the BOD and COD in the inlet of wastewater biochemical treatment plants. Moreover, sensitive examination results showed that pH parameter have more effect on BOD and COD predicting to another parameters. Also, both implemented models have predicted BOD better than COD. PMID:24456676

  19. Techniques for estimating magnitude and frequency of floods in Minnesota

    USGS Publications Warehouse

    Guetzkow, Lowell C.

    1977-01-01

     Estimating relations have been developed to provide engineers and designers with improved techniques for defining flow-frequency characteristics to satisfy hydraulic planning and design requirements. The magnitude and frequency of floods up to the 100-year recurrence interval can be determined for most streams in Minnesota by methods presented. By multiple regression analysis, equations have been developed for estimating flood-frequency relations at ungaged sites on natural flow streams. Eight distinct hydrologic regions are delineated within the State with boundaries defined generally by river basin divides. Regression equations are provided for each region which relate selected frequency floods to significant basin parameters. For main-stem streams, graphs are presented showing floods for selected recurrence intervals plotted against contributing drainage area. Flow-frequency estimates for intervening sites along the Minnesota River, Mississippi River, and the Red River of the North can be derived from these graphs. Flood-frequency characteristics are tabulated for 201 paging stations having 10 or more years of record.

  20. Methods for estimating drought streamflow probabilities for Virginia streams

    USGS Publications Warehouse

    Austin, Samuel H.

    2014-01-01

    Maximum likelihood logistic regression model equations used to estimate drought flow probabilities for Virginia streams are presented for 259 hydrologic basins in Virginia. Winter streamflows were used to estimate the likelihood of streamflows during the subsequent drought-prone summer months. The maximum likelihood logistic regression models identify probable streamflows from 5 to 8 months in advance. More than 5 million streamflow daily values collected over the period of record (January 1, 1900 through May 16, 2012) were compiled and analyzed over a minimum 10-year (maximum 112-year) period of record. The analysis yielded the 46,704 equations with statistically significant fit statistics and parameter ranges published in two tables in this report. These model equations produce summer month (July, August, and September) drought flow threshold probabilities as a function of streamflows during the previous winter months (November, December, January, and February). Example calculations are provided, demonstrating how to use the equations to estimate probable streamflows as much as 8 months in advance.

  1. An indirect approach to assess the pests on sorghum by remote sensing

    NASA Astrophysics Data System (ADS)

    Singh, D.; Sao, R.

    In today's world of advanced technology various techniques are being used to study ecological parameter and gathering data for agricultural benefits. The major aspects of remote sensing are timely estimates of agriculture crop yield, prediction of pest etc. The damage caused by the pest to crop is well known. Therefore, in this paper, an attempt has to be made to estimate the number of pests on sorghum by remote sensing technique. The studies were made on crop Sorghum (Meethi Sudan) that is a forage variety and the pest observed is a species of grasshopper. The beds of crop sorghum were specially prepared for pests as well as microwave scattering measurements. In first phase of study, dependence of number of pests on sorghum plant parameters (i.e., crop covered moist soil (SM), plant height (PH), leaf area index (LAI), percentage Biomass (BIO), Total chlorophyll (TC)) have been observed by the regression analyses and it was found that pests were more dependent on sorghum chlorophyll than other plant parameters, while climatic conditions were taken as constant. A linear relationship has been obtained between number of pests and TC with quite significant values of coefficient of determination (r^2=0.86). These crop parameters are easily assessable through microwave remote sensing so they can form the basis for prediction of pest remotely. In second phase of study, several observations were carried out for various growth stages of sorghum using bistatic scatterometer for both like polarizations (i.e., HH- and VV-) and different incidence angles at X-band (9.5 GHz). Linear, and multiple regression analysis were carried out to check dependence of scattering coefficient on these crop parameters and it was noticed that scattering coefficient was more dependent on sorghum TC than other plant parameters at X-band. A negative correlation has been obtained between TC and scattering coefficient with quite good values of r^2 (0.82). VV-pol gives better results than HH-pol and incidence angle should be more than 40 degree for both like pols for assessing the sorghum TC at X-band. The TC assessed by the microwave measurements was helpful to estimate the number of pests on sorghum. Combining both phase of study, number of pests was estimated and a quite good agreement (r^2=0.76) was found between observed and estimated pests.

  2. Parameter estimation for groundwater models under uncertain irrigation data

    USGS Publications Warehouse

    Demissie, Yonas; Valocchi, Albert J.; Cai, Ximing; Brozovic, Nicholas; Senay, Gabriel; Gebremichael, Mekonnen

    2015-01-01

    The success of modeling groundwater is strongly influenced by the accuracy of the model parameters that are used to characterize the subsurface system. However, the presence of uncertainty and possibly bias in groundwater model source/sink terms may lead to biased estimates of model parameters and model predictions when the standard regression-based inverse modeling techniques are used. This study first quantifies the levels of bias in groundwater model parameters and predictions due to the presence of errors in irrigation data. Then, a new inverse modeling technique called input uncertainty weighted least-squares (IUWLS) is presented for unbiased estimation of the parameters when pumping and other source/sink data are uncertain. The approach uses the concept of generalized least-squares method with the weight of the objective function depending on the level of pumping uncertainty and iteratively adjusted during the parameter optimization process. We have conducted both analytical and numerical experiments, using irrigation pumping data from the Republican River Basin in Nebraska, to evaluate the performance of ordinary least-squares (OLS) and IUWLS calibration methods under different levels of uncertainty of irrigation data and calibration conditions. The result from the OLS method shows the presence of statistically significant (p < 0.05) bias in estimated parameters and model predictions that persist despite calibrating the models to different calibration data and sample sizes. However, by directly accounting for the irrigation pumping uncertainties during the calibration procedures, the proposed IUWLS is able to minimize the bias effectively without adding significant computational burden to the calibration processes.

  3. Quantum Chemically Estimated Abraham Solute Parameters Using Multiple Solvent-Water Partition Coefficients and Molecular Polarizability.

    PubMed

    Liang, Yuzhen; Xiong, Ruichang; Sandler, Stanley I; Di Toro, Dominic M

    2017-09-05

    Polyparameter Linear Free Energy Relationships (pp-LFERs), also called Linear Solvation Energy Relationships (LSERs), are used to predict many environmentally significant properties of chemicals. A method is presented for computing the necessary chemical parameters, the Abraham parameters (AP), used by many pp-LFERs. It employs quantum chemical calculations and uses only the chemical's molecular structure. The method computes the Abraham E parameter using density functional theory computed molecular polarizability and the Clausius-Mossotti equation relating the index refraction to the molecular polarizability, estimates the Abraham V as the COSMO calculated molecular volume, and computes the remaining AP S, A, and B jointly with a multiple linear regression using sixty-five solvent-water partition coefficients computed using the quantum mechanical COSMO-SAC solvation model. These solute parameters, referred to as Quantum Chemically estimated Abraham Parameters (QCAP), are further adjusted by fitting to experimentally based APs using QCAP parameters as the independent variables so that they are compatible with existing Abraham pp-LFERs. QCAP and adjusted QCAP for 1827 neutral chemicals are included. For 24 solvent-water systems including octanol-water, predicted log solvent-water partition coefficients using adjusted QCAP have the smallest root-mean-square errors (RMSEs, 0.314-0.602) compared to predictions made using APs estimated using the molecular fragment based method ABSOLV (0.45-0.716). For munition and munition-like compounds, adjusted QCAP has much lower RMSE (0.860) than does ABSOLV (4.45) which essentially fails for these compounds.

  4. A Development of Nonstationary Regional Frequency Analysis Model with Large-scale Climate Information: Its Application to Korean Watershed

    NASA Astrophysics Data System (ADS)

    Kim, Jin-Young; Kwon, Hyun-Han; Kim, Hung-Soo

    2015-04-01

    The existing regional frequency analysis has disadvantages in that it is difficult to consider geographical characteristics in estimating areal rainfall. In this regard, this study aims to develop a hierarchical Bayesian model based nonstationary regional frequency analysis in that spatial patterns of the design rainfall with geographical information (e.g. latitude, longitude and altitude) are explicitly incorporated. This study assumes that the parameters of Gumbel (or GEV distribution) are a function of geographical characteristics within a general linear regression framework. Posterior distribution of the regression parameters are estimated by Bayesian Markov Chain Monte Carlo (MCMC) method, and the identified functional relationship is used to spatially interpolate the parameters of the distributions by using digital elevation models (DEM) as inputs. The proposed model is applied to derive design rainfalls over the entire Han-river watershed. It was found that the proposed Bayesian regional frequency analysis model showed similar results compared to L-moment based regional frequency analysis. In addition, the model showed an advantage in terms of quantifying uncertainty of the design rainfall and estimating the area rainfall considering geographical information. Finally, comprehensive discussion on design rainfall in the context of nonstationary will be presented. KEYWORDS: Regional frequency analysis, Nonstationary, Spatial information, Bayesian Acknowledgement This research was supported by a grant (14AWMP-B082564-01) from Advanced Water Management Research Program funded by Ministry of Land, Infrastructure and Transport of Korean government.

  5. Modeling the human development index and the percentage of poor people using quantile smoothing splines

    NASA Astrophysics Data System (ADS)

    Mulyani, Sri; Andriyana, Yudhie; Sudartianto

    2017-03-01

    Mean regression is a statistical method to explain the relationship between the response variable and the predictor variable based on the central tendency of the data (mean) of the response variable. The parameter estimation in mean regression (with Ordinary Least Square or OLS) generates a problem if we apply it to the data with a symmetric, fat-tailed, or containing outlier. Hence, an alternative method is necessary to be used to that kind of data, for example quantile regression method. The quantile regression is a robust technique to the outlier. This model can explain the relationship between the response variable and the predictor variable, not only on the central tendency of the data (median) but also on various quantile, in order to obtain complete information about that relationship. In this study, a quantile regression is developed with a nonparametric approach such as smoothing spline. Nonparametric approach is used if the prespecification model is difficult to determine, the relation between two variables follow the unknown function. We will apply that proposed method to poverty data. Here, we want to estimate the Percentage of Poor People as the response variable involving the Human Development Index (HDI) as the predictor variable.

  6. Robust estimation of thermodynamic parameters (ΔH, ΔS and ΔCp) for prediction of retention time in gas chromatography - Part I (Theoretical).

    PubMed

    Claumann, Carlos Alberto; Wüst Zibetti, André; Bolzan, Ariovaldo; Machado, Ricardo A F; Pinto, Leonel Teixeira

    2015-12-18

    An approach that is commonly used for calculating the retention time of a compound in GC departs from the thermodynamic properties ΔH, ΔS and ΔCp of phase change (from mobile to stationary). Such properties can be estimated by using experimental retention time data, which results in a non-linear regression problem for non-isothermal temperature programs. As shown in this work, the surface of the objective function (approximation error criterion) on the basis of thermodynamic parameters can be divided into three clearly defined regions, and solely in one of them there is a possibility for the global optimum to be found. The main contribution of this study was the development of an algorithm that distinguishes the different regions of the error surface and its use in the robust initialization of the estimation of parameters ΔH, ΔS and ΔCp. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials

    PubMed Central

    Hu, L.; Zhang, Z.G.; Mouraux, A.; Iannetti, G.D.

    2015-01-01

    Transient sensory, motor or cognitive event elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability that can reflect physiologically-important information is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical oscillations, obtaining single-trial estimate of response latency, frequency, and magnitude. This permits within-subject statistical comparisons, correlation with pre-stimulus features, and integration of simultaneously-recorded EEG and fMRI. PMID:25665966

  8. Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

    PubMed

    Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

    2012-01-01

    The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l. a major black-fly vector of Onchoceriasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S.damnosum s.l. riverine larval habitat explanatory attributes regardless how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61m wavbands ). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR- stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainity affects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l habitats based on spatiotemporal field-sampled count data.

  9. Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression

    NASA Astrophysics Data System (ADS)

    Ndiaye, Eugene; Fercoq, Olivier; Gramfort, Alexandre; Leclère, Vincent; Salmon, Joseph

    2017-10-01

    In high dimensional settings, sparse structures are crucial for efficiency, both in term of memory, computation and performance. It is customary to consider ℓ 1 penalty to enforce sparsity in such scenarios. Sparsity enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter trading data fitting versus sparsity. For the Lasso theory to hold this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter as well as over the noise level. This has been considered under several names in the literature: Scaled-Lasso, Square-root Lasso, Concomitant Lasso estimation for instance, and could be of interest for uncertainty quantification. In this work, after illustrating numerical difficulties for the Concomitant Lasso formulation, we propose a modification we coined Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver leading to a computational cost no more expensive than the one for the Lasso. We leverage on standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm, combined with safe screening rules to achieve speed efficiency, by eliminating early irrelevant features.

  10. Nonlinear estimation of parameters in biphasic Arrhenius plots.

    PubMed

    Puterman, M L; Hrboticky, N; Innis, S M

    1988-05-01

    This paper presents a formal procedure for the statistical analysis of data on the thermotropic behavior of membrane-bound enzymes generated using the Arrhenius equation and compares the analysis to several alternatives. Data is modeled by a bent hyperbola. Nonlinear regression is used to obtain estimates and standard errors of the intersection of line segments, defined as the transition temperature, and slopes, defined as energies of activation of the enzyme reaction. The methodology allows formal tests of the adequacy of a biphasic model rather than either a single straight line or a curvilinear model. Examples on data concerning the thermotropic behavior of pig brain synaptosomal acetylcholinesterase are given. The data support the biphasic temperature dependence of this enzyme. The methodology represents a formal procedure for statistical validation of any biphasic data and allows for calculation of all line parameters with estimates of precision.

  11. Mapping and imputing potential productivity of Pacific Northwest forests using climate variables

    Treesearch

    Gregory Latta; Hailemariam Temesgen; Tara Barrett

    2009-01-01

    Regional estimation of potential forest productivity is important to diverse applications, including biofuels supply, carbon sequestration, and projections of forest growth. Using PRISM (Parameter-elevation Regressions on Independent Slopes Model) climate and productivity data measured on a grid of 3356 Forest Inventory and Analysis plots in Oregon and Washington, we...

  12. Development and comparison of multiple regression models to predict bankfull channel dimensions for use in hydrologic models

    USDA-ARS?s Scientific Manuscript database

    Bankfull hydraulic geometry relationships are used to estimate channel dimensions for streamflow simulation models, which require channel geometry data as input parameters. Often, one nationwide curve is used across the entire United States (U.S.) (e.g., in Soil and Water Assessment Tool), even tho...

  13. A Process Dynamics and Control Experiment for the Undergraduate Laboratory

    ERIC Educational Resources Information Center

    Spencer, Jordan L.

    2009-01-01

    This paper describes a process control experiment. The apparatus includes a three-vessel glass flow system with a variable flow configuration, means for feeding dye solution controlled by a stepper-motor driven valve, and a flow spectrophotometer. Students use impulse response data and nonlinear regression to estimate three parameters of a model…

  14. SigrafW: An easy-to-use program for fitting enzyme kinetic data.

    PubMed

    Leone, Francisco Assis; Baranauskas, José Augusto; Furriel, Rosa Prazeres Melo; Borin, Ivana Aparecida

    2005-11-01

    SigrafW is Windows-compatible software developed using the Microsoft® Visual Basic Studio program that uses the simplified Hill equation for fitting kinetic data from allosteric and Michaelian enzymes. SigrafW uses a modified Fibonacci search to calculate maximal velocity (V), the Hill coefficient (n), and the enzyme-substrate apparent dissociation constant (K). The estimation of V, K, and the sum of the squares of residuals is performed using a Wilkinson nonlinear regression at any Hill coefficient (n). In contrast to many currently available kinetic analysis programs, SigrafW shows several advantages for the determination of kinetic parameters of both hyperbolic and nonhyperbolic saturation curves. No initial estimates of the kinetic parameters are required, a measure of the goodness-of-the-fit for each calculation performed is provided, the nonlinear regression used for calculations eliminates the statistical bias inherent in linear transformations, and the software can be used for enzyme kinetic simulations either for educational or research purposes. Persons interested in receiving a free copy of the software should contact Dr. F. A. Leone. Copyright © 2005 International Union of Biochemistry and Molecular Biology, Inc.

  15. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    PubMed

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.

  16. MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors.

    PubMed

    Hedeker, D; Gibbons, R D

    1996-05-01

    MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors. This model can be used for analysis of unbalanced longitudinal data, where individuals may be measured at a different number of timepoints, or even at different timepoints. Autocorrelated errors of a general form or following an AR(1), MA(1), or ARMA(1,1) form are allowable. This model can also be used for analysis of clustered data, where the mixed-effects model assumes data within clusters are dependent. The degree of dependency is estimated jointly with estimates of the usual model parameters, thus adjusting for clustering. MIXREG uses maximum marginal likelihood estimation, utilizing both the EM algorithm and a Fisher-scoring solution. For the scoring solution, the covariance matrix of the random effects is expressed in its Gaussian decomposition, and the diagonal matrix reparameterized using the exponential transformation. Estimation of the individual random effects is accomplished using an empirical Bayes approach. Examples illustrating usage and features of MIXREG are provided.

  17. Data error and highly parameterized groundwater models

    USGS Publications Warehouse

    Hill, M.C.

    2008-01-01

    Strengths and weaknesses of highly parameterized models, in which the number of parameters exceeds the number of observations, are demonstrated using a synthetic test case. Results suggest that the approach can yield close matches to observations but also serious errors in system representation. It is proposed that avoiding the difficulties of highly parameterized models requires close evaluation of: (1) model fit, (2) performance of the regression, and (3) estimated parameter distributions. Comparisons to hydrogeologic information are expected to be critical to obtaining credible models. Copyright ?? 2008 IAHS Press.

  18. Temperature profile retrievals with extended Kalman-Bucy filters

    NASA Technical Reports Server (NTRS)

    Ledsham, W. H.; Staelin, D. H.

    1979-01-01

    The Extended Kalman-Bucy Filter is a powerful technique for estimating non-stationary random parameters in situations where the received signal is a noisy non-linear function of those parameters. A practical causal filter for retrieving atmospheric temperature profiles from radiances observed at a single scan angle by the Scanning Microwave Spectrometer (SCAMS) carried on the Nimbus 6 satellite typically shows approximately a 10-30% reduction in rms error about the mean at almost all levels below 70 mb when compared with a regression inversion.

  19. Robust functional regression model for marginal mean and subject-specific inferences.

    PubMed

    Cao, Chunzheng; Shi, Jian Qing; Lee, Youngjo

    2017-01-01

    We introduce flexible robust functional regression models, using various heavy-tailed processes, including a Student t-process. We propose efficient algorithms in estimating parameters for the marginal mean inferences and in predicting conditional means as well as interpolation and extrapolation for the subject-specific inferences. We develop bootstrap prediction intervals (PIs) for conditional mean curves. Numerical studies show that the proposed model provides a robust approach against data contamination or distribution misspecification, and the proposed PIs maintain the nominal confidence levels. A real data application is presented as an illustrative example.

  20. Spatial structure, sampling design and scale in remotely-sensed imagery of a California savanna woodland

    NASA Technical Reports Server (NTRS)

    Mcgwire, K.; Friedl, M.; Estes, J. E.

    1993-01-01

    This article describes research related to sampling techniques for establishing linear relations between land surface parameters and remotely-sensed data. Predictive relations are estimated between percentage tree cover in a savanna environment and a normalized difference vegetation index (NDVI) derived from the Thematic Mapper sensor. Spatial autocorrelation in original measurements and regression residuals is examined using semi-variogram analysis at several spatial resolutions. Sampling schemes are then tested to examine the effects of autocorrelation on predictive linear models in cases of small sample sizes. Regression models between image and ground data are affected by the spatial resolution of analysis. Reducing the influence of spatial autocorrelation by enforcing minimum distances between samples may also improve empirical models which relate ground parameters to satellite data.

  1. Simultaneous treatment of unspecified heteroskedastic model error distribution and mismeasured covariates for restricted moment models.

    PubMed

    Garcia, Tanya P; Ma, Yanyuan

    2017-10-01

    We develop consistent and efficient estimation of parameters in general regression models with mismeasured covariates. We assume the model error and covariate distributions are unspecified, and the measurement error distribution is a general parametric distribution with unknown variance-covariance. We construct root- n consistent, asymptotically normal and locally efficient estimators using the semiparametric efficient score. We do not estimate any unknown distribution or model error heteroskedasticity. Instead, we form the estimator under possibly incorrect working distribution models for the model error, error-prone covariate, or both. Empirical results demonstrate robustness to different incorrect working models in homoscedastic and heteroskedastic models with error-prone covariates.

  2. A semisupervised support vector regression method to estimate biophysical parameters from remotely sensed images

    NASA Astrophysics Data System (ADS)

    Castelletti, Davide; Demir, Begüm; Bruzzone, Lorenzo

    2014-10-01

    This paper presents a novel semisupervised learning (SSL) technique defined in the context of ɛ-insensitive support vector regression (SVR) to estimate biophysical parameters from remotely sensed images. The proposed SSL method aims to mitigate the problems of small-sized biased training sets without collecting any additional samples with reference measures. This is achieved on the basis of two consecutive steps. The first step is devoted to inject additional priors information in the learning phase of the SVR in order to adapt the importance of each training sample according to distribution of the unlabeled samples. To this end, a weight is initially associated to each training sample based on a novel strategy that defines higher weights for the samples located in the high density regions of the feature space while giving reduced weights to those that fall into the low density regions of the feature space. Then, in order to exploit different weights for training samples in the learning phase of the SVR, we introduce a weighted SVR (WSVR) algorithm. The second step is devoted to jointly exploit labeled and informative unlabeled samples for further improving the definition of the WSVR learning function. To this end, the most informative unlabeled samples that have an expected accurate target values are initially selected according to a novel strategy that relies on the distribution of the unlabeled samples in the feature space and on the WSVR function estimated at the first step. Then, we introduce a restructured WSVR algorithm that jointly uses labeled and unlabeled samples in the learning phase of the WSVR algorithm and tunes their importance by different values of regularization parameters. Experimental results obtained for the estimation of single-tree stem volume show the effectiveness of the proposed SSL method.

  3. Genetic analysis of fat-to-protein ratio, milk yield and somatic cell score of Holstein cows in Japan in the first three lactations by using a random regression model.

    PubMed

    Nishiura, Akiko; Sasaki, Osamu; Aihara, Mitsuo; Takeda, Hisato; Satoh, Masahiro

    2015-12-01

    We estimated the genetic parameters of fat-to-protein ratio (FPR) and the genetic correlations between FPR and milk yield or somatic cell score in the first three lactations in dairy cows. Data included 3,079,517 test-day records of 201,138 Holstein cows in Japan from 2006 to 2011. Genetic parameters were estimated with a multiple-trait random regression model in which the records within and between parities were treated as separate traits. The phenotypic values of FPR increased soon after parturition and peaked at 10 to 20 days in milk, then decreased slowly in mid- and late lactation. Heritability estimates for FPR yielded moderate values. Genetic correlations of FPR among parities were low in early lactation. Genetic correlations between FPR and milk yield were positive and low in early lactation, but only in the first lactation. Genetic correlations between FPR and somatic cell score were positive in early lactation and decreased to become negative in mid- to late lactation. By using these results for genetic evaluation it should be possible to improve energy balance in dairy cows. © 2015 Japanese Society of Animal Science.

  4. Optimal Estimation of Clock Values and Trends from Finite Data

    NASA Technical Reports Server (NTRS)

    Greenhall, Charles

    2005-01-01

    We show how to solve two problems of optimal linear estimation from a finite set of phase data. Clock noise is modeled as a stochastic process with stationary dth increments. The covariance properties of such a process are contained in the generalized autocovariance function (GACV). We set up two principles for optimal estimation: with the help of the GACV, these principles lead to a set of linear equations for the regression coefficients and some auxiliary parameters. The mean square errors of the estimators are easily calculated. The method can be used to check the results of other methods and to find good suboptimal estimators based on a small subset of the available data.

  5. [Exploring novel hyperspectral band and key index for leaf nitrogen accumulation in wheat].

    PubMed

    Yao, Xia; Zhu, Yan; Feng, Wei; Tian, Yong-Chao; Cao, Wei-Xing

    2009-08-01

    The objectives of the present study were to explore new sensitive spectral bands and ratio spectral indices based on precise analysis of ground-based hyperspectral information, and then develop regression model for estimating leaf N accumulation per unit soil area (LNA) in winter wheat (Triticum aestivum L.). Three field experiments were conducted with different N rates and cultivar types in three consecutive growing seasons, and time-course measurements were taken on canopy hyperspectral reflectance and LNA tinder the various treatments. By adopting the method of reduced precise sampling, the detailed ratio spectral indices (RSI) within the range of 350-2 500 nm were constructed, and the quantitative relationships between LNA (gN m(-2)) and RSI (i, j) were analyzed. It was found that several key spectral bands and spectral indices were suitable for estimating LNA in wheat, and the spectral parameter RSI (990, 720) was the most reliable indicator for LNA in wheat. The regression model based on the best RSI was formulated as y = 5.095x - 6.040, with R2 of 0.814. From testing of the derived equations with independent experiment data, the model on RSI (990, 720) had R2 of 0.847 and RRMSE of 24.7%. Thus, it is concluded that the present hyperspectral parameter of RSI (990, 720) and derived regression model can be reliably used for estimating LNA in winter wheat. These results provide the feasible key bands and technical basis for developing the portable instrument of monitoring wheat nitrogen status and for extracting useful spectral information from remote sensing images.

  6. Beef quality parameters estimation using ultrasound and color images

    PubMed Central

    2015-01-01

    Background Beef quality measurement is a complex task with high economic impact. There is high interest in obtaining an automatic quality parameters estimation in live cattle or post mortem. In this paper we set out to obtain beef quality estimates from the analysis of ultrasound (in vivo) and color images (post mortem), with the measurement of various parameters related to tenderness and amount of meat: rib eye area, percentage of intramuscular fat and backfat thickness or subcutaneous fat. Proposal An algorithm based on curve evolution is implemented to calculate the rib eye area. The backfat thickness is estimated from the profile of distances between two curves that limit the steak and the rib eye, previously detected. A model base in Support Vector Regression (SVR) is trained to estimate the intramuscular fat percentage. A series of features extracted on a region of interest, previously detected in both ultrasound and color images, were proposed. In all cases, a complete evaluation was performed with different databases including: color and ultrasound images acquired by a beef industry expert, intramuscular fat estimation obtained by an expert using a commercial software, and chemical analysis. Conclusions The proposed algorithms show good results to calculate the rib eye area and the backfat thickness measure and profile. They are also promising in predicting the percentage of intramuscular fat. PMID:25734452

  7. Integration of measurements with atmospheric dispersion models: Source term estimation for dispersal of (239)Pu due to non-nuclear detonation of high explosive

    NASA Astrophysics Data System (ADS)

    Edwards, L. L.; Harvey, T. F.; Freis, R. P.; Pitovranov, S. E.; Chernokozhin, E. V.

    1992-10-01

    The accuracy associated with assessing the environmental consequences of an accidental release of radioactivity is highly dependent on our knowledge of the source term characteristics and, in the case when the radioactivity is condensed on particles, the particle size distribution, all of which are generally poorly known. This paper reports on the development of a numerical technique that integrates the radiological measurements with atmospheric dispersion modeling. This results in a more accurate particle-size distribution and particle injection height estimation when compared with measurements of high explosive dispersal of (239)Pu. The estimation model is based on a non-linear least squares regression scheme coupled with the ARAC three-dimensional atmospheric dispersion models. The viability of the approach is evaluated by estimation of ADPIC model input parameters such as the ADPIC particle size mean aerodynamic diameter, the geometric standard deviation, and largest size. Additionally we estimate an optimal 'coupling coefficient' between the particles and an explosive cloud rise model. The experimental data are taken from the Clean Slate 1 field experiment conducted during 1963 at the Tonopah Test Range in Nevada. The regression technique optimizes the agreement between the measured and model predicted concentrations of (239)Pu by varying the model input parameters within their respective ranges of uncertainties. The technique generally estimated the measured concentrations within a factor of 1.5, with the worst estimate being within a factor of 5, very good in view of the complexity of the concentration measurements, the uncertainties associated with the meteorological data, and the limitations of the models. The best fit also suggest a smaller mean diameter and a smaller geometric standard deviation on the particle size as well as a slightly weaker particle to cloud coupling than previously reported.

  8. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

    PubMed

    Cawley, Gavin C; Talbot, Nicola L C

    2006-10-01

    Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffrey's prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar, however the BlogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/

  9. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    PubMed

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.

  10. Quantum algorithm for linear regression

    NASA Astrophysics Data System (ADS)

    Wang, Guoming

    2017-07-01

    We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Differently from previous algorithms which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in the classical form. So by running it once, one completely determines the fitted model and then can use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly( log2(N ) ,d ,κ ,1 /ɛ ) , where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ɛ is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary. Thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding this fit, and can be used to check whether the given data set qualifies for linear regression in the first place.

  11. A revised load estimation procedure for the Susquehanna, Potomac, Patuxent, and Choptank rivers

    USGS Publications Warehouse

    Yochum, Steven E.

    2000-01-01

    The U.S. Geological Survey?s Chesapeake Bay River Input Program has updated the nutrient and suspended-sediment load data base for the Susquehanna, Potomac, Patuxent, and Choptank Rivers using a multiple-window, center-estimate regression methodology. The revised method optimizes the seven-parameter regression approach that has been used historically by the program. The revised method estimates load using the fifth or center year of a sliding 9-year window. Each year a new model is run for each site and constituent, the most recent year is added, and the previous 4 years of estimates are updated. The fifth year in the 9-year window is considered the best estimate and is kept in the data base. The last year of estimation shows the most change from the previous year?s estimate and this change approaches a minimum at the fifth year. Differences between loads computed using this revised methodology and the loads populating the historical data base have been noted but the load estimates do not typically change drastically. The data base resulting from the application of this revised methodology is populated by annual and monthly load estimates that are known with greater certainty than in the previous load data base.

  12. Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets.

    PubMed

    Gruber, Susan; Logan, Roger W; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A

    2015-01-15

    Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. Copyright © 2014 John Wiley & Sons, Ltd.

  13. Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets

    PubMed Central

    Gruber, Susan; Logan, Roger W.; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A.

    2014-01-01

    Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V -fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. PMID:25316152

  14. Enhancing the estimation of blood pressure using pulse arrival time and two confounding factors.

    PubMed

    Baek, Hyun Jae; Kim, Ko Keun; Kim, Jung Soo; Lee, Boreom; Park, Kwang Suk

    2010-02-01

    A new method of blood pressure (BP) estimation using multiple regression with pulse arrival time (PAT) and two confounding factors was evaluated in clinical and unconstrained monitoring situations. For the first analysis with clinical data, electrocardiogram (ECG), photoplethysmogram (PPG) and invasive BP signals were obtained by a conventional patient monitoring device during surgery. In the second analysis, ECG, PPG and non-invasive BP were measured using systems developed to obtain data under conditions in which the subject was not constrained. To enhance the performance of BP estimation methods, heart rate (HR) and arterial stiffness were considered as confounding factors in regression analysis. The PAT and HR were easily extracted from ECG and PPG signals. For arterial stiffness, the duration from the maximum derivative point to the maximum of the dicrotic notch in the PPG signal, a parameter called TDB, was employed. In two experiments that normally cause BP variation, the correlation between measured BP and the estimated BP was investigated. Multiple-regression analysis with the two confounding factors improved correlation coefficients for diastolic blood pressure and systolic blood pressure to acceptable confidence levels, compared to existing methods that consider PAT only. In addition, reproducibility for the proposed method was determined using constructed test sets. Our results demonstrate that non-invasive, non-intrusive BP estimation can be obtained using methods that can be applied in both clinical and daily healthcare situations.

  15. Regression analysis of mixed recurrent-event and panel-count data

    PubMed Central

    Zhu, Liang; Tong, Xinwei; Sun, Jianguo; Chen, Manhua; Srivastava, Deo Kumar; Leisenring, Wendy; Robison, Leslie L.

    2014-01-01

    In event history studies concerning recurrent events, two types of data have been extensively discussed. One is recurrent-event data (Cook and Lawless, 2007. The Analysis of Recurrent Event Data. New York: Springer), and the other is panel-count data (Zhao and others, 2010. Nonparametric inference based on panel-count data. Test 20, 1–42). In the former case, all study subjects are monitored continuously; thus, complete information is available for the underlying recurrent-event processes of interest. In the latter case, study subjects are monitored periodically; thus, only incomplete information is available for the processes of interest. In reality, however, a third type of data could occur in which some study subjects are monitored continuously, but others are monitored periodically. When this occurs, we have mixed recurrent-event and panel-count data. This paper discusses regression analysis of such mixed data and presents two estimation procedures for the problem. One is a maximum likelihood estimation procedure, and the other is an estimating equation procedure. The asymptotic properties of both resulting estimators of regression parameters are established. Also, the methods are applied to a set of mixed recurrent-event and panel-count data that arose from a Childhood Cancer Survivor Study and motivated this investigation. PMID:24648408

  16. Classification of Large-Scale Remote Sensing Images for Automatic Identification of Health Hazards: Smoke Detection Using an Autologistic Regression Classifier.

    PubMed

    Wolters, Mark A; Dean, C B

    2017-01-01

    Remote sensing images from Earth-orbiting satellites are a potentially rich data source for monitoring and cataloguing atmospheric health hazards that cover large geographic regions. A method is proposed for classifying such images into hazard and nonhazard regions using the autologistic regression model, which may be viewed as a spatial extension of logistic regression. The method includes a novel and simple approach to parameter estimation that makes it well suited to handling the large and high-dimensional datasets arising from satellite-borne instruments. The methodology is demonstrated on both simulated images and a real application to the identification of forest fire smoke.

  17. Cost efficiency of US hospitals: a stochastic frontier approach.

    PubMed

    Rosko, M D

    2001-09-01

    This study examined the impact of managed care and other environmental factors on hospital inefficiency in 1631 US hospitals during the period 1990-1996. A panel, stochastic frontier regression model was used to estimate inefficiency parameters and inefficiency scores. The results suggest that mean estimated inefficiency decreased by about 28% during the study period. Inefficiency was negatively associated with health maintenance organization (HMO) penetration and industry concentration. It was positively related with Medicare share and for-profit ownership status. Copyright 2001 John Wiley & Sons, Ltd.

  18. Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter.

    PubMed

    Lord, Dominique

    2006-07-01

    There has been considerable research conducted on the development of statistical models for predicting crashes on highway facilities. Despite numerous advancements made for improving the estimation tools of statistical models, the most common probabilistic structure used for modeling motor vehicle crashes remains the traditional Poisson and Poisson-gamma (or Negative Binomial) distribution; when crash data exhibit over-dispersion, the Poisson-gamma model is usually the model of choice most favored by transportation safety modelers. Crash data collected for safety studies often have the unusual attributes of being characterized by low sample mean values. Studies have shown that the goodness-of-fit of statistical models produced from such datasets can be significantly affected. This issue has been defined as the "low mean problem" (LMP). Despite recent developments on methods to circumvent the LMP and test the goodness-of-fit of models developed using such datasets, no work has so far examined how the LMP affects the fixed dispersion parameter of Poisson-gamma models used for modeling motor vehicle crashes. The dispersion parameter plays an important role in many types of safety studies and should, therefore, be reliably estimated. The primary objective of this research project was to verify whether the LMP affects the estimation of the dispersion parameter and, if it is, to determine the magnitude of the problem. The secondary objective consisted of determining the effects of an unreliably estimated dispersion parameter on common analyses performed in highway safety studies. To accomplish the objectives of the study, a series of Poisson-gamma distributions were simulated using different values describing the mean, the dispersion parameter, and the sample size. Three estimators commonly used by transportation safety modelers for estimating the dispersion parameter of Poisson-gamma models were evaluated: the method of moments, the weighted regression, and the maximum likelihood method. In an attempt to complement the outcome of the simulation study, Poisson-gamma models were fitted to crash data collected in Toronto, Ont. characterized by a low sample mean and small sample size. The study shows that a low sample mean combined with a small sample size can seriously affect the estimation of the dispersion parameter, no matter which estimator is used within the estimation process. The probability the dispersion parameter becomes unreliably estimated increases significantly as the sample mean and sample size decrease. Consequently, the results show that an unreliably estimated dispersion parameter can significantly undermine empirical Bayes (EB) estimates as well as the estimation of confidence intervals for the gamma mean and predicted response. The paper ends with recommendations about minimizing the likelihood of producing Poisson-gamma models with an unreliable dispersion parameter for modeling motor vehicle crashes.

  19. Potential of the Sentinel-2 Red Edge Spectral Bands for Estimation of Eco-Physiological Plant Parameters

    NASA Astrophysics Data System (ADS)

    Malenovsky, Zbynek; Homolova, Lucie; Janoutova, Ruzena; Landier, Lucas; Gastellu-Etchegorry, Jean-Philippe; Berthelot, Beatrice; Huck, Alexis

    2016-08-01

    In this study we investigated importance of the space- borne instrument Sentinel-2 red edge spectral bands and reconstructed red edge position (REP) for retrieval of the three eco-physiological plant parameters, leaf and canopy chlorophyll content and leaf area index (LAI), in case of maize agricultural fields and beech and spruce forest stands. Sentinel-2 spectral bands and REP of the investigated vegetation canopies were simulated in the Discrete Anisotropic Radiative Transfer (DART) model. Their potential for estimation of the plant parameters was assessed through training support vector regressions (SVR) and examining their P-vector matrices indicating significance of each input. The trained SVR were then applied on Sentinel-2 simulated images and the acquired estimates were cross-compared with results from high spatial resolution airborne retrievals. Results showed that contribution of REP was significant for canopy chlorophyll content, but less significant for leaf chlorophyll content and insignificant for leaf area index estimations. However, the red edge spectral bands contributed strongly to the retrievals of all parameters, especially canopy and leaf chlorophyll content. Application of SVR on Sentinel-2 simulated images demonstrated, in general, an overestimation of leaf chlorophyll content and an underestimation of LAI when compared to the reciprocal airborne estimates. In the follow-up investigation, we will apply the trained SVR algorithms on real Sentinel-2 multispectral images acquired during vegetation seasons 2015 and 2016.

  20. A parameter estimation subroutine package

    NASA Technical Reports Server (NTRS)

    Bierman, G. J.; Nead, M. W.

    1978-01-01

    Linear least squares estimation and regression analyses continue to play a major role in orbit determination and related areas. A library of FORTRAN subroutines were developed to facilitate analyses of a variety of estimation problems. An easy to use, multi-purpose set of algorithms that are reasonably efficient and which use a minimal amount of computer storage are presented. Subroutine inputs, outputs, usage and listings are given, along with examples of how these routines can be used. The routines are compact and efficient and are far superior to the normal equation and Kalman filter data processing algorithms that are often used for least squares analyses.

  1. Computational tools for exact conditional logistic regression.

    PubMed

    Corcoran, C; Mehta, C; Patel, N; Senchaudhuri, P

    Logistic regression analyses are often challenged by the inability of unconditional likelihood-based approximations to yield consistent, valid estimates and p-values for model parameters. This can be due to sparseness or separability in the data. Conditional logistic regression, though useful in such situations, can also be computationally unfeasible when the sample size or number of explanatory covariates is large. We review recent developments that allow efficient approximate conditional inference, including Monte Carlo sampling and saddlepoint approximations. We demonstrate through real examples that these methods enable the analysis of significantly larger and more complex data sets. We find in this investigation that for these moderately large data sets Monte Carlo seems a better alternative, as it provides unbiased estimates of the exact results and can be executed in less CPU time than can the single saddlepoint approximation. Moreover, the double saddlepoint approximation, while computationally the easiest to obtain, offers little practical advantage. It produces unreliable results and cannot be computed when a maximum likelihood solution does not exist. Copyright 2001 John Wiley & Sons, Ltd.

  2. Prediction of Water Quality Parameters Using Statistical Methods: A Case Study in a Specially Protected Area, Ankara, Turkey

    NASA Astrophysics Data System (ADS)

    Alp, E.; Yücel, Ö.; Özcan, Z.

    2014-12-01

    Turkey has been making many legal arrangements for sustainable water management during the harmonization process with the European Union. In order to make cost effective and efficient decisions, monitoring network in Turkey has been expanding. However, due to time and budget constraints, desired number of monitoring campaigns can not be carried. Hence, in this study, independent parameters that can be measured easily and quickly are used to estimate water quality parameters in Lake Mogan and Eymir using linear regression. Nonpoint sources are one of the major pollutant components in Eymir and Mogan lakes. In this paper, a correlation between easily measurable parameters, DO, temperature, electrical conductivity, pH, precipitation and dependent variables, TN, TP, COD, Chl-a, TSS, Total Coliform is investigated. Simple regression analysis is performed for each season in Eymir and Mogan lakes by using SPSS Statistical program using the water quality data collected between 2006-2012. Regression analysis demonstrated significant linear relationship between measured and simulated concentrations for TN (R2=0.86), TP (R2=0.85), TSS (R2=0.91), Chl-a (R2=0.94), COD (R2=0.99), T. Coliform (R2=0.97) which are the best results in each season for Eymir and Mogan Lakes. The overall results of this study shows that by using easily measurable parameters even in ungauged situation the water quality of lakes can be predicted. Moreover, the outputs obtained from the regression equations can be used as an input for water quality models such as phosphorus budget model which is used to calculate the required reduction in the external phosphorus load to Lake Mogan to meet the water quality standards.

  3. Estimation of stature using hand and foot dimensions in Slovak adults.

    PubMed

    Uhrová, Petra; Beňuš, Radoslav; Masnicová, Soňa; Obertová, Zuzana; Kramárová, Daniela; Kyselicová, Klaudia; Dörnhöferová, Michaela; Bodoriková, Silvia; Neščáková, Eva

    2015-03-01

    Hand and foot dimensions used for stature estimation help to formulate a biological profile in the process of personal identification. Morphological variability of hands and feet shows the importance of generating population-specific equations to estimate stature. The stature, hand length, hand breadth, foot length and foot breadth of 250 young Slovak males and females, aged 18-24 years, were measured according to standard anthropometric procedures. The data were statistically analyzed using independent t-test for sex and bilateral differences. Pearson correlation coefficient was used for assessing relationship between stature and hand/foot parameters, and subsequently linear regression analysis was used to estimate stature. The results revealed significant sex differences in hand and foot dimensions as well as in stature (p<0.05). There was a positive and statistically significant correlation between stature and all measurements in both sexes (p<0.01). The highest correlation coefficient was found for foot length in males (r=0.71) as well as in females (r=0.63). Regression equations were computed separately for each sex. The accuracy of stature prediction ranged from ±4.6 to ±6.1cm. The results of this study indicate that hand and foot dimension can be used to estimate stature for Slovak for the purpose of forensic field. The regression equations can be of use for stature estimation particularly in cases of dismembered bodies. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  4. Linear theory for filtering nonlinear multiscale systems with model error

    PubMed Central

    Berry, Tyrus; Harlim, John

    2014-01-01

    In this paper, we study filtering of multiscale dynamical systems with model error arising from limitations in resolving the smaller scale processes. In particular, the analysis assumes the availability of continuous-time noisy observations of all components of the slow variables. Mathematically, this paper presents new results on higher order asymptotic expansion of the first two moments of a conditional measure. In particular, we are interested in the application of filtering multiscale problems in which the conditional distribution is defined over the slow variables, given noisy observation of the slow variables alone. From the mathematical analysis, we learn that for a continuous time linear model with Gaussian noise, there exists a unique choice of parameters in a linear reduced model for the slow variables which gives the optimal filtering when only the slow variables are observed. Moreover, these parameters simultaneously give the optimal equilibrium statistical estimates of the underlying system, and as a consequence they can be estimated offline from the equilibrium statistics of the true signal. By examining a nonlinear test model, we show that the linear theory extends in this non-Gaussian, nonlinear configuration as long as we know the optimal stochastic parametrization and the correct observation model. However, when the stochastic parametrization model is inappropriate, parameters chosen for good filter performance may give poor equilibrium statistical estimates and vice versa; this finding is based on analytical and numerical results on our nonlinear test model and the two-layer Lorenz-96 model. Finally, even when the correct stochastic ansatz is given, it is imperative to estimate the parameters simultaneously and to account for the nonlinear feedback of the stochastic parameters into the reduced filter estimates. In numerical experiments on the two-layer Lorenz-96 model, we find that the parameters estimated online, as part of a filtering procedure, simultaneously produce accurate filtering and equilibrium statistical prediction. In contrast, an offline estimation technique based on a linear regression, which fits the parameters to a training dataset without using the filter, yields filter estimates which are worse than the observations or even divergent when the slow variables are not fully observed. This finding does not imply that all offline methods are inherently inferior to the online method for nonlinear estimation problems, it only suggests that an ideal estimation technique should estimate all parameters simultaneously whether it is online or offline. PMID:25002829

  5. Species Composition at the Sub-Meter Level in Discontinuous Permafrost in Subarctic Sweden

    NASA Astrophysics Data System (ADS)

    Anderson, S. M.; Palace, M. W.; Layne, M.; Varner, R. K.; Crill, P. M.

    2013-12-01

    Northern latitudes are experiencing rapid warming. Wetlands underlain by permafrost are particularly vulnerable to warming which results in changes in vegetative cover. Specific species have been associated with greenhouse gas emissions therefore knowledge of species compositional shift allows for the systematic change and quantification of emissions and changes in such emissions. Species composition varies on the sub-meter scale based on topography and other microsite environmental parameters. This complexity and the need to scale vegetation to the landscape level proves vital in our estimation of carbon dioxide (CO2) and methane (CH4) emissions and dynamics. Stordalen Mire (68°21'N, 18°49'E) in Abisko and is located at the edge of discontinuous permafrost zone. This provides a unique opportunity to analyze multiple vegetation communities in a close proximity. To do this, we randomly selected 25 1x1 meter plots that were representative of five major cover types: Semi-wet, wet, hummock, tall graminoid, and tall shrub. We used a quadrat with 64 sub plots and measured areal percent cover for 24 species. We collected ground based remote sensing (RS) at each plot to determine species composition using an ADC-lite (near infrared, red, green) and GoPro (red, blue, green). We normalized each image based on a Teflon white chip placed in each image. Textural analysis was conducted on each image for entropy, angular second momentum, and lacunarity. A logistic regression was developed to examine vegetation cover types and remote sensing parameters. We used a multiple linear regression using forwards stepwise variable selection. We found statistical difference in species composition and diversity indices between vegetation cover types. In addition, we were able to build regression model to significantly estimate vegetation cover type as well as percent cover for specific key vegetative species. This ground-based remote sensing allows for quick quantification of vegetation cover and species and also provides the framework for scaling to satellite image data to estimate species composition and shift on the landscape level. To determine diversity within our plots we calculated species richness and Shannon Index. We found that there were statistically different species composition within each vegetation cover type and also determined which species were indicative for cover type. Our logistical regression was able to significantly classify vegetation cover types based on RS parameters. Our multiple regression analysis indicated Betunla nana (Dwarf Birch) (r2= .48, p=<0.0001) and Sphagnum (r2=0.59, p=<0.0001) were statistically significant with respect to RS parameters. We suggest that ground based remote sensing methods may provide a unique and efficient method to quantify vegetation across the landscape in northern latitude wetlands.

  6. In vivo imaging of scattering and absorption properties of exposed brain using a digital red-green-blue camera

    NASA Astrophysics Data System (ADS)

    Nishidate, Izumi; Yoshida, Keiichiro; Kawauchi, Satoko; Sato, Shunichi; Sato, Manabu

    2014-03-01

    We investigate a method to estimate the spectral images of reduced scattering coefficients and the absorption coefficients of in vivo exposed brain tissues in the range from visible to near-infrared wavelength (500-760 nm) based on diffuse reflectance spectroscopy using a digital RGB camera. In the proposed method, the multi-spectral reflectance images of in vivo exposed brain are reconstructed from the digital red, green blue images using the Wiener estimation algorithm. The Monte Carlo simulation-based multiple regression analysis for the absorbance spectra is then used to specify the absorption and scattering parameters of brain tissue. In this analysis, the concentration of oxygenated hemoglobin and that of deoxygenated hemoglobin are estimated as the absorption parameters whereas the scattering amplitude a and the scattering power b in the expression of μs'=aλ-b as the scattering parameters, respectively. The spectra of absorption and reduced scattering coefficients are reconstructed from the absorption and scattering parameters, and finally, the spectral images of absorption and reduced scattering coefficients are estimated. The estimated images of absorption coefficients were dominated by the spectral characteristics of hemoglobin. The estimated spectral images of reduced scattering coefficients showed a broad scattering spectrum, exhibiting larger magnitude at shorter wavelengths, corresponding to the typical spectrum of brain tissue published in the literature. In vivo experiments with exposed brain of rats during CSD confirmed the possibility of the method to evaluate both hemodynamics and changes in tissue morphology due to electrical depolarization.

  7. Gap-filling methods to impute eddy covariance flux data by preserving variance.

    NASA Astrophysics Data System (ADS)

    Kunwor, S.; Staudhammer, C. L.; Starr, G.; Loescher, H. W.

    2015-12-01

    To represent carbon dynamics, in terms of exchange of CO2 between the terrestrial ecosystem and the atmosphere, eddy covariance (EC) data has been collected using eddy flux towers from various sites across globe for more than two decades. However, measurements from EC data are missing for various reasons: precipitation, routine maintenance, or lack of vertical turbulence. In order to have estimates of net ecosystem exchange of carbon dioxide (NEE) with high precision and accuracy, robust gap-filling methods to impute missing data are required. While the methods used so far have provided robust estimates of the mean value of NEE, little attention has been paid to preserving the variance structures embodied by the flux data. Preserving the variance of these data will provide unbiased and precise estimates of NEE over time, which mimic natural fluctuations. We used a non-linear regression approach with moving windows of different lengths (15, 30, and 60-days) to estimate non-linear regression parameters for one year of flux data from a long-leaf pine site at the Joseph Jones Ecological Research Center. We used as our base the Michaelis-Menten and Van't Hoff functions. We assessed the potential physiological drivers of these parameters with linear models using micrometeorological predictors. We then used a parameter prediction approach to refine the non-linear gap-filling equations based on micrometeorological conditions. This provides us an opportunity to incorporate additional variables, such as vapor pressure deficit (VPD) and volumetric water content (VWC) into the equations. Our preliminary results indicate that improvements in gap-filling can be gained with a 30-day moving window with additional micrometeorological predictors (as indicated by lower root mean square error (RMSE) of the predicted values of NEE). Our next steps are to use these parameter predictions from moving windows to gap-fill the data with and without incorporation of potential driver variables of the parameters traditionally used. Then, comparisons of the predicted values from these methods and 'traditional' gap-filling methods (using 12 fixed monthly windows) will be assessed to show the scale of preserving variance. Further, this method will be applied to impute artificially created gaps for analyzing if variance is preserved.

  8. [RS estimation of inventory parameters and carbon storage of moso bamboo forest based on synergistic use of object-based image analysis and decision tree].

    PubMed

    Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie

    2017-10-01

    By synergistically using the object-based image analysis (OBIA) and the classification and regression tree (CART) methods, the distribution information, the indexes (including diameter at breast, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that the moso bamboo forest could be accurately delineated by integrating the multi-scale ima ge segmentation in OBIA technique and CART, which connected the image objects at various scales, with a pretty good producer's accuracy of 89.1%. The investigation of indexes estimated by regression tree model that was constructed based on the features extracted from the image objects reached normal or better accuracy, in which the crown closure model archived the best estimating accuracy of 67.9%. The estimating accuracy of diameter at breast and tree height was relatively low, which was consistent with conclusion that estimating diameter at breast and tree height using optical remote sensing could not achieve satisfactory results. Estimation of AGC reached relatively high accuracy, and accuracy of the region of high value achieved above 80%.

  9. Predicting seasonal influenza transmission using functional regression models with temporal dependence.

    PubMed

    Oviedo de la Fuente, Manuel; Febrero-Bande, Manuel; Muñoz, María Pilar; Domínguez, Àngela

    2018-01-01

    This paper proposes a novel approach that uses meteorological information to predict the incidence of influenza in Galicia (Spain). It extends the Generalized Least Squares (GLS) methods in the multivariate framework to functional regression models with dependent errors. These kinds of models are useful when the recent history of the incidence of influenza are readily unavailable (for instance, by delays on the communication with health informants) and the prediction must be constructed by correcting the temporal dependence of the residuals and using more accessible variables. A simulation study shows that the GLS estimators render better estimations of the parameters associated with the regression model than they do with the classical models. They obtain extremely good results from the predictive point of view and are competitive with the classical time series approach for the incidence of influenza. An iterative version of the GLS estimator (called iGLS) was also proposed that can help to model complicated dependence structures. For constructing the model, the distance correlation measure [Formula: see text] was employed to select relevant information to predict influenza rate mixing multivariate and functional variables. These kinds of models are extremely useful to health managers in allocating resources in advance to manage influenza epidemics.

  10. Regression Analysis of Mixed Panel Count Data with Dependent Terminal Events

    PubMed Central

    Yu, Guanglei; Zhu, Liang; Li, Yang; Sun, Jianguo; Robison, Leslie L.

    2017-01-01

    Event history studies are commonly conducted in many fields and a great deal of literature has been established for the analysis of the two types of data commonly arising from these studies: recurrent event data and panel count data. The former arises if all study subjects are followed continuously, while the latter means that each study subject is observed only at discrete time points. In reality, a third type of data, a mixture of the two types of the data above, may occur and furthermore, as with the first two types of the data, there may exist a dependent terminal event, which may preclude the occurrences of recurrent events of interest. This paper discusses regression analysis of mixed recurrent event and panel count data in the presence of a terminal event and an estimating equation-based approach is proposed for estimation of regression parameters of interest. In addition, the asymptotic properties of the proposed estimator are established and a simulation study conducted to assess the finite-sample performance of the proposed method suggests that it works well in practical situations. Finally the methodology is applied to a childhood cancer study that motivated this study. PMID:28098397

  11. Incorporating Measurement Error from Modeled Air Pollution Exposures into Epidemiological Analyses.

    PubMed

    Samoli, Evangelia; Butland, Barbara K

    2017-12-01

    Outdoor air pollution exposures used in epidemiological studies are commonly predicted from spatiotemporal models incorporating limited measurements, temporal factors, geographic information system variables, and/or satellite data. Measurement error in these exposure estimates leads to imprecise estimation of health effects and their standard errors. We reviewed methods for measurement error correction that have been applied in epidemiological studies that use model-derived air pollution data. We identified seven cohort studies and one panel study that have employed measurement error correction methods. These methods included regression calibration, risk set regression calibration, regression calibration with instrumental variables, the simulation extrapolation approach (SIMEX), and methods under the non-parametric or parameter bootstrap. Corrections resulted in small increases in the absolute magnitude of the health effect estimate and its standard error under most scenarios. Limited application of measurement error correction methods in air pollution studies may be attributed to the absence of exposure validation data and the methodological complexity of the proposed methods. Future epidemiological studies should consider in their design phase the requirements for the measurement error correction method to be later applied, while methodological advances are needed under the multi-pollutants setting.

  12. RBF kernel based support vector regression to estimate the blood volume and heart rate responses during hemodialysis.

    PubMed

    Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H

    2009-01-01

    This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial bias function (RBF) kernels the non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The e-insensitivity based loss function was used for SVR modeling. Selection of the design parameters which includes capacity (C), insensitivity region (e) and the RBF kernel parameter (sigma) was made based on a grid search approach and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).

  13. Estimation of genetic parameters for milk yield in Murrah buffaloes by Bayesian inference.

    PubMed

    Breda, F C; Albuquerque, L G; Euclydes, R F; Bignardi, A B; Baldi, F; Torres, R A; Barbosa, L; Tonhati, H

    2010-02-01

    Random regression models were used to estimate genetic parameters for test-day milk yield in Murrah buffaloes using Bayesian inference. Data comprised 17,935 test-day milk records from 1,433 buffaloes. Twelve models were tested using different combinations of third-, fourth-, fifth-, sixth-, and seventh-order orthogonal polynomials of weeks of lactation for additive genetic and permanent environmental effects. All models included the fixed effects of contemporary group, number of daily milkings and age of cow at calving as covariate (linear and quadratic effect). In addition, residual variances were considered to be heterogeneous with 6 classes of variance. Models were selected based on the residual mean square error, weighted average of residual variance estimates, and estimates of variance components, heritabilities, correlations, eigenvalues, and eigenfunctions. Results indicated that changes in the order of fit for additive genetic and permanent environmental random effects influenced the estimation of genetic parameters. Heritability estimates ranged from 0.19 to 0.31. Genetic correlation estimates were close to unity between adjacent test-day records, but decreased gradually as the interval between test-days increased. Results from mean squared error and weighted averages of residual variance estimates suggested that a model considering sixth- and seventh-order Legendre polynomials for additive and permanent environmental effects, respectively, and 6 classes for residual variances, provided the best fit. Nevertheless, this model presented the largest degree of complexity. A more parsimonious model, with fourth- and sixth-order polynomials, respectively, for these same effects, yielded very similar genetic parameter estimates. Therefore, this last model is recommended for routine applications. Copyright 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  14. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    ERIC Educational Resources Information Center

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

  15. Model-assisted survey regression estimation with the lasso

    Treesearch

    Kelly S. McConville; F. Jay Breidt; Thomas C. M. Lee; Gretchen G. Moisen

    2017-01-01

    In the U.S. Forest Service’s Forest Inventory and Analysis (FIA) program, as in other natural resource surveys, many auxiliary variables are available for use in model-assisted inference about finite population parameters. Some of this auxiliary information may be extraneous, and therefore model selection is appropriate to improve the efficiency of the survey...

  16. Accounting for Slipping and Other False Negatives in Logistic Models of Student Learning

    ERIC Educational Resources Information Center

    MacLellan, Christopher J.; Liu, Ran; Koedinger, Kenneth R.

    2015-01-01

    Additive Factors Model (AFM) and Performance Factors Analysis (PFA) are two popular models of student learning that employ logistic regression to estimate parameters and predict performance. This is in contrast to Bayesian Knowledge Tracing (BKT) which uses a Hidden Markov Model formalism. While all three models tend to make similar predictions,…

  17. A comparative look at sunspot cycles

    NASA Technical Reports Server (NTRS)

    Wilson, R. M.

    1984-01-01

    On the basis of cycles 8 through 20, spanning about 143 years, observations of sunspot number, smoothed sunspot number, and their temporal properties were used to compute means, standard deviations, ranges, and frequency of occurrence histograms for a number of sunspot cycle parameters. The resultant schematic sunspot cycle was contrasted with the mean sunspot cycle, obtained by averaging smoothed sunspot number as a function of time, tying all cycles (8 through 20) to their minimum occurence date. A relatively good approximation of the time variation of smoothed sunspot number for a given cycle is possible if sunspot cycles are regarded in terms of being either HIGH- or LOW-R(MAX) cycles or LONG- or SHORT-PERIOD cycles, especially the latter. Linear regression analyses were performed comparing late cycle parameters with early cycle parameters and solar cycle number. The early occurring cycle parameters can be used to estimate later occurring cycle parameters with relatively good success, based on cycle 21 as an example. The sunspot cycle record clearly shows that the trend for both R(MIN) and R(MAX) was toward decreasing value between cycles 8 through 14 and toward increasing value between cycles 14 through 20. Linear regression equations were also obtained for several measures of solar activity.

  18. Development of Jet Noise Power Spectral Laws Using SHJAR Data

    NASA Technical Reports Server (NTRS)

    Khavaran, Abbas; Bridges, James

    2009-01-01

    High quality jet noise spectral data measured at the Aeroacoustic Propulsion Laboratory at the NASA Glenn Research Center is used to examine a number of jet noise scaling laws. Configurations considered in the present study consist of convergent and convergent-divergent axisymmetric nozzles. Following the work of Viswanathan, velocity power factors are estimated using a least squares fit on spectral power density as a function of jet temperature and observer angle. The regression parameters are scrutinized for their uncertainty within the desired confidence margins. As an immediate application of the velocity power laws, spectral density in supersonic jets are decomposed into their respective components attributed to the jet mixing noise and broadband shock associated noise. Subsequent application of the least squares method on the shock power intensity shows that the latter also scales with some power of the shock parameter. A modified shock parameter is defined in order to reduce the dependency of the regression factors on the nozzle design point within the uncertainty margins of the least squares method.

  19. Multivariate meta-analysis for non-linear and other multi-parameter associations

    PubMed Central

    Gasparrini, A; Armstrong, B; Kenward, M G

    2012-01-01

    In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043

  20. Local linear estimation of concordance probability with application to covariate effects models on association for bivariate failure-time data.

    PubMed

    Ding, Aidong Adam; Hsieh, Jin-Jian; Wang, Weijing

    2015-01-01

    Bivariate survival analysis has wide applications. In the presence of covariates, most literature focuses on studying their effects on the marginal distributions. However covariates can also affect the association between the two variables. In this article we consider the latter issue by proposing a nonstandard local linear estimator for the concordance probability as a function of covariates. Under the Clayton copula, the conditional concordance probability has a simple one-to-one correspondence with the copula parameter for different data structures including those subject to independent or dependent censoring and dependent truncation. The proposed method can be used to study how covariates affect the Clayton association parameter without specifying marginal regression models. Asymptotic properties of the proposed estimators are derived and their finite-sample performances are examined via simulations. Finally, for illustration, we apply the proposed method to analyze a bone marrow transplant data set.

  1. Genetic Analysis of Milk Yield in First-Lactation Holstein Friesian in Ethiopia: A Lactation Average vs Random Regression Test-Day Model Analysis

    PubMed Central

    Meseret, S.; Tamir, B.; Gebreyohannes, G.; Lidauer, M.; Negussie, E.

    2015-01-01

    The development of effective genetic evaluations and selection of sires requires accurate estimates of genetic parameters for all economically important traits in the breeding goal. The main objective of this study was to assess the relative performance of the traditional lactation average model (LAM) against the random regression test-day model (RRM) in the estimation of genetic parameters and prediction of breeding values for Holstein Friesian herds in Ethiopia. The data used consisted of 6,500 test-day (TD) records from 800 first-lactation Holstein Friesian cows that calved between 1997 and 2013. Co-variance components were estimated using the average information restricted maximum likelihood method under single trait animal model. The estimate of heritability for first-lactation milk yield was 0.30 from LAM whilst estimates from the RRM model ranged from 0.17 to 0.29 for the different stages of lactation. Genetic correlations between different TDs in first-lactation Holstein Friesian ranged from 0.37 to 0.99. The observed genetic correlation was less than unity between milk yields at different TDs, which indicated that the assumption of LAM may not be optimal for accurate evaluation of the genetic merit of animals. A close look at estimated breeding values from both models showed that RRM had higher standard deviation compared to LAM indicating that the TD model makes efficient utilization of TD information. Correlations of breeding values between models ranged from 0.90 to 0.96 for different group of sires and cows and marked re-rankings were observed in top sires and cows in moving from the traditional LAM to RRM evaluations. PMID:26194217

  2. Probable flood predictions in ungauged coastal basins of El Salvador

    USGS Publications Warehouse

    Friedel, M.J.; Smith, M.E.; Chica, A.M.E.; Litke, D.

    2008-01-01

    A regionalization procedure is presented and used to predict probable flooding in four ungauged coastal river basins of El Salvador: Paz, Jiboa, Grande de San Miguel, and Goascoran. The flood-prediction problem is sequentially solved for two regions: upstream mountains and downstream alluvial plains. In the upstream mountains, a set of rainfall-runoff parameter values and recurrent peak-flow discharge hydrographs are simultaneously estimated for 20 tributary-basin models. Application of dissimilarity equations among tributary basins (soft prior information) permitted development of a parsimonious parameter structure subject to information content in the recurrent peak-flow discharge values derived using regression equations based on measurements recorded outside the ungauged study basins. The estimated joint set of parameter values formed the basis from which probable minimum and maximum peak-flow discharge limits were then estimated revealing that prediction uncertainty increases with basin size. In the downstream alluvial plain, model application of the estimated minimum and maximum peak-flow hydrographs facilitated simulation of probable 100-year flood-flow depths in confined canyons and across unconfined coastal alluvial plains. The regionalization procedure provides a tool for hydrologic risk assessment and flood protection planning that is not restricted to the case presented herein. ?? 2008 ASCE.

  3. An initial-abstraction, constant-loss model for unit hydrograph modeling for applicable watersheds in Texas

    USGS Publications Warehouse

    Asquith, William H.; Roussel, Meghan C.

    2007-01-01

    Estimation of representative hydrographs from design storms, which are known as design hydrographs, provides for cost-effective, riskmitigated design of drainage structures such as bridges, culverts, roadways, and other infrastructure. During 2001?07, the U.S. Geological Survey (USGS), in cooperation with the Texas Department of Transportation, investigated runoff hydrographs, design storms, unit hydrographs,and watershed-loss models to enhance design hydrograph estimation in Texas. Design hydrographs ideally should mimic the general volume, peak, and shape of observed runoff hydrographs. Design hydrographs commonly are estimated in part by unit hydrographs. A unit hydrograph is defined as the runoff hydrograph that results from a unit pulse of excess rainfall uniformly distributed over the watershed at a constant rate for a specific duration. A time-distributed, watershed-loss model is required for modeling by unit hydrographs. This report develops a specific time-distributed, watershed-loss model known as an initial-abstraction, constant-loss model. For this watershed-loss model, a watershed is conceptualized to have the capacity to store or abstract an absolute depth of rainfall at and near the beginning of a storm. Depths of total rainfall less than this initial abstraction do not produce runoff. The watershed also is conceptualized to have the capacity to remove rainfall at a constant rate (loss) after the initial abstraction is satisfied. Additional rainfall inputs after the initial abstraction is satisfied contribute to runoff if the rainfall rate (intensity) is larger than the constant loss. The initial abstraction, constant-loss model thus is a two-parameter model. The initial-abstraction, constant-loss model is investigated through detailed computational and statistical analysis of observed rainfall and runoff data for 92 USGS streamflow-gaging stations (watersheds) in Texas with contributing drainage areas from 0.26 to 166 square miles. The analysis is limited to a previously described, watershed-specific, gamma distribution model of the unit hydrograph. In particular, the initial-abstraction, constant-loss model is tuned to the gamma distribution model of the unit hydrograph. A complex computational analysis of observed rainfall and runoff for the 92 watersheds was done to determine, by storm, optimal values of initial abstraction and constant loss. Optimal parameter values for a given storm were defined as those values that produced a modeled runoff hydrograph with volume equal to the observed runoff hydrograph and also minimized the residual sum of squares of the two hydrographs. Subsequently, the means of the optimal parameters were computed on a watershed-specific basis. These means for each watershed are considered the most representative, are tabulated, and are used in further statistical analyses. Statistical analyses of watershed-specific, initial abstraction and constant loss include documentation of the distribution of each parameter using the generalized lambda distribution. The analyses show that watershed development has substantial influence on initial abstraction and limited influence on constant loss. The means and medians of the 92 watershed-specific parameters are tabulated with respect to watershed development; although they have considerable uncertainty, these parameters can be used for parameter prediction for ungaged watersheds. The statistical analyses of watershed-specific, initial abstraction and constant loss also include development of predictive procedures for estimation of each parameter for ungaged watersheds. Both regression equations and regression trees for estimation of initial abstraction and constant loss are provided. The watershed characteristics included in the regression analyses are (1) main-channel length, (2) a binary factor representing watershed development, (3) a binary factor representing watersheds with an abundance of rocky and thin-soiled terrain, and (4) curve numb

  4. Enhanced ID Pit Sizing Using Multivariate Regression Algorithm

    NASA Astrophysics Data System (ADS)

    Krzywosz, Kenji

    2007-03-01

    EPRI is funding a program to enhance and improve the reliability of inside diameter (ID) pit sizing for balance-of plant heat exchangers, such as condensers and component cooling water heat exchangers. More traditional approaches to ID pit sizing involve the use of frequency-specific amplitude or phase angles. The enhanced multivariate regression algorithm for ID pit depth sizing incorporates three simultaneous input parameters of frequency, amplitude, and phase angle. A set of calibration data sets consisting of machined pits of various rounded and elongated shapes and depths was acquired in the frequency range of 100 kHz to 1 MHz for stainless steel tubing having nominal wall thickness of 0.028 inch. To add noise to the acquired data set, each test sample was rotated and test data acquired at 3, 6, 9, and 12 o'clock positions. The ID pit depths were estimated using a second order and fourth order regression functions by relying on normalized amplitude and phase angle information from multiple frequencies. Due to unique damage morphology associated with the microbiologically-influenced ID pits, it was necessary to modify the elongated calibration standard-based algorithms by relying on the algorithm developed solely from the destructive sectioning results. This paper presents the use of transformed multivariate regression algorithm to estimate ID pit depths and compare the results with the traditional univariate phase angle analysis. Both estimates were then compared with the destructive sectioning results.

  5. Model parameter estimation approach based on incremental analysis for lithium-ion batteries without using open circuit voltage

    NASA Astrophysics Data System (ADS)

    Wu, Hongjie; Yuan, Shifei; Zhang, Xi; Yin, Chengliang; Ma, Xuerui

    2015-08-01

    To improve the suitability of lithium-ion battery model under varying scenarios, such as fluctuating temperature and SoC variation, dynamic model with parameters updated realtime should be developed. In this paper, an incremental analysis-based auto regressive exogenous (I-ARX) modeling method is proposed to eliminate the modeling error caused by the OCV effect and improve the accuracy of parameter estimation. Then, its numerical stability, modeling error, and parametric sensitivity are analyzed at different sampling rates (0.02, 0.1, 0.5 and 1 s). To identify the model parameters recursively, a bias-correction recursive least squares (CRLS) algorithm is applied. Finally, the pseudo random binary sequence (PRBS) and urban dynamic driving sequences (UDDSs) profiles are performed to verify the realtime performance and robustness of the newly proposed model and algorithm. Different sampling rates (1 Hz and 10 Hz) and multiple temperature points (5, 25, and 45 °C) are covered in our experiments. The experimental and simulation results indicate that the proposed I-ARX model can present high accuracy and suitability for parameter identification without using open circuit voltage.

  6. Using Simulation Technique to overcome the multi-collinearity problem for estimating fuzzy linear regression parameters.

    NASA Astrophysics Data System (ADS)

    Mansoor Gorgees, Hazim; Hilal, Mariam Mohammed

    2018-05-01

    Fatigue cracking is one of the common types of pavement distresses and is an indicator of structural failure; cracks allow moisture infiltration, roughness, may further deteriorate to a pothole. Some causes of pavement deterioration are: traffic loading; environment influences; drainage deficiencies; materials quality problems; construction deficiencies and external contributors. Many researchers have made models that contain many variables like asphalt content, asphalt viscosity, fatigue life, stiffness of asphalt mixture, temperature and other parameters that affect the fatigue life. For this situation, a fuzzy linear regression model was employed and analyzed by using the traditional methods and our proposed method in order to overcome the multi-collinearity problem. The total spread error was used as a criterion to compare the performance of the studied methods. Simulation program was used to obtain the required results.

  7. Diffusion parameter mapping with the combined intravoxel incoherent motion and kurtosis model using artificial neural networks at 3 T.

    PubMed

    Bertleff, Marco; Domsch, Sebastian; Weingärtner, Sebastian; Zapp, Jascha; O'Brien, Kieran; Barth, Markus; Schad, Lothar R

    2017-12-01

    Artificial neural networks (ANNs) were used for voxel-wise parameter estimation with the combined intravoxel incoherent motion (IVIM) and kurtosis model facilitating robust diffusion parameter mapping in the human brain. The proposed ANN approach was compared with conventional least-squares regression (LSR) and state-of-the-art multi-step fitting (LSR-MS) in Monte-Carlo simulations and in vivo in terms of estimation accuracy and precision, number of outliers and sensitivity in the distinction between grey (GM) and white (WM) matter. Both the proposed ANN approach and LSR-MS yielded visually increased parameter map quality. Estimations of all parameters (perfusion fraction f, diffusion coefficient D, pseudo-diffusion coefficient D*, kurtosis K) were in good agreement with the literature using ANN, whereas LSR-MS resulted in D* overestimation and LSR yielded increased values for f and D*, as well as decreased values for K. Using ANN, outliers were reduced for the parameters f (ANN, 1%; LSR-MS, 19%; LSR, 8%), D* (ANN, 21%; LSR-MS, 25%; LSR, 23%) and K (ANN, 0%; LSR-MS, 0%; LSR, 15%). Moreover, ANN enabled significant distinction between GM and WM based on all parameters, whereas LSR facilitated this distinction only based on D and LSR-MS on f, D and K. Overall, the proposed ANN approach was found to be superior to conventional LSR, posing a powerful alternative to the state-of-the-art method LSR-MS with several advantages in the estimation of IVIM-kurtosis parameters, which might facilitate increased applicability of enhanced diffusion models at clinical scan times. Copyright © 2017 John Wiley & Sons, Ltd.

  8. Chloramine demand estimation using surrogate chemical and microbiological parameters.

    PubMed

    Moradi, Sina; Liu, Sanly; Chow, Christopher W K; van Leeuwen, John; Cook, David; Drikas, Mary; Amal, Rose

    2017-07-01

    A model is developed to enable estimation of chloramine demand in full scale drinking water supplies based on chemical and microbiological factors that affect chloramine decay rate via nonlinear regression analysis method. The model is based on organic character (specific ultraviolet absorbance (SUVA)) of the water samples and a laboratory measure of the microbiological (F m ) decay of chloramine. The applicability of the model for estimation of chloramine residual (and hence chloramine demand) was tested on several waters from different water treatment plants in Australia through statistical test analysis between the experimental and predicted data. Results showed that the model was able to simulate and estimate chloramine demand at various times in real drinking water systems. To elucidate the loss of chloramine over the wide variation of water quality used in this study, the model incorporates both the fast and slow chloramine decay pathways. The significance of estimated fast and slow decay rate constants as the kinetic parameters of the model for three water sources in Australia was discussed. It was found that with the same water source, the kinetic parameters remain the same. This modelling approach has the potential to be used by water treatment operators as a decision support tool in order to manage chloramine disinfection. Copyright © 2017. Published by Elsevier B.V.

  9. Summary of groundwater-recharge estimates for Pennsylvania

    USGS Publications Warehouse

    Stuart O. Reese,; Risser, Dennis W.

    2010-01-01

    Groundwater recharge is water that infiltrates through the subsurface to the zone of saturation beneath the water table. Because recharge is a difficult parameter to quantify, it is typically estimated from measurements of other parameters like streamflow and precipitation. This report provides a general overview of processes affecting recharge in Pennsylvania and presents estimates of recharge rates from studies at various scales.The most common method for estimating recharge in Pennsylvania has been to estimate base flow from measurements of streamflow and assume that base flow (expressed in inches over the basin) approximates recharge. Statewide estimates of mean annual groundwater recharge were developed by relating base flow to basin characteristics of HUC10 watersheds (a fifth-level classification that uses 10 digits to define unique hydrologic units) using a regression equation. The regression analysis indicated that mean annual precipitation, average daily maximum temperature, percent of sand in soil, percent of carbonate rock in the watershed, and average stream-channel slope were significant factors in the explaining the variability of groundwater recharge across the Commonwealth.Several maps are included in this report to illustrate the principal factors affecting recharge and provide additional information about the spatial distribution of recharge in Pennsylvania. The maps portray the patterns of precipitation, temperature, prevailing winds across Pennsylvania’s varied physiography; illustrate the error associated with recharge estimates; and show the spatial variability of recharge as a percent of precipitation. National, statewide, regional, and local values of recharge, based on numerous studies, are compiled to allow comparison of estimates from various sources. Together these plates provide a synopsis of groundwater-recharge estimations and factors in Pennsylvania.Areas that receive the most recharge are typically those that get the most rainfall, have favorable surface conditions for infiltration, and are less susceptible to the influences of high temperatures, and thus, evapotranspiration. Areas that have less recharge in Pennsylvania are typically those with less precipitation, less permeable soils, and higher temperatures that are conducive to greater rates of evapotranspiration.

  10. Estimates of Ground Temperature and Atmospheric Moisture from CERES Observations

    NASA Technical Reports Server (NTRS)

    Wu, Man Li C.; Schubert, Siegfried; Einaudi, Franco (Technical Monitor)

    2000-01-01

    A method is developed to retrieve surface ground temperature (Tg) and atmospheric moisture using clear sky fluxes (CSF) from CERES-TRMM observations. In general, the clear sky outgoing long-wave radiation (CLR) is sensitive to upper level moisture (q(sub h)) over wet regions and Tg over dry regions The clear sky window flux from 800 to 1200 /cm (RadWn) is sensitive to low level moisture (q(sub j)) and Tg. Combining these two measurements (CLR and RadWn), Tg and q(sub h) can be estimated over land, while q(sub h) and q(sub t) can be estimated over the oceans. The approach capitalizes on the availability of satellite estimates of CLR and RadWn and other auxiliary satellite data. The basic methodology employs off-line forward radiative transfer calculations to generate synthetic CSF data from two different global 4-dimensional data assimilation products. Simple linear regression is used to relate discrepancies in CSF to discrepancies in Tg, q(sub h) and q(sub t). The slopes of the regression lines define sensitivity parameters that can be exploited to help interpret mismatches between satellite observations and model-based estimates of CSF. For illustration, we analyze the discrepancies in the CSF between an early implementation of the Goddard Earth Observing System Data Assimilation System (GEOS-DAS) and a recent operational version of the European Center for Medium-Range Weather Prediction data assimilation system. In particular, our analysis of synthetic total and window region SCF differences (computed from two different assimilated data sets) shows that simple linear regression employing (Delta)Tg and broad layer (Delta)q(sub l) from 500 hPa to surface and (Delta)q(sub h) from 200 to 500 hPa provides a good approximation to the full radiative transfer calculations, typically explaining more than 90% of the 6-hourly variance in the flux differences. These simple regression relations can be inverted to "retrieve" the errors in the geophysical parameters. Uncertainties (normalized by standard deviation) in the monthly mean retrieved parameters range from 7% for (Delta)T to about 20% for (Delta)q(sub t). Our initial application of the methodology employed an early CERES-TRMM data set (CLR and Radwn) to assess the quality of the GEOS2 data. The results showed that over the tropical and subtropical oceans GEOS2 is, in general, too wet in the upper troposphere (mean bias of 0.99 mm) and too dry in the lower troposphere (mean bias of -4.7 mm). We note that these errors, as well as a cold bias in the Tg, have largely been corrected in the current version of GEOS-2 with the introduction of a land surface model, a moist turbulence scheme and the assimilation of SSTM/I total precipitable water.

  11. Model parameter uncertainty analysis for an annual field-scale P loss model

    NASA Astrophysics Data System (ADS)

    Bolster, Carl H.; Vadas, Peter A.; Boykin, Debbie

    2016-08-01

    Phosphorous (P) fate and transport models are important tools for developing and evaluating conservation practices aimed at reducing P losses from agricultural fields. Because all models are simplifications of complex systems, there will exist an inherent amount of uncertainty associated with their predictions. It is therefore important that efforts be directed at identifying, quantifying, and communicating the different sources of model uncertainties. In this study, we conducted an uncertainty analysis with the Annual P Loss Estimator (APLE) model. Our analysis included calculating parameter uncertainties and confidence and prediction intervals for five internal regression equations in APLE. We also estimated uncertainties of the model input variables based on values reported in the literature. We then predicted P loss for a suite of fields under different management and climatic conditions while accounting for uncertainties in the model parameters and inputs and compared the relative contributions of these two sources of uncertainty to the overall uncertainty associated with predictions of P loss. Both the overall magnitude of the prediction uncertainties and the relative contributions of the two sources of uncertainty varied depending on management practices and field characteristics. This was due to differences in the number of model input variables and the uncertainties in the regression equations associated with each P loss pathway. Inspection of the uncertainties in the five regression equations brought attention to a previously unrecognized limitation with the equation used to partition surface-applied fertilizer P between leaching and runoff losses. As a result, an alternate equation was identified that provided similar predictions with much less uncertainty. Our results demonstrate how a thorough uncertainty and model residual analysis can be used to identify limitations with a model. Such insight can then be used to guide future data collection and model development and evaluation efforts.

  12. Linearly Supporting Feature Extraction for Automated Estimation of Stellar Atmospheric Parameters

    NASA Astrophysics Data System (ADS)

    Li, Xiangru; Lu, Yu; Comte, Georges; Luo, Ali; Zhao, Yongheng; Wang, Yongjun

    2015-05-01

    We describe a scheme to extract linearly supporting (LSU) features from stellar spectra to automatically estimate the atmospheric parameters {{T}{\\tt{eff} }}, log g, and [Fe/H]. “Linearly supporting” means that the atmospheric parameters can be accurately estimated from the extracted features through a linear model. The successive steps of the process are as follow: first, decompose the spectrum using a wavelet packet (WP) and represent it by the derived decomposition coefficients; second, detect representative spectral features from the decomposition coefficients using the proposed method Least Absolute Shrinkage and Selection Operator (LARS)bs; third, estimate the atmospheric parameters {{T}{\\tt{eff} }}, log g, and [Fe/H] from the detected features using a linear regression method. One prominent characteristic of this scheme is its ability to evaluate quantitatively the contribution of each detected feature to the atmospheric parameter estimate and also to trace back the physical significance of that feature. This work also shows that the usefulness of a component depends on both the wavelength and frequency. The proposed scheme has been evaluated on both real spectra from the Sloan Digital Sky Survey (SDSS)/SEGUE and synthetic spectra calculated from Kurucz's NEWODF models. On real spectra, we extracted 23 features to estimate {{T}{\\tt{eff} }}, 62 features for log g, and 68 features for [Fe/H]. Test consistencies between our estimates and those provided by the Spectroscopic Parameter Pipeline of SDSS show that the mean absolute errors (MAEs) are 0.0062 dex for log {{T}{\\tt{eff} }} (83 K for {{T}{\\tt{eff} }}), 0.2345 dex for log g, and 0.1564 dex for [Fe/H]. For the synthetic spectra, the MAE test accuracies are 0.0022 dex for log {{T}{\\tt{eff} }} (32 K for {{T}{\\tt{eff} }}), 0.0337 dex for log g, and 0.0268 dex for [Fe/H].

  13. Estimation of genetic effects in the presence of multicollinearity in multibreed beef cattle evaluation.

    PubMed

    Roso, V M; Schenkel, F S; Miller, S P; Schaeffer, L R

    2005-08-01

    Breed additive, dominance, and epistatic loss effects are of concern in the genetic evaluation of a multibreed population. Multiple regression equations used for fitting these effects may show a high degree of multicollinearity among predictor variables. Typically, when strong linear relationships exist, the regression coefficients have large SE and are sensitive to changes in the data file and to the addition or deletion of variables in the model. Generalized ridge regression methods were applied to obtain stable estimates of direct and maternal breed additive, dominance, and epistatic loss effects in the presence of multicollinearity among predictor variables. Preweaning weight gains of beef calves in Ontario, Canada, from 1986 to 1999 were analyzed. The genetic model included fixed direct and maternal breed additive, dominance, and epistatic loss effects, fixed environmental effects of age of the calf, contemporary group, and age of the dam x sex of the calf, random additive direct and maternal genetic effects, and random maternal permanent environment effect. The degree and the nature of the multicollinearity were identified and ridge regression methods were used as an alternative to ordinary least squares (LS). Ridge parameters were obtained using two different objective methods: 1) generalized ridge estimator of Hoerl and Kennard (R1); and 2) bootstrap in combination with cross-validation (R2). Both ridge regression methods outperformed the LS estimator with respect to mean squared error of predictions (MSEP) and variance inflation factors (VIF) computed over 100 bootstrap samples. The MSEP of R1 and R2 were similar, and they were 3% less than the MSEP of LS. The average VIF of LS, R1, and R2 were equal to 26.81, 6.10, and 4.18, respectively. Ridge regression methods were particularly effective in decreasing the multicollinearity involving predictor variables of breed additive effects. Because of a high degree of confounding between estimates of maternal dominance and direct epistatic loss effects, it was not possible to compare the relative importance of these effects with a high level of confidence. The inclusion of epistatic loss effects in the additive-dominance model did not cause noticeable reranking of sires, dams, and calves based on across-breed EBV. More precise estimates of breed effects as a result of this study may result in more stable across-breed estimated breeding values over the years.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Butler, W.J.; Kalasinski, L.A.

    In this paper, a generalized logistic regression model for correlated observations is used to analyze epidemiologic data on the frequency of spontaneous abortion among a group of women office workers. The results are compared to those obtained from the use of the standard logistic regression model that assumes statistical independence among all the pregnancies contributed by one woman. In this example, the correlation among pregnancies from the same woman is fairly small and did not have a substantial impact on the magnitude of estimates of parameters of the model. This is due at least partly to the small average numbermore » of pregnancies contributed by each woman.« less

  15. Practical aspects of estimating energy components in rodents

    PubMed Central

    van Klinken, Jan B.; van den Berg, Sjoerd A. A.; van Dijk, Ko Willems

    2013-01-01

    Recently there has been an increasing interest in exploiting computational and statistical techniques for the purpose of component analysis of indirect calorimetry data. Using these methods it becomes possible to dissect daily energy expenditure into its components and to assess the dynamic response of the resting metabolic rate (RMR) to nutritional and pharmacological manipulations. To perform robust component analysis, however, is not straightforward and typically requires the tuning of parameters and the preprocessing of data. Moreover the degree of accuracy that can be attained by these methods depends on the configuration of the system, which must be properly taken into account when setting up experimental studies. Here, we review the methods of Kalman filtering, linear, and penalized spline regression, and minimal energy expenditure estimation in the context of component analysis and discuss their results on high resolution datasets from mice and rats. In addition, we investigate the effect of the sample time, the accuracy of the activity sensor, and the washout time of the chamber on the estimation accuracy. We found that on the high resolution data there was a strong correlation between the results of Kalman filtering and penalized spline (P-spline) regression, except for the activity respiratory quotient (RQ). For low resolution data the basal metabolic rate (BMR) and resting RQ could still be estimated accurately with P-spline regression, having a strong correlation with the high resolution estimate (R2 > 0.997; sample time of 9 min). In contrast, the thermic effect of food (TEF) and activity related energy expenditure (AEE) were more sensitive to a reduction in the sample rate (R2 > 0.97). In conclusion, for component analysis on data generated by single channel systems with continuous data acquisition both Kalman filtering and P-spline regression can be used, while for low resolution data from multichannel systems P-spline regression gives more robust results. PMID:23641217

  16. Homogeneity Pursuit

    PubMed Central

    Ke, Tracy; Fan, Jianqing; Wu, Yichao

    2014-01-01

    This paper explores the homogeneity of coefficients in high-dimensional regression, which extends the sparsity concept and is more general and suitable for many applications. Homogeneity arises when regression coefficients corresponding to neighboring geographical regions or a similar cluster of covariates are expected to be approximately the same. Sparsity corresponds to a special case of homogeneity with a large cluster of known atom zero. In this article, we propose a new method called clustering algorithm in regression via data-driven segmentation (CARDS) to explore homogeneity. New mathematics are provided on the gain that can be achieved by exploring homogeneity. Statistical properties of two versions of CARDS are analyzed. In particular, the asymptotic normality of our proposed CARDS estimator is established, which reveals better estimation accuracy for homogeneous parameters than that without homogeneity exploration. When our methods are combined with sparsity exploration, further efficiency can be achieved beyond the exploration of sparsity alone. This provides additional insights into the power of exploring low-dimensional structures in high-dimensional regression: homogeneity and sparsity. Our results also shed lights on the properties of the fussed Lasso. The newly developed method is further illustrated by simulation studies and applications to real data. Supplementary materials for this article are available online. PMID:26085701

  17. The ice age cycle and the deglaciations: an application of nonlinear regression modelling

    NASA Astrophysics Data System (ADS)

    Dalgleish, A. N.; Boulton, G. S.; Renshaw, E.

    2000-03-01

    We have applied the nonlinear regression technique known as additivity and variance stabilisation (AVAS) to time series which reflect Earth's climate over the last 600 ka. AVAS estimates a smooth, nonlinear transform for each variable, under the assumption of an additive model. The Earth's orbital parameters and insolation variations have been used as regression variables. Analysis of the contribution of each variable shows that the deglaciations are characterised by periods of increasing obliquity and perihelion approaching the vernal equinox, but not by any systematic change in eccentricity. The magnitude of insolation changes also plays no role. By approximating the transforms we can obtain a future prediction, with a glacial maximum at 60 ka AP, and a subsequent obliquity and precession forced deglaciation.

  18. Hierarchical Bayesian Logistic Regression to forecast metabolic control in type 2 DM patients.

    PubMed

    Dagliati, Arianna; Malovini, Alberto; Decata, Pasquale; Cogni, Giulia; Teliti, Marsida; Sacchi, Lucia; Cerra, Carlo; Chiovato, Luca; Bellazzi, Riccardo

    2016-01-01

    In this work we present our efforts in building a model able to forecast patients' changes in clinical conditions when repeated measurements are available. In this case the available risk calculators are typically not applicable. We propose a Hierarchical Bayesian Logistic Regression model, which allows taking into account individual and population variability in model parameters estimate. The model is used to predict metabolic control and its variation in type 2 diabetes mellitus. In particular we have analyzed a population of more than 1000 Italian type 2 diabetic patients, collected within the European project Mosaic. The results obtained in terms of Matthews Correlation Coefficient are significantly better than the ones gathered with standard logistic regression model, based on data pooling.

  19. How the 2SLS/IV estimator can handle equality constraints in structural equation models: a system-of-equations approach.

    PubMed

    Nestler, Steffen

    2014-05-01

    Parameters in structural equation models are typically estimated using the maximum likelihood (ML) approach. Bollen (1996) proposed an alternative non-iterative, equation-by-equation estimator that uses instrumental variables. Although this two-stage least squares/instrumental variables (2SLS/IV) estimator has good statistical properties, one problem with its application is that parameter equality constraints cannot be imposed. This paper presents a mathematical solution to this problem that is based on an extension of the 2SLS/IV approach to a system of equations. We present an example in which our approach was used to examine strong longitudinal measurement invariance. We also investigated the new approach in a simulation study that compared it with ML in the examination of the equality of two latent regression coefficients and strong measurement invariance. Overall, the results show that the suggested approach is a useful extension of the original 2SLS/IV estimator and allows for the effective handling of equality constraints in structural equation models. © 2013 The British Psychological Society.

  20. Assessment and modeling of the groundwater hydrogeochemical quality parameters via geostatistical approaches

    NASA Astrophysics Data System (ADS)

    Karami, Shawgar; Madani, Hassan; Katibeh, Homayoon; Fatehi Marj, Ahmad

    2018-03-01

    Geostatistical methods are one of the advanced techniques used for interpolation of groundwater quality data. The results obtained from geostatistics will be useful for decision makers to adopt suitable remedial measures to protect the quality of groundwater sources. Data used in this study were collected from 78 wells in Varamin plain aquifer located in southeast of Tehran, Iran, in 2013. Ordinary kriging method was used in this study to evaluate groundwater quality parameters. According to what has been mentioned in this paper, seven main quality parameters (i.e. total dissolved solids (TDS), sodium adsorption ratio (SAR), electrical conductivity (EC), sodium (Na+), total hardness (TH), chloride (Cl-) and sulfate (SO4 2-)), have been analyzed and interpreted by statistical and geostatistical methods. After data normalization by Nscore method in WinGslib software, variography as a geostatistical tool to define spatial regression was compiled and experimental variograms were plotted by GS+ software. Then, the best theoretical model was fitted to each variogram based on the minimum RSS. Cross validation method was used to determine the accuracy of the estimated data. Eventually, estimation maps of groundwater quality were prepared in WinGslib software and estimation variance map and estimation error map were presented to evaluate the quality of estimation in each estimated point. Results showed that kriging method is more accurate than the traditional interpolation methods.

  1. Methods for determining magnitude and frequency of floods in California, based on data through water year 2006

    USGS Publications Warehouse

    Gotvald, Anthony J.; Barth, Nancy A.; Veilleux, Andrea G.; Parrett, Charles

    2012-01-01

    Methods for estimating the magnitude and frequency of floods in California that are not substantially affected by regulation or diversions have been updated. Annual peak-flow data through water year 2006 were analyzed for 771 streamflow-gaging stations (streamgages) in California having 10 or more years of data. Flood-frequency estimates were computed for the streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to logarithms of annual peak flows for each streamgage. Low-outlier and historic information were incorporated into the flood-frequency analysis, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low outliers. Special methods for fitting the distribution were developed for streamgages in the desert region in southeastern California. Additionally, basin characteristics for the streamgages were computed by using a geographical information system. Regional regression analysis, using generalized least squares regression, was used to develop a set of equations for estimating flows with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities for ungaged basins in California that are outside of the southeastern desert region. Flood-frequency estimates and basin characteristics for 630 streamgages were combined to form the final database used in the regional regression analysis. Five hydrologic regions were developed for the area of California outside of the desert region. The final regional regression equations are functions of drainage area and mean annual precipitation for four of the five regions. In one region, the Sierra Nevada region, the final equations are functions of drainage area, mean basin elevation, and mean annual precipitation. Average standard errors of prediction for the regression equations in all five regions range from 42.7 to 161.9 percent. For the desert region of California, an analysis of 33 streamgages was used to develop regional estimates of all three parameters (mean, standard deviation, and skew) of the log-Pearson Type III distribution. The regional estimates were then used to develop a set of equations for estimating flows with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities for ungaged basins. The final regional regression equations are functions of drainage area. Average standard errors of prediction for these regression equations range from 214.2 to 856.2 percent. Annual peak-flow data through water year 2006 were analyzed for eight streamgages in California having 10 or more years of data considered to be affected by urbanization. Flood-frequency estimates were computed for the urban streamgages by fitting a Pearson Type III distribution to logarithms of annual peak flows for each streamgage. Regression analysis could not be used to develop flood-frequency estimation equations for urban streams because of the limited number of sites. Flood-frequency estimates for the eight urban sites were graphically compared to flood-frequency estimates for 630 non-urban sites. The regression equations developed from this study will be incorporated into the U.S. Geological Survey (USGS) StreamStats program. The StreamStats program is a Web-based application that provides streamflow statistics and basin characteristics for USGS streamgages and ungaged sites of interest. StreamStats can also compute basin characteristics and provide estimates of streamflow statistics for ungaged sites when users select the location of a site along any stream in California.

  2. Working covariance model selection for generalized estimating equations.

    PubMed

    Carey, Vincent J; Wang, You-Gan

    2011-11-20

    We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice. Copyright © 2011 John Wiley & Sons, Ltd.

  3. Delineation and Analysis of Uncertainty of Contributing Areas to Wells at the Southbury Training School, Southbury, Connecticut

    USGS Publications Warehouse

    Starn, J. Jeffrey; Stone, Janet Radway; Mullaney, John R.

    2000-01-01

    Contributing areas to public-supply wells at the Southbury Training School in Southbury, Connecticut, were mapped by simulating ground-water flow in stratified glacial deposits in the lower Transylvania Brook watershed. The simulation used nonlinear regression methods and informational statistics to estimate parameters of a ground-water flow model using drawdown data from an aquifer test. The goodness of fit of the model and the uncertainty associated with model predictions were statistically measured. A watershed-scale model, depicting large-scale ground-water flow in the Transylvania Brook watershed, was used to estimate the distribution of groundwater recharge. Estimates of recharge from 10 small basins in the watershed differed on the basis of the drainage characteristics of each basin. Small basins having well-defined stream channels contributed less ground-water recharge than basins having no defined channels because potential ground-water recharge was carried away in the stream channel. Estimates of ground-water recharge were used in an aquifer-scale parameter-estimation model. Seven variations of the ground-water-flow system were posed, each representing the ground-water-flow system in slightly different but realistic ways. The model that most closely reproduced measured hydraulic heads and flows with realistic parameter values was selected as the most representative of the ground-water-flow system and was used to delineate boundaries of the contributing areas. The model fit revealed no systematic model error, which indicates that the model is likely to represent the major characteristics of the actual system. Parameter values estimated during the simulation are as follows: horizontal hydraulic conductivity of coarse-grained deposits, 154 feet per day; vertical hydraulic conductivity of coarse-grained deposits, 0.83 feet per day; horizontal hydraulic conductivity of fine-grained deposits, 29 feet per day; specific yield, 0.007; specific storage, 1.6E-05. Average annual recharge was estimated using the watershed-scale model with no parameter estimation and was determined to be 24 inches per year in the valley areas and 9 inches per year in the upland areas. The parameter estimates produced in the model are similar to expected values, with two exceptions. The estimated specific yield of the stratified glacial deposits is lower than expected, which could be caused by the layered nature of the deposits. The recharge estimate produced by the model was also lower?about 32 percent of the average annual rate. This could be caused by the timing of the aquifer test with respect to the annual cycle of ground-water recharge, and by some of the expected recharge going to parts of the flow system that were not simulated. The data used in the calibration were collected during an aquifer test from October 30 to November 4, 1996. The model fit was very good, as indicated by the correlation coefficient (0.999) between the weighted simulated values and weighted observed values. The model also reproduced the general rise in ground-water levels caused by ground-water recharge and the cyclic fluctuations caused by pumping prior to the aquifer test. Contributing areas were delineated using a particle-tracking procedure. Hypothetical particles of water were introduced at each model cell in the top layer and were tracked to determine whether or not they reached the pumped well. A deterministic contributing area was calculated using the calibrated model, and a probabilistic contributing area was calculated using a Monte Carlo approach along with the calibrated model. The Monte Carlo simulation was done, using the parameter variance/covariance matrix generated by the regression model, to estimate probabilities associated with the contributing area to the wells. The probabilities arise from uncertainty in the estimated parameter values, which in turn arise from the adequacy of the data available to comprehensively describe the groundwater-flow sy

  4. Visual evaluation of kinetic characteristics of PET probe for neuroreceptors using a two-phase graphic plot analysis.

    PubMed

    Ito, Hiroshi; Ikoma, Yoko; Seki, Chie; Kimura, Yasuyuki; Kawaguchi, Hiroshi; Takuwa, Hiroyuki; Ichise, Masanori; Suhara, Tetsuya; Kanno, Iwao

    2017-05-01

    Objectives In PET studies for neuroreceptors, tracer kinetics are described by the two-tissue compartment model (2-TCM), and binding parameters, including the total distribution volume (V T ), non-displaceable distribution volume (V ND ), and binding potential (BP ND ), can be determined from model parameters estimated by kinetic analysis. The stability of binding parameter estimates depends on the kinetic characteristics of radioligands. To describe these kinetic characteristics, we previously developed a two-phase graphic plot analysis in which V ND and V T can be estimated from the x-intercept of regression lines for early and delayed phases, respectively. In this study, we applied this graphic plot analysis to visual evaluation of the kinetic characteristics of radioligands for neuroreceptors, and investigated a relationship between the shape of these graphic plots and the stability of binding parameters estimated by the kinetic analysis with 2-TCM in simulated brain tissue time-activity curves (TACs) with various binding parameters. Methods 90-min TACs were generated with the arterial input function and assumed kinetic parameters according to 2-TCM. Graphic plot analysis was applied to these simulated TACs, and the curvature of the plot for each TAC was evaluated visually. TACs with several noise levels were also generated with various kinetic parameters, and the bias and variation of binding parameters estimated by kinetic analysis were calculated in each TAC. These bias and variation were compared with the shape of graphic plots. Results The graphic plots showed larger curvature for TACs with higher specific binding and slower dissociation of specific binding. The quartile deviations of V ND and BP ND determined by kinetic analysis were smaller for radioligands with slow dissociation. Conclusions The larger curvature of graphic plots for radioligands with slow dissociation might indicate a stable determination of V ND and BP ND by kinetic analysis. For investigation of the kinetics of radioligands, such kinetic characteristics should be considered.

  5. An EM-based semi-parametric mixture model approach to the regression analysis of competing-risks data.

    PubMed

    Ng, S K; McLachlan, G J

    2003-04-15

    We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is based on maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright 2003 John Wiley & Sons, Ltd.

  6. An Efficient Design Strategy for Logistic Regression Using Outcome- and Covariate-Dependent Pooling of Biospecimens Prior to Assay

    PubMed Central

    Lyles, Robert H.; Mitchell, Emily M.; Weinberg, Clarice R.; Umbach, David M.; Schisterman, Enrique F.

    2016-01-01

    Summary Potential reductions in laboratory assay costs afforded by pooling equal aliquots of biospecimens have long been recognized in disease surveillance and epidemiological research and, more recently, have motivated design and analytic developments in regression settings. For example, Weinberg and Umbach (1999, Biometrics 55, 718–726) provided methods for fitting set-based logistic regression models to case-control data when a continuous exposure variable (e.g., a biomarker) is assayed on pooled specimens. We focus on improving estimation efficiency by utilizing available subject-specific information at the pool allocation stage. We find that a strategy that we call “(y,c)-pooling,” which forms pooling sets of individuals within strata defined jointly by the outcome and other covariates, provides more precise estimation of the risk parameters associated with those covariates than does pooling within strata defined only by the outcome. We review the approach to set-based analysis through offsets developed by Weinberg and Umbach in a recent correction to their original paper. We propose a method for variance estimation under this design and use simulations and a real-data example to illustrate the precision benefits of (y,c)-pooling relative to y-pooling. We also note and illustrate that set-based models permit estimation of covariate interactions with exposure. PMID:26964741

  7. Regression analysis of mixed recurrent-event and panel-count data.

    PubMed

    Zhu, Liang; Tong, Xinwei; Sun, Jianguo; Chen, Manhua; Srivastava, Deo Kumar; Leisenring, Wendy; Robison, Leslie L

    2014-07-01

    In event history studies concerning recurrent events, two types of data have been extensively discussed. One is recurrent-event data (Cook and Lawless, 2007. The Analysis of Recurrent Event Data. New York: Springer), and the other is panel-count data (Zhao and others, 2010. Nonparametric inference based on panel-count data. Test 20: , 1-42). In the former case, all study subjects are monitored continuously; thus, complete information is available for the underlying recurrent-event processes of interest. In the latter case, study subjects are monitored periodically; thus, only incomplete information is available for the processes of interest. In reality, however, a third type of data could occur in which some study subjects are monitored continuously, but others are monitored periodically. When this occurs, we have mixed recurrent-event and panel-count data. This paper discusses regression analysis of such mixed data and presents two estimation procedures for the problem. One is a maximum likelihood estimation procedure, and the other is an estimating equation procedure. The asymptotic properties of both resulting estimators of regression parameters are established. Also, the methods are applied to a set of mixed recurrent-event and panel-count data that arose from a Childhood Cancer Survivor Study and motivated this investigation. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  8. Crown area equations for 13 species of trees and shrubs in northern California and southwestern Oregon

    Treesearch

    Fabian C.C. Uzoh; Martin W. Ritchie

    1996-01-01

    The equations presented predict crown area for 13 species of trees and shrubs which may be found growing in competition with commercial conifers during early stages of stand development. The equations express crown area as a function of basal area and height. Parameters were estimated for each species individually using weighted nonlinear least square regression.

  9. Prediction of anthropometric measurements from tooth length--A Dravidian study.

    PubMed

    Sunitha, J; Ananthalakshmi, R; Sathiya, Jeeva J; Nadeem, Jeddy; Dhanarathnam, Shanmugam

    2015-12-01

    Anthropometric measurement is essential for identification of both victims and suspects. Often, this data is not readily available in a crime scene situation. The availability of one data set should help in predicting the other. This study was hypothesised on the basis of a correlation and geometry between the tooth length and various body measurements. To correlate face, palm, foot and stature measurements with tooth length. To derive a regression formula to estimate the various measurements from tooth length. The present study was conducted on Dravidian dental students in the age group 18 - 25 with a sample size of 372. All of the dental and physical parameters were measured using standard anthropometric equipments and techniques. The data was analysed using SPSS software and the methods used for statistical analysis were linear regression analysis and Pearson correlation. The parameters (incisor height (IH), face height (FH), palm length (PL), foot length (FL) and stature (S) showed nil to mild correlation (R = 0.2 ≤ 0.4) except for palm length (PL) and foot length (FL). (R>0.6). It is concluded that odontometric data is not a reliable source for estimating the face height (FH), palm length (PL), foot length (FL) and stature (S).

  10. A heteroskedastic error covariance matrix estimator using a first-order conditional autoregressive Markov simulation for deriving asympotical efficient estimates from ecological sampled Anopheles arabiensis aquatic habitat covariates

    PubMed Central

    Jacob, Benjamin G; Griffith, Daniel A; Muturi, Ephantus J; Caamano, Erick X; Githure, John I; Novak, Robert J

    2009-01-01

    Background Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecological sampled Anopheles aquatic habitat covariates. A test for diagnostic checking error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitats clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extends a normal regression analysis previously considered in the literature. Methods Field and remote-sampled data were collected during July 2006 to December 2007 in Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations, distributions, and to generate global autocorrelation statistics from the ecological sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's Indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e. negative binomial regression). The eigenfunction values from the spatial configuration matrices were then used to define expectations for prior distributions using a Markov chain Monte Carlo (MCMC) algorithm. A set of posterior means were defined in WinBUGS 1.4.3®. After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the parameters. Thereafter, a spatial residual trend analyses was used to evaluate variance uncertainty propagation in the model using an autocovariance error matrix. Results By specifying coefficient estimates in a Bayesian framework, the covariate number of tillers was found to be a significant predictor, positively associated with An. arabiensis aquatic habitats. The spatial filter models accounted for approximately 19% redundant locational information in the ecological sampled An. arabiensis aquatic habitat data. In the residual error estimation model there was significant positive autocorrelation (i.e., clustering of habitats in geographic space) based on log-transformed larval/pupal data and the sampled covariate depth of habitat. Conclusion An autocorrelation error covariance matrix and a spatial filter analyses can prioritize mosquito control strategies by providing a computationally attractive and feasible description of variance uncertainty estimates for correctly identifying clusters of prolific An. arabiensis aquatic habitats based on larval/pupal productivity. PMID:19772590

  11. Genetic parameters for growth characteristics of free-range chickens under univariate random regression models.

    PubMed

    Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B

    2016-09-01

    Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data have become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h(2) = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that selection for body weight at all ages can be used as a selection criteria. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that genetic gain for body weight can be achieved by selection. Also, selection for body weight at 42 days of age can be maintained as a selection criterion. © 2016 Poultry Science Association Inc.

  12. Screening-level models to estimate partition ratios of organic chemicals between polymeric materials, air and water.

    PubMed

    Reppas-Chrysovitsinos, Efstathios; Sobek, Anna; MacLeod, Matthew

    2016-06-15

    Polymeric materials flowing through the technosphere are repositories of organic chemicals throughout their life cycle. Equilibrium partition ratios of organic chemicals between these materials and air (KMA) or water (KMW) are required for models of fate and transport, high-throughput exposure assessment and passive sampling. KMA and KMW have been measured for a growing number of chemical/material combinations, but significant data gaps still exist. We assembled a database of 363 KMA and 910 KMW measurements for 446 individual compounds and nearly 40 individual polymers and biopolymers, collected from 29 studies. We used the EPI Suite and ABSOLV software packages to estimate physicochemical properties of the compounds and we employed an empirical correlation based on Trouton's rule to adjust the measured KMA and KMW values to a standard reference temperature of 298 K. Then, we used a thermodynamic triangle with Henry's law constant to calculate a complete set of 1273 KMA and KMW values. Using simple linear regression, we developed a suite of single parameter linear free energy relationship (spLFER) models to estimate KMA from the EPI Suite-estimated octanol-air partition ratio (KOA) and KMW from the EPI Suite-estimated octanol-water (KOW) partition ratio. Similarly, using multiple linear regression, we developed a set of polyparameter linear free energy relationship (ppLFER) models to estimate KMA and KMW from ABSOLV-estimated Abraham solvation parameters. We explored the two LFER approaches to investigate (1) their performance in estimating partition ratios, and (2) uncertainties associated with treating all different polymers as a single "bulk" polymeric material compartment. The models we have developed are suitable for screening assessments of the tendency for organic chemicals to be emitted from materials, and for use in multimedia models of the fate of organic chemicals in the indoor environment. In screening applications we recommend that KMA and KMW be modeled as 0.06 ×KOA and 0.06 ×KOW respectively, with an uncertainty range of a factor of 15.

  13. Retinal blood vessel segmentation in high resolution fundus photographs using automated feature parameter estimation

    NASA Astrophysics Data System (ADS)

    Orlando, José Ignacio; Fracchia, Marcos; del Río, Valeria; del Fresno, Mariana

    2017-11-01

    Several ophthalmological and systemic diseases are manifested through pathological changes in the properties and the distribution of the retinal blood vessels. The characterization of such alterations requires the segmentation of the vasculature, which is a tedious and time-consuming task that is infeasible to be performed manually. Numerous attempts have been made to propose automated methods for segmenting the retinal vasculature from fundus photographs, although their application in real clinical scenarios is usually limited by their ability to deal with images taken at different resolutions. This is likely due to the large number of parameters that have to be properly calibrated according to each image scale. In this paper we propose to apply a novel strategy for automated feature parameter estimation, combined with a vessel segmentation method based on fully connected conditional random fields. The estimation model is learned by linear regression from structural properties of the images and known optimal configurations, that were previously obtained for low resolution data sets. Our experiments in high resolution images show that this approach is able to estimate appropriate configurations that are suitable for performing the segmentation task without requiring to re-engineer parameters. Furthermore, our combined approach reported state of the art performance on the benchmark data set HRF, as measured in terms of the F1-score and the Matthews correlation coefficient.

  14. The association between quality of care and quality of life in long-stay nursing home residents with preserved cognition.

    PubMed

    Kim, Sun Jung; Park, Eun-Cheol; Kim, Sulgi; Nakagawa, Shunichi; Lung, John; Choi, Jong Bum; Ryu, Woo Sang; Min, Too Jae; Shin, Hyun Phil; Kim, Kyudam; Yoo, Ji Won

    2014-03-01

    To assess the overall quality of life of long-stay nursing home residents with preserved cognition, to examine whether the Centers for Medicare and Medicaid Service's Nursing Home Compare 5-star quality rating system reflects the overall quality of life of such residents, and to examine whether residents' demographics and clinical characteristics affect their quality of life. Quality of life was measured using the Participant Outcomes and Status Measures-Nursing Facility survey, which has 10 sections and 63 items. Total scores range from 20 (lowest possible quality of life) to 100 (highest). Long-stay nursing home residents with preserved cognition (n = 316) were interviewed. The average quality- of-life score was 71.4 (SD: 7.6; range: 45.1-93.0). Multilevel regression models revealed that quality of life was associated with physical impairment (parameter estimate = -0.728; P = .04) and depression (parameter estimate = -3.015; P = .01) but not Nursing Home Compare's overall star rating (parameter estimate = 0.683; P = .12) and not pain (parameter estimate = -0.705; P = .47). The 5-star quality rating system did not reflect the quality of life of long-stay nursing home residents with preserved cognition. Notably, pain was not associated with quality of life, but physical impairment and depression were. Copyright © 2014 American Medical Directors Association, Inc. Published by Elsevier Inc. All rights reserved.

  15. An empirical model for dissolution profile and its application to floating dosage forms.

    PubMed

    Weiss, Michael; Kriangkrai, Worawut; Sungthongjeen, Srisagul

    2014-06-02

    A sum of two inverse Gaussian functions is proposed as a highly flexible empirical model for fitting of in vitro dissolution profiles. The model was applied to quantitatively describe theophylline release from effervescent multi-layer coated floating tablets containing different amounts of the anti-tacking agents talc or glyceryl monostearate. Model parameters were estimated by nonlinear regression (mixed-effects modeling). The estimated parameters were used to determine the mean dissolution time, as well as to reconstruct the time course of release rate for each formulation, whereby the fractional release rate can serve as a diagnostic tool for classification of dissolution processes. The approach allows quantification of dissolution behavior and could provide additional insights into the underlying processes. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Space Shuttle Main Engine performance analysis

    NASA Technical Reports Server (NTRS)

    Santi, L. Michael

    1993-01-01

    For a number of years, NASA has relied primarily upon periodically updated versions of Rocketdyne's power balance model (PBM) to provide space shuttle main engine (SSME) steady-state performance prediction. A recent computational study indicated that PBM predictions do not satisfy fundamental energy conservation principles. More recently, SSME test results provided by the Technology Test Bed (TTB) program have indicated significant discrepancies between PBM flow and temperature predictions and TTB observations. Results of these investigations have diminished confidence in the predictions provided by PBM, and motivated the development of new computational tools for supporting SSME performance analysis. A multivariate least squares regression algorithm was developed and implemented during this effort in order to efficiently characterize TTB data. This procedure, called the 'gains model,' was used to approximate the variation of SSME performance parameters such as flow rate, pressure, temperature, speed, and assorted hardware characteristics in terms of six assumed independent influences. These six influences were engine power level, mixture ratio, fuel inlet pressure and temperature, and oxidizer inlet pressure and temperature. A BFGS optimization algorithm provided the base procedure for determining regression coefficients for both linear and full quadratic approximations of parameter variation. Statistical information relative to data deviation from regression derived relations was also computed. A new strategy for integrating test data with theoretical performance prediction was also investigated. The current integration procedure employed by PBM treats test data as pristine and adjusts hardware characteristics in a heuristic manner to achieve engine balance. Within PBM, this integration procedure is called 'data reduction.' By contrast, the new data integration procedure, termed 'reconciliation,' uses mathematical optimization techniques, and requires both measurement and balance uncertainty estimates. The reconciler attempts to select operational parameters that minimize the difference between theoretical prediction and observation. Selected values are further constrained to fall within measurement uncertainty limits and to satisfy fundamental physical relations (mass conservation, energy conservation, pressure drop relations, etc.) within uncertainty estimates for all SSME subsystems. The parameter selection problem described above is a traditional nonlinear programming problem. The reconciler employs a mixed penalty method to determine optimum values of SSME operating parameters associated with this problem formulation.

  17. Hybrid Simulation Modeling to Estimate U.S. Energy Elasticities

    NASA Astrophysics Data System (ADS)

    Baylin-Stern, Adam C.

    This paper demonstrates how an U.S. application of CIMS, a technologically explicit and behaviourally realistic energy-economy simulation model which includes macro-economic feedbacks, can be used to derive estimates of elasticity of substitution (ESUB) and autonomous energy efficiency index (AEEI) parameters. The ability of economies to reduce greenhouse gas emissions depends on the potential for households and industry to decrease overall energy usage, and move from higher to lower emissions fuels. Energy economists commonly refer to ESUB estimates to understand the degree of responsiveness of various sectors of an economy, and use estimates to inform computable general equilibrium models used to study climate policies. Using CIMS, I have generated a set of future, 'pseudo-data' based on a series of simulations in which I vary energy and capital input prices over a wide range. I then used this data set to estimate the parameters for transcendental logarithmic production functions using regression techniques. From the production function parameter estimates, I calculated an array of elasticity of substitution values between input pairs. Additionally, this paper demonstrates how CIMS can be used to calculate price-independent changes in energy-efficiency in the form of the AEEI, by comparing energy consumption between technologically frozen and 'business as usual' simulations. The paper concludes with some ideas for model and methodological improvement, and how these might figure into future work in the estimation of ESUBs from CIMS. Keywords: Elasticity of substitution; hybrid energy-economy model; translog; autonomous energy efficiency index; rebound effect; fuel switching.

  18. Minimization of model representativity errors in identification of point source emission from atmospheric concentration measurements

    NASA Astrophysics Data System (ADS)

    Sharan, Maithili; Singh, Amit Kumar; Singh, Sarvesh Kumar

    2017-11-01

    Estimation of an unknown atmospheric release from a finite set of concentration measurements is considered an ill-posed inverse problem. Besides ill-posedness, the estimation process is influenced by the instrumental errors in the measured concentrations and model representativity errors. The study highlights the effect of minimizing model representativity errors on the source estimation. This is described in an adjoint modelling framework and followed in three steps. First, an estimation of point source parameters (location and intensity) is carried out using an inversion technique. Second, a linear regression relationship is established between the measured concentrations and corresponding predicted using the retrieved source parameters. Third, this relationship is utilized to modify the adjoint functions. Further, source estimation is carried out using these modified adjoint functions to analyse the effect of such modifications. The process is tested for two well known inversion techniques, called renormalization and least-square. The proposed methodology and inversion techniques are evaluated for a real scenario by using concentrations measurements from the Idaho diffusion experiment in low wind stable conditions. With both the inversion techniques, a significant improvement is observed in the retrieval of source estimation after minimizing the representativity errors.

  19. Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Bi, Peng; Hiller, Janet

    2008-01-01

    This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.

  20. A comparison of time dependent Cox regression, pooled logistic regression and cross sectional pooling with simulations and an application to the Framingham Heart Study.

    PubMed

    Ngwa, Julius S; Cabral, Howard J; Cheng, Debbie M; Pencina, Michael J; Gagnon, David R; LaValley, Michael P; Cupples, L Adrienne

    2016-11-03

    Typical survival studies follow individuals to an event and measure explanatory variables for that event, sometimes repeatedly over the course of follow up. The Cox regression model has been used widely in the analyses of time to diagnosis or death from disease. The associations between the survival outcome and time dependent measures may be biased unless they are modeled appropriately. In this paper we explore the Time Dependent Cox Regression Model (TDCM), which quantifies the effect of repeated measures of covariates in the analysis of time to event data. This model is commonly used in biomedical research but sometimes does not explicitly adjust for the times at which time dependent explanatory variables are measured. This approach can yield different estimates of association compared to a model that adjusts for these times. In order to address the question of how different these estimates are from a statistical perspective, we compare the TDCM to Pooled Logistic Regression (PLR) and Cross Sectional Pooling (CSP), considering models that adjust and do not adjust for time in PLR and CSP. In a series of simulations we found that time adjusted CSP provided identical results to the TDCM while the PLR showed larger parameter estimates compared to the time adjusted CSP and the TDCM in scenarios with high event rates. We also observed upwardly biased estimates in the unadjusted CSP and unadjusted PLR methods. The time adjusted PLR had a positive bias in the time dependent Age effect with reduced bias when the event rate is low. The PLR methods showed a negative bias in the Sex effect, a subject level covariate, when compared to the other methods. The Cox models yielded reliable estimates for the Sex effect in all scenarios considered. We conclude that survival analyses that explicitly account in the statistical model for the times at which time dependent covariates are measured provide more reliable estimates compared to unadjusted analyses. We present results from the Framingham Heart Study in which lipid measurements and myocardial infarction data events were collected over a period of 26 years.

  1. Mathematical model of optical signals emitted by electrical discharges occuring in electroinsulating oil

    NASA Astrophysics Data System (ADS)

    Kozioł, Michał

    2017-10-01

    The article presents a parametric model describing the registered distributions spectrum of optical radiation emitted by electrical discharges generated in the systems: the needle- needle, the needleplate and in the system for surface discharges. Generation of electrical discharges and registration of the emitted radiation was carried out in three different electrical insulating oils: fabric new, operated (used) and operated with air bubbles. For registration of optical spectra in the range of ultraviolet, visible and near infrared a high resolution spectrophotometer was. The proposed mathematical model was developed in a regression procedure using gauss-sigmoid type function. The dependent variable was the intensity of the recorded optical signals. In order to estimate the optimal parameters of the model an evolutionary algorithm was used. The optimization procedure was performed in Matlab environment. For determination of the matching quality of theoretical parameters of the regression function to the empirical data determination coefficient R2 was applied.

  2. Analytical and regression models of glass rod drawing process

    NASA Astrophysics Data System (ADS)

    Alekseeva, L. B.

    2018-03-01

    The process of drawing glass rods (light guides) is being studied. The parameters of the process affecting the quality of the light guide have been determined. To solve the problem, mathematical models based on general equations of continuum mechanics are used. The conditions for the stable flow of the drawing process have been found, which are determined by the stability of the motion of the glass mass in the formation zone to small uncontrolled perturbations. The sensitivity of the formation zone to perturbations of the drawing speed and viscosity is estimated. Experimental models of the drawing process, based on the regression analysis methods, have been obtained. These models make it possible to customize a specific production process to obtain light guides of the required quality. They allow one to find the optimum combination of process parameters in the chosen area and to determine the required accuracy of maintaining them at a specified level.

  3. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    USGS Publications Warehouse

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.

  4. Optical Method for Estimating the Chlorophyll Contents in Plant Leaves.

    PubMed

    Pérez-Patricio, Madaín; Camas-Anzueto, Jorge Luis; Sanchez-Alegría, Avisaí; Aguilar-González, Abiel; Gutiérrez-Miceli, Federico; Escobar-Gómez, Elías; Voisin, Yvon; Rios-Rojas, Carlos; Grajales-Coutiño, Ruben

    2018-02-22

    This work introduces a new vision-based approach for estimating chlorophyll contents in a plant leaf using reflectance and transmittance as base parameters. Images of the top and underside of the leaf are captured. To estimate the base parameters (reflectance/transmittance), a novel optical arrangement is proposed. The chlorophyll content is then estimated by using linear regression where the inputs are the reflectance and transmittance of the leaf. Performance of the proposed method for chlorophyll content estimation was compared with a spectrophotometer and a Soil Plant Analysis Development (SPAD) meter. Chlorophyll content estimation was realized for Lactuca sativa L., Azadirachta indica , Canavalia ensiforme , and Lycopersicon esculentum . Experimental results showed that-in terms of accuracy and processing speed-the proposed algorithm outperformed many of the previous vision-based approach methods that have used SPAD as a reference device. On the other hand, the accuracy reached is 91% for crops such as Azadirachta indica , where the chlorophyll value was obtained using the spectrophotometer. Additionally, it was possible to achieve an estimation of the chlorophyll content in the leaf every 200 ms with a low-cost camera and a simple optical arrangement. This non-destructive method increased accuracy in the chlorophyll content estimation by using an optical arrangement that yielded both the reflectance and transmittance information, while the required hardware is cheap.

  5. Simplified large African carnivore density estimators from track indices.

    PubMed

    Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J

    2016-01-01

    The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for ß in the linear model y  =  αx  + ß, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant ( P  > 0.05). The other four models with intercept and the six models thorough origin were all significant ( P  < 0.05). The models using linear regression with intercept all included zero in the confidence interval for ß and the null hypothesis that ß = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km 2 or higher. To improve the current models, we need independent data to validate the models and data to test for non-linear relationship between track indices and true density at low densities.

  6. Calibrating the Decline Rate - Peak Luminosity Relation for Type Ia Supernovae

    NASA Astrophysics Data System (ADS)

    Rust, Bert W.; Pruzhinskaya, Maria V.; Thijsse, Barend J.

    2015-08-01

    The correlation between peak luminosity and rate of decline in luminosity for Type I supernovae was first studied by B. W. Rust [Ph.D. thesis, Univ. of Illinois (1974) ORNL-4953] and Yu. P. Pskovskii [Sov. Astron., 21 (1977) 675] in the 1970s. Their work was little-noted until Phillips rediscovered the correlation in 1993 [ApJ, 413 (1993) L105] and attempted to derive a calibration relation using a difference quotient approximation Δm15(B) to the decline rate after peak luminosity Mmax(B). Numerical differentiation of data containing measuring errors is a notoriously unstable calculation, but Δm15(B) remains the parameter of choice for most calibration methods developed since 1993. To succeed, it should be computed from good functional fits to the lightcurves, but most workers never exhibit their fits. In the few instances where they have, the fits are not very good. Some of the 9 supernovae in the Phillips study required extinction corrections in their estimates of the Mmax(B), and so were not appropriate for establishing a calibration relation. Although the relative uncertainties in his Δm15(B) estimates were comparable to those in his Mmax(B) estimates, he nevertheless used simple linear regression of the latter on the former, rather than major-axis regression (total least squares) which would have been more appropriate.Here we determine some new calibration relations using a sample of nearby "pure" supernovae suggested by M. V. Pruzhinskaya [Astron. Lett., 37 (2011) 663]. Their parent galaxies are all in the NED collection, with good distance estimates obtained by several different methods. We fit each lightcurve with an optimal regression spline obtained by B. J. Thijsse's spline2 [Comp. in Sci. & Eng., 10 (2008) 49]. The fits, which explain more that 99% of the variance in each case, are better than anything heretofore obtained by stretching "template" lightcurves or fitting combinations of standard lightcurves. We use the fits to compute estimates of Δm15(B) and some other calibration parameters suggested by Pskovskii [Sov. Astron., 28 (1984) 858] and compare their utility for cosmological testing.

  7. CEval: All-in-one software for data processing and statistical evaluations in affinity capillary electrophoresis.

    PubMed

    Dubský, Pavel; Ördögová, Magda; Malý, Michal; Riesová, Martina

    2016-05-06

    We introduce CEval software (downloadable for free at echmet.natur.cuni.cz) that was developed for quicker and easier electrophoregram evaluation and further data processing in (affinity) capillary electrophoresis. This software allows for automatic peak detection and evaluation of common peak parameters, such as its migration time, area, width etc. Additionally, the software includes a nonlinear regression engine that performs peak fitting with the Haarhoff-van der Linde (HVL) function, including automated initial guess of the HVL function parameters. HVL is a fundamental peak-shape function in electrophoresis, based on which the correct effective mobility of the analyte represented by the peak is evaluated. Effective mobilities of an analyte at various concentrations of a selector can be further stored and plotted in an affinity CE mode. Consequently, the mobility of the free analyte, μA, mobility of the analyte-selector complex, μAS, and the apparent complexation constant, K('), are first guessed automatically from the linearized data plots and subsequently estimated by the means of nonlinear regression. An option that allows two complexation dependencies to be fitted at once is especially convenient for enantioseparations. Statistical processing of these data is also included, which allowed us to: i) express the 95% confidence intervals for the μA, μAS and K(') least-squares estimates, ii) do hypothesis testing on the estimated parameters for the first time. We demonstrate the benefits of the CEval software by inspecting complexation of tryptophan methyl ester with two cyclodextrins, neutral heptakis(2,6-di-O-methyl)-β-CD and charged heptakis(6-O-sulfo)-β-CD. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Large signal-to-noise ratio quantification in MLE for ARARMAX models

    NASA Astrophysics Data System (ADS)

    Zou, Yiqun; Tang, Xiafei

    2014-06-01

    It has been shown that closed-loop linear system identification by indirect method can be generally transferred to open-loop ARARMAX (AutoRegressive AutoRegressive Moving Average with eXogenous input) estimation. For such models, the gradient-related optimisation with large enough signal-to-noise ratio (SNR) can avoid the potential local convergence in maximum likelihood estimation. To ease the application of this condition, the threshold SNR needs to be quantified. In this paper, we build the amplitude coefficient which is an equivalence to the SNR and prove the finiteness of the threshold amplitude coefficient within the stability region. The quantification of threshold is achieved by the minimisation of an elaborately designed multi-variable cost function which unifies all the restrictions on the amplitude coefficient. The corresponding algorithm based on two sets of physically realisable system input-output data details the minimisation and also points out how to use the gradient-related method to estimate ARARMAX parameters when local minimum is present as the SNR is small. Then, the algorithm is tested on a theoretical AutoRegressive Moving Average with eXogenous input model for the derivation of the threshold and a gas turbine engine real system for model identification, respectively. Finally, the graphical validation of threshold on a two-dimensional plot is discussed.

  9. Random regression analyses using B-splines to model growth of Australian Angus cattle

    PubMed Central

    Meyer, Karin

    2005-01-01

    Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error. PMID:16093011

  10. Bayesian semi-parametric analysis of Poisson change-point regression models: application to policy making in Cali, Colombia.

    PubMed

    Park, Taeyoung; Krafty, Robert T; Sánchez, Alvaro I

    2012-07-27

    A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the nonconstant pattern of a log baseline rate is modeled with a nonparametric step function, the resulting semi-parametric model involves a model component of varying dimension and thus requires a sophisticated varying-dimensional inference to obtain correct estimates of model parameters of fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art MCMC-type algorithm based on partial collapse. The proposed model and methods are used to investigate an association between daily homicide rates in Cali, Colombia and policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependencies on alcohol sales to the health of the public.

  11. Regression analysis of mixed panel count data with dependent terminal events.

    PubMed

    Yu, Guanglei; Zhu, Liang; Li, Yang; Sun, Jianguo; Robison, Leslie L

    2017-05-10

    Event history studies are commonly conducted in many fields, and a great deal of literature has been established for the analysis of the two types of data commonly arising from these studies: recurrent event data and panel count data. The former arises if all study subjects are followed continuously, while the latter means that each study subject is observed only at discrete time points. In reality, a third type of data, a mixture of the two types of the data earlier, may occur and furthermore, as with the first two types of the data, there may exist a dependent terminal event, which may preclude the occurrences of recurrent events of interest. This paper discusses regression analysis of mixed recurrent event and panel count data in the presence of a terminal event and an estimating equation-based approach is proposed for estimation of regression parameters of interest. In addition, the asymptotic properties of the proposed estimator are established, and a simulation study conducted to assess the finite-sample performance of the proposed method suggests that it works well in practical situations. Finally, the methodology is applied to a childhood cancer study that motivated this study. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  12. Estimating soil temperature using neighboring station data via multi-nonlinear regression and artificial neural network models.

    PubMed

    Bilgili, Mehmet; Sahin, Besir; Sangun, Levent

    2013-01-01

    The aim of this study is to estimate the soil temperatures of a target station using only the soil temperatures of neighboring stations without any consideration of the other variables or parameters related to soil properties. For this aim, the soil temperatures were measured at depths of 5, 10, 20, 50, and 100 cm below the earth surface at eight measuring stations in Turkey. Firstly, the multiple nonlinear regression analysis was performed with the "Enter" method to determine the relationship between the values of target station and neighboring stations. Then, the stepwise regression analysis was applied to determine the best independent variables. Finally, an artificial neural network (ANN) model was developed to estimate the soil temperature of a target station. According to the derived results for the training data set, the mean absolute percentage error and correlation coefficient ranged from 1.45% to 3.11% and from 0.9979 to 0.9986, respectively, while corresponding ranges of 1.685-3.65% and 0.9988-0.9991, respectively, were obtained based on the testing data set. The obtained results show that the developed ANN model provides a simple and accurate prediction to determine the soil temperature. In addition, the missing data at the target station could be determined within a high degree of accuracy.

  13. Preliminary study of the association between the elimination parameters of phenytoin and phenobarbital.

    PubMed

    Methaneethorn, Janthima; Panomvana, Duangchit; Vachirayonstien, Thaveechai

    2017-09-26

    Therapeutic drug monitoring is essential for both phenytoin and phenobarbital therapy given their narrow therapeutic indexes. Nevertheless, the measurement of either phenytoin or phenobarbital concentrations might not be available in some rural hospitals. Information assisting individualized phenytoin and phenobarbital combination therapy is important. This study's objective was to determine the relationship between the maximum rate of metabolism of phenytoin (Vmax) and phenobarbital clearance (CLPB), which can serve as a guide to individualized drug therapy. Data on phenytoin and phenobarbital concentrations of 19 epileptic patients concurrently receiving both drugs were obtained from medical records. Phenytoin and phenobarbital pharmacokinetic parameters were studied at steady-state conditions. The relationship between the elimination parameters of both drugs was determined using simple linear regression. A high correlation coefficient between Vmax and CLPB was found [r=0.744; p<0.001 for Vmax (mg/kg/day) vs. CLPB (L/kg/day)]. Such a relatively strong linear relationship between the elimination parameters of both drugs indicates that Vmax might be predicted from CLPB and vice versa. Regression equations were established for estimating Vmax from CLPB, and vice versa in patients treated with combination of phenytoin and phenobarbital. These proposed equations can be of use in aiding individualized drug therapy.

  14. Regression analysis of longitudinal data with correlated censoring and observation times.

    PubMed

    Li, Yang; He, Xin; Wang, Haiying; Sun, Jianguo

    2016-07-01

    Longitudinal data occur in many fields such as the medical follow-up studies that involve repeated measurements. For their analysis, most existing approaches assume that the observation or follow-up times are independent of the response process either completely or given some covariates. In practice, it is apparent that this may not be true. In this paper, we present a joint analysis approach that allows the possible mutual correlations that can be characterized by time-dependent random effects. Estimating equations are developed for the parameter estimation and the resulted estimators are shown to be consistent and asymptotically normal. The finite sample performance of the proposed estimators is assessed through a simulation study and an illustrative example from a skin cancer study is provided.

  15. Application of nonlinear-regression methods to a ground-water flow model of the Albuquerque Basin, New Mexico

    USGS Publications Warehouse

    Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.

    1998-01-01

    This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940's, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground- water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located. Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface flow from adjacent regions; irrigation and septic field seepage; and leakage through the Rio Grande, canal, and Cochiti Reservoir beds. Ground water is discharged from the basin by withdrawal; evapotranspiration; subsurface flow; and flow to the Rio Grande, canals, and drains. The transient, three-dimensional numerical model of ground-water flow to which nonlinear-regression methods were applied simulates flow in the Albuquerque Basin from 1900 to March 1995. Six different basin subsurface configurations are considered in the model. These configurations are designed to test the effects of (1) varying the simulated basin thickness, (2) including a hypothesized hydrogeologic unit with large hydraulic conductivity in the western part of the basin (the west basin high-K zone), and (3) substantially lowering the simulated hydraulic conductivity of a fault in the western part of the basin (the low-K fault zone). The model with each of the subsurface configurations was calibrated using a nonlinear least- squares regression technique. The calibration data set includes 802 hydraulic-head measurements that provide broad spatial and temporal coverage of basin conditions, and one measurement of net flow from the Rio Grande and drains to the ground-water system in the Albuquerque area. Data are weighted on the basis of estimates of the standard deviations of measurement errors. The 10 to 12 parameters to which the calibration data as a whole are generally most sensitive were estimated by nonlinear regression, whereas the remaining model parameter values were specified. Results of model calibration indicate that the optimal parameter estimates as a whole are most reasonable in calibrations of the model with with configurations 3 (which contains 1,600-ft-thick basin deposits and the west basin high-K zone), 4 (which contains 5,000-ft-thick basin de

  16. A non-linear regression method for CT brain perfusion analysis

    NASA Astrophysics Data System (ADS)

    Bennink, E.; Oosterbroek, J.; Viergever, M. A.; Velthuis, B. K.; de Jong, H. W. A. M.

    2015-03-01

    CT perfusion (CTP) imaging allows for rapid diagnosis of ischemic stroke. Generation of perfusion maps from CTP data usually involves deconvolution algorithms providing estimates for the impulse response function in the tissue. We propose the use of a fast non-linear regression (NLR) method that we postulate has similar performance to the current academic state-of-art method (bSVD), but that has some important advantages, including the estimation of vascular permeability, improved robustness to tracer-delay, and very few tuning parameters, that are all important in stroke assessment. The aim of this study is to evaluate the fast NLR method against bSVD and a commercial clinical state-of-art method. The three methods were tested against a published digital perfusion phantom earlier used to illustrate the superiority of bSVD. In addition, the NLR and clinical methods were also tested against bSVD on 20 clinical scans. Pearson correlation coefficients were calculated for each of the tested methods. All three methods showed high correlation coefficients (>0.9) with the ground truth in the phantom. With respect to the clinical scans, the NLR perfusion maps showed higher correlation with bSVD than the perfusion maps from the clinical method. Furthermore, the perfusion maps showed that the fast NLR estimates are robust to tracer-delay. In conclusion, the proposed fast NLR method provides a simple and flexible way of estimating perfusion parameters from CT perfusion scans, with high correlation coefficients. This suggests that it could be a better alternative to the current clinical and academic state-of-art methods.

  17. Estimating suspended sediment load with multivariate adaptive regression spline, teaching-learning based optimization, and artificial bee colony models.

    PubMed

    Yilmaz, Banu; Aras, Egemen; Nacar, Sinan; Kankal, Murat

    2018-05-23

    The functional life of a dam is often determined by the rate of sediment delivery to its reservoir. Therefore, an accurate estimate of the sediment load in rivers with dams is essential for designing and predicting a dam's useful lifespan. The most credible method is direct measurements of sediment input, but this can be very costly and it cannot always be implemented at all gauging stations. In this study, we tested various regression models to estimate suspended sediment load (SSL) at two gauging stations on the Çoruh River in Turkey, including artificial bee colony (ABC), teaching-learning-based optimization algorithm (TLBO), and multivariate adaptive regression splines (MARS). These models were also compared with one another and with classical regression analyses (CRA). Streamflow values and previously collected data of SSL were used as model inputs with predicted SSL data as output. Two different training and testing dataset configurations were used to reinforce the model accuracy. For the MARS method, the root mean square error value was found to range between 35% and 39% for the test two gauging stations, which was lower than errors for other models. Error values were even lower (7% to 15%) using another dataset. Our results indicate that simultaneous measurements of streamflow with SSL provide the most effective parameter for obtaining accurate predictive models and that MARS is the most accurate model for predicting SSL. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery.

    PubMed

    Liu, Han; Wang, Lie; Zhao, Tuo

    2015-08-01

    We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models. Compared with existing methods, CMR calibrates regularization for each regression task with respect to its noise level so that it simultaneously attains improved finite-sample performance and tuning insensitiveness. Theoretically, we provide sufficient conditions under which CMR achieves the optimal rate of convergence in parameter estimation. Computationally, we propose an efficient smoothed proximal gradient algorithm with a worst-case numerical rate of convergence O (1/ ϵ ), where ϵ is a pre-specified accuracy of the objective function value. We conduct thorough numerical simulations to illustrate that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR to solve a brain activity prediction problem and find that it is as competitive as a handcrafted model created by human experts. The R package camel implementing the proposed method is available on the Comprehensive R Archive Network http://cran.r-project.org/web/packages/camel/.

  19. One-Dimensional Transport with Inflow and Storage (OTIS): A Solute Transport Model for Streams and Rivers

    USGS Publications Warehouse

    Runkel, Robert L.

    1998-01-01

    OTIS is a mathematical simulation model used to characterize the fate and transport of water-borne solutes in streams and rivers. The governing equation underlying the model is the advection-dispersion equation with additional terms to account for transient storage, lateral inflow, first-order decay, and sorption. This equation and the associated equations describing transient storage and sorption are solved using a Crank-Nicolson finite-difference solution. OTIS may be used in conjunction with data from field-scale tracer experiments to quantify the hydrologic parameters affecting solute transport. This application typically involves a trial-and-error approach wherein parameter estimates are adjusted to obtain an acceptable match between simulated and observed tracer concentrations. Additional applications include analyses of nonconservative solutes that are subject to sorption processes or first-order decay. OTIS-P, a modified version of OTIS, couples the solution of the governing equation with a nonlinear regression package. OTIS-P determines an optimal set of parameter estimates that minimize the squared differences between the simulated and observed concentrations, thereby automating the parameter estimation process. This report details the development and application of OTIS and OTIS-P. Sections of the report describe model theory, input/output specifications, sample applications, and installation instructions.

  20. Retrieving Temperature Anomaly in the Global Subsurface and Deeper Ocean From Satellite Observations

    NASA Astrophysics Data System (ADS)

    Su, Hua; Li, Wene; Yan, Xiao-Hai

    2018-01-01

    Retrieving the subsurface and deeper ocean (SDO) dynamic parameters from satellite observations is crucial for effectively understanding ocean interior anomalies and dynamic processes, but it is challenging to accurately estimate the subsurface thermal structure over the global scale from sea surface parameters. This study proposes a new approach based on Random Forest (RF) machine learning to retrieve subsurface temperature anomaly (STA) in the global ocean from multisource satellite observations including sea surface height anomaly (SSHA), sea surface temperature anomaly (SSTA), sea surface salinity anomaly (SSSA), and sea surface wind anomaly (SSWA) via in situ Argo data for RF training and testing. RF machine-learning approach can accurately retrieve the STA in the global ocean from satellite observations of sea surface parameters (SSHA, SSTA, SSSA, SSWA). The Argo STA data were used to validate the accuracy and reliability of the results from the RF model. The results indicated that SSHA, SSTA, SSSA, and SSWA together are useful parameters for detecting SDO thermal information and obtaining accurate STA estimations. The proposed method also outperformed support vector regression (SVR) in global STA estimation. It will be a useful technique for studying SDO thermal variability and its role in global climate system from global-scale satellite observations.

Top