NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and Linear Pseudo model for nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedatistic regression model. Lemcoff [3] discussed a procedure to get linear pseudo model for nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et.al used empirical process techniques to study the asymptotic of the LSE (Least-squares estimation) for the fitting of nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
ERIC Educational Resources Information Center
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Kaneko, Hiromasa; Funatsu, Kimito
2013-09-23
We propose predictive performance criteria for nonlinear regression models without cross-validation. The proposed criteria are the determination coefficient and the root-mean-square error for the midpoints between k-nearest-neighbor data points. These criteria can be used to evaluate predictive ability after the regression models are updated, whereas cross-validation cannot be performed in such a situation. The proposed method is effective and helpful in handling big data when cross-validation cannot be applied. By analyzing data from numerical simulations and quantitative structural relationships, we confirm that the proposed criteria enable the predictive ability of the nonlinear regression models to be appropriately quantified.
Demidenko, Eugene
2017-09-01
The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike other approximations previously suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.
An Excel Solver Exercise to Introduce Nonlinear Regression
ERIC Educational Resources Information Center
Pinder, Jonathan P.
2013-01-01
Business students taking business analytics courses that have significant predictive modeling components, such as marketing research, data mining, forecasting, and advanced financial modeling, are introduced to nonlinear regression using application software that is a "black box" to the students. Thus, although correct models are…
Wu, Lingtao; Lord, Dominique
2017-05-01
This study further examined the use of regression models for developing crash modification factors (CMFs), specifically focusing on the misspecification in the link function. The primary objectives were to validate the accuracy of CMFs derived from the commonly used regression models (i.e., generalized linear models or GLMs with additive linear link functions) when some of the variables have nonlinear relationships and quantify the amount of bias as a function of the nonlinearity. Using the concept of artificial realistic data, various linear and nonlinear crash modification functions (CM-Functions) were assumed for three variables. Crash counts were randomly generated based on these CM-Functions. CMFs were then derived from regression models for three different scenarios. The results were compared with the assumed true values. The main findings are summarized as follows: (1) when some variables have nonlinear relationships with crash risk, the CMFs for these variables derived from the commonly used GLMs are all biased, especially around areas away from the baseline conditions (e.g., boundary areas); (2) with the increase in nonlinearity (i.e., nonlinear relationship becomes stronger), the bias becomes more significant; (3) the quality of CMFs for other variables having linear relationships can be influenced when mixed with those having nonlinear relationships, but the accuracy may still be acceptable; and (4) the misuse of the link function for one or more variables can also lead to biased estimates for other parameters. This study raised the importance of the link function when using regression models for developing CMFs. Copyright © 2017 Elsevier Ltd. All rights reserved.
Yobbi, D.K.
2000-01-01
A nonlinear least-squares regression technique for estimation of ground-water flow model parameters was applied to an existing model of the regional aquifer system underlying west-central Florida. The regression technique minimizes the differences between measured and simulated water levels. Regression statistics, including parameter sensitivities and correlations, were calculated for reported parameter values in the existing model. Optimal parameter values for selected hydrologic variables of interest are estimated by nonlinear regression. Optimal estimates of parameter values are about 140 times greater than and about 0.01 times less than reported values. Independently estimating all parameters by nonlinear regression was impossible, given the existing zonation structure and number of observations, because of parameter insensitivity and correlation. Although the model yields parameter values similar to those estimated by other methods and reproduces the measured water levels reasonably accurately, a simpler parameter structure should be considered. Some possible ways of improving model calibration are to: (1) modify the defined parameter-zonation structure by omitting and/or combining parameters to be estimated; (2) carefully eliminate observation data based on evidence that they are likely to be biased; (3) collect additional water-level data; (4) assign values to insensitive parameters, and (5) estimate the most sensitive parameters first, then, using the optimized values for these parameters, estimate the entire data set.
Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald
2011-06-01
Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems.
2011-01-01
Background Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. Conclusions HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems. PMID:21627852
Method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1972-01-01
Two computer programs developed according to two general types of exponential models for conducting nonlinear exponential regression analysis are described. Least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. Program is written in FORTRAN 5 for the Univac 1108 computer.
USDA-ARS?s Scientific Manuscript database
Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...
NASA Astrophysics Data System (ADS)
Shi, Jinfei; Zhu, Songqing; Chen, Ruwen
2017-12-01
An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.
NASA Astrophysics Data System (ADS)
Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.
2017-01-01
One of the most noticeable features of sign-based statistical procedures is an opportunity to build an exact test for simple hypothesis testing of parameters in a regression model. In this article, we expanded a sing-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypothesis not only of regression parameters, but of noise parameters as well.
USDA-ARS?s Scientific Manuscript database
Non-linear regression techniques are used widely to fit weed field emergence patterns to soil microclimatic indices using S-type functions. Artificial neural networks present interesting and alternative features for such modeling purposes. In this work, a univariate hydrothermal-time based Weibull m...
A controlled experiment in ground water flow model calibration
Hill, M.C.; Cooley, R.L.; Pollock, D.W.
1998-01-01
Nonlinear regression was introduced to ground water modeling in the 1970s, but has been used very little to calibrate numerical models of complicated ground water systems. Apparently, nonlinear regression is thought by many to be incapable of addressing such complex problems. With what we believe to be the most complicated synthetic test case used for such a study, this work investigates using nonlinear regression in ground water model calibration. Results of the study fall into two categories. First, the study demonstrates how systematic use of a well designed nonlinear regression method can indicate the importance of different types of data and can lead to successive improvement of models and their parameterizations. Our method differs from previous methods presented in the ground water literature in that (1) weighting is more closely related to expected data errors than is usually the case; (2) defined diagnostic statistics allow for more effective evaluation of the available data, the model, and their interaction; and (3) prior information is used more cautiously. Second, our results challenge some commonly held beliefs about model calibration. For the test case considered, we show that (1) field measured values of hydraulic conductivity are not as directly applicable to models as their use in some geostatistical methods imply; (2) a unique model does not necessarily need to be identified to obtain accurate predictions; and (3) in the absence of obvious model bias, model error was normally distributed. The complexity of the test case involved implies that the methods used and conclusions drawn are likely to be powerful in practice.Nonlinear regression was introduced to ground water modeling in the 1970s, but has been used very little to calibrate numerical models of complicated ground water systems. Apparently, nonlinear regression is thought by many to be incapable of addressing such complex problems. With what we believe to be the most complicated synthetic test case used for such a study, this work investigates using nonlinear regression in ground water model calibration. Results of the study fall into two categories. First, the study demonstrates how systematic use of a well designed nonlinear regression method can indicate the importance of different types of data and can lead to successive improvement of models and their parameterizations. Our method differs from previous methods presented in the ground water literature in that (1) weighting is more closely related to expected data errors than is usually the case; (2) defined diagnostic statistics allow for more effective evaluation of the available data, the model, and their interaction; and (3) prior information is used more cautiously. Second, our results challenge some commonly held beliefs about model calibration. For the test case considered, we show that (1) field measured values of hydraulic conductivity are not as directly applicable to models as their use in some geostatistical methods imply; (2) a unique model does not necessarily need to be identified to obtain accurate predictions; and (3) in the absence of obvious model bias, model error was normally distributed. The complexity of the test case involved implies that the methods used and conclusions drawn are likely to be powerful in practice.
Kumar, K Vasanth; Sivanesan, S
2006-08-25
Pseudo second order kinetic expressions of Ho, Sobkowsk and Czerwinski, Blanachard et al. and Ritchie were fitted to the experimental kinetic data of malachite green onto activated carbon by non-linear and linear method. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo second order model were the same. Non-linear regression analysis showed that both Blanachard et al. and Ho have similar ideas on the pseudo second order model but with different assumptions. The best fit of experimental data in Ho's pseudo second order expression by linear and non-linear regression method showed that Ho pseudo second order model was a better kinetic expression when compared to other pseudo second order kinetic expressions. The amount of dye adsorbed at equilibrium, q(e), was predicted from Ho pseudo second order expression and were fitted to the Langmuir, Freundlich and Redlich Peterson expressions by both linear and non-linear method to obtain the pseudo isotherms. The best fitting pseudo isotherm was found to be the Langmuir and Redlich Peterson isotherm. Redlich Peterson is a special case of Langmuir when the constant g equals unity.
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, Richard M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
Kernel Partial Least Squares for Nonlinear Regression and Discrimination
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Clancy, Daniel (Technical Monitor)
2002-01-01
This paper summarizes recent results on applying the method of partial least squares (PLS) in a reproducing kernel Hilbert space (RKHS). A previously proposed kernel PLS regression model was proven to be competitive with other regularized regression methods in RKHS. The family of nonlinear kernel-based PLS models is extended by considering the kernel PLS method for discrimination. Theoretical and experimental results on a two-class discrimination problem indicate usefulness of the method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Bin; Research Center of Applied Statistics, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013; Lin, Boqiang, E-mail: bqlin@xmu.edu.cn
China is currently the world's largest carbon dioxide (CO{sub 2}) emitter. Moreover, total energy consumption and CO{sub 2} emissions in China will continue to increase due to the rapid growth of industrialization and urbanization. Therefore, vigorously developing the high–tech industry becomes an inevitable choice to reduce CO{sub 2} emissions at the moment or in the future. However, ignoring the existing nonlinear links between economic variables, most scholars use traditional linear models to explore the impact of the high–tech industry on CO{sub 2} emissions from an aggregate perspective. Few studies have focused on nonlinear relationships and regional differences in China. Basedmore » on panel data of 1998–2014, this study uses the nonparametric additive regression model to explore the nonlinear effect of the high–tech industry from a regional perspective. The estimated results show that the residual sum of squares (SSR) of the nonparametric additive regression model in the eastern, central and western regions are 0.693, 0.054 and 0.085 respectively, which are much less those that of the traditional linear regression model (3.158, 4.227 and 7.196). This verifies that the nonparametric additive regression model has a better fitting effect. Specifically, the high–tech industry produces an inverted “U–shaped” nonlinear impact on CO{sub 2} emissions in the eastern region, but a positive “U–shaped” nonlinear effect in the central and western regions. Therefore, the nonlinear impact of the high–tech industry on CO{sub 2} emissions in the three regions should be given adequate attention in developing effective abatement policies. - Highlights: • The nonlinear effect of the high–tech industry on CO{sub 2} emissions was investigated. • The high–tech industry yields an inverted “U–shaped” effect in the eastern region. • The high–tech industry has a positive “U–shaped” nonlinear effect in other regions. • The linear impact of the high–tech industry in the eastern region is the strongest.« less
Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755
Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.
Hu, Yi-Chung
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.
Lopes, Marta B; Calado, Cecília R C; Figueiredo, Mário A T; Bioucas-Dias, José M
2017-06-01
The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid. Despite being successful in the presence of small nonlinearities, linear methods may fail in the presence of strong nonlinearities. This paper studies the potential usefulness of nonlinear regression methods for predicting, from in situ near-infrared (NIR) and mid-infrared (MIR) spectra acquired in high-throughput mode, biomass and plasmid concentrations in Escherichia coli DH5-α cultures producing the plasmid model pVAX-LacZ. The linear methods PLS and ridge regression (RR) are compared with their kernel (nonlinear) versions, kPLS and kRR, as well as with the (also nonlinear) relevance vector machine (RVM) and Gaussian process regression (GPR). For the systems studied, RR provided better predictive performances compared to the remaining methods. Moreover, the results point to further investigation based on larger data sets whenever differences in predictive accuracy between a linear method and its kernelized version could not be found. The use of nonlinear methods, however, shall be judged regarding the additional computational cost required to tune their additional parameters, especially when the less computationally demanding linear methods herein studied are able to successfully monitor the variables under study.
An evaluation of bias in propensity score-adjusted non-linear regression models.
Wan, Fei; Mitra, Nandita
2018-03-01
Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
Sridhar, Upasana Manimegalai; Govindarajan, Anand; Rhinehart, R Russell
2016-01-01
This work reveals the applicability of a relatively new optimization technique, Leapfrogging, for both nonlinear regression modeling and a methodology for nonlinear model-predictive control. Both are relatively simple, yet effective. The application on a nonlinear, pilot-scale, shell-and-tube heat exchanger reveals practicability of the techniques. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Drabinová, Adéla; Martinková, Patrícia
2017-01-01
In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items with presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of method based on logistic regression. As a non-IRT approach, NLR can…
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET d ) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET d based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LET d based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
New methods of testing nonlinear hypothesis using iterative NLLS estimator
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.
2017-11-01
This research paper discusses the method of testing nonlinear hypothesis using iterative Nonlinear Least Squares (NLLS) estimator. Takeshi Amemiya [1] explained this method. However in the present research paper, a modified Wald test statistic due to Engle, Robert [6] is proposed to test the nonlinear hypothesis using iterative NLLS estimator. An alternative method for testing nonlinear hypothesis using iterative NLLS estimator based on nonlinear hypothesis using iterative NLLS estimator based on nonlinear studentized residuals has been proposed. In this research article an innovative method of testing nonlinear hypothesis using iterative restricted NLLS estimator is derived. Pesaran and Deaton [10] explained the methods of testing nonlinear hypothesis. This paper uses asymptotic properties of nonlinear least squares estimator proposed by Jenrich [8]. The main purpose of this paper is to provide very innovative methods of testing nonlinear hypothesis using iterative NLLS estimator, iterative NLLS estimator based on nonlinear studentized residuals and iterative restricted NLLS estimator. Eakambaram et al. [12] discussed least absolute deviation estimations versus nonlinear regression model with heteroscedastic errors and also they studied the problem of heteroscedasticity with reference to nonlinear regression models with suitable illustration. William Grene [13] examined the interaction effect in nonlinear models disused by Ai and Norton [14] and suggested ways to examine the effects that do not involve statistical testing. Peter [15] provided guidelines for identifying composite hypothesis and addressing the probability of false rejection for multiple hypotheses.
The ice age cycle and the deglaciations: an application of nonlinear regression modelling
NASA Astrophysics Data System (ADS)
Dalgleish, A. N.; Boulton, G. S.; Renshaw, E.
2000-03-01
We have applied the nonlinear regression technique known as additivity and variance stabilisation (AVAS) to time series which reflect Earth's climate over the last 600 ka. AVAS estimates a smooth, nonlinear transform for each variable, under the assumption of an additive model. The Earth's orbital parameters and insolation variations have been used as regression variables. Analysis of the contribution of each variable shows that the deglaciations are characterised by periods of increasing obliquity and perihelion approaching the vernal equinox, but not by any systematic change in eccentricity. The magnitude of insolation changes also plays no role. By approximating the transforms we can obtain a future prediction, with a glacial maximum at 60 ka AP, and a subsequent obliquity and precession forced deglaciation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harlim, John, E-mail: jharlim@psu.edu; Mahdi, Adam, E-mail: amahdi@ncsu.edu; Majda, Andrew J., E-mail: jonjon@cims.nyu.edu
2014-01-15
A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partialmore » noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.« less
NASA Astrophysics Data System (ADS)
Song, Lanlan
2017-04-01
Nitrous oxide is much more potent greenhouse gas than carbon dioxide. However, the estimation of N2O flux is usually clouded with uncertainty, mainly due to high spatial and temporal variations. This hampers the development of general mechanistic models for N2O emission as well, as most previously developed models were empirical or exhibited low predictability with numerous assumptions. In this study, we tested General Regression Neural Networks (GRNN) as an alternative to classic empirical models for simulating N2O emission in riparian zones of Reservoirs. GRNN and nonlinear regression (NLR) were applied to estimate the N2O flux of 1-year observations in riparian zones of Three Gorge Reservoir. NLR resulted in lower prediction power and higher residuals compared to GRNN. Although nonlinear regression model estimated similar average values of N2O, it could not capture the fluctuation patterns accurately. In contrast, GRNN model achieved a fairly high predictability, with an R2 of 0.59 for model validation, 0.77 for model calibration (training), and a low root mean square error (RMSE), indicating a high capacity to simulate the dynamics of N2O flux. According to a sensitivity analysis of the GRNN, nonlinear relationships between input variables and N2O flux were well explained. Our results suggest that the GRNN developed in this study has a greater performance in simulating variations in N2O flux than nonlinear regressions.
Igne, Benoît; Drennen, James K; Anderson, Carl A
2014-01-01
Changes in raw materials and process wear and tear can have significant effects on the prediction error of near-infrared calibration models. When the variability that is present during routine manufacturing is not included in the calibration, test, and validation sets, the long-term performance and robustness of the model will be limited. Nonlinearity is a major source of interference. In near-infrared spectroscopy, nonlinearity can arise from light path-length differences that can come from differences in particle size or density. The usefulness of support vector machine (SVM) regression to handle nonlinearity and improve the robustness of calibration models in scenarios where the calibration set did not include all the variability present in test was evaluated. Compared to partial least squares (PLS) regression, SVM regression was less affected by physical (particle size) and chemical (moisture) differences. The linearity of the SVM predicted values was also improved. Nevertheless, although visualization and interpretation tools have been developed to enhance the usability of SVM-based methods, work is yet to be done to provide chemometricians in the pharmaceutical industry with a regression method that can supplement PLS-based methods.
Learning accurate and interpretable models based on regularized random forests regression
2014-01-01
Background Many biology related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationship of data, but are generally hard for human to interpret. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120
Cooley, Richard L.
1982-01-01
Prior information on the parameters of a groundwater flow model can be used to improve parameter estimates obtained from nonlinear regression solution of a modeling problem. Two scales of prior information can be available: (1) prior information having known reliability (that is, bias and random error structure) and (2) prior information consisting of best available estimates of unknown reliability. A regression method that incorporates the second scale of prior information assumes the prior information to be fixed for any particular analysis to produce improved, although biased, parameter estimates. Approximate optimization of two auxiliary parameters of the formulation is used to help minimize the bias, which is almost always much smaller than that resulting from standard ridge regression. It is shown that if both scales of prior information are available, then a combined regression analysis may be made.
TG study of the Li0.4Fe2.4Zn0.2O4 ferrite synthesis
NASA Astrophysics Data System (ADS)
Lysenko, E. N.; Nikolaev, E. V.; Surzhikov, A. P.
2016-02-01
In this paper, the kinetic analysis of Li-Zn ferrite synthesis was studied using thermogravimetry (TG) method through the simultaneous application of non-linear regression to several measurements run at different heating rates (multivariate non-linear regression). Using TG-curves obtained for the four heating rates and Netzsch Thermokinetics software package, the kinetic models with minimal adjustable parameters were selected to quantitatively describe the reaction of Li-Zn ferrite synthesis. It was shown that the experimental TG-curves clearly suggest a two-step process for the ferrite synthesis and therefore a model-fitting kinetic analysis based on multivariate non-linear regressions was conducted. The complex reaction was described by a two-step reaction scheme consisting of sequential reaction steps. It is established that the best results were obtained using the Yander three-dimensional diffusion model at the first stage and Ginstling-Bronstein model at the second step. The kinetic parameters for lithium-zinc ferrite synthesis reaction were found and discussed.
NASA Astrophysics Data System (ADS)
Gonçalves, Karen dos Santos; Winkler, Mirko S.; Benchimol-Barbosa, Paulo Roberto; de Hoogh, Kees; Artaxo, Paulo Eduardo; de Souza Hacon, Sandra; Schindler, Christian; Künzli, Nino
2018-07-01
Epidemiological studies generally use particulate matter measurements with diameter less 2.5 μm (PM2.5) from monitoring networks. Satellite aerosol optical depth (AOD) data has considerable potential in predicting PM2.5 concentrations, and thus provides an alternative method for producing knowledge regarding the level of pollution and its health impact in areas where no ground PM2.5 measurements are available. This is the case in the Brazilian Amazon rainforest region where forest fires are frequent sources of high pollution. In this study, we applied a non-linear model for predicting PM2.5 concentration from AOD retrievals using interaction terms between average temperature, relative humidity, sine, cosine of date in a period of 365,25 days and the square of the lagged relative residual. Regression performance statistics were tested comparing the goodness of fit and R2 based on results from linear regression and non-linear regression for six different models. The regression results for non-linear prediction showed the best performance, explaining on average 82% of the daily PM2.5 concentrations when considering the whole period studied. In the context of Amazonia, it was the first study predicting PM2.5 concentrations using the latest high-resolution AOD products also in combination with the testing of a non-linear model performance. Our results permitted a reliable prediction considering the AOD-PM2.5 relationship and set the basis for further investigations on air pollution impacts in the complex context of Brazilian Amazon Region.
Inverse models: A necessary next step in ground-water modeling
Poeter, E.P.; Hill, M.C.
1997-01-01
Inverse models using, for example, nonlinear least-squares regression, provide capabilities that help modelers take full advantage of the insight available from ground-water models. However, lack of information about the requirements and benefits of inverse models is an obstacle to their widespread use. This paper presents a simple ground-water flow problem to illustrate the requirements and benefits of the nonlinear least-squares repression method of inverse modeling and discusses how these attributes apply to field problems. The benefits of inverse modeling include: (1) expedited determination of best fit parameter values; (2) quantification of the (a) quality of calibration, (b) data shortcomings and needs, and (c) confidence limits on parameter estimates and predictions; and (3) identification of issues that are easily overlooked during nonautomated calibration.Inverse models using, for example, nonlinear least-squares regression, provide capabilities that help modelers take full advantage of the insight available from ground-water models. However, lack of information about the requirements and benefits of inverse models is an obstacle to their widespread use. This paper presents a simple ground-water flow problem to illustrate the requirements and benefits of the nonlinear least-squares regression method of inverse modeling and discusses how these attributes apply to field problems. The benefits of inverse modeling include: (1) expedited determination of best fit parameter values; (2) quantification of the (a) quality of calibration, (b) data shortcomings and needs, and (c) confidence limits on parameter estimates and predictions; and (3) identification of issues that are easily overlooked during nonautomated calibration.
Shah, A A; Xing, W W; Triantafyllidis, V
2017-04-01
In this paper, we develop reduced-order models for dynamic, parameter-dependent, linear and nonlinear partial differential equations using proper orthogonal decomposition (POD). The main challenges are to accurately and efficiently approximate the POD bases for new parameter values and, in the case of nonlinear problems, to efficiently handle the nonlinear terms. We use a Bayesian nonlinear regression approach to learn the snapshots of the solutions and the nonlinearities for new parameter values. Computational efficiency is ensured by using manifold learning to perform the emulation in a low-dimensional space. The accuracy of the method is demonstrated on a linear and a nonlinear example, with comparisons with a global basis approach.
Xing, W. W.; Triantafyllidis, V.
2017-01-01
In this paper, we develop reduced-order models for dynamic, parameter-dependent, linear and nonlinear partial differential equations using proper orthogonal decomposition (POD). The main challenges are to accurately and efficiently approximate the POD bases for new parameter values and, in the case of nonlinear problems, to efficiently handle the nonlinear terms. We use a Bayesian nonlinear regression approach to learn the snapshots of the solutions and the nonlinearities for new parameter values. Computational efficiency is ensured by using manifold learning to perform the emulation in a low-dimensional space. The accuracy of the method is demonstrated on a linear and a nonlinear example, with comparisons with a global basis approach. PMID:28484327
Bayesian Analysis of Nonlinear Structural Equation Models with Nonignorable Missing Data
ERIC Educational Resources Information Center
Lee, Sik-Yum
2006-01-01
A Bayesian approach is developed for analyzing nonlinear structural equation models with nonignorable missing data. The nonignorable missingness mechanism is specified by a logistic regression model. A hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm is used to produce the joint Bayesian estimates of…
Non-Linear Approach in Kinesiology Should Be Preferred to the Linear--A Case of Basketball.
Trninić, Marko; Jeličić, Mario; Papić, Vladan
2015-07-01
In kinesiology, medicine, biology and psychology, in which research focus is on dynamical self-organized systems, complex connections exist between variables. Non-linear nature of complex systems has been discussed and explained by the example of non-linear anthropometric predictors of performance in basketball. Previous studies interpreted relations between anthropometric features and measures of effectiveness in basketball by (a) using linear correlation models, and by (b) including all basketball athletes in the same sample of participants regardless of their playing position. In this paper the significance and character of linear and non-linear relations between simple anthropometric predictors (AP) and performance criteria consisting of situation-related measures of effectiveness (SE) in basketball were determined and evaluated. The sample of participants consisted of top-level junior basketball players divided in three groups according to their playing time (8 minutes and more per game) and playing position: guards (N = 42), forwards (N = 26) and centers (N = 40). Linear (general model) and non-linear (general model) regression models were calculated simultaneously and separately for each group. The conclusion is viable: non-linear regressions are frequently superior to linear correlations when interpreting actual association logic among research variables.
Terza, Joseph V; Bradford, W David; Dismuke, Clara E
2008-01-01
Objective To investigate potential bias in the use of the conventional linear instrumental variables (IV) method for the estimation of causal effects in inherently nonlinear regression settings. Data Sources Smoking Supplement to the 1979 National Health Interview Survey, National Longitudinal Alcohol Epidemiologic Survey, and simulated data. Study Design Potential bias from the use of the linear IV method in nonlinear models is assessed via simulation studies and real world data analyses in two commonly encountered regression setting: (1) models with a nonnegative outcome (e.g., a count) and a continuous endogenous regressor; and (2) models with a binary outcome and a binary endogenous regressor. Principle Findings The simulation analyses show that substantial bias in the estimation of causal effects can result from applying the conventional IV method in inherently nonlinear regression settings. Moreover, the bias is not attenuated as the sample size increases. This point is further illustrated in the survey data analyses in which IV-based estimates of the relevant causal effects diverge substantially from those obtained with appropriate nonlinear estimation methods. Conclusions We offer this research as a cautionary note to those who would opt for the use of linear specifications in inherently nonlinear settings involving endogeneity. PMID:18546544
Nonlinear Constitutive Modeling of Piezoelectric Ceramics
NASA Astrophysics Data System (ADS)
Xu, Jia; Li, Chao; Wang, Haibo; Zhu, Zhiwen
2017-12-01
Nonlinear constitutive modeling of piezoelectric ceramics is discussed in this paper. Van der Pol item is introduced to explain the simple hysteretic curve. Improved nonlinear difference items are used to interpret the hysteresis phenomena of piezoelectric ceramics. The fitting effect of the model on experimental data is proved by the partial least-square regression method. The results show that this method can describe the real curve well. The results of this paper are helpful to piezoelectric ceramics constitutive modeling.
Wang, Zheng-Xin; Hao, Peng; Yao, Pei-Yi
2017-01-01
The non-linear relationship between provincial economic growth and carbon emissions is investigated by using panel smooth transition regression (PSTR) models. The research indicates that, on the condition of separately taking Gross Domestic Product per capita (GDPpc), energy structure (Es), and urbanisation level (Ul) as transition variables, three models all reject the null hypothesis of a linear relationship, i.e., a non-linear relationship exists. The results show that the three models all contain only one transition function but different numbers of location parameters. The model taking GDPpc as the transition variable has two location parameters, while the other two models separately considering Es and Ul as the transition variables both contain one location parameter. The three models applied in the study all favourably describe the non-linear relationship between economic growth and CO2 emissions in China. It also can be seen that the conversion rate of the influence of Ul on per capita CO2 emissions is significantly higher than those of GDPpc and Es on per capita CO2 emissions. PMID:29236083
Wang, Zheng-Xin; Hao, Peng; Yao, Pei-Yi
2017-12-13
The non-linear relationship between provincial economic growth and carbon emissions is investigated by using panel smooth transition regression (PSTR) models. The research indicates that, on the condition of separately taking Gross Domestic Product per capita (GDPpc), energy structure (Es), and urbanisation level (Ul) as transition variables, three models all reject the null hypothesis of a linear relationship, i.e., a non-linear relationship exists. The results show that the three models all contain only one transition function but different numbers of location parameters. The model taking GDPpc as the transition variable has two location parameters, while the other two models separately considering Es and Ul as the transition variables both contain one location parameter. The three models applied in the study all favourably describe the non-linear relationship between economic growth and CO₂ emissions in China. It also can be seen that the conversion rate of the influence of Ul on per capita CO₂ emissions is significantly higher than those of GDPpc and Es on per capita CO₂ emissions.
NASA Astrophysics Data System (ADS)
Lu, Lin; Chang, Yunlong; Li, Yingmin; He, Youyou
2013-05-01
A transverse magnetic field was introduced to the arc plasma in the process of welding stainless steel tubes by high-speed Tungsten Inert Gas Arc Welding (TIG for short) without filler wire. The influence of external magnetic field on welding quality was investigated. 9 sets of parameters were designed by the means of orthogonal experiment. The welding joint tensile strength and form factor of weld were regarded as the main standards of welding quality. A binary quadratic nonlinear regression equation was established with the conditions of magnetic induction and flow rate of Ar gas. The residual standard deviation was calculated to adjust the accuracy of regression model. The results showed that, the regression model was correct and effective in calculating the tensile strength and aspect ratio of weld. Two 3D regression models were designed respectively, and then the impact law of magnetic induction on welding quality was researched.
Gimelfarb, A.; Willis, J. H.
1994-01-01
An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
Lim, Changwon
2015-03-30
Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. Copyright © 2014 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Mooijaart, Ab; Satorra, Albert
2009-01-01
In this paper, we show that for some structural equation models (SEM), the classical chi-square goodness-of-fit test is unable to detect the presence of nonlinear terms in the model. As an example, we consider a regression model with latent variables and interactions terms. Not only the model test has zero power against that type of…
An Application to the Prediction of LOD Change Based on General Regression Neural Network
NASA Astrophysics Data System (ADS)
Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.
2011-07-01
Traditional prediction of the LOD (length of day) change was based on linear models, such as the least square model and the autoregressive technique, etc. Due to the complex non-linear features of the LOD variation, the performances of the linear model predictors are not fully satisfactory. This paper applies a non-linear neural network - general regression neural network (GRNN) model to forecast the LOD change, and the results are analyzed and compared with those obtained with the back propagation neural network and other models. The comparison shows that the performance of the GRNN model in the prediction of the LOD change is efficient and feasible.
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
Xu, Wenjun; Chen, Jie; Lau, Henry Y K; Ren, Hongliang
2017-09-01
Accurate motion control of flexible surgical manipulators is crucial in tissue manipulation tasks. The tendon-driven serpentine manipulator (TSM) is one of the most widely adopted flexible mechanisms in minimally invasive surgery because of its enhanced maneuverability in torturous environments. TSM, however, exhibits high nonlinearities and conventional analytical kinematics model is insufficient to achieve high accuracy. To account for the system nonlinearities, we applied a data driven approach to encode the system inverse kinematics. Three regression methods: extreme learning machine (ELM), Gaussian mixture regression (GMR) and K-nearest neighbors regression (KNNR) were implemented to learn a nonlinear mapping from the robot 3D position states to the control inputs. The performance of the three algorithms was evaluated both in simulation and physical trajectory tracking experiments. KNNR performed the best in the tracking experiments, with the lowest RMSE of 2.1275 mm. The proposed inverse kinematics learning methods provide an alternative and efficient way to accurately model the tendon driven flexible manipulator. Copyright © 2016 John Wiley & Sons, Ltd.
Modeling maximum daily temperature using a varying coefficient regression model
Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith
2014-01-01
Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...
Application of General Regression Neural Network to the Prediction of LOD Change
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Hong; Wang, Qi-Jie; Zhu, Jian-Jun; Zhang, Hao
2012-01-01
Traditional methods for predicting the change in length of day (LOD change) are mainly based on some linear models, such as the least square model and autoregression model, etc. However, the LOD change comprises complicated non-linear factors and the prediction effect of the linear models is always not so ideal. Thus, a kind of non-linear neural network — general regression neural network (GRNN) model is tried to make the prediction of the LOD change and the result is compared with the predicted results obtained by taking advantage of the BP (back propagation) neural network model and other models. The comparison result shows that the application of the GRNN to the prediction of the LOD change is highly effective and feasible.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
NASA Astrophysics Data System (ADS)
Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno
2017-03-01
This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree where the parabola is its graphical representation.
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. Copyright © 2013, The International Biometric Society.
ERIC Educational Resources Information Center
Pek, Jolynn; Chalmers, R. Philip; Kok, Bethany E.; Losardo, Diane
2015-01-01
Structural equation mixture models (SEMMs), when applied as a semiparametric model (SPM), can adequately recover potentially nonlinear latent relationships without their specification. This SPM is useful for exploratory analysis when the form of the latent regression is unknown. The purpose of this article is to help users familiar with structural…
Shuguang Liua; Pamela Anderson; Guoyi Zhoud; Boone Kauffman; Flint Hughes; David Schimel; Vicente Watson; Joseph Tosi
2008-01-01
Objectively assessing the performance of a model and deriving model parameter values from observations are critical and challenging in landscape to regional modeling. In this paper, we applied a nonlinear inversion technique to calibrate the ecosystem model CENTURY against carbon (C) and nitrogen (N) stock measurements collected from 39 mature tropical forest sites in...
Evaluation of confidence intervals for a steady-state leaky aquifer model
Christensen, S.; Cooley, R.L.
1999-01-01
The fact that dependent variables of groundwater models are generally nonlinear functions of model parameters is shown to be a potentially significant factor in calculating accurate confidence intervals for both model parameters and functions of the parameters, such as the values of dependent variables calculated by the model. The Lagrangian method of Vecchia and Cooley [Vecchia, A.V. and Cooley, R.L., Water Resources Research, 1987, 23(7), 1237-1250] was used to calculate nonlinear Scheffe-type confidence intervals for the parameters and the simulated heads of a steady-state groundwater flow model covering 450 km2 of a leaky aquifer. The nonlinear confidence intervals are compared to corresponding linear intervals. As suggested by the significant nonlinearity of the regression model, linear confidence intervals are often not accurate. The commonly made assumption that widths of linear confidence intervals always underestimate the actual (nonlinear) widths was not correct. Results show that nonlinear effects can cause the nonlinear intervals to be asymmetric and either larger or smaller than the linear approximations. Prior information on transmissivities helps reduce the size of the confidence intervals, with the most notable effects occurring for the parameters on which there is prior information and for head values in parameter zones for which there is prior information on the parameters.The fact that dependent variables of groundwater models are generally nonlinear functions of model parameters is shown to be a potentially significant factor in calculating accurate confidence intervals for both model parameters and functions of the parameters, such as the values of dependent variables calculated by the model. The Lagrangian method of Vecchia and Cooley was used to calculate nonlinear Scheffe-type confidence intervals for the parameters and the simulated heads of a steady-state groundwater flow model covering 450 km2 of a leaky aquifer. The nonlinear confidence intervals are compared to corresponding linear intervals. As suggested by the significant nonlinearity of the regression model, linear confidence intervals are often not accurate. The commonly made assumption that widths of linear confidence intervals always underestimate the actual (nonlinear) widths was not correct. Results show that nonlinear effects can cause the nonlinear intervals to be asymmetric and either larger or smaller than the linear approximations. Prior information on transmissivities helps reduce the size of the confidence intervals, with the most notable effects occurring for the parameters on which there is prior information and for head values in parameter zones for which there is prior information on the parameters.
NASA Astrophysics Data System (ADS)
Gürcan, Eser Kemal
2017-04-01
The most commonly used methods for analyzing time-dependent data are multivariate analysis of variance (MANOVA) and nonlinear regression models. The aim of this study was to compare some MANOVA techniques and nonlinear mixed modeling approach for investigation of growth differentiation in female and male Japanese quail. Weekly individual body weight data of 352 male and 335 female quail from hatch to 8 weeks of age were used to perform analyses. It is possible to say that when all the analyses are evaluated, the nonlinear mixed modeling is superior to the other techniques because it also reveals the individual variation. In addition, the profile analysis also provides important information.
NASA Astrophysics Data System (ADS)
Cai, Jun; Wang, Kuaishe; Shi, Jiamin; Wang, Wen; Liu, Yingying
2018-01-01
Constitutive analysis for hot working of BFe10-1-2 alloy was carried out by using experimental stress-strain data from isothermal hot compression tests, in a wide range of temperature of 1,023 1,273 K, and strain rate range of 0.001 10 s-1. A constitutive equation based on modified double multiple nonlinear regression was proposed considering the independent effects of strain, strain rate, temperature and their interrelation. The predicted flow stress data calculated from the developed equation was compared with the experimental data. Correlation coefficient (R), average absolute relative error (AARE) and relative errors were introduced to verify the validity of the developed constitutive equation. Subsequently, a comparative study was made on the capability of strain-compensated Arrhenius-type constitutive model. The results showed that the developed constitutive equation based on modified double multiple nonlinear regression could predict flow stress of BFe10-1-2 alloy with good correlation and generalization.
A different approach to estimate nonlinear regression model using numerical methods
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.
2017-11-01
This research paper concerns with the computational methods namely the Gauss-Newton method, Gradient algorithm methods (Newton-Raphson method, Steepest Descent or Steepest Ascent algorithm method, the Method of Scoring, the Method of Quadratic Hill-Climbing) based on numerical analysis to estimate parameters of nonlinear regression model in a very different way. Principles of matrix calculus have been used to discuss the Gradient-Algorithm methods. Yonathan Bard [1] discussed a comparison of gradient methods for the solution of nonlinear parameter estimation problems. However this article discusses an analytical approach to the gradient algorithm methods in a different way. This paper describes a new iterative technique namely Gauss-Newton method which differs from the iterative technique proposed by Gorden K. Smyth [2]. Hans Georg Bock et.al [10] proposed numerical methods for parameter estimation in DAE’s (Differential algebraic equation). Isabel Reis Dos Santos et al [11], Introduced weighted least squares procedure for estimating the unknown parameters of a nonlinear regression metamodel. For large-scale non smooth convex minimization the Hager and Zhang (HZ) conjugate gradient Method and the modified HZ (MHZ) method were presented by Gonglin Yuan et al [12].
Modelling daily water temperature from air temperature for the Missouri River.
Zhu, Senlin; Nyarko, Emmanuel Karlo; Hadzima-Nyarko, Marijana
2018-01-01
The bio-chemical and physical characteristics of a river are directly affected by water temperature, which thereby affects the overall health of aquatic ecosystems. It is a complex problem to accurately estimate water temperature. Modelling of river water temperature is usually based on a suitable mathematical model and field measurements of various atmospheric factors. In this article, the air-water temperature relationship of the Missouri River is investigated by developing three different machine learning models (Artificial Neural Network (ANN), Gaussian Process Regression (GPR), and Bootstrap Aggregated Decision Trees (BA-DT)). Standard models (linear regression, non-linear regression, and stochastic models) are also developed and compared to machine learning models. Analyzing the three standard models, the stochastic model clearly outperforms the standard linear model and nonlinear model. All the three machine learning models have comparable results and outperform the stochastic model, with GPR having slightly better results for stations No. 2 and 3, while BA-DT has slightly better results for station No. 1. The machine learning models are very effective tools which can be used for the prediction of daily river temperature.
Genomic prediction based on data from three layer lines using non-linear regression models.
Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L
2014-11-06
Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.
Multi-Target Regression via Robust Low-Rank Learning.
Zhen, Xiantong; Yu, Mengyang; He, Xiaofei; Li, Shuo
2018-02-01
Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.
Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan
2012-12-01
A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.
Savary, Serge; Delbac, Lionel; Rochas, Amélie; Taisant, Guillaume; Willocquet, Laetitia
2009-08-01
Dual epidemics are defined as epidemics developing on two or several plant organs in the course of a cropping season. Agricultural pathosystems where such epidemics develop are often very important, because the harvestable part is one of the organs affected. These epidemics also are often difficult to manage, because the linkage between epidemiological components occurring on different organs is poorly understood, and because prediction of the risk toward the harvestable organs is difficult. In the case of downy mildew (DM) and powdery mildew (PM) of grapevine, nonlinear modeling and logistic regression indicated nonlinearity in the foliage-cluster relationships. Nonlinear modeling enabled the parameterization of a transmission coefficient that numerically links the two components, leaves and clusters, in DM and PM epidemics. Logistic regression analysis yielded a series of probabilistic models that enabled predicting preset levels of cluster infection risks based on DM and PM severities on the foliage at successive crop stages. The usefulness of this framework for tactical decision-making for disease control is discussed.
Pointwise influence matrices for functional-response regression.
Reiss, Philip T; Huang, Lei; Wu, Pei-Shien; Chen, Huaihou; Colcombe, Stan
2017-12-01
We extend the notion of an influence or hat matrix to regression with functional responses and scalar predictors. For responses depending linearly on a set of predictors, our definition is shown to reduce to the conventional influence matrix for linear models. The pointwise degrees of freedom, the trace of the pointwise influence matrix, are shown to have an adaptivity property that motivates a two-step bivariate smoother for modeling nonlinear dependence on a single predictor. This procedure adapts to varying complexity of the nonlinear model at different locations along the function, and thereby achieves better performance than competing tensor product smoothers in an analysis of the development of white matter microstructure in the brain. © 2017, The International Biometric Society.
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States
NASA Astrophysics Data System (ADS)
Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin
2016-03-01
From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states.
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States
Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin
2016-01-01
From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states. PMID:26996254
Cooley, Richard L.
1983-01-01
This paper investigates factors influencing the degree of improvement in estimates of parameters of a nonlinear regression groundwater flow model by incorporating prior information of unknown reliability. Consideration of expected behavior of the regression solutions and results of a hypothetical modeling problem lead to several general conclusions. First, if the parameters are properly scaled, linearized expressions for the mean square error (MSE) in parameter estimates of a nonlinear model will often behave very nearly as if the model were linear. Second, by using prior information, the MSE in properly scaled parameters can be reduced greatly over the MSE of ordinary least squares estimates of parameters. Third, plots of estimated MSE and the estimated standard deviation of MSE versus an auxiliary parameter (the ridge parameter) specifying the degree of influence of the prior information on regression results can help determine the potential for improvement of parameter estimates. Fourth, proposed criteria can be used to make appropriate choices for the ridge parameter and another parameter expressing degree of overall bias in the prior information. Results of a case study of Truckee Meadows, Reno-Sparks area, Washoe County, Nevada, conform closely to the results of the hypothetical problem. In the Truckee Meadows case, incorporation of prior information did not greatly change the parameter estimates from those obtained by ordinary least squares. However, the analysis showed that both sets of estimates are more reliable than suggested by the standard errors from ordinary least squares.
Boosting structured additive quantile regression for longitudinal childhood obesity data.
Fenske, Nora; Fahrmeir, Ludwig; Hothorn, Torsten; Rzehak, Peter; Höhle, Michael
2013-07-25
Childhood obesity and the investigation of its risk factors has become an important public health issue. Our work is based on and motivated by a German longitudinal study including 2,226 children with up to ten measurements on their body mass index (BMI) and risk factors from birth to the age of 10 years. We introduce boosting of structured additive quantile regression as a novel distribution-free approach for longitudinal quantile regression. The quantile-specific predictors of our model include conventional linear population effects, smooth nonlinear functional effects, varying-coefficient terms, and individual-specific effects, such as intercepts and slopes. Estimation is based on boosting, a computer intensive inference method for highly complex models. We propose a component-wise functional gradient descent boosting algorithm that allows for penalized estimation of the large variety of different effects, particularly leading to individual-specific effects shrunken toward zero. This concept allows us to flexibly estimate the nonlinear age curves of upper quantiles of the BMI distribution, both on population and on individual-specific level, adjusted for further risk factors and to detect age-varying effects of categorical risk factors. Our model approach can be regarded as the quantile regression analog of Gaussian additive mixed models (or structured additive mean regression models), and we compare both model classes with respect to our obesity data.
Nonlinear System Identification for Aeroelastic Systems with Application to Experimental Data
NASA Technical Reports Server (NTRS)
Kukreja, Sunil L.
2008-01-01
Representation and identification of a nonlinear aeroelastic pitch-plunge system as a model of the Nonlinear AutoRegressive, Moving Average eXogenous (NARMAX) class is considered. A nonlinear difference equation describing this aircraft model is derived theoretically and shown to be of the NARMAX form. Identification methods for NARMAX models are applied to aeroelastic dynamics and its properties demonstrated via continuous-time simulations of experimental conditions. Simulation results show that (1) the outputs of the NARMAX model closely match those generated using continuous-time methods, and (2) NARMAX identification methods applied to aeroelastic dynamics provide accurate discrete-time parameter estimates. Application of NARMAX identification to experimental pitch-plunge dynamics data gives a high percent fit for cross-validated data.
Minimizing bias in biomass allometry: Model selection and log transformation of data
Joseph Mascaro; undefined undefined; Flint Hughes; Amanda Uowolo; Stefan A. Schnitzer
2011-01-01
Nonlinear regression is increasingly used to develop allometric equations for forest biomass estimation (i.e., as opposed to the raditional approach of log-transformation followed by linear regression). Most statistical software packages, however, assume additive errors by default, violating a key assumption of allometric theory and possibly producing spurious models....
A Comparison of Methods for Estimating Quadratic Effects in Nonlinear Structural Equation Models
ERIC Educational Resources Information Center
Harring, Jeffrey R.; Weiss, Brandi A.; Hsu, Jui-Chen
2012-01-01
Two Monte Carlo simulations were performed to compare methods for estimating and testing hypotheses of quadratic effects in latent variable regression models. The methods considered in the current study were (a) a 2-stage moderated regression approach using latent variable scores, (b) an unconstrained product indicator approach, (c) a latent…
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
A New SEYHAN's Approach in Case of Heterogeneity of Regression Slopes in ANCOVA.
Ankarali, Handan; Cangur, Sengul; Ankarali, Seyit
2018-06-01
In this study, when the assumptions of linearity and homogeneity of regression slopes of conventional ANCOVA are not met, a new approach named as SEYHAN has been suggested to use conventional ANCOVA instead of robust or nonlinear ANCOVA. The proposed SEYHAN's approach involves transformation of continuous covariate into categorical structure when the relationship between covariate and dependent variable is nonlinear and the regression slopes are not homogenous. A simulated data set was used to explain SEYHAN's approach. In this approach, we performed conventional ANCOVA in each subgroup which is constituted according to knot values and analysis of variance with two-factor model after MARS method was used for categorization of covariate. The first model is a simpler model than the second model that includes interaction term. Since the model with interaction effect has more subjects, the power of test also increases and the existing significant difference is revealed better. We can say that linearity and homogeneity of regression slopes are not problem for data analysis by conventional linear ANCOVA model by helping this approach. It can be used fast and efficiently for the presence of one or more covariates.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatlands sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model which is considered to be caused by violations of the underlying model assumptions. Especially the effects of turbulence and pressure disturbances by the chamber deployment are suspected to have caused unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
Model-free inference of direct network interactions from nonlinear collective dynamics.
Casadiego, Jose; Nitzan, Mor; Hallerberg, Sarah; Timme, Marc
2017-12-19
The topology of interactions in network dynamical systems fundamentally underlies their function. Accelerating technological progress creates massively available data about collective nonlinear dynamics in physical, biological, and technological systems. Detecting direct interaction patterns from those dynamics still constitutes a major open problem. In particular, current nonlinear dynamics approaches mostly require to know a priori a model of the (often high dimensional) system dynamics. Here we develop a model-independent framework for inferring direct interactions solely from recording the nonlinear collective dynamics generated. Introducing an explicit dependency matrix in combination with a block-orthogonal regression algorithm, the approach works reliably across many dynamical regimes, including transient dynamics toward steady states, periodic and non-periodic dynamics, and chaos. Together with its capabilities to reveal network (two point) as well as hypernetwork (e.g., three point) interactions, this framework may thus open up nonlinear dynamics options of inferring direct interaction patterns across systems where no model is known.
ERIC Educational Resources Information Center
Bloom, Allan M.; And Others
In response to the increasing importance of student performance in required classes, research was conducted to compare two prediction procedures, linear modeling using multiple regression and nonlinear modeling using AID3. Performance in the first college math course (College Mathematics, Calculus, or Business Calculus Matrices) was the dependent…
Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media
Cooley, R.L.; Christensen, S.
2006-01-01
Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector ?? that reflects both small and large scales of heterogeneity in the inputs by a lumped or smoothed m-dimensional approximation ????*, where ?? is an interpolation matrix and ??* is a stochastic vector of parameters. Vector ??* has small enough dimension to allow its estimation with the available data. The consequence of the replacement is that model function f(????*) written in terms of the approximate inputs is in error with respect to the same model function written in terms of ??, ??,f(??), which is assumed to be nearly exact. The difference f(??) - f(????*), termed model error, is spatially correlated, generates prediction biases, and causes standard confidence and prediction intervals to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate ??* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(??) and f(????*) are small, then most of the biases are small and the correction factors are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is large to test robustness of the methodology. Numerical results conform with the theoretical analysis. ?? 2005 Elsevier Ltd. All rights reserved.
PharmML in Action: an Interoperable Language for Modeling and Simulation
Bizzotto, R; Smith, G; Yvon, F; Kristensen, NR; Swat, MJ
2017-01-01
PharmML1 is an XML‐based exchange format2, 3, 4 created with a focus on nonlinear mixed‐effect (NLME) models used in pharmacometrics,5, 6 but providing a very general framework that also allows describing mathematical and statistical models such as single‐subject or nonlinear and multivariate regression models. This tutorial provides an overview of the structure of this language, brief suggestions on how to work with it, and use cases demonstrating its power and flexibility. PMID:28575551
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-07-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and is dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping
This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...
Motulsky, Harvey J; Brown, Ronald E
2006-01-01
Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outlier in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.
1998-01-01
This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940's, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground- water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located. Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface flow from adjacent regions; irrigation and septic field seepage; and leakage through the Rio Grande, canal, and Cochiti Reservoir beds. Ground water is discharged from the basin by withdrawal; evapotranspiration; subsurface flow; and flow to the Rio Grande, canals, and drains. The transient, three-dimensional numerical model of ground-water flow to which nonlinear-regression methods were applied simulates flow in the Albuquerque Basin from 1900 to March 1995. Six different basin subsurface configurations are considered in the model. These configurations are designed to test the effects of (1) varying the simulated basin thickness, (2) including a hypothesized hydrogeologic unit with large hydraulic conductivity in the western part of the basin (the west basin high-K zone), and (3) substantially lowering the simulated hydraulic conductivity of a fault in the western part of the basin (the low-K fault zone). The model with each of the subsurface configurations was calibrated using a nonlinear least- squares regression technique. The calibration data set includes 802 hydraulic-head measurements that provide broad spatial and temporal coverage of basin conditions, and one measurement of net flow from the Rio Grande and drains to the ground-water system in the Albuquerque area. Data are weighted on the basis of estimates of the standard deviations of measurement errors. The 10 to 12 parameters to which the calibration data as a whole are generally most sensitive were estimated by nonlinear regression, whereas the remaining model parameter values were specified. Results of model calibration indicate that the optimal parameter estimates as a whole are most reasonable in calibrations of the model with with configurations 3 (which contains 1,600-ft-thick basin deposits and the west basin high-K zone), 4 (which contains 5,000-ft-thick basin de
Development of Ensemble Model Based Water Demand Forecasting Model
NASA Astrophysics Data System (ADS)
Kwon, Hyun-Han; So, Byung-Jin; Kim, Seong-Hyeon; Kim, Byung-Seop
2014-05-01
In recent years, Smart Water Grid (SWG) concept has globally emerged over the last decade and also gained significant recognition in South Korea. Especially, there has been growing interest in water demand forecast and optimal pump operation and this has led to various studies regarding energy saving and improvement of water supply reliability. Existing water demand forecasting models are categorized into two groups in view of modeling and predicting their behavior in time series. One is to consider embedded patterns such as seasonality, periodicity and trends, and the other one is an autoregressive model that is using short memory Markovian processes (Emmanuel et al., 2012). The main disadvantage of the abovementioned model is that there is a limit to predictability of water demands of about sub-daily scale because the system is nonlinear. In this regard, this study aims to develop a nonlinear ensemble model for hourly water demand forecasting which allow us to estimate uncertainties across different model classes. The proposed model is consist of two parts. One is a multi-model scheme that is based on combination of independent prediction model. The other one is a cross validation scheme named Bagging approach introduced by Brieman (1996) to derive weighting factors corresponding to individual models. Individual forecasting models that used in this study are linear regression analysis model, polynomial regression, multivariate adaptive regression splines(MARS), SVM(support vector machine). The concepts are demonstrated through application to observed from water plant at several locations in the South Korea. Keywords: water demand, non-linear model, the ensemble forecasting model, uncertainty. Acknowledgements This subject is supported by Korea Ministry of Environment as "Projects for Developing Eco-Innovation Technologies (GT-11-G-02-001-6)
Geographical variation of cerebrovascular disease in New York State: the correlation with income
Han, Daikwon; Carrow, Shannon S; Rogerson, Peter A; Munschauer, Frederick E
2005-01-01
Background Income is known to be associated with cerebrovascular disease; however, little is known about the more detailed relationship between cerebrovascular disease and income. We examined the hypothesis that the geographical distribution of cerebrovascular disease in New York State may be predicted by a nonlinear model using income as a surrogate socioeconomic risk factor. Results We used spatial clustering methods to identify areas with high and low prevalence of cerebrovascular disease at the ZIP code level after smoothing rates and correcting for edge effects; geographic locations of high and low clusters of cerebrovascular disease in New York State were identified with and without income adjustment. To examine effects of income, we calculated the excess number of cases using a non-linear regression with cerebrovascular disease rates taken as the dependent variable and income and income squared taken as independent variables. The resulting regression equation was: excess rate = 32.075 - 1.22*10-4(income) + 8.068*10-10(income2), and both income and income squared variables were significant at the 0.01 level. When income was included as a covariate in the non-linear regression, the number and size of clusters of high cerebrovascular disease prevalence decreased. Some 87 ZIP codes exceeded the critical value of the local statistic yielding a relative risk of 1.2. The majority of low cerebrovascular disease prevalence geographic clusters disappeared when the non-linear income effect was included. For linear regression, the excess rate of cerebrovascular disease falls with income; each $10,000 increase in median income of each ZIP code resulted in an average reduction of 3.83 observed cases. The significant nonlinear effect indicates a lessening of this income effect with increasing income. Conclusion Income is a non-linear predictor of excess cerebrovascular disease rates, with both low and high observed cerebrovascular disease rate areas associated with higher income. Income alone explains a significant amount of the geographical variance in cerebrovascular disease across New York State since both high and low clusters of cerebrovascular disease dissipate or disappear with income adjustment. Geographical modeling, including non-linear effects of income, may allow for better identification of other non-traditional risk factors. PMID:16242043
Fisz, Jacek J
2006-12-07
The optimization approach based on the genetic algorithm (GA) combined with multiple linear regression (MLR) method, is discussed. The GA-MLR optimizer is designed for the nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only one strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates considerably the optimization process because the linear parameters are not the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of the optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of recently introduced the GA-NR optimizer to simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to chi(2), obtained from the Taylor series expansion of chi(2), is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi-linear combinations of nonlinear functions, is indicated. The VP algorithm does not distinguish the weakly nonlinear parameters from the nonlinear ones and it does not apply to the model functions which are multi-linear combinations of nonlinear functions.
Creep analysis of silicone for podiatry applications.
Janeiro-Arocas, Julia; Tarrío-Saavedra, Javier; López-Beceiro, Jorge; Naya, Salvador; López-Canosa, Adrián; Heredia-García, Nicolás; Artiaga, Ramón
2016-10-01
This work shows an effective methodology to characterize the creep-recovery behavior of silicones before their application in podiatry. The aim is to characterize, model and compare the creep-recovery properties of different types of silicone used in podiatry orthotics. Creep-recovery phenomena of silicones used in podiatry orthotics is characterized by dynamic mechanical analysis (DMA). Silicones provided by Herbitas are compared by observing their viscoelastic properties by Functional Data Analysis (FDA) and nonlinear regression. The relationship between strain and time is modeled by fixed and mixed effects nonlinear regression to compare easily and intuitively podiatry silicones. Functional ANOVA and Kohlrausch-Willians-Watts (KWW) model with fixed and mixed effects allows us to compare different silicones observing the values of fitting parameters and their physical meaning. The differences between silicones are related to the variations of breadth of creep-recovery time distribution and instantaneous deformation-permanent strain. Nevertheless, the mean creep-relaxation time is the same for all the studied silicones. Silicones used in palliative orthoses have higher instantaneous deformation-permanent strain and narrower creep-recovery distribution. The proposed methodology based on DMA, FDA and nonlinear regression is an useful tool to characterize and choose the proper silicone for each podiatry application according to their viscoelastic properties. Copyright © 2016 Elsevier Ltd. All rights reserved.
Flexible link functions in nonparametric binary regression with Gaussian process priors.
Li, Dan; Wang, Xia; Lin, Lizhen; Dey, Dipak K
2016-09-01
In many scientific fields, it is a common practice to collect a sequence of 0-1 binary responses from a subject across time, space, or a collection of covariates. Researchers are interested in finding out how the expected binary outcome is related to covariates, and aim at better prediction in the future 0-1 outcomes. Gaussian processes have been widely used to model nonlinear systems; in particular to model the latent structure in a binary regression model allowing nonlinear functional relationship between covariates and the expectation of binary outcomes. A critical issue in modeling binary response data is the appropriate choice of link functions. Commonly adopted link functions such as probit or logit links have fixed skewness and lack the flexibility to allow the data to determine the degree of the skewness. To address this limitation, we propose a flexible binary regression model which combines a generalized extreme value link function with a Gaussian process prior on the latent structure. Bayesian computation is employed in model estimation. Posterior consistency of the resulting posterior distribution is demonstrated. The flexibility and gains of the proposed model are illustrated through detailed simulation studies and two real data examples. Empirical results show that the proposed model outperforms a set of alternative models, which only have either a Gaussian process prior on the latent regression function or a Dirichlet prior on the link function. © 2015, The International Biometric Society.
Flexible Link Functions in Nonparametric Binary Regression with Gaussian Process Priors
Li, Dan; Lin, Lizhen; Dey, Dipak K.
2015-01-01
Summary In many scientific fields, it is a common practice to collect a sequence of 0-1 binary responses from a subject across time, space, or a collection of covariates. Researchers are interested in finding out how the expected binary outcome is related to covariates, and aim at better prediction in the future 0-1 outcomes. Gaussian processes have been widely used to model nonlinear systems; in particular to model the latent structure in a binary regression model allowing nonlinear functional relationship between covariates and the expectation of binary outcomes. A critical issue in modeling binary response data is the appropriate choice of link functions. Commonly adopted link functions such as probit or logit links have fixed skewness and lack the flexibility to allow the data to determine the degree of the skewness. To address this limitation, we propose a flexible binary regression model which combines a generalized extreme value link function with a Gaussian process prior on the latent structure. Bayesian computation is employed in model estimation. Posterior consistency of the resulting posterior distribution is demonstrated. The flexibility and gains of the proposed model are illustrated through detailed simulation studies and two real data examples. Empirical results show that the proposed model outperforms a set of alternative models, which only have either a Gaussian process prior on the latent regression function or a Dirichlet prior on the link function. PMID:26686333
Developing a predictive tropospheric ozone model for Tabriz
NASA Astrophysics Data System (ADS)
Khatibi, Rahman; Naghipour, Leila; Ghorbani, Mohammad A.; Smith, Michael S.; Karimi, Vahid; Farhoudi, Reza; Delafrouz, Hadi; Arvanaghi, Hadi
2013-04-01
Predictive ozone models are becoming indispensable tools by providing a capability for pollution alerts to serve people who are vulnerable to the risks. We have developed a tropospheric ozone prediction capability for Tabriz, Iran, by using the following five modeling strategies: three regression-type methods: Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Gene Expression Programming (GEP); and two auto-regression-type models: Nonlinear Local Prediction (NLP) to implement chaos theory and Auto-Regressive Integrated Moving Average (ARIMA) models. The regression-type modeling strategies explain the data in terms of: temperature, solar radiation, dew point temperature, and wind speed, by regressing present ozone values to their past values. The ozone time series are available at various time intervals, including hourly intervals, from August 2010 to March 2011. The results for MLR, ANN and GEP models are not overly good but those produced by NLP and ARIMA are promising for the establishing a forecasting capability.
PRESS-based EFOR algorithm for the dynamic parametrical modeling of nonlinear MDOF systems
NASA Astrophysics Data System (ADS)
Liu, Haopeng; Zhu, Yunpeng; Luo, Zhong; Han, Qingkai
2017-09-01
In response to the identification problem concerning multi-degree of freedom (MDOF) nonlinear systems, this study presents the extended forward orthogonal regression (EFOR) based on predicted residual sums of squares (PRESS) to construct a nonlinear dynamic parametrical model. The proposed parametrical model is based on the non-linear autoregressive with exogenous inputs (NARX) model and aims to explicitly reveal the physical design parameters of the system. The PRESS-based EFOR algorithm is proposed to identify such a model for MDOF systems. By using the algorithm, we built a common-structured model based on the fundamental concept of evaluating its generalization capability through cross-validation. The resulting model aims to prevent over-fitting with poor generalization performance caused by the average error reduction ratio (AERR)-based EFOR algorithm. Then, a functional relationship is established between the coefficients of the terms and the design parameters of the unified model. Moreover, a 5-DOF nonlinear system is taken as a case to illustrate the modeling of the proposed algorithm. Finally, a dynamic parametrical model of a cantilever beam is constructed from experimental data. Results indicate that the dynamic parametrical model of nonlinear systems, which depends on the PRESS-based EFOR, can accurately predict the output response, thus providing a theoretical basis for the optimal design of modeling methods for MDOF nonlinear systems.
Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.
Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu
2016-08-01
This work is devoted to the applications of the multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON Software and selected by inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seong W. Lee
During this reporting period, the literature survey including the gasifier temperature measurement literature, the ultrasonic application and its background study in cleaning application, and spray coating process are completed. The gasifier simulator (cold model) testing has been successfully conducted. Four factors (blower voltage, ultrasonic application, injection time intervals, particle weight) were considered as significant factors that affect the temperature measurement. The Analysis of Variance (ANOVA) was applied to analyze the test data. The analysis shows that all four factors are significant to the temperature measurements in the gasifier simulator (cold model). The regression analysis for the case with the normalizedmore » room temperature shows that linear model fits the temperature data with 82% accuracy (18% error). The regression analysis for the case without the normalized room temperature shows 72.5% accuracy (27.5% error). The nonlinear regression analysis indicates a better fit than that of the linear regression. The nonlinear regression model's accuracy is 88.7% (11.3% error) for normalized room temperature case, which is better than the linear regression analysis. The hot model thermocouple sleeve design and fabrication are completed. The gasifier simulator (hot model) design and the fabrication are completed. The system tests of the gasifier simulator (hot model) have been conducted and some modifications have been made. Based on the system tests and results analysis, the gasifier simulator (hot model) has met the proposed design requirement and the ready for system test. The ultrasonic cleaning method is under evaluation and will be further studied for the gasifier simulator (hot model) application. The progress of this project has been on schedule.« less
Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N
2017-07-01
Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression . Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant role in the calibration while wavelength selection plays a marginal role and the combination of certain pre-processing, wavelength selection, and nonlinear regression methods can achieve superior performance over traditional linear regression-based calibration.
Advanced Statistical Analyses to Reduce Inconsistency of Bond Strength Data.
Minamino, T; Mine, A; Shintani, A; Higashi, M; Kawaguchi-Uemura, A; Kabetani, T; Hagino, R; Imai, D; Tajiri, Y; Matsumoto, M; Yatani, H
2017-11-01
This study was designed to clarify the interrelationship of factors that affect the value of microtensile bond strength (µTBS), focusing on nondestructive testing by which information of the specimens can be stored and quantified. µTBS test specimens were prepared from 10 noncarious human molars. Six factors of µTBS test specimens were evaluated: presence of voids at the interface, X-ray absorption coefficient of resin, X-ray absorption coefficient of dentin, length of dentin part, size of adhesion area, and individual differences of teeth. All specimens were observed nondestructively by optical coherence tomography and micro-computed tomography before µTBS testing. After µTBS testing, the effect of these factors on µTBS data was analyzed by the general linear model, linear mixed effects regression model, and nonlinear regression model with 95% confidence intervals. By the general linear model, a significant difference in individual differences of teeth was observed ( P < 0.001). A significantly positive correlation was shown between µTBS and length of dentin part ( P < 0.001); however, there was no significant nonlinearity ( P = 0.157). Moreover, a significantly negative correlation was observed between µTBS and size of adhesion area ( P = 0.001), with significant nonlinearity ( P = 0.014). No correlation was observed between µTBS and X-ray absorption coefficient of resin ( P = 0.147), and there was no significant nonlinearity ( P = 0.089). Additionally, a significantly positive correlation was observed between µTBS and X-ray absorption coefficient of dentin ( P = 0.022), with significant nonlinearity ( P = 0.036). A significant difference was also observed between the presence and absence of voids by linear mixed effects regression analysis. Our results showed correlations between various parameters of tooth specimens and µTBS data. To evaluate the performance of the adhesive more precisely, the effect of tooth variability and a method to reduce variation in bond strength values should also be considered.
Sumner, Anne E; Luercio, Marcella F; Frempong, Barbara A; Ricks, Madia; Sen, Sabyasachi; Kushner, Harvey; Tulloch-Reid, Marshall K
2009-02-01
The disposition index, the product of the insulin sensitivity index (S(I)) and the acute insulin response to glucose, is linked in African Americans to chromosome 11q. This link was determined with S(I) calculated with the nonlinear regression approach to the minimal model and data from the reduced-sample insulin-modified frequently-sampled intravenous glucose tolerance test (Reduced-Sample-IM-FSIGT). However, the application of the nonlinear regression approach to calculate S(I) using data from the Reduced-Sample-IM-FSIGT has been challenged as being not only inaccurate but also having a high failure rate in insulin-resistant subjects. Our goal was to determine the accuracy and failure rate of the Reduced-Sample-IM-FSIGT using the nonlinear regression approach to the minimal model. With S(I) from the Full-Sample-IM-FSIGT considered the standard and using the nonlinear regression approach to the minimal model, we compared the agreement between S(I) from the Full- and Reduced-Sample-IM-FSIGT protocols. One hundred African Americans (body mass index, 31.3 +/- 7.6 kg/m(2) [mean +/- SD]; range, 19.0-56.9 kg/m(2)) had FSIGTs. Glucose (0.3 g/kg) was given at baseline. Insulin was infused from 20 to 25 minutes (total insulin dose, 0.02 U/kg). For the Full-Sample-IM-FSIGT, S(I) was calculated based on the glucose and insulin samples taken at -1, 1, 2, 3, 4, 5, 6, 7, 8,10, 12, 14, 16, 19, 22, 23, 24, 25, 27, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, and 180 minutes. For the Reduced-Sample-FSIGT, S(I) was calculated based on the time points that appear in bold. Agreement was determined by Spearman correlation, concordance, and the Bland-Altman method. In addition, for both protocols, the population was divided into tertiles of S(I). Insulin resistance was defined by the lowest tertile of S(I) from the Full-Sample-IM-FSIGT. The distribution of subjects across tertiles was compared by rank order and kappa statistic. We found that the rate of failure of resolution of S(I) by the Reduced-Sample-IM-FSIGT was 3% (3/100). For the remaining 97 subjects, S(I) for the Full- and Reduced-Sample-IM-FSIGTs were as follows: 3.76 +/- 2.41 L mU(-1) min(-1) (range, 0.58-14.50) and 4.29 +/- 2.89 L mU(-1) min(-1) (range, 0.52-14.42); relative error, 21% +/- 18%; Spearman r = 0.97; and concordance, 0.94 (both P < .001). After log transformation, the Bland-Altman limits of agreement were -0.29 and 0.53. The exact agreement for distribution of the population in the insulin-resistant tertile vs the insulin-sensitive tertiles was 92%, kappa of 0.82 +/- 0.06. Using the nonlinear regression approach and data from the Reduced-Sample-IM-FSIGT in subjects with a wide range of insulin sensitivity, failure to resolve S(I) occurred in only 3% of subjects. The agreement and maintenance of rank order of S(I) between protocols support the use of the nonlinear regression approach to the minimal model and the Reduced-Sample-IM-FSIGT in clinical studies.
Optimizing separate phase light hydrocarbon recovery from contaminated unconfined aquifers
NASA Astrophysics Data System (ADS)
Cooper, Grant S.; Peralta, Richard C.; Kaluarachchi, Jagath J.
A modeling approach is presented that optimizes separate phase recovery of light non-aqueous phase liquids (LNAPL) for a single dual-extraction well in a homogeneous, isotropic unconfined aquifer. A simulation/regression/optimization (S/R/O) model is developed to predict, analyze, and optimize the oil recovery process. The approach combines detailed simulation, nonlinear regression, and optimization. The S/R/O model utilizes nonlinear regression equations describing system response to time-varying water pumping and oil skimming. Regression equations are developed for residual oil volume and free oil volume. The S/R/O model determines optimized time-varying (stepwise) pumping rates which minimize residual oil volume and maximize free oil recovery while causing free oil volume to decrease a specified amount. This S/R/O modeling approach implicitly immobilizes the free product plume by reversing the water table gradient while achieving containment. Application to a simple representative problem illustrates the S/R/O model utility for problem analysis and remediation design. When compared with the best steady pumping strategies, the optimal stepwise pumping strategy improves free oil recovery by 11.5% and reduces the amount of residual oil left in the system due to pumping by 15%. The S/R/O model approach offers promise for enhancing the design of free phase LNAPL recovery systems and to help in making cost-effective operation and management decisions for hydrogeologists, engineers, and regulators.
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.
2013-01-01
Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A
2013-01-01
Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
A deep belief network with PLSR for nonlinear system modeling.
Qiao, Junfei; Wang, Gongming; Li, Wenjing; Li, Xiaoli
2018-08-01
Nonlinear system modeling plays an important role in practical engineering, and deep learning-based deep belief network (DBN) is now popular in nonlinear system modeling and identification because of the strong learning ability. However, the existing weights optimization for DBN is based on gradient, which always leads to a local optimum and a poor training result. In this paper, a DBN with partial least square regression (PLSR-DBN) is proposed for nonlinear system modeling, which focuses on the problem of weights optimization for DBN using PLSR. Firstly, unsupervised contrastive divergence (CD) algorithm is used in weights initialization. Secondly, initial weights derived from CD algorithm are optimized through layer-by-layer PLSR modeling from top layer to bottom layer. Instead of gradient method, PLSR-DBN can determine the optimal weights using several PLSR models, so that a better performance of PLSR-DBN is achieved. Then, the analysis of convergence is theoretically given to guarantee the effectiveness of the proposed PLSR-DBN model. Finally, the proposed PLSR-DBN is tested on two benchmark nonlinear systems and an actual wastewater treatment system as well as a handwritten digit recognition (nonlinear mapping and modeling) with high-dimension input data. The experiment results show that the proposed PLSR-DBN has better performances of time and accuracy on nonlinear system modeling than that of other methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
On the interannual oscillations in the northern temperate total ozone
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krzyscin, J.W.
1994-07-01
The interannual variations in total ozone are studied using revised Dobson total ozone records (1961-1990) from 17 stations located within the latitude band 30 deg N - 60 deg N. To obtain the quasi-biennial oscillation (QBO), El Nino-Southern Oscillation (ENSO), and 11-year solar cycle manifestation in the `northern temperate` total ozone data, various multiple regression models are constructed by the least squares fitting to the observed ozone. The statistical relationships between the selected indices of the atmospheric variabilities and total ozone are described in the linear and nonlinear regression models. Nonlinear relationships to the predictor variables are found. That is,more » the total ozone variations are statistically modeled by nonlinear terms accounting for the coupling between QBO and ENSO, QBO and solar activity, and ENSO and solar activity. It is suggested that large reduction of total ozone values over the `northern temperate` region occurs in cold season when a strong ENSO warm event meets the west phase of the QBO during the period of high solar activity.« less
NASA Astrophysics Data System (ADS)
See, J. J.; Jamaian, S. S.; Salleh, R. M.; Nor, M. E.; Aman, F.
2018-04-01
This research aims to estimate the parameters of Monod model of microalgae Botryococcus Braunii sp growth by the Least-Squares method. Monod equation is a non-linear equation which can be transformed into a linear equation form and it is solved by implementing the Least-Squares linear regression method. Meanwhile, Gauss-Newton method is an alternative method to solve the non-linear Least-Squares problem with the aim to obtain the parameters value of Monod model by minimizing the sum of square error ( SSE). As the result, the parameters of the Monod model for microalgae Botryococcus Braunii sp can be estimated by the Least-Squares method. However, the estimated parameters value obtained by the non-linear Least-Squares method are more accurate compared to the linear Least-Squares method since the SSE of the non-linear Least-Squares method is less than the linear Least-Squares method.
Multivariate meta-analysis for non-linear and other multi-parameter associations
Gasparrini, A; Armstrong, B; Kenward, M G
2012-01-01
In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043
Additivity of nonlinear biomass equations
Bernard R. Parresol
2001-01-01
Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...
Regression-based model of skin diffuse reflectance for skin color analysis
NASA Astrophysics Data System (ADS)
Tsumura, Norimichi; Kawazoe, Daisuke; Nakaguchi, Toshiya; Ojima, Nobutoshi; Miyake, Yoichi
2008-11-01
A simple regression-based model of skin diffuse reflectance is developed based on reflectance samples calculated by Monte Carlo simulation of light transport in a two-layered skin model. This reflectance model includes the values of spectral reflectance in the visible spectra for Japanese women. The modified Lambert Beer law holds in the proposed model with a modified mean free path length in non-linear density space. The averaged RMS and maximum errors of the proposed model were 1.1 and 3.1%, respectively, in the above range.
Development and Application of Nonlinear Land-Use Regression Models
NASA Astrophysics Data System (ADS)
Champendal, Alexandre; Kanevski, Mikhail; Huguenot, Pierre-Emmanuel
2014-05-01
The problem of air pollution modelling in urban zones is of great importance both from scientific and applied points of view. At present there are several fundamental approaches either based on science-based modelling (air pollution dispersion) or on the application of space-time geostatistical methods (e.g. family of kriging models or conditional stochastic simulations). Recently, there were important developments in so-called Land Use Regression (LUR) models. These models take into account geospatial information (e.g. traffic network, sources of pollution, average traffic, population census, land use, etc.) at different scales, for example, using buffering operations. Usually the dimension of the input space (number of independent variables) is within the range of (10-100). It was shown that LUR models have some potential to model complex and highly variable patterns of air pollution in urban zones. Most of LUR models currently used are linear models. In the present research the nonlinear LUR models are developed and applied for Geneva city. Mainly two nonlinear data-driven models were elaborated: multilayer perceptron and random forest. An important part of the research deals also with a comprehensive exploratory data analysis using statistical, geostatistical and time series tools. Unsupervised self-organizing maps were applied to better understand space-time patterns of the pollution. The real data case study deals with spatial-temporal air pollution data of Geneva (2002-2011). Nitrogen dioxide (NO2) has caught our attention. It has effects on human health and on plants; NO2 contributes to the phenomenon of acid rain. The negative effects of nitrogen dioxides on plants are the reduction of the growth, production and pesticide resistance. And finally, the effects on materials: nitrogen dioxide increases the corrosion. The data used for this study consist of a set of 106 NO2 passive sensors. 80 were used to build the models and the remaining 36 have constituted the testing set. Missing data have been completed using multiple linear regression and annual average values of pollutant concentrations were computed. All sensors are dispersed homogeneously over the central urban area of Geneva. The main result of the study is that the nonlinear LUR models developed have demonstrated their efficiency in modelling complex phrenomena of air pollution in urban zones and significantly reduced the testing error in comparison with linear models. Further research deals with the development and application of other non-linear data-driven models (Kanevski et al. 2009). References Kanevski M., Pozdnoukhov A. and Timonin V. (2009). Machine Learning for Spatial Environmental Data. Theory, Applications and Software. EPLF Press, Lausanne.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biyanto, Totok R.
Fouling in a heat exchanger in Crude Preheat Train (CPT) refinery is an unsolved problem that reduces the plant efficiency, increases fuel consumption and CO{sub 2} emission. The fouling resistance behavior is very complex. It is difficult to develop a model using first principle equation to predict the fouling resistance due to different operating conditions and different crude blends. In this paper, Artificial Neural Networks (ANN) MultiLayer Perceptron (MLP) with input structure using Nonlinear Auto-Regressive with eXogenous (NARX) is utilized to build the fouling resistance model in shell and tube heat exchanger (STHX). The input data of the model aremore » flow rates and temperatures of the streams of the heat exchanger, physical properties of product and crude blend data. This model serves as a predicting tool to optimize operating conditions and preventive maintenance of STHX. The results show that the model can capture the complexity of fouling characteristics in heat exchanger due to thermodynamic conditions and variations in crude oil properties (blends). It was found that the Root Mean Square Error (RMSE) are suitable to capture the nonlinearity and complexity of the STHX fouling resistance during phases of training and validation.« less
Two-dimensional advective transport in ground-water flow parameter estimation
Anderman, E.R.; Hill, M.C.; Poeter, E.P.
1996-01-01
Nonlinear regression is useful in ground-water flow parameter estimation, but problems of parameter insensitivity and correlation often exist given commonly available hydraulic-head and head-dependent flow (for example, stream and lake gain or loss) observations. To address this problem, advective-transport observations are added to the ground-water flow, parameter-estimation model MODFLOWP using particle-tracking methods. The resulting model is used to investigate the importance of advective-transport observations relative to head-dependent flow observations when either or both are used in conjunction with hydraulic-head observations in a simulation of the sewage-discharge plume at Otis Air Force Base, Cape Cod, Massachusetts, USA. The analysis procedure for evaluating the probable effect of new observations on the regression results consists of two steps: (1) parameter sensitivities and correlations calculated at initial parameter values are used to assess the model parameterization and expected relative contributions of different types of observations to the regression; and (2) optimal parameter values are estimated by nonlinear regression and evaluated. In the Cape Cod parameter-estimation model, advective-transport observations did not significantly increase the overall parameter sensitivity; however: (1) inclusion of advective-transport observations decreased parameter correlation enough for more unique parameter values to be estimated by the regression; (2) realistic uncertainties in advective-transport observations had a small effect on parameter estimates relative to the precision with which the parameters were estimated; and (3) the regression results and sensitivity analysis provided insight into the dynamics of the ground-water flow system, especially the importance of accurate boundary conditions. In this work, advective-transport observations improved the calibration of the model and the estimation of ground-water flow parameters, and use of regression and related techniques produced significant insight into the physical system.
[Relationship between shift work and overweight/obesity in male steel workers].
Xiao, M Y; Wang, Z Y; Fan, H M; Che, C L; Lu, Y; Cong, L X; Gao, X J; Liu, Y J; Yuan, J X; Li, X M; Hu, B; Chen, Y P
2016-11-10
Objective: To investigate the relationship between shift work and overweight/obesity in male steel workers. Methods: A questionnaire survey was conducted among the male steel workers selected during health examination in Tangshan Steel Company from March 2015 to March 2016. The relationship between shift work and overweight/obesity in the male steel workers were analyzed by using logistic regression model and restricted cubic splinemodel. Results: A total of 7 262 male steel workers were surveyed, the overall prevalence of overweight/obesitywas 64.5% (4 686/7 262), the overweight rate was 34.3% and the obesity rate was 30.2%, respectively. After adjusting for age, educational level and average family income level per month by multivariable logistic regression analysis, shift work was associated with overweight/obesity and obesity in the male steel workers. The OR was 1.19(95% CI : 1.05-1.35) and 1.15(95% CI : 1.00-1.32). Restricted cubic spline model analysis showed that the relationship between shift work years and overweight/obesity in the male steel workers was a nonlinear dose response one (nonlinear test χ 2 =7.43, P <0.05). Restricted cubic spline model analysis showed that the relationship between shift work years and obesity in the male steel workers was a nonlinear dose response one (nonlinear test χ 2 =10.48, P <0.05). Conclusion: Shift work was associated with overweight and obesity in the male steel workers, and shift work years and overweight/obesity had a nonlinear relationship.
NASA Astrophysics Data System (ADS)
Haris, A.; Nafian, M.; Riyanto, A.
2017-07-01
Danish North Sea Fields consist of several formations (Ekofisk, Tor, and Cromer Knoll) that was started from the age of Paleocene to Miocene. In this study, the integration of seismic and well log data set is carried out to determine the chalk sand distribution in the Danish North Sea field. The integration of seismic and well log data set is performed by using the seismic inversion analysis and seismic multi-attribute. The seismic inversion algorithm, which is used to derive acoustic impedance (AI), is model-based technique. The derived AI is then used as external attributes for the input of multi-attribute analysis. Moreover, the multi-attribute analysis is used to generate the linear and non-linear transformation of among well log properties. In the case of the linear model, selected transformation is conducted by weighting step-wise linear regression (SWR), while for the non-linear model is performed by using probabilistic neural networks (PNN). The estimated porosity, which is resulted by PNN shows better suited to the well log data compared with the results of SWR. This result can be understood since PNN perform non-linear regression so that the relationship between the attribute data and predicted log data can be optimized. The distribution of chalk sand has been successfully identified and characterized by porosity value ranging from 23% up to 30%.
PharmML in Action: an Interoperable Language for Modeling and Simulation.
Bizzotto, R; Comets, E; Smith, G; Yvon, F; Kristensen, N R; Swat, M J
2017-10-01
PharmML is an XML-based exchange format created with a focus on nonlinear mixed-effect (NLME) models used in pharmacometrics, but providing a very general framework that also allows describing mathematical and statistical models such as single-subject or nonlinear and multivariate regression models. This tutorial provides an overview of the structure of this language, brief suggestions on how to work with it, and use cases demonstrating its power and flexibility. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
On neural networks in identification and control of dynamic systems
NASA Technical Reports Server (NTRS)
Phan, Minh; Juang, Jer-Nan; Hyland, David C.
1993-01-01
This paper presents a discussion of the applicability of neural networks in the identification and control of dynamic systems. Emphasis is placed on the understanding of how the neural networks handle linear systems and how the new approach is related to conventional system identification and control methods. Extensions of the approach to nonlinear systems are then made. The paper explains the fundamental concepts of neural networks in their simplest terms. Among the topics discussed are feed forward and recurrent networks in relation to the standard state-space and observer models, linear and nonlinear auto-regressive models, linear, predictors, one-step ahead control, and model reference adaptive control for linear and nonlinear systems. Numerical examples are presented to illustrate the application of these important concepts.
The NLS-Based Nonlinear Grey Multivariate Model for Forecasting Pollutant Emissions in China.
Pei, Ling-Ling; Li, Qin; Wang, Zheng-Xin
2018-03-08
The relationship between pollutant discharge and economic growth has been a major research focus in environmental economics. To accurately estimate the nonlinear change law of China's pollutant discharge with economic growth, this study establishes a transformed nonlinear grey multivariable (TNGM (1, N )) model based on the nonlinear least square (NLS) method. The Gauss-Seidel iterative algorithm was used to solve the parameters of the TNGM (1, N ) model based on the NLS basic principle. This algorithm improves the precision of the model by continuous iteration and constantly approximating the optimal regression coefficient of the nonlinear model. In our empirical analysis, the traditional grey multivariate model GM (1, N ) and the NLS-based TNGM (1, N ) models were respectively adopted to forecast and analyze the relationship among wastewater discharge per capita (WDPC), and per capita emissions of SO₂ and dust, alongside GDP per capita in China during the period 1996-2015. Results indicated that the NLS algorithm is able to effectively help the grey multivariable model identify the nonlinear relationship between pollutant discharge and economic growth. The results show that the NLS-based TNGM (1, N ) model presents greater precision when forecasting WDPC, SO₂ emissions and dust emissions per capita, compared to the traditional GM (1, N ) model; WDPC indicates a growing tendency aligned with the growth of GDP, while the per capita emissions of SO₂ and dust reduce accordingly.
The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?
Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping
2013-01-01
Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
Method and Excel VBA Algorithm for Modeling Master Recession Curve Using Trigonometry Approach.
Posavec, Kristijan; Giacopetti, Marco; Materazzi, Marco; Birk, Steffen
2017-11-01
A new method was developed and implemented into an Excel Visual Basic for Applications (VBAs) algorithm utilizing trigonometry laws in an innovative way to overlap recession segments of time series and create master recession curves (MRCs). Based on a trigonometry approach, the algorithm horizontally translates succeeding recession segments of time series, placing their vertex, that is, the highest recorded value of each recession segment, directly onto the appropriate connection line defined by measurement points of a preceding recession segment. The new method and algorithm continues the development of methods and algorithms for the generation of MRC, where the first published method was based on a multiple linear/nonlinear regression model approach (Posavec et al. 2006). The newly developed trigonometry-based method was tested on real case study examples and compared with the previously published multiple linear/nonlinear regression model-based method. The results show that in some cases, that is, for some time series, the trigonometry-based method creates narrower overlaps of the recession segments, resulting in higher coefficients of determination R 2 , while in other cases the multiple linear/nonlinear regression model-based method remains superior. The Excel VBA algorithm for modeling MRC using the trigonometry approach is implemented into a spreadsheet tool (MRCTools v3.0 written by and available from Kristijan Posavec, Zagreb, Croatia) containing the previously published VBA algorithms for MRC generation and separation. All algorithms within the MRCTools v3.0 are open access and available free of charge, supporting the idea of running science on available, open, and free of charge software. © 2017, National Ground Water Association.
NASA Astrophysics Data System (ADS)
Bellugi, D. G.; Tennant, C.; Larsen, L.
2016-12-01
Catchment and climate heterogeneity complicate prediction of runoff across time and space, and resulting parameter uncertainty can lead to large accumulated errors in hydrologic models, particularly in ungauged basins. Recently, data-driven modeling approaches have been shown to avoid the accumulated uncertainty associated with many physically-based models, providing an appealing alternative for hydrologic prediction. However, the effectiveness of different methods in hydrologically and geomorphically distinct catchments, and the robustness of these methods to changing climate and changing hydrologic processes remain to be tested. Here, we evaluate the use of machine learning techniques to predict daily runoff across time and space using only essential climatic forcing (e.g. precipitation, temperature, and potential evapotranspiration) time series as model input. Model training and testing was done using a high quality dataset of daily runoff and climate forcing data for 25+ years for 600+ minimally-disturbed catchments (drainage area range 5-25,000 km2, median size 336 km2) that cover a wide range of climatic and physical characteristics. Preliminary results using Support Vector Regression (SVR) suggest that in some catchments this nonlinear-based regression technique can accurately predict daily runoff, while the same approach fails in other catchments, indicating that the representation of climate inputs and/or catchment filter characteristics in the model structure need further refinement to increase performance. We bolster this analysis by using Sparse Identification of Nonlinear Dynamics (a sparse symbolic regression technique) to uncover the governing equations that describe runoff processes in catchments where SVR performed well and for ones where it performed poorly, thereby enabling inference about governing processes. This provides a robust means of examining how catchment complexity influences runoff prediction skill, and represents a contribution towards the integration of data-driven inference and physically-based models.
Applications of Support Vector Machines In Chemo And Bioinformatics
NASA Astrophysics Data System (ADS)
Jayaraman, V. K.; Sundararajan, V.
2010-10-01
Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.
A novel model incorporating two variability sources for describing motor evoked potentials
Goetz, Stefan M.; Luber, Bruce; Lisanby, Sarah H.; Peterchev, Angel V.
2014-01-01
Objective Motor evoked potentials (MEPs) play a pivotal role in transcranial magnetic stimulation (TMS), e.g., for determining the motor threshold and probing cortical excitability. Sampled across the range of stimulation strengths, MEPs outline an input–output (IO) curve, which is often used to characterize the corticospinal tract. More detailed understanding of the signal generation and variability of MEPs would provide insight into the underlying physiology and aid correct statistical treatment of MEP data. Methods A novel regression model is tested using measured IO data of twelve subjects. The model splits MEP variability into two independent contributions, acting on both sides of a strong sigmoidal nonlinearity that represents neural recruitment. Traditional sigmoidal regression with a single variability source after the nonlinearity is used for comparison. Results The distribution of MEP amplitudes varied across different stimulation strengths, violating statistical assumptions in traditional regression models. In contrast to the conventional regression model, the dual variability source model better described the IO characteristics including phenomena such as changing distribution spread and skewness along the IO curve. Conclusions MEP variability is best described by two sources that most likely separate variability in the initial excitation process from effects occurring later on. The new model enables more accurate and sensitive estimation of the IO curve characteristics, enhancing its power as a detection tool, and may apply to other brain stimulation modalities. Furthermore, it extracts new information from the IO data concerning the neural variability—information that has previously been treated as noise. PMID:24794287
Raj, Retheep; Sivanandan, K S
2017-01-01
Estimation of elbow dynamics has been the object of numerous investigations. In this work a solution is proposed for estimating elbow movement velocity and elbow joint angle from Surface Electromyography (SEMG) signals. Here the Surface Electromyography signals are acquired from the biceps brachii muscle of human hand. Two time-domain parameters, Integrated EMG (IEMG) and Zero Crossing (ZC), are extracted from the Surface Electromyography signal. The relationship between the time domain parameters, IEMG and ZC with elbow angular displacement and elbow angular velocity during extension and flexion of the elbow are studied. A multiple input-multiple output model is derived for identifying the kinematics of elbow. A Nonlinear Auto Regressive with eXogenous inputs (NARX) structure based multiple layer perceptron neural network (MLPNN) model is proposed for the estimation of elbow joint angle and elbow angular velocity. The proposed NARX MLPNN model is trained using Levenberg-marquardt based algorithm. The proposed model is estimating the elbow joint angle and elbow movement angular velocity with appreciable accuracy. The model is validated using regression coefficient value (R). The average regression coefficient value (R) obtained for elbow angular displacement prediction is 0.9641 and for the elbow anglular velocity prediction is 0.9347. The Nonlinear Auto Regressive with eXogenous inputs (NARX) structure based multiple layer perceptron neural networks (MLPNN) model can be used for the estimation of angular displacement and movement angular velocity of the elbow with good accuracy.
Nonlinear Simulation of the Tooth Enamel Spectrum for EPR Dosimetry
NASA Astrophysics Data System (ADS)
Kirillov, V. A.; Dubovsky, S. V.
2016-07-01
Software was developed where initial EPR spectra of tooth enamel were deconvoluted based on nonlinear simulation, line shapes and signal amplitudes in the model initial spectrum were calculated, the regression coefficient was evaluated, and individual spectra were summed. Software validation demonstrated that doses calculated using it agreed excellently with the applied radiation doses and the doses reconstructed by the method of additive doses.
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is simultaneous equation regression model and fusion of parametric and nonparametric model. The regression model comprise several models and each model has two components, parametric and nonparametric. The used model has linear function as parametric and polynomial truncated spline as nonparametric component. The model can handle both linearity and nonlinearity relationship between response and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model for modeling of effect of regional socio-economic on use of information technology. More specific, the response variables are percentage of households has access to internet and percentage of households has personal computer. Then, predictor variables are percentage of literacy people, percentage of electrification and percentage of economic growth. Based on identification of the relationship between response and predictor variable, economic growth is treated as nonparametric predictor and the others are parametric predictors. The result shows that the multiresponse semiparametric regression can be applied well as indicate by the high coefficient determination, 90 percent.
Various approaches and tools exist to estimate local and regional PM2.5 impacts from a single emissions source, ranging from simple screening techniques to Gaussian based dispersion models and complex grid-based Eulerian photochemical transport models. These approache...
Modeling workplace bullying using catastrophe theory.
Escartin, J; Ceja, L; Navarro, J; Zapf, D
2013-10-01
Workplace bullying is defined as negative behaviors directed at organizational members or their work context that occur regularly and repeatedly over a period of time. Employees' perceptions of psychosocial safety climate, workplace bullying victimization, and workplace bullying perpetration were assessed within a sample of nearly 5,000 workers. Linear and nonlinear approaches were applied in order to model both continuous and sudden changes in workplace bullying. More specifically, the present study examines whether a nonlinear dynamical systems model (i.e., a cusp catastrophe model) is superior to the linear combination of variables for predicting the effect of psychosocial safety climate and workplace bullying victimization on workplace bullying perpetration. According to the AICc, and BIC indices, the linear regression model fits the data better than the cusp catastrophe model. The study concludes that some phenomena, especially unhealthy behaviors at work (like workplace bullying), may be better studied using linear approaches as opposed to nonlinear dynamical systems models. This can be explained through the healthy variability hypothesis, which argues that positive organizational behavior is likely to present nonlinear behavior, while a decrease in such variability may indicate the occurrence of negative behaviors at work.
Nonlinear GARCH model and 1 / f noise
NASA Astrophysics Data System (ADS)
Kononovicius, A.; Ruseckas, J.
2015-06-01
Auto-regressive conditionally heteroskedastic (ARCH) family models are still used, by practitioners in business and economic policy making, as a conditional volatility forecasting models. Furthermore ARCH models still are attracting an interest of the researchers. In this contribution we consider the well known GARCH(1,1) process and its nonlinear modifications, reminiscent of NGARCH model. We investigate the possibility to reproduce power law statistics, probability density function and power spectral density, using ARCH family models. For this purpose we derive stochastic differential equations from the GARCH processes in consideration. We find the obtained equations to be similar to a general class of stochastic differential equations known to reproduce power law statistics. We show that linear GARCH(1,1) process has power law distribution, but its power spectral density is Brownian noise-like. However, the nonlinear modifications exhibit both power law distribution and power spectral density of the 1 /fβ form, including 1 / f noise.
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Nonlinear-regression groundwater flow modeling of a deep regional aquifer system
Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.
1986-01-01
A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.
Nonlinear-Regression Groundwater Flow Modeling of a Deep Regional Aquifer System
NASA Astrophysics Data System (ADS)
Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.
1986-12-01
A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.
Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi
2007-10-01
Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, the multiple linear will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.
Van Looy, Stijn; Verplancke, Thierry; Benoit, Dominique; Hoste, Eric; Van Maele, Georges; De Turck, Filip; Decruyenaere, Johan
2007-01-01
Tacrolimus is an important immunosuppressive drug for organ transplantation patients. It has a narrow therapeutic range, toxic side effects, and a blood concentration with wide intra- and interindividual variability. Hence, it is of the utmost importance to monitor tacrolimus blood concentration, thereby ensuring clinical effect and avoiding toxic side effects. Prediction models for tacrolimus blood concentration can improve clinical care by optimizing monitoring of these concentrations, especially in the initial phase after transplantation during intensive care unit (ICU) stay. This is the first study in the ICU in which support vector machines, as a new data modeling technique, are investigated and tested in their prediction capabilities of tacrolimus blood concentration. Linear support vector regression (SVR) and nonlinear radial basis function (RBF) SVR are compared with multiple linear regression (MLR). Tacrolimus blood concentrations, together with 35 other relevant variables from 50 liver transplantation patients, were extracted from our ICU database. This resulted in a dataset of 457 blood samples, on average between 9 and 10 samples per patient, finally resulting in a database of more than 16,000 data values. Nonlinear RBF SVR, linear SVR, and MLR were performed after selection of clinically relevant input variables and model parameters. Differences between observed and predicted tacrolimus blood concentrations were calculated. Prediction accuracy of the three methods was compared after fivefold cross-validation (Friedman test and Wilcoxon signed rank analysis). Linear SVR and nonlinear RBF SVR had mean absolute differences between observed and predicted tacrolimus blood concentrations of 2.31 ng/ml (standard deviation [SD] 2.47) and 2.38 ng/ml (SD 2.49), respectively. MLR had a mean absolute difference of 2.73 ng/ml (SD 3.79). The difference between linear SVR and MLR was statistically significant (p < 0.001). RBF SVR had the advantage of requiring only 2 input variables to perform this prediction in comparison to 15 and 16 variables needed by linear SVR and MLR, respectively. This is an indication of the superior prediction capability of nonlinear SVR. Prediction of tacrolimus blood concentration with linear and nonlinear SVR was excellent, and accuracy was superior in comparison with an MLR model.
Saucedo-Reyes, Daniela; Carrillo-Salazar, José A; Román-Padilla, Lizbeth; Saucedo-Veloz, Crescenciano; Reyes-Santamaría, María I; Ramírez-Gilly, Mariana; Tecante, Alberto
2018-03-01
High hydrostatic pressure inactivation kinetics of Escherichia coli ATCC 25922 and Salmonella enterica subsp. enterica serovar Typhimurium ATCC 14028 ( S. typhimurium) in a low acid mamey pulp at four pressure levels (300, 350, 400, and 450 MPa), different exposure times (0-8 min), and temperature of 25 ± 2℃ were obtained. Survival curves showed deviations from linearity in the form of a tail (upward concavity). The primary models tested were the Weibull model, the modified Gompertz equation, and the biphasic model. The Weibull model gave the best goodness of fit ( R 2 adj > 0.956, root mean square error < 0.290) in the modeling and the lowest Akaike information criterion value. Exponential-logistic and exponential decay models, and Bigelow-type and an empirical models for b'( P) and n( P) parameters, respectively, were tested as alternative secondary models. The process validation considered the two- and one-step nonlinear regressions for making predictions of the survival fraction; both regression types provided an adequate goodness of fit and the one-step nonlinear regression clearly reduced fitting errors. The best candidate model according to the Akaike theory information, with better accuracy and more reliable predictions was the Weibull model integrated by the exponential-logistic and exponential decay secondary models as a function of time and pressure (two-step procedure) or incorporated as one equation (one-step procedure). Both mathematical expressions were used to determine the t d parameter, where the desired reductions ( 5D) (considering d = 5 ( t 5 ) as the criterion of 5 Log 10 reduction (5 D)) in both microorganisms are attainable at 400 MPa for 5.487 ± 0.488 or 5.950 ± 0.329 min, respectively, for the one- or two-step nonlinear procedure.
Macrocell path loss prediction using artificial intelligence techniques
NASA Astrophysics Data System (ADS)
Usman, Abraham U.; Okereke, Okpo U.; Omizegba, Elijah E.
2014-04-01
The prediction of propagation loss is a practical non-linear function approximation problem which linear regression or auto-regression models are limited in their ability to handle. However, some computational Intelligence techniques such as artificial neural networks (ANNs) and adaptive neuro-fuzzy inference systems (ANFISs) have been shown to have great ability to handle non-linear function approximation and prediction problems. In this study, the multiple layer perceptron neural network (MLP-NN), radial basis function neural network (RBF-NN) and an ANFIS network were trained using actual signal strength measurement taken at certain suburban areas of Bauchi metropolis, Nigeria. The trained networks were then used to predict propagation losses at the stated areas under differing conditions. The predictions were compared with the prediction accuracy of the popular Hata model. It was observed that ANFIS model gave a better fit in all cases having higher R2 values in each case and on average is more robust than MLP and RBF models as it generalises better to a different data.
Equilibrium, kinetics and process design of acid yellow 132 adsorption onto red pine sawdust.
Can, Mustafa
2015-01-01
Linear and non-linear regression procedures have been applied to the Langmuir, Freundlich, Tempkin, Dubinin-Radushkevich, and Redlich-Peterson isotherms for adsorption of acid yellow 132 (AY132) dye onto red pine (Pinus resinosa) sawdust. The effects of parameters such as particle size, stirring rate, contact time, dye concentration, adsorption dose, pH, and temperature were investigated, and interaction was characterized by Fourier transform infrared spectroscopy and field emission scanning electron microscope. The non-linear method of the Langmuir isotherm equation was found to be the best fitting model to the equilibrium data. The maximum monolayer adsorption capacity was found as 79.5 mg/g. The calculated thermodynamic results suggested that AY132 adsorption onto red pine sawdust was an exothermic, physisorption, and spontaneous process. Kinetics was analyzed by four different kinetic equations using non-linear regression analysis. The pseudo-second-order equation provides the best fit with experimental data.
Nonlinear multivariate and time series analysis by neural network methods
NASA Astrophysics Data System (ADS)
Hsieh, William W.
2004-03-01
Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.
Kumar, K Vasanth
2006-10-11
Batch kinetic experiments were carried out for the sorption of methylene blue onto activated carbon. The experimental kinetics were fitted to the pseudo first-order and pseudo second-order kinetics by linear and a non-linear method. The five different types of Ho pseudo second-order expression have been discussed. A comparison of linear least-squares method and a trial and error non-linear method of estimating the pseudo second-order rate kinetic parameters were examined. The sorption process was found to follow a both pseudo first-order kinetic and pseudo second-order kinetic model. Present investigation showed that it is inappropriate to use a type 1 and type pseudo second-order expressions as proposed by Ho and Blanachard et al. respectively for predicting the kinetic rate constants and the initial sorption rate for the studied system. Three correct possible alternate linear expressions (type 2 to type 4) to better predict the initial sorption rate and kinetic rate constants for the studied system (methylene blue/activated carbon) was proposed. Linear method was found to check only the hypothesis instead of verifying the kinetic model. Non-linear regression method was found to be the more appropriate method to determine the rate kinetic parameters.
The NLS-Based Nonlinear Grey Multivariate Model for Forecasting Pollutant Emissions in China
Pei, Ling-Ling; Li, Qin
2018-01-01
The relationship between pollutant discharge and economic growth has been a major research focus in environmental economics. To accurately estimate the nonlinear change law of China’s pollutant discharge with economic growth, this study establishes a transformed nonlinear grey multivariable (TNGM (1, N)) model based on the nonlinear least square (NLS) method. The Gauss–Seidel iterative algorithm was used to solve the parameters of the TNGM (1, N) model based on the NLS basic principle. This algorithm improves the precision of the model by continuous iteration and constantly approximating the optimal regression coefficient of the nonlinear model. In our empirical analysis, the traditional grey multivariate model GM (1, N) and the NLS-based TNGM (1, N) models were respectively adopted to forecast and analyze the relationship among wastewater discharge per capita (WDPC), and per capita emissions of SO2 and dust, alongside GDP per capita in China during the period 1996–2015. Results indicated that the NLS algorithm is able to effectively help the grey multivariable model identify the nonlinear relationship between pollutant discharge and economic growth. The results show that the NLS-based TNGM (1, N) model presents greater precision when forecasting WDPC, SO2 emissions and dust emissions per capita, compared to the traditional GM (1, N) model; WDPC indicates a growing tendency aligned with the growth of GDP, while the per capita emissions of SO2 and dust reduce accordingly. PMID:29517985
Troutman, Brent M.
1982-01-01
Errors in runoff prediction caused by input data errors are analyzed by treating precipitation-runoff models as regression (conditional expectation) models. Independent variables of the regression consist of precipitation and other input measurements; the dependent variable is runoff. In models using erroneous input data, prediction errors are inflated and estimates of expected storm runoff for given observed input variables are biased. This bias in expected runoff estimation results in biased parameter estimates if these parameter estimates are obtained by a least squares fit of predicted to observed runoff values. The problems of error inflation and bias are examined in detail for a simple linear regression of runoff on rainfall and for a nonlinear U.S. Geological Survey precipitation-runoff model. Some implications for flood frequency analysis are considered. A case study using a set of data from Turtle Creek near Dallas, Texas illustrates the problems of model input errors.
Godin, Bruno; Mayer, Frédéric; Agneessens, Richard; Gerin, Patrick; Dardenne, Pierre; Delfosse, Philippe; Delcarte, Jérôme
2015-01-01
The reliability of different models to predict the biochemical methane potential (BMP) of various plant biomasses using a multispecies dataset was compared. The most reliable prediction models of the BMP were those based on the near infrared (NIR) spectrum compared to those based on the chemical composition. The NIR predictions of local (specific regression and non-linear) models were able to estimate quantitatively, rapidly, cheaply and easily the BMP. Such a model could be further used for biomethanation plant management and optimization. The predictions of non-linear models were more reliable compared to those of linear models. The presentation form (green-dried, silage-dried and silage-wet form) of biomasses to the NIR spectrometer did not influence the performances of the NIR prediction models. The accuracy of the BMP method should be improved to enhance further the BMP prediction models. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Koyuncu, A.; Cigeroglu, E.; Özgüven, H. N.
2017-10-01
In this study, a new approach is proposed for identification of structural nonlinearities by employing cascaded optimization and neural networks. Linear finite element model of the system and frequency response functions measured at arbitrary locations of the system are used in this approach. Using the finite element model, a training data set is created, which appropriately spans the possible nonlinear configurations space of the system. A classification neural network trained on these data sets then localizes and determines the types of all nonlinearities associated with the nonlinear degrees of freedom in the system. A new training data set spanning the parametric space associated with the determined nonlinearities is created to facilitate parametric identification. Utilizing this data set, initially, a feed forward regression neural network is trained, which parametrically identifies the classified nonlinearities. Then, the results obtained are further improved by carrying out an optimization which uses network identified values as starting points. Unlike identification methods available in literature, the proposed approach does not require data collection from the degrees of freedoms where nonlinear elements are attached, and furthermore, it is sufficiently accurate even in the presence of measurement noise. The application of the proposed approach is demonstrated on an example system with nonlinear elements and on a real life experimental setup with a local nonlinearity.
NASA Astrophysics Data System (ADS)
Chen, Yi-Ying; Chu, Chia-Ren; Li, Ming-Hsu
2012-10-01
SummaryIn this paper we present a semi-parametric multivariate gap-filling model for tower-based measurement of latent heat flux (LE). Two statistical techniques, the principal component analysis (PCA) and a nonlinear interpolation approach were integrated into this LE gap-filling model. The PCA was first used to resolve the multicollinearity relationships among various environmental variables, including radiation, soil moisture deficit, leaf area index, wind speed, etc. Two nonlinear interpolation methods, multiple regressions (MRS) and the K-nearest neighbors (KNNs) were examined with random selected flux gaps for both clear sky and nighttime/cloudy data to incorporate into this LE gap-filling model. Experimental results indicated that the KNN interpolation approach is able to provide consistent LE estimations while MRS presents over estimations during nighttime/cloudy. Rather than using empirical regression parameters, the KNN approach resolves the nonlinear relationship between the gap-filled LE flux and principal components with adaptive K values under different atmospheric states. The developed LE gap-filling model (PCA with KNN) works with a RMSE of 2.4 W m-2 (˜0.09 mm day-1) at a weekly time scale by adding 40% artificial flux gaps into original dataset. Annual evapotranspiration at this study site were estimated at 736 mm (1803 MJ) and 728 mm (1785 MJ) for year 2008 and 2009, respectively.
Geszke-Moritz, Małgorzata; Moritz, Michał
2016-12-01
The present study deals with the adsorption of boldine onto pure and propyl-sulfonic acid-functionalized SBA-15, SBA-16 and mesocellular foam (MCF) materials. Siliceous adsorbents were characterized by nitrogen sorption analysis, transmission electron microscopy (TEM), scanning electron microscopy (SEM), Fourier-transform infrared (FT-IR) spectroscopy and thermogravimetric analysis. The equilibrium adsorption data were analyzed using the Langmuir, Freundlich, Redlich-Peterson, and Temkin isotherms. Moreover, the Dubinin-Radushkevich and Dubinin-Astakhov isotherm models based on the Polanyi adsorption potential were employed. The latter was calculated using two alternative formulas including solubility-normalized (S-model) and empirical C-model. In order to find the best-fit isotherm, both linear regression and nonlinear fitting analysis were carried out. The Dubinin-Astakhov (S-model) isotherm revealed the best fit to the experimental points for adsorption of boldine onto pure mesoporous materials using both linear and nonlinear fitting analysis. Meanwhile, the process of boldine sorption onto modified silicas was described the best by the Langmuir and Temkin isotherms using linear regression and nonlinear fitting analysis, respectively. The values of adsorption energy (below 8kJ/mol) indicate the physical nature of boldine adsorption onto unmodified silicas whereas the ionic interactions seem to be the main force of alkaloid adsorption onto functionalized sorbents (energy of adsorption above 8kJ/mol). Copyright © 2016 Elsevier B.V. All rights reserved.
Photonic single nonlinear-delay dynamical node for information processing
NASA Astrophysics Data System (ADS)
Ortín, Silvia; San-Martín, Daniel; Pesquera, Luis; Gutiérrez, José Manuel
2012-06-01
An electro-optical system with a delay loop based on semiconductor lasers is investigated for information processing by performing numerical simulations. This system can replace a complex network of many nonlinear elements for the implementation of Reservoir Computing. We show that a single nonlinear-delay dynamical system has the basic properties to perform as reservoir: short-term memory and separation property. The computing performance of this system is evaluated for two prediction tasks: Lorenz chaotic time series and nonlinear auto-regressive moving average (NARMA) model. We sweep the parameters of the system to find the best performance. The results achieved for the Lorenz and the NARMA-10 tasks are comparable to those obtained by other machine learning methods.
Numerical simulations for tumor and cellular immune system interactions in lung cancer treatment
NASA Astrophysics Data System (ADS)
Kolev, M.; Nawrocki, S.; Zubik-Kowal, B.
2013-06-01
We investigate a new mathematical model that describes lung cancer regression in patients treated by chemotherapy and radiotherapy. The model is composed of nonlinear integro-differential equations derived from the so-called kinetic theory for active particles and a new sink function is investigated according to clinical data from carcinoma planoepitheliale. The model equations are solved numerically and the data are utilized in order to find their unknown parameters. The results of the numerical experiments show a good correlation between the predicted and clinical data and illustrate that the mathematical model has potential to describe lung cancer regression.
A stepwise model to predict monthly streamflow
NASA Astrophysics Data System (ADS)
Mahmood Al-Juboori, Anas; Guven, Aytac
2016-12-01
In this study, a stepwise model empowered with genetic programming is developed to predict the monthly flows of Hurman River in Turkey and Diyalah and Lesser Zab Rivers in Iraq. The model divides the monthly flow data to twelve intervals representing the number of months in a year. The flow of a month, t is considered as a function of the antecedent month's flow (t - 1) and it is predicted by multiplying the antecedent monthly flow by a constant value called K. The optimum value of K is obtained by a stepwise procedure which employs Gene Expression Programming (GEP) and Nonlinear Generalized Reduced Gradient Optimization (NGRGO) as alternative to traditional nonlinear regression technique. The degree of determination and root mean squared error are used to evaluate the performance of the proposed models. The results of the proposed model are compared with the conventional Markovian and Auto Regressive Integrated Moving Average (ARIMA) models based on observed monthly flow data. The comparison results based on five different statistic measures show that the proposed stepwise model performed better than Markovian model and ARIMA model. The R2 values of the proposed model range between 0.81 and 0.92 for the three rivers in this study.
Nonparametric Stochastic Model for Uncertainty Quantifi cation of Short-term Wind Speed Forecasts
NASA Astrophysics Data System (ADS)
AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.
2014-12-01
Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of wind turbine optimal operation, wind energy dispatch and scheduling, efficient energy harvesting etc. It is also considered during planning, design, and assessment of any proposed wind project. Therefore, accurate prediction of wind speed carries a particular importance and plays significant roles in the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical fixed time intervals of the wind speed data and using it for future prediction. The methods mainly include statistical models such as ARMA, ARIMA model, physical models for instance numerical weather prediction and artificial Intelligence techniques for example support vector machine and neural networks. In this paper, we are interested in estimating hourly wind speed measures in United Arab Emirates (UAE). More precisely, we predict hourly wind speed using a nonparametric kernel estimation of the regression and volatility functions pertaining to nonlinear autoregressive model with ARCH model, which includes unknown nonlinear regression function and volatility function already discussed in the literature. The unknown nonlinear regression function describe the dependence between the value of the wind speed at time t and its historical data at time t -1, t - 2, … , t - d. This function plays a key role to predict hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated to this prediction. Since the regression and the volatility functions are supposed to be unknown, they are estimated using nonparametric kernel methods. In addition, to the pointwise hourly wind speed forecasts, a confidence interval is also provided which allows to quantify the uncertainty around the forecasts.
We compared two regression models, which are based on the Weibull and probit functions, for the analysis of pesticide toxicity data from laboratory studies on Illinois crop and native plant species. Both mathematical models are continuous, differentiable, strictly positive, and...
Individual tree growth models for natural even-aged shortleaf pine
Chakra B. Budhathoki; Thomas B. Lynch; James M. Guldin
2006-01-01
Shortleaf pine (Pinus echinata Mill.) measurements were available from permanent plots established in even-aged stands of the Ouachita Mountains for studying growth. Annual basal area growth was modeled with a least-squares nonlinear regression method utilizing three measurements. The analysis showed that the parameter estimates were in agreement...
Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach.
Duarte, Belmiro P M; Wong, Weng Kee
2015-08-01
This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted.
Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach
Duarte, Belmiro P. M.; Wong, Weng Kee
2014-01-01
Summary This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted. PMID:26512159
Assessing the Impact of Drug Use on Hospital Costs
Stuart, Bruce C; Doshi, Jalpa A; Terza, Joseph V
2009-01-01
Objective To assess whether outpatient prescription drug utilization produces offsets in the cost of hospitalization for Medicare beneficiaries. Data Sources/Study Setting The study analyzed a sample (N=3,101) of community-dwelling fee-for-service U.S. Medicare beneficiaries drawn from the 1999 and 2000 Medicare Current Beneficiary Surveys. Study Design Using a two-part model specification, we regressed any hospital admission (part 1: probit) and hospital spending by those with one or more admissions (part 2: nonlinear least squares regression) on drug use in a standard model with strong covariate controls and a residual inclusion instrumental variable (IV) model using an exogenous measure of drug coverage as the instrument. Principal Findings The covariate control model predicted that each additional prescription drug used (mean=30) raised hospital spending by $16 (p<.001). The residual inclusion IV model prediction was that each additional prescription fill reduced hospital spending by $104 (p<.001). Conclusions The findings indicate that drug use is associated with cost offsets in hospitalization among Medicare beneficiaries, once omitted variable bias is corrected using an IV technique appropriate for nonlinear applications. PMID:18783453
Yu, Lijing; Zhou, Lingling; Tan, Li; Jiang, Hongbo; Wang, Ying; Wei, Sheng; Nie, Shaofa
2014-01-01
Outbreaks of hand-foot-mouth disease (HFMD) have been reported for many times in Asia during the last decades. This emerging disease has drawn worldwide attention and vigilance. Nowadays, the prevention and control of HFMD has become an imperative issue in China. Early detection and response will be helpful before it happening, using modern information technology during the epidemic. In this paper, a hybrid model combining seasonal auto-regressive integrated moving average (ARIMA) model and nonlinear auto-regressive neural network (NARNN) is proposed to predict the expected incidence cases from December 2012 to May 2013, using the retrospective observations obtained from China Information System for Disease Control and Prevention from January 2008 to November 2012. The best-fitted hybrid model was combined with seasonal ARIMA [Formula: see text] and NARNN with 15 hidden units and 5 delays. The hybrid model makes the good forecasting performance and estimates the expected incidence cases from December 2012 to May 2013, which are respectively -965.03, -1879.58, 4138.26, 1858.17, 4061.86 and 6163.16 with an obviously increasing trend. The model proposed in this paper can predict the incidence trend of HFMD effectively, which could be helpful to policy makers. The usefulness of expected cases of HFMD perform not only in detecting outbreaks or providing probability statements, but also in providing decision makers with a probable trend of the variability of future observations that contains both historical and recent information.
Experimental validation of a coupled neutron-photon inverse radiation transport solver
NASA Astrophysics Data System (ADS)
Mattingly, John; Mitchell, Dean J.; Harding, Lee T.
2011-10-01
Sandia National Laboratories has developed an inverse radiation transport solver that applies nonlinear regression to coupled neutron-photon deterministic transport models. The inverse solver uses nonlinear regression to fit a radiation transport model to gamma spectrometry and neutron multiplicity counting measurements. The subject of this paper is the experimental validation of that solver. This paper describes a series of experiments conducted with a 4.5 kg sphere of α-phase, weapons-grade plutonium. The source was measured bare and reflected by high-density polyethylene (HDPE) spherical shells with total thicknesses between 1.27 and 15.24 cm. Neutron and photon emissions from the source were measured using three instruments: a gross neutron counter, a portable neutron multiplicity counter, and a high-resolution gamma spectrometer. These measurements were used as input to the inverse radiation transport solver to evaluate the solver's ability to correctly infer the configuration of the source from its measured radiation signatures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadat Hayatshahi, Sayyed Hamed; Abdolmaleki, Parviz; Safarian, Shahrokh
2005-12-16
Logistic regression and artificial neural networks have been developed as two non-linear models to establish quantitative structure-activity relationships between structural descriptors and biochemical activity of adenosine based competitive inhibitors, toward adenosine deaminase. The training set included 24 compounds with known k {sub i} values. The models were trained to solve two-class problems. Unlike the previous work in which multiple linear regression was used, the highest of positive charge on the molecules was recognized to be in close relation with their inhibition activity, while the electric charge on atom N1 of adenosine was found to be a poor descriptor. Consequently, themore » previously developed equation was improved and the newly formed one could predict the class of 91.66% of compounds correctly. Also optimized 2-3-1 and 3-4-1 neural networks could increase this rate to 95.83%.« less
Goeyvaerts, Nele; Leuridan, Elke; Faes, Christel; Van Damme, Pierre; Hens, Niel
2015-09-10
Biomedical studies often generate repeated measures of multiple outcomes on a set of subjects. It may be of interest to develop a biologically intuitive model for the joint evolution of these outcomes while assessing inter-subject heterogeneity. Even though it is common for biological processes to entail non-linear relationships, examples of multivariate non-linear mixed models (MNMMs) are still fairly rare. We contribute to this area by jointly analyzing the maternal antibody decay for measles, mumps, rubella, and varicella, allowing for a different non-linear decay model for each infectious disease. We present a general modeling framework to analyze multivariate non-linear longitudinal profiles subject to censoring, by combining multivariate random effects, non-linear growth and Tobit regression. We explore the hypothesis of a common infant-specific mechanism underlying maternal immunity using a pairwise correlated random-effects approach and evaluating different correlation matrix structures. The implied marginal correlation between maternal antibody levels is estimated using simulations. The mean duration of passive immunity was less than 4 months for all diseases with substantial heterogeneity between infants. The maternal antibody levels against rubella and varicella were found to be positively correlated, while little to no correlation could be inferred for the other disease pairs. For some pairs, computational issues occurred with increasing correlation matrix complexity, which underlines the importance of further developing estimation methods for MNMMs. Copyright © 2015 John Wiley & Sons, Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Southworth, Frank; Garrow, Dr. Laurie
This chapter describes the principal types of both passenger and freight demand models in use today, providing a brief history of model development supported by references to a number of popular texts on the subject, and directing the reader to papers covering some of the more recent technical developments in the area. Over the past half century a variety of methods have been used to estimate and forecast travel demands, drawing concepts from economic/utility maximization theory, transportation system optimization and spatial interaction theory, using and often combining solution techniques as varied as Box-Jenkins methods, non-linear multivariate regression, non-linear mathematical programming,more » and agent-based microsimulation.« less
Applications of Monte Carlo method to nonlinear regression of rheological data
NASA Astrophysics Data System (ADS)
Kim, Sangmo; Lee, Junghaeng; Kim, Sihyun; Cho, Kwang Soo
2018-02-01
In rheological study, it is often to determine the parameters of rheological models from experimental data. Since both rheological data and values of the parameters vary in logarithmic scale and the number of the parameters is quite large, conventional method of nonlinear regression such as Levenberg-Marquardt (LM) method is usually ineffective. The gradient-based method such as LM is apt to be caught in local minima which give unphysical values of the parameters whenever the initial guess of the parameters is far from the global optimum. Although this problem could be solved by simulated annealing (SA), the Monte Carlo (MC) method needs adjustable parameter which could be determined in ad hoc manner. We suggest a simplified version of SA, a kind of MC methods which results in effective values of the parameters of most complicated rheological models such as the Carreau-Yasuda model of steady shear viscosity, discrete relaxation spectrum and zero-shear viscosity as a function of concentration and molecular weight.
Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time‐to‐Event Analysis
Gong, Xiajing; Hu, Meng
2018-01-01
Abstract Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time‐to‐event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high‐dimensional data featured by a large number of predictor variables. Our results showed that ML‐based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high‐dimensional data. The prediction performances of ML‐based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML‐based methods provide a powerful tool for time‐to‐event analysis, with a built‐in capacity for high‐dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. PMID:29536640
Two-Stage Residual Inclusion Estimation in Health Services Research and Health Economics.
Terza, Joseph V
2018-06-01
Empirical analyses in health services research and health economics often require implementation of nonlinear models whose regressors include one or more endogenous variables-regressors that are correlated with the unobserved random component of the model. In such cases, implementation of conventional regression methods that ignore endogeneity will likely produce results that are biased and not causally interpretable. Terza et al. (2008) discuss a relatively simple estimation method that avoids endogeneity bias and is applicable in a wide variety of nonlinear regression contexts. They call this method two-stage residual inclusion (2SRI). In the present paper, I offer a 2SRI how-to guide for practitioners and a step-by-step protocol that can be implemented with any of the popular statistical or econometric software packages. We introduce the protocol and its Stata implementation in the context of a real data example. Implementation of 2SRI for a very broad class of nonlinear models is then discussed. Additional examples are given. We analyze cigarette smoking as a determinant of infant birthweight using data from Mullahy (1997). It is hoped that the discussion will serve as a practical guide to implementation of the 2SRI protocol for applied researchers. © Health Research and Educational Trust.
Zhao, Rui; Catalano, Paul; DeGruttola, Victor G.; Michor, Franziska
2017-01-01
The dynamics of tumor burden, secreted proteins or other biomarkers over time, is often used to evaluate the effectiveness of therapy and to predict outcomes for patients. Many methods have been proposed to investigate longitudinal trends to better characterize patients and to understand disease progression. However, most approaches assume a homogeneous patient population and a uniform response trajectory over time and across patients. Here, we present a mixture piecewise linear Bayesian hierarchical model, which takes into account both population heterogeneity and nonlinear relationships between biomarkers and time. Simulation results show that our method was able to classify subjects according to their patterns of treatment response with greater than 80% accuracy in the three scenarios tested. We then applied our model to a large randomized controlled phase III clinical trial of multiple myeloma patients. Analysis results suggest that the longitudinal tumor burden trajectories in multiple myeloma patients are heterogeneous and nonlinear, even among patients assigned to the same treatment cohort. In addition, between cohorts, there are distinct differences in terms of the regression parameters and the distributions among categories in the mixture. Those results imply that longitudinal data from clinical trials may harbor unobserved subgroups and nonlinear relationships; accounting for both may be important for analyzing longitudinal data. PMID:28723910
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein the support vector regression gave the best result with RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, already before compound synthesis.
Cao, Hui; Li, Yao-Jiang; Zhou, Yan; Wang, Yan-Xia
2014-11-01
To deal with nonlinear characteristics of spectra data for the thermal power plant flue, a nonlinear partial least square (PLS) analysis method with internal model based on neural network is adopted in the paper. The latent variables of the independent variables and the dependent variables are extracted by PLS regression firstly, and then they are used as the inputs and outputs of neural network respectively to build the nonlinear internal model by train process. For spectra data of flue gases of the thermal power plant, PLS, the nonlinear PLS with the internal model of back propagation neural network (BP-NPLS), the non-linear PLS with the internal model of radial basis function neural network (RBF-NPLS) and the nonlinear PLS with the internal model of adaptive fuzzy inference system (ANFIS-NPLS) are compared. The root mean square error of prediction (RMSEP) of sulfur dioxide of BP-NPLS, RBF-NPLS and ANFIS-NPLS are reduced by 16.96%, 16.60% and 19.55% than that of PLS, respectively. The RMSEP of nitric oxide of BP-NPLS, RBF-NPLS and ANFIS-NPLS are reduced by 8.60%, 8.47% and 10.09% than that of PLS, respectively. The RMSEP of nitrogen dioxide of BP-NPLS, RBF-NPLS and ANFIS-NPLS are reduced by 2.11%, 3.91% and 3.97% than that of PLS, respectively. Experimental results show that the nonlinear PLS is more suitable for the quantitative analysis of glue gas than PLS. Moreover, by using neural network function which can realize high approximation of nonlinear characteristics, the nonlinear partial least squares method with internal model mentioned in this paper have well predictive capabilities and robustness, and could deal with the limitations of nonlinear partial least squares method with other internal model such as polynomial and spline functions themselves under a certain extent. ANFIS-NPLS has the best performance with the internal model of adaptive fuzzy inference system having ability to learn more and reduce the residuals effectively. Hence, ANFIS-NPLS is an accurate and useful quantitative thermal power plant flue gas analysis method.
Omnibus Risk Assessment via Accelerated Failure Time Kernel Machine Modeling
Sinnott, Jennifer A.; Cai, Tianxi
2013-01-01
Summary Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai et al., 2011). In this paper, we derive testing and prediction methods for KM regression under the accelerated failure time model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. PMID:24328713
Ladstätter, Felix; Garrosa, Eva; Moreno-Jiménez, Bernardo; Ponsoda, Vicente; Reales Aviles, José Manuel; Dai, Junming
2016-01-01
Artificial neural networks are sophisticated modelling and prediction tools capable of extracting complex, non-linear relationships between predictor (input) and predicted (output) variables. This study explores this capacity by modelling non-linearities in the hardiness-modulated burnout process with a neural network. Specifically, two multi-layer feed-forward artificial neural networks are concatenated in an attempt to model the composite non-linear burnout process. Sensitivity analysis, a Monte Carlo-based global simulation technique, is then utilised to examine the first-order effects of the predictor variables on the burnout sub-dimensions and consequences. Results show that (1) this concatenated artificial neural network approach is feasible to model the burnout process, (2) sensitivity analysis is a prolific method to study the relative importance of predictor variables and (3) the relationships among variables involved in the development of burnout and its consequences are to different degrees non-linear. Many relationships among variables (e.g., stressors and strains) are not linear, yet researchers use linear methods such as Pearson correlation or linear regression to analyse these relationships. Artificial neural network analysis is an innovative method to analyse non-linear relationships and in combination with sensitivity analysis superior to linear methods.
A LEAST ABSOLUTE SHRINKAGE AND SELECTION OPERATOR (LASSO) FOR NONLINEAR SYSTEM IDENTIFICATION
NASA Technical Reports Server (NTRS)
Kukreja, Sunil L.; Lofberg, Johan; Brenner, Martin J.
2006-01-01
Identification of parametric nonlinear models involves estimating unknown parameters and detecting its underlying structure. Structure computation is concerned with selecting a subset of parameters to give a parsimonious description of the system which may afford greater insight into the functionality of the system or a simpler controller design. In this study, a least absolute shrinkage and selection operator (LASSO) technique is investigated for computing efficient model descriptions of nonlinear systems. The LASSO minimises the residual sum of squares by the addition of a 1 penalty term on the parameter vector of the traditional 2 minimisation problem. Its use for structure detection is a natural extension of this constrained minimisation approach to pseudolinear regression problems which produces some model parameters that are exactly zero and, therefore, yields a parsimonious system description. The performance of this LASSO structure detection method was evaluated by using it to estimate the structure of a nonlinear polynomial model. Applicability of the method to more complex systems such as those encountered in aerospace applications was shown by identifying a parsimonious system description of the F/A-18 Active Aeroelastic Wing using flight test data.
Shi, J Q; Wang, B; Will, E J; West, R M
2012-11-20
We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime. Copyright © 2012 John Wiley & Sons, Ltd.
Estimation of parameters of constant elasticity of substitution production functional model
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi
2017-11-01
Nonlinear model building has become an increasing important powerful tool in mathematical economics. In recent years the popularity of applications of nonlinear models has dramatically been rising up. Several researchers in econometrics are very often interested in the inferential aspects of nonlinear regression models [6]. The present research study gives a distinct method of estimation of more complicated and highly nonlinear model viz Constant Elasticity of Substitution (CES) production functional model. Henningen et.al [5] proposed three solutions to avoid serious problems when estimating CES functions in 2012 and they are i) removing discontinuities by using the limits of the CES function and its derivative. ii) Circumventing large rounding errors by local linear approximations iii) Handling ill-behaved objective functions by a multi-dimensional grid search. Joel Chongeh et.al [7] discussed the estimation of the impact of capital and labour inputs to the gris output agri-food products using constant elasticity of substitution production function in Tanzanian context. Pol Antras [8] presented new estimates of the elasticity of substitution between capital and labour using data from the private sector of the U.S. economy for the period 1948-1998.
Does money matter in inflation forecasting?
NASA Astrophysics Data System (ADS)
Binner, J. M.; Tino, P.; Tepper, J.; Anderson, R.; Jones, B.; Kendall, G.
2010-11-01
This paper provides the most fully comprehensive evidence to date on whether or not monetary aggregates are valuable for forecasting US inflation in the early to mid 2000s. We explore a wide range of different definitions of money, including different methods of aggregation and different collections of included monetary assets. In our forecasting experiment we use two nonlinear techniques, namely, recurrent neural networks and kernel recursive least squares regression-techniques that are new to macroeconomics. Recurrent neural networks operate with potentially unbounded input memory, while the kernel regression technique is a finite memory predictor. The two methodologies compete to find the best fitting US inflation forecasting models and are then compared to forecasts from a naïve random walk model. The best models were nonlinear autoregressive models based on kernel methods. Our findings do not provide much support for the usefulness of monetary aggregates in forecasting inflation. Beyond its economic findings, our study is in the tradition of physicists’ long-standing interest in the interconnections among statistical mechanics, neural networks, and related nonparametric statistical methods, and suggests potential avenues of extension for such studies.
Mizutani, Eiji; Demmel, James W
2003-01-01
This paper briefly introduces our numerical linear algebra approaches for solving structured nonlinear least squares problems arising from 'multiple-output' neural-network (NN) models. Our algorithms feature trust-region regularization, and exploit sparsity of either the 'block-angular' residual Jacobian matrix or the 'block-arrow' Gauss-Newton Hessian (or Fisher information matrix in statistical sense) depending on problem scale so as to render a large class of NN-learning algorithms 'efficient' in both memory and operation costs. Using a relatively large real-world nonlinear regression application, we shall explain algorithmic strengths and weaknesses, analyzing simulation results obtained by both direct and iterative trust-region algorithms with two distinct NN models: 'multilayer perceptrons' (MLP) and 'complementary mixtures of MLP-experts' (or neuro-fuzzy modular networks).
Singh, Kunwar P; Gupta, Shikha; Ojha, Priyanka; Rai, Premanjali
2013-04-01
The research aims to develop artificial intelligence (AI)-based model to predict the adsorptive removal of 2-chlorophenol (CP) in aqueous solution by coconut shell carbon (CSC) using four operational variables (pH of solution, adsorbate concentration, temperature, and contact time), and to investigate their effects on the adsorption process. Accordingly, based on a factorial design, 640 batch experiments were conducted. Nonlinearities in experimental data were checked using Brock-Dechert-Scheimkman (BDS) statistics. Five nonlinear models were constructed to predict the adsorptive removal of CP in aqueous solution by CSC using four variables as input. Performances of the constructed models were evaluated and compared using statistical criteria. BDS statistics revealed strong nonlinearity in experimental data. Performance of all the models constructed here was satisfactory. Radial basis function network (RBFN) and multilayer perceptron network (MLPN) models performed better than generalized regression neural network, support vector machines, and gene expression programming models. Sensitivity analysis revealed that the contact time had highest effect on adsorption followed by the solution pH, temperature, and CP concentration. The study concluded that all the models constructed here were capable of capturing the nonlinearity in data. A better generalization and predictive performance of RBFN and MLPN models suggested that these can be used to predict the adsorption of CP in aqueous solution using CSC.
A data-driven approach for modeling post-fire debris-flow volumes and their uncertainty
Friedel, Michael J.
2011-01-01
This study demonstrates the novel application of genetic programming to evolve nonlinear post-fire debris-flow volume equations from variables associated with a data-driven conceptual model of the western United States. The search space is constrained using a multi-component objective function that simultaneously minimizes root-mean squared and unit errors for the evolution of fittest equations. An optimization technique is then used to estimate the limits of nonlinear prediction uncertainty associated with the debris-flow equations. In contrast to a published multiple linear regression three-variable equation, linking basin area with slopes greater or equal to 30 percent, burn severity characterized as area burned moderate plus high, and total storm rainfall, the data-driven approach discovers many nonlinear and several dimensionally consistent equations that are unbiased and have less prediction uncertainty. Of the nonlinear equations, the best performance (lowest prediction uncertainty) is achieved when using three variables: average basin slope, total burned area, and total storm rainfall. Further reduction in uncertainty is possible for the nonlinear equations when dimensional consistency is not a priority and by subsequently applying a gradient solver to the fittest solutions. The data-driven modeling approach can be applied to nonlinear multivariate problems in all fields of study.
Methods for scalar-on-function regression.
Reiss, Philip T; Goldsmith, Jeff; Shang, Han Lin; Ogden, R Todd
2017-08-01
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.
Bootstrap evaluation of a young Douglas-fir height growth model for the Pacific Northwest
Nicholas R. Vaughn; Eric C. Turnblom; Martin W. Ritchie
2010-01-01
We evaluated the stability of a complex regression model developed to predict the annual height growth of young Douglas-fir. This model is highly nonlinear and is fit in an iterative manner for annual growth coefficients from data with multiple periodic remeasurement intervals. The traditional methods for such a sensitivity analysis either involve laborious math or...
Li, Dan; Wang, Xia; Dey, Dipak K
2016-09-01
Our present work proposes a new survival model in a Bayesian context to analyze right-censored survival data for populations with a surviving fraction, assuming that the log failure time follows a generalized extreme value distribution. Many applications require a more flexible modeling of covariate information than a simple linear or parametric form for all covariate effects. It is also necessary to include the spatial variation in the model, since it is sometimes unexplained by the covariates considered in the analysis. Therefore, the nonlinear covariate effects and the spatial effects are incorporated into the systematic component of our model. Gaussian processes (GPs) provide a natural framework for modeling potentially nonlinear relationship and have recently become extremely powerful in nonlinear regression. Our proposed model adopts a semiparametric Bayesian approach by imposing a GP prior on the nonlinear structure of continuous covariate. With the consideration of data availability and computational complexity, the conditionally autoregressive distribution is placed on the region-specific frailties to handle spatial correlation. The flexibility and gains of our proposed model are illustrated through analyses of simulated data examples as well as a dataset involving a colon cancer clinical trial from the state of Iowa. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Wallis, Thomas S. A.; Dorr, Michael; Bex, Peter J.
2015-01-01
Sensitivity to luminance contrast is a prerequisite for all but the simplest visual systems. To examine contrast increment detection performance in a way that approximates the natural environmental input of the human visual system, we presented contrast increments gaze-contingently within naturalistic video freely viewed by observers. A band-limited contrast increment was applied to a local region of the video relative to the observer's current gaze point, and the observer made a forced-choice response to the location of the target (≈25,000 trials across five observers). We present exploratory analyses showing that performance improved as a function of the magnitude of the increment and depended on the direction of eye movements relative to the target location, the timing of eye movements relative to target presentation, and the spatiotemporal image structure at the target location. Contrast discrimination performance can be modeled by assuming that the underlying contrast response is an accelerating nonlinearity (arising from a nonlinear transducer or gain control). We implemented one such model and examined the posterior over model parameters, estimated using Markov-chain Monte Carlo methods. The parameters were poorly constrained by our data; parameters constrained using strong priors taken from previous research showed poor cross-validated prediction performance. Atheoretical logistic regression models were better constrained and provided similar prediction performance to the nonlinear transducer model. Finally, we explored the properties of an extended logistic regression that incorporates both eye movement and image content features. Models of contrast transduction may be better constrained by incorporating data from both artificial and natural contrast perception settings. PMID:26057546
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
Thermoelastic steam turbine rotor control based on neural network
NASA Astrophysics Data System (ADS)
Rzadkowski, Romuald; Dominiczak, Krzysztof; Radulski, Wojciech; Szczepanik, R.
2015-12-01
Considered here are Nonlinear Auto-Regressive neural networks with eXogenous inputs (NARX) as a mathematical model of a steam turbine rotor for controlling steam turbine stress on-line. In order to obtain neural networks that locate critical stress and temperature points in the steam turbine during transient states, an FE rotor model was built. This model was used to train the neural networks on the basis of steam turbine transient operating data. The training included nonlinearity related to steam turbine expansion, heat exchange and rotor material properties during transients. Simultaneous neural networks are algorithms which can be implemented on PLC controllers. This allows for the application neural networks to control steam turbine stress in industrial power plants.
Nonlinear models for estimating GSFC travel requirements
NASA Technical Reports Server (NTRS)
Buffalano, C.; Hagan, F. J.
1974-01-01
A methodology is presented for estimating travel requirements for a particular period of time. Travel models were generated using nonlinear regression analysis techniques on a data base of FY-72 and FY-73 information from 79 GSFC projects. Although the subject matter relates to GSFX activities, the type of analysis used and the manner of selecting the relevant variables would be of interest to other NASA centers, government agencies, private corporations and, in general, any organization with a significant travel budget. Models were developed for each of six types of activity: flight projects (in-house and out-of-house), experiments on non-GSFC projects, international projects, ART/SRT, data analysis, advanced studies, tracking and data, and indirects.
Bilgili, Mehmet; Sahin, Besir; Sangun, Levent
2013-01-01
The aim of this study is to estimate the soil temperatures of a target station using only the soil temperatures of neighboring stations without any consideration of the other variables or parameters related to soil properties. For this aim, the soil temperatures were measured at depths of 5, 10, 20, 50, and 100 cm below the earth surface at eight measuring stations in Turkey. Firstly, the multiple nonlinear regression analysis was performed with the "Enter" method to determine the relationship between the values of target station and neighboring stations. Then, the stepwise regression analysis was applied to determine the best independent variables. Finally, an artificial neural network (ANN) model was developed to estimate the soil temperature of a target station. According to the derived results for the training data set, the mean absolute percentage error and correlation coefficient ranged from 1.45% to 3.11% and from 0.9979 to 0.9986, respectively, while corresponding ranges of 1.685-3.65% and 0.9988-0.9991, respectively, were obtained based on the testing data set. The obtained results show that the developed ANN model provides a simple and accurate prediction to determine the soil temperature. In addition, the missing data at the target station could be determined within a high degree of accuracy.
Determination of nonlinear genetic architecture using compressed sensing.
Ho, Chiu Man; Hsu, Stephen D H
2015-01-01
One of the fundamental problems of modern genomics is to extract the genetic architecture of a complex trait from a data set of individual genotypes and trait values. Establishing this important connection between genotype and phenotype is complicated by the large number of candidate genes, the potentially large number of causal loci, and the likely presence of some nonlinear interactions between different genes. Compressed Sensing methods obtain solutions to under-constrained systems of linear equations. These methods can be applied to the problem of determining the best model relating genotype to phenotype, and generally deliver better performance than simply regressing the phenotype against each genetic variant, one at a time. We introduce a Compressed Sensing method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. Our method uses L1-penalized regression applied to nonlinear functions of the sensing matrix. The computational and data resource requirements for our method are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using simulated human genomes and the small amount of currently available real data. A phase transition (i.e., dramatic and qualitative change) in the behavior of the algorithm indicates when sufficient data is available for its successful application. Our results indicate that predictive models for many complex traits, including a variety of human disease susceptibilities (e.g., with additive heritability h (2)∼0.5), can be extracted from data sets comprised of n ⋆∼100s individuals, where s is the number of distinct causal variants influencing the trait. For example, given a trait controlled by ∼10 k loci, roughly a million individuals would be sufficient for application of the method.
Remontet, L; Bossard, N; Belot, A; Estève, J
2007-05-10
Relative survival provides a measure of the proportion of patients dying from the disease under study without requiring the knowledge of the cause of death. We propose an overall strategy based on regression models to estimate the relative survival and model the effects of potential prognostic factors. The baseline hazard was modelled until 10 years follow-up using parametric continuous functions. Six models including cubic regression splines were considered and the Akaike Information Criterion was used to select the final model. This approach yielded smooth and reliable estimates of mortality hazard and allowed us to deal with sparse data taking into account all the available information. Splines were also used to model simultaneously non-linear effects of continuous covariates and time-dependent hazard ratios. This led to a graphical representation of the hazard ratio that can be useful for clinical interpretation. Estimates of these models were obtained by likelihood maximization. We showed that these estimates could be also obtained using standard algorithms for Poisson regression. Copyright 2006 John Wiley & Sons, Ltd.
Non-linear auto-regressive models for cross-frequency coupling in neural time series
Tallot, Lucille; Grabot, Laetitia; Doyère, Valérie; Grenier, Yves; Gramfort, Alexandre
2017-01-01
We address the issue of reliably detecting and quantifying cross-frequency coupling (CFC) in neural time series. Based on non-linear auto-regressive models, the proposed method provides a generative and parametric model of the time-varying spectral content of the signals. As this method models the entire spectrum simultaneously, it avoids the pitfalls related to incorrect filtering or the use of the Hilbert transform on wide-band signals. As the model is probabilistic, it also provides a score of the model “goodness of fit” via the likelihood, enabling easy and legitimate model selection and parameter comparison; this data-driven feature is unique to our model-based approach. Using three datasets obtained with invasive neurophysiological recordings in humans and rodents, we demonstrate that these models are able to replicate previous results obtained with other metrics, but also reveal new insights such as the influence of the amplitude of the slow oscillation. Using simulations, we demonstrate that our parametric method can reveal neural couplings with shorter signals than non-parametric methods. We also show how the likelihood can be used to find optimal filtering parameters, suggesting new properties on the spectrum of the driving signal, but also to estimate the optimal delay between the coupled signals, enabling a directionality estimation in the coupling. PMID:29227989
Agogo, George O.; van der Voet, Hilko; Veer, Pieter van’t; Ferrari, Pietro; Leenders, Max; Muller, David C.; Sánchez-Cantalejo, Emilio; Bamia, Christina; Braaten, Tonje; Knüppel, Sven; Johansson, Ingegerd; van Eeuwijk, Fred A.; Boshuizen, Hendriek
2014-01-01
In epidemiologic studies, measurement error in dietary variables often attenuates association between dietary intake and disease occurrence. To adjust for the attenuation caused by error in dietary intake, regression calibration is commonly used. To apply regression calibration, unbiased reference measurements are required. Short-term reference measurements for foods that are not consumed daily contain excess zeroes that pose challenges in the calibration model. We adapted two-part regression calibration model, initially developed for multiple replicates of reference measurements per individual to a single-replicate setting. We showed how to handle excess zero reference measurements by two-step modeling approach, how to explore heteroscedasticity in the consumed amount with variance-mean graph, how to explore nonlinearity with the generalized additive modeling (GAM) and the empirical logit approaches, and how to select covariates in the calibration model. The performance of two-part calibration model was compared with the one-part counterpart. We used vegetable intake and mortality data from European Prospective Investigation on Cancer and Nutrition (EPIC) study. In the EPIC, reference measurements were taken with 24-hour recalls. For each of the three vegetable subgroups assessed separately, correcting for error with an appropriately specified two-part calibration model resulted in about three fold increase in the strength of association with all-cause mortality, as measured by the log hazard ratio. Further found is that the standard way of including covariates in the calibration model can lead to over fitting the two-part calibration model. Moreover, the extent of adjusting for error is influenced by the number and forms of covariates in the calibration model. For episodically consumed foods, we advise researchers to pay special attention to response distribution, nonlinearity, and covariate inclusion in specifying the calibration model. PMID:25402487
IN11B-1621: Quantifying How Climate Affects Vegetation in the Amazon Rainforest
NASA Technical Reports Server (NTRS)
Das, Kamalika; Kodali, Anuradha; Szubert, Marcin; Ganguly, Sangram; Bongard, Joshua
2016-01-01
Amazon droughts in 2005 and 2010 have raised serious concern about the future of the rainforest. Amazon forests are crucial because of their role as the largest carbon sink in the world which would effect the global warming phenomena with decreased photosynthesis activity. Especially, after a decline in plant growth in 1.68 million km2 forest area during the once-in-a-century severe drought in 2010, it is of primary importance to understand the relationship between different climatic variables and vegetation. In an earlier study, we have shown that non-linear models are better at capturing the relation dynamics of vegetation and climate variables such as temperature and precipitation, compared to linear models. In this research, we learn precise models between vegetation and climatic variables (temperature, precipitation) for normal conditions in the Amazon region using genetic programming based symbolic regression. This is done by removing high elevation and drought affected areas and also considering the slope of the region as one of the important factors while building the model. The model learned reveals new and interesting ways historical and current climate variables affect the vegetation at any location. MAIAC data has been used as a vegetation surrogate in our study. For temperature and precipitation, we have used TRMM and MODIS Land Surface Temperature data sets while learning the non-linear regression model. However, to generalize the model to make it independent of the data source, we perform transfer learning where we regress a regularized least squares to learn the parameters of the non-linear model using other data sources such as the precipitation and temperature from the Climatic Research Center (CRU). This new model is very similar in structure and performance compared to the original learned model and verifies the same claims about the nature of dependency between these climate variables and the vegetation in the Amazon region. As a result of this study, we are able to learn, for the very first time how exactly different climate factors influence vegetation at any location in the Amazon rainforests, independent of the specific sources from which the data has been obtained.
Quantifying How Climate Affects Vegetation in the Amazon Rainforest
NASA Astrophysics Data System (ADS)
Das, K.; Kodali, A.; Szubert, M.; Ganguly, S.; Bongard, J.
2016-12-01
Amazon droughts in 2005 and 2010 have raised serious concern about the future of the rainforest. Amazon forests are crucial because of their role as the largest carbon sink in the world which would effect the global warming phenomena with decreased photosynthesis activity. Especially, after a decline in plant growth in 1.68 million km2 forest area during the once-in-a-century severe drought in 2010, it is of primary importance to understand the relationship between different climatic variables and vegetation. In an earlier study, we have shown that non-linear models are better at capturing the relation dynamics of vegetation and climate variables such as temperature and precipitation, compared to linear models. In this research, we learn precise models between vegetation and climatic variables (temperature, precipitation) for normal conditions in the Amazon region using genetic programming based symbolic regression. This is done by removing high elevation and drought affected areas and also considering the slope of the region as one of the important factors while building the model. The model learned reveals new and interesting ways historical and current climate variables affect the vegetation at any location. MAIAC data has been used as a vegetation surrogate in our study. For temperature and precipitation, we have used TRMM and MODIS Land Surface Temperature data sets while learning the non-linear regression model. However, to generalize the model to make it independent of the data source, we perform transfer learning where we regress a regularized least squares to learn the parameters of the non-linear model using other data sources such as the precipitation and temperature from the Climatic Research Center (CRU). This new model is very similar in structure and performance compared to the original learned model and verifies the same claims about the nature of dependency between these climate variables and the vegetation in the Amazon region. As a result of this study, we are able to learn, for the very first time how exactly different climate factors influence vegetation at any location in the Amazon rainforests, independent of the specific sources from which the data has been obtained.
Experiences from the testing of a theory for modelling groundwater flow in heterogeneous media
Christensen, S.; Cooley, R.L.
2002-01-01
Usually, small-scale model error is present in groundwater modelling because the model only represents average system characteristics having the same form as the drift and small-scale variability is neglected. These errors cause the true errors of a regression model to be correlated. Theory and an example show that the errors also contribute to bias in the estimates of model parameters. This bias originates from model nonlinearity. In spite of this bias, predictions of hydraulic head are nearly unbiased if the model intrinsic nonlinearity is small. Individual confidence and prediction intervals are accurate if the t-statistic is multiplied by a correction factor. The correction factor can be computed from the true error second moment matrix, which can be determined when the stochastic properties of the system characteristics are known.
Experience gained in testing a theory for modelling groundwater flow in heterogeneous media
Christensen, S.; Cooley, R.L.
2002-01-01
Usually, small-scale model error is present in groundwater modelling because the model only represents average system characteristics having the same form as the drift, and small-scale variability is neglected. These errors cause the true errors of a regression model to be correlated. Theory and an example show that the errors also contribute to bias in the estimates of model parameters. This bias originates from model nonlinearity. In spite of this bias, predictions of hydraulic head are nearly unbiased if the model intrinsic nonlinearity is small. Individual confidence and prediction intervals are accurate if the t-statistic is multiplied by a correction factor. The correction factor can be computed from the true error second moment matrix, which can be determined when the stochastic properties of the system characteristics are known.
DOT National Transportation Integrated Search
1999-11-01
Using a fairly large cross-section/time-series data base, covering all provinces of Norway and all months between January 1973 and December 1994, we estimate non-linear (Box-Cox) regression equations explaining aggregate car ownership, road use, seat...
Meta-regression analysis of the effect of trans fatty acids on low-density lipoprotein cholesterol.
Allen, Bruce C; Vincent, Melissa J; Liska, DeAnn; Haber, Lynne T
2016-12-01
We conducted a meta-regression of controlled clinical trial data to investigate quantitatively the relationship between dietary intake of industrial trans fatty acids (iTFA) and increased low-density lipoprotein cholesterol (LDL-C). Previous regression analyses included insufficient data to determine the nature of the dose response in the low-dose region and have nonetheless assumed a linear relationship between iTFA intake and LDL-C levels. This work contributes to the previous work by 1) including additional studies examining low-dose intake (identified using an evidence mapping procedure); 2) investigating a range of curve shapes, including both linear and nonlinear models; and 3) using Bayesian meta-regression to combine results across trials. We found that, contrary to previous assumptions, the linear model does not acceptably fit the data, while the nonlinear, S-shaped Hill model fits the data well. Based on a conservative estimate of the degree of intra-individual variability in LDL-C (0.1 mmoL/L), as an estimate of a change in LDL-C that is not adverse, a change in iTFA intake of 2.2% of energy intake (%en) (corresponding to a total iTFA intake of 2.2-2.9%en) does not cause adverse effects on LDL-C. The iTFA intake associated with this change in LDL-C is substantially higher than the average iTFA intake (0.5%en). Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
Advances in nowcasting influenza-like illness rates using search query logs
NASA Astrophysics Data System (ADS)
Lampos, Vasileios; Miller, Andrew C.; Crossan, Steve; Stefansen, Christian
2015-08-01
User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
Advances in nowcasting influenza-like illness rates using search query logs.
Lampos, Vasileios; Miller, Andrew C; Crossan, Steve; Stefansen, Christian
2015-08-03
User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012-13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
Omnibus risk assessment via accelerated failure time kernel machine modeling.
Sinnott, Jennifer A; Cai, Tianxi
2013-12-01
Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai, Tonini, and Lin, 2011). In this article, we derive testing and prediction methods for KM regression under the accelerated failure time (AFT) model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. © 2013, The International Biometric Society.
Predicting musically induced emotions from physiological inputs: linear and neural network models.
Russo, Frank A; Vempala, Naresh N; Sandstrom, Gillian M
2013-01-01
Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of "felt" emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants-heart rate (HR), respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA) dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a non-linear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The non-linear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the non-linear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.
UCODE, a computer code for universal inverse modeling
Poeter, E.P.; Hill, M.C.
1999-01-01
This article presents the US Geological Survey computer program UCODE, which was developed in collaboration with the US Army Corps of Engineers Waterways Experiment Station and the International Ground Water Modeling Center of the Colorado School of Mines. UCODE performs inverse modeling, posed as a parameter-estimation problem, using nonlinear regression. Any application model or set of models can be used; the only requirement is that they have numerical (ASCII or text only) input and output files and that the numbers in these files have sufficient significant digits. Application models can include preprocessors and postprocessors as well as models related to the processes of interest (physical, chemical and so on), making UCODE extremely powerful for model calibration. Estimated parameters can be defined flexibly with user-specified functions. Observations to be matched in the regression can be any quantity for which a simulated equivalent value can be produced, thus simulated equivalent values are calculated using values that appear in the application model output files and can be manipulated with additive and multiplicative functions, if necessary. Prior, or direct, information on estimated parameters also can be included in the regression. The nonlinear regression problem is solved by minimizing a weighted least-squares objective function with respect to the parameter values using a modified Gauss-Newton method. Sensitivities needed for the method are calculated approximately by forward or central differences and problems and solutions related to this approximation are discussed. Statistics are calculated and printed for use in (1) diagnosing inadequate data or identifying parameters that probably cannot be estimated with the available data, (2) evaluating estimated parameter values, (3) evaluating the model representation of the actual processes and (4) quantifying the uncertainty of model simulated values. UCODE is intended for use on any computer operating system: it consists of algorithms programmed in perl, a freeware language designed for text manipulation and Fortran90, which efficiently performs numerical calculations.
Robust and efficient estimation with weighted composite quantile regression
NASA Astrophysics Data System (ADS)
Jiang, Xuejun; Li, Jingzhi; Xia, Tian; Yan, Wanfeng
2016-09-01
In this paper we introduce a weighted composite quantile regression (CQR) estimation approach and study its application in nonlinear models such as exponential models and ARCH-type models. The weighted CQR is augmented by using a data-driven weighting scheme. With the error distribution unspecified, the proposed estimators share robustness from quantile regression and achieve nearly the same efficiency as the oracle maximum likelihood estimator (MLE) for a variety of error distributions including the normal, mixed-normal, Student's t, Cauchy distributions, etc. We also suggest an algorithm for the fast implementation of the proposed methodology. Simulations are carried out to compare the performance of different estimators, and the proposed approach is used to analyze the daily S&P 500 Composite index, which verifies the effectiveness and efficiency of our theoretical results.
Univariate Time Series Prediction of Solar Power Using a Hybrid Wavelet-ARMA-NARX Prediction Method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nazaripouya, Hamidreza; Wang, Yubo; Chu, Chi-Cheng
This paper proposes a new hybrid method for super short-term solar power prediction. Solar output power usually has a complex, nonstationary, and nonlinear characteristic due to intermittent and time varying behavior of solar radiance. In addition, solar power dynamics is fast and is inertia less. An accurate super short-time prediction is required to compensate for the fluctuations and reduce the impact of solar power penetration on the power system. The objective is to predict one step-ahead solar power generation based only on historical solar power time series data. The proposed method incorporates discrete wavelet transform (DWT), Auto-Regressive Moving Average (ARMA)more » models, and Recurrent Neural Networks (RNN), while the RNN architecture is based on Nonlinear Auto-Regressive models with eXogenous inputs (NARX). The wavelet transform is utilized to decompose the solar power time series into a set of richer-behaved forming series for prediction. ARMA model is employed as a linear predictor while NARX is used as a nonlinear pattern recognition tool to estimate and compensate the error of wavelet-ARMA prediction. The proposed method is applied to the data captured from UCLA solar PV panels and the results are compared with some of the common and most recent solar power prediction methods. The results validate the effectiveness of the proposed approach and show a considerable improvement in the prediction precision.« less
ERIC Educational Resources Information Center
Fox, William
2012-01-01
The purpose of our modeling effort is to predict future outcomes. We assume the data collected are both accurate and relatively precise. For our oscillating data, we examined several mathematical modeling forms for predictions. We also examined both ignoring the oscillations as an important feature and including the oscillations as an important…
Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.
Chung, Yi-Shih
2013-12-01
Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables including geographical, time, and sociodemographic factors explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide improved transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Brümmer, C.; Moffat, A. M.; Huth, V.; Augustin, J.; Herbst, M.; Kutsch, W. L.
2016-12-01
Manual carbon dioxide flux measurements with closed chambers at scheduled campaigns are a versatile method to study management effects at small scales in multiple-plot experiments. The eddy covariance technique has the advantage of quasi-continuous measurements but requires large homogeneous areas of a few hectares. To evaluate the uncertainties associated with interpolating from individual campaigns to the whole vegetation period, we installed both techniques at an agricultural site in Northern Germany. The presented comparison covers two cropping seasons, winter oilseed rape in 2012/13 and winter wheat in 2013/14. Modeling half-hourly carbon fluxes from campaigns is commonly performed based on non-linear regressions for the light response and respiration. The daily averages of net CO2 modeled from chamber data deviated from eddy covariance measurements in the range of ± 5 g C m-2 day-1. To understand the observed differences and to disentangle the effects, we performed four additional setups (expert versus default settings of the non-linear regressions based algorithm, purely empirical modeling with artificial neural networks versus non-linear regressions, cross-validating using eddy covariance measurements as campaign fluxes, weekly versus monthly scheduling of campaigns) to model the half-hourly carbon fluxes for the whole vegetation period. The good agreement of the seasonal course of net CO2 at plot and field scale for our agricultural site demonstrates that both techniques are robust and yield consistent results at seasonal time scale even for a managed ecosystem with high temporal dynamics in the fluxes. This allows combining the respective advantages of factorial experiments at plot scale with dense time series data at field scale. Furthermore, the information from the quasi-continuous eddy covariance measurements can be used to derive vegetation proxies to support the interpolation of carbon fluxes in-between the manual chamber campaigns.
On the use of log-transformation vs. nonlinear regression for analyzing biological power laws.
Xiao, Xiao; White, Ethan P; Hooten, Mevin B; Durham, Susan L
2011-10-01
Power-law relationships are among the most well-studied functional relationships in biology. Recently the common practice of fitting power laws using linear regression (LR) on log-transformed data has been criticized, calling into question the conclusions of hundreds of studies. It has been suggested that nonlinear regression (NLR) is preferable, but no rigorous comparison of these two methods has been conducted. Using Monte Carlo simulations, we demonstrate that the error distribution determines which method performs better, with NLR better characterizing data with additive, homoscedastic, normal error and LR better characterizing data with multiplicative, heteroscedastic, lognormal error. Analysis of 471 biological power laws shows that both forms of error occur in nature. While previous analyses based on log-transformation appear to be generally valid, future analyses should choose methods based on a combination of biological plausibility and analysis of the error distribution. We provide detailed guidelines and associated computer code for doing so, including a model averaging approach for cases where the error structure is uncertain.
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
Handling nonnormality and variance heterogeneity for quantitative sublethal toxicity tests.
Ritz, Christian; Van der Vliet, Leana
2009-09-01
The advantages of using regression-based techniques to derive endpoints from environmental toxicity data are clear, and slowly, this superior analytical technique is gaining acceptance. As use of regression-based analysis becomes more widespread, some of the associated nuances and potential problems come into sharper focus. Looking at data sets that cover a broad spectrum of standard test species, we noticed that some model fits to data failed to meet two key assumptions-variance homogeneity and normality-that are necessary for correct statistical analysis via regression-based techniques. Failure to meet these assumptions often is caused by reduced variance at the concentrations showing severe adverse effects. Although commonly used with linear regression analysis, transformation of the response variable only is not appropriate when fitting data using nonlinear regression techniques. Through analysis of sample data sets, including Lemna minor, Eisenia andrei (terrestrial earthworm), and algae, we show that both the so-called Box-Cox transformation and use of the Poisson distribution can help to correct variance heterogeneity and nonnormality and so allow nonlinear regression analysis to be implemented. Both the Box-Cox transformation and the Poisson distribution can be readily implemented into existing protocols for statistical analysis. By correcting for nonnormality and variance heterogeneity, these two statistical tools can be used to encourage the transition to regression-based analysis and the depreciation of less-desirable and less-flexible analytical techniques, such as linear interpolation.
NASA Astrophysics Data System (ADS)
Shao, G.; Gallion, J.; Fei, S.
2016-12-01
Sound forest aboveground biomass estimation is required to monitor diverse forest ecosystems and their impacts on the changing climate. Lidar-based regression models provided promised biomass estimations in most forest ecosystems. However, considerable uncertainties of biomass estimations have been reported in the temperate hardwood and hardwood-dominated mixed forests. Varied site productivities in temperate hardwood forests largely diversified height and diameter growth rates, which significantly reduced the correlation between tree height and diameter at breast height (DBH) in mature and complex forests. It is, therefore, difficult to utilize height-based lidar metrics to predict DBH-based field-measured biomass through a simple regression model regardless the variation of site productivity. In this study, we established a multi-dimension nonlinear regression model incorporating lidar metrics and site productivity classes derived from soil features. In the regression model, lidar metrics provided horizontal and vertical structural information and productivity classes differentiated good and poor forest sites. The selection and combination of lidar metrics were discussed. Multiple regression models were employed and compared. Uncertainty analysis was applied to the best fit model. The effects of site productivity on the lidar-based biomass model were addressed.
NASA Astrophysics Data System (ADS)
Bhojawala, V. M.; Vakharia, D. P.
2017-12-01
This investigation provides an accurate prediction of static pull-in voltage for clamped-clamped micro/nano beams based on distributed model. The Euler-Bernoulli beam theory is used adapting geometric non-linearity of beam, internal (residual) stress, van der Waals force, distributed electrostatic force and fringing field effects for deriving governing differential equation. The Galerkin discretisation method is used to make reduced-order model of the governing differential equation. A regime plot is presented in the current work for determining the number of modes required in reduced-order model to obtain completely converged pull-in voltage for micro/nano beams. A closed-form relation is developed based on the relationship obtained from curve fitting of pull-in instability plots and subsequent non-linear regression for the proposed relation. The output of regression analysis provides Chi-square (χ 2) tolerance value equals to 1 × 10-9, adjusted R-square value equals to 0.999 29 and P-value equals to zero, these statistical parameters indicate the convergence of non-linear fit, accuracy of fitted data and significance of the proposed model respectively. The closed-form equation is validated using available data of experimental and numerical results. The relative maximum error of 4.08% in comparison to several available experimental and numerical data proves the reliability of the proposed closed-form equation.
Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time-to-Event Analysis.
Gong, Xiajing; Hu, Meng; Zhao, Liang
2018-05-01
Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time-to-event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high-dimensional data featured by a large number of predictor variables. Our results showed that ML-based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high-dimensional data. The prediction performances of ML-based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML-based methods provide a powerful tool for time-to-event analysis, with a built-in capacity for high-dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. © 2018 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Kuriakose, Saji; Joe, I Hubert
2013-11-01
Determination of the authenticity of essential oils has become more significant, in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC=0.00009% v/v). The lowest root mean square error of prediction (RMSEP=0.00016% v/v) in the test set and the highest coefficient of determination (R(2)=0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Kuriakose, Saji; Joe, I. Hubert
2013-11-01
Determination of the authenticity of essential oils has become more significant, in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC = 0.00009% v/v). The lowest root mean square error of prediction (RMSEP = 0.00016% v/v) in the test set and the highest coefficient of determination (R2 = 0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model.
Determination of suitable drying curve model for bread moisture loss during baking
NASA Astrophysics Data System (ADS)
Soleimani Pour-Damanab, A. R.; Jafary, A.; Rafiee, S.
2013-03-01
This study presents mathematical modelling of bread moisture loss or drying during baking in a conventional bread baking process. In order to estimate and select the appropriate moisture loss curve equation, 11 different models, semi-theoretical and empirical, were applied to the experimental data and compared according to their correlation coefficients, chi-squared test and root mean square error which were predicted by nonlinear regression analysis. Consequently, of all the drying models, a Page model was selected as the best one, according to the correlation coefficients, chi-squared test, and root mean square error values and its simplicity. Mean absolute estimation error of the proposed model by linear regression analysis for natural and forced convection modes was 2.43, 4.74%, respectively.
Oil and gas pipeline construction cost analysis and developing regression models for cost estimation
NASA Astrophysics Data System (ADS)
Thaduri, Ravi Kiran
In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, Labor, ROW and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In a pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The Compressor stations are analyzed based on the capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor station, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor station for various capacities and locations.
Dron, Julien; Dodi, Alain
2011-06-15
The removal of chloride, nitrate and sulfate ions from aqueous solutions by a macroporous resin is studied through the ion exchange systems OH(-)/Cl(-), OH(-)/NO(3)(-), OH(-)/SO(4)(2-), and HCO(3)(-)/Cl(-), Cl(-)/NO(3)(-), Cl(-)/SO(4)(2-). They are investigated by means of Langmuir, Freundlich, Dubinin-Radushkevitch (D-R) and Dubinin-Astakhov (D-A) single-component adsorption isotherms. The sorption parameters and the fitting of the models are determined by nonlinear regression and discussed. The Langmuir model provides a fair estimation of the sorption capacity whatever the system under study, on the contrary to Freundlich and D-R models. The adsorption energies deduced from Dubinin and Langmuir isotherms are in good agreement, and the surface parameter of the D-A isotherm appears consistent. All models agree on the order of affinity OH(-)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luh, G.C.
1994-01-01
This thesis presents the application of advanced modeling techniques to construct nonlinear forward and inverse models of internal combustion engines for the detection and isolation of incipient faults. The NARMAX (Nonlinear Auto-Regressive Moving Average modeling with eXogenous inputs) technique of system identification proposed by Leontaritis and Billings was used to derive the nonlinear model of a internal combustion engine, over operating conditions corresponding to the I/M240 cycle. The I/M240 cycle is a standard proposed by the United States Environmental Protection Agency to measure tailpipe emissions in inspection and maintenance programs and consists of a driving schedule developed for the purposemore » of testing compliance with federal vehicle emission standards for carbon monoxide, unburned hydrocarbons, and nitrogen oxides. The experimental work for model identification and validation was performed on a 3.0 liter V6 engine installed in an engine test cell at the Center for Automotive Research at The Ohio State University. In this thesis, different types of model structures were proposed to obtain multi-input multi-output (MIMO) nonlinear NARX models. A modification of the algorithm proposed by He and Asada was used to estimate the robust orders of the derived MIMO nonlinear models. A methodology for the analysis of inverse NARX model was developed. Two methods were proposed to derive the inverse NARX model: (1) inversion from the forward NARX model; and (2) direct identification of inverse model from the output-input data set. In this thesis, invertibility, minimum-phase characteristic of zero dynamics, and stability analysis of NARX forward model are also discussed. Stability in the sense of Lyapunov is also investigated to check the stability of the identified forward and inverse models. This application of inverse problem leads to the estimation of unknown inputs and to actuator fault diagnosis.« less
A non-linear data mining parameter selection algorithm for continuous variables
Razavi, Marianne; Brady, Sean
2017-01-01
In this article, we propose a new data mining algorithm, by which one can both capture the non-linearity in data and also find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that would capture complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method that we present here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. This algorithm introduces interpretable parameters by transforming the original inputs and also a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least square regression framework. This new automatic variable transformation and model selection method could offer an optimal and stable model that minimizes the mean square error and variability, while combining all possible subset selection methodology with the inclusion variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables. PMID:29131829
Mishra, Vishal
2015-01-01
The interchange of the protons with the cell wall-bound calcium and magnesium ions at the interface of solution/bacterial cell surface in the biosorption system at various concentrations of protons has been studied in the present work. A mathematical model for establishing the correlation between concentration of protons and active sites was developed and optimized. The sporadic limited residence time reactor was used to titrate the calcium and magnesium ions at the individual data point. The accuracy of the proposed mathematical model was estimated using error functions such as nonlinear regression, adjusted nonlinear regression coefficient, the chi-square test, P-test and F-test. The values of the chi-square test (0.042-0.017), P-test (<0.001-0.04), sum of square errors (0.061-0.016), root mean square error (0.01-0.04) and F-test (2.22-19.92) reported in the present research indicated the suitability of the model over a wide range of proton concentrations. The zeta potential of the bacterium surface at various concentrations of protons was observed to validate the denaturation of active sites.
NASA Astrophysics Data System (ADS)
Talebpour, Zahra; Tavallaie, Roya; Ahmadi, Seyyed Hamid; Abdollahpour, Assem
2010-09-01
In this study, a new method for the simultaneous determination of penicillin G salts in pharmaceutical mixture via FT-IR spectroscopy combined with chemometrics was investigated. The mixture of penicillin G salts is a complex system due to similar analytical characteristics of components. Partial least squares (PLS) and radial basis function-partial least squares (RBF-PLS) were used to develop the linear and nonlinear relation between spectra and components, respectively. The orthogonal signal correction (OSC) preprocessing method was used to correct unexpected information, such as spectral overlapping and scattering effects. In order to compare the influence of OSC on PLS and RBF-PLS models, the optimal linear (PLS) and nonlinear (RBF-PLS) models based on conventional and OSC preprocessed spectra were established and compared. The obtained results demonstrated that OSC clearly enhanced the performance of both RBF-PLS and PLS calibration models. Also in the case of some nonlinear relation between spectra and component, OSC-RBF-PLS gave satisfactory results than OSC-PLS model which indicated that the OSC was helpful to remove extrinsic deviations from linearity without elimination of nonlinear information related to component. The chemometric models were tested on an external dataset and finally applied to the analysis commercialized injection product of penicillin G salts.
Huang, C.; Townshend, J.R.G.
2003-01-01
A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al . (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion interval ranging from 0 to 100%. This method is appealing to estimating subpixel land cover over large areas.
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables. PMID:29713298
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables.
NASA Astrophysics Data System (ADS)
Choudhury, Anustup; Farrell, Suzanne; Atkins, Robin; Daly, Scott
2017-09-01
We present an approach to predict overall HDR display quality as a function of key HDR display parameters. We first performed subjective experiments on a high quality HDR display that explored five key HDR display parameters: maximum luminance, minimum luminance, color gamut, bit-depth and local contrast. Subjects rated overall quality for different combinations of these display parameters. We explored two models | a physical model solely based on physically measured display characteristics and a perceptual model that transforms physical parameters using human vision system models. For the perceptual model, we use a family of metrics based on a recently published color volume model (ICT-CP), which consists of the PQ luminance non-linearity (ST2084) and LMS-based opponent color, as well as an estimate of the display point spread function. To predict overall visual quality, we apply linear regression and machine learning techniques such as Multilayer Perceptron, RBF and SVM networks. We use RMSE and Pearson/Spearman correlation coefficients to quantify performance. We found that the perceptual model is better at predicting subjective quality than the physical model and that SVM is better at prediction than linear regression. The significance and contribution of each display parameter was investigated. In addition, we found that combined parameters such as contrast do not improve prediction. Traditional perceptual models were also evaluated and we found that models based on the PQ non-linearity performed better.
A Novel Approach to Implement Takagi-Sugeno Fuzzy Models.
Chang, Chia-Wen; Tao, Chin-Wang
2017-09-01
This paper proposes new algorithms based on the fuzzy c-regressing model algorithm for Takagi-Sugeno (T-S) fuzzy modeling of the complex nonlinear systems. A fuzzy c-regression state model (FCRSM) algorithm is a T-S fuzzy model in which the functional antecedent and the state-space-model-type consequent are considered with the available input-output data. The antecedent and consequent forms of the proposed FCRSM consists mainly of two advantages: one is that the FCRSM has low computation load due to only one input variable is considered in the antecedent part; another is that the unknown system can be modeled to not only the polynomial form but also the state-space form. Moreover, the FCRSM can be extended to FCRSM-ND and FCRSM-Free algorithms. An algorithm FCRSM-ND is presented to find the T-S fuzzy state-space model of the nonlinear system when the input-output data cannot be precollected and an assumed effective controller is available. In the practical applications, the mathematical model of controller may be hard to be obtained. In this case, an online tuning algorithm, FCRSM-FREE, is designed such that the parameters of a T-S fuzzy controller and the T-S fuzzy state model of an unknown system can be online tuned simultaneously. Four numerical simulations are given to demonstrate the effectiveness of the proposed approach.
A nonlinear autoregressive Volterra model of the Hodgkin-Huxley equations.
Eikenberry, Steffen E; Marmarelis, Vasilis Z
2013-02-01
We propose a new variant of Volterra-type model with a nonlinear auto-regressive (NAR) component that is a suitable framework for describing the process of AP generation by the neuron membrane potential, and we apply it to input-output data generated by the Hodgkin-Huxley (H-H) equations. Volterra models use a functional series expansion to describe the input-output relation for most nonlinear dynamic systems, and are applicable to a wide range of physiologic systems. It is difficult, however, to apply the Volterra methodology to the H-H model because is characterized by distinct subthreshold and suprathreshold dynamics. When threshold is crossed, an autonomous action potential (AP) is generated, the output becomes temporarily decoupled from the input, and the standard Volterra model fails. Therefore, in our framework, whenever membrane potential exceeds some threshold, it is taken as a second input to a dual-input Volterra model. This model correctly predicts membrane voltage deflection both within the subthreshold region and during APs. Moreover, the model naturally generates a post-AP afterpotential and refractory period. It is known that the H-H model converges to a limit cycle in response to a constant current injection. This behavior is correctly predicted by the proposed model, while the standard Volterra model is incapable of generating such limit cycle behavior. The inclusion of cross-kernels, which describe the nonlinear interactions between the exogenous and autoregressive inputs, is found to be absolutely necessary. The proposed model is general, non-parametric, and data-derived.
NASA Astrophysics Data System (ADS)
Zuhdi, Shaifudin; Saputro, Dewi Retno Sari
2017-03-01
GWOLR model used for represent relationship between dependent variable has categories and scale of category is ordinal with independent variable influenced the geographical location of the observation site. Parameters estimation of GWOLR model use maximum likelihood provide system of nonlinear equations and hard to be found the result in analytic resolution. By finishing it, it means determine the maximum completion, this thing associated with optimizing problem. The completion nonlinear system of equations optimize use numerical approximation, which one is Newton Raphson method. The purpose of this research is to make iteration algorithm Newton Raphson and program using R software to estimate GWOLR model. Based on the research obtained that program in R can be used to estimate the parameters of GWOLR model by forming a syntax program with command "while".
Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling.
Perdikaris, P; Raissi, M; Damianou, A; Lawrence, N D; Karniadakis, G E
2017-02-01
Multi-fidelity modelling enables accurate inference of quantities of interest by synergistically combining realizations of low-cost/low-fidelity models with a small set of high-fidelity observations. This is particularly effective when the low- and high-fidelity models exhibit strong correlations, and can lead to significant computational gains over approaches that solely rely on high-fidelity models. However, in many cases of practical interest, low-fidelity models can only be well correlated to their high-fidelity counterparts for a specific range of input parameters, and potentially return wrong trends and erroneous predictions if probed outside of their validity regime. Here we put forth a probabilistic framework based on Gaussian process regression and nonlinear autoregressive schemes that is capable of learning complex nonlinear and space-dependent cross-correlations between models of variable fidelity, and can effectively safeguard against low-fidelity models that provide wrong trends. This introduces a new class of multi-fidelity information fusion algorithms that provide a fundamental extension to the existing linear autoregressive methodologies, while still maintaining the same algorithmic complexity and overall computational cost. The performance of the proposed methods is tested in several benchmark problems involving both synthetic and real multi-fidelity datasets from computational fluid dynamics simulations.
Modeling exposure–lag–response associations with distributed lag non-linear models
Gasparrini, Antonio
2014-01-01
In biomedical research, a health effect is frequently associated with protracted exposures of varying intensity sustained in the past. The main complexity of modeling and interpreting such phenomena lies in the additional temporal dimension needed to express the association, as the risk depends on both intensity and timing of past exposures. This type of dependency is defined here as exposure–lag–response association. In this contribution, I illustrate a general statistical framework for such associations, established through the extension of distributed lag non-linear models, originally developed in time series analysis. This modeling class is based on the definition of a cross-basis, obtained by the combination of two functions to flexibly model linear or nonlinear exposure-responses and the lag structure of the relationship, respectively. The methodology is illustrated with an example application to cohort data and validated through a simulation study. This modeling framework generalizes to various study designs and regression models, and can be applied to study the health effects of protracted exposures to environmental factors, drugs or carcinogenic agents, among others. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24027094
Sahin, Rubina; Tapadia, Kavita
2015-01-01
The three widely used isotherms Langmuir, Freundlich and Temkin were examined in an experiment using fluoride (F⁻) ion adsorption on a geo-material (limonite) at four different temperatures by linear and non-linear models. Comparison of linear and non-linear regression models were given in selecting the optimum isotherm for the experimental results. The coefficient of determination, r², was used to select the best theoretical isotherm. The four Langmuir linear equations (1, 2, 3, and 4) are discussed. Langmuir isotherm parameters obtained from the four Langmuir linear equations using the linear model differed but they were the same when using the nonlinear model. Langmuir-2 isotherm is one of the linear forms, and it had the highest coefficient of determination (r² = 0.99) compared to the other Langmuir linear equations (1, 3 and 4) in linear form, whereas, for non-linear, Langmuir-4 fitted best among all the isotherms because it had the highest coefficient of determination (r² = 0.99). The results showed that the non-linear model may be a better way to obtain the parameters. In the present work, the thermodynamic parameters show that the absorption of fluoride onto limonite is both spontaneous (ΔG < 0) and endothermic (ΔH > 0). Scanning electron microscope and X-ray diffraction images also confirm the adsorption of F⁻ ion onto limonite. The isotherm and kinetic study reveals that limonite can be used as an adsorbent for fluoride removal. In future we can develop new technology for fluoride removal in large scale by using limonite which is cost-effective, eco-friendly and is easily available in the study area.
The Influential Effect of Blending, Bump, Changing Period, and Eclipsing Cepheids on the Leavitt Law
NASA Astrophysics Data System (ADS)
García-Varela, A.; Muñoz, J. R.; Sabogal, B. E.; Vargas Domínguez, S.; Martínez, J.
2016-06-01
The investigation of the nonlinearity of the Leavitt law (LL) is a topic that began more than seven decades ago, when some of the studies in this field found that the LL has a break at about 10 days. The goal of this work is to investigate a possible statistical cause of this nonlinearity. By applying linear regressions to OGLE-II and OGLE-IV data, we find that to obtain the LL by using linear regression, robust techniques to deal with influential points and/or outliers are needed instead of the ordinary least-squares regression traditionally used. In particular, by using M- and MM-regressions we establish firmly and without doubt the linearity of the LL in the Large Magellanic Cloud, without rejecting or excluding Cepheid data from the analysis. This implies that light curves of Cepheids suggesting blending, bumps, eclipses, or period changes do not affect the LL for this galaxy. For the Small Magellanic Cloud, when including Cepheids of this kind, it is not possible to find an adequate model, probably because of the geometry of the galaxy. In that case, a possible influence of these stars could exist.
Torija, Antonio J; Ruiz, Diego P
2015-02-01
The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Culpepper, Steven Andrew
2010-01-01
Statistical prediction remains an important tool for decisions in a variety of disciplines. An equally important issue is identifying factors that contribute to more or less accurate predictions. The time series literature includes well developed methods for studying predictability and volatility over time. This article develops…
Predicting radiotherapy outcomes using statistical learning techniques
NASA Astrophysics Data System (ADS)
El Naqa, Issam; Bradley, Jeffrey D.; Lindsay, Patricia E.; Hope, Andrew J.; Deasy, Joseph O.
2009-09-01
Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data before. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizabilty' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principle components analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizabilty beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model variables. These models have the capacity to predict on unseen data. Part of this work was first presented at the Seventh International Conference on Machine Learning and Applications, San Diego, CA, USA, 11-13 December 2008.
Comparative evaluation of adsorption kinetics of diclofenac and isoproturon by activated carbon.
Torrellas, Silvia A; Rodriguez, Araceli R; Escudero, Gabriel O; Martín, José María G; Rodriguez, Juan G
2015-01-01
Adsorption mechanism of diclofenac and isoproturon onto activated carbon has been proposed using Langmuir and Freundlich isotherms. Adsorption capacity and optimum adsorption isotherms were predicted by nonlinear regression method. Different kinetic equations, pseudo-first-order, pseudo-second-order, intraparticle diffusion model and Bangham kinetic model, were applied to study the adsorption kinetics of emerging contaminants on activated carbon in two aqueous matrices.
Equations relating compacted and uncompacted live crown ratio for common tree species in the South
KaDonna C. Randolph
2010-01-01
Species-specific equations to predict uncompacted crown ratio (UNCR) from compacted live crown ratio (CCR), tree length, and stem diameter were developed for 24 species and 12 genera in the southern United States. Using data from the US Forest Service Forest Inventory and Analysis program, nonlinear regression was used to model UNCR with a logistic function. Model...
Egg production forecasting: Determining efficient modeling approaches.
Ahmad, H A
2011-12-01
Several mathematical or statistical and artificial intelligence models were developed to compare egg production forecasts in commercial layers. Initial data for these models were collected from a comparative layer trial on commercial strains conducted at the Poultry Research Farms, Auburn University. Simulated data were produced to represent new scenarios by using means and SD of egg production of the 22 commercial strains. From the simulated data, random examples were generated for neural network training and testing for the weekly egg production prediction from wk 22 to 36. Three neural network architectures-back-propagation-3, Ward-5, and the general regression neural network-were compared for their efficiency to forecast egg production, along with other traditional models. The general regression neural network gave the best-fitting line, which almost overlapped with the commercial egg production data, with an R(2) of 0.71. The general regression neural network-predicted curve was compared with original egg production data, the average curves of white-shelled and brown-shelled strains, linear regression predictions, and the Gompertz nonlinear model. The general regression neural network was superior in all these comparisons and may be the model of choice if the initial overprediction is managed efficiently. In general, neural network models are efficient, are easy to use, require fewer data, and are practical under farm management conditions to forecast egg production.
Krasikova, Dina V; Le, Huy; Bachura, Eric
2018-06-01
To address a long-standing concern regarding a gap between organizational science and practice, scholars called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Fatigue design of a cellular phone folder using regression model-based multi-objective optimization
NASA Astrophysics Data System (ADS)
Kim, Young Gyun; Lee, Jongsoo
2016-08-01
In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
NASA Astrophysics Data System (ADS)
Cobourn, W. Geoffrey
2010-08-01
An enhanced PM 2.5 air quality forecast model based on nonlinear regression (NLR) and back-trajectory concentrations has been developed for use in the Louisville, Kentucky metropolitan area. The PM 2.5 air quality forecast model is designed for use in the warm season, from May through September, when PM 2.5 air quality is more likely to be critical for human health. The enhanced PM 2.5 model consists of a basic NLR model, developed for use with an automated air quality forecast system, and an additional parameter based on upwind PM 2.5 concentration, called PM24. The PM24 parameter is designed to be determined manually, by synthesizing backward air trajectory and regional air quality information to compute 24-h back-trajectory concentrations. The PM24 parameter may be used by air quality forecasters to adjust the forecast provided by the automated forecast system. In this study of the 2007 and 2008 forecast seasons, the enhanced model performed well using forecasted meteorological data and PM24 as input. The enhanced PM 2.5 model was compared with three alternative models, including the basic NLR model, the basic NLR model with a persistence parameter added, and the NLR model with persistence and PM24. The two models that included PM24 were of comparable accuracy. The two models incorporating back-trajectory concentrations had lower mean absolute errors and higher rates of detecting unhealthy PM2.5 concentrations compared to the other models.
Song, Yong-Ze; Yang, Hong-Lei; Peng, Jun-Huan; Song, Yi-Rong; Sun, Qian; Li, Yuan
2015-01-01
Particulate matter with an aerodynamic diameter <2.5 μm (PM2.5) represents a severe environmental problem and is of negative impact on human health. Xi'an City, with a population of 6.5 million, is among the highest concentrations of PM2.5 in China. In 2013, in total, there were 191 days in Xi’an City on which PM2.5 concentrations were greater than 100 μg/m3. Recently, a few studies have explored the potential causes of high PM2.5 concentration using remote sensing data such as the MODIS aerosol optical thickness (AOT) product. Linear regression is a commonly used method to find statistical relationships among PM2.5 concentrations and other pollutants, including CO, NO2, SO2, and O3, which can be indicative of emission sources. The relationships of these variables, however, are usually complicated and non-linear. Therefore, a generalized additive model (GAM) is used to estimate the statistical relationships between potential variables and PM2.5 concentrations. This model contains linear functions of SO2 and CO, univariate smoothing non-linear functions of NO2, O3, AOT and temperature, and bivariate smoothing non-linear functions of location and wind variables. The model can explain 69.50% of PM2.5 concentrations, with R2 = 0.691, which improves the result of a stepwise linear regression (R2 = 0.582) by 18.73%. The two most significant variables, CO concentration and AOT, represent 20.65% and 19.54% of the deviance, respectively, while the three other gas-phase concentrations, SO2, NO2, and O3 account for 10.88% of the total deviance. These results show that in Xi'an City, the traffic and other industrial emissions are the primary source of PM2.5. Temperature, location, and wind variables also non-linearly related with PM2.5. PMID:26540446
Zhang, Hong-guang; Lu, Jian-gang
2016-02-01
Abstract To overcome the problems of significant difference among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, net signal analysis method(NAS) was firstly used to obtain the net analyte signal of the calibration samples and unknown samples, then the Euclidean distance between net analyte signal of the sample and net analyte signal of calibration samples was calculated and utilized as similarity index. According to the defined similarity index, the local calibration sets were individually selected for each unknown sample. Finally, a local PLS regression model was built on each local calibration sets for each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to global PLS regression method and conventional local regression algorithm based on spectral Euclidean distance.
On the use of log-transformation vs. nonlinear regression for analyzing biological power laws
Xiao, X.; White, E.P.; Hooten, M.B.; Durham, S.L.
2011-01-01
Power-law relationships are among the most well-studied functional relationships in biology. Recently the common practice of fitting power laws using linear regression (LR) on log-transformed data has been criticized, calling into question the conclusions of hundreds of studies. It has been suggested that nonlinear regression (NLR) is preferable, but no rigorous comparison of these two methods has been conducted. Using Monte Carlo simulations, we demonstrate that the error distribution determines which method performs better, with NLR better characterizing data with additive, homoscedastic, normal error and LR better characterizing data with multiplicative, heteroscedastic, lognormal error. Analysis of 471 biological power laws shows that both forms of error occur in nature. While previous analyses based on log-transformation appear to be generally valid, future analyses should choose methods based on a combination of biological plausibility and analysis of the error distribution. We provide detailed guidelines and associated computer code for doing so, including a model averaging approach for cases where the error structure is uncertain. ?? 2011 by the Ecological Society of America.
Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan
2017-08-28
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.
A novel strategy for forensic age prediction by DNA methylation and support vector regression model
Xu, Cheng; Qu, Hongzhu; Wang, Guangyu; Xie, Bingbing; Shi, Yi; Yang, Yaran; Zhao, Zhao; Hu, Lan; Fang, Xiangdong; Yan, Jiangwei; Feng, Lei
2015-01-01
High deviations resulting from prediction model, gender and population difference have limited age estimation application of DNA methylation markers. Here we identified 2,957 novel age-associated DNA methylation sites (P < 0.01 and R2 > 0.5) in blood of eight pairs of Chinese Han female monozygotic twins. Among them, nine novel sites (false discovery rate < 0.01), along with three other reported sites, were further validated in 49 unrelated female volunteers with ages of 20–80 years by Sequenom Massarray. A total of 95 CpGs were covered in the PCR products and 11 of them were built the age prediction models. After comparing four different models including, multivariate linear regression, multivariate nonlinear regression, back propagation neural network and support vector regression, SVR was identified as the most robust model with the least mean absolute deviation from real chronological age (2.8 years) and an average accuracy of 4.7 years predicted by only six loci from the 11 loci, as well as an less cross-validated error compared with linear regression model. Our novel strategy provides an accurate measurement that is highly useful in estimating the individual age in forensic practice as well as in tracking the aging process in other related applications. PMID:26635134
Wilderness recreation participation: Projections for the next half century
J. M. Bowker; D. Murphy; H. K. Cordell; D. B. K. English; J. C. Bergstrom; C. M. Starbuck; C. J. Betz; G. T. Green; P. Reed
2007-01-01
This paper explores the influence of demographic and spatial variables on individual participation in wildland area recreation. Data from the National Survey on Recreation and the Environment (NSRE) are combined with GIS-based distance measures to develop nonlinear regression models used to predict both participation and the number of days of participation in...
ERIC Educational Resources Information Center
Strang, Kenneth David
2009-01-01
This paper discusses how a seldom-used statistical procedure, recursive regression (RR), can numerically and graphically illustrate data-driven nonlinear relationships and interaction of variables. This routine falls into the family of exploratory techniques, yet a few interesting features make it a valuable compliment to factor analysis and…
NASA Astrophysics Data System (ADS)
Allison, Lesley; Hawkins, Ed; Woollings, Tim
2015-01-01
Many previous studies have shown that unforced climate model simulations exhibit decadal-scale fluctuations in the Atlantic meridional overturning circulation (AMOC), and that this variability can have impacts on surface climate fields. However, the robustness of these surface fingerprints across different models is less clear. Furthermore, with the potential for coupled feedbacks that may amplify or damp the response, it is not known whether the associated climate signals are linearly related to the strength of the AMOC changes, or if the fluctuation events exhibit nonlinear behaviour with respect to their strength or polarity. To explore these questions, we introduce an objective and flexible method for identifying the largest natural AMOC fluctuation events in multicentennial/multimillennial simulations of a variety of coupled climate models. The characteristics of the events are explored, including their magnitude, meridional coherence and spatial structure, as well as links with ocean heat transport and the horizontal circulation. The surface fingerprints in ocean temperature and salinity are examined, and compared with the results of linear regression analysis. It is found that the regressions generally provide a good indication of the surface changes associated with the largest AMOC events. However, there are some exceptions, including a nonlinear change in the atmospheric pressure signal, particularly at high latitudes, in HadCM3. Some asymmetries are also found between the changes associated with positive and negative AMOC events in the same model. Composite analysis suggests that there are signals that are robust across the largest AMOC events in each model, which provides reassurance that the surface changes associated with one particular event will be similar to those expected from regression analysis. However, large differences are found between the AMOC fingerprints in different models, which may hinder the prediction and attribution of such events in reality.
Douglas, R K; Nawar, S; Alamar, M C; Mouazen, A M; Coulon, F
2018-03-01
Visible and near infrared spectrometry (vis-NIRS) coupled with data mining techniques can offer fast and cost-effective quantitative measurement of total petroleum hydrocarbons (TPH) in contaminated soils. Literature showed however significant differences in the performance on the vis-NIRS between linear and non-linear calibration methods. This study compared the performance of linear partial least squares regression (PLSR) with a nonlinear random forest (RF) regression for the calibration of vis-NIRS when analysing TPH in soils. 88 soil samples (3 uncontaminated and 85 contaminated) collected from three sites located in the Niger Delta were scanned using an analytical spectral device (ASD) spectrophotometer (350-2500nm) in diffuse reflectance mode. Sequential ultrasonic solvent extraction-gas chromatography (SUSE-GC) was used as reference quantification method for TPH which equal to the sum of aliphatic and aromatic fractions ranging between C 10 and C 35 . Prior to model development, spectra were subjected to pre-processing including noise cut, maximum normalization, first derivative and smoothing. Then 65 samples were selected as calibration set and the remaining 20 samples as validation set. Both vis-NIR spectrometry and gas chromatography profiles of the 85 soil samples were subjected to RF and PLSR with leave-one-out cross-validation (LOOCV) for the calibration models. Results showed that RF calibration model with a coefficient of determination (R 2 ) of 0.85, a root means square error of prediction (RMSEP) 68.43mgkg -1 , and a residual prediction deviation (RPD) of 2.61 outperformed PLSR (R 2 =0.63, RMSEP=107.54mgkg -1 and RDP=2.55) in cross-validation. These results indicate that RF modelling approach is accounting for the nonlinearity of the soil spectral responses hence, providing significantly higher prediction accuracy compared to the linear PLSR. It is recommended to adopt the vis-NIRS coupled with RF modelling approach as a portable and cost effective method for the rapid quantification of TPH in soils. Copyright © 2017 Elsevier B.V. All rights reserved.
Wagner, Brian J.; Gorelick, Steven M.
1986-01-01
A simulation nonlinear multiple-regression methodology for estimating parameters that characterize the transport of contaminants is developed and demonstrated. Finite difference contaminant transport simulation is combined with a nonlinear weighted least squares multiple-regression procedure. The technique provides optimal parameter estimates and gives statistics for assessing the reliability of these estimates under certain general assumptions about the distributions of the random measurement errors. Monte Carlo analysis is used to estimate parameter reliability for a hypothetical homogeneous soil column for which concentration data contain large random measurement errors. The value of data collected spatially versus data collected temporally was investigated for estimation of velocity, dispersion coefficient, effective porosity, first-order decay rate, and zero-order production. The use of spatial data gave estimates that were 2–3 times more reliable than estimates based on temporal data for all parameters except velocity. Comparison of estimated linear and nonlinear confidence intervals based upon Monte Carlo analysis showed that the linear approximation is poor for dispersion coefficient and zero-order production coefficient when data are collected over time. In addition, examples demonstrate transport parameter estimation for two real one-dimensional systems. First, the longitudinal dispersivity and effective porosity of an unsaturated soil are estimated using laboratory column data. We compare the reliability of estimates based upon data from individual laboratory experiments versus estimates based upon pooled data from several experiments. Second, the simulation nonlinear regression procedure is extended to include an additional governing equation that describes delayed storage during contaminant transport. The model is applied to analyze the trends, variability, and interrelationship of parameters in a mourtain stream in northern California.
Helbich, Marco; Klein, Nadja; Roberts, Hannah; Hagedoorn, Paulien; Groenewegen, Peter P
2018-06-20
Exposure to green space seems to be beneficial for self-reported mental health. In this study we used an objective health indicator, namely antidepressant prescription rates. Current studies rely exclusively upon mean regression models assuming linear associations. It is, however, plausible that the presence of green space is non-linearly related with different quantiles of the outcome antidepressant prescription rates. These restrictions may contribute to inconsistent findings. Our aim was: a) to assess antidepressant prescription rates in relation to green space, and b) to analyze how the relationship varies non-linearly across different quantiles of antidepressant prescription rates. We used cross-sectional data for the year 2014 at a municipality level in the Netherlands. Ecological Bayesian geoadditive quantile regressions were fitted for the 15%, 50%, and 85% quantiles to estimate green space-prescription rate correlations, controlling for physical activity levels, socio-demographics, urbanicity, etc. RESULTS: The results suggested that green space was overall inversely and non-linearly associated with antidepressant prescription rates. More important, the associations differed across the quantiles, although the variation was modest. Significant non-linearities were apparent: The associations were slightly positive in the lower quantile and strongly negative in the upper one. Our findings imply that an increased availability of green space within a municipality may contribute to a reduction in the number of antidepressant prescriptions dispensed. Green space is thus a central health and community asset, whilst a minimum level of 28% needs to be established for health gains. The highest effectiveness occurred at a municipality surface percentage higher than 79%. This inverse dose-dependent relation has important implications for setting future community-level health and planning policies. Copyright © 2018 Elsevier Inc. All rights reserved.
Conrad, Douglas J; Bailey, Barbara A; Hardie, Jon A; Bakke, Per S; Eagan, Tomas M L; Aarli, Bernt B
2017-01-01
Clinical phenotyping, therapeutic investigations as well as genomic, airway secretion metabolomic and metagenomic investigations can benefit from robust, nonlinear modeling of FEV1 in individual subjects. We demonstrate the utility of measuring FEV1 dynamics in representative cystic fibrosis (CF) and chronic obstructive pulmonary disease (COPD) populations. Individual FEV1 data from CF and COPD subjects were modeled by estimating median regression splines and their predicted first and second derivatives. Classes were created from variables that capture the dynamics of these curves in both cohorts. Nine FEV1 dynamic variables were identified from the splines and their predicted derivatives in individuals with CF (n = 177) and COPD (n = 374). Three FEV1 dynamic classes (i.e. stable, intermediate and hypervariable) were generated and described using these variables from both cohorts. In the CF cohort, the FEV1 hypervariable class (HV) was associated with a clinically unstable, female-dominated phenotypes while stable FEV1 class (S) individuals were highly associated with the male-dominated milder clinical phenotype. In the COPD cohort, associations were found between the FEV1 dynamic classes, the COPD GOLD grades, with exacerbation frequency and symptoms. Nonlinear modeling of FEV1 with splines provides new insights and is useful in characterizing CF and COPD clinical phenotypes.
Monthly monsoon rainfall forecasting using artificial neural networks
NASA Astrophysics Data System (ADS)
Ganti, Ravikumar
2014-10-01
Indian agriculture sector heavily depends on monsoon rainfall for successful harvesting. In the past, prediction of rainfall was mainly performed using regression models, which provide reasonable accuracy in the modelling and forecasting of complex physical systems. Recently, Artificial Neural Networks (ANNs) have been proposed as efficient tools for modelling and forecasting. A feed-forward multi-layer perceptron type of ANN architecture trained using the popular back-propagation algorithm was employed in this study. Other techniques investigated for modeling monthly monsoon rainfall include linear and non-linear regression models for comparison purposes. The data employed in this study include monthly rainfall and monthly average of the daily maximum temperature in the North Central region in India. Specifically, four regression models and two ANN model's were developed. The performance of various models was evaluated using a wide variety of standard statistical parameters and scatter plots. The results obtained in this study for forecasting monsoon rainfalls using ANNs have been encouraging. India's economy and agricultural activities can be effectively managed with the help of the availability of the accurate monsoon rainfall forecasts.
Parameterizing sorption isotherms using a hybrid global-local fitting procedure.
Matott, L Shawn; Singh, Anshuman; Rabideau, Alan J
2017-05-01
Predictive modeling of the transport and remediation of groundwater contaminants requires an accurate description of the sorption process, which is usually provided by fitting an isotherm model to site-specific laboratory data. Commonly used calibration procedures, listed in order of increasing sophistication, include: trial-and-error, linearization, non-linear regression, global search, and hybrid global-local search. Given the considerable variability in fitting procedures applied in published isotherm studies, we investigated the importance of algorithm selection through a series of numerical experiments involving 13 previously published sorption datasets. These datasets, considered representative of state-of-the-art for isotherm experiments, had been previously analyzed using trial-and-error, linearization, or non-linear regression methods. The isotherm expressions were re-fit using a 3-stage hybrid global-local search procedure (i.e. global search using particle swarm optimization followed by Powell's derivative free local search method and Gauss-Marquardt-Levenberg non-linear regression). The re-fitted expressions were then compared to previously published fits in terms of the optimized weighted sum of squared residuals (WSSR) fitness function, the final estimated parameters, and the influence on contaminant transport predictions - where easily computed concentration-dependent contaminant retardation factors served as a surrogate measure of likely transport behavior. Results suggest that many of the previously published calibrated isotherm parameter sets were local minima. In some cases, the updated hybrid global-local search yielded order-of-magnitude reductions in the fitness function. In particular, of the candidate isotherms, the Polanyi-type models were most likely to benefit from the use of the hybrid fitting procedure. In some cases, improvements in fitness function were associated with slight (<10%) changes in parameter values, but in other cases significant (>50%) changes in parameter values were noted. Despite these differences, the influence of isotherm misspecification on contaminant transport predictions was quite variable and difficult to predict from inspection of the isotherms. Copyright © 2017 Elsevier B.V. All rights reserved.
Pande, Amit; Mohapatra, Prasant; Nicorici, Alina; Han, Jay J
2016-07-19
Children with physical impairments are at a greater risk for obesity and decreased physical activity. A better understanding of physical activity pattern and energy expenditure (EE) would lead to a more targeted approach to intervention. This study focuses on studying the use of machine-learning algorithms for EE estimation in children with disabilities. A pilot study was conducted on children with Duchenne muscular dystrophy (DMD) to identify important factors for determining EE and develop a novel algorithm to accurately estimate EE from wearable sensor-collected data. There were 7 boys with DMD, 6 healthy control boys, and 22 control adults recruited. Data were collected using smartphone accelerometer and chest-worn heart rate sensors. The gold standard EE values were obtained from the COSMED K4b2 portable cardiopulmonary metabolic unit worn by boys (aged 6-10 years) with DMD and controls. Data from this sensor setup were collected simultaneously during a series of concurrent activities. Linear regression and nonlinear machine-learning-based approaches were used to analyze the relationship between accelerometer and heart rate readings and COSMED values. Existing calorimetry equations using linear regression and nonlinear machine-learning-based models, developed for healthy adults and young children, give low correlation to actual EE values in children with disabilities (14%-40%). The proposed model for boys with DMD uses ensemble machine learning techniques and gives a 91% correlation with actual measured EE values (root mean square error of 0.017). Our results confirm that the methods developed to determine EE using accelerometer and heart rate sensor values in normal adults are not appropriate for children with disabilities and should not be used. A much more accurate model is obtained using machine-learning-based nonlinear regression specifically developed for this target population. ©Amit Pande, Prasant Mohapatra, Alina Nicorici, Jay J Han. Originally published in JMIR Rehabilitation and Assistive Technology (http://rehab.jmir.org), 19.07.2016.
Carbon dioxide stripping in aquaculture -- part III: model verification
Colt, John; Watten, Barnaby; Pfeiffer, Tim
2012-01-01
Based on conventional mass transfer models developed for oxygen, the use of the non-linear ASCE method, 2-point method, and one parameter linear-regression method were evaluated for carbon dioxide stripping data. For values of KLaCO2 < approximately 1.5/h, the 2-point or ASCE method are a good fit to experimental data, but the fit breaks down at higher values of KLaCO2. How to correct KLaCO2 for gas phase enrichment remains to be determined. The one-parameter linear regression model was used to vary the C*CO2 over the test, but it did not result in a better fit to the experimental data when compared to the ASCE or fixed C*CO2 assumptions.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
Decomposing Racial/Ethnic Disparities in Influenza Vaccination among the Elderly
Yoo, Byung-Kwang; Hasebe, Takuya; Szilagyi, Peter G.
2015-01-01
While persistent racial/ethnic disparities in influenza vaccination have been reported among the elderly, characteristics contributing to disparities are poorly understood. This study aimed to assess characteristics associated with racial/ethnic disparities in influenza vaccination using a nonlinear Oaxaca-Blinder decomposition method. We performed cross-sectional multivariable logistic regression analyses for which the dependent variable was self-reported receipt of influenza vaccine during the 2010–2011 season among community dwelling non-Hispanic African-American (AA), non-Hispanic White (W), English-speaking Hispanic (EH) and Spanish-speaking Hispanic (SH) elderly, enrolled in the 2011 Medicare Current Beneficiary Survey (MCBS) (un-weighted/weighted N= 6,095/19.2million). Using the nonlinear Oaxaca-Blinder decomposition method, we assessed the relative contribution of seventeen covariates—including socio-demographic characteristics, health status, insurance, access, preference regarding healthcare, and geographic regions —to disparities in influenza vaccination. Unadjusted racial/ethnic disparities in influenza vaccination were 14.1 percentage points (pp) (W-AA disparity, p<.001), 25.7 pp (W-SH disparity, p<.001) and 0.6 pp (W-EH disparity, p>.8). The Oaxaca-Blinder decomposition method estimated that the unadjusted W-AA and W-SH disparities in vaccination could be reduced by only 45% even if AA and SH groups become equivalent to Whites in all covariates in multivariable regression models. The remaining 55% of disparities were attributed to (a) racial/ethnic differences in the estimated coefficients (e.g., odds ratios) in the regression models and (b) characteristics not included in the regression models. Our analysis found that only about 45% of racial/ethnic disparities in influenza vaccination among the elderly could be reduced by equalizing recognized characteristics among racial/ethnic groups. Future studies are needed to identify additional modifiable characteristics causing disparities in influenza vaccination. PMID:25900133
Baqué, Michèle; Amendt, Jens
2013-01-01
Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets don't take into account that immature blow flies grow in a non-linear fashion. Linear models do not supply a sufficient reliability on age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistic tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces into the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using for the first time generalised additive mixed models.
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial bias function (RBF) kernels the non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The e-insensitivity based loss function was used for SVR modeling. Selection of the design parameters which includes capacity (C), insensitivity region (e) and the RBF kernel parameter (sigma) was made based on a grid search approach and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
Dai, C; Cai, X H; Cai, Y P; Guo, H C; Sun, W; Tan, Q; Huang, G H
2014-06-01
This research developed a simulation-aided nonlinear programming model (SNPM). This model incorporated the consideration of pollutant dispersion modeling, and the management of coal blending and the related human health risks within a general modeling framework In SNPM, the simulation effort (i.e., California puff [CALPUFF]) was used to forecast the fate of air pollutants for quantifying the health risk under various conditions, while the optimization studies were to identify the optimal coal blending strategies from a number of alternatives. To solve the model, a surrogate-based indirect search approach was proposed, where the support vector regression (SVR) was used to create a set of easy-to-use and rapid-response surrogates for identifying the function relationships between coal-blending operating conditions and health risks. Through replacing the CALPUFF and the corresponding hazard quotient equation with the surrogates, the computation efficiency could be improved. The developed SNPM was applied to minimize the human health risk associated with air pollutants discharged from Gaojing and Shijingshan power plants in the west of Beijing. Solution results indicated that it could be used for reducing the health risk of the public in the vicinity of the two power plants, identifying desired coal blending strategies for decision makers, and considering a proper balance between coal purchase cost and human health risk. A simulation-aided nonlinear programming model (SNPM) is developed. It integrates the advantages of CALPUFF and nonlinear programming model. To solve the model, a surrogate-based indirect search approach based on the combination of support vector regression and genetic algorithm is proposed. SNPM is applied to reduce the health risk caused by air pollutants discharged from Gaojing and Shijingshan power plants in the west of Beijing. Solution results indicate that it is useful for generating coal blending schemes, reducing the health risk of the public, reflecting the trade-offbetween coal purchase cost and health risk.
Foglia, L.; Hill, Mary C.; Mehl, Steffen W.; Burlando, P.
2009-01-01
We evaluate the utility of three interrelated means of using data to calibrate the fully distributed rainfall‐runoff model TOPKAPI as applied to the Maggia Valley drainage area in Switzerland. The use of error‐based weighting of observation and prior information data, local sensitivity analysis, and single‐objective function nonlinear regression provides quantitative evaluation of sensitivity of the 35 model parameters to the data, identification of data types most important to the calibration, and identification of correlations among parameters that contribute to nonuniqueness. Sensitivity analysis required only 71 model runs, and regression required about 50 model runs. The approach presented appears to be ideal for evaluation of models with long run times or as a preliminary step to more computationally demanding methods. The statistics used include composite scaled sensitivities, parameter correlation coefficients, leverage, Cook's D, and DFBETAS. Tests suggest predictive ability of the calibrated model typical of hydrologic models.
Kinetic modelling for zinc (II) ions biosorption onto Luffa cylindrica
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oboh, I., E-mail: innocentoboh@uniuyo.edu.ng; Aluyor, E.; Audu, T.
The biosorption of Zinc (II) ions onto a biomaterial - Luffa cylindrica has been studied. This biomaterial was characterized by elemental analysis, surface area, pore size distribution, scanning electron microscopy, and the biomaterial before and after sorption, was characterized by Fourier Transform Infra Red (FTIR) spectrometer. The kinetic nonlinear models fitted were Pseudo-first order, Pseudo-second order and Intra-particle diffusion. A comparison of non-linear regression method in selecting the kinetic model was made. Four error functions, namely coefficient of determination (R{sup 2}), hybrid fractional error function (HYBRID), average relative error (ARE), and sum of the errors squared (ERRSQ), were used tomore » predict the parameters of the kinetic models. The strength of this study is that a biomaterial with wide distribution particularly in the tropical world and which occurs as waste material could be put into effective utilization as a biosorbent to address a crucial environmental problem.« less
Kinetic modelling for zinc (II) ions biosorption onto Luffa cylindrica
NASA Astrophysics Data System (ADS)
Oboh, I.; Aluyor, E.; Audu, T.
2015-03-01
The biosorption of Zinc (II) ions onto a biomaterial - Luffa cylindrica has been studied. This biomaterial was characterized by elemental analysis, surface area, pore size distribution, scanning electron microscopy, and the biomaterial before and after sorption, was characterized by Fourier Transform Infra Red (FTIR) spectrometer. The kinetic nonlinear models fitted were Pseudo-first order, Pseudo-second order and Intra-particle diffusion. A comparison of non-linear regression method in selecting the kinetic model was made. Four error functions, namely coefficient of determination (R2), hybrid fractional error function (HYBRID), average relative error (ARE), and sum of the errors squared (ERRSQ), were used to predict the parameters of the kinetic models. The strength of this study is that a biomaterial with wide distribution particularly in the tropical world and which occurs as waste material could be put into effective utilization as a biosorbent to address a crucial environmental problem.
Fuzzy regression modeling for tool performance prediction and degradation detection.
Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L
2010-10-01
In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.
Kaimakamis, Evangelos; Tsara, Venetia; Bratsas, Charalambos; Sichletidis, Lazaros; Karvounis, Charalambos; Maglaveras, Nikolaos
2016-01-01
Obstructive Sleep Apnea (OSA) is a common sleep disorder requiring the time/money consuming polysomnography for diagnosis. Alternative methods for initial evaluation are sought. Our aim was the prediction of Apnea-Hypopnea Index (AHI) in patients potentially suffering from OSA based on nonlinear analysis of respiratory biosignals during sleep, a method that is related to the pathophysiology of the disorder. Patients referred to a Sleep Unit (135) underwent full polysomnography. Three nonlinear indices (Largest Lyapunov Exponent, Detrended Fluctuation Analysis and Approximate Entropy) extracted from two biosignals (airflow from a nasal cannula, thoracic movement) and one linear derived from Oxygen saturation provided input to a data mining application with contemporary classification algorithms for the creation of predictive models for AHI. A linear regression model presented a correlation coefficient of 0.77 in predicting AHI. With a cutoff value of AHI = 8, the sensitivity and specificity were 93% and 71.4% in discrimination between patients and normal subjects. The decision tree for the discrimination between patients and normal had sensitivity and specificity of 91% and 60%, respectively. Certain obtained nonlinear values correlated significantly with commonly accepted physiological parameters of people suffering from OSA. We developed a predictive model for the presence/severity of OSA using a simple linear equation and additional decision trees with nonlinear features extracted from 3 respiratory recordings. The accuracy of the methodology is high and the findings provide insight to the underlying pathophysiology of the syndrome. Reliable predictions of OSA are possible using linear and nonlinear indices from only 3 respiratory signals during sleep. The proposed models could lead to a better study of the pathophysiology of OSA and facilitate initial evaluation/follow up of suspected patients OSA utilizing a practical low cost methodology. ClinicalTrials.gov NCT01161381.
Nonlinear estimation of parameters in biphasic Arrhenius plots.
Puterman, M L; Hrboticky, N; Innis, S M
1988-05-01
This paper presents a formal procedure for the statistical analysis of data on the thermotropic behavior of membrane-bound enzymes generated using the Arrhenius equation and compares the analysis to several alternatives. Data is modeled by a bent hyperbola. Nonlinear regression is used to obtain estimates and standard errors of the intersection of line segments, defined as the transition temperature, and slopes, defined as energies of activation of the enzyme reaction. The methodology allows formal tests of the adequacy of a biphasic model rather than either a single straight line or a curvilinear model. Examples on data concerning the thermotropic behavior of pig brain synaptosomal acetylcholinesterase are given. The data support the biphasic temperature dependence of this enzyme. The methodology represents a formal procedure for statistical validation of any biphasic data and allows for calculation of all line parameters with estimates of precision.
Statistical inference methods for sparse biological time series data.
Ndukum, Juliet; Fonseca, Luís L; Santos, Helena; Voit, Eberhard O; Datta, Susmita
2011-04-25
Comparing metabolic profiles under different biological perturbations has become a powerful approach to investigating the functioning of cells. The profiles can be taken as single snapshots of a system, but more information is gained if they are measured longitudinally over time. The results are short time series consisting of relatively sparse data that cannot be analyzed effectively with standard time series techniques, such as autocorrelation and frequency domain methods. In this work, we study longitudinal time series profiles of glucose consumption in the yeast Saccharomyces cerevisiae under different temperatures and preconditioning regimens, which we obtained with methods of in vivo nuclear magnetic resonance (NMR) spectroscopy. For the statistical analysis we first fit several nonlinear mixed effect regression models to the longitudinal profiles and then used an ANOVA likelihood ratio method in order to test for significant differences between the profiles. The proposed methods are capable of distinguishing metabolic time trends resulting from different treatments and associate significance levels to these differences. Among several nonlinear mixed-effects regression models tested, a three-parameter logistic function represents the data with highest accuracy. ANOVA and likelihood ratio tests suggest that there are significant differences between the glucose consumption rate profiles for cells that had been--or had not been--preconditioned by heat during growth. Furthermore, pair-wise t-tests reveal significant differences in the longitudinal profiles for glucose consumption rates between optimal conditions and heat stress, optimal and recovery conditions, and heat stress and recovery conditions (p-values <0.0001). We have developed a nonlinear mixed effects model that is appropriate for the analysis of sparse metabolic and physiological time profiles. The model permits sound statistical inference procedures, based on ANOVA likelihood ratio tests, for testing the significance of differences between short time course data under different biological perturbations.
Han, Hyung Joon; Choi, Sae Byeol; Park, Man Sik; Lee, Jin Suk; Kim, Wan Bae; Song, Tae Jin; Choi, Sang Yong
2011-07-01
Single port laparoscopic surgery has come to the forefront of minimally invasive surgery. For those familiar with conventional techniques, however, this type of operation demands a different type of eye/hand coordination and involves unfamiliar working instruments. Herein, the authors describe the learning curve and the clinical outcomes of single port laparoscopic cholecystectomy for 150 consecutive patients with benign gallbladder disease. All patients underwent single port laparoscopic cholecystectomy using a homemade glove port by one of five operators with different levels of experiences of laparoscopic surgery. The learning curve for each operator was fitted using the non-linear ordinary least squares method based on a non-linear regression model. Mean operating time was 77.6 ± 28.5 min. Fourteen patients (6.0%) were converted to conventional laparoscopic cholecystectomy. Complications occurred in 15 patients (10.0%), as follows: bile duct injury (n = 2), surgical site infection (n = 8), seroma (n = 2), and wound pain (n = 3). One operator achieved a learning curve plateau at 61.4 min per procedure after 8.5 cases and his time improved by 95.3 min as compared with initial operation time. Younger surgeons showed significant decreases in mean operation time and achieved stable mean operation times. In particular, younger surgeons showed significant decreases in operation times after 20 cases. Experienced laparoscopic surgeons can safely perform single port laparoscopic cholecystectomy using conventional or angled laparoscopic instruments. The present study shows that an operator can overcome the single port laparoscopic cholecystectomy learning curve in about eight cases.
Statistical downscaling of precipitation using long short-term memory recurrent neural networks
NASA Astrophysics Data System (ADS)
Misra, Saptarshi; Sarkar, Sudeshna; Mitra, Pabitra
2017-11-01
Hydrological impacts of global climate change on regional scale are generally assessed by downscaling large-scale climatic variables, simulated by General Circulation Models (GCMs), to regional, small-scale hydrometeorological variables like precipitation, temperature, etc. In this study, we propose a new statistical downscaling model based on Recurrent Neural Network with Long Short-Term Memory which captures the spatio-temporal dependencies in local rainfall. The previous studies have used several other methods such as linear regression, quantile regression, kernel regression, beta regression, and artificial neural networks. Deep neural networks and recurrent neural networks have been shown to be highly promising in modeling complex and highly non-linear relationships between input and output variables in different domains and hence we investigated their performance in the task of statistical downscaling. We have tested this model on two datasets—one on precipitation in Mahanadi basin in India and the second on precipitation in Campbell River basin in Canada. Our autoencoder coupled long short-term memory recurrent neural network model performs the best compared to other existing methods on both the datasets with respect to temporal cross-correlation, mean squared error, and capturing the extremes.
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS data center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. Variation explained for each regression tree by taxa were: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxa and assess validity of resulting predictions from regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data and graphical results indicated models were well fit to the data.
Lu, Zhao; Sun, Jing; Butts, Kenneth
2014-05-01
Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
Bioinactivation: Software for modelling dynamic microbial inactivation.
Garre, Alberto; Fernández, Pablo S; Lindqvist, Roland; Egea, Jose A
2017-03-01
This contribution presents the bioinactivation software, which implements functions for the modelling of isothermal and non-isothermal microbial inactivation. This software offers features such as user-friendliness, modelling of dynamic conditions, possibility to choose the fitting algorithm and generation of prediction intervals. The software is offered in two different formats: Bioinactivation core and Bioinactivation SE. Bioinactivation core is a package for the R programming language, which includes features for the generation of predictions and for the fitting of models to inactivation experiments using non-linear regression or a Markov Chain Monte Carlo algorithm (MCMC). The calculations are based on inactivation models common in academia and industry (Bigelow, Peleg, Mafart and Geeraerd). Bioinactivation SE supplies a user-friendly interface to selected functions of Bioinactivation core, namely the model fitting of non-isothermal experiments and the generation of prediction intervals. The capabilities of bioinactivation are presented in this paper through a case study, modelling the non-isothermal inactivation of Bacillus sporothermodurans. This study has provided a full characterization of the response of the bacteria to dynamic temperature conditions, including confidence intervals for the model parameters and a prediction interval of the survivor curve. We conclude that the MCMC algorithm produces a better characterization of the biological uncertainty and variability than non-linear regression. The bioinactivation software can be relevant to the food and pharmaceutical industry, as well as to regulatory agencies, as part of a (quantitative) microbial risk assessment. Copyright © 2017 Elsevier Ltd. All rights reserved.
Lee, Cameron C; Sheridan, Scott C
2018-07-01
Temperature-mortality relationships are nonlinear, time-lagged, and can vary depending on the time of year and geographic location, all of which limits the applicability of simple regression models in describing these associations. This research demonstrates the utility of an alternative method for modeling such complex relationships that has gained recent traction in other environmental fields: nonlinear autoregressive models with exogenous input (NARX models). All-cause mortality data and multiple temperature-based data sets were gathered from 41 different US cities, for the period 1975-2010, and subjected to ensemble NARX modeling. Models generally performed better in larger cities and during the winter season. Across the US, median absolute percentage errors were 10% (ranging from 4% to 15% in various cities), the average improvement in the r-squared over that of a simple persistence model was 17% (6-24%), and the hit rate for modeling spike days in mortality (>80th percentile) was 54% (34-71%). Mortality responded acutely to hot summer days, peaking at 0-2 days of lag before dropping precipitously, and there was an extended mortality response to cold winter days, peaking at 2-4 days of lag and dropping slowly and continuing for multiple weeks. Spring and autumn showed both of the aforementioned temperature-mortality relationships, but generally to a lesser magnitude than what was seen in summer or winter. When compared to distributed lag nonlinear models, NARX model output was nearly identical. These results highlight the applicability of NARX models for use in modeling complex and time-dependent relationships for various applications in epidemiology and environmental sciences. Copyright © 2018 Elsevier Inc. All rights reserved.
Nonlinear-regression flow model of the Gulf Coast aquifer systems in the south-central United States
Kuiper, L.K.
1994-01-01
A multiple-regression methodology was used to help answer questions concerning model reliability, and to calibrate a time-dependent variable-density ground-water flow model of the gulf coast aquifer systems in the south-central United States. More than 40 regression models with 2 to 31 regressions parameters are used and detailed results are presented for 12 of the models. More than 3,000 values for grid-element volume-averaged head and hydraulic conductivity are used for the regression model observations. Calculated prediction interval half widths, though perhaps inaccurate due to a lack of normality of the residuals, are the smallest for models with only four regression parameters. In addition, the root-mean weighted residual decreases very little with an increase in the number of regression parameters. The various models showed considerable overlap between the prediction inter- vals for shallow head and hydraulic conductivity. Approximate 95-percent prediction interval half widths for volume-averaged freshwater head exceed 108 feet; for volume-averaged base 10 logarithm hydraulic conductivity, they exceed 0.89. All of the models are unreliable for the prediction of head and ground-water flow in the deeper parts of the aquifer systems, including the amount of flow coming from the underlying geopressured zone. Truncating the domain of solution of one model to exclude that part of the system having a ground-water density greater than 1.005 grams per cubic centimeter or to exclude that part of the systems below a depth of 3,000 feet, and setting the density to that of freshwater does not appreciably change the results for head and ground-water flow, except for locations close to the truncation surface.
A hysteretic model considering Stribeck effect for small-scale magnetorheological damper
NASA Astrophysics Data System (ADS)
Zhao, Yu-Liang; Xu, Zhao-Dong
2018-06-01
Magnetorheological (MR) damper is an ideal semi-active control device for vibration suppression. The mechanical properties of this type of devices show strong nonlinear characteristics, especially the performance of the small-scale dampers. Therefore, developing an ideal model that can accurately describe the nonlinearity of such device is crucial to control design. In this paper, the dynamic characteristics of a small-scale MR damper developed by our research group is tested, and the Stribeck effect is observed in the low velocity region. Then, an improved model based on sigmoid model is proposed to describe this Stribeck effect observed in the experiment. After that, the parameters of this model are identified by genetic algorithms, and the mathematical relationship between these parameters and the input current, excitation frequency and amplitude is regressed. Finally, the predicted forces of the proposed model are validated with the experimental data. The results show that this model can well predict the mechanical properties of the small-scale damper, especially the Stribeck effect in the low velocity region.
J. Michael Bowker; D. Murphy; H. Ken Cordell; Donald B.K. English; J.C. Bergstrom; C.M. Starbuck; C.J. Betz; G.T. Green
2006-01-01
This paper explores the influence of demographic and spatial variables on individual participation and consumption of wildland area recreation. Data from the National Survey on Recreation and the Environment are combined with geographical information systembased distance measures to develop nonlinear regression models used to predict both participation and the number...
Using twig diameters to estimate browse utilization on three shrub species in southeastern Montana
Mark A. Rumble
1987-01-01
Browse utilization estimates based on twig length and twig weight were compared for skunkbush sumac, wax currant, and chokecherry. Linear regression analysis was valid for twig length data; twig weight equations are nonlinear. Estimates of twig weight are more accurate. Problems encountered during development of a utilization model are discussed.
Long, Yi; Du, Zhi-Jiang; Chen, Chao-Feng; Dong, Wei; Wang, Wei-Dong
2017-07-01
The most important step for lower extremity exoskeleton is to infer human motion intent (HMI), which contributes to achieve human exoskeleton collaboration. Since the user is in the control loop, the relationship between human robot interaction (HRI) information and HMI is nonlinear and complicated, which is difficult to be modeled by using mathematical approaches. The nonlinear approximation can be learned by using machine learning approaches. Gaussian Process (GP) regression is suitable for high-dimensional and small-sample nonlinear regression problems. GP regression is restrictive for large data sets due to its computation complexity. In this paper, an online sparse GP algorithm is constructed to learn the HMI. The original training dataset is collected when the user wears the exoskeleton system with friction compensation to perform unconstrained movement as far as possible. The dataset has two kinds of data, i.e., (1) physical HRI, which is collected by torque sensors placed at the interaction cuffs for the active joints, i.e., knee joints; (2) joint angular position, which is measured by optical position sensors. To reduce the computation complexity of GP, grey relational analysis (GRA) is utilized to specify the original dataset and provide the final training dataset. Those hyper-parameters are optimized offline by maximizing marginal likelihood and will be applied into online GP regression algorithm. The HMI, i.e., angular position of human joints, will be regarded as the reference trajectory for the mechanical legs. To verify the effectiveness of the proposed algorithm, experiments are performed on a subject at a natural speed. The experimental results show the HMI can be obtained in real time, which can be extended and employed in the similar exoskeleton systems.
Maloney, Kelly O.; Schmid, Matthias; Weller, Donald E.
2012-01-01
Issues with ecological data (e.g. non-normality of errors, nonlinear relationships and autocorrelation of variables) and modelling (e.g. overfitting, variable selection and prediction) complicate regression analyses in ecology. Flexible models, such as generalized additive models (GAMs), can address data issues, and machine learning techniques (e.g. gradient boosting) can help resolve modelling issues. Gradient boosted GAMs do both. Here, we illustrate the advantages of this technique using data on benthic macroinvertebrates and fish from 1573 small streams in Maryland, USA.
Mendez, Javier; Monleon-Getino, Antonio; Jofre, Juan; Lucena, Francisco
2017-10-01
The present study aimed to establish the kinetics of the appearance of coliphage plaques using the double agar layer titration technique to evaluate the feasibility of using traditional coliphage plaque forming unit (PFU) enumeration as a rapid quantification method. Repeated measurements of the appearance of plaques of coliphages titrated according to ISO 10705-2 at different times were analysed using non-linear mixed-effects regression to determine the most suitable model of their appearance kinetics. Although this model is adequate, to simplify its applicability two linear models were developed to predict the numbers of coliphages reliably, using the PFU counts as determined by the ISO after only 3 hours of incubation. One linear model, when the number of plaques detected was between 4 and 26 PFU after 3 hours, had a linear fit of: (1.48 × Counts 3 h + 1.97); and the other, values >26 PFU, had a fit of (1.18 × Counts 3 h + 2.95). If the number of plaques detected was <4 PFU after 3 hours, we recommend incubation for (18 ± 3) hours. The study indicates that the traditional coliphage plating technique has a reasonable potential to provide results in a single working day without the need to invest in additional laboratory equipment.
Analysis of flight data from a High-Incidence Research Model by system identification methods
NASA Technical Reports Server (NTRS)
Batterson, James G.; Klein, Vladislav
1989-01-01
Data partitioning and modified stepwise regression were applied to recorded flight data from a Royal Aerospace Establishment high incidence research model. An aerodynamic model structure and corresponding stability and control derivatives were determined for angles of attack between 18 and 30 deg. Several nonlinearities in angles of attack and sideslip as well as a unique roll-dominated set of lateral modes were found. All flight estimated values were compared to available wind tunnel measurements.
Penalized gaussian process regression and classification for high-dimensional nonlinear data.
Yi, G; Shi, J Q; Choi, T
2011-12-01
The model based on Gaussian process (GP) prior and a kernel covariance function can be used to fit nonlinear data with multidimensional covariates. It has been used as a flexible nonparametric approach for curve fitting, classification, clustering, and other statistical problems, and has been widely applied to deal with complex nonlinear systems in many different areas particularly in machine learning. However, it is a challenging problem when the model is used for the large-scale data sets and high-dimensional data, for example, for the meat data discussed in this article that have 100 highly correlated covariates. For such data, it suffers from large variance of parameter estimation and high predictive errors, and numerically, it suffers from unstable computation. In this article, penalized likelihood framework will be applied to the model based on GPs. Different penalties will be investigated, and their ability in application given to suit the characteristics of GP models will be discussed. The asymptotic properties will also be discussed with the relevant proofs. Several applications to real biomechanical and bioinformatics data sets will be reported. © 2011, The International Biometric Society No claim to original US government works.
A Robust Bayesian Random Effects Model for Nonlinear Calibration Problems
Fong, Y.; Wakefield, J.; De Rosa, S.; Frahm, N.
2013-01-01
Summary In the context of a bioassay or an immunoassay, calibration means fitting a curve, usually nonlinear, through the observations collected on a set of samples containing known concentrations of a target substance, and then using the fitted curve and observations collected on samples of interest to predict the concentrations of the target substance in these samples. Recent technological advances have greatly improved our ability to quantify minute amounts of substance from a tiny volume of biological sample. This has in turn led to a need to improve statistical methods for calibration. In this paper, we focus on developing calibration methods robust to dependent outliers. We introduce a novel normal mixture model with dependent error terms to model the experimental noise. In addition, we propose a re-parameterization of the five parameter logistic nonlinear regression model that allows us to better incorporate prior information. We examine the performance of our methods with simulation studies and show that they lead to a substantial increase in performance measured in terms of mean squared error of estimation and a measure of the average prediction accuracy. A real data example from the HIV Vaccine Trials Network Laboratory is used to illustrate the methods. PMID:22551415
NASA Astrophysics Data System (ADS)
Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.
2018-07-01
In environmental epidemiology studies, health response data (e.g. hospitalization or mortality) are often noisy because of hospital organization and other social factors. The noise in the data can hide the true signal related to the exposure. The signal can be unveiled by performing a temporal aggregation on health data and then using it as the response in regression analysis. From aggregated series, a general methodology is introduced to account for the particularities of an aggregated response in a regression setting. This methodology can be used with usually applied regression models in weather-related health studies, such as generalized additive models (GAM) and distributed lag nonlinear models (DLNM). In particular, the residuals are modelled using an autoregressive-moving average (ARMA) model to account for the temporal dependence. The proposed methodology is illustrated by modelling the influence of temperature on cardiovascular mortality in Canada. A comparison with classical DLNMs is provided and several aggregation methods are compared. Results show that there is an increase in the fit quality when the response is aggregated, and that the estimated relationship focuses more on the outcome over several days than the classical DLNM. More precisely, among various investigated aggregation schemes, it was found that an aggregation with an asymmetric Epanechnikov kernel is more suited for studying the temperature-mortality relationship.
Yoneoka, Daisuke; Henmi, Masayuki
2017-11-30
Recently, the number of clinical prediction models sharing the same regression task has increased in the medical literature. However, evidence synthesis methodologies that use the results of these regression models have not been sufficiently studied, particularly in meta-analysis settings where only regression coefficients are available. One of the difficulties lies in the differences between the categorization schemes of continuous covariates across different studies. In general, categorization methods using cutoff values are study specific across available models, even if they focus on the same covariates of interest. Differences in the categorization of covariates could lead to serious bias in the estimated regression coefficients and thus in subsequent syntheses. To tackle this issue, we developed synthesis methods for linear regression models with different categorization schemes of covariates. A 2-step approach to aggregate the regression coefficient estimates is proposed. The first step is to estimate the joint distribution of covariates by introducing a latent sampling distribution, which uses one set of individual participant data to estimate the marginal distribution of covariates with categorization. The second step is to use a nonlinear mixed-effects model with correction terms for the bias due to categorization to estimate the overall regression coefficients. Especially in terms of precision, numerical simulations show that our approach outperforms conventional methods, which only use studies with common covariates or ignore the differences between categorization schemes. The method developed in this study is also applied to a series of WHO epidemiologic studies on white blood cell counts. Copyright © 2017 John Wiley & Sons, Ltd.
A quadratic regression modelling on paddy production in the area of Perlis
NASA Astrophysics Data System (ADS)
Goh, Aizat Hanis Annas; Ali, Zalila; Nor, Norlida Mohd; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2017-08-01
Polynomial regression models are useful in situations in which the relationship between a response variable and predictor variables is curvilinear. Polynomial regression fits the nonlinear relationship into a least squares linear regression model by decomposing the predictor variables into a kth order polynomial. The polynomial order determines the number of inflexions on the curvilinear fitted line. A second order polynomial forms a quadratic expression (parabolic curve) with either a single maximum or minimum, a third order polynomial forms a cubic expression with both a relative maximum and a minimum. This study used paddy data in the area of Perlis to model paddy production based on paddy cultivation characteristics and environmental characteristics. The results indicated that a quadratic regression model best fits the data and paddy production is affected by urea fertilizer application and the interaction between amount of average rainfall and percentage of area defected by pest and disease. Urea fertilizer application has a quadratic effect in the model which indicated that if the number of days of urea fertilizer application increased, paddy production is expected to decrease until it achieved a minimum value and paddy production is expected to increase at higher number of days of urea application. The decrease in paddy production with an increased in rainfall is greater, the higher the percentage of area defected by pest and disease.
A new linear least squares method for T1 estimation from SPGR signals with multiple TRs
NASA Astrophysics Data System (ADS)
Chang, Lin-Ching; Koay, Cheng Guan; Basser, Peter J.; Pierpaoli, Carlo
2009-02-01
The longitudinal relaxation time, T1, can be estimated from two or more spoiled gradient recalled echo x (SPGR) images with two or more flip angles and one or more repetition times (TRs). The function relating signal intensity and the parameters are nonlinear; T1 maps can be computed from SPGR signals using nonlinear least squares regression. A widely-used linear method transforms the nonlinear model by assuming a fixed TR in SPGR images. This constraint is not desirable since multiple TRs are a clinically practical way to reduce the total acquisition time, to satisfy the required resolution, and/or to combine SPGR data acquired at different times. A new linear least squares method is proposed using the first order Taylor expansion. Monte Carlo simulations of SPGR experiments are used to evaluate the accuracy and precision of the estimated T1 from the proposed linear and the nonlinear methods. We show that the new linear least squares method provides T1 estimates comparable in both precision and accuracy to those from the nonlinear method, allowing multiple TRs and reducing computation time significantly.
Hartzell, S.; Leeds, A.; Frankel, A.; Williams, R.A.; Odum, J.; Stephenson, W.; Silva, W.
2002-01-01
The Seattle fault poses a significant seismic hazard to the city of Seattle, Washington. A hybrid, low-frequency, high-frequency method is used to calculate broadband (0-20 Hz) ground-motion time histories for a M 6.5 earthquake on the Seattle fault. Low frequencies (1 Hz) are calculated by a stochastic method that uses a fractal subevent size distribution to give an ω-2 displacement spectrum. Time histories are calculated for a grid of stations and then corrected for the local site response using a classification scheme based on the surficial geology. Average shear-wave velocity profiles are developed for six surficial geologic units: artificial fill, modified land, Esperance sand, Lawton clay, till, and Tertiary sandstone. These profiles together with other soil parameters are used to compare linear, equivalent-linear, and nonlinear predictions of ground motion in the frequency band 0-15 Hz. Linear site-response corrections are found to yield unreasonably large ground motions. Equivalent-linear and nonlinear calculations give peak values similar to the 1994 Northridge, California, earthquake and those predicted by regression relationships. Ground-motion variance is estimated for (1) randomization of the velocity profiles, (2) variation in source parameters, and (3) choice of nonlinear model. Within the limits of the models tested, the results are found to be most sensitive to the nonlinear model and soil parameters, notably the over consolidation ratio.
Concentration data and dimensionality in groundwater models: evaluation using inverse modelling
Barlebo, H.C.; Hill, M.C.; Rosbjerg, D.; Jensen, K.H.
1998-01-01
A three-dimensional inverse groundwater flow and transport model that fits hydraulic-head and concentration data simultaneously using nonlinear regression is presented and applied to a layered sand and silt groundwater system beneath the Grindsted Landfill in Denmark. The aquifer is composed of rather homogeneous hydrogeologic layers. Two issues common to groundwater flow and transport modelling are investigated: 1) The accuracy of simulated concentrations in the case of calibration with head data alone; and 2) The advantages and disadvantages of using a two-dimensional cross-sectional model instead of a three-dimensional model to simulate contaminant transport when the source is at the land surface. Results show that using only hydraulic heads in the nonlinear regression produces a simulated plume that is profoundly different from what is obtained in a calibration using both hydraulic-head and concentration data. The present study provides a well-documented example of the differences that can occur. Representing the system as a two-dimensional cross-section obviously omits some of the system dynamics. It was, however, possible to obtain a simulated plume cross-section that matched the actual plume cross-section well. The two-dimensional model execution times were about a seventh of those for the three-dimensional model, but some difficulties were encountered in representing the spatially variable source concentrations and less precise simulated concentrations were calculated by the two-dimensional model compared to the three-dimensional model. Summed up, the present study indicates that three dimensional modelling using both hydraulic heads and concentrations in the calibration should be preferred in the considered type of transport studies.
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions
Fernandes, Bruno J. T.; Roque, Alexandre
2018-01-01
Height and weight are measurements explored to tracking nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, it can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations which coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than that obtained with conventional linear regressions. In all the cases, the predictions are non-sensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
Product unit neural network models for predicting the growth limits of Listeria monocytogenes.
Valero, A; Hervás, C; García-Gimeno, R M; Zurera, G
2007-08-01
A new approach to predict the growth/no growth interface of Listeria monocytogenes as a function of storage temperature, pH, citric acid (CA) and ascorbic acid (AA) is presented. A linear logistic regression procedure was performed and a non-linear model was obtained by adding new variables by means of a Neural Network model based on Product Units (PUNN). The classification efficiency of the training data set and the generalization data of the new Logistic Regression PUNN model (LRPU) were compared with Linear Logistic Regression (LLR) and Polynomial Logistic Regression (PLR) models. 92% of the total cases from the LRPU model were correctly classified, an improvement on the percentage obtained using the PLR model (90%) and significantly higher than the results obtained with the LLR model, 80%. On the other hand predictions of LRPU were closer to data observed which permits to design proper formulations in minimally processed foods. This novel methodology can be applied to predictive microbiology for describing growth/no growth interface of food-borne microorganisms such as L. monocytogenes. The optimal balance is trying to find models with an acceptable interpretation capacity and with good ability to fit the data on the boundaries of variable range. The results obtained conclude that these kinds of models might well be very a valuable tool for mathematical modeling.
Bornstein, Marc H.; Putnick, Diane L.
2018-01-01
We studied multiple parenting cognitions and practices in European American mothers (N = 262) who ranged in age from 15 to 47 years. All were first-time parents of 20-month-old children. Some age effects were zero; others were linear or nonlinear. Nonlinear age effects determined by spline regression showed significant associations to a “knot” age (~30 years) with little or no association afterward. For parenting cognitions and practices that are age-sensitive, a two-phase model of parental development is proposed. These findings stress the importance of considering maternal chronological age as a factor in developmental study. PMID:17605519
Møller, Anne; Reventlow, Susanne; Hansen, Åse Marie; Andersen, Lars L; Siersma, Volkert; Lund, Rikke; Avlund, Kirsten; Andersen, Johan Hviid; Mortensen, Ole Steen
2015-01-01
Objectives Our aim was to study associations between physical exposures throughout working life and physical function measured as chair-rise performance in midlife. Methods The Copenhagen Aging and Midlife Biobank (CAMB) provided data about employment and measures of physical function. Individual job histories were assigned exposures from a job exposure matrix. Exposures were standardised to ton-years (lifting 1000 kg each day in 1 year), stand-years (standing/walking for 6 h each day in 1 year) and kneel-years (kneeling for 1 h each day in 1 year). The associations between exposure-years and chair-rise performance (number of chair-rises in 30 s) were analysed in multivariate linear and non-linear regression models adjusted for covariates. Results Mean age among the 5095 participants was 59 years in both genders, and, on average, men achieved 21.58 (SD=5.60) and women 20.38 (SD=5.33) chair-rises in 30 s. Physical exposures were associated with poorer chair-rise performance in both men and women, however, only associations between lifting and standing/walking and chair-rise remained statistically significant among men in the final model. Spline regression analyses showed non-linear associations and confirmed the findings. Conclusions Higher physical exposure throughout working life is associated with slightly poorer chair-rise performance. The associations between exposure and outcome were non-linear. PMID:26537502
Estimating monotonic rates from biological data using local linear regression.
Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R
2017-03-01
Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
Gaussian Process Regression for Uncertainty Estimation on Ecosystem Data
NASA Astrophysics Data System (ADS)
Menzer, O.; Moffat, A.; Lasslop, G.; Reichstein, M.
2011-12-01
The flow of carbon between terrestrial ecosystems and the atmosphere is mainly driven by nonlinear, complex and time-lagged processes. Understanding the associated ecosystem responses and climatic feedbacks is a key challenge regarding climate change questions such as increasing atmospheric CO2 levels. Usually, the underlying relationships are implemented in models as prescribed functions which interlink numerous meteorological, radiative and gas exchange variables. In contrast, supervised Machine Learning algorithms, such as Artificial Neural Networks or Gaussian Processes, allow for an insight into the relationships directly from a data perspective. Micrometeorological, high resolution measurements at flux towers of the FLUXNET observational network are an essential tool for obtaining quantifications of the ecosystem variables, as they continuously record e.g. CO2 exchange, solar radiation and air temperature. In order to facilitate the investigation of the interactions and feedbacks between these variables, several challenging data properties need to be taken into account: noisy, multidimensional and incomplete (Moffat, Accepted). The task of estimating uncertainties in such micrometeorological measurements can be addressed by Gaussian Processes (GPs), a modern nonparametric method for nonlinear regression. The GP approach has recently been shown to be a powerful modeling tool, regardless of the input dimensionality, the degree of nonlinearity and the noise level (Rasmussen and Williams, 2006). Heteroscedastic Gaussian Processes (HGPs) are a specialized GP method for data with a varying, inhomogeneous noise variance (Goldberg et al., 1998; Kersting et al., 2007), as usually observed in CO2 flux measurements (Richardson et al., 2006). Here, we showed by an evaluation of the HGP performance in several artificial experiments and a comparison to existing nonlinear regression methods, that their outstanding ability is to capture measurement noise levels, concurrently providing reasonable data fits under relatively few assumptions. On the basis of incomplete, half-hourly measured ecosystem data, a HGP was trained to model NEP (Net Ecosystem Production), only with the drivers PPFD (Photosynthetic Photon Flux Density) and Air Temperature. Time information was added to account for the autocorrelation in the flux measurements. Provided with a gap-filled, meteorological time series, NEP and the corresponding random error estimates can then be predicted empirically at high temporal resolution. We report uncertainties in annual sums of CO2 exchange at two flux tower sites in Hainich, Germany and Hesse, France. Similar noise patterns, but different magnitudes between sites were detected, with annual random error estimates of +/- 14.1 gCm^-2yr^-1 and +/- 23.5 gCm^-2yr^-1, respectively, for the year 2001. Existing models calculate uncertainties by evaluating the standard deviation of the model residuals. A comparison to the methods of Reichstein et al. (2005) and Lasslop et al. (2008) showed confidence both in the predictive uncertainties and the annual sums modeled with the HGP approach.
Ahn, Jae Joon; Kim, Young Min; Yoo, Keunje; Park, Joonhong; Oh, Kyong Joo
2012-11-01
For groundwater conservation and management, it is important to accurately assess groundwater pollution vulnerability. This study proposed an integrated model using ridge regression and a genetic algorithm (GA) to effectively select the major hydro-geological parameters influencing groundwater pollution vulnerability in an aquifer. The GA-Ridge regression method determined that depth to water, net recharge, topography, and the impact of vadose zone media were the hydro-geological parameters that influenced trichloroethene pollution vulnerability in a Korean aquifer. When using these selected hydro-geological parameters, the accuracy was improved for various statistical nonlinear and artificial intelligence (AI) techniques, such as multinomial logistic regression, decision trees, artificial neural networks, and case-based reasoning. These results provide a proof of concept that the GA-Ridge regression is effective at determining influential hydro-geological parameters for the pollution vulnerability of an aquifer, and in turn, improves the AI performance in assessing groundwater pollution vulnerability.
Dose-response-a challenge for allelopathy?
Belz, Regina G; Hurle, Karl; Duke, Stephen O
2005-04-01
The response of an organism to a chemical depends, among other things, on the dose. Nonlinear dose-response relationships occur across a broad range of research fields, and are a well established tool to describe the basic mechanisms of phytotoxicity. The responses of plants to allelochemicals as biosynthesized phytotoxins, relate as well to nonlinearity and, thus, allelopathic effects can be adequately quantified by nonlinear mathematical modeling. The current paper applies the concept of nonlinearity to assorted aspects of allelopathy within several bioassays and reveals their analysis by nonlinear regression models. Procedures for a valid comparison of effective doses between different allelopathic interactions are presented for both, inhibitory and stimulatory effects. The dose-response applications measure and compare the responses produced by pure allelochemicals [scopoletin (7-hydroxy-6-methoxy-2H-1-benzopyran-2-one); DIBOA (2,4-dihydroxy-2H-1,4-benzoxaxin-3(4H)-one); BOA (benzoxazolin-2(3H)-one); MBOA (6-methoxy-benzoxazolin-2(3H)-one)], involved in allelopathy of grain crops, to demonstrate how some general principles of dose responses also relate to allelopathy. Hereupon, dose-response applications with living donor plants demonstrate the validity of these principles for density-dependent phytotoxicity of allelochemicals produced and released by living plants (Avena sativa L., Secale cereale L., Triticum L. spp.), and reveal the use of such experiments for initial considerations about basic principles of allelopathy. Results confirm that nonlinearity applies to allelopathy, and the study of allelopathic effects in dose-response experiments allows for new and challenging insights into allelopathic interactions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari
2009-11-15
Prediction of the amount of hospital waste production will be helpful in the storage, transportation and disposal of hospital waste management. Based on this fact, two predictor models including artificial neural networks (ANNs) and multiple linear regression (MLR) were applied to predict the rate of medical waste generation totally and in different types of sharp, infectious and general. In this study, a 5-fold cross-validation procedure on a database containing total of 50 hospitals of Fars province (Iran) were used to verify the performance of the models. Three performance measures including MAR, RMSE and R{sup 2} were used to evaluate performancemore » of models. The MLR as a conventional model obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as more significant parameters. On the other hand, ANNs as a more powerful model, which has not been introduced in predicting rate of medical waste generation, showed high performance measure values, especially 0.99 value of R{sup 2} confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.« less
Predicting Madura cattle growth curve using non-linear model
NASA Astrophysics Data System (ADS)
Widyas, N.; Prastowo, S.; Widi, T. S. M.; Baliarti, E.
2018-03-01
Madura cattle is Indonesian native. It is a composite breed that has undergone hundreds of years of selection and domestication to reach nowadays remarkable uniformity. Crossbreeding has reached the isle of Madura and the Madrasin, a cross between Madura cows and Limousine semen emerged. This paper aimed to compare the growth curve between Madrasin and one type of pure Madura cows, the common Madura cattle (Madura) using non-linear models. Madura cattles are kept traditionally thus reliable records are hardly available. Data were collected from small holder farmers in Madura. Cows from different age classes (<6 months, 6-12 months, 1-2years, 2-3years, 3-5years and >5years) were observed, and body measurements (chest girth, body length and wither height) were taken. In total 63 Madura and 120 Madrasin records obtained. Linear model was built with cattle sub-populations and age as explanatory variables. Body weights were estimated based on the chest girth. Growth curves were built using logistic regression. Results showed that within the same age, Madrasin has significantly larger body compared to Madura (p<0.05). The logistic models fit better for Madura and Madrasin cattle data; with the estimated MSE for these models were 39.09 and 759.28 with prediction accuracy of 99 and 92% for Madura and Madrasin, respectively. Prediction of growth curve using logistic regression model performed well in both types of Madura cattle. However, attempts to administer accurate data on Madura cattle are necessary to better characterize and study these cattle.
A simplified competition data analysis for radioligand specific activity determination.
Venturino, A; Rivera, E S; Bergoc, R M; Caro, R A
1990-01-01
Non-linear regression and two-step linear fit methods were developed to determine the actual specific activity of 125I-ovine prolactin by radioreceptor self-displacement analysis. The experimental results obtained by the different methods are superposable. The non-linear regression method is considered to be the most adequate procedure to calculate the specific activity, but if its software is not available, the other described methods are also suitable.
Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data
Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian
2015-01-01
In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low sample size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of the RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models and utilize boosting-type method to handle the high dimensionality as well as model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213
MEASUREMENT OF WIND SPEED FROM COOLING LAKE THERMAL IMAGERY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garrett, A; Robert Kurzeja, R; Eliel Villa-Aleman, E
2009-01-20
The Savannah River National Laboratory (SRNL) collected thermal imagery and ground truth data at two commercial power plant cooling lakes to investigate the applicability of laboratory empirical correlations between surface heat flux and wind speed, and statistics derived from thermal imagery. SRNL demonstrated in a previous paper [1] that a linear relationship exists between the standard deviation of image temperature and surface heat flux. In this paper, SRNL will show that the skewness of the temperature distribution derived from cooling lake thermal images correlates with instantaneous wind speed measured at the same location. SRNL collected thermal imagery, surface meteorology andmore » water temperatures from helicopters and boats at the Comanche Peak and H. B. Robinson nuclear power plant cooling lakes. SRNL found that decreasing skewness correlated with increasing wind speed, as was the case for the laboratory experiments. Simple linear and orthogonal regression models both explained about 50% of the variance in the skewness - wind speed plots. A nonlinear (logistic) regression model produced a better fit to the data, apparently because the thermal convection and resulting skewness are related to wind speed in a highly nonlinear way in nearly calm and in windy conditions.« less
Time series trends of the safety effects of pavement resurfacing.
Park, Juneyoung; Abdel-Aty, Mohamed; Wang, Jung-Han
2017-04-01
This study evaluated the safety performance of pavement resurfacing projects on urban arterials in Florida using the observational before and after approaches. The safety effects of pavement resurfacing were quantified in the crash modification factors (CMFs) and estimated based on different ranges of heavy vehicle traffic volume and time changes for different severity levels. In order to evaluate the variation of CMFs over time, crash modification functions (CMFunctions) were developed using nonlinear regression and time series models. The results showed that pavement resurfacing projects decrease crash frequency and are found to be more safety effective to reduce severe crashes in general. Moreover, the results of the general relationship between the safety effects and time changes indicated that the CMFs increase over time after the resurfacing treatment. It was also found that pavement resurfacing projects for the urban roadways with higher heavy vehicle volume rate are more safety effective than the roadways with lower heavy vehicle volume rate. Based on the exploration and comparison of the developed CMFucntions, the seasonal autoregressive integrated moving average (SARIMA) and exponential functional form of the nonlinear regression models can be utilized to identify the trend of CMFs over time. Copyright © 2017 Elsevier Ltd. All rights reserved.
Prediction of sweetness and amino acid content in soybean crops from hyperspectral imagery
NASA Astrophysics Data System (ADS)
Monteiro, Sildomar Takahashi; Minekawa, Yohei; Kosugi, Yukio; Akazawa, Tsuneya; Oda, Kunio
Hyperspectral image data provides a powerful tool for non-destructive crop analysis. This paper investigates a hyperspectral image data-processing method to predict the sweetness and amino acid content of soybean crops. Regression models based on artificial neural networks were developed in order to calculate the level of sucrose, glucose, fructose, and nitrogen concentrations, which can be related to the sweetness and amino acid content of vegetables. A performance analysis was conducted comparing regression models obtained using different preprocessing methods, namely, raw reflectance, second derivative, and principal components analysis. This method is demonstrated using high-resolution hyperspectral data of wavelengths ranging from the visible to the near infrared acquired from an experimental field of green vegetable soybeans. The best predictions were achieved using a nonlinear regression model of the second derivative transformed dataset. Glucose could be predicted with greater accuracy, followed by sucrose, fructose and nitrogen. The proposed method provides the possibility to provide relatively accurate maps predicting the chemical content of soybean crop fields.
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Chen, Pengfei; Jing, Qi
2017-02-01
An assumption that the non-linear method is more reasonable than the linear method when canopy reflectance is used to establish the yield prediction model was proposed and tested in this study. For this purpose, partial least squares regression (PLSR) and artificial neural networks (ANN), represented linear and non-linear analysis method, were applied and compared for wheat yield prediction. Multi-period Landsat-8 OLI images were collected at two different wheat growth stages, and a field campaign was conducted to obtain grain yields at selected sampling sites in 2014. The field data were divided into a calibration database and a testing database. Using calibration data, a cross-validation concept was introduced for the PLSR and ANN model construction to prevent over-fitting. All models were tested using the test data. The ANN yield-prediction model produced R2, RMSE and RMSE% values of 0.61, 979 kg ha-1, and 10.38%, respectively, in the testing phase, performing better than the PLSR yield-prediction model, which produced R2, RMSE, and RMSE% values of 0.39, 1211 kg ha-1, and 12.84%, respectively. Non-linear method was suggested as a better method for yield prediction.
Regression of non-linear coupling of noise in LIGO detectors
NASA Astrophysics Data System (ADS)
Da Silva Costa, C. F.; Billman, C.; Effler, A.; Klimenko, S.; Cheng, H.-P.
2018-03-01
In 2015, after their upgrade, the advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) detectors started acquiring data. The effort to improve their sensitivity has never stopped since then. The goal to achieve design sensitivity is challenging. Environmental and instrumental noise couple to the detector output with different, linear and non-linear, coupling mechanisms. The noise regression method we use is based on the Wiener–Kolmogorov filter, which uses witness channels to make noise predictions. We present here how this method helped to determine complex non-linear noise couplings in the output mode cleaner and in the mirror suspension system of the LIGO detector.
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Hill, Mary Catherine
1992-01-01
This report documents a new version of the U.S. Geological Survey modular, three-dimensional, finite-difference, ground-water flow model (MODFLOW) which, with the new Parameter-Estimation Package that also is documented in this report, can be used to estimate parameters by nonlinear regression. The new version of MODFLOW is called MODFLOWP (pronounced MOD-FLOW*P), and functions nearly identically to MODFLOW when the ParameterEstimation Package is not used. Parameters are estimated by minimizing a weighted least-squares objective function by the modified Gauss-Newton method or by a conjugate-direction method. Parameters used to calculate the following MODFLOW model inputs can be estimated: Transmissivity and storage coefficient of confined layers; hydraulic conductivity and specific yield of unconfined layers; vertical leakance; vertical anisotropy (used to calculate vertical leakance); horizontal anisotropy; hydraulic conductance of the River, Streamflow-Routing, General-Head Boundary, and Drain Packages; areal recharge rates; maximum evapotranspiration; pumpage rates; and the hydraulic head at constant-head boundaries. Any spatial variation in parameters can be defined by the user. Data used to estimate parameters can include existing independent estimates of parameter values, observed hydraulic heads or temporal changes in hydraulic heads, and observed gains and losses along head-dependent boundaries (such as streams). Model output includes statistics for analyzing the parameter estimates and the model; these statistics can be used to quantify the reliability of the resulting model, to suggest changes in model construction, and to compare results of models constructed in different ways.
A New Approach for Mobile Advertising Click-Through Rate Estimation Based on Deep Belief Nets.
Chen, Jie-Hao; Zhao, Zi-Qian; Shi, Ji-Yun; Zhao, Chong
2017-01-01
In recent years, with the rapid development of mobile Internet and its business applications, mobile advertising Click-Through Rate (CTR) estimation has become a hot research direction in the field of computational advertising, which is used to achieve accurate advertisement delivery for the best benefits in the three-side game between media, advertisers, and audiences. Current research on the estimation of CTR mainly uses the methods and models of machine learning, such as linear model or recommendation algorithms. However, most of these methods are insufficient to extract the data features and cannot reflect the nonlinear relationship between different features. In order to solve these problems, we propose a new model based on Deep Belief Nets to predict the CTR of mobile advertising, which combines together the powerful data representation and feature extraction capability of Deep Belief Nets, with the advantage of simplicity of traditional Logistic Regression models. Based on the training dataset with the information of over 40 million mobile advertisements during a period of 10 days, our experiments show that our new model has better estimation accuracy than the classic Logistic Regression (LR) model by 5.57% and Support Vector Regression (SVR) model by 5.80%.
A New Approach for Mobile Advertising Click-Through Rate Estimation Based on Deep Belief Nets
Zhao, Zi-Qian; Shi, Ji-Yun; Zhao, Chong
2017-01-01
In recent years, with the rapid development of mobile Internet and its business applications, mobile advertising Click-Through Rate (CTR) estimation has become a hot research direction in the field of computational advertising, which is used to achieve accurate advertisement delivery for the best benefits in the three-side game between media, advertisers, and audiences. Current research on the estimation of CTR mainly uses the methods and models of machine learning, such as linear model or recommendation algorithms. However, most of these methods are insufficient to extract the data features and cannot reflect the nonlinear relationship between different features. In order to solve these problems, we propose a new model based on Deep Belief Nets to predict the CTR of mobile advertising, which combines together the powerful data representation and feature extraction capability of Deep Belief Nets, with the advantage of simplicity of traditional Logistic Regression models. Based on the training dataset with the information of over 40 million mobile advertisements during a period of 10 days, our experiments show that our new model has better estimation accuracy than the classic Logistic Regression (LR) model by 5.57% and Support Vector Regression (SVR) model by 5.80%. PMID:29209363
Predicting the occurrence of wildfires with binary structured additive regression models.
Ríos-Pena, Laura; Kneib, Thomas; Cadarso-Suárez, Carmen; Marey-Pérez, Manuel
2017-02-01
Wildfires are one of the main environmental problems facing societies today, and in the case of Galicia (north-west Spain), they are the main cause of forest destruction. This paper used binary structured additive regression (STAR) for modelling the occurrence of wildfires in Galicia. Binary STAR models are a recent contribution to the classical logistic regression and binary generalized additive models. Their main advantage lies in their flexibility for modelling non-linear effects, while simultaneously incorporating spatial and temporal variables directly, thereby making it possible to reveal possible relationships among the variables considered. The results showed that the occurrence of wildfires depends on many covariates which display variable behaviour across space and time, and which largely determine the likelihood of ignition of a fire. The joint possibility of working on spatial scales with a resolution of 1 × 1 km cells and mapping predictions in a colour range makes STAR models a useful tool for plotting and predicting wildfire occurrence. Lastly, it will facilitate the development of fire behaviour models, which can be invaluable when it comes to drawing up fire-prevention and firefighting plans. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bittencourt, Natalia F N; Ocarino, Juliana M; Mendonça, Luciana D M; Hewett, Timothy E; Fonseca, Sergio T
2012-12-01
Cross-sectional. To investigate predictors of increased frontal plane knee projection angle (FPKPA) in athletes. The underlying mechanisms that lead to increased FPKPA are likely multifactorial and depend on how the musculoskeletal system adapts to the possible interactions between its distal and proximal segments. Bivariate and linear analyses traditionally employed to analyze the occurrence of increased FPKPA are not sufficiently robust to capture complex relationships among predictors. The investigation of nonlinear interactions among biomechanical factors is necessary to further our understanding of the interdependence of lower-limb segments and resultant dynamic knee alignment. The FPKPA was assessed in 101 athletes during a single-leg squat and in 72 athletes at the moment of landing from a jump. The investigated predictors were sex, hip abductor isometric torque, passive range of motion (ROM) of hip internal rotation (IR), and shank-forefoot alignment. Classification and regression trees were used to investigate nonlinear interactions among predictors and their influence on the occurrence of increased FPKPA. During single-leg squatting, the occurrence of high FPKPA was predicted by the interaction between hip abductor isometric torque and passive hip IR ROM. At the moment of landing, the shank-forefoot alignment, abductor isometric torque, and passive hip IR ROM were predictors of high FPKPA. In addition, the classification and regression trees established cutoff points that could be used in clinical practice to identify athletes who are at potential risk for excessive FPKPA. The models captured nonlinear interactions between hip abductor isometric torque, passive hip IR ROM, and shank-forefoot alignment.
Røislien, Jo; Søvik, Signe; Eken, Torsten
2018-01-01
Trauma is a leading global cause of death, and predicting the burden of trauma admissions is vital for good planning of trauma care. Seasonality in trauma admissions has been found in several studies. Seasonal fluctuations in daylight hours, temperature and weather affect social and cultural practices but also individual neuroendocrine rhythms that may ultimately modify behaviour and potentially predispose to trauma. The aim of the present study was to explore to what extent the observed seasonality in daily trauma admissions could be explained by changes in daylight and weather variables throughout the year. Retrospective registry study on trauma admissions in the 10-year period 2001-2010 at Oslo University Hospital, Ullevål, Norway, where the amount of daylight varies from less than 6 hours to almost 19 hours per day throughout the year. Daily number of admissions was analysed by fitting non-linear Poisson time series regression models, simultaneously adjusting for several layers of temporal patterns, including a non-linear long-term trend and both seasonal and weekly cyclic effects. Five daylight and weather variables were explored, including hours of daylight and amount of precipitation. Models were compared using Akaike's Information Criterion (AIC). A regression model including daylight and weather variables significantly outperformed a traditional seasonality model in terms of AIC. A cyclic week effect was significant in all models. Daylight and weather variables are better predictors of seasonality in daily trauma admissions than mere information on day-of-year.
ERIC Educational Resources Information Center
Hovardas, Tasos
2016-01-01
Although ecological systems at varying scales involve non-linear interactions, learners insist thinking in a linear fashion when they deal with ecological phenomena. The overall objective of the present contribution was to propose a hypothetical learning progression for developing non-linear reasoning in prey-predator systems and to provide…
Estimating top-of-atmosphere thermal infrared radiance using MERRA-2 atmospheric data
NASA Astrophysics Data System (ADS)
Kleynhans, Tania; Montanaro, Matthew; Gerace, Aaron; Kanan, Christopher
2017-05-01
Thermal infrared satellite images have been widely used in environmental studies. However, satellites have limited temporal resolution, e.g., 16 day Landsat or 1 to 2 day Terra MODIS. This paper investigates the use of the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) reanalysis data product, produced by NASA's Global Modeling and Assimilation Office (GMAO) to predict global topof-atmosphere (TOA) thermal infrared radiance. The high temporal resolution of the MERRA-2 data product presents opportunities for novel research and applications. Various methods were applied to estimate TOA radiance from MERRA-2 variables namely (1) a parameterized physics based method, (2) Linear regression models and (3) non-linear Support Vector Regression. Model prediction accuracy was evaluated using temporally and spatially coincident Moderate Resolution Imaging Spectroradiometer (MODIS) thermal infrared data as reference data. This research found that Support Vector Regression with a radial basis function kernel produced the lowest error rates. Sources of errors are discussed and defined. Further research is currently being conducted to train deep learning models to predict TOA thermal radiance
Prediction of hourly PM2.5 using a space-time support vector regression model
NASA Astrophysics Data System (ADS)
Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang
2018-05-01
Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
Robust nonlinear system identification: Bayesian mixture of experts using the t-distribution
NASA Astrophysics Data System (ADS)
Baldacchino, Tara; Worden, Keith; Rowson, Jennifer
2017-02-01
A novel variational Bayesian mixture of experts model for robust regression of bifurcating and piece-wise continuous processes is introduced. The mixture of experts model is a powerful model which probabilistically splits the input space allowing different models to operate in the separate regions. However, current methods have no fail-safe against outliers. In this paper, a robust mixture of experts model is proposed which consists of Student-t mixture models at the gates and Student-t distributed experts, trained via Bayesian inference. The Student-t distribution has heavier tails than the Gaussian distribution, and so it is more robust to outliers, noise and non-normality in the data. Using both simulated data and real data obtained from the Z24 bridge this robust mixture of experts performs better than its Gaussian counterpart when outliers are present. In particular, it provides robustness to outliers in two forms: unbiased parameter regression models, and robustness to overfitting/complex models.
NASA Astrophysics Data System (ADS)
Mansor, Zakwan; Zakaria, Mohd Zakimi; Nor, Azuwir Mohd; Saad, Mohd Sazli; Ahmad, Robiah; Jamaluddin, Hishamuddin
2017-09-01
This paper presents the black-box modelling of palm oil biodiesel engine (POB) using multi-objective optimization differential evolution (MOODE) algorithm. Two objective functions are considered in the algorithm for optimization; minimizing the number of term of a model structure and minimizing the mean square error between actual and predicted outputs. The mathematical model used in this study to represent the POB system is nonlinear auto-regressive moving average with exogenous input (NARMAX) model. Finally, model validity tests are applied in order to validate the possible models that was obtained from MOODE algorithm and lead to select an optimal model.
NASA Astrophysics Data System (ADS)
Seo, H.; Kwon, Y. O.; Joyce, T. M.
2016-02-01
A remarkably strong nonlinear behavior of the atmospheric circulation response to North Atlantic SST anomalies (SSTA) is revealed from a set of large-ensemble, high-resolution, and hemispheric-scale Weather Research and Forecasting (WRF) model simulations. The model is forced with the SSTA associated with meridional shift of the Gulf Stream (GS) path, constructed from a lag regression of the winter SST on a GS Index from observation. Analysis of the systematic set of experiments with SSTAs of varied amplitudes and switched signs representing various GS-shift scenarios provides unique insights into mechanism for emergence and evolution of transient and equilibrium response of atmospheric circulation to extratropical SSTA. Results show that, independent of sign of the SSTA, the equilibrium response is characterized by an anomalous trough over the North Atlantic Ocean and the Western Europe concurrent with enhanced storm track, increased rainfall, and reduced blocking days. To the north of the anomalous low, an anomalous ridge emerges over the Greenland, Iceland, and Norwegian Seas accompanied by weakened storm track, reduced rainfall and increased blocking days. This nonlinear component of the total response dominates the weak and oppositely signed linear response that is directly forced by the SSTA, yielding an anomalous ridge (trough) downstream of the warm (cold) SSTA. The amplitude of the linear response is proportional to that of the SSTA, but this is masked by the overwhelmingly strong nonlinear behavior showing no clear correspondence to the SSTA amplitude. The nonlinear pattern emerges 3-4 weeks after the model initialization in November and reaches its first peak amplitude in December/January. It appears that altered baroclinic wave activity due to the GS SSTA in November lead to low-frequency height responses in December/January through transient eddy vorticity flux convergence.
Confidence limits for data mining models of options prices
NASA Astrophysics Data System (ADS)
Healy, J. V.; Dixon, M.; Read, B. J.; Cai, F. F.
2004-12-01
Non-parametric methods such as artificial neural nets can successfully model prices of financial options, out-performing the Black-Scholes analytic model (Eur. Phys. J. B 27 (2002) 219). However, the accuracy of such approaches is usually expressed only by a global fitting/error measure. This paper describes a robust method for determining prediction intervals for models derived by non-linear regression. We have demonstrated it by application to a standard synthetic example (29th Annual Conference of the IEEE Industrial Electronics Society, Special Session on Intelligent Systems, pp. 1926-1931). The method is used here to obtain prediction intervals for option prices using market data for LIFFE “ESX” FTSE 100 index options ( http://www.liffe.com/liffedata/contracts/month_onmonth.xls). We avoid special neural net architectures and use standard regression procedures to determine local error bars. The method is appropriate for target data with non constant variance (or volatility).
[Multivariate Adaptive Regression Splines (MARS), an alternative for the analysis of time series].
Vanegas, Jairo; Vásquez, Fabián
Multivariate Adaptive Regression Splines (MARS) is a non-parametric modelling method that extends the linear model, incorporating nonlinearities and interactions between variables. It is a flexible tool that automates the construction of predictive models: selecting relevant variables, transforming the predictor variables, processing missing values and preventing overshooting using a self-test. It is also able to predict, taking into account structural factors that might influence the outcome variable, thereby generating hypothetical models. The end result could identify relevant cut-off points in data series. It is rarely used in health, so it is proposed as a tool for the evaluation of relevant public health indicators. For demonstrative purposes, data series regarding the mortality of children under 5 years of age in Costa Rica were used, comprising the period 1978-2008. Copyright © 2016 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.
Revisiting tests for neglected nonlinearity using artificial neural networks.
Cho, Jin Seo; Ishida, Isao; White, Halbert
2011-05-01
Tests for regression neglected nonlinearity based on artificial neural networks (ANNs) have so far been studied by separately analyzing the two ways in which the null of regression linearity can hold. This implies that the asymptotic behavior of general ANN-based tests for neglected nonlinearity is still an open question. Here we analyze a convenient ANN-based quasi-likelihood ratio statistic for testing neglected nonlinearity, paying careful attention to both components of the null. We derive the asymptotic null distribution under each component separately and analyze their interaction. Somewhat remarkably, it turns out that the previously known asymptotic null distribution for the type 1 case still applies, but under somewhat stronger conditions than previously recognized. We present Monte Carlo experiments corroborating our theoretical results and showing that standard methods can yield misleading inference when our new, stronger regularity conditions are violated.
Reconstructing latent dynamical noise for better forecasting observables
NASA Astrophysics Data System (ADS)
Hirata, Yoshito
2018-03-01
I propose a method for reconstructing multi-dimensional dynamical noise inspired by the embedding theorem of Muldoon et al. [Dyn. Stab. Syst. 13, 175 (1998)] by regarding multiple predictions as different observables. Then, applying the embedding theorem by Stark et al. [J. Nonlinear Sci. 13, 519 (2003)] for a forced system, I produce time series forecast by supplying the reconstructed past dynamical noise as auxiliary information. I demonstrate the proposed method on toy models driven by auto-regressive models or independent Gaussian noise.
Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B M J
2018-07-01
In environmental epidemiology studies, health response data (e.g. hospitalization or mortality) are often noisy because of hospital organization and other social factors. The noise in the data can hide the true signal related to the exposure. The signal can be unveiled by performing a temporal aggregation on health data and then using it as the response in regression analysis. From aggregated series, a general methodology is introduced to account for the particularities of an aggregated response in a regression setting. This methodology can be used with usually applied regression models in weather-related health studies, such as generalized additive models (GAM) and distributed lag nonlinear models (DLNM). In particular, the residuals are modelled using an autoregressive-moving average (ARMA) model to account for the temporal dependence. The proposed methodology is illustrated by modelling the influence of temperature on cardiovascular mortality in Canada. A comparison with classical DLNMs is provided and several aggregation methods are compared. Results show that there is an increase in the fit quality when the response is aggregated, and that the estimated relationship focuses more on the outcome over several days than the classical DLNM. More precisely, among various investigated aggregation schemes, it was found that an aggregation with an asymmetric Epanechnikov kernel is more suited for studying the temperature-mortality relationship. Copyright © 2018. Published by Elsevier B.V.
Functional interaction-based nonlinear models with application to multiplatform genomics data.
Davenport, Clemontina A; Maity, Arnab; Baladandayuthapani, Veerabhadran
2018-05-07
Functional regression allows for a scalar response to be dependent on a functional predictor; however, not much work has been done when a scalar exposure that interacts with the functional covariate is introduced. In this paper, we present 2 functional regression models that account for this interaction and propose 2 novel estimation procedures for the parameters in these models. These estimation methods allow for a noisy and/or sparsely observed functional covariate and are easily extended to generalized exponential family responses. We compute standard errors of our estimators, which allows for further statistical inference and hypothesis testing. We compare the performance of the proposed estimators to each other and to one found in the literature via simulation and demonstrate our methods using a real data example. Copyright © 2018 John Wiley & Sons, Ltd.
Flexible nonlinear estimates of the association between height and mental ability in early life.
Murasko, Jason E
2014-01-01
To estimate associations between early-life mental ability and height/height-growth in contemporary US children. Structured additive regression models are used to flexibly estimate the associations between height and mental ability at approximately 24 months of age. The sample is taken from the Early Childhood Longitudinal Study-Birth Cohort, a national study whose target population was children born in the US during 2001. A nonlinear association is indicated between height and mental ability at approximately 24 months of age. There is an increasing association between height and mental ability below the mean value of height, but a flat association thereafter. Annualized growth shows the same nonlinear association to ability when controlling for baseline length at 9 months. Restricted growth at lower values of the height distribution is associated with lower measured mental ability in contemporary US children during the first years of life. Copyright © 2013 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Stankiewicz, N.
1982-01-01
The multiple channel input signal to a soft limiter amplifier as a traveling wave tube is represented as a finite, linear sum of Gaussian functions in the frequency domain. Linear regression is used to fit the channel shapes to a least squares residual error. Distortions in output signal, namely intermodulation products, are produced by the nonlinear gain characteristic of the amplifier and constitute the principal noise analyzed in this study. The signal to noise ratios are calculated for various input powers from saturation to 10 dB below saturation for two specific distributions of channels. A criterion for the truncation of the series expansion of the nonlinear transfer characteristic is given. It is found that he signal to noise ratios are very sensitive to the coefficients used in this expansion. Improper or incorrect truncation of the series leads to ambiguous results in the signal to noise ratios.
Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina
2013-01-01
Crimes forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crimes data, it is common that the data consists of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problem. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on values of its parameters, while ARIMA is not robust to be applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United State based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729
Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Sallehuddin, Roselina
2013-01-01
Crimes forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crimes data, it is common that the data consists of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data and high-dimensional problem. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on values of its parameters, while ARIMA is not robust to be applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United State based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models.
Beelders, Theresa; de Beer, Dalene; Kidd, Martin; Joubert, Elizabeth
2018-01-01
Mangiferin, a C-glucosyl xanthone, abundant in mango and honeybush, is increasingly targeted for its bioactive properties and thus to enhance functional properties of food. The thermal degradation kinetics of mangiferin at pH3, 4, 5, 6 and 7 were each modeled at five temperatures ranging between 60 and 140°C. First-order reaction models were fitted to the data using non-linear regression to determine the reaction rate constant at each pH-temperature combination. The reaction rate constant increased with increasing temperature and pH. Comparison of the reaction rate constants at 100°C revealed an exponential relationship between the reaction rate constant and pH. The data for each pH were also modeled with the Arrhenius equation using non-linear and linear regression to determine the activation energy and pre-exponential factor. Activation energies decreased slightly with increasing pH. Finally, a multi-linear model taking into account both temperature and pH was developed for mangiferin degradation. Sterilization (121°C for 4min) of honeybush extracts dissolved at pH4, 5 and 7 did not cause noticeable degradation of mangiferin, although the multi-linear model predicted 34% degradation at pH7. The extract matrix is postulated to exert a protective effect as changes in potential precursor content could not fully explain the stability of mangiferin. Copyright © 2017 Elsevier Ltd. All rights reserved.
Application of Neural Networks to Wind tunnel Data Response Surface Methods
NASA Technical Reports Server (NTRS)
Lo, Ching F.; Zhao, J. L.; DeLoach, Richard
2000-01-01
The integration of nonlinear neural network methods with conventional linear regression techniques is demonstrated for representative wind tunnel force balance data modeling. This work was motivated by a desire to formulate precision intervals for response surfaces produced by neural networks. Applications are demonstrated for representative wind tunnel data acquired at NASA Langley Research Center and the Arnold Engineering Development Center in Tullahoma, TN.
Linear and nonlinear methods in modeling the aqueous solubility of organic compounds.
Catana, Cornel; Gao, Hua; Orrenius, Christian; Stouten, Pieter F W
2005-01-01
Solubility data for 930 diverse compounds have been analyzed using linear Partial Least Square (PLS) and nonlinear PLS methods, Continuum Regression (CR), and Neural Networks (NN). 1D and 2D descriptors from MOE package in combination with E-state or ISIS keys have been used. The best model was obtained using linear PLS for a combination between 22 MOE descriptors and 65 ISIS keys. It has a correlation coefficient (r2) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility (log S(w)). The model validated on a test set of 177 compounds not included in the training set has r2 0.911 and RMSE 0.475 log S(w). The descriptors were ranked according to their importance, and at the top of the list have been found the 22 MOE descriptors. The CR model produced results as good as PLS, and because of the way in which cross-validation has been done it is expected to be a valuable tool in prediction besides PLS model. The statistics obtained using nonlinear methods did not surpass those got with linear ones. The good statistic obtained for linear PLS and CR recommends these models to be used in prediction when it is difficult or impossible to make experimental measurements, for virtual screening, combinatorial library design, and efficient leads optimization.
Noninvasive and fast measurement of blood glucose in vivo by near infrared (NIR) spectroscopy
NASA Astrophysics Data System (ADS)
Jintao, Xue; Liming, Ye; Yufei, Liu; Chunyan, Li; Han, Chen
2017-05-01
This research was to develop a method for noninvasive and fast blood glucose assay in vivo. Near-infrared (NIR) spectroscopy, a more promising technique compared to other methods, was investigated in rats with diabetes and normal rats. Calibration models are generated by two different multivariate strategies: partial least squares (PLS) as linear regression method and artificial neural networks (ANN) as non-linear regression method. The PLS model was optimized individually by considering spectral range, spectral pretreatment methods and number of model factors, while the ANN model was studied individually by selecting spectral pretreatment methods, parameters of network topology, number of hidden neurons, and times of epoch. The results of the validation showed the two models were robust, accurate and repeatable. Compared to the ANN model, the performance of the PLS model was much better, with lower root mean square error of validation (RMSEP) of 0.419 and higher correlation coefficients (R) of 96.22%.
Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal
2017-01-01
Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standard approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models were tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional iliac data (module and surface) were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module and area give overall better and statistically valid age estimates. These models integrate punctual nonlinearities of the relationship between age and osteometric variables. By constructing valid prediction intervals whose size increases with age, MARS models take into account the normal increase of individual variability. MARS models can qualify as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.
NASA Astrophysics Data System (ADS)
Wang, Lunche; Kisi, Ozgur; Zounemat-Kermani, Mohammad; Li, Hui
2017-01-01
Pan evaporation (Ep) plays important roles in agricultural water resources management. One of the basic challenges is modeling Ep using limited climatic parameters because there are a number of factors affecting the evaporation rate. This study investigated the abilities of six different soft computing methods, multi-layer perceptron (MLP), generalized regression neural network (GRNN), fuzzy genetic (FG), least square support vector machine (LSSVM), multivariate adaptive regression spline (MARS), adaptive neuro-fuzzy inference systems with grid partition (ANFIS-GP), and two regression methods, multiple linear regression (MLR) and Stephens and Stewart model (SS) in predicting monthly Ep. Long-term climatic data at various sites crossing a wide range of climates during 1961-2000 are used for model development and validation. The results showed that the models have different accuracies in different climates and the MLP model performed superior to the other models in predicting monthly Ep at most stations using local input combinations (for example, the MAE (mean absolute errors), RMSE (root mean square errors), and determination coefficient (R2) are 0.314 mm/day, 0.405 mm/day and 0.988, respectively for HEB station), while GRNN model performed better in Tibetan Plateau (MAE, RMSE and R2 are 0.459 mm/day, 0.592 mm/day and 0.932, respectively). The accuracies of above models ranked as: MLP, GRNN, LSSVM, FG, ANFIS-GP, MARS and MLR. The overall results indicated that the soft computing techniques generally performed better than the regression methods, but MLR and SS models can be more preferred at some climatic zones instead of complex nonlinear models, for example, the BJ (Beijing), CQ (Chongqing) and HK (Haikou) stations. Therefore, it can be concluded that Ep could be successfully predicted using above models in hydrological modeling studies.
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sun Wei; Huang, Guo H., E-mail: huang@iseis.org; Institute for Energy, Environment and Sustainable Communities, University of Regina, Regina, Saskatchewan, S4S 0A2
2012-06-15
Highlights: Black-Right-Pointing-Pointer Inexact piecewise-linearization-based fuzzy flexible programming is proposed. Black-Right-Pointing-Pointer It's the first application to waste management under multiple complexities. Black-Right-Pointing-Pointer It tackles nonlinear economies-of-scale effects in interval-parameter constraints. Black-Right-Pointing-Pointer It estimates costs more accurately than the linear-regression-based model. Black-Right-Pointing-Pointer Uncertainties are decreased and more satisfactory interval solutions are obtained. - Abstract: To tackle nonlinear economies-of-scale (EOS) effects in interval-parameter constraints for a representative waste management problem, an inexact piecewise-linearization-based fuzzy flexible programming (IPFP) model is developed. In IPFP, interval parameters for waste amounts and transportation/operation costs can be quantified; aspiration levels for net system costs, as well as tolerancemore » intervals for both capacities of waste treatment facilities and waste generation rates can be reflected; and the nonlinear EOS effects transformed from objective function to constraints can be approximated. An interactive algorithm is proposed for solving the IPFP model, which in nature is an interval-parameter mixed-integer quadratically constrained programming model. To demonstrate the IPFP's advantages, two alternative models are developed to compare their performances. One is a conventional linear-regression-based inexact fuzzy programming model (IPFP2) and the other is an IPFP model with all right-hand-sides of fussy constraints being the corresponding interval numbers (IPFP3). The comparison results between IPFP and IPFP2 indicate that the optimized waste amounts would have the similar patterns in both models. However, when dealing with EOS effects in constraints, the IPFP2 may underestimate the net system costs while the IPFP can estimate the costs more accurately. The comparison results between IPFP and IPFP3 indicate that their solutions would be significantly different. The decreased system uncertainties in IPFP's solutions demonstrate its effectiveness for providing more satisfactory interval solutions than IPFP3. Following its first application to waste management, the IPFP can be potentially applied to other environmental problems under multiple complexities.« less
Application of near-infrared spectroscopy in the detection of fat-soluble vitamins in premix feed
NASA Astrophysics Data System (ADS)
Jia, Lian Ping; Tian, Shu Li; Zheng, Xue Cong; Jiao, Peng; Jiang, Xun Peng
2018-02-01
Vitamin is the organic compound and necessary for animal physiological maintenance. The rapid determination of the content of different vitamins in premix feed can help to achieve accurate diets and efficient feeding. Compared with high-performance liquid chromatography and other wet chemical methods, near-infrared spectroscopy is a fast, non-destructive, non-polluting method. 168 samples of premix feed were collected and the contents of vitamin A, vitamin E and vitamin D3 were detected by the standard method. The near-infrared spectra of samples ranging from 10 000 to 4 000 cm-1 were obtained. Partial least squares regression (PLSR) and support vector machine regression (SVMR) were used to construct the quantitative model. The results showed that the RMSEP of PLSR model of vitamin A, vitamin E and vitamin D3 were 0.43×107 IU/kg, 0.09×105 IU/kg and 0.17×107 IU/kg, respectively. The RMSEP of SVMR model was 0.45×107 IU/kg, 0.11×105 IU/kg and 0.18×107 IU/kg. Compared with nonlinear regression method (SVMR), linear regression method (PLSR) is more suitable for the quantitative analysis of vitamins in premix feed.
Zhuo, Lin; Tao, Hong; Wei, Hong; Chengzhen, Wu
2016-01-01
We tried to establish compatible carbon content models of individual trees for a Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) plantation from Fujian province in southeast China. In general, compatibility requires that the sum of components equal the whole tree, meaning that the sum of percentages calculated from component equations should equal 100%. Thus, we used multiple approaches to simulate carbon content in boles, branches, foliage leaves, roots and the whole individual trees. The approaches included (i) single optimal fitting (SOF), (ii) nonlinear adjustment in proportion (NAP) and (iii) nonlinear seemingly unrelated regression (NSUR). These approaches were used in combination with variables relating diameter at breast height (D) and tree height (H), such as D, D2H, DH and D&H (where D&H means two separate variables in bivariate model). Power, exponential and polynomial functions were tested as well as a new general function model was proposed by this study. Weighted least squares regression models were employed to eliminate heteroscedasticity. Model performances were evaluated by using mean residuals, residual variance, mean square error and the determination coefficient. The results indicated that models with two dimensional variables (DH, D2H and D&H) were always superior to those with a single variable (D). The D&H variable combination was found to be the most useful predictor. Of all the approaches, SOF could establish a single optimal model separately, but there were deviations in estimating results due to existing incompatibilities, while NAP and NSUR could ensure predictions compatibility. Simultaneously, we found that the new general model had better accuracy than others. In conclusion, we recommend that the new general model be used to estimate carbon content for Chinese fir and considered for other vegetation types as well. PMID:26982054
New Approach To Hour-By-Hour Weather Forecast
NASA Astrophysics Data System (ADS)
Liao, Q. Q.; Wang, B.
2017-12-01
Fine hourly forecast in single station weather forecast is required in many human production and life application situations. Most previous MOS (Model Output Statistics) which used a linear regression model are hard to solve nonlinear natures of the weather prediction and forecast accuracy has not been sufficient at high temporal resolution. This study is to predict the future meteorological elements including temperature, precipitation, relative humidity and wind speed in a local region over a relatively short period of time at hourly level. By means of hour-to-hour NWP (Numeral Weather Prediction)meteorological field from Forcastio (https://darksky.net/dev/docs/forecast) and real-time instrumental observation including 29 stations in Yunnan and 3 stations in Tianjin of China from June to October 2016, predictions are made of the 24-hour hour-by-hour ahead. This study presents an ensemble approach to combine the information of instrumental observation itself and NWP. Use autoregressive-moving-average (ARMA) model to predict future values of the observation time series. Put newest NWP products into the equations derived from the multiple linear regression MOS technique. Handle residual series of MOS outputs with autoregressive (AR) model for the linear property presented in time series. Due to the complexity of non-linear property of atmospheric flow, support vector machine (SVM) is also introduced . Therefore basic data quality control and cross validation makes it able to optimize the model function parameters , and do 24 hours ahead residual reduction with AR/SVM model. Results show that AR model technique is better than corresponding multi-variant MOS regression method especially at the early 4 hours when the predictor is temperature. MOS-AR combined model which is comparable to MOS-SVM model outperform than MOS. Both of their root mean square error and correlation coefficients for 2 m temperature are reduced to 1.6 degree Celsius and 0.91 respectively. The forecast accuracy of 24- hour forecast deviation no more than 2 degree Celsius is 78.75 % for MOS-AR model and 81.23 % for AR model.
Nonlinear analysis of pupillary dynamics.
Onorati, Francesco; Mainardi, Luca Tommaso; Sirca, Fabiola; Russo, Vincenzo; Barbieri, Riccardo
2016-02-01
Pupil size reflects autonomic response to different environmental and behavioral stimuli, and its dynamics have been linked to other autonomic correlates such as cardiac and respiratory rhythms. The aim of this study is to assess the nonlinear characteristics of pupil size of 25 normal subjects who participated in a psychophysiological experimental protocol with four experimental conditions, namely “baseline”, “anger”, “joy”, and “sadness”. Nonlinear measures, such as sample entropy, correlation dimension, and largest Lyapunov exponent, were computed on reconstructed signals of spontaneous fluctuations of pupil dilation. Nonparametric statistical tests were performed on surrogate data to verify that the nonlinear measures are an intrinsic characteristic of the signals. We then developed and applied a piecewise linear regression model to detrended fluctuation analysis (DFA). Two joinpoints and three scaling intervals were identified: slope α0, at slow time scales, represents a persistent nonstationary long-range correlation, whereas α1 and α2, at middle and fast time scales, respectively, represent long-range power-law correlations, similarly to DFA applied to heart rate variability signals. Of the computed complexity measures, α0 showed statistically significant differences among experimental conditions (p<0.001). Our results suggest that (a) pupil size at constant light condition is characterized by nonlinear dynamics, (b) three well-defined and distinct long-memory processes exist at different time scales, and (c) autonomic stimulation is partially reflected in nonlinear dynamics. (c) autonomic stimulation is partially reflected in nonlinear dynamics.
Dose-Response—A Challenge for Allelopathy?
Belz, Regina G.; Hurle, Karl; Duke, Stephen O.
2005-01-01
The response of an organism to a chemical depends, among other things, on the dose. Nonlinear dose-response relationships occur across a broad range of research fields, and are a well established tool to describe the basic mechanisms of phytotoxicity. The responses of plants to allelochemicals as biosynthesized phytotoxins, relate as well to nonlinearity and, thus, allelopathic effects can be adequately quantified by nonlinear mathematical modeling. The current paper applies the concept of nonlinearity to assorted aspects of allelopathy within several bioassays and reveals their analysis by nonlinear regression models. Procedures for a valid comparison of effective doses between different allelopathic interactions are presented for both, inhibitory and stimulatory effects. The dose-response applications measure and compare the responses produced by pure allelochemicals [scopoletin (7-hydroxy-6-methoxy-2H-1-benzopyran-2-one); DIBOA (2,4-dihydroxy-2H-1,4-benzoxaxin-3(4H)-one); BOA (benzoxazolin-2(3H)-one); MBOA (6-methoxy-benzoxazolin-2(3H)-one)], involved in allelopathy of grain crops, to demonstrate how some general principles of dose responses also relate to allelopathy. Hereupon, dose-response applications with living donor plants demonstrate the validity of these principles for density-dependent phytotoxicity of allelochemicals produced and released by living plants (Avena sativa L., Secale cereale L., Triticum L. spp.), and reveal the use of such experiments for initial considerations about basic principles of allelopathy. Results confirm that nonlinearity applies to allelopathy, and the study of allelopathic effects in dose-response experiments allows for new and challenging insights into allelopathic interactions. PMID:19330161
Modeling non-linear growth responses to temperature and hydrology in wetland trees
NASA Astrophysics Data System (ADS)
Keim, R.; Allen, S. T.
2016-12-01
Growth responses of wetland trees to flooding and climate variations are difficult to model because they depend on multiple, apparently interacting factors, but are a critical link in hydrological control of wetland carbon budgets. To more generally understand tree growth to hydrological forcing, we modeled non-linear responses of tree ring growth to flooding and climate at sub-annual time steps, using Vaganov-Shashkin response functions. We calibrated the model to six baldcypress tree-ring chronologies from two hydrologically distinct sites in southern Louisiana, and tested several hypotheses of plasticity in wetlands tree responses to interacting environmental variables. The model outperformed traditional multiple linear regression. More importantly, optimized response parameters were generally similar among sites with varying hydrological conditions, suggesting generality to the functions. Model forms that included interacting responses to multiple forcing factors were more effective than were single response functions, indicating the principle of a single limiting factor is not correct in wetlands and both climatic and hydrological variables must be considered in predicting responses to hydrological or climate change.
Vathsangam, Harshvardhan; Emken, Adar; Schroeder, E. Todd; Spruijt-Metz, Donna; Sukhatme, Gaurav S.
2011-01-01
This paper describes an experimental study in estimating energy expenditure from treadmill walking using a single hip-mounted triaxial inertial sensor comprised of a triaxial accelerometer and a triaxial gyroscope. Typical physical activity characterization using accelerometer generated counts suffers from two drawbacks - imprecison (due to proprietary counts) and incompleteness (due to incomplete movement description). We address these problems in the context of steady state walking by directly estimating energy expenditure with data from a hip-mounted inertial sensor. We represent the cyclic nature of walking with a Fourier transform of sensor streams and show how one can map this representation to energy expenditure (as measured by V O2 consumption, mL/min) using three regression techniques - Least Squares Regression (LSR), Bayesian Linear Regression (BLR) and Gaussian Process Regression (GPR). We perform a comparative analysis of the accuracy of sensor streams in predicting energy expenditure (measured by RMS prediction accuracy). Triaxial information is more accurate than uniaxial information. LSR based approaches are prone to outlier sensitivity and overfitting. Gyroscopic information showed equivalent if not better prediction accuracy as compared to accelerometers. Combining accelerometer and gyroscopic information provided better accuracy than using either sensor alone. We also analyze the best algorithmic approach among linear and nonlinear methods as measured by RMS prediction accuracy and run time. Nonlinear regression methods showed better prediction accuracy but required an order of magnitude of run time. This paper emphasizes the role of probabilistic techniques in conjunction with joint modeling of triaxial accelerations and rotational rates to improve energy expenditure prediction for steady-state treadmill walking. PMID:21690001
Linear theory for filtering nonlinear multiscale systems with model error
Berry, Tyrus; Harlim, John
2014-01-01
In this paper, we study filtering of multiscale dynamical systems with model error arising from limitations in resolving the smaller scale processes. In particular, the analysis assumes the availability of continuous-time noisy observations of all components of the slow variables. Mathematically, this paper presents new results on higher order asymptotic expansion of the first two moments of a conditional measure. In particular, we are interested in the application of filtering multiscale problems in which the conditional distribution is defined over the slow variables, given noisy observation of the slow variables alone. From the mathematical analysis, we learn that for a continuous time linear model with Gaussian noise, there exists a unique choice of parameters in a linear reduced model for the slow variables which gives the optimal filtering when only the slow variables are observed. Moreover, these parameters simultaneously give the optimal equilibrium statistical estimates of the underlying system, and as a consequence they can be estimated offline from the equilibrium statistics of the true signal. By examining a nonlinear test model, we show that the linear theory extends in this non-Gaussian, nonlinear configuration as long as we know the optimal stochastic parametrization and the correct observation model. However, when the stochastic parametrization model is inappropriate, parameters chosen for good filter performance may give poor equilibrium statistical estimates and vice versa; this finding is based on analytical and numerical results on our nonlinear test model and the two-layer Lorenz-96 model. Finally, even when the correct stochastic ansatz is given, it is imperative to estimate the parameters simultaneously and to account for the nonlinear feedback of the stochastic parameters into the reduced filter estimates. In numerical experiments on the two-layer Lorenz-96 model, we find that the parameters estimated online, as part of a filtering procedure, simultaneously produce accurate filtering and equilibrium statistical prediction. In contrast, an offline estimation technique based on a linear regression, which fits the parameters to a training dataset without using the filter, yields filter estimates which are worse than the observations or even divergent when the slow variables are not fully observed. This finding does not imply that all offline methods are inherently inferior to the online method for nonlinear estimation problems, it only suggests that an ideal estimation technique should estimate all parameters simultaneously whether it is online or offline. PMID:25002829
Mechanistic analysis of challenge-response experiments.
Shotwell, M S; Drake, K J; Sidorov, V Y; Wikswo, J P
2013-09-01
We present an application of mechanistic modeling and nonlinear longitudinal regression in the context of biomedical response-to-challenge experiments, a field where these methods are underutilized. In this type of experiment, a system is studied by imposing an experimental challenge, and then observing its response. The combination of mechanistic modeling and nonlinear longitudinal regression has brought new insight, and revealed an unexpected opportunity for optimal design. Specifically, the mechanistic aspect of our approach enables the optimal design of experimental challenge characteristics (e.g., intensity, duration). This article lays some groundwork for this approach. We consider a series of experiments wherein an isolated rabbit heart is challenged with intermittent anoxia. The heart responds to the challenge onset, and recovers when the challenge ends. The mean response is modeled by a system of differential equations that describe a candidate mechanism for cardiac response to anoxia challenge. The cardiac system behaves more variably when challenged than when at rest. Hence, observations arising from this experiment exhibit complex heteroscedasticity and sharp changes in central tendency. We present evidence that an asymptotic statistical inference strategy may fail to adequately account for statistical uncertainty. Two alternative methods are critiqued qualitatively (i.e., for utility in the current context), and quantitatively using an innovative Monte-Carlo method. We conclude with a discussion of the exciting opportunities in optimal design of response-to-challenge experiments. © 2013, The International Biometric Society.
Local Composite Quantile Regression Smoothing for Harris Recurrent Markov Processes
Li, Degui; Li, Runze
2016-01-01
In this paper, we study the local polynomial composite quantile regression (CQR) smoothing method for the nonlinear and nonparametric models under the Harris recurrent Markov chain framework. The local polynomial CQR regression method is a robust alternative to the widely-used local polynomial method, and has been well studied in stationary time series. In this paper, we relax the stationarity restriction on the model, and allow that the regressors are generated by a general Harris recurrent Markov process which includes both the stationary (positive recurrent) and nonstationary (null recurrent) cases. Under some mild conditions, we establish the asymptotic theory for the proposed local polynomial CQR estimator of the mean regression function, and show that the convergence rate for the estimator in nonstationary case is slower than that in stationary case. Furthermore, a weighted type local polynomial CQR estimator is provided to improve the estimation efficiency, and a data-driven bandwidth selection is introduced to choose the optimal bandwidth involved in the nonparametric estimators. Finally, we give some numerical studies to examine the finite sample performance of the developed methodology and theory. PMID:27667894
Árnadóttir, Í.; Gíslason, M. K.; Carraro, U.
2016-01-01
Muscle degeneration has been consistently identified as an independent risk factor for high mortality in both aging populations and individuals suffering from neuromuscular pathology or injury. While there is much extant literature on its quantification and correlation to comorbidities, a quantitative gold standard for analyses in this regard remains undefined. Herein, we hypothesize that rigorously quantifying entire radiodensitometric distributions elicits more muscle quality information than average values reported in extant methods. This study reports the development and utility of a nonlinear trimodal regression analysis method utilized on radiodensitometric distributions of upper leg muscles from CT scans of a healthy young adult, a healthy elderly subject, and a spinal cord injury patient. The method was then employed with a THA cohort to assess pre- and postsurgical differences in their healthy and operative legs. Results from the initial representative models elicited high degrees of correlation to HU distributions, and regression parameters highlighted physiologically evident differences between subjects. Furthermore, results from the THA cohort echoed physiological justification and indicated significant improvements in muscle quality in both legs following surgery. Altogether, these results highlight the utility of novel parameters from entire HU distributions that could provide insight into the optimal quantification of muscle degeneration. PMID:28115982
Non-Linear Concentration-Response Relationships between Ambient Ozone and Daily Mortality.
Bae, Sanghyuk; Lim, Youn-Hee; Kashima, Saori; Yorifuji, Takashi; Honda, Yasushi; Kim, Ho; Hong, Yun-Chul
2015-01-01
Ambient ozone (O3) concentration has been reported to be significantly associated with mortality. However, linearity of the relationships and the presence of a threshold has been controversial. The aim of the present study was to examine the concentration-response relationship and threshold of the association between ambient O3 concentration and non-accidental mortality in 13 Japanese and Korean cities from 2000 to 2009. We selected Japanese and Korean cities which have population of over 1 million. We constructed Poisson regression models adjusting daily mean temperature, daily mean PM10, humidity, time trend, season, year, day of the week, holidays and yearly population. The association between O3 concentration and mortality was examined using linear, spline and linear-threshold models. The thresholds were estimated for each city, by constructing linear-threshold models. We also examined the city-combined association using a generalized additive mixed model. The mean O3 concentration did not differ greatly between Korea and Japan, which were 26.2 ppb and 24.2 ppb, respectively. Seven out of 13 cities showed better fits for the spline model compared with the linear model, supporting a non-linear relationships between O3 concentration and mortality. All of the 7 cities showed J or U shaped associations suggesting the existence of thresholds. The range of city-specific thresholds was from 11 to 34 ppb. The city-combined analysis also showed a non-linear association with a threshold around 30-40 ppb. We have observed non-linear concentration-response relationship with thresholds between daily mean ambient O3 concentration and daily number of non-accidental death in Japanese and Korean cities.
Structured functional additive regression in reproducing kernel Hilbert spaces.
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2014-06-01
Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application.
NASA Astrophysics Data System (ADS)
Ying, Yibin; Liu, Yande; Fu, Xiaping; Lu, Huishan
2005-11-01
The artificial neural networks (ANNs) have been used successfully in applications such as pattern recognition, image processing, automation and control. However, majority of today's applications of ANNs is back-propagate feed-forward ANN (BP-ANN). In this paper, back-propagation artificial neural networks (BP-ANN) were applied for modeling soluble solid content (SSC) of intact pear from their Fourier transform near infrared (FT-NIR) spectra. One hundred and sixty-four pear samples were used to build the calibration models and evaluate the models predictive ability. The results are compared to the classical calibration approaches, i.e. principal component regression (PCR), partial least squares (PLS) and non-linear PLS (NPLS). The effects of the optimal methods of training parameters on the prediction model were also investigated. BP-ANN combine with principle component regression (PCR) resulted always better than the classical PCR, PLS and Weight-PLS methods, from the point of view of the predictive ability. Based on the results, it can be concluded that FT-NIR spectroscopy and BP-ANN models can be properly employed for rapid and nondestructive determination of fruit internal quality.
Weisser, K; Schloos, J
1991-10-09
The relationship between serum angiotensin converting enzyme (ACE) activity and concentration of the ACE inhibitor enalaprilat was determined in vitro in the presence of different concentrations (S = 4-200 mM) of the substrate Hip-Gly-Gly. From Henderson plots, a competitive tight-binding relationship between enalaprilat and serum ACE was found yielding a value of approximately 5 nM for serum ACE concentration (Et) and an inhibition constant (Ki) for enalaprilat of approximately 0.1 nM. A plot of reaction velocity (Vi) versus total inhibitor concentration (It) exhibited a non-parallel shift of the inhibition curve to the right with increasing S. This was reflected by apparent Hill coefficients greater than 1 when the commonly used inhibitory sigmoid concentration-effect model (Emax model) was applied to the data. Slopes greater than 1 were obviously due to discrepancies between the free inhibitor concentration (If) present in the assay and It plotted on the abscissa and could, therefore, be indicators of tight-binding conditions. Thus, the sigmoid Emax model leads to an overestimation of Ki. Therefore, a modification of the inhibitory sigmoid Emax model (called "Emax tight model") was applied, which accounts for the depletion of If by binding, refers to It and allows estimation of the parameters Et and IC50f (free concentration of inhibitor when 50% inhibition occurs) using non-linear regression analysis. This model could describe the non-symmetrical shape of the inhibition curves and the results for Ki and Et correlated very well with those derived from the Henderson plots. The latter findings confirm that the degree of ACE inhibition measured in vitro is, in fact, dependent on the concentration of substrate and enzyme present in the assay. This is of importance not only for the correct evaluation of Ki but also for the interpretation of the time course of serum ACE inhibition measured ex vivo. The non-linear model has some advantages over the linear Henderson equation: it is directly applicable without conversion of the data and avoids the stochastic dependency of the variables, allowing non-linear regression of all data points contributing with the same weight.
Chuang, Shu-Chun; Rota, Matteo; Gunter, Marc J; Zeleniuch-Jacquotte, Anne; Eussen, Simone J P M; Vollset, Stein Emil; Ueland, Per Magne; Norat, Teresa; Ziegler, Regina G; Vineis, Paolo
2013-10-01
Most epidemiologic studies on folate intake suggest that folate may be protective against colorectal cancer, but the results on circulating (plasma or serum) folate are mostly inconclusive. We conducted a meta-analysis of case-control studies nested within prospective studies on circulating folate and colorectal cancer risk by using flexible meta-regression models to test the linear and nonlinear dose-response relationships. A total of 8 publications (10 cohorts, representing 3,477 cases and 7,039 controls) were included in the meta-analysis. The linear and nonlinear models corresponded to relative risks of 0.96 (95% confidence interval (CI): 0.91, 1.02) and 0.99 (95% CI: 0.96, 1.02), respectively, per 10 nmol/L of circulating folate in contrast to the reference value. The pooled relative risks when comparing the highest with the lowest category were 0.80 (95% CI: 0.61, 0.99) for radioimmunoassay and 1.03 (95% CI: 0.83, 1.22) for microbiological assay. Overall, our analyses suggest a null association between circulating folate and colorectal cancer risk. The stronger association for the radioimmunoassay-based studies could reflect differences in cohorts and study designs rather than assay performance. Further investigations need to integrate more accurate measurements and flexible modeling to explore the effects of folate in the presence of genetic, lifestyle, dietary, and hormone-related factors.
Carisse, Odile; McNealis, Vanessa
2018-01-01
Black seed disease (BSD) of strawberry is a sporadic disease caused by Mycosphaerella fragariae. Because little is known about potential crop losses or the weather conditions conducive to disease development, fungicides are generally not applied or are applied based on a preset schedule. Data collected from 2000 to 2011 representing 50 farm-years (total of 186 strawberry fields) were used to determine potential crop losses and to study the influence of weather on disease occurrence and development. First, logistic regression was used to model the relationship between occurrence of BSD and weather variables. Second, linear and nonlinear regressions were used to model the number of black seed per berry (severity) and the percentage of diseased berries (incidence). Of the 186 fields monitored, 78 showed black seed symptoms, and the number of black seed per berry ranged from 1 to 10, whereas the percentage of diseased berries ranged from 3 to 32%. The most influential weather variable was total rainfall (in millimeters) in May, with a threshold of 103 mm of rain (absence of BSD < 103 mm < presence of BSD). Similarly, nonlinear models with the total rainfall in May accurately predicted both disease severity and incidence (r = 0.94 and 0.97, respectively). Considering that management actions such as fungicide application are not needed every year in every field, these models could be used to identify fields that are at risk of BSD.
Ibáñez-Escriche, N; López de Maturana, E; Noguera, J L; Varona, L
2010-11-01
We developed and implemented change-point recursive models and compared them with a linear recursive model and a standard mixed model (SMM), in the scope of the relationship between litter size (LS) and number of stillborns (NSB) in pigs. The proposed approach allows us to estimate the point of change in multiple-segment modeling of a nonlinear relationship between phenotypes. We applied the procedure to a data set provided by a commercial Large White selection nucleus. The data file consisted of LS and NSB records of 4,462 parities. The results of the analysis clearly identified the location of the change points between different structural regression coefficients. The magnitude of these coefficients increased with LS, indicating an increasing incidence of LS on the NSB ratio. However, posterior distributions of correlations were similar across subpopulations (defined by the change points on LS), except for those between residuals. The heritability estimates of NSB did not present differences between recursive models. Nevertheless, these heritabilities were greater than those obtained for SMM (0.05) with a posterior probability of 85%. These results suggest a nonlinear relationship between LS and NSB, which supports the adequacy of a change-point recursive model for its analysis. Furthermore, the results from model comparisons support the use of recursive models. However, the adequacy of the different recursive models depended on the criteria used: the linear recursive model was preferred on account of its smallest deviance value, whereas nonlinear recursive models provided a better fit and predictive ability based on the cross-validation approach.
Estimating Biochemical Parameters of Tea (camellia Sinensis (L.)) Using Hyperspectral Techniques
NASA Astrophysics Data System (ADS)
Bian, M.; Skidmore, A. K.; Schlerf, M.; Liu, Y.; Wang, T.
2012-07-01
Tea (Camellia Sinensis (L.)) is an important economic crop and the market price of tea depends largely on its quality. This research aims to explore the potential of hyperspectral remote sensing on predicting the concentration of biochemical components, namely total tea polyphenols, as indicators of tea quality at canopy scale. Experiments were carried out for tea plants growing in the field and greenhouse. Partial least squares regression (PLSR), which has proven to be the one of the most successful empirical approach, was performed to establish the relationship between reflectance and biochemical concentration across six tea varieties in the field. Moreover, a novel integrated approach involving successive projections algorithms as band selection method and neural networks was developed and applied to detect the concentration of total tea polyphenols for one tea variety, in order to explore and model complex nonlinearity relationships between independent (wavebands) and dependent (biochemicals) variables. The good prediction accuracies (r2 > 0.8 and relative RMSEP < 10 %) achieved for tea plants using both linear (partial lease squares regress) and nonlinear (artificial neural networks) modelling approaches in this study demonstrates the feasibility of using airborne and spaceborne sensors to cover wide areas of tea plantation for in situ monitoring of tea quality cheaply and rapidly.
Claessens, T E; Georgakopoulos, D; Afanasyeva, M; Vermeersch, S J; Millar, H D; Stergiopulos, N; Westerhof, N; Verdonck, P R; Segers, P
2006-04-01
The linear time-varying elastance theory is frequently used to describe the change in ventricular stiffness during the cardiac cycle. The concept assumes that all isochrones (i.e., curves that connect pressure-volume data occurring at the same time) are linear and have a common volume intercept. Of specific interest is the steepest isochrone, the end-systolic pressure-volume relationship (ESPVR), of which the slope serves as an index for cardiac contractile function. Pressure-volume measurements, achieved with a combined pressure-conductance catheter in the left ventricle of 13 open-chest anesthetized mice, showed a marked curvilinearity of the isochrones. We therefore analyzed the shape of the isochrones by using six regression algorithms (two linear, two quadratic, and two logarithmic, each with a fixed or time-varying intercept) and discussed the consequences for the elastance concept. Our main observations were 1) the volume intercept varies considerably with time; 2) isochrones are equally well described by using quadratic or logarithmic regression; 3) linear regression with a fixed intercept shows poor correlation (R(2) < 0.75) during isovolumic relaxation and early filling; and 4) logarithmic regression is superior in estimating the fixed volume intercept of the ESPVR. In conclusion, the linear time-varying elastance fails to provide a sufficiently robust model to account for changes in pressure and volume during the cardiac cycle in the mouse ventricle. A new framework accounting for the nonlinear shape of the isochrones needs to be developed.
Gacesa, Jelena Popadic; Ivancevic, Tijana; Ivancevic, Nik; Paljic, Feodora Popic; Grujic, Nikola
2010-08-26
Our aim was to determine the dynamics in muscle strength increase and fatigue development during repetitive maximal contraction in specific maximal self-perceived elbow extensors training program. We will derive our functional model for m. triceps brachii in spirit of traditional Hill's two-component muscular model and after fitting our data, develop a prediction tool for this specific training system. Thirty-six healthy young men (21 +/- 1.0 y, BMI 25.4 +/- 7.2 kg/m(2)), who did not take part in any formal resistance exercise regime, volunteered for this study. The training protocol was performed on the isoacceleration dynamometer, lasted for 12 weeks, with a frequency of five sessions per week. Each training session included five sets of 10 maximal contractions (elbow extensions) with a 1 min resting period between each set. The non-linear dynamic system model was used for fitting our data in conjunction with the Levenberg-Marquardt regression algorithm. As a proper dynamical system, our functional model of m. triceps brachii can be used for prediction and control. The model can be used for the predictions of muscular fatigue in a single series, the cumulative daily muscular fatigue and the muscular growth throughout the training process. In conclusion, the application of non-linear dynamics in this particular training model allows us to mathematically explain some functional changes in the skeletal muscle as a result of its adaptation to programmed physical activity-training. 2010 Elsevier Ltd. All rights reserved.
Control-based continuation: Bifurcation and stability analysis for physical experiments
NASA Astrophysics Data System (ADS)
Barton, David A. W.
2017-02-01
Control-based continuation is technique for tracking the solutions and bifurcations of nonlinear experiments. The idea is to apply the method of numerical continuation to a feedback-controlled physical experiment such that the control becomes non-invasive. Since in an experiment it is not (generally) possible to set the state of the system directly, the control target becomes a proxy for the state. Control-based continuation enables the systematic investigation of the bifurcation structure of a physical system, much like if it was numerical model. However, stability information (and hence bifurcation detection and classification) is not readily available due to the presence of stabilising feedback control. This paper uses a periodic auto-regressive model with exogenous inputs (ARX) to approximate the time-varying linearisation of the experiment around a particular periodic orbit, thus providing the missing stability information. This method is demonstrated using a physical nonlinear tuned mass damper.
Model Development for MODIS Thermal Band Electronic Crosstalk
NASA Technical Reports Server (NTRS)
Chang, Tiejun; Wu, Aisheng; Geng, Xu; Li, Yonghonh; Brinkman, Jake; Keller, Graziela; Xiong, Xiaoxiong
2016-01-01
MODerate-resolution Imaging Spectroradiometer (MODIS) has 36 bands. Among them, 16 thermal emissive bands covering a wavelength range from 3.8 to 14.4 m. After 16 years on-orbit operation, the electronic crosstalk of a few Terra MODIS thermal emissive bands developed substantial issues that cause biases in the EV brightness temperature measurements and surface feature contamination. The crosstalk effects on band 27 with center wavelength at 6.7 m and band 29 at 8.5 m increased significantly in recent years, affecting downstream products such as water vapor and cloud mask. The crosstalk effect is evident in the near-monthly scheduled lunar measurements, from which the crosstalk coefficients can be derived. The development of an alternative approach is very helpful for independent verification.In this work, a physical model was developed to assess the crosstalk impact on calibration as well as in Earth view brightness temperature retrieval. This model was applied to Terra MODIS band 29 empirically to correct the Earth brightness temperature measurements. In the model development, the detectors nonlinear response is considered. The impact of the electronic crosstalk is assessed in two steps. The first step consists of determining the impact on calibration using the on-board blackbody (BB). Due to the detectors nonlinear response and large background signal, both linear and nonlinear coefficients are affected by the crosstalk from sending bands. The second step is to calculate the effects on the Earth view brightness temperature retrieval. The effects include those from affected calibration coefficients and the contamination of Earth view measurements. This model links the measurement bias with crosstalk coefficients, detector non-linearity, and the ratio of Earth measurements between the sending and receiving bands. The correction of the electronic cross talk can be implemented empirically from the processed bias at different brightness temperature. The implementation can be done through two approaches. As routine calibration assessment for thermal infrared bands, the trending over select Earth scenes is processed for all the detectors in a band and the band averaged bias is derived at a certain time. In this case, the correction of an affected band can be made using the regression of the model with band averaged bias and then corrections of detector differences are applied. The second approach requires the trending for individual detectors and the bias for each detector is used for regression with the model. A test using the first approach was made for Terra MODIS band 29 with the biases derived from long-term trending of brightness temperature over ocean and Dome-C.
García Nieto, P J; Alonso Fernández, J R; de Cos Juez, F J; Sánchez Lasheras, F; Díaz Muñiz, C
2013-04-01
Cyanotoxins, a kind of poisonous substances produced by cyanobacteria, are responsible for health risks in drinking and recreational waters. As a result, anticipate its presence is a matter of importance to prevent risks. The aim of this study is to use a hybrid approach based on support vector regression (SVR) in combination with genetic algorithms (GAs), known as a genetic algorithm support vector regression (GA-SVR) model, in forecasting the cyanotoxins presence in the Trasona reservoir (Northern Spain). The GA-SVR approach is aimed at highly nonlinear biological problems with sharp peaks and the tests carried out proved its high performance. Some physical-chemical parameters have been considered along with the biological ones. The results obtained are two-fold. In the first place, the significance of each biological and physical-chemical variable on the cyanotoxins presence in the reservoir is determined with success. Finally, a predictive model able to forecast the possible presence of cyanotoxins in a short term was obtained. Copyright © 2013 Elsevier Inc. All rights reserved.
Medalie, Laura
2014-01-01
Annual and daily concentrations and fluxes of total and dissolved phosphorus, total nitrogen, chloride, and total suspended solids were estimated for 18 monitored tributaries to Lake Champlain by using the Weighted Regressions on Time, Discharge, and Seasons regression model. Estimates were made for 21 or 23 years, depending on data availability, for the purpose of providing timely and accessible summary reports as stipulated in the 2010 update to the Lake Champlain “Opportunities for Action” management plan. Estimates of concentration and flux were provided for each tributary based on (1) observed daily discharges and (2) a flow-normalizing procedure, which removed the random fluctuations of climate-related variability. The flux bias statistic, an indicator of the ability of the Weighted Regressions on Time, Discharge, and Season regression models to provide accurate representations of flux, showed acceptable bias (less than ±10 percent) for 68 out of 72 models for total and dissolved phosphorus, total nitrogen, and chloride. Six out of 18 models for total suspended solids had moderate bias (between 10 and 30 percent), an expected result given the frequently nonlinear relation between total suspended solids and discharge. One model for total suspended solids with a very high bias was influenced by a single extreme value; however, removal of that value, although reducing the bias substantially, had little effect on annual fluxes.
Perl-speaks-NONMEM (PsN)--a Perl module for NONMEM related programming.
Lindbom, Lars; Ribbing, Jakob; Jonsson, E Niclas
2004-08-01
The NONMEM program is the most widely used nonlinear regression software in population pharmacokinetic/pharmacodynamic (PK/PD) analyses. In this article we describe a programming library, Perl-speaks-NONMEM (PsN), intended for programmers that aim at using the computational capability of NONMEM in external applications. The library is object oriented and written in the programming language Perl. The classes of the library are built around NONMEM's data, model and output files. The specification of the NONMEM model is easily set or changed through the model and data file classes while the output from a model fit is accessed through the output file class. The classes have methods that help the programmer perform common repetitive tasks, e.g. summarising the output from a NONMEM run, setting the initial estimates of a model based on a previous run or truncating values over a certain threshold in the data file. PsN creates a basis for the development of high-level software using NONMEM as the regression tool.
Chen, Qihong; Long, Rong; Quan, Shuhai
2014-01-01
This paper presents a neural network predictive control strategy to optimize power distribution for a fuel cell/ultracapacitor hybrid power system of a robot. We model the nonlinear power system by employing time variant auto-regressive moving average with exogenous (ARMAX), and using recurrent neural network to represent the complicated coefficients of the ARMAX model. Because the dynamic of the system is viewed as operating- state- dependent time varying local linear behavior in this frame, a linear constrained model predictive control algorithm is developed to optimize the power splitting between the fuel cell and ultracapacitor. The proposed algorithm significantly simplifies implementation of the controller and can handle multiple constraints, such as limiting substantial fluctuation of fuel cell current. Experiment and simulation results demonstrate that the control strategy can optimally split power between the fuel cell and ultracapacitor, limit the change rate of the fuel cell current, and so as to extend the lifetime of the fuel cell. PMID:24707206
Turkdogan-Aydinol, F Ilter; Yetilmezsoy, Kaan
2010-10-15
A MIMO (multiple inputs and multiple outputs) fuzzy-logic-based model was developed to predict biogas and methane production rates in a pilot-scale 90-L mesophilic up-flow anaerobic sludge blanket (UASB) reactor treating molasses wastewater. Five input variables such as volumetric organic loading rate (OLR), volumetric total chemical oxygen demand (TCOD) removal rate (R(V)), influent alkalinity, influent pH and effluent pH were fuzzified by the use of an artificial intelligence-based approach. Trapezoidal membership functions with eight levels were conducted for the fuzzy subsets, and a Mamdani-type fuzzy inference system was used to implement a total of 134 rules in the IF-THEN format. The product (prod) and the centre of gravity (COG, centroid) methods were employed as the inference operator and defuzzification methods, respectively. Fuzzy-logic predicted results were compared with the outputs of two exponential non-linear regression models derived in this study. The UASB reactor showed a remarkable performance on the treatment of molasses wastewater, with an average TCOD removal efficiency of 93 (+/-3)% and an average volumetric TCOD removal rate of 6.87 (+/-3.93) kg TCOD(removed)/m(3)-day, respectively. Findings of this study clearly indicated that, compared to non-linear regression models, the proposed MIMO fuzzy-logic-based model produced smaller deviations and exhibited a superior predictive performance on forecasting of both biogas and methane production rates with satisfactory determination coefficients over 0.98. 2010 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Schlechtingen, Meik; Ferreira Santos, Ilmar
2011-07-01
This paper presents the research results of a comparison of three different model based approaches for wind turbine fault detection in online SCADA data, by applying developed models to five real measured faults and anomalies. The regression based model as the simplest approach to build a normal behavior model is compared to two artificial neural network based approaches, which are a full signal reconstruction and an autoregressive normal behavior model. Based on a real time series containing two generator bearing damages the capabilities of identifying the incipient fault prior to the actual failure are investigated. The period after the first bearing damage is used to develop the three normal behavior models. The developed or trained models are used to investigate how the second damage manifests in the prediction error. Furthermore the full signal reconstruction and the autoregressive approach are applied to further real time series containing gearbox bearing damages and stator temperature anomalies. The comparison revealed all three models being capable of detecting incipient faults. However, they differ in the effort required for model development and the remaining operational time after first indication of damage. The general nonlinear neural network approaches outperform the regression model. The remaining seasonality in the regression model prediction error makes it difficult to detect abnormality and leads to increased alarm levels and thus a shorter remaining operational period. For the bearing damages and the stator anomalies under investigation the full signal reconstruction neural network gave the best fault visibility and thus led to the highest confidence level.
NASA Astrophysics Data System (ADS)
Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe
2018-01-01
In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical, error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located nearby Torre Colimena in the Apulia region, south Italy.
NASA Astrophysics Data System (ADS)
Kwon, Y.
2013-12-01
As evidence of global warming continue to increase, being able to predict forest response to climate changes, such as expected rise of temperature and precipitation, will be vital for maintaining the sustainability and productivity of forests. To map forest species redistribution by climate change scenario has been successful, however, most species redistribution maps lack mechanistic understanding to explain why trees grow under the novel conditions of chaining climate. Distributional map is only capable of predicting under the equilibrium assumption that the communities would exist following a prolonged period under the new climate. In this context, forest NPP as a surrogate for growth rate, the most important facet that determines stand dynamics, can lead to valid prediction on the transition stage to new vegetation-climate equilibrium as it represents changes in structure of forest reflecting site conditions and climate factors. The objective of this study is to develop forest growth map using regression tree analysis by extracting large-scale non-linear structures from both field-based FIA and remotely sensed MODIS data set. The major issue addressed in this approach is non-linear spatial patterns of forest attributes. Forest inventory data showed complex spatial patterns that reflect environmental states and processes that originate at different spatial scales. At broad scales, non-linear spatial trends in forest attributes and mixture of continuous and discrete types of environmental variables make traditional statistical (multivariate regression) and geostatistical (kriging) models inefficient. It calls into question some traditional underlying assumptions of spatial trends that uncritically accepted in forest data. To solve the controversy surrounding the suitability of forest data, regression tree analysis are performed using Software See5 and Cubist. Four publicly available data sets were obtained: First, field-based Forest Inventory and Analysis (USDA, Forest Service) data set for the 31 eastern most United States. Second, 8-day composite of MODIS Land Cover, FPAR, LAI and GPP/NPP data were obtained from Jan 2001 to Dec 2004 (total 182 composite) and each product were filtered by pixel-level quality assurance data to select best quality pixels. Third, 30-year averaged climate data were collected from National Oceanic and Atmospheric Administration (NOAA) and five climatic variables were obtained: Monthly temperature, precipitation, annual heating and cooling days, and annual frost-free days. Forth, topographic data were obtained from digital elevation model (1km by 1km). This research will provide a better understanding of large-scale forest responses to environmental factors that will be beneficial for the development of important forest management applications.
NASA Astrophysics Data System (ADS)
Lin, M.; Yang, Z.; Park, H.; Qian, S.; Chen, J.; Fan, P.
2017-12-01
Impervious surface area (ISA) has become an important indicator for studying urban environments, but mapping ISA at the regional or global scale is still challenging due to the complexity of impervious surface features. The Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS) nighttime light data is (NTL) and Resolution Imaging Spectroradiometer (MODIS) are the major remote sensing data source for regional ISA mapping. A single regression relationship between fractional ISA and NTL or various index derived based on NTL and MODIS vegetation index (NDVI) data was established in many previous studies for regional ISA mapping. However, due to the varying geographical, climatic, and socio-economic characteristics of different cities, the same regression relationship may vary significantly across different cities in the same region in terms of both fitting performance (i.e. R2) and the rate of change (Slope). In this study, we examined the regression relationship between fractional ISA and Vegetation Adjusted Nighttime light Urban Index (VANUI) for 120 randomly selected cities around the world with a multilevel regression model. We found that indeed there is substantial variability of both the R2 (0.68±0.29) and slopes (0.64±0.40) among individual regressions, which suggests that multilevel/hierarchical models are needed for accuracy improvement of future regional ISA mapping .Further analysis also let us find the this substantial variability are affected by climate conditions, socio-economic status, and urban spatial structures. However, all these effects are nonlinear rather than linear, thus could not modeled explicitly in multilevel linear regression models.
NASA Astrophysics Data System (ADS)
Zhang, Chenglong; Zhang, Fan; Guo, Shanshan; Liu, Xiao; Guo, Ping
2018-01-01
An inexact nonlinear mλ-measure fuzzy chance-constrained programming (INMFCCP) model is developed for irrigation water allocation under uncertainty. Techniques of inexact quadratic programming (IQP), mλ-measure, and fuzzy chance-constrained programming (FCCP) are integrated into a general optimization framework. The INMFCCP model can deal with not only nonlinearities in the objective function, but also uncertainties presented as discrete intervals in the objective function, variables and left-hand side constraints and fuzziness in the right-hand side constraints. Moreover, this model improves upon the conventional fuzzy chance-constrained programming by introducing a linear combination of possibility measure and necessity measure with varying preference parameters. To demonstrate its applicability, the model is then applied to a case study in the middle reaches of Heihe River Basin, northwest China. An interval regression analysis method is used to obtain interval crop water production functions in the whole growth period under uncertainty. Therefore, more flexible solutions can be generated for optimal irrigation water allocation. The variation of results can be examined by giving different confidence levels and preference parameters. Besides, it can reflect interrelationships among system benefits, preference parameters, confidence levels and the corresponding risk levels. Comparison between interval crop water production functions and deterministic ones based on the developed INMFCCP model indicates that the former is capable of reflecting more complexities and uncertainties in practical application. These results can provide more reliable scientific basis for supporting irrigation water management in arid areas.
QSPR using MOLGEN-QSPR: the challenge of fluoroalkane boiling points.
Rücker, Christoph; Meringer, Markus; Kerber, Adalbert
2005-01-01
By means of the new software MOLGEN-QSPR, a multilinear regression model for the boiling points of lower fluoroalkanes is established. The model is based exclusively on simple descriptors derived directly from molecular structure and nevertheless describes a broader set of data more precisely than previous attempts that used either more demanding (quantum chemical) descriptors or more demanding (nonlinear) statistical methods such as neural networks. The model's internal consistency was confirmed by leave-one-out cross-validation. The model was used to predict all unknown boiling points of fluorobutanes, and the quality of predictions was estimated by means of comparison with boiling point predictions for fluoropentanes.
Lopez, David S; Advani, Shailesh; Qiu, Xueting; Tsilidis, Konstantinos K; Khera, Mohit; Kim, Jeri; Canfield, Steven
2018-04-25
The association of caffeine intake with testosterone remains unclear. We evaluated the association of caffeine intake with serum testosterone among American men and determined whether this association varied by race/ethnicity and measurements of adiposity. Data were analyzed for 2581 men (≥20 years old) who participated in the cycles of the NHANES 1999-2004 and 2011-2012, a cross-sectional study. Testosterone (ng/mL) was measured by immunoassay among men who participated in the morning examination session. We analyzed 24-h dietary recall data to estimate caffeine intake (mg/day). Multivariable weighted linear regression models were conducted. We identified no linear relationship between caffeine intake and testosterone levels in the total population, but there was a non-linear association (p nonlinearity < .01). Similarly, stratified analysis showed nonlinear associations among Mexican-American and Non-Hispanic White men (p nonlinearity ≤ .03 both) and only among men with waist circumference <102 cm and body mass index <25 kg/m 2 (p nonlinearity < .01, both). No linear association was identified between levels of caffeine intake and testosterone in US men, but we observed a non-linear association, including among racial/ethnic groups and measurements of adiposity in this cross-sectional study. These associations are warranted to be investigated in larger prospective studies.
Discovering governing equations from data by sparse identification of nonlinear dynamical systems
Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan
2016-01-01
Extracting governing equations from data is a central challenge in many diverse areas of science and engineering. Data are abundant whereas models often remain elusive, as in climate science, neuroscience, ecology, finance, and epidemiology, to name only a few examples. In this work, we combine sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover governing equations from noisy measurement data. The only assumption about the structure of the model is that there are only a few important terms that govern the dynamics, so that the equations are sparse in the space of possible functions; this assumption holds for many physical systems in an appropriate basis. In particular, we use sparse regression to determine the fewest terms in the dynamic governing equations required to accurately represent the data. This results in parsimonious models that balance accuracy with model complexity to avoid overfitting. We demonstrate the algorithm on a wide range of problems, from simple canonical systems, including linear and nonlinear oscillators and the chaotic Lorenz system, to the fluid vortex shedding behind an obstacle. The fluid example illustrates the ability of this method to discover the underlying dynamics of a system that took experts in the community nearly 30 years to resolve. We also show that this method generalizes to parameterized systems and systems that are time-varying or have external forcing. PMID:27035946
Discovering governing equations from data by sparse identification of nonlinear dynamical systems.
Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan
2016-04-12
Extracting governing equations from data is a central challenge in many diverse areas of science and engineering. Data are abundant whereas models often remain elusive, as in climate science, neuroscience, ecology, finance, and epidemiology, to name only a few examples. In this work, we combine sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover governing equations from noisy measurement data. The only assumption about the structure of the model is that there are only a few important terms that govern the dynamics, so that the equations are sparse in the space of possible functions; this assumption holds for many physical systems in an appropriate basis. In particular, we use sparse regression to determine the fewest terms in the dynamic governing equations required to accurately represent the data. This results in parsimonious models that balance accuracy with model complexity to avoid overfitting. We demonstrate the algorithm on a wide range of problems, from simple canonical systems, including linear and nonlinear oscillators and the chaotic Lorenz system, to the fluid vortex shedding behind an obstacle. The fluid example illustrates the ability of this method to discover the underlying dynamics of a system that took experts in the community nearly 30 years to resolve. We also show that this method generalizes to parameterized systems and systems that are time-varying or have external forcing.
Shah, Anoop D.; Bartlett, Jonathan W.; Carpenter, James; Nicholas, Owen; Hemingway, Harry
2014-01-01
Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The “true” imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001–2010) with complete data on all covariates. Variables were artificially made “missing at random,” and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914
Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry
2014-03-15
Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.
Møller, Anne; Reventlow, Susanne; Hansen, Åse Marie; Andersen, Lars L; Siersma, Volkert; Lund, Rikke; Avlund, Kirsten; Andersen, Johan Hviid; Mortensen, Ole Steen
2015-11-04
Our aim was to study associations between physical exposures throughout working life and physical function measured as chair-rise performance in midlife. The Copenhagen Aging and Midlife Biobank (CAMB) provided data about employment and measures of physical function. Individual job histories were assigned exposures from a job exposure matrix. Exposures were standardised to ton-years (lifting 1000 kg each day in 1 year), stand-years (standing/walking for 6 h each day in 1 year) and kneel-years (kneeling for 1 h each day in 1 year). The associations between exposure-years and chair-rise performance (number of chair-rises in 30 s) were analysed in multivariate linear and non-linear regression models adjusted for covariates. Mean age among the 5095 participants was 59 years in both genders, and, on average, men achieved 21.58 (SD=5.60) and women 20.38 (SD=5.33) chair-rises in 30 s. Physical exposures were associated with poorer chair-rise performance in both men and women, however, only associations between lifting and standing/walking and chair-rise remained statistically significant among men in the final model. Spline regression analyses showed non-linear associations and confirmed the findings. Higher physical exposure throughout working life is associated with slightly poorer chair-rise performance. The associations between exposure and outcome were non-linear. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
celerite: Scalable 1D Gaussian Processes in C++, Python, and Julia
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel; Agol, Eric; Ambikasaran, Sivaram; Angus, Ruth
2017-09-01
celerite provides fast and scalable Gaussian Process (GP) Regression in one dimension and is implemented in C++, Python, and Julia. The celerite API is designed to be familiar to users of george and, like george, celerite is designed to efficiently evaluate the marginalized likelihood of a dataset under a GP model. This is then be used alongside a non-linear optimization or posterior inference library for the best results.
Balabin, Roman M; Lomakina, Ekaterina I
2011-04-21
In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
Structured functional additive regression in reproducing kernel Hilbert spaces
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2013-01-01
Summary Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application. PMID:25013362
Temperature in lowland Danish streams: contemporary patterns, empirical models and future scenarios
NASA Astrophysics Data System (ADS)
Lagergaard Pedersen, Niels; Sand-Jensen, Kaj
2007-01-01
Continuous temperature measurements at 11 stream sites in small lowland streams of North Zealand, Denmark over a year showed much higher summer temperatures and lower winter temperatures along the course of the stream with artificial lakes than in the stream without lakes. The influence of lakes was even more prominent in the comparisons of colder lake inlets and warmer outlets and led to the decline of cold-water and oxygen-demanding brown trout. Seasonal and daily temperature variations were, as anticipated, dampened by forest cover, groundwater input, input from sewage plants and high downstream discharges. Seasonal variations in daily water temperature could be predicted with high accuracy at all sites by a linear air-water regression model (r2: 0.903-0.947). The predictions improved in all instances (r2: 0.927-0.964) by a non-linear logistic regression according to which water temperatures do not fall below freezing and they increase less steeply than air temperatures at high temperatures because of enhanced heat loss from the stream by evaporation and back radiation. The predictions improved slightly (r2: 0.933-0.969) by a multiple regression model which, in addition to air temperature as the main predictor, included solar radiation at un-shaded sites, relative humidity, precipitation and discharge. Application of the non-linear logistic model for a warming scenario of 4-5 °C higher air temperatures in Denmark in 2070-2100 yielded predictions of temperatures rising 1.6-3.0 °C during winter and summer and 4.4-6.0 °C during spring in un-shaded streams with low groundwater input. Groundwater-fed springs are expected to follow the increase of mean air temperatures for the region. Great caution should be exercised in these temperature projections because global and regional climate scenarios remain open to discussion. Copyright
Hoyer, Dirk; Leder, Uwe; Hoyer, Heike; Pompe, Bernd; Sommer, Michael; Zwiener, Ulrich
2002-01-01
The heart rate variability (HRV) is related to several mechanisms of the complex autonomic functioning such as respiratory heart rate modulation and phase dependencies between heart beat cycles and breathing cycles. The underlying processes are basically nonlinear. In order to understand and quantitatively assess those physiological interactions an adequate coupling analysis is necessary. We hypothesized that nonlinear measures of HRV and cardiorespiratory interdependencies are superior to the standard HRV measures in classifying patients after acute myocardial infarction. We introduced mutual information measures which provide access to nonlinear interdependencies as counterpart to the classically linear correlation analysis. The nonlinear statistical autodependencies of HRV were quantified by auto mutual information, the respiratory heart rate modulation by cardiorespiratory cross mutual information, respectively. The phase interdependencies between heart beat cycles and breathing cycles were assessed basing on the histograms of the frequency ratios of the instantaneous heart beat and respiratory cycles. Furthermore, the relative duration of phase synchronized intervals was acquired. We investigated 39 patients after acute myocardial infarction versus 24 controls. The discrimination of these groups was improved by cardiorespiratory cross mutual information measures and phase interdependencies measures in comparison to the linear standard HRV measures. This result was statistically confirmed by means of logistic regression models of particular variable subsets and their receiver operating characteristics.
NASA Astrophysics Data System (ADS)
Wang, Li-yong; Li, Le; Zhang, Zhi-hua
2016-09-01
Hot compression tests of Ti-6Al-4V alloy in a wide temperature range of 1023-1323 K and strain rate range of 0.01-10 s-1 were conducted by a servo-hydraulic and computer-controlled Gleeble-3500 machine. In order to accurately and effectively characterize the highly nonlinear flow behaviors, support vector regression (SVR) which is a machine learning method was combined with genetic algorithm (GA) for characterizing the flow behaviors, namely, the GA-SVR. The prominent character of GA-SVR is that it with identical training parameters will keep training accuracy and prediction accuracy at a stable level in different attempts for a certain dataset. The learning abilities, generalization abilities, and modeling efficiencies of the mathematical regression model, ANN, and GA-SVR for Ti-6Al-4V alloy were detailedly compared. Comparison results show that the learning ability of the GA-SVR is stronger than the mathematical regression model. The generalization abilities and modeling efficiencies of these models were shown as follows in ascending order: the mathematical regression model < ANN < GA-SVR. The stress-strain data outside experimental conditions were predicted by the well-trained GA-SVR, which improved simulation accuracy of the load-stroke curve and can further improve the related research fields where stress-strain data play important roles, such as speculating work hardening and dynamic recovery, characterizing dynamic recrystallization evolution, and improving processing maps.
NASA Astrophysics Data System (ADS)
Gould, Jessica; Kienast, Markus; Dowd, Michael
2017-05-01
Alkenone unsaturation, expressed as the UK37' index, is closely related to growth temperature of prymnesiophytes, thus providing a reliable proxy to infer past sea surface temperatures (SSTs). Here we address two lingering uncertainties related to this SST proxy. First, calibration models developed for core-top sediments and those developed for surface suspended particulates organic material (SPOM) show systematic offsets, raising concerns regarding the transfer of the primary signal into the sedimentary record. Second, questions remain regarding changes in slope of the UK37' vs. growth temperature relationship at the temperature extremes. Based on (re)analysis of 31 new and 394 previously published SPOM UK37' data from the Atlantic Ocean, a new regression model to relate UK37' to SST is introduced; the Richards curve (Richards, 1959). This non-linear regression model provides a robust calibration of the UK37' vs. SST relationship for Atlantic SPOM samples and uniquely accounts for both the fact that the UK37' index is a proportion, and so must lie between 0 and 1, as well as for the observed reduction in slope at the warm and cold ends of the temperature range. As with prior fits of SPOM UK37' vs. SST, the Richards model is offset from traditional regression models of sedimentary UK37' vs. SST. We posit that (some of) this offset can be attributed to the seasonally and depth biased sampling of SPOM material.
An algebraic method for constructing stable and consistent autoregressive filters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harlim, John, E-mail: jharlim@psu.edu; Department of Meteorology, the Pennsylvania State University, University Park, PA 16802; Hong, Hoon, E-mail: hong@ncsu.edu
2015-02-15
In this paper, we introduce an algebraic method to construct stable and consistent univariate autoregressive (AR) models of low order for filtering and predicting nonlinear turbulent signals with memory depth. By stable, we refer to the classical stability condition for the AR model. By consistent, we refer to the classical consistency constraints of Adams–Bashforth methods of order-two. One attractive feature of this algebraic method is that the model parameters can be obtained without directly knowing any training data set as opposed to many standard, regression-based parameterization methods. It takes only long-time average statistics as inputs. The proposed method provides amore » discretization time step interval which guarantees the existence of stable and consistent AR model and simultaneously produces the parameters for the AR models. In our numerical examples with two chaotic time series with different characteristics of decaying time scales, we find that the proposed AR models produce significantly more accurate short-term predictive skill and comparable filtering skill relative to the linear regression-based AR models. These encouraging results are robust across wide ranges of discretization times, observation times, and observation noise variances. Finally, we also find that the proposed model produces an improved short-time prediction relative to the linear regression-based AR-models in forecasting a data set that characterizes the variability of the Madden–Julian Oscillation, a dominant tropical atmospheric wave pattern.« less
NASA Astrophysics Data System (ADS)
Chen, Jing; Qiu, Xiaojie; Yin, Cunyi; Jiang, Hao
2018-02-01
An efficient method to design the broadband gain-flattened Raman fiber amplifier with multiple pumps is proposed based on least squares support vector regression (LS-SVR). A multi-input multi-output LS-SVR model is introduced to replace the complicated solving process of the nonlinear coupled Raman amplification equation. The proposed approach contains two stages: offline training stage and online optimization stage. During the offline stage, the LS-SVR model is trained. Owing to the good generalization capability of LS-SVR, the net gain spectrum can be directly and accurately obtained when inputting any combination of the pump wavelength and power to the well-trained model. During the online stage, we incorporate the LS-SVR model into the particle swarm optimization algorithm to find the optimal pump configuration. The design results demonstrate that the proposed method greatly shortens the computation time and enhances the efficiency of the pump parameter optimization for Raman fiber amplifier design.
Corron, Louise; Marchal, François; Condemi, Silvana; Telmon, Norbert; Chaumoitre, Kathia; Adalian, Pascal
2018-05-31
Subadult age estimation should rely on sampling and statistical protocols capturing development variability for more accurate age estimates. In this perspective, measurements were taken on the fifth lumbar vertebrae and/or clavicles of 534 French males and females aged 0-19 years and the ilia of 244 males and females aged 0-12 years. These variables were fitted in nonparametric multivariate adaptive regression splines (MARS) models with 95% prediction intervals (PIs) of age. The models were tested on two independent samples from Marseille and the Luis Lopes reference collection from Lisbon. Models using ilium width and module, maximum clavicle length, and lateral vertebral body heights were more than 92% accurate. Precision was lower for postpubertal individuals. Integrating punctual nonlinearities of the relationship between age and the variables and dynamic prediction intervals incorporated the normal increase in interindividual growth variability (heteroscedasticity of variance) with age for more biologically accurate predictions. © 2018 American Academy of Forensic Sciences.
Azadi, Sama; Karimi-Jashni, Ayoub
2016-02-01
Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) was verified to predict mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. On the other hand, the results indicated that the ANN model, as a non-linear model, has a higher predictive accuracy when it comes to prediction of the mean SMSWG rate. As a result, in order to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate. Copyright © 2015 Elsevier Ltd. All rights reserved.
Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.
Gijsberts, Arjan; Metta, Giorgio
2013-05-01
Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
Regression Discontinuity Designs in Epidemiology
Moscoe, Ellen; Mutevedzi, Portia; Newell, Marie-Louise; Bärnighausen, Till
2014-01-01
When patients receive an intervention based on whether they score below or above some threshold value on a continuously measured random variable, the intervention will be randomly assigned for patients close to the threshold. The regression discontinuity design exploits this fact to estimate causal treatment effects. In spite of its recent proliferation in economics, the regression discontinuity design has not been widely adopted in epidemiology. We describe regression discontinuity, its implementation, and the assumptions required for causal inference. We show that regression discontinuity is generalizable to the survival and nonlinear models that are mainstays of epidemiologic analysis. We then present an application of regression discontinuity to the much-debated epidemiologic question of when to start HIV patients on antiretroviral therapy. Using data from a large South African cohort (2007–2011), we estimate the causal effect of early versus deferred treatment eligibility on mortality. Patients whose first CD4 count was just below the 200 cells/μL CD4 count threshold had a 35% lower hazard of death (hazard ratio = 0.65 [95% confidence interval = 0.45–0.94]) than patients presenting with CD4 counts just above the threshold. We close by discussing the strengths and limitations of regression discontinuity designs for epidemiology. PMID:25061922
Fong, Youyi; Yu, Xuesong
2016-01-01
Many modern serial dilution assays are based on fluorescence intensity (FI) readouts. We study optimal transformation model choice for fitting five parameter logistic curves (5PL) to FI-based serial dilution assay data. We first develop a generalized least squares-pseudolikelihood type algorithm for fitting heteroscedastic logistic models. Next we show that the 5PL and log 5PL functions can approximate each other well. We then compare four 5PL models with different choices of log transformation and variance modeling through a Monte Carlo study and real data. Our findings are that the optimal choice depends on the intended use of the fitted curves. PMID:27642502
Assessing the potential for improving S2S forecast skill through multimodel ensembling
NASA Astrophysics Data System (ADS)
Vigaud, N.; Robertson, A. W.; Tippett, M. K.; Wang, L.; Bell, M. J.
2016-12-01
Non-linear logistic regression is well suited to probability forecasting and has been successfully applied in the past to ensemble weather and climate predictions, providing access to the full probabilities distribution without any Gaussian assumption. However, little work has been done at sub-monthly lead times where relatively small re-forecast ensembles and lengths represent new challenges for which post-processing avenues have yet to be investigated. A promising approach consists in extending the definition of non-linear logistic regression by including the quantile of the forecast distribution as one of the predictors. So-called Extended Logistic Regression (ELR), which enables mutually consistent individual threshold probabilities, is here applied to ECMWF, CFSv2 and CMA re-forecasts from the S2S database in order to produce rainfall probabilities at weekly resolution. The ELR model is trained on seasonally-varying tercile categories computed for lead times of 1 to 4 weeks. It is then tested in a cross-validated manner, i.e. allowing real-time predictability applications, to produce rainfall tercile probabilities from individual weekly hindcasts that are finally combined by equal pooling. Results will be discussed over a broader North American region, where individual and MME forecasts generated out to 4 weeks lead are characterized by good probabilistic reliability but low sharpness, exhibiting systematically more skill in winter than summer.
Finding structure in data using multivariate tree boosting
Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.
2016-01-01
Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
Modeling the relationships between quality and biochemical composition of fatty liver in mule ducks.
Theron, L; Cullere, M; Bouillier-Oudot, M; Manse, H; Dalle Zotte, A; Molette, C; Fernandez, X; Vitezica, Z G
2012-09-01
The fatty liver of mule ducks (i.e., French "foie gras") is the most valuable product in duck production systems. Its quality is measured by the technological yield, which is the opposite of the fat loss during cooking. The purpose of this study was to determine whether biochemical measures of fatty liver could be used to accurately predict the technological yield (TY). Ninety-one male mule ducks were bred, overfed, and slaughtered under commercial conditions. Fatty liver weight (FLW) and biochemical variables, such as DM, lipid (LIP), and protein content (PROT), were collected. To evaluate evidence for nonlinear fat loss during cooking, we compared regression models describing linear and nonlinear relations between biochemical measures and TY. We detected significantly greater (P = 0.02) linear relation between DM and TY. Our results indicate that LIP and PROT follow a different pattern (linear) than DM and showed that LIP and PROT are nonexclusive contributing factors to TY. Other components, such as carbohydrates, other than those measured in this study, could contribute to DM. Stepwise regression for TY was performed. The traditional model with FLW was tested. The results showed that the weight of the liver is of limited value in the determination of fat loss during cooking (R(2) = 0.14). The most accurate TY prediction equation included DM (in linear and quadratic terms), FLW, and PROT (R(2) = 0.43). Biochemical measures in the fatty liver were more accurate predictors of TY than FLW. The model is useful in commercial conditions because DM, PROT, and FLW are noninvasive measures.
Principal dynamic mode analysis of neural mass model for the identification of epileptic states
NASA Astrophysics Data System (ADS)
Cao, Yuzhen; Jin, Liu; Su, Fei; Wang, Jiang; Deng, Bin
2016-11-01
The detection of epileptic seizures in Electroencephalography (EEG) signals is significant for the diagnosis and treatment of epilepsy. In this paper, in order to obtain characteristics of various epileptiform EEGs that may differentiate different states of epilepsy, the concept of Principal Dynamic Modes (PDMs) was incorporated to an autoregressive model framework. First, the neural mass model was used to simulate the required intracerebral EEG signals of various epileptiform activities. Then, the PDMs estimated from the nonlinear autoregressive Volterra models, as well as the corresponding Associated Nonlinear Functions (ANFs), were used for the modeling of epileptic EEGs. The efficient PDM modeling approach provided physiological interpretation of the system. Results revealed that the ANFs of the 1st and 2nd PDMs for the auto-regressive input exhibited evident differences among different states of epilepsy, where the ANFs of the sustained spikes' activity encountered at seizure onset or during a seizure were the most differentiable from that of the normal state. Therefore, the ANFs may be characteristics for the classification of normal and seizure states in the clinical detection of seizures and thus provide assistance for the diagnosis of epilepsy.
Multiple regression technique for Pth degree polynominals with and without linear cross products
NASA Technical Reports Server (NTRS)
Davis, J. W.
1973-01-01
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
Development of a funding, cost, and spending model for satellite projects
NASA Technical Reports Server (NTRS)
Johnson, Jesse P.
1989-01-01
The need for a predictive budget/funging model is obvious. The current models used by the Resource Analysis Office (RAO) are used to predict the total costs of satellite projects. An effort to extend the modeling capabilities from total budget analysis to total budget and budget outlays over time analysis was conducted. A statistical based and data driven methodology was used to derive and develop the model. Th budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model which would describe the historical spending patterns. This raw data consisted of dollars spent in that specific year and their 1989 dollar equivalent. This data was converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost ant the conditional statistics on project length and project cost. The modeling approach used is derived form the theory of embedded statistics which states that properly analyzed data will produce the underlying generating function. The process of funding large scale projects over extended periods of time is described by Life Cycle Cost Models (LCCM). The data was analyzed to find a model in the generic form of a LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. In order to use this model it is necessary to transform the problem from a dollar/time space to a percentage of total budget/time space. This transformation is equivalent to moving to a probability space. By using the basic rules of probability, the validity of both the optimization and the regression steps are insured. This statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.
Kasten, Florian H; Negahbani, Ehsan; Fröhlich, Flavio; Herrmann, Christoph S
2018-05-31
Amplitude modulated transcranial alternating current stimulation (AM-tACS) has been recently proposed as a possible solution to overcome the pronounced stimulation artifact encountered when recording brain activity during tACS. In theory, AM-tACS does not entail power at its modulating frequency, thus avoiding the problem of spectral overlap between brain signal of interest and stimulation artifact. However, the current study demonstrates how weak non-linear transfer characteristics inherent to stimulation and recording hardware can reintroduce spurious artifacts at the modulation frequency. The input-output transfer functions (TFs) of different stimulation setups were measured. Setups included recordings of signal-generator and stimulator outputs and M/EEG phantom measurements. 6 th -degree polynomial regression models were fitted to model the input-output TFs of each setup. The resulting TF models were applied to digitally generated AM-tACS signals to predict the frequency of spurious artifacts in the spectrum. All four setups measured for the study exhibited low-frequency artifacts at the modulation frequency and its harmonics when recording AM-tACS. Fitted TF models showed non-linear contributions significantly different from zero (all p < .05) and successfully predicted the frequency of artifacts observed in AM-signal recordings. Results suggest that even weak non-linearities of stimulation and recording hardware can lead to spurious artifacts at the modulation frequency and its harmonics. These artifacts were substantially larger than alpha-oscillations of a human subject in the MEG. Findings emphasize the need for more linear stimulation devices for AM-tACS and careful analysis procedures, taking into account low-frequency artifacts to avoid confusion with effects of AM-tACS on the brain. Copyright © 2018 Elsevier Inc. All rights reserved.
Simplified large African carnivore density estimators from track indices.
Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J
2016-01-01
The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for ß in the linear model y = αx + ß, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant ( P > 0.05). The other four models with intercept and the six models thorough origin were all significant ( P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for ß and the null hypothesis that ß = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km 2 or higher. To improve the current models, we need independent data to validate the models and data to test for non-linear relationship between track indices and true density at low densities.
Zheng, Xueying; Qin, Guoyou; Tu, Dongsheng
2017-05-30
Motivated by the analysis of quality of life data from a clinical trial on early breast cancer, we propose in this paper a generalized partially linear mean-covariance regression model for longitudinal proportional data, which are bounded in a closed interval. Cholesky decomposition of the covariance matrix for within-subject responses and generalized estimation equations are used to estimate unknown parameters and the nonlinear function in the model. Simulation studies are performed to evaluate the performance of the proposed estimation procedures. Our new model is also applied to analyze the data from the cancer clinical trial that motivated this research. In comparison with available models in the literature, the proposed model does not require specific parametric assumptions on the density function of the longitudinal responses and the probability function of the boundary values and can capture dynamic changes of time or other interested variables on both mean and covariance of the correlated proportional responses. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Neural network modeling for surgical decisions on traumatic brain injury patients.
Li, Y C; Liu, L; Chiu, W T; Jian, W S
2000-01-01
Computerized medical decision support systems have been a major research topic in recent years. Intelligent computer programs were implemented to aid physicians and other medical professionals in making difficult medical decisions. This report compares three different mathematical models for building a traumatic brain injury (TBI) medical decision support system (MDSS). These models were developed based on a large TBI patient database. This MDSS accepts a set of patient data such as the types of skull fracture, Glasgow Coma Scale (GCS), episode of convulsion and return the chance that a neurosurgeon would recommend an open-skull surgery for this patient. The three mathematical models described in this report including a logistic regression model, a multi-layer perceptron (MLP) neural network and a radial-basis-function (RBF) neural network. From the 12,640 patients selected from the database. A randomly drawn 9480 cases were used as the training group to develop/train our models. The other 3160 cases were in the validation group which we used to evaluate the performance of these models. We used sensitivity, specificity, areas under receiver-operating characteristics (ROC) curve and calibration curves as the indicator of how accurate these models are in predicting a neurosurgeon's decision on open-skull surgery. The results showed that, assuming equal importance of sensitivity and specificity, the logistic regression model had a (sensitivity, specificity) of (73%, 68%), compared to (80%, 80%) from the RBF model and (88%, 80%) from the MLP model. The resultant areas under ROC curve for logistic regression, RBF and MLP neural networks are 0.761, 0.880 and 0.897, respectively (P < 0.05). Among these models, the logistic regression has noticeably poorer calibration. This study demonstrated the feasibility of applying neural networks as the mechanism for TBI decision support systems based on clinical databases. The results also suggest that neural networks may be a better solution for complex, non-linear medical decision support systems than conventional statistical techniques such as logistic regression.
Gamst-Klaussen, Thor; Chen, Gang; Lamu, Admassu N; Olsen, Jan Abel
2016-07-01
Different health state utility (HSU) instruments produce different utilities for the same individuals, thereby compromising the intended comparability of economic evaluations of health care interventions. When developing crosswalks, previous studies have indicated nonlinear relationships. This paper inquires into the degree of nonlinearity across the four most widely used HSU-instruments and proposes exchange rates that differ depending on the severity levels of the health state utility scale. Overall, 7933 respondents from six countries, 1760 in a non-diagnosed healthy group and 6173 in seven disease groups, reported their health states using four different instruments: EQ-5D-5L, SF-6D, HUI-3 and 15D. Quantile regressions investigate the degree of nonlinear relationships between these instruments. To compare the instruments across different disease severities, we split the health state utility scale into utility intervals with 0.2 successive decrements in utility starting from perfect health at 1.00. Exchange rates (ERs) are calculated as the mean utility difference between two utility intervals on one HSU-instrument divided by the difference in mean utility on another HSU-instrument. Quantile regressions reveal significant nonlinear relationships across all four HSU-instruments. The degrees of nonlinearities differ, with a maximum degree of difference in the coefficients along the health state utility scale of 3.34 when SF-6D is regressed on EQ-5D. At the lower end of the health state utility scale, the exchange rate from SF-6D to EQ-5D is 2.11, whilst at the upper end it is 0.38. Comparisons at different utility levels illustrate the fallacy of using linear functions as crosswalks between HSU-instruments. The existence of nonlinear relationships between different HSU-instruments suggests that level-specific exchange rates should be used when converting a change in utility on the instrument used, onto a corresponding utility change had another instrument been used. Accounting for nonlinearities will increase the validity of the comparison for decision makers when faced with a choice between interventions whose calculations of QALY gains have been based on different HSU-instruments.
Friesz, Paul J.
2010-01-01
Areas contributing recharge to four well fields in two study sites in southern Rhode Island were delineated on the basis of steady-state groundwater-flow models representing average hydrologic conditions. The wells are screened in sand and gravel deposits in wetland and coastal settings. The groundwater-flow models were calibrated by inverse modeling using nonlinear regression. Summary statistics from nonlinear regression were used to evaluate the uncertainty associated with the predicted areas contributing recharge to the well fields. In South Kingstown, two United Water Rhode Island well fields are in Mink Brook watershed and near Worden Pond and extensive wetlands. Wetland deposits of peat near the well fields generally range in thickness from 5 to 8 feet. Analysis of water-level drawdowns in a piezometer screened beneath the peat during a 20-day pumping period indicated vertical leakage and a vertical hydraulic conductivity for the peat of roughly 0.01 ft/d. The simulated area contributing recharge for average withdrawals of 2,138 gallons per minute during 2003-07 extended to groundwater divides in mostly till and morainal deposits, and it encompassed 2.30 square miles. Most of a sand and gravel mining operation between the well fields was in the simulated contributing area. For the maximum pumping capacity (5,100 gallons per minute), the simulated area contributing recharge expanded to 5.54 square miles. The well fields intercepted most of the precipitation recharge in Mink Brook watershed and in an adjacent small watershed, and simulated streams ceased to flow. The simulated contributing area to the well fields included an area beneath Worden Pond and a remote, isolated area in upland till on the opposite side of Worden Pond from the well fields. About 12 percent of the pumped water was derived from Worden Pond. In Charlestown, the Central Beach Fire District and the East Beach Water Association well fields are on a small (0.85 square mile) peninsula in a coastal setting. The wells are screened in a coarse-grained, ice-proximal part of a morphosequence with saturated thicknesses generally less than 30 feet on the peninsula. The simulated area contributing recharge for the average withdrawal (16 gallons per minute) during 2003-07 was 0.018 square mile. The contributing area extended southwestward from the well fields to a simulated groundwater mound; it underlay part of a small nearby wetland, and it included isolated areas on the side of the wetland opposite the well fields. For the maximum pumping rate (230 gallons per minute), the simulated area contributing recharge (0.26 square mile) expanded in all directions; it included a till area on the peninsula, and it underlay part of a nearby pond. Because the well fields are screened in a thin aquifer, simulated groundwater traveltimes from recharge locations to the discharging wells were short: 94 percent of the traveltimes were 10 years or less, and the median traveltime was 1.3 years. Model-prediction uncertainty was evaluated using a Monte Carlo analysis; the parameter variance-covariance matrix from nonlinear regression was used to create parameter sets for the analysis. Important parameters for model prediction that could not be estimated by nonlinear regression were incorporated into the variance-covariance matrix. For the South Kingstown study site, observations provided enough information to constrain the uncertainty of these parameters within realistic ranges, but for the Charlestown study site, prior information on parameters was required. Thus, the uncertainty analysis for the South Kingstown study site was an outcome of calibrating the model to available observations, but the Charlestown study site was also dependent on information provided by the modeler. A water budget and model-fit statistical criteria were used to assess parameter sets so that prediction uncertainty was not overestimated. For the scenarios using maximum pumping rates at both study
ASR4. Anelastic Strain Recovery Analysis Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Warpinski, N.R.
ASR4 is a nonlinear least-squares regression of Anelastic Strain Recovery (ASR) data for the purpose of determining in situ stress orientations and magnitudes. ASR4 fits the viscoelastic model of Warpinski and Teufel to measure ASR data, calculates the stress orientations directly, and stress magnitudes if sufficient input data are available. The code also calculates the stress orientation using strain-rosette equations, and it calculates stress magnitudes using Blanton`s approach, assuming sufficient input data are available.
Anelastic Strain Recovery Analysis Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
ASR4 is a nonlinear least-squares regression of Anelastic Strain Recovery (ASR) data for the purpose of determining in situ stress orientations and magnitudes. ASR4 fits the viscoelastic model of Warpinski and Teufel to measure ASR data, calculates the stress orientations directly, and stress magnitudes if sufficient input data are available. The code also calculates the stress orientation using strain-rosette equations, and it calculates stress magnitudes using Blanton''s approach, assuming sufficient input data are available.
Brunton, Steven L; Brunton, Bingni W; Proctor, Joshua L; Kutz, J Nathan
2016-01-01
In this wIn this work, we explore finite-dimensional linear representations of nonlinear dynamical systems by restricting the Koopman operator to an invariant subspace spanned by specially chosen observable functions. The Koopman operator is an infinite-dimensional linear operator that evolves functions of the state of a dynamical system. Dominant terms in the Koopman expansion are typically computed using dynamic mode decomposition (DMD). DMD uses linear measurements of the state variables, and it has recently been shown that this may be too restrictive for nonlinear systems. Choosing the right nonlinear observable functions to form an invariant subspace where it is possible to obtain linear reduced-order models, especially those that are useful for control, is an open challenge. Here, we investigate the choice of observable functions for Koopman analysis that enable the use of optimal linear control techniques on nonlinear problems. First, to include a cost on the state of the system, as in linear quadratic regulator (LQR) control, it is helpful to include these states in the observable subspace, as in DMD. However, we find that this is only possible when there is a single isolated fixed point, as systems with multiple fixed points or more complicated attractors are not globally topologically conjugate to a finite-dimensional linear system, and cannot be represented by a finite-dimensional linear Koopman subspace that includes the state. We then present a data-driven strategy to identify relevant observable functions for Koopman analysis by leveraging a new algorithm to determine relevant terms in a dynamical system by ℓ1-regularized regression of the data in a nonlinear function space; we also show how this algorithm is related to DMD. Finally, we demonstrate the usefulness of nonlinear observable subspaces in the design of Koopman operator optimal control laws for fully nonlinear systems using techniques from linear optimal control.ork, we explore finite-dimensional linear representations of nonlinear dynamical systems by restricting the Koopman operator to an invariant subspace spanned by specially chosen observable functions. The Koopman operator is an infinite-dimensional linear operator that evolves functions of the state of a dynamical system. Dominant terms in the Koopman expansion are typically computed using dynamic mode decomposition (DMD). DMD uses linear measurements of the state variables, and it has recently been shown that this may be too restrictive for nonlinear systems. Choosing the right nonlinear observable functions to form an invariant subspace where it is possible to obtain linear reduced-order models, especially those that are useful for control, is an open challenge. Here, we investigate the choice of observable functions for Koopman analysis that enable the use of optimal linear control techniques on nonlinear problems. First, to include a cost on the state of the system, as in linear quadratic regulator (LQR) control, it is helpful to include these states in the observable subspace, as in DMD. However, we find that this is only possible when there is a single isolated fixed point, as systems with multiple fixed points or more complicated attractors are not globally topologically conjugate to a finite-dimensional linear system, and cannot be represented by a finite-dimensional linear Koopman subspace that includes the state. We then present a data-driven strategy to identify relevant observable functions for Koopman analysis by leveraging a new algorithm to determine relevant terms in a dynamical system by ℓ1-regularized regression of the data in a nonlinear function space; we also show how this algorithm is related to DMD. Finally, we demonstrate the usefulness of nonlinear observable subspaces in the design of Koopman operator optimal control laws for fully nonlinear systems using techniques from linear optimal control.
2009-03-01
80 100 120 140 -0 .6 -0 .4 -0 .2 0. 0 0. 2 0. 4 0. 6 l1 fit M irr or T ra ce r M od el Figure 26. l1fit of Mirror Tracer Model To ensure model... teachers are unfair to students is nonsense. b. Most students don’t realize the extent to which their grades are influenced by accidental happenings...understand how teachers arrive at the grades they give. b. There is a direct connection between how hard 1 study and the grades I get. 24. a. A
Estimating procedure times for surgeries by determining location parameters for the lognormal model.
Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H
2004-05-01
We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.
Research on On-Line Modeling of Fed-Batch Fermentation Process Based on v-SVR
NASA Astrophysics Data System (ADS)
Ma, Yongjun
The fermentation process is very complex and non-linear, many parameters are not easy to measure directly on line, soft sensor modeling is a good solution. This paper introduces v-support vector regression (v-SVR) for soft sensor modeling of fed-batch fermentation process. v-SVR is a novel type of learning machine. It can control the accuracy of fitness and prediction error by adjusting the parameter v. An on-line training algorithm is discussed in detail to reduce the training complexity of v-SVR. The experimental results show that v-SVR has low error rate and better generalization with appropriate v.
NASA Technical Reports Server (NTRS)
Kattan, Michael W.; Hess, Kenneth R.; Kattan, Michael W.
1998-01-01
New computationally intensive tools for medical survival analyses include recursive partitioning (also called CART) and artificial neural networks. A challenge that remains is to better understand the behavior of these techniques in effort to know when they will be effective tools. Theoretically they may overcome limitations of the traditional multivariable survival technique, the Cox proportional hazards regression model. Experiments were designed to test whether the new tools would, in practice, overcome these limitations. Two datasets in which theory suggests CART and the neural network should outperform the Cox model were selected. The first was a published leukemia dataset manipulated to have a strong interaction that CART should detect. The second was a published cirrhosis dataset with pronounced nonlinear effects that a neural network should fit. Repeated sampling of 50 training and testing subsets was applied to each technique. The concordance index C was calculated as a measure of predictive accuracy by each technique on the testing dataset. In the interaction dataset, CART outperformed Cox (P less than 0.05) with a C improvement of 0.1 (95% Cl, 0.08 to 0.12). In the nonlinear dataset, the neural network outperformed the Cox model (P less than 0.05), but by a very slight amount (0.015). As predicted by theory, CART and the neural network were able to overcome limitations of the Cox model. Experiments like these are important to increase our understanding of when one of these new techniques will outperform the standard Cox model. Further research is necessary to predict which technique will do best a priori and to assess the magnitude of superiority.
Smooth individual level covariates adjustment in disease mapping.
Huque, Md Hamidul; Anderson, Craig; Walton, Richard; Woolford, Samuel; Ryan, Louise
2018-05-01
Spatial models for disease mapping should ideally account for covariates measured both at individual and area levels. The newly available "indiCAR" model fits the popular conditional autoregresssive (CAR) model by accommodating both individual and group level covariates while adjusting for spatial correlation in the disease rates. This algorithm has been shown to be effective but assumes log-linear associations between individual level covariates and outcome. In many studies, the relationship between individual level covariates and the outcome may be non-log-linear, and methods to track such nonlinearity between individual level covariate and outcome in spatial regression modeling are not well developed. In this paper, we propose a new algorithm, smooth-indiCAR, to fit an extension to the popular conditional autoregresssive model that can accommodate both linear and nonlinear individual level covariate effects while adjusting for group level covariates and spatial correlation in the disease rates. In this formulation, the effect of a continuous individual level covariate is accommodated via penalized splines. We describe a two-step estimation procedure to obtain reliable estimates of individual and group level covariate effects where both individual and group level covariate effects are estimated separately. This distributed computing framework enhances its application in the Big Data domain with a large number of individual/group level covariates. We evaluate the performance of smooth-indiCAR through simulation. Our results indicate that the smooth-indiCAR method provides reliable estimates of all regression and random effect parameters. We illustrate our proposed methodology with an analysis of data on neutropenia admissions in New South Wales (NSW), Australia. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Data-driven discovery of partial differential equations.
Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan
2017-04-01
We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.
NASA Astrophysics Data System (ADS)
Xin, Pei; Wang, Shen S. J.; Shen, Chengji; Zhang, Zeyu; Lu, Chunhui; Li, Ling
2018-03-01
Shallow groundwater interacts strongly with surface water across a quarter of global land area, affecting significantly the terrestrial eco-hydrology and biogeochemistry. We examined groundwater behavior subjected to unimodal impulse and irregular surface water fluctuations, combining physical experiments, numerical simulations, and functional data analysis. Both the experiments and numerical simulations demonstrated a damped and delayed response of groundwater table to surface water fluctuations. To quantify this hysteretic shallow groundwater behavior, we developed a regression model with the Gamma distribution functions adopted to account for the dependence of groundwater behavior on antecedent surface water conditions. The regression model fits and predicts well the groundwater table oscillations resulting from propagation of irregular surface water fluctuations in both laboratory and large-scale aquifers. The coefficients of the Gamma distribution function vary spatially, reflecting the hysteresis effect associated with increased amplitude damping and delay as the fluctuation propagates. The regression model, in a relatively simple functional form, has demonstrated its capacity of reproducing high-order nonlinear effects that underpin the surface water and groundwater interactions. The finding has important implications for understanding and predicting shallow groundwater behavior and associated biogeochemical processes, and will contribute broadly to studies of groundwater-dependent ecology and biogeochemistry.
Deriving the Regression Equation without Using Calculus
ERIC Educational Resources Information Center
Gordon, Sheldon P.; Gordon, Florence S.
2004-01-01
Probably the one "new" mathematical topic that is most responsible for modernizing courses in college algebra and precalculus over the last few years is the idea of fitting a function to a set of data in the sense of a least squares fit. Whether it be simple linear regression or nonlinear regression, this topic opens the door to applying the…
Factors affecting yearly and monthly visits to Taipei Zoo
NASA Astrophysics Data System (ADS)
Su, Ai-Tsen; Lin, Yann-Jou
2018-02-01
This study investigated factors affecting yearly and monthly numbers of visits to Taipei Zoo. Both linear and nonlinear regression models were used to estimate yearly visits. The results of both models showed that the "opening effect" and "animal star effect" had a significantly positive effect on yearly visits, while a SARS outbreak had a negative effect. The number of years had a significant influence on yearly visits. Results showed that the nonlinear model had better explanatory power and fitted the variations of visits better. Results of monthly model showed that monthly visits were significantly influenced by time fluctuations, weather conditions, and the animal star effect. Chinese New Year, summer vacation, numbers of holidays, and animal star exhibitions increased the number of monthly visits, while the number of days with temperatures at or below 15 °C, the number of days with temperatures at or above 30 °C, and the number of rainy days had significantly negative effects. Furthermore, the model of monthly visits showed that the animal star effect could last for over two quarters. The results of this study clarify the factors affecting visits to an outdoor recreation site and confirm the importance of meteorological factors to recreation use.
Haem, Elham; Harling, Kajsa; Ayatollahi, Seyyed Mohammad Taghi; Zare, Najaf; Karlsson, Mats O
2017-02-01
One important aim in population pharmacokinetics (PK) and pharmacodynamics is identification and quantification of the relationships between the parameters and covariates. Lasso has been suggested as a technique for simultaneous estimation and covariate selection. In linear regression, it has been shown that Lasso possesses no oracle properties, which means it asymptotically performs as though the true underlying model was given in advance. Adaptive Lasso (ALasso) with appropriate initial weights is claimed to possess oracle properties; however, it can lead to poor predictive performance when there is multicollinearity between covariates. This simulation study implemented a new version of ALasso, called adjusted ALasso (AALasso), to take into account the ratio of the standard error of the maximum likelihood (ML) estimator to the ML coefficient as the initial weight in ALasso to deal with multicollinearity in non-linear mixed-effect models. The performance of AALasso was compared with that of ALasso and Lasso. PK data was simulated in four set-ups from a one-compartment bolus input model. Covariates were created by sampling from a multivariate standard normal distribution with no, low (0.2), moderate (0.5) or high (0.7) correlation. The true covariates influenced only clearance at different magnitudes. AALasso, ALasso and Lasso were compared in terms of mean absolute prediction error and error of the estimated covariate coefficient. The results show that AALasso performed better in small data sets, even in those in which a high correlation existed between covariates. This makes AALasso a promising method for covariate selection in nonlinear mixed-effect models.
NASA Astrophysics Data System (ADS)
Hajigeorgiou, Photos G.
2016-12-01
An analytical model for the diatomic potential energy function that was recently tested as a universal function (Hajigeorgiou, 2010) has been further modified and tested as a suitable model for direct-potential-fit analysis. Applications are presented for the ground electronic states of three diatomic molecules: oxygen, carbon monoxide, and hydrogen fluoride. The adjustable parameters of the extended Lennard-Jones potential model are determined through nonlinear regression by fits to calculated rovibrational energy term values or experimental spectroscopic line positions. The model is shown to lead to reliable, compact and simple representations for the potential energy functions of these systems and could therefore be classified as a suitable and attractive model for direct-potential-fit analysis.
Model development for MODIS thermal band electronic cross-talk
NASA Astrophysics Data System (ADS)
Chang, Tiejun; Wu, Aisheng; Geng, Xu; Li, Yonghong; Brinkmann, Jake; Keller, Graziela; Xiong, Xiaoxiong (Jack)
2016-10-01
MODerate-resolution Imaging Spectroradiometer (MODIS) has 36 bands. Among them, 16 thermal emissive bands covering a wavelength range from 3.8 to 14.4 μm. After 16 years on-orbit operation, the electronic crosstalk of a few Terra MODIS thermal emissive bands develop substantial issues which cause biases in the EV brightness temperature measurements and surface feature contamination. The crosstalk effects on band 27 with center wavelength at 6.7 μm and band 29 at 8.5 μm increased significantly in recent years, affecting downstream products such as water vapor and cloud mask. The crosstalk issue can be observed from nearly monthly scheduled lunar measurements, from which the crosstalk coefficients can be derived. Most of MODIS thermal bands are saturated at moon surface temperatures and the development of an alternative approach is very helpful for verification. In this work, a physical model was developed to assess the crosstalk impact on calibration as well as in Earth view brightness temperature retrieval. This model was applied to Terra MODIS band 29 empirically for correction of Earth brightness temperature measurements. In the model development, the detector nonlinear response is considered. The impacts of the electronic crosstalk are assessed in two steps. The first step consists of determining the impact on calibration using the on-board blackbody (BB). Due to the detector nonlinear response and large background signal, both linear and nonlinear coefficients are affected by the crosstalk from sending bands. The crosstalk impact on calibration coefficients was calculated. The second step is to calculate the effects on the Earth view brightness temperature retrieval. The effects include those from affected calibration coefficients and the contamination of Earth view measurements. This model links the measurement bias with crosstalk coefficients, detector nonlinearity, and the ratio of Earth measurements between the sending and receiving bands. The correction of the electronic crosstalk can be implemented empirically from the processed bias at different brightness temperature. The implementation can be done through two approaches. As routine calibration assessment for thermal infrared bands, the trending over select Earth scenes is processed for all the detectors in a band and the band averaged bias is derived for certain time. In this case, the correction of an affected band can be made using the regression of the model with band averaged bias and then corrections of detector differences are applied. The second approach requires the trending for individual detectors and the bias for each detector is used for regression with the model. A test using the first approach was made for Terra MODIS band 29 with the biases derived from long-term trending of sea surface temperature and Dome-C surface temperature.
Teutsch, T; Mesch, M; Giessen, H; Tarin, C
2015-01-01
In this contribution, a method to select discrete wavelengths that allow an accurate estimation of the glucose concentration in a biosensing system based on metamaterials is presented. The sensing concept is adapted to the particular application of ophthalmic glucose sensing by covering the metamaterial with a glucose-sensitive hydrogel and the sensor readout is performed optically. Due to the fact that in a mobile context a spectrometer is not suitable, few discrete wavelengths must be selected to estimate the glucose concentration. The developed selection methods are based on nonlinear support vector regression (SVR) models. Two selection methods are compared and it is shown that wavelengths selected by a sequential forward feature selection algorithm achieves an estimation improvement. The presented method can be easily applied to different metamaterial layouts and hydrogel configurations.
NASA Astrophysics Data System (ADS)
Aye, S. A.; Heyns, P. S.
2017-02-01
This paper proposes an optimal Gaussian process regression (GPR) for the prediction of remaining useful life (RUL) of slow speed bearings based on a novel degradation assessment index obtained from acoustic emission signal. The optimal GPR is obtained from an integration or combination of existing simple mean and covariance functions in order to capture the observed trend of the bearing degradation as well the irregularities in the data. The resulting integrated GPR model provides an excellent fit to the data and improves over the simple GPR models that are based on simple mean and covariance functions. In addition, it achieves a low percentage error prediction of the remaining useful life of slow speed bearings. These findings are robust under varying operating conditions such as loading and speed and can be applied to nonlinear and nonstationary machine response signals useful for effective preventive machine maintenance purposes.
NASA Astrophysics Data System (ADS)
Dorband, J. E.; Tilak, N.; Radov, A.
2016-12-01
In this paper, a classical computer implementation of RBM is compared to a quantum annealing based RBM running on a D-Wave 2X (an adiabatic quantum computer). The codes for both are essentially identical. Only a flag is set to change the activation function from a classically computed logistic function to the D-Wave. To obtain greater understanding of the behavior of the D-Wave, a study of the stochastic properties of a virtual qubit (a 12 qubit chain) and a cell of qubits (an 8 qubit cell) was performed. We will present the results of comparing the D-Wave implementation with a theoretically errorless adiabatic quantum computer. The main purpose of this study is to develop a generic RBM regression tool in order to infer CO2 fluxes from the NASA satellite OCO-2 observed CO2 concentrations and predicted atmospheric states using regression models. The carbon fluxes will then be assimilated into a land surface model to predict the Net Ecosystem Exchange at globally distributed regional sites.
EPA Office of Water (OW): 2002 SPARROW Total NP (Catchments)
SPARROW (SPAtially Referenced Regressions On Watershed attributes) is a watershed modeling tool with output that allows the user to interpret water quality monitoring data at the regional and sub-regional scale. The model relates in-stream water-quality measurements to spatially referenced characteristics of watersheds, including pollutant sources and environmental factors that affect rates of pollutant delivery to streams from the land and aquatic, in-stream processing . The core of the model consists of a nonlinear regression equation describing the non-conservative transport of contaminants from point and non-point (or ??diffuse??) sources on land to rivers and through the stream and river network. SPARROW estimates contaminant concentrations, loads (or ??mass,?? which is the product of concentration and streamflow), and yields in streams (mass of nitrogen and of phosphorus entering a stream per acre of land). It empirically estimates the origin and fate of contaminants in streams and receiving bodies, and quantifies uncertainties in model predictions. The model predictions are illustrated through detailed maps that provide information about contaminant loadings and source contributions at multiple scales for specific stream reaches, basins, or other geographic areas.
Kamarianakis, Yiannis; Gao, H Oliver
2010-02-15
Collecting and analyzing high frequency emission measurements has become very usual during the past decade as significantly more information with respect to formation conditions can be collected than from regulated bag measurements. A challenging issue for researchers is the accurate time-alignment between tailpipe measurements and engine operating variables. An alignment procedure should take into account both the reaction time of the analyzers and the dynamics of gas transport in the exhaust and measurement systems. This paper discusses a statistical modeling framework that compensates for variable exhaust transport delay while relating tailpipe measurements with engine operating covariates. Specifically it is shown that some variants of the smooth transition regression model allow for transport delays that vary smoothly as functions of the exhaust flow rate. These functions are characterized by a pair of coefficients that can be estimated via a least-squares procedure. The proposed models can be adapted to encompass inherent nonlinearities that were implicit in previous instantaneous emissions modeling efforts. This article describes the methodology and presents an illustrative application which uses data collected from a diesel bus under real-world driving conditions.
NASA Astrophysics Data System (ADS)
Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia
2017-06-01
GWOLR model combines geographically weighted regression (GWR) and (ordinal logistic reression) OLR models. Its parameter estimation employs maximum likelihood estimation. Such parameter estimation, however, yields difficult-to-solve system of nonlinear equations, and therefore numerical approximation approach is required. The iterative approximation approach, in general, uses Newton-Raphson (NR) method. The NR method has a disadvantage—its Hessian matrix is always the second derivatives of each iteration so it does not always produce converging results. With regard to this matter, NR model is modified by substituting its Hessian matrix into Fisher information matrix, which is termed Fisher scoring (FS). The present research seeks to determine GWOLR model parameter estimation using Fisher scoring method and apply the estimation on data of the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities give the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of the sufferers, IR category of DHF in both villages can be determined.
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
Data-driven non-Markovian closure models
NASA Astrophysics Data System (ADS)
Kondrashov, Dmitri; Chekroun, Mickaël D.; Ghil, Michael
2015-03-01
This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori-Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The challenges here include the rarity of strange attractors in the model's parameter space and the existence of multiple attractor basins with fractal boundaries. The positivity constraint on the solutions' components replaces here the quadratic-energy-preserving constraint of fluid-flow problems and it successfully prevents blow-up.
NASA Astrophysics Data System (ADS)
Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.
2017-05-01
Selection in plant breeding could be more effective and more efficient if it is based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most of GP models used linear methods that ignore effects of interaction among genes and effects of higher order nonlinearities. Deep belief network (DBN), one of the architectural in deep learning methods, is able to model data in high level of abstraction that involves nonlinearities effects of the data. This study implemented DBN for developing a GP model utilizing whole-genome Single Nucleotide Polymorphisms (SNPs) as data for training and testing. The case study was a set of traits in maize. The maize dataset was acquisitioned from CIMMYT’s (International Maize and Wheat Improvement Center) Global Maize program. Based on Pearson correlation, DBN is outperformed than other methods, kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), best linear unbiased predictor (BLUP), in case allegedly non-additive traits. DBN achieves correlation of 0.579 within -1 to 1 range.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mohr, C.L.; Pankaskie, P.J.; Heasler, P.G.
Reactor fuel failure data sets in the form of initial power (P/sub i/), final power (P/sub f/), transient increase in power (..delta..P), and burnup (Bu) were obtained for pressurized heavy water reactors (PHWRs), boiling water reactors (BWRs), and pressurized water reactors (PWRs). These data sets were evaluated and used as the basis for developing two predictive fuel failure models, a graphical concept called the PCI-OGRAM, and a nonlinear regression based model called PROFIT. The PCI-OGRAM is an extension of the FUELOGRAM developed by AECL. It is based on a critical threshold concept for stress dependent stress corrosion cracking. The PROFITmore » model, developed at Pacific Northwest Laboratory, is the result of applying standard statistical regression methods to the available PCI fuel failure data and an analysis of the environmental and strain rate dependent stress-strain properties of the Zircaloy cladding.« less
NASA Technical Reports Server (NTRS)
Carder, K. L.; Lee, Z. P.; Marra, John; Steward, R. G.; Perry, M. J.
1995-01-01
The quantum yield of photosynthesis (mol C/mol photons) was calculated at six depths for the waters of the Marine Light-Mixed Layer (MLML) cruise of May 1991. As there were photosynthetically available radiation (PAR) but no spectral irradiance measurements for the primary production incubations, three ways are presented here for the calculation of the absorbed photons (AP) by phytoplankton for the purpose of calculating phi. The first is based on a simple, nonspectral model; the second is based on a nonlinear regression using measured PAR values with depth; and the third is derived through remote sensing measurements. We show that the results of phi calculated using the nonlinear regreesion method and those using remote sensing are in good agreement with each other, and are consistent with the reported values of other studies. In deep waters, however, the simple nonspectral model may cause quantum yield values much higher than theoretically possible.
Hot money and China's stock market volatility: Further evidence using the GARCH-MIDAS model
NASA Astrophysics Data System (ADS)
Wei, Yu; Yu, Qianwen; Liu, Jing; Cao, Yang
2018-02-01
This paper investigates the influence of hot money on the return and volatility of the Chinese stock market using a nonlinear Granger causality test and a new GARCH-class model based on mixed data sampling regression (GARCH-MIDAS). The empirical results suggest that no linear or nonlinear causality exists between the growth rate of hot money and the Chinese stock market return, implying that the Chinese stock market is not driven by hot money and vice versa. However, hot money has a significant positive impact on the long-term volatility of the Chinese stock market. Furthermore, the dependence between the long-term volatility caused by hot money and the total volatility of the Chinese stock market is time-variant, indicating that huge volatilities in the stock market are not always triggered by international speculation capital flow and that Chinese authorities should further focus on more systemic reforms in the trading rules and on effectively regulating the stock market.
Robust estimation for partially linear models with large-dimensional covariates
Zhu, LiPing; Li, RunZe; Cui, HengJian
2014-01-01
We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a noncon-cave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087
A robust nonlinear filter for image restoration.
Koivunen, V
1995-01-01
A class of nonlinear regression filters based on robust estimation theory is introduced. The goal of the filtering is to recover a high-quality image from degraded observations. Models for desired image structures and contaminating processes are employed, but deviations from strict assumptions are allowed since the assumptions on signal and noise are typically only approximately true. The robustness of filters is usually addressed only in a distributional sense, i.e., the actual error distribution deviates from the nominal one. In this paper, the robustness is considered in a broad sense since the outliers may also be due to inappropriate signal model, or there may be more than one statistical population present in the processing window, causing biased estimates. Two filtering algorithms minimizing a least trimmed squares criterion are provided. The design of the filters is simple since no scale parameters or context-dependent threshold values are required. Experimental results using both real and simulated data are presented. The filters effectively attenuate both impulsive and nonimpulsive noise while recovering the signal structure and preserving interesting details.
Robust estimation for partially linear models with large-dimensional covariates.
Zhu, LiPing; Li, RunZe; Cui, HengJian
2013-10-01
We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a noncon-cave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of [Formula: see text], where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.
Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M
2017-04-01
A carboxy methyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C 14 -C 18 ) amine (DMDA)) composite was prepared by solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-Ray diffraction spectroscopy (XRD) and scanning electron microscopy (SEM). The composite was utilized for its pesticide sorption efficiency for atrazine, imidacloprid and thiamethoxam. The sorption data was fitted into Langmuir and Freundlich isotherms using linear and non linear methods. The linear regression method suggested best fitting of sorption data into Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by non linear regression method. The non linear error analysis suggested that the sorption data fitted well into Langmuir model rather than in Freundlich model. The maximum sorption capacity, Q 0 (μg/g) was given by imidacloprid (2000) followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the degree of determination of linear regression alone cannot be used for comparing the best fitting of Langmuir and Freundlich models and non-linear error analysis needs to be done to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
Zhao, Zeng-hui; Wang, Wei-ming; Gao, Xin; Yan, Ji-xing
2013-01-01
According to the geological characteristics of Xinjiang Ili mine in western area of China, a physical model of interstratified strata composed of soft rock and hard coal seam was established. Selecting the tunnel position, deformation modulus, and strength parameters of each layer as influencing factors, the sensitivity coefficient of roadway deformation to each parameter was firstly analyzed based on a Mohr-Columb strain softening model and nonlinear elastic-plastic finite element analysis. Then the effect laws of influencing factors which showed high sensitivity were further discussed. Finally, a regression model for the relationship between roadway displacements and multifactors was obtained by equivalent linear regression under multiple factors. The results show that the roadway deformation is highly sensitive to the depth of coal seam under the floor which should be considered in the layout of coal roadway; deformation modulus and strength of coal seam and floor have a great influence on the global stability of tunnel; on the contrary, roadway deformation is not sensitive to the mechanical parameters of soft roof; roadway deformation under random combinations of multi-factors can be deduced by the regression model. These conclusions provide theoretical significance to the arrangement and stability maintenance of coal roadway. PMID:24459447
Dunham, J.B.; Cade, B.S.; Terrell, J.W.
2002-01-01
We used regression quantiles to model potentially limiting relationships between the standing crop of cutthroat trout Oncorhynchus clarki and measures of stream channel morphology. Regression quantile models indicated that variation in fish density was inversely related to the width:depth ratio of streams but not to stream width or depth alone. The spatial and temporal stability of model predictions were examined across years and streams, respectively. Variation in fish density with width:depth ratio (10th-90th regression quantiles) modeled for streams sampled in 1993-1997 predicted the variation observed in 1998-1999, indicating similar habitat relationships across years. Both linear and nonlinear models described the limiting relationships well, the latter performing slightly better. Although estimated relationships were transferable in time, results were strongly dependent on the influence of spatial variation in fish density among streams. Density changes with width:depth ratio in a single stream were responsible for the significant (P < 0.10) negative slopes estimated for the higher quantiles (>80th). This suggests that stream-scale factors other than width:depth ratio play a more direct role in determining population density. Much of the variation in densities of cutthroat trout among streams was attributed to the occurrence of nonnative brook trout Salvelinus fontinalis (a possible competitor) or connectivity to migratory habitats. Regression quantiles can be useful for estimating the effects of limiting factors when ecological responses are highly variable, but our results indicate that spatiotemporal variability in the data should be explicitly considered. In this study, data from individual streams and stream-specific characteristics (e.g., the occurrence of nonnative species and habitat connectivity) strongly affected our interpretation of the relationship between width:depth ratio and fish density.
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
NASA Astrophysics Data System (ADS)
Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.
2014-09-01
Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed-pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (Radj2=0.632, the root-mean-squared error of estimated AGB was 13.3 Mg ha-1 (or 37.7%), resulting from cross-validation), rather than other combinations of the above cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.
Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.
2008-01-01
Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blais, AR; Dekaban, M; Lee, T-Y
2014-08-15
Quantitative analysis of dynamic positron emission tomography (PET) data usually involves minimizing a cost function with nonlinear regression, wherein the choice of starting parameter values and the presence of local minima affect the bias and variability of the estimated kinetic parameters. These nonlinear methods can also require lengthy computation time, making them unsuitable for use in clinical settings. Kinetic modeling of PET aims to estimate the rate parameter k{sub 3}, which is the binding affinity of the tracer to a biological process of interest and is highly susceptible to noise inherent in PET image acquisition. We have developed linearized kineticmore » models for kinetic analysis of dynamic contrast enhanced computed tomography (DCE-CT)/PET imaging, including a 2-compartment model for DCE-CT and a 3-compartment model for PET. Use of kinetic parameters estimated from DCE-CT can stabilize the kinetic analysis of dynamic PET data, allowing for more robust estimation of k{sub 3}. Furthermore, these linearized models are solved with a non-negative least squares algorithm and together they provide other advantages including: 1) only one possible solution and they do not require a choice of starting parameter values, 2) parameter estimates are comparable in accuracy to those from nonlinear models, 3) significantly reduced computational time. Our simulated data show that when blood volume and permeability are estimated with DCE-CT, the bias of k{sub 3} estimation with our linearized model is 1.97 ± 38.5% for 1,000 runs with a signal-to-noise ratio of 10. In summary, we have developed a computationally efficient technique for accurate estimation of k{sub 3} from noisy dynamic PET data.« less
Kumar, K Vasanth; Porkodi, K; Rocha, F
2008-01-15
A comparison of linear and non-linear regression method in selecting the optimum isotherm was made to the experimental equilibrium data of basic red 9 sorption by activated carbon. The r(2) was used to select the best fit linear theoretical isotherm. In the case of non-linear regression method, six error functions namely coefficient of determination (r(2)), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), the average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS) were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. Non-linear regression was found to be a better way to obtain the parameters involved in the isotherms and also the optimum isotherm. For two parameter isotherm, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of three parameter isotherm, r(2) was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K(2) was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm.
Non-stationary hydrologic frequency analysis using B-spline quantile regression
NASA Astrophysics Data System (ADS)
Nasri, B.; Bouezmarni, T.; St-Hilaire, A.; Ouarda, T. B. M. J.
2017-11-01
Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic and water resources systems under the assumption of stationarity. However, with increasing evidence of climate change, it is possible that the assumption of stationarity, which is prerequisite for traditional frequency analysis and hence, the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extremes based on B-Spline quantile regression which allows to model data in the presence of non-stationarity and/or dependence on covariates with linear and non-linear dependence. A Markov Chain Monte Carlo (MCMC) algorithm was used to estimate quantiles and their posterior distributions. A coefficient of determination and Bayesian information criterion (BIC) for quantile regression are used in order to select the best model, i.e. for each quantile, we choose the degree and number of knots of the adequate B-spline quantile regression model. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in the variable of interest and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for an annual maximum and minimum discharge with high annual non-exceedance probabilities.
MMB-4 Inhibition of Aceylcholinesterase Is Similar across Species
2014-11-01
version 5.4). An IC50 value was determined for AChE from each animal species by fitting the percent of AChE activity with respect to MMB 4 concentration...in GraphPad Prism (version 5) using a nonlinear regression dose response model for inhibition (normalized response with variable slope). Assessing the...Therefore, AChE activity and inhibition studies were carried out at 435 nm to reduce interference from MMB 4. Comparison of IC50 Values for MMB 4 with AChE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seong W. Lee
2004-10-01
The systematic tests of the gasifier simulator on the clean thermocouple were completed in this reporting period. Within the systematic tests on the clean thermocouple, five (5) factors were considered as the experimental parameters including air flow rate, water flow rate, fine dust particle amount, ammonia addition and high/low frequency device (electric motor). The fractional factorial design method was used in the experiment design with sixteen (16) data sets of readings. Analysis of Variances (ANOVA) was applied to the results from systematic tests. The ANOVA results show that the un-balanced motor vibration frequency did not have the significant impact onmore » the temperature changes in the gasifier simulator. For the fine dust particles testing, the amount of fine dust particles has significant impact to the temperature measurements in the gasifier simulator. The effects of the air and water on the temperature measurements show the same results as reported in the previous report. The ammonia concentration was included as an experimental parameter for the reducing environment in this reporting period. The ammonia concentration does not seem to be a significant factor on the temperature changes. The linear regression analysis was applied to the temperature reading with five (5) factors. The accuracy of the linear regression is relatively low, which is less than 10% accuracy. Nonlinear regression was also conducted to the temperature reading with the same factors. Since the experiments were designed in two (2) levels, the nonlinear regression is not very effective with the dataset (16 readings). An extra central point test was conducted. With the data of the center point testing, the accuracy of the nonlinear regression is much better than the linear regression.« less
Fast estimation of diffusion tensors under Rician noise by the EM algorithm.
Liu, Jia; Gasbarra, Dario; Railavo, Juha
2016-01-15
Diffusion tensor imaging (DTI) is widely used to characterize, in vivo, the white matter of the central nerve system (CNS). This biological tissue contains much anatomic, structural and orientational information of fibers in human brain. Spectral data from the displacement distribution of water molecules located in the brain tissue are collected by a magnetic resonance scanner and acquired in the Fourier domain. After the Fourier inversion, the noise distribution is Gaussian in both real and imaginary parts and, as a consequence, the recorded magnitude data are corrupted by Rician noise. Statistical estimation of diffusion leads a non-linear regression problem. In this paper, we present a fast computational method for maximum likelihood estimation (MLE) of diffusivities under the Rician noise model based on the expectation maximization (EM) algorithm. By using data augmentation, we are able to transform a non-linear regression problem into the generalized linear modeling framework, reducing dramatically the computational cost. The Fisher-scoring method is used for achieving fast convergence of the tensor parameter. The new method is implemented and applied using both synthetic and real data in a wide range of b-amplitudes up to 14,000s/mm(2). Higher accuracy and precision of the Rician estimates are achieved compared with other log-normal based methods. In addition, we extend the maximum likelihood (ML) framework to the maximum a posteriori (MAP) estimation in DTI under the aforementioned scheme by specifying the priors. We will describe how close numerically are the estimators of model parameters obtained through MLE and MAP estimation. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Chang, Ni-Bin; Daranpob, Ammarin; Yang, Y. Jeffrey; Jin, Kang-Ren
2009-09-01
In the remote sensing field, a frequently recurring question is: Which computational intelligence or data mining algorithms are most suitable for the retrieval of essential information given that most natural systems exhibit very high non-linearity. Among potential candidates might be empirical regression, neural network model, support vector machine, genetic algorithm/genetic programming, analytical equation, etc. This paper compares three types of data mining techniques, including multiple non-linear regression, artificial neural networks, and genetic programming, for estimating multi-temporal turbidity changes following hurricane events at Lake Okeechobee, Florida. This retrospective analysis aims to identify how the major hurricanes impacted the water quality management in 2003-2004. The Moderate Resolution Imaging Spectroradiometer (MODIS) Terra 8-day composite imageries were used to retrieve the spatial patterns of turbidity distributions for comparison against the visual patterns discernible in the in-situ observations. By evaluating four statistical parameters, the genetic programming model was finally selected as the most suitable data mining tool for classification in which the MODIS band 1 image and wind speed were recognized as the major determinants by the model. The multi-temporal turbidity maps generated before and after the major hurricane events in 2003-2004 showed that turbidity levels were substantially higher after hurricane episodes. The spatial patterns of turbidity confirm that sediment-laden water travels to the shore where it reduces the intensity of the light necessary to submerged plants for photosynthesis. This reduction results in substantial loss of biomass during the post-hurricane period.
Incremental online learning in high dimensions.
Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan
2005-12-01
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of-possibly redundant-inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
Bayesian structured additive regression modeling of epidemic data: application to cholera
2012-01-01
Background A significant interest in spatial epidemiology lies in identifying associated risk factors which enhances the risk of infection. Most studies, however, make no, or limited use of the spatial structure of the data, as well as possible nonlinear effects of the risk factors. Methods We develop a Bayesian Structured Additive Regression model for cholera epidemic data. Model estimation and inference is based on fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulations. The model is applied to cholera epidemic data in the Kumasi Metropolis, Ghana. Proximity to refuse dumps, density of refuse dumps, and proximity to potential cholera reservoirs were modeled as continuous functions; presence of slum settlers and population density were modeled as fixed effects, whereas spatial references to the communities were modeled as structured and unstructured spatial effects. Results We observe that the risk of cholera is associated with slum settlements and high population density. The risk of cholera is equal and lower for communities with fewer refuse dumps, but variable and higher for communities with more refuse dumps. The risk is also lower for communities distant from refuse dumps and potential cholera reservoirs. The results also indicate distinct spatial variation in the risk of cholera infection. Conclusion The study highlights the usefulness of Bayesian semi-parametric regression model analyzing public health data. These findings could serve as novel information to help health planners and policy makers in making effective decisions to control or prevent cholera epidemics. PMID:22866662
Bramness, Jørgen G; Walby, Fredrik A; Morken, Gunnar; Røislien, Jo
2015-08-01
Seasonal variation in the number of suicides has long been acknowledged. It has been suggested that this seasonality has declined in recent years, but studies have generally used statistical methods incapable of confirming this. We examined all suicides occurring in Norway during 1969-2007 (more than 20,000 suicides in total) to establish whether seasonality decreased over time. Fitting of additive Fourier Poisson time-series regression models allowed for formal testing of a possible linear decrease in seasonality, or a reduction at a specific point in time, while adjusting for a possible smooth nonlinear long-term change without having to categorize time into discrete yearly units. The models were compared using Akaike's Information Criterion and analysis of variance. A model with a seasonal pattern was significantly superior to a model without one. There was a reduction in seasonality during the period. Both the model assuming a linear decrease in seasonality and the model assuming a change at a specific point in time were both superior to a model assuming constant seasonality, thus confirming by formal statistical testing that the magnitude of the seasonality in suicides has diminished. The additive Fourier Poisson time-series regression model would also be useful for studying other temporal phenomena with seasonal components. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Empirical and semi-analytical models for predicting peak outflows caused by embankment dam failures
NASA Astrophysics Data System (ADS)
Wang, Bo; Chen, Yunliang; Wu, Chao; Peng, Yong; Song, Jiajun; Liu, Wenjun; Liu, Xin
2018-07-01
Prediction of peak discharge of floods has attracted great attention for researchers and engineers. In present study, nine typical nonlinear mathematical models are established based on database of 40 historical dam failures. The first eight models that were developed with a series of regression analyses are purely empirical, while the last one is a semi-analytical approach that was derived from an analytical solution of dam-break floods in a trapezoidal channel. Water depth above breach invert (Hw), volume of water stored above breach invert (Vw), embankment length (El), and average embankment width (Ew) are used as independent variables to develop empirical formulas of estimating the peak outflow from breached embankment dams. It is indicated from the multiple regression analysis that a function using the former two variables (i.e., Hw and Vw) produce considerably more accurate results than that using latter two variables (i.e., El and Ew). It is shown that the semi-analytical approach works best in terms of both prediction accuracy and uncertainty, and the established empirical models produce considerably reasonable results except the model only using El. Moreover, present models have been compared with other models available in literature for estimating peak discharge.
NASA Astrophysics Data System (ADS)
Ermakov, Ilya; Crucifix, Michel; Munhoven, Guy
2013-04-01
Complex climate models require high computational burden. However, computational limitations may be avoided by using emulators. In this work we present several approaches for dynamical emulation (also called metamodelling) of the Multi-Box Model (MBM) coupled to the Model of Early Diagenesis in the Upper Sediment A (MEDUSA) that simulates the carbon cycle of the ocean and atmosphere [1]. We consider two experiments performed on the MBM-MEDUSA that explore the Basin-to-Shelf Transfer (BST) dynamics. In both experiments the sea level is varied according to a paleo sea level reconstruction. Such experiments are interesting because the BST is an important cause of the CO2 variation and the dynamics is potentially nonlinear. The output that we are interested in is the variation of the carbon dioxide partial pressure in the atmosphere over the Pleistocene. The first experiment considers that the BST is fixed constant during the simulation. In the second experiment the BST is interactively adjusted according to the sea level, since the sea level is the primary control of the growth and decay of coral reefs and other shelf carbon reservoirs. The main aim of the present contribution is to create a metamodel of the MBM-MEDUSA using the Dynamic Emulation Modelling methodology [2] and compare the results obtained using linear and non-linear methods. The first step in the emulation methodology used in this work is to identify the structure of the metamodel. In order to select an optimal approach for emulation we compare the results of identification obtained by the simple linear and more complex nonlinear models. In order to identify the metamodel in the first experiment the simple linear regression and the least-squares method is sufficient to obtain a 99,9% fit between the temporal outputs of the model and the metamodel. For the second experiment the MBM's output is highly nonlinear. In this case we apply nonlinear models, such as, NARX, Hammerstein model, and an 'ad-hoc' switching model. After the identification we perform the parameter mapping using spline interpolation and validate the emulator on a new set of parameters. References: [1] G. Munhoven, "Glacial-interglacial rain ratio changes: Implications for atmospheric CO2 and ocean-sediment interaction," Deep-Sea Res Pt II, vol. 54, pp. 722-746, 2007. [2] A. Castelletti et al., "A general framework for Dynamic Emulation Modelling in environmental problems," Environ Modell Softw, vol. 34, pp. 5-18, 2012.
NASA Astrophysics Data System (ADS)
Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui
2016-03-01
Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD (dubbed improved SPREAD, or iSPREAD) method by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method has been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.
Forecasting space weather: Can new econometric methods improve accuracy?
NASA Astrophysics Data System (ADS)
Reikard, Gordon
2011-06-01
Space weather forecasts are currently used in areas ranging from navigation and communication to electric power system operations. The relevant forecast horizons can range from as little as 24 h to several days. This paper analyzes the predictability of two major space weather measures using new time series methods, many of them derived from econometrics. The data sets are the A p geomagnetic index and the solar radio flux at 10.7 cm. The methods tested include nonlinear regressions, neural networks, frequency domain algorithms, GARCH models (which utilize the residual variance), state transition models, and models that combine elements of several techniques. While combined models are complex, they can be programmed using modern statistical software. The data frequency is daily, and forecasting experiments are run over horizons ranging from 1 to 7 days. Two major conclusions stand out. First, the frequency domain method forecasts the A p index more accurately than any time domain model, including both regressions and neural networks. This finding is very robust, and holds for all forecast horizons. Combining the frequency domain method with other techniques yields a further small improvement in accuracy. Second, the neural network forecasts the solar flux more accurately than any other method, although at short horizons (2 days or less) the regression and net yield similar results. The neural net does best when it includes measures of the long-term component in the data.
2009-01-01
Background The International Commission on Radiological Protection (ICRP) recommended annual occupational dose limit is 20 mSv. Cancer mortality in Japanese A-bomb survivors exposed to less than 20 mSv external radiation in 1945 was analysed previously, using a latency model with non-linear dose response. Questions were raised regarding statistical inference with this model. Methods Cancers with over 100 deaths in the 0 - 20 mSv subcohort of the 1950-1990 Life Span Study are analysed with Poisson regression models incorporating latency, allowing linear and non-linear dose response. Bootstrap percentile and Bias-corrected accelerated (BCa) methods and simulation of the Likelihood Ratio Test lead to Confidence Intervals for Excess Relative Risk (ERR) and tests against the linear model. Results The linear model shows significant large, positive values of ERR for liver and urinary cancers at latencies from 37 - 43 years. Dose response below 20 mSv is strongly non-linear at the optimal latencies for the stomach (11.89 years), liver (36.9), lung (13.6), leukaemia (23.66), and pancreas (11.86) and across broad latency ranges. Confidence Intervals for ERR are comparable using Bootstrap and Likelihood Ratio Test methods and BCa 95% Confidence Intervals are strictly positive across latency ranges for all 5 cancers. Similar risk estimates for 10 mSv (lagged dose) are obtained from the 0 - 20 mSv and 5 - 500 mSv data for the stomach, liver, lung and leukaemia. Dose response for the latter 3 cancers is significantly non-linear in the 5 - 500 mSv range. Conclusion Liver and urinary cancer mortality risk is significantly raised using a latency model with linear dose response. A non-linear model is strongly superior for the stomach, liver, lung, pancreas and leukaemia. Bootstrap and Likelihood-based confidence intervals are broadly comparable and ERR is strictly positive by bootstrap methods for all 5 cancers. Except for the pancreas, similar estimates of latency and risk from 10 mSv are obtained from the 0 - 20 mSv and 5 - 500 mSv subcohorts. Large and significant cancer risks for Japanese survivors exposed to less than 20 mSv external radiation from the atomic bombs in 1945 cast doubt on the ICRP recommended annual occupational dose limit. PMID:20003238
[Prediction of schistosomiasis infection rates of population based on ARIMA-NARNN model].
Ke-Wei, Wang; Yu, Wu; Jin-Ping, Li; Yu-Yu, Jiang
2016-07-12
To explore the effect of the autoregressive integrated moving average model-nonlinear auto-regressive neural network (ARIMA-NARNN) model on predicting schistosomiasis infection rates of population. The ARIMA model, NARNN model and ARIMA-NARNN model were established based on monthly schistosomiasis infection rates from January 2005 to February 2015 in Jiangsu Province, China. The fitting and prediction performances of the three models were compared. Compared to the ARIMA model and NARNN model, the mean square error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) of the ARIMA-NARNN model were the least with the values of 0.011 1, 0.090 0 and 0.282 4, respectively. The ARIMA-NARNN model could effectively fit and predict schistosomiasis infection rates of population, which might have a great application value for the prevention and control of schistosomiasis.
NASA Astrophysics Data System (ADS)
Koga-Vicente, A.; Friedel, M. J.
2010-12-01
Every year thousands of people are affected by floods and landslide hazards caused by rainstorms. The problem is more serious in tropical developing countries because of the susceptibility as a result of the high amount of available energy to form storms, and the high vulnerability due to poor economic and social conditions. Predictive models of hazards are important tools to manage this kind of risk. In this study, a comparison of two different modeling approaches was made for predicting hydrometeorological hazards in 12 cities on the coast of São Paulo, Brazil, from 1994 to 2003. In the first approach, an empirical multiple linear regression (MLR) model was developed and used; the second approach used a type of unsupervised nonlinear artificial neural network called a self-organized map (SOM). By using twenty three independent variables of susceptibility (precipitation, soil type, slope, elevation, and regional atmospheric system scale) and vulnerability (distribution and total population, income and educational characteristics, poverty intensity, human development index), binary hazard responses were obtained. Model performance by cross-validation indicated that the respective MLR and SOM model accuracy was about 67% and 80%. Prediction accuracy can be improved by the addition of information, but the SOM approach is preferred because of sparse data and highly nonlinear relations among the independent variables.
NASA Astrophysics Data System (ADS)
Krishnanathan, Kirubhakaran; Anderson, Sean R.; Billings, Stephen A.; Kadirkamanathan, Visakan
2016-11-01
In this paper, we derive a system identification framework for continuous-time nonlinear systems, for the first time using a simulation-focused computational Bayesian approach. Simulation approaches to nonlinear system identification have been shown to outperform regression methods under certain conditions, such as non-persistently exciting inputs and fast-sampling. We use the approximate Bayesian computation (ABC) algorithm to perform simulation-based inference of model parameters. The framework has the following main advantages: (1) parameter distributions are intrinsically generated, giving the user a clear description of uncertainty, (2) the simulation approach avoids the difficult problem of estimating signal derivatives as is common with other continuous-time methods, and (3) as noted above, the simulation approach improves identification under conditions of non-persistently exciting inputs and fast-sampling. Term selection is performed by judging parameter significance using parameter distributions that are intrinsically generated as part of the ABC procedure. The results from a numerical example demonstrate that the method performs well in noisy scenarios, especially in comparison to competing techniques that rely on signal derivative estimation.
Chirombo, James; Lowe, Rachel; Kazembe, Lawrence
2014-01-01
Background After years of implementing Roll Back Malaria (RBM) interventions, the changing landscape of malaria in terms of risk factors and spatial pattern has not been fully investigated. This paper uses the 2010 malaria indicator survey data to investigate if known malaria risk factors remain relevant after many years of interventions. Methods We adopted a structured additive logistic regression model that allowed for spatial correlation, to more realistically estimate malaria risk factors. Our model included child and household level covariates, as well as climatic and environmental factors. Continuous variables were modelled by assuming second order random walk priors, while spatial correlation was specified as a Markov random field prior, with fixed effects assigned diffuse priors. Inference was fully Bayesian resulting in an under five malaria risk map for Malawi. Results Malaria risk increased with increasing age of the child. With respect to socio-economic factors, the greater the household wealth, the lower the malaria prevalence. A general decline in malaria risk was observed as altitude increased. Minimum temperatures and average total rainfall in the three months preceding the survey did not show a strong association with disease risk. Conclusions The structured additive regression model offered a flexible extension to standard regression models by enabling simultaneous modelling of possible nonlinear effects of continuous covariates, spatial correlation and heterogeneity, while estimating usual fixed effects of categorical and continuous observed variables. Our results confirmed that malaria epidemiology is a complex interaction of biotic and abiotic factors, both at the individual, household and community level and that risk factors are still relevant many years after extensive implementation of RBM activities. PMID:24991915
Chirombo, James; Lowe, Rachel; Kazembe, Lawrence
2014-01-01
After years of implementing Roll Back Malaria (RBM) interventions, the changing landscape of malaria in terms of risk factors and spatial pattern has not been fully investigated. This paper uses the 2010 malaria indicator survey data to investigate if known malaria risk factors remain relevant after many years of interventions. We adopted a structured additive logistic regression model that allowed for spatial correlation, to more realistically estimate malaria risk factors. Our model included child and household level covariates, as well as climatic and environmental factors. Continuous variables were modelled by assuming second order random walk priors, while spatial correlation was specified as a Markov random field prior, with fixed effects assigned diffuse priors. Inference was fully Bayesian resulting in an under five malaria risk map for Malawi. Malaria risk increased with increasing age of the child. With respect to socio-economic factors, the greater the household wealth, the lower the malaria prevalence. A general decline in malaria risk was observed as altitude increased. Minimum temperatures and average total rainfall in the three months preceding the survey did not show a strong association with disease risk. The structured additive regression model offered a flexible extension to standard regression models by enabling simultaneous modelling of possible nonlinear effects of continuous covariates, spatial correlation and heterogeneity, while estimating usual fixed effects of categorical and continuous observed variables. Our results confirmed that malaria epidemiology is a complex interaction of biotic and abiotic factors, both at the individual, household and community level and that risk factors are still relevant many years after extensive implementation of RBM activities.
NASA Astrophysics Data System (ADS)
Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza
2014-10-01
The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of these numerical calculations, using the multivariate adaptive regression splines (MARS) technique, conclusions of this research work are exposed.
Pomerantsev, Alexey L; Kutsenova, Alla V; Rodionova, Oxana Ye
2017-02-01
A novel non-linear regression method for modeling non-isothermal thermogravimetric data is proposed. Experiments for several heating rates are analyzed simultaneously. The method is applicable to complex multi-stage processes when the number of stages is unknown. Prior knowledge of the type of kinetics is not required. The main idea is a consequent estimation of parameters when the overall model is successively changed from one level of modeling to another. At the first level, the Avrami-Erofeev functions are used. At the second level, the Sestak-Berggren functions are employed with the goal to broaden the overall model. The method is tested using both simulated and real-world data. A comparison of the proposed method with a recently published 'model-free' deconvolution method is presented.
Using multiple linear regression model to estimate thunderstorm activity
NASA Astrophysics Data System (ADS)
Suparta, W.; Putro, W. S.
2017-03-01
This paper is aimed to develop a numerical model with the use of a nonlinear model to estimate the thunderstorm activity. Meteorological data such as Pressure (P), Temperature (T), Relative Humidity (H), cloud (C), Precipitable Water Vapor (PWV), and precipitation on a daily basis were used in the proposed method. The model was constructed with six configurations of input and one target output. The output tested in this work is the thunderstorm event when one-year data is used. Results showed that the model works well in estimating thunderstorm activities with the maximum epoch reaching 1000 iterations and the percent error was found below 50%. The model also found that the thunderstorm activities in May and October are detected higher than the other months due to the inter-monsoon season.
Prediction of Battery Life and Behavior from Analysis of Voltage Data
NASA Technical Reports Server (NTRS)
Mcdermott, P. P.
1984-01-01
A method for simulating charge and discharge characteristics of secondary batteries is discussed. The analysis utilizes a nonlinear regression technique where empirical data is computer fitted with a five coefficient nonlinear equation. The equations for charge and discharge voltage are identical except for a change of sign before the second and third terms.
Temperature-viscosity models reassessed.
Peleg, Micha
2017-05-04
The temperature effect on viscosity of liquid and semi-liquid foods has been traditionally described by the Arrhenius equation, a few other mathematical models, and more recently by the WLF and VTF (or VFT) equations. The essence of the Arrhenius equation is that the viscosity is proportional to the absolute temperature's reciprocal and governed by a single parameter, namely, the energy of activation. However, if the absolute temperature in K in the Arrhenius equation is replaced by T + b where both T and the adjustable b are in °C, the result is a two-parameter model, which has superior fit to experimental viscosity-temperature data. This modified version of the Arrhenius equation is also mathematically equal to the WLF and VTF equations, which are known to be equal to each other. Thus, despite their dissimilar appearances all three equations are essentially the same model, and when used to fit experimental temperature-viscosity data render exactly the same very high regression coefficient. It is shown that three new hybrid two-parameter mathematical models, whose formulation bears little resemblance to any of the conventional models, can also have excellent fit with r 2 ∼ 1. This is demonstrated by comparing the various models' regression coefficients to published viscosity-temperature relationships of 40% sucrose solution, soybean oil, and 70°Bx pear juice concentrate at different temperature ranges. Also compared are reconstructed temperature-viscosity curves using parameters calculated directly from 2 or 3 data points and fitted curves obtained by nonlinear regression using a larger number of experimental viscosity measurements.
Iler, Amy M; Høye, Toke T; Inouye, David W; Schmidt, Niels M
2013-08-19
Many alpine and subalpine plant species exhibit phenological advancements in association with earlier snowmelt. While the phenology of some plant species does not advance beyond a threshold snowmelt date, the prevalence of such threshold phenological responses within plant communities is largely unknown. We therefore examined the shape of flowering phenology responses (linear versus nonlinear) to climate using two long-term datasets from plant communities in snow-dominated environments: Gothic, CO, USA (1974-2011) and Zackenberg, Greenland (1996-2011). For a total of 64 species, we determined whether a linear or nonlinear regression model best explained interannual variation in flowering phenology in response to increasing temperatures and advancing snowmelt dates. The most common nonlinear trend was for species to flower earlier as snowmelt advanced, with either no change or a slower rate of change when snowmelt was early (average 20% of cases). By contrast, some species advanced their flowering at a faster rate over the warmest temperatures relative to cooler temperatures (average 5% of cases). Thus, some species seem to be approaching their limits of phenological change in response to snowmelt but not temperature. Such phenological thresholds could either be a result of minimum springtime photoperiod cues for flowering or a slower rate of adaptive change in flowering time relative to changing climatic conditions.
NASA Technical Reports Server (NTRS)
Cagliostro, Domenick E.; Riccitiello, Salvatore R.
1993-01-01
In the first part of this work, a model is developed for the deposition of silicon from the reduction of silicon tetrachloride with hydrogen in a tubular reactor at 700-1100 C, at atmospheric pressure. The model is based on gas chromatography of the volatile products of the reaction, followed by gravimetric analysis of total Si deposition on the tube. In the second part of this work, a model is developed for the case of SiC deposition from the pyrolysis of dichlorodimethylsilane in hydrogen under the same reactor conditions. The rate constants derived from a nonlinear regression analysis are reported.
Addressing the unemployment-mortality conundrum: non-linearity is the answer.
Bonamore, Giorgio; Carmignani, Fabrizio; Colombo, Emilio
2015-02-01
The effect of unemployment on mortality is the object of a lively literature. However, this literature is characterized by sharply conflicting results. We revisit this issue and suggest that the relationship might be non-linear. We use data for 265 territorial units (regions) within 23 European countries over the period 2000-2012 to estimate a multivariate regression of mortality. The estimating equation allows for a quadratic relationship between unemployment and mortality. We control for various other determinants of mortality at regional and national level and we include region-specific and time-specific fixed effects. The model is also extended to account for the dynamic adjustment of mortality and possible lagged effects of unemployment. We find that the relationship between mortality and unemployment is U shaped. In the benchmark regression, when the unemployment rate is low, at 3%, an increase by one percentage point decreases average mortality by 0.7%. As unemployment increases, the effect decays: when the unemployment rate is 8% (sample average) a further increase by one percentage point decreases average mortality by 0.4%. The effect changes sign, turning from negative to positive, when unemployment is around 17%. When the unemployment rate is 25%, a further increase by one percentage point raises average mortality by 0.4%. Results hold for different causes of death and across different specifications of the estimating equation. We argue that the non-linearity arises because the level of unemployment affects the psychological and behavioural response of individuals to worsening economic conditions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Du, Hongying; Wang, Jie; Yao, Xiaojun; Hu, Zhide
2009-01-01
The heuristic method (HM) and support vector machine (SVM) were used to construct quantitative structure-retention relationship models by a series of compounds to predict the gradient retention times of reversed-phase high-performance liquid chromatography (HPLC) in three different columns. The aims of this investigation were to predict the retention times of multifarious compounds, to find the main properties of the three columns, and to indicate the theory of separation procedures. In our method, we correlated the retention times of many diverse structural analytes in three columns (Symmetry C18, Chromolith, and SG-MIX) with their representative molecular descriptors, calculated from the molecular structures alone. HM was used to select the most important molecular descriptors and build linear regression models. Furthermore, non-linear regression models were built using the SVM method; the performance of the SVM models were better than that of the HM models, and the prediction results were in good agreement with the experimental values. This paper could give some insights into the factors that were likely to govern the gradient retention process of the three investigated HPLC columns, which could theoretically supervise the practical experiment.
Modeling Source Water TOC Using Hydroclimate Variables and Local Polynomial Regression.
Samson, Carleigh C; Rajagopalan, Balaji; Summers, R Scott
2016-04-19
To control disinfection byproduct (DBP) formation in drinking water, an understanding of the source water total organic carbon (TOC) concentration variability can be critical. Previously, TOC concentrations in water treatment plant source waters have been modeled using streamflow data. However, the lack of streamflow data or unimpaired flow scenarios makes it difficult to model TOC. In addition, TOC variability under climate change further exacerbates the problem. Here we proposed a modeling approach based on local polynomial regression that uses climate, e.g. temperature, and land surface, e.g., soil moisture, variables as predictors of TOC concentration, obviating the need for streamflow. The local polynomial approach has the ability to capture non-Gaussian and nonlinear features that might be present in the relationships. The utility of the methodology is demonstrated using source water quality and climate data in three case study locations with surface source waters including river and reservoir sources. The models show good predictive skill in general at these locations, with lower skills at locations with the most anthropogenic influences in their streams. Source water TOC predictive models can provide water treatment utilities important information for making treatment decisions for DBP regulation compliance under future climate scenarios.
Active Learning to Understand Infectious Disease Models and Improve Policy Making
Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel
2014-01-01
Modeling plays a major role in policy making, especially for infectious disease interventions but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. Provided insight is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as emulator to improve rapid policy making in various settings. PMID:24743387
Active learning to understand infectious disease models and improve policy making.
Willem, Lander; Stijven, Sean; Vladislavleva, Ekaterina; Broeckhove, Jan; Beutels, Philippe; Hens, Niel
2014-04-01
Modeling plays a major role in policy making, especially for infectious disease interventions but such models can be complex and computationally intensive. A more systematic exploration is needed to gain a thorough systems understanding. We present an active learning approach based on machine learning techniques as iterative surrogate modeling and model-guided experimentation to systematically analyze both common and edge manifestations of complex model runs. Symbolic regression is used for nonlinear response surface modeling with automatic feature selection. First, we illustrate our approach using an individual-based model for influenza vaccination. After optimizing the parameter space, we observe an inverse relationship between vaccination coverage and cumulative attack rate reinforced by herd immunity. Second, we demonstrate the use of surrogate modeling techniques on input-response data from a deterministic dynamic model, which was designed to explore the cost-effectiveness of varicella-zoster virus vaccination. We use symbolic regression to handle high dimensionality and correlated inputs and to identify the most influential variables. Provided insight is used to focus research, reduce dimensionality and decrease decision uncertainty. We conclude that active learning is needed to fully understand complex systems behavior. Surrogate models can be readily explored at no computational expense, and can also be used as emulator to improve rapid policy making in various settings.
Buchwald, Peter
2017-06-01
A generalized model of receptor function is proposed that relies on the essential assumptions of the minimal two-state receptor theory (i.e., ligand binding followed by receptor activation), but uses a different parametrization and allows nonlinear response (transduction) for possible signal amplification. For the most general case, three parameters are used: K d , the classic equilibrium dissociation constant to characterize binding affinity; ε , an intrinsic efficacy to characterize the ability of the bound ligand to activate the receptor (ranging from 0 for an antagonist to 1 for a full agonist); and γ , a gain (amplification) parameter to characterize the nonlinearity of postactivation signal transduction (ranging from 1 for no amplification to infinity). The obtained equation, E/Emax=εγLεγ+1-εL+Kd, resembles that of the operational (Black and Leff) or minimal two-state (del Castillo-Katz) models, E/Emax=τLτ+1L+Kd, with εγ playing a role somewhat similar to that of the τ efficacy parameter of those models, but has several advantages. Its parameters are more intuitive as they are conceptually clearly related to the different steps of binding, activation, and signal transduction (amplification), and they are also better suited for optimization by nonlinear regression. It allows fitting of complex data where receptor binding and response are measured separately and the fractional occupancy and response are mismatched. Unlike the previous models, it is a true generalized model as simplified forms can be reproduced with special cases of its parameters. Such simplified forms can be used on their own to characterize partial agonism, competing partial and full agonists, or signal amplification.
Nonlinear and threshold-dominated runoff generation controls DOC export in a small peat catchment
NASA Astrophysics Data System (ADS)
Birkel, C.; Broder, T.; Biester, H.
2017-03-01
We used a relatively simple two-layer, coupled hydrology-biogeochemistry model to simultaneously simulate streamflow and stream dissolved organic carbon (DOC) concentrations in a small lead and arsenic contaminated upland peat catchment in northwestern Germany. The model procedure was informed by an initial data mining analysis, in combination with regression relationships of discharge, DOC, and element export. We assessed the internal model DOC processing based on stream DOC hysteresis patterns and 3-hourly time step groundwater level and soil DOC data for two consecutive summer periods in 2013 and 2014. The parsimonious model (i.e., few calibrated parameters) showed the importance of nonlinear and rapid near-surface runoff generation mechanisms that caused around 60% of simulated DOC load. The total load was high even though these pathways were only activated during storm events on average 30% of the monitoring time—as also shown by the experimental data. Overall, the drier period 2013 resulted in increased nonlinearity but exported less DOC (115 kg C ha-1 yr-1 ± 11 kg C ha-1 yr-1) compared to the equivalent but wetter period in 2014 (189 kg C ha-1 yr-1 ± 38 kg C ha-1 yr-1). The exceedance of a critical water table threshold (-10 cm) triggered a rapid near-surface runoff response with associated higher DOC transport connecting all available DOC pools and subsequent dilution. We conclude that the combination of detailed experimental work with relatively simple, coupled hydrology-biogeochemistry models not only allowed the model to be internally constrained but also provided important insight into how DOC and tightly coupled pollutants or trace elements are mobilized.
A comparison of several methods of solving nonlinear regression groundwater flow problems
Cooley, Richard L.
1985-01-01
Computational efficiency and computer memory requirements for four methods of minimizing functions were compared for four test nonlinear-regression steady state groundwater flow problems. The fastest methods were the Marquardt and quasi-linearization methods, which required almost identical computer times and numbers of iterations; the next fastest was the quasi-Newton method, and last was the Fletcher-Reeves method, which did not converge in 100 iterations for two of the problems. The fastest method per iteration was the Fletcher-Reeves method, and this was followed closely by the quasi-Newton method. The Marquardt and quasi-linearization methods were slower. For all four methods the speed per iteration was directly related to the number of parameters in the model. However, this effect was much more pronounced for the Marquardt and quasi-linearization methods than for the other two. Hence the quasi-Newton (and perhaps Fletcher-Reeves) method might be more efficient than either the Marquardt or quasi-linearization methods if the number of parameters in a particular model were large, although this remains to be proven. The Marquardt method required somewhat less central memory than the quasi-linearization metilod for three of the four problems. For all four problems the quasi-Newton method required roughly two thirds to three quarters of the memory required by the Marquardt method, and the Fletcher-Reeves method required slightly less memory than the quasi-Newton method. Memory requirements were not excessive for any of the four methods.
NASA Technical Reports Server (NTRS)
Butera, M. K.; Frick, A. L.; Browder, J.
1983-01-01
NASA and the U.S. National Marine Fisheries Service have undertaken the development of Landsat Thematic Mapper (TM) technology for the evaluation of the usefulness of wetlands to estuarine fish and shellfish production. Toward this end, a remote sensing-based Productive Capacity model has been developed which characterizes the biological and hydrographic features of a Gulf Coast Marsh to predict detrital export. Regression analyses of TM simulator data for wetland plant production estimation are noted to more accurately estimate the percent of total vegetative cover than biomass, indicating that a nonlinear relationship may be involved.
Wilke, Marko
2018-02-01
This dataset contains the regression parameters derived by analyzing segmented brain MRI images (gray matter and white matter) from a large population of healthy subjects, using a multivariate adaptive regression splines approach. A total of 1919 MRI datasets ranging in age from 1-75 years from four publicly available datasets (NIH, C-MIND, fCONN, and IXI) were segmented using the CAT12 segmentation framework, writing out gray matter and white matter images normalized using an affine-only spatial normalization approach. These images were then subjected to a six-step DARTEL procedure, employing an iterative non-linear registration approach and yielding increasingly crisp intermediate images. The resulting six datasets per tissue class were then analyzed using multivariate adaptive regression splines, using the CerebroMatic toolbox. This approach allows for flexibly modelling smoothly varying trajectories while taking into account demographic (age, gender) as well as technical (field strength, data quality) predictors. The resulting regression parameters described here can be used to generate matched DARTEL or SHOOT templates for a given population under study, from infancy to old age. The dataset and the algorithm used to generate it are publicly available at https://irc.cchmc.org/software/cerebromatic.php.
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J
2015-12-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. (c) 2015 APA, all rights reserved).
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.
2016-01-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526
LaBeau, Meredith B.; Mayer, Alex S.; Griffis, Veronica; Watkins, David Jr.; Robertson, Dale M.; Gyawali, Rabi
2015-01-01
In this work, we hypothesize that phosphorus (P) concentrations in streams vary seasonally and with streamflow and that it is important to incorporate this variation when predicting changes in P loading associated with climate change. Our study area includes 14 watersheds with a range of land uses throughout the U.S. Great Lakes Basin. We develop annual seasonal load-discharge regression models for each watershed and apply these models with simulated discharges generated for future climate scenarios to simulate future P loading patterns for two periods: 2046–2065 and 2081–2100. We utilize output from the Coupled Model Intercomparison Project phase 3 downscaled climate change projections that are input into the Large Basin Runoff Model to generate future discharge scenarios, which are in turn used as inputs to the seasonal P load regression models. In almost all cases, the seasonal load-discharge models match observed loads better than the annual models. Results using the seasonal models show that the concurrence of nonlinearity in the load-discharge model and changes in high discharges in the spring months leads to the most significant changes in P loading for selected tributaries under future climate projections. These results emphasize the importance of using seasonal models to understand the effects of future climate change on nutrient loads.
Machine learning approaches to the social determinants of health in the health and retirement study.
Seligman, Benjamin; Tuljapurkar, Shripad; Rehkopf, David
2018-04-01
Social and economic factors are important predictors of health and of recognized importance for health systems. However, machine learning, used elsewhere in the biomedical literature, has not been extensively applied to study relationships between society and health. We investigate how machine learning may add to our understanding of social determinants of health using data from the Health and Retirement Study. A linear regression of age and gender, and a parsimonious theory-based regression additionally incorporating income, wealth, and education, were used to predict systolic blood pressure, body mass index, waist circumference, and telomere length. Prediction, fit, and interpretability were compared across four machine learning methods: linear regression, penalized regressions, random forests, and neural networks. All models had poor out-of-sample prediction. Most machine learning models performed similarly to the simpler models. However, neural networks greatly outperformed the three other methods. Neural networks also had good fit to the data ( R 2 between 0.4-0.6, versus <0.3 for all others). Across machine learning models, nine variables were frequently selected or highly weighted as predictors: dental visits, current smoking, self-rated health, serial-seven subtractions, probability of receiving an inheritance, probability of leaving an inheritance of at least $10,000, number of children ever born, African-American race, and gender. Some of the machine learning methods do not improve prediction or fit beyond simpler models, however, neural networks performed well. The predictors identified across models suggest underlying social factors that are important predictors of biological indicators of chronic disease, and that the non-linear and interactive relationships between variables fundamental to the neural network approach may be important to consider.
Stand age and climate drive forest carbon balance recovery
NASA Astrophysics Data System (ADS)
Besnard, Simon; Carvalhais, Nuno; Clevers, Jan; Herold, Martin; Jung, Martin; Reichstein, Markus
2016-04-01
Forests play an essential role in the terrestrial carbon (C) cycle, especially in the C exchanges between the terrestrial biosphere and the atmosphere. Ecological disturbances and forest management are drivers of forest dynamics and strongly impact the forest C budget. However, there is a lack of knowledge on the exogenous and endogenous factors driving forest C recovery. Our analysis includes 68 forest sites in different climate zones to determine the relative influence of stand age and climate conditions on the forest carbon balance recovery. In this study, we only included forest regrowth after clear-cut stand replacement (e.g. harvest, fire), and afforestation/reforestation processes. We synthesized net ecosystem production (NEP), gross primary production (GPP), ecosystem respiration (Re), the photosynthetic respiratory ratio (GPP to Re ratio), the ecosystem carbon use efficiency (CUE), that is NEP to GPP ratio, and CUEclimax, where GPP is derived from the climate conditions. We implemented a non-linear regression analysis in order to identify the best model representing the C flux patterns with stand age. Furthermore, we showed that each C flux have a non-linear relationship with stand age, annual precipitation (P) and mean annual temperature (MAT), therefore, we proposed to use non-linear transformations of the covariates for C fluxes'estimates. Non-linear stand age and climate models were, therefore, used to establish multiple linear regressions for C flux predictions and for determining the contribution of stand age and climate in forest carbon recovery. Our findings depicted that a coupled stand age-climate model explained 33% (44%, average site), 62% (76%, average site), 56% (71%, average site), 41% (59%, average site), 50% (65%, average site) and 36% (50%, average site) of the variance of annual NEP, GPP, Re, photosynthetic respiratory ratio, CUE and CUEclimax across sites, respectively. In addition, we showed that gross fluxes (e.g. GPP and Re) are mainly climatically driven with 54.2% (68.4%, average site) and 54.1% (71.0%, average site) of GPP and Re variability, respectively, explained by the sum of MAT and P. However, annual NEP, GPP to Re ratio and CUEclimax are affected by both forest stand age and climate conditions, in particular MAT. The key result is that forest stand age plays a crucial role in determining CUE (36.4% and 48.2% for all years per site and average site, respectively), while climate conditions have less effect on CUE (13.6% and 15.4% for all years per site and average site, respectively). These findings are relevant for the implementation of Earth system models and imply that information both on forest stand age and climate conditions are critical to improve the accuracy of global terrestrial C models's estimates.
Lin, Ping; Chen, Yong-ming; Yao, Zhi-lei
2015-11-01
A novel method of combination of the chemometrics and the hyperspectral imaging techniques was presented to detect the temperatures of Ethylene-Vinyl Acetate copolymer (EVA) films in photovoltaic cells during the thermal encapsulation process. Four varieties of the EVA films which had been heated at the temperatures of 128, 132, 142 and 148 °C during the photovoltaic cells production process were used for investigation in this paper. These copolymer encapsulation films were firstly scanned by the hyperspectral imaging equipment (Spectral Imaging Ltd. Oulu, Finland). The scanning band range of hyperspectral equipemnt was set between 904.58 and 1700.01 nm. The hyperspectral dataset of copolymer films was randomly divided into two parts for the training and test purpose. Each type of the training set and test set contained 90 and 10 instances, respectively. The obtained hyperspectral images of EVA films were dealt with by using the ENVI (Exelis Visual Information Solutions, USA) software. The size of region of interest (ROI) of each obtained hyperspectral image of EVA film was set as 150 x 150 pixels. The average of reflectance hyper spectra of all the pixels in the ROI was used as the characteristic curve to represent the instance. There kinds of chemometrics methods including partial least squares regression (PLSR), multi-class support vector machine (SVM) and large margin nearest neighbor (LMNN) were used to correlate the characteristic hyper spectra with the encapsulation temperatures of of copolymer films. The plot of weighted regression coefficients illustrated that both bands of short- and long-wave near infrared hyperspectral data contributed to enhancing the prediction accuracy of the forecast model. Because the attained reflectance hyperspectral data of EVA materials displayed the strong nonlinearity, the prediction performance of linear modeling method of PLSR declined and the prediction precision only reached to 95%. The kernel-based forecast models were introduced to eliminate the impact of nonlinear hyperspectral data to some extent through mapping the original nonlinear hyperspectral data to the high dimensional linear feature space, so the relationship between the nonlinear hyperspectral data and the encapsulation temperatures of EVA films was fully disclosed finally. Compared with the prediction results of three proposed models, the prediction performance of LMNN was superior to the other two, whose final recognition accuracy achieved 100%. The results indicated that the methods of combination of LMNN model with the hyperspectral imaging techniques was the best one for accurately and rapidly determining the encapsulation temperatures of EVA films of photovoltaic cells. In addition, this paper had created the ideal conditions for automatically monitoring and effectively controlling the encapsulation temperatures of EVA films in the photovoltaic cells production process.
Data-driven discovery of partial differential equations
Rudy, Samuel H.; Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan
2017-01-01
We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg–de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable. PMID:28508044
An Integrated Analysis of the Physiological Effects of Space Flight: Executive Summary
NASA Technical Reports Server (NTRS)
Leonard, J. I.
1985-01-01
A large array of models were applied in a unified manner to solve problems in space flight physiology. Mathematical simulation was used as an alternative way of looking at physiological systems and maximizing the yield from previous space flight experiments. A medical data analysis system was created which consist of an automated data base, a computerized biostatistical and data analysis system, and a set of simulation models of physiological systems. Five basic models were employed: (1) a pulsatile cardiovascular model; (2) a respiratory model; (3) a thermoregulatory model; (4) a circulatory, fluid, and electrolyte balance model; and (5) an erythropoiesis regulatory model. Algorithms were provided to perform routine statistical tests, multivariate analysis, nonlinear regression analysis, and autocorrelation analysis. Special purpose programs were prepared for rank correlation, factor analysis, and the integration of the metabolic balance data.
A mathematical model for ethanol fermentation from oil palm trunk sap using Saccharomyces cerevisiae
NASA Astrophysics Data System (ADS)
Sultana, S.; Jamil, Norazaliza Mohd; Saleh, E. A. M.; Yousuf, A.; Faizal, Che Ku M.
2017-09-01
This paper presents a mathematical model and solution strategy of ethanol fermentation for oil palm trunk (OPT) sap by considering the effect of substrate limitation, substrate inhibition product inhibition and cell death. To investigate the effect of cell death rate on the fermentation process we extended and improved the current mathematical model. The kinetic parameters of the model were determined by nonlinear regression using maximum likelihood function. The temporal profiles of sugar, cell and ethanol concentrations were modelled by a set of ordinary differential equations, which were solved numerically by the 4th order Runge-Kutta method. The model was validated by the experimental data and the agreement between the model and experimental results demonstrates that the model is reasonable for prediction of the dynamic behaviour of the fermentation process.
Determinants of Rotavirus Transmission: A Lag Nonlinear Time Series Analysis.
van Gaalen, Rolina D; van de Kassteele, Jan; Hahné, Susan J M; Bruijning-Verhagen, Patricia; Wallinga, Jacco
2017-07-01
Rotavirus is a common viral infection among young children. As in many countries, the infection dynamics of rotavirus in the Netherlands are characterized by an annual winter peak, which was notably low in 2014. Previous study suggested an association between weather factors and both rotavirus transmission and incidence. From epidemic theory, we know that the proportion of susceptible individuals can affect disease transmission. We investigated how these factors are associated with rotavirus transmission in the Netherlands, and their impact on rotavirus transmission in 2014. We used available data on birth rates and rotavirus laboratory reports to estimate rotavirus transmission and the proportion of individuals susceptible to primary infection. Weather data were directly available from a central meteorological station. We developed an approach for detecting determinants of seasonal rotavirus transmission by assessing nonlinear, delayed associations between each factor and rotavirus transmission. We explored relationships by applying a distributed lag nonlinear regression model with seasonal terms. We corrected for residual serial correlation using autoregressive moving average errors. We inferred the relationship between different factors and the effective reproduction number from the most parsimonious model with low residual autocorrelation. Higher proportions of susceptible individuals and lower temperatures were associated with increases in rotavirus transmission. For 2014, our findings suggest that relatively mild temperatures combined with the low proportion of susceptible individuals contributed to lower rotavirus transmission in the Netherlands. However, our model, which overestimated the magnitude of the peak, suggested that other factors were likely instrumental in reducing the incidence that year.
The role of building models in the evaluation of heat-related risks
NASA Astrophysics Data System (ADS)
Buchin, Oliver; Jänicke, Britta; Meier, Fred; Scherer, Dieter; Ziegler, Felix
2016-04-01
Hazard-risk relationships in epidemiological studies are generally based on the outdoor climate, despite the fact that most of humans' lifetime is spent indoors. By coupling indoor and outdoor climates with a building model, the risk concept developed can still be based on the outdoor conditions but also includes exposure to the indoor climate. The influence of non-linear building physics and the impact of air conditioning on heat-related risks can be assessed in a plausible manner using this risk concept. For proof of concept, the proposed risk concept is compared to a traditional risk analysis. As an example, daily and city-wide mortality data of the age group 65 and older in Berlin, Germany, for the years 2001-2010 are used. Four building models with differing complexity are applied in a time-series regression analysis. This study shows that indoor hazard better explains the variability in the risk data compared to outdoor hazard, depending on the kind of building model. Simplified parameter models include the main non-linear effects and are proposed for the time-series analysis. The concept shows that the definitions of heat events, lag days, and acclimatization in a traditional hazard-risk relationship are influenced by the characteristics of the prevailing building stock.
Modeling of time trends and interactions in vital rates using restricted regression splines.
Heuer, C
1997-03-01
For the analysis of time trends in incidence and mortality rates, the age-period-cohort (apc) model has became a widely accepted method. The considered data are arranged in a two-way table by age group and calendar period, which are mostly subdivided into 5- or 10-year intervals. The disadvantage of this approach is the loss of information by data aggregation and the problems of estimating interactions in the two-way layout without replications. In this article we show how splines can be useful when yearly data, i.e., 1-year age groups and 1-year periods, are given. The estimated spline curves are still smooth and represent yearly changes in the time trends. Further, it is straightforward to include interaction terms by the tensor product of the spline functions. If the data are given in a nonrectangular table, e.g., 5-year age groups and 1-year periods, the period and cohort variables can be parameterized by splines, while the age variable is parameterized as fixed effect levels, which leads to a semiparametric apc model. An important methodological issue in developing the nonparametric and semiparametric models is stability of the estimated spline curve at the boundaries. Here cubic regression splines will be used, which are constrained to be linear in the tails. Another point of importance is the nonidentifiability problem due to the linear dependency of the three time variables. This will be handled by decomposing the basis of each spline by orthogonal projection into constant, linear, and nonlinear terms, as suggested by Holford (1983, Biometrics 39, 311-324) for the traditional apc model. The advantage of using splines for yearly data compared to the traditional approach for aggregated data is the more accurate curve estimation for the nonlinear trend changes and the simple way of modeling interactions between the time variables. The method will be demonstrated with hypothetical data as well as with cancer mortality data.
Sensitivity Analysis of the Integrated Medical Model for ISS Programs
NASA Technical Reports Server (NTRS)
Goodenow, D. A.; Myers, J. G.; Arellano, J.; Boley, L.; Garcia, Y.; Saile, L.; Walton, M.; Kerstman, E.; Reyes, D.; Young, M.
2016-01-01
Sensitivity analysis estimates the relative contribution of the uncertainty in input values to the uncertainty of model outputs. Partial Rank Correlation Coefficient (PRCC) and Standardized Rank Regression Coefficient (SRRC) are methods of conducting sensitivity analysis on nonlinear simulation models like the Integrated Medical Model (IMM). The PRCC method estimates the sensitivity using partial correlation of the ranks of the generated input values to each generated output value. The partial part is so named because adjustments are made for the linear effects of all the other input values in the calculation of correlation between a particular input and each output. In SRRC, standardized regression-based coefficients measure the sensitivity of each input, adjusted for all the other inputs, on each output. Because the relative ranking of each of the inputs and outputs is used, as opposed to the values themselves, both methods accommodate the nonlinear relationship of the underlying model. As part of the IMM v4.0 validation study, simulations are available that predict 33 person-missions on ISS and 111 person-missions on STS. These simulated data predictions feed the sensitivity analysis procedures. The inputs to the sensitivity procedures include the number occurrences of each of the one hundred IMM medical conditions generated over the simulations and the associated IMM outputs: total quality time lost (QTL), number of evacuations (EVAC), and number of loss of crew lives (LOCL). The IMM team will report the results of using PRCC and SRRC on IMM v4.0 predictions of the ISS and STS missions created as part of the external validation study. Tornado plots will assist in the visualization of the condition-related input sensitivities to each of the main outcomes. The outcomes of this sensitivity analysis will drive review focus by identifying conditions where changes in uncertainty could drive changes in overall model output uncertainty. These efforts are an integral part of the overall verification, validation, and credibility review of IMM v4.0.
Induction of chromosomal aberrations at fluences of less than one HZE particle per cell nucleus.
Hada, Megumi; Chappell, Lori J; Wang, Minli; George, Kerry A; Cucinotta, Francis A
2014-10-01
The assumption of a linear dose response used to describe the biological effects of high-LET radiation is fundamental in radiation protection methodologies. We investigated the dose response for chromosomal aberrations for exposures corresponding to less than one particle traversal per cell nucleus by high-energy charged (HZE) nuclei. Human fibroblast and lymphocyte cells were irradiated with several low doses of <0.1 Gy, and several higher doses of up to 1 Gy with oxygen (77 keV/μm), silicon (99 keV/μm) or Fe (175 keV/μm), Fe (195 keV/μm) or Fe (240 keV/μm) particles. Chromosomal aberrations at first mitosis were scored using fluorescence in situ hybridization (FISH) with chromosome specific paints for chromosomes 1, 2 and 4 and DAPI staining of background chromosomes. Nonlinear regression models were used to evaluate possible linear and nonlinear dose-response models based on these data. Dose responses for simple exchanges for human fibroblasts irradiated under confluent culture conditions were best fit by nonlinear models motivated by a nontargeted effect (NTE). The best fits for dose response data for human lymphocytes irradiated in blood tubes were a linear response model for all particles. Our results suggest that simple exchanges in normal human fibroblasts have an important NTE contribution at low-particle fluence. The current and prior experimental studies provide important evidence against the linear dose response assumption used in radiation protection for HZE particles and other high-LET radiation at the relevant range of low doses.
An empirical model for dissolution profile and its application to floating dosage forms.
Weiss, Michael; Kriangkrai, Worawut; Sungthongjeen, Srisagul
2014-06-02
A sum of two inverse Gaussian functions is proposed as a highly flexible empirical model for fitting of in vitro dissolution profiles. The model was applied to quantitatively describe theophylline release from effervescent multi-layer coated floating tablets containing different amounts of the anti-tacking agents talc or glyceryl monostearate. Model parameters were estimated by nonlinear regression (mixed-effects modeling). The estimated parameters were used to determine the mean dissolution time, as well as to reconstruct the time course of release rate for each formulation, whereby the fractional release rate can serve as a diagnostic tool for classification of dissolution processes. The approach allows quantification of dissolution behavior and could provide additional insights into the underlying processes. Copyright © 2014 Elsevier B.V. All rights reserved.
Buscot, Marie-Jeanne; Wotherspoon, Simon S; Magnussen, Costan G; Juonala, Markus; Sabin, Matthew A; Burgner, David P; Lehtimäki, Terho; Viikari, Jorma S A; Hutri-Kähönen, Nina; Raitakari, Olli T; Thomson, Russell J
2017-06-06
Bayesian hierarchical piecewise regression (BHPR) modeling has not been previously formulated to detect and characterise the mechanism of trajectory divergence between groups of participants that have longitudinal responses with distinct developmental phases. These models are useful when participants in a prospective cohort study are grouped according to a distal dichotomous health outcome. Indeed, a refined understanding of how deleterious risk factor profiles develop across the life-course may help inform early-life interventions. Previous techniques to determine between-group differences in risk factors at each age may result in biased estimate of the age at divergence. We demonstrate the use of Bayesian hierarchical piecewise regression (BHPR) to generate a point estimate and credible interval for the age at which trajectories diverge between groups for continuous outcome measures that exhibit non-linear within-person response profiles over time. We illustrate our approach by modeling the divergence in childhood-to-adulthood body mass index (BMI) trajectories between two groups of adults with/without type 2 diabetes mellitus (T2DM) in the Cardiovascular Risk in Young Finns Study (YFS). Using the proposed BHPR approach, we estimated the BMI profiles of participants with T2DM diverged from healthy participants at age 16 years for males (95% credible interval (CI):13.5-18 years) and 21 years for females (95% CI: 19.5-23 years). These data suggest that a critical window for weight management intervention in preventing T2DM might exist before the age when BMI growth rate is naturally expected to decrease. Simulation showed that when using pairwise comparison of least-square means from categorical mixed models, smaller sample sizes tended to conclude a later age of divergence. In contrast, the point estimate of the divergence time is not biased by sample size when using the proposed BHPR method. BHPR is a powerful analytic tool to model long-term non-linear longitudinal outcomes, enabling the identification of the age at which risk factor trajectories diverge between groups of participants. The method is suitable for the analysis of unbalanced longitudinal data, with only a limited number of repeated measures per participants and where the time-related outcome is typically marked by transitional changes or by distinct phases of change over time.
Can air temperatures be used to project influences of climate change on stream temperatures?
NASA Astrophysics Data System (ADS)
Arismendi, I.; Safeeq, M.; Dunham, J.; Johnson, S. L.
2013-12-01
The lack of available in situ stream temperature records at broad spatiotemporal scales have been recognized as a major limiting factor in the understanding of thermal behavior of stream and river systems. This has motivated the promotion of a wide variety of models that use surrogates for stream temperatures including a regression approach that uses air temperature as the predictor variable. We investigate the long-term performance of widely used linear and non-linear regression models between air and stream temperatures to project the latter in future climate scenarios. Specifically, we examine the temporal variability of the parameters that define each of these models in long-term stream and air temperature datasets representing relatively natural and highly human-influenced streams. We selected 25 sites with long-term records that monitored year-round daily measurements of stream temperature (daily mean) in the western United States (California, Oregon, Idaho, Washington, and Alaska). Surface air temperature data from each site was not available. Therefore, we calculated daily mean surface air temperature for each site in contiguous US from a 1/16-degree resolution gridded surface temperature data. Our findings highlight several limitations that are endemic to linear or nonlinear regressions that have been applied in many recent attempts to project future stream temperatures based on air temperature. Our results also show that applications over longer time periods, as well as extrapolation of model predictions to project future stream temperatures are unlikely to be reliable. Although we did not analyze a broad range of stream types at a continental or global extent, our analysis of stream temperatures within the set of streams considered herein was more than sufficient to illustrate a number of specific limitations associated with statistical projections of stream temperature based on air temperature. Radar plots of Nash-Sutcliffe efficiency (NSE) values for the two correlation models in regulated (n=14; lower panel) and unregulated (n=11; upper panel) streams. Solid lines represent average × SD of the NSE estimated for different time periods every 5-year. Dotted line at each plot indicates a NSE = 0.7. Symbols outside of the dotted line at each plot represent a satisfactory level of accuracy of the model
McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
2009-01-01
Rationale and Objectives Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these 4-features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion The diagnostic performance of models selected by ANN and logistic regression was similar. The analytic methods were found to be roughly equivalent in terms of predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated non-linear model, while logistic regression analysis provides insightful information to enhance interpretation of the model features. PMID:19409817
NASA Astrophysics Data System (ADS)
Kuchar, A.; Sacha, P.; Miksovsky, J.; Pisoft, P.
2015-06-01
This study focusses on the variability of temperature, ozone and circulation characteristics in the stratosphere and lower mesosphere with regard to the influence of the 11-year solar cycle. It is based on attribution analysis using multiple nonlinear techniques (support vector regression, neural networks) besides the multiple linear regression approach. The analysis was applied to several current reanalysis data sets for the 1979-2013 period, including MERRA, ERA-Interim and JRA-55, with the aim to compare how these types of data resolve especially the double-peaked solar response in temperature and ozone variables and the consequent changes induced by these anomalies. Equatorial temperature signals in the tropical stratosphere were found to be in qualitative agreement with previous attribution studies, although the agreement with observational results was incomplete, especially for JRA-55. The analysis also pointed to the solar signal in the ozone data sets (i.e. MERRA and ERA-Interim) not being consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. The results obtained by linear regression were confirmed by the nonlinear approach through all data sets, suggesting that linear regression is a relevant tool to sufficiently resolve the solar signal in the middle atmosphere. The seasonal evolution of the solar response was also discussed in terms of dynamical causalities in the winter hemispheres. The hypothetical mechanism of a weaker Brewer-Dobson circulation at solar maxima was reviewed together with a discussion of polar vortex behaviour.
Temporal trends in sperm count: a systematic review and meta-regression analysis.
Levine, Hagai; Jørgensen, Niels; Martino-Andrade, Anderson; Mendiola, Jaime; Weksler-Derri, Dan; Mindlis, Irina; Pinotti, Rachel; Swan, Shanna H
2017-11-01
Reported declines in sperm counts remain controversial today and recent trends are unknown. A definitive meta-analysis is critical given the predictive value of sperm count for fertility, morbidity and mortality. To provide a systematic review and meta-regression analysis of recent trends in sperm counts as measured by sperm concentration (SC) and total sperm count (TSC), and their modification by fertility and geographic group. PubMed/MEDLINE and EMBASE were searched for English language studies of human SC published in 1981-2013. Following a predefined protocol 7518 abstracts were screened and 2510 full articles reporting primary data on SC were reviewed. A total of 244 estimates of SC and TSC from 185 studies of 42 935 men who provided semen samples in 1973-2011 were extracted for meta-regression analysis, as well as information on years of sample collection and covariates [fertility group ('Unselected by fertility' versus 'Fertile'), geographic group ('Western', including North America, Europe Australia and New Zealand versus 'Other', including South America, Asia and Africa), age, ejaculation abstinence time, semen collection method, method of measuring SC and semen volume, exclusion criteria and indicators of completeness of covariate data]. The slopes of SC and TSC were estimated as functions of sample collection year using both simple linear regression and weighted meta-regression models and the latter were adjusted for pre-determined covariates and modification by fertility and geographic group. Assumptions were examined using multiple sensitivity analyses and nonlinear models. SC declined significantly between 1973 and 2011 (slope in unadjusted simple regression models -0.70 million/ml/year; 95% CI: -0.72 to -0.69; P < 0.001; slope in adjusted meta-regression models = -0.64; -1.06 to -0.22; P = 0.003). The slopes in the meta-regression model were modified by fertility (P for interaction = 0.064) and geographic group (P for interaction = 0.027). There was a significant decline in SC between 1973 and 2011 among Unselected Western (-1.38; -2.02 to -0.74; P < 0.001) and among Fertile Western (-0.68; -1.31 to -0.05; P = 0.033), while no significant trends were seen among Unselected Other and Fertile Other. Among Unselected Western studies, the mean SC declined, on average, 1.4% per year with an overall decline of 52.4% between 1973 and 2011. Trends for TSC and SC were similar, with a steep decline among Unselected Western (-5.33 million/year, -7.56 to -3.11; P < 0.001), corresponding to an average decline in mean TSC of 1.6% per year and overall decline of 59.3%. Results changed minimally in multiple sensitivity analyses, and there was no statistical support for the use of a nonlinear model. In a model restricted to data post-1995, the slope both for SC and TSC among Unselected Western was similar to that for the entire period (-2.06 million/ml, -3.38 to -0.74; P = 0.004 and -8.12 million, -13.73 to -2.51, P = 0.006, respectively). This comprehensive meta-regression analysis reports a significant decline in sperm counts (as measured by SC and TSC) between 1973 and 2011, driven by a 50-60% decline among men unselected by fertility from North America, Europe, Australia and New Zealand. Because of the significant public health implications of these results, research on the causes of this continuing decline is urgently needed. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Gender-Specific Trends in Educational Attainment and Self-Rated Health, 1972–2002
Hill, Terrence D.; Needham, Belinda L.
2006-01-01
Objectives. We tested whether self-rated health has improved over time (1972–2002) for women and men. We also considered the degree to which historical gains in educational attainment help to explain any observed trends. Methods. Using 21 years of repeated cross-sectional data from the General Social Survey, we estimated a series of ordered logistic regression models predicting self-rated health. Results. Our results show that women’s health status has steadily improved over the 30-year period under study, and these improvements are largely explained by gains in educational attainment. We also found that the health trend for men is nonlinear, suggesting significant fluctuations in health status over time. Conclusions. Based on the linear health status trend and strong mediation pattern for women, and the nonlinear health status trend for men, women have benefited more than men, in terms of self-rated health, from increased educational attainment. PMID:16735623