Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
Kumar, K Vasanth; Sivanesan, S
2006-08-25
Pseudo second order kinetic expressions of Ho, Sobkowsk and Czerwinski, Blanachard et al. and Ritchie were fitted to the experimental kinetic data of malachite green onto activated carbon by non-linear and linear method. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo second order model were the same. Non-linear regression analysis showed that both Blanachard et al. and Ho have similar ideas on the pseudo second order model but with different assumptions. The best fit of experimental data in Ho's pseudo second order expression by linear and non-linear regression method showed that Ho pseudo second order model was a better kinetic expression when compared to other pseudo second order kinetic expressions. The amount of dye adsorbed at equilibrium, q(e), was predicted from Ho pseudo second order expression and were fitted to the Langmuir, Freundlich and Redlich Peterson expressions by both linear and non-linear method to obtain the pseudo isotherms. The best fitting pseudo isotherm was found to be the Langmuir and Redlich Peterson isotherm. Redlich Peterson is a special case of Langmuir when the constant g equals unity.
A simplified competition data analysis for radioligand specific activity determination.
Venturino, A; Rivera, E S; Bergoc, R M; Caro, R A
1990-01-01
Non-linear regression and two-step linear fit methods were developed to determine the actual specific activity of 125I-ovine prolactin by radioreceptor self-displacement analysis. The experimental results obtained by the different methods are superposable. The non-linear regression method is considered to be the most adequate procedure to calculate the specific activity, but if its software is not available, the other described methods are also suitable.
Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F
2018-06-01
This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M
2017-04-01
A carboxy methyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C 14 -C 18 ) amine (DMDA)) composite was prepared by solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-Ray diffraction spectroscopy (XRD) and scanning electron microscopy (SEM). The composite was utilized for its pesticide sorption efficiency for atrazine, imidacloprid and thiamethoxam. The sorption data was fitted into Langmuir and Freundlich isotherms using linear and non linear methods. The linear regression method suggested best fitting of sorption data into Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by non linear regression method. The non linear error analysis suggested that the sorption data fitted well into Langmuir model rather than in Freundlich model. The maximum sorption capacity, Q 0 (μg/g) was given by imidacloprid (2000) followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the degree of determination of linear regression alone cannot be used for comparing the best fitting of Langmuir and Freundlich models and non-linear error analysis needs to be done to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
Regression of non-linear coupling of noise in LIGO detectors
NASA Astrophysics Data System (ADS)
Da Silva Costa, C. F.; Billman, C.; Effler, A.; Klimenko, S.; Cheng, H.-P.
2018-03-01
In 2015, after their upgrade, the advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) detectors started acquiring data. The effort to improve their sensitivity has never stopped since then. The goal to achieve design sensitivity is challenging. Environmental and instrumental noise couple to the detector output with different, linear and non-linear, coupling mechanisms. The noise regression method we use is based on the Wiener–Kolmogorov filter, which uses witness channels to make noise predictions. We present here how this method helped to determine complex non-linear noise couplings in the output mode cleaner and in the mirror suspension system of the LIGO detector.
Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation
NASA Astrophysics Data System (ADS)
Saylor, Rick D.; Edgerton, Eric S.; Hartsell, Benjamin E.
A variety of linear regression techniques and simple slope estimators are evaluated for use in the elemental carbon (EC) tracer method of secondary organic carbon (OC) estimation. Linear regression techniques based on ordinary least squares are not suitable for situations where measurement uncertainties exist in both regressed variables. In the past, regression based on the method of Deming [1943. Statistical Adjustment of Data. Wiley, London] has been the preferred choice for EC tracer method parameter estimation. In agreement with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], we find that in the limited case where primary non-combustion OC (OC non-comb) is assumed to be zero, the ratio of averages (ROA) approach provides a stable and reliable estimate of the primary OC-EC ratio, (OC/EC) pri. In contrast with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], however, we find that the optimal use of Deming regression (and the more general York et al. [2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics 72, 367-375] regression) provides excellent results as well. For the more typical case where OC non-comb is allowed to obtain a non-zero value, we find that regression based on the method of York is the preferred choice for EC tracer method parameter estimation. In the York regression technique, detailed information on uncertainties in the measurement of OC and EC is used to improve the linear best fit to the given data. If only limited information is available on the relative uncertainties of OC and EC, then Deming regression should be used. On the other hand, use of ROA in the estimation of secondary OC, and thus the assumption of a zero OC non-comb value, generally leads to an overestimation of the contribution of secondary OC to total measured OC.
Kumar, K Vasanth
2006-10-11
Batch kinetic experiments were carried out for the sorption of methylene blue onto activated carbon. The experimental kinetics were fitted to the pseudo first-order and pseudo second-order kinetics by linear and a non-linear method. The five different types of Ho pseudo second-order expression have been discussed. A comparison of linear least-squares method and a trial and error non-linear method of estimating the pseudo second-order rate kinetic parameters were examined. The sorption process was found to follow a both pseudo first-order kinetic and pseudo second-order kinetic model. Present investigation showed that it is inappropriate to use a type 1 and type pseudo second-order expressions as proposed by Ho and Blanachard et al. respectively for predicting the kinetic rate constants and the initial sorption rate for the studied system. Three correct possible alternate linear expressions (type 2 to type 4) to better predict the initial sorption rate and kinetic rate constants for the studied system (methylene blue/activated carbon) was proposed. Linear method was found to check only the hypothesis instead of verifying the kinetic model. Non-linear regression method was found to be the more appropriate method to determine the rate kinetic parameters.
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree where the parabola is its graphical representation.
NASA Astrophysics Data System (ADS)
See, J. J.; Jamaian, S. S.; Salleh, R. M.; Nor, M. E.; Aman, F.
2018-04-01
This research aims to estimate the parameters of Monod model of microalgae Botryococcus Braunii sp growth by the Least-Squares method. Monod equation is a non-linear equation which can be transformed into a linear equation form and it is solved by implementing the Least-Squares linear regression method. Meanwhile, Gauss-Newton method is an alternative method to solve the non-linear Least-Squares problem with the aim to obtain the parameters value of Monod model by minimizing the sum of square error ( SSE). As the result, the parameters of the Monod model for microalgae Botryococcus Braunii sp can be estimated by the Least-Squares method. However, the estimated parameters value obtained by the non-linear Least-Squares method are more accurate compared to the linear Least-Squares method since the SSE of the non-linear Least-Squares method is less than the linear Least-Squares method.
Estimating monotonic rates from biological data using local linear regression.
Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R
2017-03-01
Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
NASA Astrophysics Data System (ADS)
Yadav, Manish; Singh, Nitin Kumar
2017-12-01
A comparison of the linear and non-linear regression method in selecting the optimum isotherm among three most commonly used adsorption isotherms (Langmuir, Freundlich, and Redlich-Peterson) was made to the experimental data of fluoride (F) sorption onto Bio-F at a solution temperature of 30 ± 1 °C. The coefficient of correlation (r2) was used to select the best theoretical isotherm among the investigated ones. A total of four Langmuir linear equations were discussed and out of which linear form of most popular Langmuir-1 and Langmuir-2 showed the higher coefficient of determination (0.976 and 0.989) as compared to other Langmuir linear equations. Freundlich and Redlich-Peterson isotherms showed a better fit to the experimental data in linear least-square method, while in non-linear method Redlich-Peterson isotherm equations showed the best fit to the tested data set. The present study showed that the non-linear method could be a better way to obtain the isotherm parameters and represent the most suitable isotherm. Redlich-Peterson isotherm was found to be the best representative (r2 = 0.999) for this sorption system. It is also observed that the values of β are not close to unity, which means the isotherms are approaching the Freundlich but not the Langmuir isotherm.
Linear regression analysis of survival data with missing censoring indicators.
Wang, Qihua; Dinse, Gregg E
2011-04-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.
An evaluation of bias in propensity score-adjusted non-linear regression models.
Wan, Fei; Mitra, Nandita
2018-03-01
Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
Kumar, K Vasanth; Porkodi, K; Rocha, F
2008-01-15
A comparison of linear and non-linear regression method in selecting the optimum isotherm was made to the experimental equilibrium data of basic red 9 sorption by activated carbon. The r(2) was used to select the best fit linear theoretical isotherm. In the case of non-linear regression method, six error functions namely coefficient of determination (r(2)), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), the average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS) were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. Non-linear regression was found to be a better way to obtain the parameters involved in the isotherms and also the optimum isotherm. For two parameter isotherm, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of three parameter isotherm, r(2) was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K(2) was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Brown, A M
2001-06-01
The objective of this present study was to introduce a simple, easily understood method for carrying out non-linear regression analysis based on user input functions. While it is relatively straightforward to fit data with simple functions such as linear or logarithmic functions, fitting data with more complicated non-linear functions is more difficult. Commercial specialist programmes are available that will carry out this analysis, but these programmes are expensive and are not intuitive to learn. An alternative method described here is to use the SOLVER function of the ubiquitous spreadsheet programme Microsoft Excel, which employs an iterative least squares fitting routine to produce the optimal goodness of fit between data and function. The intent of this paper is to lead the reader through an easily understood step-by-step guide to implementing this method, which can be applied to any function in the form y=f(x), and is well suited to fast, reliable analysis of data in all fields of biology.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.
Ferrari, Alberto; Comelli, Mario
2016-12-01
In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. This clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods is available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; plus, we describe results from the application of these methods on data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors, the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. Non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method is - in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice - in applying data-driven approach to all content features simultaneously. We found especially the combination of regularized regression and ICA useful when analyzing fMRI data obtained using non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
Application of General Regression Neural Network to the Prediction of LOD Change
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Hong; Wang, Qi-Jie; Zhu, Jian-Jun; Zhang, Hao
2012-01-01
Traditional methods for predicting the change in length of day (LOD change) are mainly based on some linear models, such as the least square model and autoregression model, etc. However, the LOD change comprises complicated non-linear factors and the prediction effect of the linear models is always not so ideal. Thus, a kind of non-linear neural network — general regression neural network (GRNN) model is tried to make the prediction of the LOD change and the result is compared with the predicted results obtained by taking advantage of the BP (back propagation) neural network model and other models. The comparison result shows that the application of the GRNN to the prediction of the LOD change is highly effective and feasible.
NASA Astrophysics Data System (ADS)
Gonçalves, Karen dos Santos; Winkler, Mirko S.; Benchimol-Barbosa, Paulo Roberto; de Hoogh, Kees; Artaxo, Paulo Eduardo; de Souza Hacon, Sandra; Schindler, Christian; Künzli, Nino
2018-07-01
Epidemiological studies generally use particulate matter measurements with diameter less 2.5 μm (PM2.5) from monitoring networks. Satellite aerosol optical depth (AOD) data has considerable potential in predicting PM2.5 concentrations, and thus provides an alternative method for producing knowledge regarding the level of pollution and its health impact in areas where no ground PM2.5 measurements are available. This is the case in the Brazilian Amazon rainforest region where forest fires are frequent sources of high pollution. In this study, we applied a non-linear model for predicting PM2.5 concentration from AOD retrievals using interaction terms between average temperature, relative humidity, sine, cosine of date in a period of 365,25 days and the square of the lagged relative residual. Regression performance statistics were tested comparing the goodness of fit and R2 based on results from linear regression and non-linear regression for six different models. The regression results for non-linear prediction showed the best performance, explaining on average 82% of the daily PM2.5 concentrations when considering the whole period studied. In the context of Amazonia, it was the first study predicting PM2.5 concentrations using the latest high-resolution AOD products also in combination with the testing of a non-linear model performance. Our results permitted a reliable prediction considering the AOD-PM2.5 relationship and set the basis for further investigations on air pollution impacts in the complex context of Brazilian Amazon Region.
Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.
2009-01-01
Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
Carbon dioxide stripping in aquaculture -- part III: model verification
Colt, John; Watten, Barnaby; Pfeiffer, Tim
2012-01-01
Based on conventional mass transfer models developed for oxygen, the use of the non-linear ASCE method, 2-point method, and one parameter linear-regression method were evaluated for carbon dioxide stripping data. For values of KLaCO2 < approximately 1.5/h, the 2-point or ASCE method are a good fit to experimental data, but the fit breaks down at higher values of KLaCO2. How to correct KLaCO2 for gas phase enrichment remains to be determined. The one-parameter linear regression model was used to vary the C*CO2 over the test, but it did not result in a better fit to the experimental data when compared to the ASCE or fixed C*CO2 assumptions.
Ladstätter, Felix; Garrosa, Eva; Moreno-Jiménez, Bernardo; Ponsoda, Vicente; Reales Aviles, José Manuel; Dai, Junming
2016-01-01
Artificial neural networks are sophisticated modelling and prediction tools capable of extracting complex, non-linear relationships between predictor (input) and predicted (output) variables. This study explores this capacity by modelling non-linearities in the hardiness-modulated burnout process with a neural network. Specifically, two multi-layer feed-forward artificial neural networks are concatenated in an attempt to model the composite non-linear burnout process. Sensitivity analysis, a Monte Carlo-based global simulation technique, is then utilised to examine the first-order effects of the predictor variables on the burnout sub-dimensions and consequences. Results show that (1) this concatenated artificial neural network approach is feasible to model the burnout process, (2) sensitivity analysis is a prolific method to study the relative importance of predictor variables and (3) the relationships among variables involved in the development of burnout and its consequences are to different degrees non-linear. Many relationships among variables (e.g., stressors and strains) are not linear, yet researchers use linear methods such as Pearson correlation or linear regression to analyse these relationships. Artificial neural network analysis is an innovative method to analyse non-linear relationships and in combination with sensitivity analysis superior to linear methods.
Equilibrium, kinetics and process design of acid yellow 132 adsorption onto red pine sawdust.
Can, Mustafa
2015-01-01
Linear and non-linear regression procedures have been applied to the Langmuir, Freundlich, Tempkin, Dubinin-Radushkevich, and Redlich-Peterson isotherms for adsorption of acid yellow 132 (AY132) dye onto red pine (Pinus resinosa) sawdust. The effects of parameters such as particle size, stirring rate, contact time, dye concentration, adsorption dose, pH, and temperature were investigated, and interaction was characterized by Fourier transform infrared spectroscopy and field emission scanning electron microscope. The non-linear method of the Langmuir isotherm equation was found to be the best fitting model to the equilibrium data. The maximum monolayer adsorption capacity was found as 79.5 mg/g. The calculated thermodynamic results suggested that AY132 adsorption onto red pine sawdust was an exothermic, physisorption, and spontaneous process. Kinetics was analyzed by four different kinetic equations using non-linear regression analysis. The pseudo-second-order equation provides the best fit with experimental data.
NASA Astrophysics Data System (ADS)
Li, T.; Griffiths, W. D.; Chen, J.
2017-11-01
The Maximum Likelihood method and the Linear Least Squares (LLS) method have been widely used to estimate Weibull parameters for reliability of brittle and metal materials. In the last 30 years, many researchers focused on the bias of Weibull modulus estimation, and some improvements have been achieved, especially in the case of the LLS method. However, there is a shortcoming in these methods for a specific type of data, where the lower tail deviates dramatically from the well-known linear fit in a classic LLS Weibull analysis. This deviation can be commonly found from the measured properties of materials, and previous applications of the LLS method on this kind of dataset present an unreliable linear regression. This deviation was previously thought to be due to physical flaws ( i.e., defects) contained in materials. However, this paper demonstrates that this deviation can also be caused by the linear transformation of the Weibull function, occurring in the traditional LLS method. Accordingly, it may not be appropriate to carry out a Weibull analysis according to the linearized Weibull function, and the Non-linear Least Squares method (Non-LS) is instead recommended for the Weibull modulus estimation of casting properties.
TG study of the Li0.4Fe2.4Zn0.2O4 ferrite synthesis
NASA Astrophysics Data System (ADS)
Lysenko, E. N.; Nikolaev, E. V.; Surzhikov, A. P.
2016-02-01
In this paper, the kinetic analysis of Li-Zn ferrite synthesis was studied using thermogravimetry (TG) method through the simultaneous application of non-linear regression to several measurements run at different heating rates (multivariate non-linear regression). Using TG-curves obtained for the four heating rates and Netzsch Thermokinetics software package, the kinetic models with minimal adjustable parameters were selected to quantitatively describe the reaction of Li-Zn ferrite synthesis. It was shown that the experimental TG-curves clearly suggest a two-step process for the ferrite synthesis and therefore a model-fitting kinetic analysis based on multivariate non-linear regressions was conducted. The complex reaction was described by a two-step reaction scheme consisting of sequential reaction steps. It is established that the best results were obtained using the Yander three-dimensional diffusion model at the first stage and Ginstling-Bronstein model at the second step. The kinetic parameters for lithium-zinc ferrite synthesis reaction were found and discussed.
Wang, Yubo; Veluvolu, Kalyana C
2017-06-14
It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976) ratio and outperforms existing methods such as short-time Fourier transfrom (STFT), continuous Wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.
Brown, Angus M
2006-04-01
The objective of this present study was to demonstrate a method for fitting complex electrophysiological data with multiple functions using the SOLVER add-in of the ubiquitous spreadsheet Microsoft Excel. SOLVER minimizes the difference between the sum of the squares of the data to be fit and the function(s) describing the data using an iterative generalized reduced gradient method. While it is a straightforward procedure to fit data with linear functions, and we have previously demonstrated a method of non-linear regression analysis of experimental data based upon a single function, it is more complex to fit data with multiple functions, usually requiring specialized expensive computer software. In this paper we describe an easily understood program for fitting experimentally acquired data, in this case the stimulus-evoked compound action potential from the mouse optic nerve, with multiple Gaussian functions. The program is flexible and can be applied to describe data with a wide variety of user-input functions.
Quantile regression for the statistical analysis of immunological data with many non-detects.
Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth
2012-07-07
Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
Watanabe, Hiroyuki; Miyazaki, Hiroyasu
2006-01-01
Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues
2015-01-01
This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L(-1). The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil. Copyright © 2014 Elsevier B.V. All rights reserved.
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.
Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data
Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.
2013-01-01
Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Kovačević, Strahinja; Karadžić, Milica; Podunavac-Kuzmanović, Sanja; Jevrić, Lidija
2018-01-01
The present study is based on the quantitative structure-activity relationship (QSAR) analysis of binding affinity toward human prion protein (huPrP C ) of quinacrine, pyridine dicarbonitrile, diphenylthiazole and diphenyloxazole analogs applying different linear and non-linear chemometric regression techniques, including univariate linear regression, multiple linear regression, partial least squares regression and artificial neural networks. The QSAR analysis distinguished molecular lipophilicity as an important factor that contributes to the binding affinity. Principal component analysis was used in order to reveal similarities or dissimilarities among the studied compounds. The analysis of in silico absorption, distribution, metabolism, excretion and toxicity (ADMET) parameters was conducted. The ranking of the studied analogs on the basis of their ADMET parameters was done applying the sum of ranking differences, as a relatively new chemometric method. The main aim of the study was to reveal the most important molecular features whose changes lead to the changes in the binding affinities of the studied compounds. Another point of view on the binding affinity of the most promising analogs was established by application of molecular docking analysis. The results of the molecular docking were proven to be in agreement with the experimental outcome. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Godin, Bruno; Mayer, Frédéric; Agneessens, Richard; Gerin, Patrick; Dardenne, Pierre; Delfosse, Philippe; Delcarte, Jérôme
2015-01-01
The reliability of different models to predict the biochemical methane potential (BMP) of various plant biomasses using a multispecies dataset was compared. The most reliable prediction models of the BMP were those based on the near infrared (NIR) spectrum compared to those based on the chemical composition. The NIR predictions of local (specific regression and non-linear) models were able to estimate quantitatively, rapidly, cheaply and easily the BMP. Such a model could be further used for biomethanation plant management and optimization. The predictions of non-linear models were more reliable compared to those of linear models. The presentation form (green-dried, silage-dried and silage-wet form) of biomasses to the NIR spectrometer did not influence the performances of the NIR prediction models. The accuracy of the BMP method should be improved to enhance further the BMP prediction models. Copyright © 2014 Elsevier Ltd. All rights reserved.
Geographical variation of cerebrovascular disease in New York State: the correlation with income
Han, Daikwon; Carrow, Shannon S; Rogerson, Peter A; Munschauer, Frederick E
2005-01-01
Background Income is known to be associated with cerebrovascular disease; however, little is known about the more detailed relationship between cerebrovascular disease and income. We examined the hypothesis that the geographical distribution of cerebrovascular disease in New York State may be predicted by a nonlinear model using income as a surrogate socioeconomic risk factor. Results We used spatial clustering methods to identify areas with high and low prevalence of cerebrovascular disease at the ZIP code level after smoothing rates and correcting for edge effects; geographic locations of high and low clusters of cerebrovascular disease in New York State were identified with and without income adjustment. To examine effects of income, we calculated the excess number of cases using a non-linear regression with cerebrovascular disease rates taken as the dependent variable and income and income squared taken as independent variables. The resulting regression equation was: excess rate = 32.075 - 1.22*10-4(income) + 8.068*10-10(income2), and both income and income squared variables were significant at the 0.01 level. When income was included as a covariate in the non-linear regression, the number and size of clusters of high cerebrovascular disease prevalence decreased. Some 87 ZIP codes exceeded the critical value of the local statistic yielding a relative risk of 1.2. The majority of low cerebrovascular disease prevalence geographic clusters disappeared when the non-linear income effect was included. For linear regression, the excess rate of cerebrovascular disease falls with income; each $10,000 increase in median income of each ZIP code resulted in an average reduction of 3.83 observed cases. The significant nonlinear effect indicates a lessening of this income effect with increasing income. Conclusion Income is a non-linear predictor of excess cerebrovascular disease rates, with both low and high observed cerebrovascular disease rate areas associated with higher income. Income alone explains a significant amount of the geographical variance in cerebrovascular disease across New York State since both high and low clusters of cerebrovascular disease dissipate or disappear with income adjustment. Geographical modeling, including non-linear effects of income, may allow for better identification of other non-traditional risk factors. PMID:16242043
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Genomic prediction based on data from three layer lines using non-linear regression models.
Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L
2014-11-06
Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.
Spreco, A; Eriksson, O; Dahlström, Ö; Timpka, T
2017-07-01
Methods for the detection of influenza epidemics and prediction of their progress have seldom been comparatively evaluated using prospective designs. This study aimed to perform a prospective comparative trial of algorithms for the detection and prediction of increased local influenza activity. Data on clinical influenza diagnoses recorded by physicians and syndromic data from a telenursing service were used. Five detection and three prediction algorithms previously evaluated in public health settings were calibrated and then evaluated over 3 years. When applied on diagnostic data, only detection using the Serfling regression method and prediction using the non-adaptive log-linear regression method showed acceptable performances during winter influenza seasons. For the syndromic data, none of the detection algorithms displayed a satisfactory performance, while non-adaptive log-linear regression was the best performing prediction method. We conclude that evidence was found for that available algorithms for influenza detection and prediction display satisfactory performance when applied on local diagnostic data during winter influenza seasons. When applied on local syndromic data, the evaluated algorithms did not display consistent performance. Further evaluations and research on combination of methods of these types in public health information infrastructures for 'nowcasting' (integrated detection and prediction) of influenza activity are warranted.
NASA Astrophysics Data System (ADS)
Chen, Pengfei; Jing, Qi
2017-02-01
An assumption that the non-linear method is more reasonable than the linear method when canopy reflectance is used to establish the yield prediction model was proposed and tested in this study. For this purpose, partial least squares regression (PLSR) and artificial neural networks (ANN), represented linear and non-linear analysis method, were applied and compared for wheat yield prediction. Multi-period Landsat-8 OLI images were collected at two different wheat growth stages, and a field campaign was conducted to obtain grain yields at selected sampling sites in 2014. The field data were divided into a calibration database and a testing database. Using calibration data, a cross-validation concept was introduced for the PLSR and ANN model construction to prevent over-fitting. All models were tested using the test data. The ANN yield-prediction model produced R2, RMSE and RMSE% values of 0.61, 979 kg ha-1, and 10.38%, respectively, in the testing phase, performing better than the PLSR yield-prediction model, which produced R2, RMSE, and RMSE% values of 0.39, 1211 kg ha-1, and 12.84%, respectively. Non-linear method was suggested as a better method for yield prediction.
Non-stationary hydrologic frequency analysis using B-spline quantile regression
NASA Astrophysics Data System (ADS)
Nasri, B.; Bouezmarni, T.; St-Hilaire, A.; Ouarda, T. B. M. J.
2017-11-01
Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic and water resources systems under the assumption of stationarity. However, with increasing evidence of climate change, it is possible that the assumption of stationarity, which is prerequisite for traditional frequency analysis and hence, the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extremes based on B-Spline quantile regression which allows to model data in the presence of non-stationarity and/or dependence on covariates with linear and non-linear dependence. A Markov Chain Monte Carlo (MCMC) algorithm was used to estimate quantiles and their posterior distributions. A coefficient of determination and Bayesian information criterion (BIC) for quantile regression are used in order to select the best model, i.e. for each quantile, we choose the degree and number of knots of the adequate B-spline quantile regression model. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in the variable of interest and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for an annual maximum and minimum discharge with high annual non-exceedance probabilities.
Missing-value estimation using linear and non-linear regression with Bayesian gene selection.
Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R
2003-11-22
Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Southworth, Frank; Garrow, Dr. Laurie
This chapter describes the principal types of both passenger and freight demand models in use today, providing a brief history of model development supported by references to a number of popular texts on the subject, and directing the reader to papers covering some of the more recent technical developments in the area. Over the past half century a variety of methods have been used to estimate and forecast travel demands, drawing concepts from economic/utility maximization theory, transportation system optimization and spatial interaction theory, using and often combining solution techniques as varied as Box-Jenkins methods, non-linear multivariate regression, non-linear mathematical programming,more » and agent-based microsimulation.« less
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.
Confidence limits for data mining models of options prices
NASA Astrophysics Data System (ADS)
Healy, J. V.; Dixon, M.; Read, B. J.; Cai, F. F.
2004-12-01
Non-parametric methods such as artificial neural nets can successfully model prices of financial options, out-performing the Black-Scholes analytic model (Eur. Phys. J. B 27 (2002) 219). However, the accuracy of such approaches is usually expressed only by a global fitting/error measure. This paper describes a robust method for determining prediction intervals for models derived by non-linear regression. We have demonstrated it by application to a standard synthetic example (29th Annual Conference of the IEEE Industrial Electronics Society, Special Session on Intelligent Systems, pp. 1926-1931). The method is used here to obtain prediction intervals for option prices using market data for LIFFE “ESX” FTSE 100 index options ( http://www.liffe.com/liffedata/contracts/month_onmonth.xls). We avoid special neural net architectures and use standard regression procedures to determine local error bars. The method is appropriate for target data with non constant variance (or volatility).
Relationship between age and elite marathon race time in world single age records from 5 to 93 years
2014-01-01
Background The aims of the study were (i) to investigate the relationship between elite marathon race times and age in 1-year intervals by using the world single age records in marathon running from 5 to 93 years and (ii) to evaluate the sex difference in elite marathon running performance with advancing age. Methods World single age records in marathon running in 1-year intervals for women and men were analysed regarding changes across age for both men and women using linear and non-linear regression analyses for each age for women and men. Results The relationship between elite marathon race time and age was non-linear (i.e. polynomial regression 4th degree) for women and men. The curve was U-shaped where performance improved from 5 to ~20 years. From 5 years to ~15 years, boys and girls performed very similar. Between ~20 and ~35 years, performance was quite linear, but started to decrease at the age of ~35 years in a curvilinear manner with increasing age in both women and men. The sex difference increased non-linearly (i.e. polynomial regression 7th degree) from 5 to ~20 years, remained unchanged at ~20 min from ~20 to ~50 years and increased thereafter. The sex difference was lowest (7.5%, 10.5 min) at the age of 49 years. Conclusion Elite marathon race times improved from 5 to ~20 years, remained linear between ~20 and ~35 years, and started to increase at the age of ~35 years in a curvilinear manner with increasing age in both women and men. The sex difference in elite marathon race time increased non-linearly and was lowest at the age of ~49 years. PMID:25120915
Predicting the dissolution kinetics of silicate glasses using machine learning
NASA Astrophysics Data System (ADS)
Anoop Krishnan, N. M.; Mangalathu, Sujith; Smedskjaer, Morten M.; Tandia, Adama; Burton, Henry; Bauchy, Mathieu
2018-05-01
Predicting the dissolution rates of silicate glasses in aqueous conditions is a complex task as the underlying mechanism(s) remain poorly understood and the dissolution kinetics can depend on a large number of intrinsic and extrinsic factors. Here, we assess the potential of data-driven models based on machine learning to predict the dissolution rates of various aluminosilicate glasses exposed to a wide range of solution pH values, from acidic to caustic conditions. Four classes of machine learning methods are investigated, namely, linear regression, support vector machine regression, random forest, and artificial neural network. We observe that, although linear methods all fail to describe the dissolution kinetics, the artificial neural network approach offers excellent predictions, thanks to its inherent ability to handle non-linear data. Overall, we suggest that a more extensive use of machine learning approaches could significantly accelerate the design of novel glasses with tailored properties.
Modeling Longitudinal Data Containing Non-Normal Within Subject Errors
NASA Technical Reports Server (NTRS)
Feiveson, Alan; Glenn, Nancy L.
2013-01-01
The mission of the National Aeronautics and Space Administration’s (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and result from a relatively few number of subjects; typically 10 – 20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed–effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) by Geraci and Bottai (2013), quantile regression, multilevel mixed–effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications including applications in the sciences, technology, engineering and mathematics fields.
Non-Linear Approach in Kinesiology Should Be Preferred to the Linear--A Case of Basketball.
Trninić, Marko; Jeličić, Mario; Papić, Vladan
2015-07-01
In kinesiology, medicine, biology and psychology, in which research focus is on dynamical self-organized systems, complex connections exist between variables. Non-linear nature of complex systems has been discussed and explained by the example of non-linear anthropometric predictors of performance in basketball. Previous studies interpreted relations between anthropometric features and measures of effectiveness in basketball by (a) using linear correlation models, and by (b) including all basketball athletes in the same sample of participants regardless of their playing position. In this paper the significance and character of linear and non-linear relations between simple anthropometric predictors (AP) and performance criteria consisting of situation-related measures of effectiveness (SE) in basketball were determined and evaluated. The sample of participants consisted of top-level junior basketball players divided in three groups according to their playing time (8 minutes and more per game) and playing position: guards (N = 42), forwards (N = 26) and centers (N = 40). Linear (general model) and non-linear (general model) regression models were calculated simultaneously and separately for each group. The conclusion is viable: non-linear regressions are frequently superior to linear correlations when interpreting actual association logic among research variables.
Rodriguez-Sabate, Clara; Morales, Ingrid; Sanchez, Alberto; Rodriguez, Manuel
2017-01-01
The complexity of basal ganglia (BG) interactions is often condensed into simple models mainly based on animal data and that present BG in closed-loop cortico-subcortical circuits of excitatory/inhibitory pathways which analyze the incoming cortical data and return the processed information to the cortex. This study was aimed at identifying functional relationships in the BG motor-loop of 24 healthy-subjects who provided written, informed consent and whose BOLD-activity was recorded by MRI methods. The analysis of the functional interaction between these centers by correlation techniques and multiple linear regression showed non-linear relationships which cannot be suitably addressed with these methods. The multiple correspondence analysis (MCA), an unsupervised multivariable procedure which can identify non-linear interactions, was used to study the functional connectivity of BG when subjects were at rest. Linear methods showed different functional interactions expected according to current BG models. MCA showed additional functional interactions which were not evident when using lineal methods. Seven functional configurations of BG were identified with MCA, two involving the primary motor and somatosensory cortex, one involving the deepest BG (external-internal globus pallidum, subthalamic nucleus and substantia nigral), one with the input-output BG centers (putamen and motor thalamus), two linking the input-output centers with other BG (external pallidum and subthalamic nucleus), and one linking the external pallidum and the substantia nigral. The results provide evidence that the non-linear MCA and linear methods are complementary and should be best used in conjunction to more fully understand the nature of functional connectivity of brain centers.
Murad, Havi; Kipnis, Victor; Freedman, Laurence S
2016-10-01
Assessing interactions in linear regression models when covariates have measurement error (ME) is complex.We previously described regression calibration (RC) methods that yield consistent estimators and standard errors for interaction coefficients of normally distributed covariates having classical ME. Here we extend normal based RC (NBRC) and linear RC (LRC) methods to a non-classical ME model, and describe more efficient versions that combine estimates from the main study and internal sub-study. We apply these methods to data from the Observing Protein and Energy Nutrition (OPEN) study. Using simulations we show that (i) for normally distributed covariates efficient NBRC and LRC were nearly unbiased and performed well with sub-study size ≥200; (ii) efficient NBRC had lower MSE than efficient LRC; (iii) the naïve test for a single interaction had type I error probability close to the nominal significance level, whereas efficient NBRC and LRC were slightly anti-conservative but more powerful; (iv) for markedly non-normal covariates, efficient LRC yielded less biased estimators with smaller variance than efficient NBRC. Our simulations suggest that it is preferable to use: (i) efficient NBRC for estimating and testing interaction effects of normally distributed covariates and (ii) efficient LRC for estimating and testing interactions for markedly non-normal covariates. © The Author(s) 2013.
NASA Astrophysics Data System (ADS)
Huttunen, Jani; Kokkola, Harri; Mielonen, Tero; Esa Juhani Mononen, Mika; Lipponen, Antti; Reunanen, Juha; Vilhelm Lindfors, Anders; Mikkonen, Santtu; Erkki Juhani Lehtinen, Kari; Kouremeti, Natalia; Bais, Alkiviadis; Niska, Harri; Arola, Antti
2016-07-01
In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during the observation period.
ERIC Educational Resources Information Center
Hovardas, Tasos
2016-01-01
Although ecological systems at varying scales involve non-linear interactions, learners insist thinking in a linear fashion when they deal with ecological phenomena. The overall objective of the present contribution was to propose a hypothetical learning progression for developing non-linear reasoning in prey-predator systems and to provide…
Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert
2012-01-01
Purpose To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R2. Hypothesis testing of coefficients over the patient population was performed. Results Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost~0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost~FDGpre0.93, p<0.001). Univariate mixture model fits of FDGpre improved R2 from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET d ) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET d based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LET d based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
Non-linear auto-regressive models for cross-frequency coupling in neural time series
Tallot, Lucille; Grabot, Laetitia; Doyère, Valérie; Grenier, Yves; Gramfort, Alexandre
2017-01-01
We address the issue of reliably detecting and quantifying cross-frequency coupling (CFC) in neural time series. Based on non-linear auto-regressive models, the proposed method provides a generative and parametric model of the time-varying spectral content of the signals. As this method models the entire spectrum simultaneously, it avoids the pitfalls related to incorrect filtering or the use of the Hilbert transform on wide-band signals. As the model is probabilistic, it also provides a score of the model “goodness of fit” via the likelihood, enabling easy and legitimate model selection and parameter comparison; this data-driven feature is unique to our model-based approach. Using three datasets obtained with invasive neurophysiological recordings in humans and rodents, we demonstrate that these models are able to replicate previous results obtained with other metrics, but also reveal new insights such as the influence of the amplitude of the slow oscillation. Using simulations, we demonstrate that our parametric method can reveal neural couplings with shorter signals than non-parametric methods. We also show how the likelihood can be used to find optimal filtering parameters, suggesting new properties on the spectrum of the driving signal, but also to estimate the optimal delay between the coupled signals, enabling a directionality estimation in the coupling. PMID:29227989
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
Parameterizing sorption isotherms using a hybrid global-local fitting procedure.
Matott, L Shawn; Singh, Anshuman; Rabideau, Alan J
2017-05-01
Predictive modeling of the transport and remediation of groundwater contaminants requires an accurate description of the sorption process, which is usually provided by fitting an isotherm model to site-specific laboratory data. Commonly used calibration procedures, listed in order of increasing sophistication, include: trial-and-error, linearization, non-linear regression, global search, and hybrid global-local search. Given the considerable variability in fitting procedures applied in published isotherm studies, we investigated the importance of algorithm selection through a series of numerical experiments involving 13 previously published sorption datasets. These datasets, considered representative of state-of-the-art for isotherm experiments, had been previously analyzed using trial-and-error, linearization, or non-linear regression methods. The isotherm expressions were re-fit using a 3-stage hybrid global-local search procedure (i.e. global search using particle swarm optimization followed by Powell's derivative free local search method and Gauss-Marquardt-Levenberg non-linear regression). The re-fitted expressions were then compared to previously published fits in terms of the optimized weighted sum of squared residuals (WSSR) fitness function, the final estimated parameters, and the influence on contaminant transport predictions - where easily computed concentration-dependent contaminant retardation factors served as a surrogate measure of likely transport behavior. Results suggest that many of the previously published calibrated isotherm parameter sets were local minima. In some cases, the updated hybrid global-local search yielded order-of-magnitude reductions in the fitness function. In particular, of the candidate isotherms, the Polanyi-type models were most likely to benefit from the use of the hybrid fitting procedure. In some cases, improvements in fitness function were associated with slight (<10%) changes in parameter values, but in other cases significant (>50%) changes in parameter values were noted. Despite these differences, the influence of isotherm misspecification on contaminant transport predictions was quite variable and difficult to predict from inspection of the isotherms. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Goswami, M.; O'Connor, K. M.; Shamseldin, A. Y.
The "Galway Real-Time River Flow Forecasting System" (GFFS) is a software pack- age developed at the Department of Engineering Hydrology, of the National University of Ireland, Galway, Ireland. It is based on a selection of lumped black-box and con- ceptual rainfall-runoff models, all developed in Galway, consisting primarily of both the non-parametric (NP) and parametric (P) forms of two black-box-type rainfall- runoff models, namely, the Simple Linear Model (SLM-NP and SLM-P) and the seasonally-based Linear Perturbation Model (LPM-NP and LPM-P), together with the non-parametric wetness-index-based Linearly Varying Gain Factor Model (LVGFM), the black-box Artificial Neural Network (ANN) Model, and the conceptual Soil Mois- ture Accounting and Routing (SMAR) Model. Comprised of the above suite of mod- els, the system enables the user to calibrate each model individually, initially without updating, and it is capable also of producing combined (i.e. consensus) forecasts us- ing the Simple Average Method (SAM), the Weighted Average Method (WAM), or the Artificial Neural Network Method (NNM). The updating of each model output is achieved using one of four different techniques, namely, simple Auto-Regressive (AR) updating, Linear Transfer Function (LTF) updating, Artificial Neural Network updating (NNU), and updating by the Non-linear Auto-Regressive Exogenous-input method (NARXM). The models exhibit a considerable range of variation in degree of complexity of structure, with corresponding degrees of complication in objective func- tion evaluation. Operating in continuous river-flow simulation and updating modes, these models and techniques have been applied to two Irish catchments, namely, the Fergus and the Brosna. A number of performance evaluation criteria have been used to comparatively assess the model discharge forecast efficiency.
Noninvasive and fast measurement of blood glucose in vivo by near infrared (NIR) spectroscopy
NASA Astrophysics Data System (ADS)
Jintao, Xue; Liming, Ye; Yufei, Liu; Chunyan, Li; Han, Chen
2017-05-01
This research was to develop a method for noninvasive and fast blood glucose assay in vivo. Near-infrared (NIR) spectroscopy, a more promising technique compared to other methods, was investigated in rats with diabetes and normal rats. Calibration models are generated by two different multivariate strategies: partial least squares (PLS) as linear regression method and artificial neural networks (ANN) as non-linear regression method. The PLS model was optimized individually by considering spectral range, spectral pretreatment methods and number of model factors, while the ANN model was studied individually by selecting spectral pretreatment methods, parameters of network topology, number of hidden neurons, and times of epoch. The results of the validation showed the two models were robust, accurate and repeatable. Compared to the ANN model, the performance of the PLS model was much better, with lower root mean square error of validation (RMSEP) of 0.419 and higher correlation coefficients (R) of 96.22%.
A non-linear data mining parameter selection algorithm for continuous variables
Razavi, Marianne; Brady, Sean
2017-01-01
In this article, we propose a new data mining algorithm, by which one can both capture the non-linearity in data and also find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that would capture complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method that we present here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. This algorithm introduces interpretable parameters by transforming the original inputs and also a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least square regression framework. This new automatic variable transformation and model selection method could offer an optimal and stable model that minimizes the mean square error and variability, while combining all possible subset selection methodology with the inclusion variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables. PMID:29131829
Application of near-infrared spectroscopy in the detection of fat-soluble vitamins in premix feed
NASA Astrophysics Data System (ADS)
Jia, Lian Ping; Tian, Shu Li; Zheng, Xue Cong; Jiao, Peng; Jiang, Xun Peng
2018-02-01
Vitamin is the organic compound and necessary for animal physiological maintenance. The rapid determination of the content of different vitamins in premix feed can help to achieve accurate diets and efficient feeding. Compared with high-performance liquid chromatography and other wet chemical methods, near-infrared spectroscopy is a fast, non-destructive, non-polluting method. 168 samples of premix feed were collected and the contents of vitamin A, vitamin E and vitamin D3 were detected by the standard method. The near-infrared spectra of samples ranging from 10 000 to 4 000 cm-1 were obtained. Partial least squares regression (PLSR) and support vector machine regression (SVMR) were used to construct the quantitative model. The results showed that the RMSEP of PLSR model of vitamin A, vitamin E and vitamin D3 were 0.43×107 IU/kg, 0.09×105 IU/kg and 0.17×107 IU/kg, respectively. The RMSEP of SVMR model was 0.45×107 IU/kg, 0.11×105 IU/kg and 0.18×107 IU/kg. Compared with nonlinear regression method (SVMR), linear regression method (PLSR) is more suitable for the quantitative analysis of vitamins in premix feed.
Baqué, Michèle; Amendt, Jens
2013-01-01
Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets don't take into account that immature blow flies grow in a non-linear fashion. Linear models do not supply a sufficient reliability on age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistic tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces into the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using for the first time generalised additive mixed models.
Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of acute myocardial infarction (AMI) from 1999 to 2013 in Tianjin incidence rate with Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on actual population, CAT test had much stronger statistical power than linear regression analysis for both overall incidence trend and age specific incidence trend (Cochran-Armitage trend P value
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-01-01
Aims A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
Jones, Andrew M; Lomas, James; Moore, Peter T; Rice, Nigel
2016-10-01
We conduct a quasi-Monte-Carlo comparison of the recent developments in parametric and semiparametric regression methods for healthcare costs, both against each other and against standard practice. The population of English National Health Service hospital in-patient episodes for the financial year 2007-2008 (summed for each patient) is randomly divided into two equally sized subpopulations to form an estimation set and a validation set. Evaluating out-of-sample using the validation set, a conditional density approximation estimator shows considerable promise in forecasting conditional means, performing best for accuracy of forecasting and among the best four for bias and goodness of fit. The best performing model for bias is linear regression with square-root-transformed dependent variables, whereas a generalized linear model with square-root link function and Poisson distribution performs best in terms of goodness of fit. Commonly used models utilizing a log-link are shown to perform badly relative to other models considered in our comparison.
Han, Hyung Joon; Choi, Sae Byeol; Park, Man Sik; Lee, Jin Suk; Kim, Wan Bae; Song, Tae Jin; Choi, Sang Yong
2011-07-01
Single port laparoscopic surgery has come to the forefront of minimally invasive surgery. For those familiar with conventional techniques, however, this type of operation demands a different type of eye/hand coordination and involves unfamiliar working instruments. Herein, the authors describe the learning curve and the clinical outcomes of single port laparoscopic cholecystectomy for 150 consecutive patients with benign gallbladder disease. All patients underwent single port laparoscopic cholecystectomy using a homemade glove port by one of five operators with different levels of experiences of laparoscopic surgery. The learning curve for each operator was fitted using the non-linear ordinary least squares method based on a non-linear regression model. Mean operating time was 77.6 ± 28.5 min. Fourteen patients (6.0%) were converted to conventional laparoscopic cholecystectomy. Complications occurred in 15 patients (10.0%), as follows: bile duct injury (n = 2), surgical site infection (n = 8), seroma (n = 2), and wound pain (n = 3). One operator achieved a learning curve plateau at 61.4 min per procedure after 8.5 cases and his time improved by 95.3 min as compared with initial operation time. Younger surgeons showed significant decreases in mean operation time and achieved stable mean operation times. In particular, younger surgeons showed significant decreases in operation times after 20 cases. Experienced laparoscopic surgeons can safely perform single port laparoscopic cholecystectomy using conventional or angled laparoscopic instruments. The present study shows that an operator can overcome the single port laparoscopic cholecystectomy learning curve in about eight cases.
Muradian, Kh K; Utko, N O; Mozzhukhina, T H; Pishel', I M; Litoshenko, O Ia; Bezrukov, V V; Fraĭfel'd, V E
2002-01-01
Correlative and regressive relations between the gaseous exchange, thermoregulation and mitochondrial protein content were analyzed by two- and three-dimensional statistics in mice. It has been shown that the pair wise linear methods of analysis did not reveal any significant correlation between the parameters under exploration. However, it became evident at three-dimensional and non-linear plotting for which the coefficients of multivariable correlation reached and even exceeded 0.7-0.8. The calculations based on partial differentiation of the multivariable regression equations allow to conclude that at certain values of VO2, VCO2 and body temperature negative relations between the systems of gaseous exchange and thermoregulation become dominating.
Regression dilution bias: tools for correction methods and sample size calculation.
Berglund, Lars
2012-08-01
Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
Self-organising mixture autoregressive model for non-stationary time series modelling.
Ni, He; Yin, Hujun
2008-12-01
Modelling non-stationary time series has been a difficult task for both parametric and nonparametric methods. One promising solution is to combine the flexibility of nonparametric models with the simplicity of parametric models. In this paper, the self-organising mixture autoregressive (SOMAR) network is adopted as a such mixture model. It breaks time series into underlying segments and at the same time fits local linear regressive models to the clusters of segments. In such a way, a global non-stationary time series is represented by a dynamic set of local linear regressive models. Neural gas is used for a more flexible structure of the mixture model. Furthermore, a new similarity measure has been introduced in the self-organising network to better quantify the similarity of time series segments. The network can be used naturally in modelling and forecasting non-stationary time series. Experiments on artificial, benchmark time series (e.g. Mackey-Glass) and real-world data (e.g. numbers of sunspots and Forex rates) are presented and the results show that the proposed SOMAR network is effective and superior to other similar approaches.
Ichikawa, Shintaro; Motosugi, Utaroh; Hernando, Diego; Morisaka, Hiroyuki; Enomoto, Nobuyuki; Matsuda, Masanori; Onishi, Hiroshi
2018-04-10
To compare the abilities of three intravoxel incoherent motion (IVIM) imaging approximation methods to discriminate the histological grade of hepatocellular carcinomas (HCCs). Fifty-eight patients (60 HCCs) underwent IVIM imaging with 11 b-values (0-1000 s/mm 2 ). Slow (D) and fast diffusion coefficients (D * ) and the perfusion fraction (f) were calculated for the HCCs using the mean signal intensities in regions of interest drawn by two radiologists. Three approximation methods were used. First, all three parameters were obtained simultaneously using non-linear fitting (method A). Second, D was obtained using linear fitting (b = 500 and 1000), followed by non-linear fitting for D * and f (method B). Third, D was obtained by linear fitting, f was obtained using the regression line intersection and signals at b = 0, and non-linear fitting was used for D * (method C). A receiver operating characteristic analysis was performed to reveal the abilities of these methods to distinguish poorly-differentiated from well-to-moderately-differentiated HCCs. Inter-reader agreements were assessed using intraclass correlation coefficients (ICCs). The measurements of D, D * , and f in methods B and C (Az-value, 0.658-0.881) had better discrimination abilities than did those in method A (Az-value, 0.527-0.607). The ICCs of D and f were good to excellent (0.639-0.835) with all methods. The ICCs of D * were moderate with methods B (0.580) and C (0.463) and good with method A (0.705). The IVIM parameters may vary depending on the fitting methods, and therefore, further technical refinement may be needed.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.
Kong, Shengchun; Nan, Bin
2014-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso
Kong, Shengchun; Nan, Bin
2013-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses. PMID:24516328
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
High correlations between MRI brain volume measurements based on NeuroQuant® and FreeSurfer.
Ross, David E; Ochs, Alfred L; Tate, David F; Tokac, Umit; Seabaugh, John; Abildskov, Tracy J; Bigler, Erin D
2018-05-30
NeuroQuant ® (NQ) and FreeSurfer (FS) are commonly used computer-automated programs for measuring MRI brain volume. Previously they were reported to have high intermethod reliabilities but often large intermethod effect size differences. We hypothesized that linear transformations could be used to reduce the large effect sizes. This study was an extension of our previously reported study. We performed NQ and FS brain volume measurements on 60 subjects (including normal controls, patients with traumatic brain injury, and patients with Alzheimer's disease). We used two statistical approaches in parallel to develop methods for transforming FS volumes into NQ volumes: traditional linear regression, and Bayesian linear regression. For both methods, we used regression analyses to develop linear transformations of the FS volumes to make them more similar to the NQ volumes. The FS-to-NQ transformations based on traditional linear regression resulted in effect sizes which were small to moderate. The transformations based on Bayesian linear regression resulted in all effect sizes being trivially small. To our knowledge, this is the first report describing a method for transforming FS to NQ data so as to achieve high reliability and low effect size differences. Machine learning methods like Bayesian regression may be more useful than traditional methods. Copyright © 2018 Elsevier B.V. All rights reserved.
Lafuente, Victoria; Herrera, Luis J; Pérez, María del Mar; Val, Jesús; Negueruela, Ignacio
2015-08-15
In this work, near infrared spectroscopy (NIR) and an acoustic measure (AWETA) (two non-destructive methods) were applied in Prunus persica fruit 'Calrico' (n = 260) to predict Magness-Taylor (MT) firmness. Separate and combined use of these measures was evaluated and compared using partial least squares (PLS) and least squares support vector machine (LS-SVM) regression methods. Also, a mutual-information-based variable selection method, seeking to find the most significant variables to produce optimal accuracy of the regression models, was applied to a joint set of variables (NIR wavelengths and AWETA measure). The newly proposed combined NIR-AWETA model gave good values of the determination coefficient (R(2)) for PLS and LS-SVM methods (0.77 and 0.78, respectively), improving the reliability of MT firmness prediction in comparison with separate NIR and AWETA predictions. The three variables selected by the variable selection method (AWETA measure plus NIR wavelengths 675 and 697 nm) achieved R(2) values 0.76 and 0.77, PLS and LS-SVM. These results indicated that the proposed mutual-information-based variable selection algorithm was a powerful tool for the selection of the most relevant variables. © 2014 Society of Chemical Industry.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-07-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and is dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
A National Study of the Association between Food Environments and County-Level Health Outcomes
ERIC Educational Resources Information Center
Ahern, Melissa; Brown, Cheryl; Dukas, Stephen
2011-01-01
Purpose: This national, county-level study examines the relationship between food availability and access, and health outcomes (mortality, diabetes, and obesity rates) in both metro and non-metro areas. Methods: This is a secondary, cross-sectional analysis using Food Environment Atlas and CDC data. Linear regression models estimate relationships…
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
Stochastic search, optimization and regression with energy applications
NASA Astrophysics Data System (ADS)
Hannah, Lauren A.
Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage. The one stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values---which depend on the selected portfolio---to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DPGLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions
Fernandes, Bruno J. T.; Roque, Alexandre
2018-01-01
Height and weight are measurements explored to tracking nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, it can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations which coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than that obtained with conventional linear regressions. In all the cases, the predictions are non-sensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables. PMID:29713298
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables.
Torija, Antonio J; Ruiz, Diego P
2015-02-01
The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
Astronaut mass measurement using linear acceleration method and the effect of body non-rigidity
NASA Astrophysics Data System (ADS)
Yan, Hui; Li, LuMing; Hu, ChunHua; Chen, Hao; Hao, HongWei
2011-04-01
Astronaut's body mass is an essential factor of health monitoring in space. The latest mass measurement device for the International Space Station (ISS) has employed a linear acceleration method. The principle of this method is that the device generates a constant pulling force, and the astronaut is accelerated on a parallelogram motion guide which rotates at a large radius to achieve a nearly linear trajectory. The acceleration is calculated by regression analysis of the displacement versus time trajectory and the body mass is calculated by using the formula m= F/ a. However, in actual flight, the device is instable that the deviation between runs could be 6-7 kg. This paper considers the body non-rigidity as the major cause of error and instability and analyzes the effects of body non-rigidity from different aspects. Body non-rigidity makes the acceleration of the center of mass (C.M.) oscillate and fall behind the point where force is applied. Actual acceleration curves showed that the overall effect of body non-rigidity is an oscillation at about 7 Hz and a deviation of about 25%. To enhance body rigidity, better body restraints were introduced and a prototype based on linear acceleration method was built. Measurement experiment was carried out on ground on an air table. Three human subjects weighing 60-70 kg were measured. The average variance was 0.04 kg and the average measurement error was 0.4%. This study will provide reference for future development of China's own mass measurement device.
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-02-01
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R 2 ), using R 2 as the primary metric of assay agreement. However, the use of R 2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
ERIC Educational Resources Information Center
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Goeyvaerts, Nele; Leuridan, Elke; Faes, Christel; Van Damme, Pierre; Hens, Niel
2015-09-10
Biomedical studies often generate repeated measures of multiple outcomes on a set of subjects. It may be of interest to develop a biologically intuitive model for the joint evolution of these outcomes while assessing inter-subject heterogeneity. Even though it is common for biological processes to entail non-linear relationships, examples of multivariate non-linear mixed models (MNMMs) are still fairly rare. We contribute to this area by jointly analyzing the maternal antibody decay for measles, mumps, rubella, and varicella, allowing for a different non-linear decay model for each infectious disease. We present a general modeling framework to analyze multivariate non-linear longitudinal profiles subject to censoring, by combining multivariate random effects, non-linear growth and Tobit regression. We explore the hypothesis of a common infant-specific mechanism underlying maternal immunity using a pairwise correlated random-effects approach and evaluating different correlation matrix structures. The implied marginal correlation between maternal antibody levels is estimated using simulations. The mean duration of passive immunity was less than 4 months for all diseases with substantial heterogeneity between infants. The maternal antibody levels against rubella and varicella were found to be positively correlated, while little to no correlation could be inferred for the other disease pairs. For some pairs, computational issues occurred with increasing correlation matrix complexity, which underlines the importance of further developing estimation methods for MNMMs. Copyright © 2015 John Wiley & Sons, Ltd.
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.
2013-01-01
Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A
2013-01-01
Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
Statistical downscaling of precipitation using long short-term memory recurrent neural networks
NASA Astrophysics Data System (ADS)
Misra, Saptarshi; Sarkar, Sudeshna; Mitra, Pabitra
2017-11-01
Hydrological impacts of global climate change on regional scale are generally assessed by downscaling large-scale climatic variables, simulated by General Circulation Models (GCMs), to regional, small-scale hydrometeorological variables like precipitation, temperature, etc. In this study, we propose a new statistical downscaling model based on Recurrent Neural Network with Long Short-Term Memory which captures the spatio-temporal dependencies in local rainfall. The previous studies have used several other methods such as linear regression, quantile regression, kernel regression, beta regression, and artificial neural networks. Deep neural networks and recurrent neural networks have been shown to be highly promising in modeling complex and highly non-linear relationships between input and output variables in different domains and hence we investigated their performance in the task of statistical downscaling. We have tested this model on two datasets—one on precipitation in Mahanadi basin in India and the second on precipitation in Campbell River basin in Canada. Our autoencoder coupled long short-term memory recurrent neural network model performs the best compared to other existing methods on both the datasets with respect to temporal cross-correlation, mean squared error, and capturing the extremes.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatlands sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model which is considered to be caused by violations of the underlying model assumptions. Especially the effects of turbulence and pressure disturbances by the chamber deployment are suspected to have caused unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
An Application to the Prediction of LOD Change Based on General Regression Neural Network
NASA Astrophysics Data System (ADS)
Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.
2011-07-01
Traditional prediction of the LOD (length of day) change was based on linear models, such as the least square model and the autoregressive technique, etc. Due to the complex non-linear features of the LOD variation, the performances of the linear model predictors are not fully satisfactory. This paper applies a non-linear neural network - general regression neural network (GRNN) model to forecast the LOD change, and the results are analyzed and compared with those obtained with the back propagation neural network and other models. The comparison shows that the performance of the GRNN model in the prediction of the LOD change is efficient and feasible.
Data Transformations for Inference with Linear Regression: Clarifications and Recommendations
ERIC Educational Resources Information Center
Pek, Jolynn; Wong, Octavia; Wong, C. M.
2017-01-01
Data transformations have been promoted as a popular and easy-to-implement remedy to address the assumption of normally distributed errors (in the population) in linear regression. However, the application of data transformations introduces non-ignorable complexities which should be fully appreciated before their implementation. This paper adds to…
The arcsine is asinine: the analysis of proportions in ecology.
Warton, David I; Hui, Francis K C
2011-01-01
The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
Local Linear Regression for Data with AR Errors.
Li, Runze; Li, Yan
2009-07-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.
Gap-filling methods to impute eddy covariance flux data by preserving variance.
NASA Astrophysics Data System (ADS)
Kunwor, S.; Staudhammer, C. L.; Starr, G.; Loescher, H. W.
2015-12-01
To represent carbon dynamics, in terms of exchange of CO2 between the terrestrial ecosystem and the atmosphere, eddy covariance (EC) data has been collected using eddy flux towers from various sites across globe for more than two decades. However, measurements from EC data are missing for various reasons: precipitation, routine maintenance, or lack of vertical turbulence. In order to have estimates of net ecosystem exchange of carbon dioxide (NEE) with high precision and accuracy, robust gap-filling methods to impute missing data are required. While the methods used so far have provided robust estimates of the mean value of NEE, little attention has been paid to preserving the variance structures embodied by the flux data. Preserving the variance of these data will provide unbiased and precise estimates of NEE over time, which mimic natural fluctuations. We used a non-linear regression approach with moving windows of different lengths (15, 30, and 60-days) to estimate non-linear regression parameters for one year of flux data from a long-leaf pine site at the Joseph Jones Ecological Research Center. We used as our base the Michaelis-Menten and Van't Hoff functions. We assessed the potential physiological drivers of these parameters with linear models using micrometeorological predictors. We then used a parameter prediction approach to refine the non-linear gap-filling equations based on micrometeorological conditions. This provides us an opportunity to incorporate additional variables, such as vapor pressure deficit (VPD) and volumetric water content (VWC) into the equations. Our preliminary results indicate that improvements in gap-filling can be gained with a 30-day moving window with additional micrometeorological predictors (as indicated by lower root mean square error (RMSE) of the predicted values of NEE). Our next steps are to use these parameter predictions from moving windows to gap-fill the data with and without incorporation of potential driver variables of the parameters traditionally used. Then, comparisons of the predicted values from these methods and 'traditional' gap-filling methods (using 12 fixed monthly windows) will be assessed to show the scale of preserving variance. Further, this method will be applied to impute artificially created gaps for analyzing if variance is preserved.
Pomerantsev, Alexey L; Kutsenova, Alla V; Rodionova, Oxana Ye
2017-02-01
A novel non-linear regression method for modeling non-isothermal thermogravimetric data is proposed. Experiments for several heating rates are analyzed simultaneously. The method is applicable to complex multi-stage processes when the number of stages is unknown. Prior knowledge of the type of kinetics is not required. The main idea is a consequent estimation of parameters when the overall model is successively changed from one level of modeling to another. At the first level, the Avrami-Erofeev functions are used. At the second level, the Sestak-Berggren functions are employed with the goal to broaden the overall model. The method is tested using both simulated and real-world data. A comparison of the proposed method with a recently published 'model-free' deconvolution method is presented.
Improved determination of particulate absorption from combined filter pad and PSICAM measurements.
Lefering, Ina; Röttgers, Rüdiger; Weeks, Rebecca; Connor, Derek; Utschig, Christian; Heymann, Kerstin; McKee, David
2016-10-31
Filter pad light absorption measurements are subject to two major sources of experimental uncertainty: the so-called pathlength amplification factor, β, and scattering offsets, o, for which previous null-correction approaches are limited by recent observations of non-zero absorption in the near infrared (NIR). A new filter pad absorption correction method is presented here which uses linear regression against point-source integrating cavity absorption meter (PSICAM) absorption data to simultaneously resolve both β and the scattering offset. The PSICAM has previously been shown to provide accurate absorption data, even in highly scattering waters. Comparisons of PSICAM and filter pad particulate absorption data reveal linear relationships that vary on a sample by sample basis. This regression approach provides significantly improved agreement with PSICAM data (3.2% RMS%E) than previously published filter pad absorption corrections. Results show that direct transmittance (T-method) filter pad absorption measurements perform effectively at the same level as more complex geometrical configurations based on integrating cavity measurements (IS-method and QFT-ICAM) because the linear regression correction compensates for the sensitivity to scattering errors in the T-method. This approach produces accurate filter pad particulate absorption data for wavelengths in the blue/UV and in the NIR where sensitivity issues with PSICAM measurements limit performance. The combination of the filter pad absorption and PSICAM is therefore recommended for generating full spectral, best quality particulate absorption data as it enables correction of multiple errors sources across both measurements.
Estimating top-of-atmosphere thermal infrared radiance using MERRA-2 atmospheric data
NASA Astrophysics Data System (ADS)
Kleynhans, Tania; Montanaro, Matthew; Gerace, Aaron; Kanan, Christopher
2017-05-01
Thermal infrared satellite images have been widely used in environmental studies. However, satellites have limited temporal resolution, e.g., 16 day Landsat or 1 to 2 day Terra MODIS. This paper investigates the use of the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) reanalysis data product, produced by NASA's Global Modeling and Assimilation Office (GMAO) to predict global topof-atmosphere (TOA) thermal infrared radiance. The high temporal resolution of the MERRA-2 data product presents opportunities for novel research and applications. Various methods were applied to estimate TOA radiance from MERRA-2 variables namely (1) a parameterized physics based method, (2) Linear regression models and (3) non-linear Support Vector Regression. Model prediction accuracy was evaluated using temporally and spatially coincident Moderate Resolution Imaging Spectroradiometer (MODIS) thermal infrared data as reference data. This research found that Support Vector Regression with a radial basis function kernel produced the lowest error rates. Sources of errors are discussed and defined. Further research is currently being conducted to train deep learning models to predict TOA thermal radiance
Ultrasensitive surveillance of sensors and processes
Wegerich, Stephan W.; Jarman, Kristin K.; Gross, Kenneth C.
2001-01-01
A method and apparatus for monitoring a source of data for determining an operating state of a working system. The method includes determining a sensor (or source of data) arrangement associated with monitoring the source of data for a system, activating a method for performing a sequential probability ratio test if the data source includes a single data (sensor) source, activating a second method for performing a regression sequential possibility ratio testing procedure if the arrangement includes a pair of sensors (data sources) with signals which are linearly or non-linearly related; activating a third method for performing a bounded angle ratio test procedure if the sensor arrangement includes multiple sensors and utilizing at least one of the first, second and third methods to accumulate sensor signals and determining the operating state of the system.
Ultrasensitive surveillance of sensors and processes
Wegerich, Stephan W.; Jarman, Kristin K.; Gross, Kenneth C.
1999-01-01
A method and apparatus for monitoring a source of data for determining an operating state of a working system. The method includes determining a sensor (or source of data) arrangement associated with monitoring the source of data for a system, activating a method for performing a sequential probability ratio test if the data source includes a single data (sensor) source, activating a second method for performing a regression sequential possibility ratio testing procedure if the arrangement includes a pair of sensors (data sources) with signals which are linearly or non-linearly related; activating a third method for performing a bounded angle ratio test procedure if the sensor arrangement includes multiple sensors and utilizing at least one of the first, second and third methods to accumulate sensor signals and determining the operating state of the system.
Song, Yong-Ze; Yang, Hong-Lei; Peng, Jun-Huan; Song, Yi-Rong; Sun, Qian; Li, Yuan
2015-01-01
Particulate matter with an aerodynamic diameter <2.5 μm (PM2.5) represents a severe environmental problem and is of negative impact on human health. Xi'an City, with a population of 6.5 million, is among the highest concentrations of PM2.5 in China. In 2013, in total, there were 191 days in Xi’an City on which PM2.5 concentrations were greater than 100 μg/m3. Recently, a few studies have explored the potential causes of high PM2.5 concentration using remote sensing data such as the MODIS aerosol optical thickness (AOT) product. Linear regression is a commonly used method to find statistical relationships among PM2.5 concentrations and other pollutants, including CO, NO2, SO2, and O3, which can be indicative of emission sources. The relationships of these variables, however, are usually complicated and non-linear. Therefore, a generalized additive model (GAM) is used to estimate the statistical relationships between potential variables and PM2.5 concentrations. This model contains linear functions of SO2 and CO, univariate smoothing non-linear functions of NO2, O3, AOT and temperature, and bivariate smoothing non-linear functions of location and wind variables. The model can explain 69.50% of PM2.5 concentrations, with R2 = 0.691, which improves the result of a stepwise linear regression (R2 = 0.582) by 18.73%. The two most significant variables, CO concentration and AOT, represent 20.65% and 19.54% of the deviance, respectively, while the three other gas-phase concentrations, SO2, NO2, and O3 account for 10.88% of the total deviance. These results show that in Xi'an City, the traffic and other industrial emissions are the primary source of PM2.5. Temperature, location, and wind variables also non-linearly related with PM2.5. PMID:26540446
Mendez, Javier; Monleon-Getino, Antonio; Jofre, Juan; Lucena, Francisco
2017-10-01
The present study aimed to establish the kinetics of the appearance of coliphage plaques using the double agar layer titration technique to evaluate the feasibility of using traditional coliphage plaque forming unit (PFU) enumeration as a rapid quantification method. Repeated measurements of the appearance of plaques of coliphages titrated according to ISO 10705-2 at different times were analysed using non-linear mixed-effects regression to determine the most suitable model of their appearance kinetics. Although this model is adequate, to simplify its applicability two linear models were developed to predict the numbers of coliphages reliably, using the PFU counts as determined by the ISO after only 3 hours of incubation. One linear model, when the number of plaques detected was between 4 and 26 PFU after 3 hours, had a linear fit of: (1.48 × Counts 3 h + 1.97); and the other, values >26 PFU, had a fit of (1.18 × Counts 3 h + 2.95). If the number of plaques detected was <4 PFU after 3 hours, we recommend incubation for (18 ± 3) hours. The study indicates that the traditional coliphage plating technique has a reasonable potential to provide results in a single working day without the need to invest in additional laboratory equipment.
2009-01-01
Background The International Commission on Radiological Protection (ICRP) recommended annual occupational dose limit is 20 mSv. Cancer mortality in Japanese A-bomb survivors exposed to less than 20 mSv external radiation in 1945 was analysed previously, using a latency model with non-linear dose response. Questions were raised regarding statistical inference with this model. Methods Cancers with over 100 deaths in the 0 - 20 mSv subcohort of the 1950-1990 Life Span Study are analysed with Poisson regression models incorporating latency, allowing linear and non-linear dose response. Bootstrap percentile and Bias-corrected accelerated (BCa) methods and simulation of the Likelihood Ratio Test lead to Confidence Intervals for Excess Relative Risk (ERR) and tests against the linear model. Results The linear model shows significant large, positive values of ERR for liver and urinary cancers at latencies from 37 - 43 years. Dose response below 20 mSv is strongly non-linear at the optimal latencies for the stomach (11.89 years), liver (36.9), lung (13.6), leukaemia (23.66), and pancreas (11.86) and across broad latency ranges. Confidence Intervals for ERR are comparable using Bootstrap and Likelihood Ratio Test methods and BCa 95% Confidence Intervals are strictly positive across latency ranges for all 5 cancers. Similar risk estimates for 10 mSv (lagged dose) are obtained from the 0 - 20 mSv and 5 - 500 mSv data for the stomach, liver, lung and leukaemia. Dose response for the latter 3 cancers is significantly non-linear in the 5 - 500 mSv range. Conclusion Liver and urinary cancer mortality risk is significantly raised using a latency model with linear dose response. A non-linear model is strongly superior for the stomach, liver, lung, pancreas and leukaemia. Bootstrap and Likelihood-based confidence intervals are broadly comparable and ERR is strictly positive by bootstrap methods for all 5 cancers. Except for the pancreas, similar estimates of latency and risk from 10 mSv are obtained from the 0 - 20 mSv and 5 - 500 mSv subcohorts. Large and significant cancer risks for Japanese survivors exposed to less than 20 mSv external radiation from the atomic bombs in 1945 cast doubt on the ICRP recommended annual occupational dose limit. PMID:20003238
Fast estimation of diffusion tensors under Rician noise by the EM algorithm.
Liu, Jia; Gasbarra, Dario; Railavo, Juha
2016-01-15
Diffusion tensor imaging (DTI) is widely used to characterize, in vivo, the white matter of the central nerve system (CNS). This biological tissue contains much anatomic, structural and orientational information of fibers in human brain. Spectral data from the displacement distribution of water molecules located in the brain tissue are collected by a magnetic resonance scanner and acquired in the Fourier domain. After the Fourier inversion, the noise distribution is Gaussian in both real and imaginary parts and, as a consequence, the recorded magnitude data are corrupted by Rician noise. Statistical estimation of diffusion leads a non-linear regression problem. In this paper, we present a fast computational method for maximum likelihood estimation (MLE) of diffusivities under the Rician noise model based on the expectation maximization (EM) algorithm. By using data augmentation, we are able to transform a non-linear regression problem into the generalized linear modeling framework, reducing dramatically the computational cost. The Fisher-scoring method is used for achieving fast convergence of the tensor parameter. The new method is implemented and applied using both synthetic and real data in a wide range of b-amplitudes up to 14,000s/mm(2). Higher accuracy and precision of the Rician estimates are achieved compared with other log-normal based methods. In addition, we extend the maximum likelihood (ML) framework to the maximum a posteriori (MAP) estimation in DTI under the aforementioned scheme by specifying the priors. We will describe how close numerically are the estimators of model parameters obtained through MLE and MAP estimation. Copyright © 2015 Elsevier B.V. All rights reserved.
1974-01-01
REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenha;? Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two- dimensional form adequately describes the linear model . A dis- cretization is performed by using quadrature methods. By trans
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
Møller, Anne; Reventlow, Susanne; Hansen, Åse Marie; Andersen, Lars L; Siersma, Volkert; Lund, Rikke; Avlund, Kirsten; Andersen, Johan Hviid; Mortensen, Ole Steen
2015-01-01
Objectives Our aim was to study associations between physical exposures throughout working life and physical function measured as chair-rise performance in midlife. Methods The Copenhagen Aging and Midlife Biobank (CAMB) provided data about employment and measures of physical function. Individual job histories were assigned exposures from a job exposure matrix. Exposures were standardised to ton-years (lifting 1000 kg each day in 1 year), stand-years (standing/walking for 6 h each day in 1 year) and kneel-years (kneeling for 1 h each day in 1 year). The associations between exposure-years and chair-rise performance (number of chair-rises in 30 s) were analysed in multivariate linear and non-linear regression models adjusted for covariates. Results Mean age among the 5095 participants was 59 years in both genders, and, on average, men achieved 21.58 (SD=5.60) and women 20.38 (SD=5.33) chair-rises in 30 s. Physical exposures were associated with poorer chair-rise performance in both men and women, however, only associations between lifting and standing/walking and chair-rise remained statistically significant among men in the final model. Spline regression analyses showed non-linear associations and confirmed the findings. Conclusions Higher physical exposure throughout working life is associated with slightly poorer chair-rise performance. The associations between exposure and outcome were non-linear. PMID:26537502
Wang, Zheng-Xin; Hao, Peng; Yao, Pei-Yi
2017-01-01
The non-linear relationship between provincial economic growth and carbon emissions is investigated by using panel smooth transition regression (PSTR) models. The research indicates that, on the condition of separately taking Gross Domestic Product per capita (GDPpc), energy structure (Es), and urbanisation level (Ul) as transition variables, three models all reject the null hypothesis of a linear relationship, i.e., a non-linear relationship exists. The results show that the three models all contain only one transition function but different numbers of location parameters. The model taking GDPpc as the transition variable has two location parameters, while the other two models separately considering Es and Ul as the transition variables both contain one location parameter. The three models applied in the study all favourably describe the non-linear relationship between economic growth and CO2 emissions in China. It also can be seen that the conversion rate of the influence of Ul on per capita CO2 emissions is significantly higher than those of GDPpc and Es on per capita CO2 emissions. PMID:29236083
Wang, Zheng-Xin; Hao, Peng; Yao, Pei-Yi
2017-12-13
The non-linear relationship between provincial economic growth and carbon emissions is investigated by using panel smooth transition regression (PSTR) models. The research indicates that, on the condition of separately taking Gross Domestic Product per capita (GDPpc), energy structure (Es), and urbanisation level (Ul) as transition variables, three models all reject the null hypothesis of a linear relationship, i.e., a non-linear relationship exists. The results show that the three models all contain only one transition function but different numbers of location parameters. The model taking GDPpc as the transition variable has two location parameters, while the other two models separately considering Es and Ul as the transition variables both contain one location parameter. The three models applied in the study all favourably describe the non-linear relationship between economic growth and CO₂ emissions in China. It also can be seen that the conversion rate of the influence of Ul on per capita CO₂ emissions is significantly higher than those of GDPpc and Es on per capita CO₂ emissions.
Yu, Wenbao; Park, Taesung
2014-01-01
It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus, apply the penalization regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Beside gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.
A non-linear regression method for CT brain perfusion analysis
NASA Astrophysics Data System (ADS)
Bennink, E.; Oosterbroek, J.; Viergever, M. A.; Velthuis, B. K.; de Jong, H. W. A. M.
2015-03-01
CT perfusion (CTP) imaging allows for rapid diagnosis of ischemic stroke. Generation of perfusion maps from CTP data usually involves deconvolution algorithms providing estimates for the impulse response function in the tissue. We propose the use of a fast non-linear regression (NLR) method that we postulate has similar performance to the current academic state-of-art method (bSVD), but that has some important advantages, including the estimation of vascular permeability, improved robustness to tracer-delay, and very few tuning parameters, that are all important in stroke assessment. The aim of this study is to evaluate the fast NLR method against bSVD and a commercial clinical state-of-art method. The three methods were tested against a published digital perfusion phantom earlier used to illustrate the superiority of bSVD. In addition, the NLR and clinical methods were also tested against bSVD on 20 clinical scans. Pearson correlation coefficients were calculated for each of the tested methods. All three methods showed high correlation coefficients (>0.9) with the ground truth in the phantom. With respect to the clinical scans, the NLR perfusion maps showed higher correlation with bSVD than the perfusion maps from the clinical method. Furthermore, the perfusion maps showed that the fast NLR estimates are robust to tracer-delay. In conclusion, the proposed fast NLR method provides a simple and flexible way of estimating perfusion parameters from CT perfusion scans, with high correlation coefficients. This suggests that it could be a better alternative to the current clinical and academic state-of-art methods.
Wen, Cheng; Dallimer, Martin; Carver, Steve; Ziv, Guy
2018-05-06
Despite the great potential of mitigating carbon emission, development of wind farms is often opposed by local communities due to the visual impact on landscape. A growing number of studies have applied nonmarket valuation methods like Choice Experiments (CE) to value the visual impact by eliciting respondents' willingness to pay (WTP) or willingness to accept (WTA) for hypothetical wind farms through survey questions. Several meta-analyses have been found in the literature to synthesize results from different valuation studies, but they have various limitations related to the use of the prevailing multivariate meta-regression analysis. In this paper, we propose a new meta-analysis method to establish general functions for the relationships between the estimated WTP or WTA and three wind farm attributes, namely the distance to residential/coastal areas, the number of turbines and turbine height. This method involves establishing WTA or WTP functions for individual studies, fitting the average derivative functions and deriving the general integral functions of WTP or WTA against wind farm attributes. Results indicate that respondents in different studies consistently showed increasing WTP for moving wind farms to greater distances, which can be fitted by non-linear (natural logarithm) functions. However, divergent preferences for the number of turbines and turbine height were found in different studies. We argue that the new analysis method proposed in this paper is an alternative to the mainstream multivariate meta-regression analysis for synthesizing CE studies and the general integral functions of WTP or WTA against wind farm attributes are useful for future spatial modelling and benefit transfer studies. We also suggest that future multivariate meta-analyses should include non-linear components in the regression functions. Copyright © 2018. Published by Elsevier B.V.
Kilian, Reinhold; Matschinger, Herbert; Löeffler, Walter; Roick, Christiane; Angermeyer, Matthias C
2002-03-01
Transformation of the dependent cost variable is often used to solve the problems of heteroscedasticity and skewness in linear ordinary least square regression of health service cost data. However, transformation may cause difficulties in the interpretation of regression coefficients and the retransformation of predicted values. The study compares the advantages and disadvantages of different methods to estimate regression based cost functions using data on the annual costs of schizophrenia treatment. Annual costs of psychiatric service use and clinical and socio-demographic characteristics of the patients were assessed for a sample of 254 patients with a diagnosis of schizophrenia (ICD-10 F 20.0) living in Leipzig. The clinical characteristics of the participants were assessed by means of the BPRS 4.0, the GAF, and the CAN for service needs. Quality of life was measured by WHOQOL-BREF. A linear OLS regression model with non-parametric standard errors, a log-transformed OLS model and a generalized linear model with a log-link and a gamma distribution were used to estimate service costs. For the estimation of robust non-parametric standard errors, the variance estimator by White and a bootstrap estimator based on 2000 replications were employed. Models were evaluated by the comparison of the R2 and the root mean squared error (RMSE). RMSE of the log-transformed OLS model was computed with three different methods of bias-correction. The 95% confidence intervals for the differences between the RMSE were computed by means of bootstrapping. A split-sample-cross-validation procedure was used to forecast the costs for the one half of the sample on the basis of a regression equation computed for the other half of the sample. All three methods showed significant positive influences of psychiatric symptoms and met psychiatric service needs on service costs. Only the log- transformed OLS model showed a significant negative impact of age, and only the GLM shows a significant negative influences of employment status and partnership on costs. All three models provided a R2 of about.31. The Residuals of the linear OLS model revealed significant deviances from normality and homoscedasticity. The residuals of the log-transformed model are normally distributed but still heteroscedastic. The linear OLS model provided the lowest prediction error and the best forecast of the dependent cost variable. The log-transformed model provided the lowest RMSE if the heteroscedastic bias correction was used. The RMSE of the GLM with a log link and a gamma distribution was higher than those of the linear OLS model and the log-transformed OLS model. The difference between the RMSE of the linear OLS model and that of the log-transformed OLS model without bias correction was significant at the 95% level. As result of the cross-validation procedure, the linear OLS model provided the lowest RMSE followed by the log-transformed OLS model with a heteroscedastic bias correction. The GLM showed the weakest model fit again. None of the differences between the RMSE resulting form the cross- validation procedure were found to be significant. The comparison of the fit indices of the different regression models revealed that the linear OLS model provided a better fit than the log-transformed model and the GLM, but the differences between the models RMSE were not significant. Due to the small number of cases in the study the lack of significance does not sufficiently proof that the differences between the RSME for the different models are zero and the superiority of the linear OLS model can not be generalized. The lack of significant differences among the alternative estimators may reflect a lack of sample size adequate to detect important differences among the estimators employed. Further studies with larger case number are necessary to confirm the results. Specification of an adequate regression models requires a careful examination of the characteristics of the data. Estimation of standard errors and confidence intervals by nonparametric methods which are robust against deviations from the normal distribution and the homoscedasticity of residuals are suitable alternatives to the transformation of the skew distributed dependent variable. Further studies with more adequate case numbers are needed to confirm the results.
Multigrid approaches to non-linear diffusion problems on unstructured meshes
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Bushnell, Dennis M. (Technical Monitor)
2001-01-01
The efficiency of three multigrid methods for solving highly non-linear diffusion problems on two-dimensional unstructured meshes is examined. The three multigrid methods differ mainly in the manner in which the nonlinearities of the governing equations are handled. These comprise a non-linear full approximation storage (FAS) multigrid method which is used to solve the non-linear equations directly, a linear multigrid method which is used to solve the linear system arising from a Newton linearization of the non-linear system, and a hybrid scheme which is based on a non-linear FAS multigrid scheme, but employs a linear solver on each level as a smoother. Results indicate that all methods are equally effective at converging the non-linear residual in a given number of grid sweeps, but that the linear solver is more efficient in cpu time due to the lower cost of linear versus non-linear grid sweeps.
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients
NASA Astrophysics Data System (ADS)
Gorgees, HazimMansoor; Mahdi, FatimahAssim
2018-05-01
This article concerns with comparing the performance of different types of ordinary ridge regression estimators that have been already proposed to estimate the regression parameters when the near exact linear relationships among the explanatory variables is presented. For this situations we employ the data obtained from tagi gas filling company during the period (2008-2010). The main result we reached is that the method based on the condition number performs better than other methods since it has smaller mean square error (MSE) than the other stated methods.
NASA Astrophysics Data System (ADS)
Haris, A.; Nafian, M.; Riyanto, A.
2017-07-01
Danish North Sea Fields consist of several formations (Ekofisk, Tor, and Cromer Knoll) that was started from the age of Paleocene to Miocene. In this study, the integration of seismic and well log data set is carried out to determine the chalk sand distribution in the Danish North Sea field. The integration of seismic and well log data set is performed by using the seismic inversion analysis and seismic multi-attribute. The seismic inversion algorithm, which is used to derive acoustic impedance (AI), is model-based technique. The derived AI is then used as external attributes for the input of multi-attribute analysis. Moreover, the multi-attribute analysis is used to generate the linear and non-linear transformation of among well log properties. In the case of the linear model, selected transformation is conducted by weighting step-wise linear regression (SWR), while for the non-linear model is performed by using probabilistic neural networks (PNN). The estimated porosity, which is resulted by PNN shows better suited to the well log data compared with the results of SWR. This result can be understood since PNN perform non-linear regression so that the relationship between the attribute data and predicted log data can be optimized. The distribution of chalk sand has been successfully identified and characterized by porosity value ranging from 23% up to 30%.
NASA Astrophysics Data System (ADS)
Kang, Pilsang; Koo, Changhoi; Roh, Hokyu
2017-11-01
Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
ERIC Educational Resources Information Center
Thompson, Russel L.
Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources of homoscedasticity and types of homoscedasticity are discussed, and methods for correction are…
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial bias function (RBF) kernels the non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The e-insensitivity based loss function was used for SVR modeling. Selection of the design parameters which includes capacity (C), insensitivity region (e) and the RBF kernel parameter (sigma) was made based on a grid search approach and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
Spectral-Spatial Shared Linear Regression for Hyperspectral Image Classification.
Haoliang Yuan; Yuan Yan Tang
2017-04-01
Classification of the pixels in hyperspectral image (HSI) is an important task and has been popularly applied in many practical applications. Its major challenge is the high-dimensional small-sized problem. To deal with this problem, lots of subspace learning (SL) methods are developed to reduce the dimension of the pixels while preserving the important discriminant information. Motivated by ridge linear regression (RLR) framework for SL, we propose a spectral-spatial shared linear regression method (SSSLR) for extracting the feature representation. Comparing with RLR, our proposed SSSLR has the following two advantages. First, we utilize a convex set to explore the spatial structure for computing the linear projection matrix. Second, we utilize a shared structure learning model, which is formed by original data space and a hidden feature space, to learn a more discriminant linear projection matrix for classification. To optimize our proposed method, an efficient iterative algorithm is proposed. Experimental results on two popular HSI data sets, i.e., Indian Pines and Salinas demonstrate that our proposed methods outperform many SL methods.
NASA Astrophysics Data System (ADS)
Gusriani, N.; Firdaniza
2018-03-01
The existence of outliers on multiple linear regression analysis causes the Gaussian assumption to be unfulfilled. If the Least Square method is forcedly used on these data, it will produce a model that cannot represent most data. For that, we need a robust regression method against outliers. This paper will compare the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contains outliers. Based on the robust determinant coefficient value, MCD method produces a better model compared to TELBS method.
Post-processing through linear regression
NASA Astrophysics Data System (ADS)
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.
Gijsberts, Arjan; Metta, Giorgio
2013-05-01
Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
Shen, Minxue; Tan, Hongzhuan; Zhou, Shujin; Retnakaran, Ravi; Smith, Graeme N.; Davidge, Sandra T.; Trasler, Jacquetta; Walker, Mark C.; Wen, Shi Wu
2016-01-01
Background It has been reported that higher folate intake from food and supplementation is associated with decreased blood pressure (BP). The association between serum folate concentration and BP has been examined in few studies. We aim to examine the association between serum folate and BP levels in a cohort of young Chinese women. Methods We used the baseline data from a pre-conception cohort of women of childbearing age in Liuyang, China, for this study. Demographic data were collected by structured interview. Serum folate concentration was measured by immunoassay, and homocysteine, blood glucose, triglyceride and total cholesterol were measured through standardized clinical procedures. Multiple linear regression and principal component regression model were applied in the analysis. Results A total of 1,532 healthy normotensive non-pregnant women were included in the final analysis. The mean concentration of serum folate was 7.5 ± 5.4 nmol/L and 55% of the women presented with folate deficiency (< 6.8 nmol/L). Multiple linear regression and principal component regression showed that serum folate levels were inversely associated with systolic and diastolic BP, after adjusting for demographic, anthropometric, and biochemical factors. Conclusions Serum folate is inversely associated with BP in non-pregnant women of childbearing age with high prevalence of folate deficiency. PMID:27182603
Machine learning approaches to the social determinants of health in the health and retirement study.
Seligman, Benjamin; Tuljapurkar, Shripad; Rehkopf, David
2018-04-01
Social and economic factors are important predictors of health and of recognized importance for health systems. However, machine learning, used elsewhere in the biomedical literature, has not been extensively applied to study relationships between society and health. We investigate how machine learning may add to our understanding of social determinants of health using data from the Health and Retirement Study. A linear regression of age and gender, and a parsimonious theory-based regression additionally incorporating income, wealth, and education, were used to predict systolic blood pressure, body mass index, waist circumference, and telomere length. Prediction, fit, and interpretability were compared across four machine learning methods: linear regression, penalized regressions, random forests, and neural networks. All models had poor out-of-sample prediction. Most machine learning models performed similarly to the simpler models. However, neural networks greatly outperformed the three other methods. Neural networks also had good fit to the data ( R 2 between 0.4-0.6, versus <0.3 for all others). Across machine learning models, nine variables were frequently selected or highly weighted as predictors: dental visits, current smoking, self-rated health, serial-seven subtractions, probability of receiving an inheritance, probability of leaving an inheritance of at least $10,000, number of children ever born, African-American race, and gender. Some of the machine learning methods do not improve prediction or fit beyond simpler models, however, neural networks performed well. The predictors identified across models suggest underlying social factors that are important predictors of biological indicators of chronic disease, and that the non-linear and interactive relationships between variables fundamental to the neural network approach may be important to consider.
Alzheimer's Disease Detection by Pseudo Zernike Moment and Linear Regression Classification.
Wang, Shui-Hua; Du, Sidan; Zhang, Yin; Phillips, Preetha; Wu, Le-Nan; Chen, Xian-Qing; Zhang, Yu-Dong
2017-01-01
This study presents an improved method based on "Gorji et al. Neuroscience. 2015" by introducing a relatively new classifier-linear regression classification. Our method selects one axial slice from 3D brain image, and employed pseudo Zernike moment with maximum order of 15 to extract 256 features from each image. Finally, linear regression classification was harnessed as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Locally linear regression for pose-invariant face recognition.
Chai, Xiujuan; Shan, Shiguang; Chen, Xilin; Gao, Wen
2007-07-01
The variation of facial appearance due to the viewpoint (/pose) degrades face recognition systems considerably, which is one of the bottlenecks in face recognition. One of the possible solutions is generating virtual frontal view from any given nonfrontal view to obtain a virtual gallery/probe face. Following this idea, this paper proposes a simple, but efficient, novel locally linear regression (LLR) method, which generates the virtual frontal view from a given nonfrontal face image. We first justify the basic assumption of the paper that there exists an approximate linear mapping between a nonfrontal face image and its frontal counterpart. Then, by formulating the estimation of the linear mapping as a prediction problem, we present the regression-based solution, i.e., globally linear regression. To improve the prediction accuracy in the case of coarse alignment, LLR is further proposed. In LLR, we first perform dense sampling in the nonfrontal face image to obtain many overlapped local patches. Then, the linear regression technique is applied to each small patch for the prediction of its virtual frontal patch. Through the combination of all these patches, the virtual frontal view is generated. The experimental results on the CMU PIE database show distinct advantage of the proposed method over Eigen light-field method.
NASA Astrophysics Data System (ADS)
Hegazy, Maha A.; Lotfy, Hayam M.; Rezk, Mamdouh R.; Omran, Yasmin Rostom
2015-04-01
Smart and novel spectrophotometric and chemometric methods have been developed and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and dexamethasone sodium phosphate (DSP) in presence of interfering substances without prior separation. The first method depends upon derivative subtraction coupled with constant multiplication. The second one is ratio difference method at optimum wavelengths which were selected after applying derivative transformation method via multiplying by a decoding spectrum in order to cancel the contribution of non labeled interfering substances. The third method relies on partial least squares with regression model updating. They are so simple that they do not require any preliminary separation steps. Accuracy, precision and linearity ranges of these methods were determined. Moreover, specificity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied for analysis of both drugs in their pharmaceutical formulation. The obtained results have been statistically compared to that of an official spectrophotometric method to give a conclusion that there is no significant difference between the proposed methods and the official ones with respect to accuracy and precision.
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
ERIC Educational Resources Information Center
Rule, David L.
Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…
Balabin, Roman M; Smirnov, Sergey V
2011-04-29
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm(-1)) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic techniques application, such as Raman, ultraviolet-visible (UV-vis), or nuclear magnetic resonance (NMR) spectroscopies, can be greatly improved by an appropriate feature selection choice. Copyright © 2011 Elsevier B.V. All rights reserved.
Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty
2017-12-01
Support vector machine (SVM) is a popular classification method known to have strong generalization capabilities. SVM can solve the problem of classification and linear regression or nonlinear kernel which can be a learning algorithm for the ability of classification and regression. However, SVM also has a weakness that is difficult to determine the optimal parameter value. SVM calculates the best linear separator on the input feature space according to the training data. To classify data which are non-linearly separable, SVM uses kernel tricks to transform the data into a linearly separable data on a higher dimension feature space. The kernel trick using various kinds of kernel functions, such as : linear kernel, polynomial, radial base function (RBF) and sigmoid. Each function has parameters which affect the accuracy of SVM classification. To solve the problem genetic algorithms are proposed to be applied as the optimal parameter value search algorithm thus increasing the best classification accuracy on SVM. Data taken from UCI repository of machine learning database: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms has been shown to be effective in systematically finding optimal kernel parameters for SVM, instead of randomly selected kernel parameters. The best accuracy for data has been upgraded from kernel Linear: 85.12%, polynomial: 81.76%, RBF: 77.22% Sigmoid: 78.70%. However, for bigger data sizes, this method is not practical because it takes a lot of time.
Application of artificial neural network to fMRI regression analysis.
Misaki, Masaya; Miyauchi, Satoru
2006-01-15
We used an artificial neural network (ANN) to detect correlations between event sequences and fMRI (functional magnetic resonance imaging) signals. The layered feed-forward neural network, given a series of events as inputs and the fMRI signal as a supervised signal, performed a non-linear regression analysis. This type of ANN is capable of approximating any continuous function, and thus this analysis method can detect any fMRI signals that correlated with corresponding events. Because of the flexible nature of ANNs, fitting to autocorrelation noise is a problem in fMRI analyses. We avoided this problem by using cross-validation and an early stopping procedure. The results showed that the ANN could detect various responses with different time courses. The simulation analysis also indicated an additional advantage of ANN over non-parametric methods in detecting parametrically modulated responses, i.e., it can detect various types of parametric modulations without a priori assumptions. The ANN regression analysis is therefore beneficial for exploratory fMRI analyses in detecting continuous changes in responses modulated by changes in input values.
Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach
NASA Astrophysics Data System (ADS)
Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew
2017-05-01
This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan
2012-12-01
A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.
Neuberger, Ulf; Kickingereder, Philipp; Helluy, Xavier; Fischer, Manuel; Bendszus, Martin; Heiland, Sabine
2017-12-01
Non-invasive detection of 2-hydroxyglutarate (2HG) by magnetic resonance spectroscopy is attractive since it is related to tumor metabolism. Here, we compare the detection accuracy of 2HG in a controlled phantom setting via widely used localized spectroscopy sequences quantified by linear combination of metabolite signals vs. a more complex approach applying a J-difference editing technique at 9.4T. Different phantoms, comprised out of a concentration series of 2HG and overlapping brain metabolites, were measured with an optimized point-resolved-spectroscopy sequence (PRESS) and an in-house developed J-difference editing sequence. The acquired spectra were post-processed with LCModel and a simulated metabolite set (PRESS) or with a quantification formula for J-difference editing. Linear regression analysis demonstrated a high correlation of real 2HG values with those measured with the PRESS method (adjusted R-squared: 0.700, p<0.001) as well as with those measured with the J-difference editing method (adjusted R-squared: 0.908, p<0.001). The regression model with the J-difference editing method however had a significantly higher explanatory value over the regression model with the PRESS method (p<0.0001). Moreover, with J-difference editing 2HG was discernible down to 1mM, whereas with the PRESS method 2HG values were not discernable below 2mM and with higher systematic errors, particularly in phantoms with high concentrations of N-acetyl-asparate (NAA) and glutamate (Glu). In summary, quantification of 2HG with linear combination of metabolite signals shows high systematic errors particularly at low 2HG concentration and high concentration of confounding metabolites such as NAA and Glu. In contrast, J-difference editing offers a more accurate quantification even at low 2HG concentrations, which outweighs the downsides of longer measurement time and more complex postprocessing. Copyright © 2017. Published by Elsevier GmbH.
Hsu, David
2015-09-27
Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression,more » also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.« less
2013-01-01
Background This study aims to improve accuracy of Bioelectrical Impedance Analysis (BIA) prediction equations for estimating fat free mass (FFM) of the elderly by using non-linear Back Propagation Artificial Neural Network (BP-ANN) model and to compare the predictive accuracy with the linear regression model by using energy dual X-ray absorptiometry (DXA) as reference method. Methods A total of 88 Taiwanese elderly adults were recruited in this study as subjects. Linear regression equations and BP-ANN prediction equation were developed using impedances and other anthropometrics for predicting the reference FFM measured by DXA (FFMDXA) in 36 male and 26 female Taiwanese elderly adults. The FFM estimated by BIA prediction equations using traditional linear regression model (FFMLR) and BP-ANN model (FFMANN) were compared to the FFMDXA. The measuring results of an additional 26 elderly adults were used to validate than accuracy of the predictive models. Results The results showed the significant predictors were impedance, gender, age, height and weight in developed FFMLR linear model (LR) for predicting FFM (coefficient of determination, r2 = 0.940; standard error of estimate (SEE) = 2.729 kg; root mean square error (RMSE) = 2.571kg, P < 0.001). The above predictors were set as the variables of the input layer by using five neurons in the BP-ANN model (r2 = 0.987 with a SD = 1.192 kg and relatively lower RMSE = 1.183 kg), which had greater (improved) accuracy for estimating FFM when compared with linear model. The results showed a better agreement existed between FFMANN and FFMDXA than that between FFMLR and FFMDXA. Conclusion When compared the performance of developed prediction equations for estimating reference FFMDXA, the linear model has lower r2 with a larger SD in predictive results than that of BP-ANN model, which indicated ANN model is more suitable for estimating FFM. PMID:23388042
Valeri, Linda; Lin, Xihong; VanderWeele, Tyler J.
2014-01-01
Mediation analysis is a popular approach to examine the extent to which the effect of an exposure on an outcome is through an intermediate variable (mediator) and the extent to which the effect is direct. When the mediator is mis-measured the validity of mediation analysis can be severely undermined. In this paper we first study the bias of classical, non-differential measurement error on a continuous mediator in the estimation of direct and indirect causal effects in generalized linear models when the outcome is either continuous or discrete and exposure-mediator interaction may be present. Our theoretical results as well as a numerical study demonstrate that in the presence of non-linearities the bias of naive estimators for direct and indirect effects that ignore measurement error can take unintuitive directions. We then develop methods to correct for measurement error. Three correction approaches using method of moments, regression calibration and SIMEX are compared. We apply the proposed method to the Massachusetts General Hospital lung cancer study to evaluate the effect of genetic variants mediated through smoking on lung cancer risk. PMID:25220625
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
Calculating stage duration statistics in multistage diseases.
Komarova, Natalia L; Thalhauser, Craig J
2011-01-01
Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for statistical hypothesis testing needed to determine if treatment is having a significant effect on the progression, or if a new therapy is showing a delay of progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric and computationally cheap methods, which makes them invaluable tools for studying the epidemiology of diseases, with a goal of identifying different patterns of progression by using bioinformatics methodologies. Here we show that the regression method performs well for calculating the mean stage durations under a wide variety of assumptions, however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. On the other hand, the counting method yields reliable estimations for both means and variances of stage durations. Applications to Alzheimer disease progression are discussed.
Du, Hongying; Wang, Jie; Yao, Xiaojun; Hu, Zhide
2009-01-01
The heuristic method (HM) and support vector machine (SVM) were used to construct quantitative structure-retention relationship models by a series of compounds to predict the gradient retention times of reversed-phase high-performance liquid chromatography (HPLC) in three different columns. The aims of this investigation were to predict the retention times of multifarious compounds, to find the main properties of the three columns, and to indicate the theory of separation procedures. In our method, we correlated the retention times of many diverse structural analytes in three columns (Symmetry C18, Chromolith, and SG-MIX) with their representative molecular descriptors, calculated from the molecular structures alone. HM was used to select the most important molecular descriptors and build linear regression models. Furthermore, non-linear regression models were built using the SVM method; the performance of the SVM models were better than that of the HM models, and the prediction results were in good agreement with the experimental values. This paper could give some insights into the factors that were likely to govern the gradient retention process of the three investigated HPLC columns, which could theoretically supervise the practical experiment.
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method
ERIC Educational Resources Information Center
Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Su¨lzle, Detlev
2018-01-01
The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…
Ho, Yuh-Shan
2006-01-01
A comparison was made of the linear least-squares method and a trial-and-error non-linear method of the widely used pseudo-second-order kinetic model for the sorption of cadmium onto ground-up tree fern. Four pseudo-second-order kinetic linear equations are discussed. Kinetic parameters obtained from the four kinetic linear equations using the linear method differed but they were the same when using the non-linear method. A type 1 pseudo-second-order linear kinetic model has the highest coefficient of determination. Results show that the non-linear method may be a better way to obtain the desired parameters.
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
Comparing The Effectiveness of a90/95 Calculations (Preprint)
2006-09-01
Nachtsheim, John Neter, William Li, Applied Linear Statistical Models , 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear . For example is a linear model ; is not. 2. Uniform variance (homoscedasticity
Huang, Li-Shan; Myers, Gary J; Davidson, Philip W; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W; Cernichiari, Elsa; Shamlaye, Conrad F; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W
2007-11-01
Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age 9 years. The analyses for the most recent 9-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated non-linearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of 21 endpoints available at age 9 years, we chose the Weschler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition we reanalyzed five other 9-year endpoints that in the linear analysis had a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels.
BIODEGRADATION PROBABILITY PROGRAM (BIODEG)
The Biodegradation Probability Program (BIODEG) calculates the probability that a chemical under aerobic conditions with mixed cultures of microorganisms will biodegrade rapidly or slowly. It uses fragment constants developed using multiple linear and non-linear regressions and d...
Multivariate meta-analysis for non-linear and other multi-parameter associations
Gasparrini, A; Armstrong, B; Kenward, M G
2012-01-01
In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043
Bennett, Bradley C; Husby, Chad E
2008-03-28
Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) to from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
Hsieh, Chung-Bao; Chen, Chung-Jueng; Chen, Teng-Wei; Yu, Jyh-Cherng; Shen, Kuo-Liang; Chang, Tzu-Ming; Liu, Yao-Chi
2004-01-01
AIM: To investigate whether the non-invasive real-time Indocynine green (ICG) clearance is a sensitive index of liver viability in patients before, during, and after liver transplantation. METHODS: Thirteen patients were studied, two before, three during, and eight following liver transplantation, with two patients suffering acute rejection. The conventional invasive ICG clearance test and ICG pulse spectrophotometry non-invasive real-time ICG clearance test were performed simultaneously. Using linear regression analysis we tested the correlation between these two methods. The transplantation condition of these patients and serum total bilirubin (T. Bil), alanine aminotransferase (ALT), and platelet count were also evaluated. RESULTS: The correlation between these two methods was excellent (r2 = 0.977). CONCLUSION: ICG pulse spectrophotometry clearance is a quick, non-invasive, and reliable liver function test in transplantation patients. PMID:15285026
Lopes, Marta B; Calado, Cecília R C; Figueiredo, Mário A T; Bioucas-Dias, José M
2017-06-01
The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid. Despite being successful in the presence of small nonlinearities, linear methods may fail in the presence of strong nonlinearities. This paper studies the potential usefulness of nonlinear regression methods for predicting, from in situ near-infrared (NIR) and mid-infrared (MIR) spectra acquired in high-throughput mode, biomass and plasmid concentrations in Escherichia coli DH5-α cultures producing the plasmid model pVAX-LacZ. The linear methods PLS and ridge regression (RR) are compared with their kernel (nonlinear) versions, kPLS and kRR, as well as with the (also nonlinear) relevance vector machine (RVM) and Gaussian process regression (GPR). For the systems studied, RR provided better predictive performances compared to the remaining methods. Moreover, the results point to further investigation based on larger data sets whenever differences in predictive accuracy between a linear method and its kernelized version could not be found. The use of nonlinear methods, however, shall be judged regarding the additional computational cost required to tune their additional parameters, especially when the less computationally demanding linear methods herein studied are able to successfully monitor the variables under study.
Effect of Malmquist bias on correlation studies with IRAS data base
NASA Technical Reports Server (NTRS)
Verter, Frances
1993-01-01
The relationships between galaxy properties in the sample of Trinchieri et al. (1989) are reexamined with corrections for Malmquist bias. The linear correlations are tested and linear regressions are fit for log-log plots of L(FIR), L(H-alpha), and L(B) as well as ratios of these quantities. The linear correlations for Malmquist bias are corrected using the method of Verter (1988), in which each galaxy observation is weighted by the inverse of its sampling volume. The linear regressions are corrected for Malmquist bias by a new method invented here in which each galaxy observation is weighted by its sampling volume. The results of correlation and regressions among the sample are significantly changed in the anticipated sense that the corrected correlation confidences are lower and the corrected slopes of the linear regressions are lower. The elimination of Malmquist bias eliminates the nonlinear rise in luminosity that has caused some authors to hypothesize additional components of FIR emission.
An Analysis of COLA (Cost of Living Adjustment) Allocation within the United States Coast Guard.
1983-09-01
books Applied Linear Regression [Ref. 39], and Statistical Methods in Research and Production [Ref. 40], or any other book on regression. In the event...Indexes, Master’s Thesis, Air Force Institute of Technology, Wright-Patterson AFB, 1976. 39. Weisberg, Stanford, Applied Linear Regression , Wiley, 1980. 40
Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
NASA Astrophysics Data System (ADS)
Bhojawala, V. M.; Vakharia, D. P.
2017-12-01
This investigation provides an accurate prediction of static pull-in voltage for clamped-clamped micro/nano beams based on distributed model. The Euler-Bernoulli beam theory is used adapting geometric non-linearity of beam, internal (residual) stress, van der Waals force, distributed electrostatic force and fringing field effects for deriving governing differential equation. The Galerkin discretisation method is used to make reduced-order model of the governing differential equation. A regime plot is presented in the current work for determining the number of modes required in reduced-order model to obtain completely converged pull-in voltage for micro/nano beams. A closed-form relation is developed based on the relationship obtained from curve fitting of pull-in instability plots and subsequent non-linear regression for the proposed relation. The output of regression analysis provides Chi-square (χ 2) tolerance value equals to 1 × 10-9, adjusted R-square value equals to 0.999 29 and P-value equals to zero, these statistical parameters indicate the convergence of non-linear fit, accuracy of fitted data and significance of the proposed model respectively. The closed-form equation is validated using available data of experimental and numerical results. The relative maximum error of 4.08% in comparison to several available experimental and numerical data proves the reliability of the proposed closed-form equation.
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
Mirmohseni, A; Abdollahi, H; Rostamizadeh, K
2007-02-28
Net analyte signal (NAS)-based method called HLA/GO was applied for the selectively determination of binary mixture of ethanol and water by quartz crystal nanobalance (QCN) sensor. A full factorial design was applied for the formation of calibration and prediction sets in the concentration ranges 5.5-22.2 microg mL(-1) for ethanol and 7.01-28.07 microg mL(-1) for water. An optimal time range was selected by procedure which was based on the calculation of the net analyte signal regression plot in any considered time window for each test sample. A moving window strategy was used for searching the region with maximum linearity of NAS regression plot (minimum error indicator) and minimum of PRESS value. On the base of obtained results, the differences on the adsorption profiles in the time range between 1 and 600 s were used to determine mixtures of both compounds by HLA/GO method. The calculation of the net analytical signal using HLA/GO method allows determination of several figures of merit like selectivity, sensitivity, analytical sensitivity and limit of detection, for each component. To check the ability of the proposed method in the selection of linear regions of adsorption profile, a test for detecting non-linear regions of adsorption profile data in the presence of methanol was also described. The results showed that the method was successfully applied for the determination of ethanol and water.
Koper, Olga Martyna; Kamińska, Joanna; Milewska, Anna; Sawicki, Karol; Mariak, Zenon; Kemona, Halina; Matowicka-Karna, Joanna
2018-05-18
The influence of isoform A of reticulon-4 (Nogo-A), also known as neurite outgrowth inhibitor, on primary brain tumor development was reported. Therefore the aim was the evaluation of Nogo-A concentrations in cerebrospinal fluid (CSF) and serum of brain tumor patients compared with non-tumoral individuals. All serum results, except for two cases, obtained both in brain tumors and non-tumoral individuals, were below the lower limit of ELISA detection. Cerebrospinal fluid Nogo-A concentrations were significantly lower in primary brain tumor patients compared to non-tumoral individuals. The univariate linear regression analysis found that if white blood cell count increases by 1 × 10 3 /μL, the mean cerebrospinal fluid Nogo-A concentration value decreases 1.12 times. In the model of multiple linear regression analysis predictor variables influencing cerebrospinal fluid Nogo-A concentrations included: diagnosis, sex, and sodium level. The mean cerebrospinal fluid Nogo-A concentration value was 1.9 times higher for women in comparison to men. In the astrocytic brain tumor group higher sodium level occurs with lower cerebrospinal fluid Nogo-A concentrations. We found the opposite situation in non-tumoral individuals. Univariate linear regression analysis revealed, that cerebrospinal fluid Nogo-A concentrations change in relation to white blood cell count. In the created model of multiple linear regression analysis we found, that within predictor variables influencing CSF Nogo-A concentrations were diagnosis, sex, and sodium level. Results may be relevant to the search for cerebrospinal fluid biomarkers and potential therapeutic targets in primary brain tumor patients. Nogo-A concentrations were tested by means of enzyme-linked immunosorbent assay (ELISA).
NASA Astrophysics Data System (ADS)
Manzoori, Jamshid L.; Amjadi, Mohammad
2003-03-01
The characteristics of host-guest complexation between β-cyclodextrin (β-CD) and two forms of ibuprofen (protonated and deprotonated) were investigated by fluorescence spectrometry. 1:1 stoichiometries for both complexes were established and their association constants at different temperatures were calculated by applying a non-linear regression method to the change in the fluorescence of ibuprofen that brought about by the presence of β-CD. The thermodynamic parameters (Δ H, Δ S and Δ G) associated with the inclusion process were also determined. Based on the obtained results, a sensitive spectrofluorimetric method for the determination of ibuprofen was developed with a linear range of 0.1-2 μg ml -1 and a detection limit of 0.03 μg ml -1. The method was applied satisfactorily to the determination of ibuprofen in pharmaceutical preparations.
Determining association constants from titration experiments in supramolecular chemistry.
Thordarson, Pall
2011-03-01
The most common approach for quantifying interactions in supramolecular chemistry is a titration of the guest to solution of the host, noting the changes in some physical property through NMR, UV-Vis, fluorescence or other techniques. Despite the apparent simplicity of this approach, there are several issues that need to be carefully addressed to ensure that the final results are reliable. This includes the use of non-linear rather than linear regression methods, careful choice of stoichiometric binding model, the choice of method (e.g., NMR vs. UV-Vis) and concentration of host, the application of advanced data analysis methods such as global analysis and finally the estimation of uncertainties and confidence intervals for the results obtained. This tutorial review will give a systematic overview of all these issues-highlighting some of the key messages herein with simulated data analysis examples.
Prediction of optimum sorption isotherm: comparison of linear and non-linear method.
Kumar, K Vasanth; Sivanesan, S
2005-11-11
Equilibrium parameters for Bismarck brown onto rice husk were estimated by linear least square and a trial and error non-linear method using Freundlich, Langmuir and Redlich-Peterson isotherms. A comparison between linear and non-linear method of estimating the isotherm parameters was reported. The best fitting isotherm was Langmuir isotherm and Redlich-Peterson isotherm equation. The results show that non-linear method could be a better way to obtain the parameters. Redlich-Peterson isotherm is a special case of Langmuir isotherm when the Redlich-Peterson isotherm constant g was unity.
The problem of natural funnel asymmetries: a simulation analysis of meta-analysis in macroeconomics.
Callot, Laurent; Paldam, Martin
2011-06-01
Effect sizes in macroeconomic are estimated by regressions on data published by statistical agencies. Funnel plots are a representation of the distribution of the resulting regression coefficients. They are normally much wider than predicted by the t-ratio of the coefficients and often asymmetric. The standard method of meta-analysts in economics assumes that the asymmetries are because of publication bias causing censoring and adjusts the average accordingly. The paper shows that some funnel asymmetries may be 'natural' so that they occur without censoring. We investigate such asymmetries by simulating funnels by pairs of data generating processes (DGPs) and estimating models (EMs), in which the EM has the problem that it disregards a property of the DGP. The problems are data dependency, structural breaks, non-normal residuals, non-linearity, and omitted variables. We show that some of these problems generate funnel asymmetries. When they do, the standard method often fails. Copyright © 2011 John Wiley & Sons, Ltd. Copyright © 2011 John Wiley & Sons, Ltd.
Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data
Zhao, Xin; Cheung, Leo Wang-Kit
2007-01-01
Background Designing appropriate machine learning methods for identifying genes that have a significant discriminating power for disease outcomes has become more and more important for our understanding of diseases at genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which however are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods usually also bring in false positive significant features more easily. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have a couple of critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potentials to achieve this goal. Results A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences. Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound not only in the case with a linear Bayesian classifier but also in the case with a very non-linear Bayesian classifier. This sheds light on its broader usability to microarray data analysis problems, especially to those that linear methods work awkwardly. The KIGP was also applied to four published microarray datasets, and the results showed that the KIGP performed better than or at least as well as any of the referred state-of-the-art methods did in all of these cases. Conclusion Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates the model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently. PMID:17328811
Jiang, Feng; Han, Ji-zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088
Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
Element enrichment factor calculation using grain-size distribution and functional data regression.
Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R
2015-01-01
In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Estimation of suspended-sediment rating curves and mean suspended-sediment loads
Crawford, Charles G.
1991-01-01
A simulation study was done to evaluate: (1) the accuracy and precision of parameter estimates for the bias-corrected, transformed-linear and non-linear models obtained by the method of least squares; (2) the accuracy of mean suspended-sediment loads calculated by the flow-duration, rating-curve method using model parameters obtained by the alternative methods. Parameter estimates obtained by least squares for the bias-corrected, transformed-linear model were considerably more precise than those obtained for the non-linear or weighted non-linear model. The accuracy of parameter estimates obtained for the biascorrected, transformed-linear and weighted non-linear model was similar and was much greater than the accuracy obtained by non-linear least squares. The improved parameter estimates obtained by the biascorrected, transformed-linear or weighted non-linear model yield estimates of mean suspended-sediment load calculated by the flow-duration, rating-curve method that are more accurate and precise than those obtained for the non-linear model.
Simultaneous spectrophotometric determination of salbutamol and bromhexine in tablets.
Habib, I H I; Hassouna, M E M; Zaki, G A
2005-03-01
Typical anti-mucolytic drugs called salbutamol hydrochloride and bromhexine sulfate encountered in tablets were determined simultaneously either by using linear regression at zero-crossing wavelengths of the first derivation of UV-spectra or by application of multiple linear partial least squares regression method. The results obtained by the two proposed mathematical methods were compared with those obtained by the HPLC technique.
Mao, G D; Adeli, K; Eisenbrey, A B; Artiss, J D
1996-07-01
This evaluation was undertaken to verify the application protocol for the CK-MB assay on the ACCESS Immunoassay Analyzer (Sanofi Diagnostics Pasteur, Chaska, MN). The results show that the ACCESS CK-MB assay total imprecision was 6.8% to 9.1%. Analytical linearity of the ACCESS CK-MB assay was excellent in the range of < 1-214 micrograms/L. A comparison of the ACCESS CK-MB assay with the IMx (Abbott Laboratories, Abbott Park, IL) method shows good correlation r = 0.990 (n = 108). Linear regression analysis yielded Y = 1.36X-0.3, Sx/y = 7.2. ACCESS CK-MB values also correlated well with CK-MB by electrophoresis with r = 0.968 (n = 132). The linear regression equation for this comparison was Y = 1.08X + 1.4, Sx/y = 14.1. The expected non-myocardial infarction range of CK-MB determined by the ACCESS system was 1.3-9.4 micrograms/L (mean = 4.0, n = 58). The ACCESS CK-MB assay would appear to be rapid, precise and clinically useful.
Global estimation of long-term persistence in annual river runoff
NASA Astrophysics Data System (ADS)
Markonis, Y.; Moustakis, Y.; Nasika, C.; Sychova, P.; Dimitriadis, P.; Hanel, M.; Máca, P.; Papalexiou, S. M.
2018-03-01
Long-term persistence (LTP) of annual river runoff is a topic of ongoing hydrological research, due to its implications to water resources management. Here, we estimate its strength, measured by the Hurst coefficient H, in 696 annual, globally distributed, streamflow records with at least 80 years of data. We use three estimation methods (maximum likelihood estimator, Whittle estimator and least squares variance) resulting in similar mean values of H close to 0.65. Subsequently, we explore potential factors influencing H by two linear (Spearman's rank correlation, multiple linear regression) and two non-linear (self-organizing maps, random forests) techniques. Catchment area is found to be crucial for medium to larger watersheds, while climatic controls, such as aridity index, have higher impact to smaller ones. Our findings indicate that long-term persistence is weaker than found in other studies, suggesting that enhanced LTP is encountered in large-catchment rivers, were the effect of spatial aggregation is more intense. However, we also show that the estimated values of H can be reproduced by a short-term persistence stochastic model such as an auto-regressive AR(1) process. A direct consequence is that some of the most common methods for the estimation of H coefficient, might not be suitable for discriminating short- and long-term persistence even in long observational records.
Linear regression analysis: part 14 of a series on evaluation of scientific publications.
Schneider, Astrid; Hommel, Gerhard; Blettner, Maria
2010-11-01
Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
Methodology for the development of normative data for Spanish-speaking pediatric populations.
Rivera, D; Arango-Lasprilla, J C
2017-01-01
To describe the methodology utilized to calculate reliability and the generation of norms for 10 neuropsychological tests for children in Spanish-speaking countries. The study sample consisted of over 4,373 healthy children from nine countries in Latin America (Chile, Cuba, Ecuador, Guatemala, Honduras, Mexico, Paraguay, Peru, and Puerto Rico) and Spain. Inclusion criteria for all countries were to have between 6 to 17 years of age, an Intelligence Quotient of≥80 on the Test of Non-Verbal Intelligence (TONI-2), and score of <19 on the Children's Depression Inventory. Participants completed 10 neuropsychological tests. Reliability and norms were calculated for all tests. Test-retest analysis showed excellent or good- reliability on all tests (r's>0.55; p's<0.001) except M-WCST perseverative errors whose coefficient magnitude was fair. All scores were normed using multiple linear regressions and standard deviations of residual values. Age, age2, sex, and mean level of parental education (MLPE) were included as predictors in the models by country. The non-significant variables (p > 0.05) were removed and the analysis were run again. This is the largest Spanish-speaking children and adolescents normative study in the world. For the generation of normative data, the method based on linear regression models and the standard deviation of residual values was used. This method allows determination of the specific variables that predict test scores, helps identify and control for collinearity of predictive variables, and generates continuous and more reliable norms than those of traditional methods.
Helbich, Marco; Klein, Nadja; Roberts, Hannah; Hagedoorn, Paulien; Groenewegen, Peter P
2018-06-20
Exposure to green space seems to be beneficial for self-reported mental health. In this study we used an objective health indicator, namely antidepressant prescription rates. Current studies rely exclusively upon mean regression models assuming linear associations. It is, however, plausible that the presence of green space is non-linearly related with different quantiles of the outcome antidepressant prescription rates. These restrictions may contribute to inconsistent findings. Our aim was: a) to assess antidepressant prescription rates in relation to green space, and b) to analyze how the relationship varies non-linearly across different quantiles of antidepressant prescription rates. We used cross-sectional data for the year 2014 at a municipality level in the Netherlands. Ecological Bayesian geoadditive quantile regressions were fitted for the 15%, 50%, and 85% quantiles to estimate green space-prescription rate correlations, controlling for physical activity levels, socio-demographics, urbanicity, etc. RESULTS: The results suggested that green space was overall inversely and non-linearly associated with antidepressant prescription rates. More important, the associations differed across the quantiles, although the variation was modest. Significant non-linearities were apparent: The associations were slightly positive in the lower quantile and strongly negative in the upper one. Our findings imply that an increased availability of green space within a municipality may contribute to a reduction in the number of antidepressant prescriptions dispensed. Green space is thus a central health and community asset, whilst a minimum level of 28% needs to be established for health gains. The highest effectiveness occurred at a municipality surface percentage higher than 79%. This inverse dose-dependent relation has important implications for setting future community-level health and planning policies. Copyright © 2018 Elsevier Inc. All rights reserved.
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
Khansari, Maziyar M; O’Neill, William; Penn, Richard; Chau, Felix; Blair, Norman P; Shahidi, Mahnaz
2016-01-01
The conjunctiva is a densely vascularized mucus membrane covering the sclera of the eye with a unique advantage of accessibility for direct visualization and non-invasive imaging. The purpose of this study is to apply an automated quantitative method for discrimination of different stages of diabetic retinopathy (DR) using conjunctival microvasculature images. Fine structural analysis of conjunctival microvasculature images was performed by ordinary least square regression and Fisher linear discriminant analysis. Conjunctival images between groups of non-diabetic and diabetic subjects at different stages of DR were discriminated. The automated method’s discriminate rates were higher than those determined by human observers. The method allowed sensitive and rapid discrimination by assessment of conjunctival microvasculature images and can be potentially useful for DR screening and monitoring. PMID:27446692
Tarraf, Wassim; Miranda, Patricia Y.; González, Hector M.
2011-01-01
Objective To examine time trends and differences in medical expenditures between non-citizens, foreign-born, and U.S.-born citizens. Methods We used multi-year Medical Expenditures Panel Survey (2000–2008) data on non-institutionalized adults in the U.S. (N=190,965). Source specific and total medical expenditures were analyzed using regression models, bootstrap prediction techniques, and linear and non-linear decomposition methods to evaluate the relationship between immigration status and expenditures, controlling for confounding effects. Results We found that the average health expenditures between 2000 and 2008 for non-citizens immigrants ($1,836) were substantially lower compared to both foreign-born ($3,737) and U.S.-born citizens ($4,478). Differences were maintained after controlling for confounding effects. Decomposition techniques showed that the main determinants of these differences were the availability of a usual source of healthcare, insurance, and ethnicity/race. Conclusion Lower healthcare expenditures among immigrants result from disparate access to healthcare. The dissipation of demographic advantages among immigrants could prospectively produce higher pressures on the U.S. healthcare system as immigrants age and levels of chronic conditions rise. Barring a shift in policy, the brunt of the effects could be borne by an already overextended public healthcare system. PMID:22222383
Zhang, Haihong; Guan, Cuntai; Ang, Kai Keng; Wang, Chuanchu
2012-01-01
Detecting motor imagery activities versus non-control in brain signals is the basis of self-paced brain-computer interfaces (BCIs), but also poses a considerable challenge to signal processing due to the complex and non-stationary characteristics of motor imagery as well as non-control. This paper presents a self-paced BCI based on a robust learning mechanism that extracts and selects spatio-spectral features for differentiating multiple EEG classes. It also employs a non-linear regression and post-processing technique for predicting the time-series of class labels from the spatio-spectral features. The method was validated in the BCI Competition IV on Dataset I where it produced the lowest prediction error of class labels continuously. This report also presents and discusses analysis of the method using the competition data set. PMID:22347153
Automating approximate Bayesian computation by local linear regression.
Thornton, Kevin R
2009-07-07
In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular.Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, the ABCreg simplifies implementing ABC based on local-linear regression.
How is the weather? Forecasting inpatient glycemic control
Saulnier, George E; Castro, Janna C; Cook, Curtiss B; Thompson, Bithika M
2017-01-01
Aim: Apply methods of damped trend analysis to forecast inpatient glycemic control. Method: Observed and calculated point-of-care blood glucose data trends were determined over 62 weeks. Mean absolute percent error was used to calculate differences between observed and forecasted values. Comparisons were drawn between model results and linear regression forecasting. Results: The forecasted mean glucose trends observed during the first 24 and 48 weeks of projections compared favorably to the results provided by linear regression forecasting. However, in some scenarios, the damped trend method changed inferences compared with linear regression. In all scenarios, mean absolute percent error values remained below the 10% accepted by demand industries. Conclusion: Results indicate that forecasting methods historically applied within demand industries can project future inpatient glycemic control. Additional study is needed to determine if forecasting is useful in the analyses of other glucometric parameters and, if so, how to apply the techniques to quality improvement. PMID:29134125
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman
2011-01-01
This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs when compared to time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate one. In addition to that, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, we believe that we are probably the first to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626
Partitioning sources of variation in vertebrate species richness
Boone, R.B.; Krohn, W.B.
2000-01-01
Aim: To explore biogeographic patterns of terrestrial vertebrates in Maine, USA using techniques that would describe local and spatial correlations with the environment. Location: Maine, USA. Methods: We delineated the ranges within Maine (86,156 km2) of 275 species using literature and expert review. Ranges were combined into species richness maps, and compared to geomorphology, climate, and woody plant distributions. Methods were adapted that compared richness of all vertebrate classes to each environmental correlate, rather than assessing a single explanatory theory. We partitioned variation in species richness into components using tree and multiple linear regression. Methods were used that allowed for useful comparisons between tree and linear regression results. For both methods we partitioned variation into broad-scale (spatially autocorrelated) and fine-scale (spatially uncorrelated) explained and unexplained components. By partitioning variance, and using both tree and linear regression in analyses, we explored the degree of variation in species richness for each vertebrate group that Could be explained by the relative contribution of each environmental variable. Results: In tree regression, climate variation explained richness better (92% of mean deviance explained for all species) than woody plant variation (87%) and geomorphology (86%). Reptiles were highly correlated with environmental variation (93%), followed by mammals, amphibians, and birds (each with 84-82% deviance explained). In multiple linear regression, climate was most closely associated with total vertebrate richness (78%), followed by woody plants (67%) and geomorphology (56%). Again, reptiles were closely correlated with the environment (95%), followed by mammals (73%), amphibians (63%) and birds (57%). Main conclusions: Comparing variation explained using tree and multiple linear regression quantified the importance of nonlinear relationships and local interactions between species richness and environmental variation, identifying the importance of linear relationships between reptiles and the environment, and nonlinear relationships between birds and woody plants, for example. Conservation planners should capture climatic variation in broad-scale designs; temperatures may shift during climate change, but the underlying correlations between the environment and species richness will presumably remain.
Ricci, Cristian; Gervasi, Federico; Gaeta, Maddalena; Smuts, Cornelius M; Schutte, Aletta E; Leitzmann, Michael F
2018-05-01
Background Light physical activity is known to reduce atrial fibrillation risk, whereas moderate to vigorous physical activity may result in an increased risk. However, the question of what volume of physical activity can be considered beneficial remains poorly understood. The scope of the present work was to examine the relation between physical activity volume and atrial fibrillation risk. Design A comprehensive systematic review was performed following the PRISMA guidelines. Methods A non-linear meta-regression considering the amount of energy spent in physical activity was carried out. The first derivative of the non-linear relation between physical activity and atrial fibrillation risk was evaluated to determine the volume of physical activity that carried the minimum atrial fibrillation risk. Results The dose-response analysis of the relation between physical activity and atrial fibrillation risk showed that physical activity at volumes of 5-20 metabolic equivalents per week (MET-h/week) was associated with significant reduction in atrial fibrillation risk (relative risk for 19 MET-h/week = 0.92 (0.87, 0.98). By comparison, physical activity volumes exceeding 20 MET-h/week were unrelated to atrial fibrillation risk (relative risk for 21 MET-h/week = 0.95 (0.88, 1.02). Conclusion These data show a J-shaped relation between physical activity volume and atrial fibrillation risk. Physical activity at volumes of up to 20 MET-h/week is associated with reduced atrial fibrillation risk, whereas volumes exceeding 20 MET-h/week show no relation with risk.
2013-01-01
Background Identifying the emotional state is helpful in applications involving patients with autism and other intellectual disabilities; computer-based training, human computer interaction etc. Electrocardiogram (ECG) signals, being an activity of the autonomous nervous system (ANS), reflect the underlying true emotional state of a person. However, the performance of various methods developed so far lacks accuracy, and more robust methods need to be developed to identify the emotional pattern associated with ECG signals. Methods Emotional ECG data was obtained from sixty participants by inducing the six basic emotional states (happiness, sadness, fear, disgust, surprise and neutral) using audio-visual stimuli. The non-linear feature ‘Hurst’ was computed using Rescaled Range Statistics (RRS) and Finite Variance Scaling (FVS) methods. New Hurst features were proposed by combining the existing RRS and FVS methods with Higher Order Statistics (HOS). The features were then classified using four classifiers – Bayesian Classifier, Regression Tree, K- nearest neighbor and Fuzzy K-nearest neighbor. Seventy percent of the features were used for training and thirty percent for testing the algorithm. Results Analysis of Variance (ANOVA) conveyed that Hurst and the proposed features were statistically significant (p < 0.001). Hurst computed using RRS and FVS methods showed similar classification accuracy. The features obtained by combining FVS and HOS performed better with a maximum accuracy of 92.87% and 76.45% for classifying the six emotional states using random and subject independent validation respectively. Conclusions The results indicate that the combination of non-linear analysis and HOS tend to capture the finer emotional changes that can be seen in healthy ECG data. This work can be further fine tuned to develop a real time system. PMID:23680041
NASA Astrophysics Data System (ADS)
Rooper, Christopher N.; Zimmermann, Mark; Prescott, Megan M.
2017-08-01
Deep-sea coral and sponge ecosystems are widespread throughout most of Alaska's marine waters, and are associated with many different species of fishes and invertebrates. These ecosystems are vulnerable to the effects of commercial fishing activities and climate change. We compared four commonly used species distribution models (general linear models, generalized additive models, boosted regression trees and random forest models) and an ensemble model to predict the presence or absence and abundance of six groups of benthic invertebrate taxa in the Gulf of Alaska. All four model types performed adequately on training data for predicting presence and absence, with regression forest models having the best overall performance measured by the area under the receiver-operating-curve (AUC). The models also performed well on the test data for presence and absence with average AUCs ranging from 0.66 to 0.82. For the test data, ensemble models performed the best. For abundance data, there was an obvious demarcation in performance between the two regression-based methods (general linear models and generalized additive models), and the tree-based models. The boosted regression tree and random forest models out-performed the other models by a wide margin on both the training and testing data. However, there was a significant drop-off in performance for all models of invertebrate abundance ( 50%) when moving from the training data to the testing data. Ensemble model performance was between the tree-based and regression-based methods. The maps of predictions from the models for both presence and abundance agreed very well across model types, with an increase in variability in predictions for the abundance data. We conclude that where data conforms well to the modeled distribution (such as the presence-absence data and binomial distribution in this study), the four types of models will provide similar results, although the regression-type models may be more consistent with biological theory. For data with highly zero-inflated distributions and non-normal distributions such as the abundance data from this study, the tree-based methods performed better. Ensemble models that averaged predictions across the four model types, performed better than the GLM or GAM models but slightly poorer than the tree-based methods, suggesting ensemble models might be more robust to overfitting than tree methods, while mitigating some of the disadvantages in predictive performance of regression methods.
Predicting musically induced emotions from physiological inputs: linear and neural network models.
Russo, Frank A; Vempala, Naresh N; Sandstrom, Gillian M
2013-01-01
Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of "felt" emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants-heart rate (HR), respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA) dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a non-linear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The non-linear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the non-linear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.
Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping
This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...
Manzoori, Jamshid L; Amjadi, Mohammad
2003-03-15
The characteristics of host-guest complexation between beta-cyclodextrin (beta-CD) and two forms of ibuprofen (protonated and deprotonated) were investigated by fluorescence spectrometry. 1:1 stoichiometries for both complexes were established and their association constants at different temperatures were calculated by applying a non-linear regression method to the change in the fluorescence of ibuprofen that brought about by the presence of beta-CD. The thermodynamic parameters (deltaH, deltaS and deltaG) associated with the inclusion process were also determined. Based on the obtained results, a sensitive spectrofluorimetric method for the determination of ibuprofen was developed with a linear range of 0.1-2 microg ml(-1) and a detection limit of 0.03 microg ml(-1). The method was applied satisfactorily to the determination of ibuprofen in pharmaceutical preparations. Copyright 2002 Elsevier Science B.V.
Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald
2011-06-01
Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems.
2011-01-01
Background Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. Conclusions HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems. PMID:21627852
NASA Astrophysics Data System (ADS)
Brümmer, C.; Moffat, A. M.; Huth, V.; Augustin, J.; Herbst, M.; Kutsch, W. L.
2016-12-01
Manual carbon dioxide flux measurements with closed chambers at scheduled campaigns are a versatile method to study management effects at small scales in multiple-plot experiments. The eddy covariance technique has the advantage of quasi-continuous measurements but requires large homogeneous areas of a few hectares. To evaluate the uncertainties associated with interpolating from individual campaigns to the whole vegetation period, we installed both techniques at an agricultural site in Northern Germany. The presented comparison covers two cropping seasons, winter oilseed rape in 2012/13 and winter wheat in 2013/14. Modeling half-hourly carbon fluxes from campaigns is commonly performed based on non-linear regressions for the light response and respiration. The daily averages of net CO2 modeled from chamber data deviated from eddy covariance measurements in the range of ± 5 g C m-2 day-1. To understand the observed differences and to disentangle the effects, we performed four additional setups (expert versus default settings of the non-linear regressions based algorithm, purely empirical modeling with artificial neural networks versus non-linear regressions, cross-validating using eddy covariance measurements as campaign fluxes, weekly versus monthly scheduling of campaigns) to model the half-hourly carbon fluxes for the whole vegetation period. The good agreement of the seasonal course of net CO2 at plot and field scale for our agricultural site demonstrates that both techniques are robust and yield consistent results at seasonal time scale even for a managed ecosystem with high temporal dynamics in the fluxes. This allows combining the respective advantages of factorial experiments at plot scale with dense time series data at field scale. Furthermore, the information from the quasi-continuous eddy covariance measurements can be used to derive vegetation proxies to support the interpolation of carbon fluxes in-between the manual chamber campaigns.
Digital controllers for VTOL aircraft
NASA Technical Reports Server (NTRS)
Stengel, R. F.; Broussard, J. R.; Berry, P. W.
1976-01-01
Using linear-optimal estimation and control techniques, digital-adaptive control laws have been designed for a tandem-rotor helicopter which is equipped for fully automatic flight in terminal area operations. Two distinct discrete-time control laws are designed to interface with velocity-command and attitude-command guidance logic, and each incorporates proportional-integral compensation for non-zero-set-point regulation, as well as reduced-order Kalman filters for sensor blending and noise rejection. Adaptation to flight condition is achieved with a novel gain-scheduling method based on correlation and regression analysis. The linear-optimal design approach is found to be a valuable tool in the development of practical multivariable control laws for vehicles which evidence significant coupling and insufficient natural stability.
Quantum State Tomography via Linear Regression Estimation
Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan
2013-01-01
A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d4) where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519
Kumar, K Vasanth
2006-08-21
The experimental equilibrium data of malachite green onto activated carbon were fitted to the Freundlich, Langmuir and Redlich-Peterson isotherms by linear and non-linear method. A comparison between linear and non-linear of estimating the isotherm parameters was discussed. The four different linearized form of Langmuir isotherm were also discussed. The results confirmed that the non-linear method as a better way to obtain isotherm parameters. The best fitting isotherm was Langmuir and Redlich-Peterson isotherm. Redlich-Peterson is a special case of Langmuir when the Redlich-Peterson isotherm constant g was unity.
A Feature-Free 30-Disease Pathological Brain Detection System by Linear Regression Classifier.
Chen, Yi; Shao, Ying; Yan, Jie; Yuan, Ti-Fei; Qu, Yanwen; Lee, Elizabeth; Wang, Shuihua
2017-01-01
Alzheimer's disease patients are increasing rapidly every year. Scholars tend to use computer vision methods to develop automatic diagnosis system. (Background) In 2015, Gorji et al. proposed a novel method using pseudo Zernike moment. They tested four classifiers: learning vector quantization neural network, pattern recognition neural network trained by Levenberg-Marquardt, by resilient backpropagation, and by scaled conjugate gradient. This study presents an improved method by introducing a relatively new classifier-linear regression classification. Our method selects one axial slice from 3D brain image, and employed pseudo Zernike moment with maximum order of 15 to extract 256 features from each image. Finally, linear regression classification was harnessed as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Therefore, it can be used to detect Alzheimer's disease. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Churpek, Matthew M; Yuen, Trevor C; Winslow, Christopher; Meltzer, David O; Kattan, Michael W; Edelson, Dana P
2016-02-01
Machine learning methods are flexible prediction algorithms that may be more accurate than conventional regression. We compared the accuracy of different techniques for detecting clinical deterioration on the wards in a large, multicenter database. Observational cohort study. Five hospitals, from November 2008 until January 2013. Hospitalized ward patients None Demographic variables, laboratory values, and vital signs were utilized in a discrete-time survival analysis framework to predict the combined outcome of cardiac arrest, intensive care unit transfer, or death. Two logistic regression models (one using linear predictor terms and a second utilizing restricted cubic splines) were compared to several different machine learning methods. The models were derived in the first 60% of the data by date and then validated in the next 40%. For model derivation, each event time window was matched to a non-event window. All models were compared to each other and to the Modified Early Warning score, a commonly cited early warning score, using the area under the receiver operating characteristic curve (AUC). A total of 269,999 patients were admitted, and 424 cardiac arrests, 13,188 intensive care unit transfers, and 2,840 deaths occurred in the study. In the validation dataset, the random forest model was the most accurate model (AUC, 0.80 [95% CI, 0.80-0.80]). The logistic regression model with spline predictors was more accurate than the model utilizing linear predictors (AUC, 0.77 vs 0.74; p < 0.01), and all models were more accurate than the MEWS (AUC, 0.70 [95% CI, 0.70-0.70]). In this multicenter study, we found that several machine learning methods more accurately predicted clinical deterioration than logistic regression. Use of detection algorithms derived from these techniques may result in improved identification of critically ill patients on the wards.
NASA Astrophysics Data System (ADS)
Wu, Di; He, Yong
2007-11-01
The aim of this study is to investigate the potential of the visible and near infrared spectroscopy (Vis/NIRS) technique for non-destructive measurement of soluble solids contents (SSC) in grape juice beverage. 380 samples were studied in this paper. Smoothing way of Savitzky-Golay and standard normal variate were applied for the pre-processing of spectral data. Least-squares support vector machines (LS-SVM) with RBF kernel function was applied to developing the SSC prediction model based on the Vis/NIRS absorbance data. The determination coefficient for prediction (Rp2) of the results predicted by LS-SVM model was 0. 962 and root mean square error (RMSEP) was 0. 434137. It is concluded that Vis/NIRS technique can quantify the SSC of grape juice beverage fast and non-destructively.. At the same time, LS-SVM model was compared with PLS and back propagation neural network (BP-NN) methods. The results showed that LS-SVM was superior to the conventional linear and non-linear methods in predicting SSC of grape juice beverage. In this study, the generation ability of LS-SVM, PLS and BP-NN models were also investigated. It is concluded that LS-SVM regression method is a promising technique for chemometrics in quantitative prediction.
Ridge Regression for Interactive Models.
ERIC Educational Resources Information Center
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
Prediction of hourly PM2.5 using a space-time support vector regression model
NASA Astrophysics Data System (ADS)
Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang
2018-05-01
Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
Echocardiographic measurements of left ventricular mass by a non-geometric method
NASA Technical Reports Server (NTRS)
Parra, Beatriz; Buckey, Jay; Degraff, David; Gaffney, F. Andrew; Blomqvist, C. Gunnar
1987-01-01
The accuracy of a new nongeometric method for calculating left ventricular myocardial volumes from two-dimensional echocardiographic images was assessed in vitro using 20 formalin-fixed normal human hearts. Serial oblique short-axis images were acquired from one point at 5-deg intervals, for a total of 10-12 cross sections. Echocardiographic myocardial volumes were calculated as the difference between the volumes defined by the epi- and endocardial surfaces. Actual myocardial volumes were determined by water displacement. Volumes ranged from 80 to 174 ml (mean 130.8 ml). Linear regression analysis demonstrated excellent agreement between the echocardiographic and direct measurements.
Simplified large African carnivore density estimators from track indices.
Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J
2016-01-01
The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for ß in the linear model y = αx + ß, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant ( P > 0.05). The other four models with intercept and the six models thorough origin were all significant ( P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for ß and the null hypothesis that ß = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km 2 or higher. To improve the current models, we need independent data to validate the models and data to test for non-linear relationship between track indices and true density at low densities.
Health Care Utilization and Expenditures in Persons Receiving Social Assistance in 2012
Reich, Oliver; Wolffers, Felix; Signorell, Andri; Blozik, Eva
2015-01-01
Introduction: Lower socioeconomic position and measures of social and material deprivation are associated with morbidity and mortality. These inequalities in health among groups of various statuses remain one of the main challenges for public health. The aim of the study was to investigate differences in health care use and costs between recipients of social assistance and non-recipients aged 65 years and younger within the Swiss healthcare system. Methods: We analyzed claims data of 13 492 individuals living in Bern, Switzerland of which 391 received social assistance. For the year 2012, we compared the number of physician visits, hospitalizations, prescribed drugs, and total health care costs as covered by mandatory health insurance. Linear and logistic adjusted regression analyses were made to estimate the effect of receipt of social assistance on health service use and costs. Results: Multivariate linear regression analysis revealed that health care costs increased on average by 1 666 CHF if individuals received social assistance. Recipients of social assistance had on average 1.2 more ambulatory consultations than non-recipients and got 1.65 more different medications prescribed as compared to non-recipients. The chance for recipients of social assistance to be hospitalized was almost twice that of non-recipients (Odds Ratio 1.96, 95% confidence interval 1.49-2.59). Conclusions: Recipients of social assistance demonstrate an exceedingly high use of health services. The need for interventions to alleviate the identified inequalities in health and health care needs is obvious. PMID:25946912
Body Mass Index: Accounting for Full Time Sedentary Occupation and 24-Hr Self-Reported Time Use
Tudor-Locke, Catrine; Schuna, John M.; Katzmarzyk, Peter T.; Liu, Wei; Hamrick, Karen S.; Johnson, William D.
2014-01-01
Objectives We used linked existing data from the 2006–2008 American Time Use Survey (ATUS), the Current Population Survey (CPS, a federal survey that provides on-going U.S. vital statistics, including employment rates) and self-reported body mass index (BMI) to answer: How does BMI vary across full time occupations dichotomized as sedentary/non-sedentary, accounting for time spent in sleep, other sedentary behaviors, and light, moderate, and vigorous intensity activities? Methods We classified time spent engaged at a primary job (sedentary or non-sedentary), sleep, and other non-work, non-sleep intensity-defined behaviors, specifically, sedentary behavior, light, moderate, and vigorous intensity activities. Age groups were defined by 20–29, 30–39, 40–49, and 50–64 years. BMI groups were defined by 18.5–24.9, 25.0–27.4, 27.5–29.9, 30.0–34.9, and ≥35.0 kg/m2. Logistic and linear regression were used to examine the association between BMI and employment in a sedentary occupation, considering time spent in sleep, other non-work time spent in sedentary behaviors, and light, moderate, and vigorous intensity activities, sex, age race/ethnicity, and household income. Results The analysis data set comprised 4,092 non-pregnant, non-underweight individuals 20–64 years of age who also reported working more than 7 hours at their primary jobs on their designated time use reporting day. Logistic and linear regression analyses failed to reveal any associations between BMI and the sedentary/non-sedentary occupation dichotomy considering time spent in sleep, other non-work time spent in sedentary behaviors, and light, moderate, and vigorous intensity activities, sex, age, race/ethnicity, and household income. Conclusions We found no evidence of a relationship between self-reported full time sedentary occupation classification and BMI after accounting for sex, age, race/ethnicity, and household income and 24-hours of time use including non-work related physical activity and sedentary behaviors. The various sources of error associated with self-report methods and assignment of generalized activity and occupational intensity categories could compound to obscure any real relationships. PMID:25295601
Calibration and Data Analysis of the MC-130 Air Balance
NASA Technical Reports Server (NTRS)
Booth, Dennis; Ulbrich, N.
2012-01-01
Design, calibration, calibration analysis, and intended use of the MC-130 air balance are discussed. The MC-130 balance is an 8.0 inch diameter force balance that has two separate internal air flow systems and one external bellows system. The manual calibration of the balance consisted of a total of 1854 data points with both unpressurized and pressurized air flowing through the balance. A subset of 1160 data points was chosen for the calibration data analysis. The regression analysis of the subset was performed using two fundamentally different analysis approaches. First, the data analysis was performed using a recently developed extension of the Iterative Method. This approach fits gage outputs as a function of both applied balance loads and bellows pressures while still allowing the application of the iteration scheme that is used with the Iterative Method. Then, for comparison, the axial force was also analyzed using the Non-Iterative Method. This alternate approach directly fits loads as a function of measured gage outputs and bellows pressures and does not require a load iteration. The regression models used by both the extended Iterative and Non-Iterative Method were constructed such that they met a set of widely accepted statistical quality requirements. These requirements lead to reliable regression models and prevent overfitting of data because they ensure that no hidden near-linear dependencies between regression model terms exist and that only statistically significant terms are included. Finally, a comparison of the axial force residuals was performed. Overall, axial force estimates obtained from both methods show excellent agreement as the differences of the standard deviation of the axial force residuals are on the order of 0.001 % of the axial force capacity.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadat Hayatshahi, Sayyed Hamed; Abdolmaleki, Parviz; Safarian, Shahrokh
2005-12-16
Logistic regression and artificial neural networks have been developed as two non-linear models to establish quantitative structure-activity relationships between structural descriptors and biochemical activity of adenosine based competitive inhibitors, toward adenosine deaminase. The training set included 24 compounds with known k {sub i} values. The models were trained to solve two-class problems. Unlike the previous work in which multiple linear regression was used, the highest of positive charge on the molecules was recognized to be in close relation with their inhibition activity, while the electric charge on atom N1 of adenosine was found to be a poor descriptor. Consequently, themore » previously developed equation was improved and the newly formed one could predict the class of 91.66% of compounds correctly. Also optimized 2-3-1 and 3-4-1 neural networks could increase this rate to 95.83%.« less
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Nirouei, Mahyar; Ghasemi, Ghasem; Abdolmaleki, Parviz; Tavakoli, Abdolreza; Shariati, Shahab
2012-06-01
The antiviral drugs that inhibit human immunodeficiency virus (HIV) entry to the target cells are already in different phases of clinical trials. They prevent viral entry and have a highly specific mechanism of action with a low toxicity profile. Few QSAR studies have been performed on this group of inhibitors. This study was performed to develop a quantitative structure-activity relationship (QSAR) model of the biological activity of indole glyoxamide derivatives as inhibitors of the interaction between HIV glycoprotein gp120 and host cell CD4 receptors. Forty different indole glyoxamide derivatives were selected as a sample set and geometrically optimized using Gaussian 98W. Different combinations of multiple linear regression (MLR), genetic algorithms (GA) and artificial neural networks (ANN) were then utilized to construct the QSAR models. These models were also utilized to select the most efficient subsets of descriptors in a cross-validation procedure for non-linear log (1/EC50) prediction. The results that were obtained using GA-ANN were compared with MLR-MLR and MLR-ANN models. A high predictive ability was observed for the MLR, MLR-ANN and GA-ANN models, with root mean sum square errors (RMSE) of 0.99, 0.91 and 0.67, respectively (N = 40). In summary, machine learning methods were highly effective in designing QSAR models when compared to statistical method.
Age estimation standards for a Western Australian population using the coronal pulp cavity index.
Karkhanis, Shalmira; Mack, Peter; Franklin, Daniel
2013-09-10
Age estimation is a vital aspect in creating a biological profile and aids investigators by narrowing down potentially matching identities from the available pool. In addition to routine casework, in the present global political scenario, age estimation in living individuals is required in cases of refugees, asylum seekers, human trafficking and to ascertain age of criminal responsibility. Thus robust methods that are simple, non-invasive and ethically viable are required. The aim of the present study is, therefore, to test the reliability and applicability of the coronal pulp cavity index method, for the purpose of developing age estimation standards for an adult Western Australian population. A total of 450 orthopantomograms (220 females and 230 males) of Australian individuals were analyzed. Crown and coronal pulp chamber heights were measured in the mandibular left and right premolars, and the first and second molars. These measurements were then used to calculate the tooth coronal index. Data was analyzed using paired sample t-tests to assess bilateral asymmetry followed by simple linear and multiple regressions to develop age estimation models. The most accurate age estimation based on simple linear regression model was with mandibular right first molar (SEE ±8.271 years). Multiple regression models improved age prediction accuracy considerably and the most accurate model was with bilateral first and second molars (SEE ±6.692 years). This study represents the first investigation of this method in a Western Australian population and our results indicate that the method is suitable for forensic application. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
USDA-ARS?s Scientific Manuscript database
Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...
Piovesan, Davide; Pierobon, Alberto; DiZio, Paul; Lackner, James R
2012-01-01
This study presents and validates a Time-Frequency technique for measuring 2-dimensional multijoint arm stiffness throughout a single planar movement as well as during static posture. It is proposed as an alternative to current regressive methods which require numerous repetitions to obtain average stiffness on a small segment of the hand trajectory. The method is based on the analysis of the reassigned spectrogram of the arm's response to impulsive perturbations and can estimate arm stiffness on a trial-by-trial basis. Analytic and empirical methods are first derived and tested through modal analysis on synthetic data. The technique's accuracy and robustness are assessed by modeling the estimation of stiffness time profiles changing at different rates and affected by different noise levels. Our method obtains results comparable with two well-known regressive techniques. We also test how the technique can identify the viscoelastic component of non-linear and higher than second order systems with a non-parametrical approach. The technique proposed here is very impervious to noise and can be used easily for both postural and movement tasks. Estimations of stiffness profiles are possible with only one perturbation, making our method a useful tool for estimating limb stiffness during motor learning and adaptation tasks, and for understanding the modulation of stiffness in individuals with neurodegenerative diseases.
Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard
2016-10-01
In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x i are equal is strong and may fail to account for overdispersion given the variability of the rate parameter (the variance exceeds the mean). Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion including a quasi-likelihood, robust standard errors estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of significant inherent overdispersion (p-value <0.001). However, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2 versus 21.3) for non-flexible piecewise exponential models. We showed that there were no major differences between methods. However, using a flexible piecewise regression modelling, with either a quasi-likelihood or robust standard errors, was the best approach as it deals with both, overdispersion due to model misspecification and true or inherent overdispersion.
Scoring and staging systems using cox linear regression modeling and recursive partitioning.
Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H
2006-01-01
Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
A geometric approach to non-linear correlations with intrinsic scatter
NASA Astrophysics Data System (ADS)
Pihajoki, Pauli
2017-12-01
We propose a new mathematical model for n - k-dimensional non-linear correlations with intrinsic scatter in n-dimensional data. The model is based on Riemannian geometry and is naturally symmetric with respect to the measured variables and invariant under coordinate transformations. We combine the model with a Bayesian approach for estimating the parameters of the correlation relation and the intrinsic scatter. A side benefit of the approach is that censored and truncated data sets and independent, arbitrary measurement errors can be incorporated. We also derive analytic likelihoods for the typical astrophysical use case of linear relations in n-dimensional Euclidean space. We pay particular attention to the case of linear regression in two dimensions and compare our results to existing methods. Finally, we apply our methodology to the well-known MBH-σ correlation between the mass of a supermassive black hole in the centre of a galactic bulge and the corresponding bulge velocity dispersion. The main result of our analysis is that the most likely slope of this correlation is ∼6 for the data sets used, rather than the values in the range of ∼4-5 typically quoted in the literature for these data.
Radar modulation classification using time-frequency representation and nonlinear regression
NASA Astrophysics Data System (ADS)
De Luigi, Christophe; Arques, Pierre-Yves; Lopez, Jean-Marc; Moreau, Eric
1999-09-01
In naval electronic environment, pulses emitted by radars are collected by ESM receivers. For most of them the intrapulse signal is modulated by a particular law. To help the classical identification process, a classification and estimation of this modulation law is applied on the intrapulse signal measurements. To estimate with a good accuracy the time-varying frequency of a signal corrupted by an additive noise, one method has been chosen. This method consists on the Wigner distribution calculation, the instantaneous frequency is then estimated by the peak location of the distribution. Bias and variance of the estimator are performed by computed simulations. In a estimated sequence of frequencies, we assume the presence of false and good estimated ones, the hypothesis of Gaussian distribution is made on the errors. A robust non linear regression method, based on the Levenberg-Marquardt algorithm, is thus applied on these estimated frequencies using a Maximum Likelihood Estimator. The performances of the method are tested by using varied modulation laws and different signal to noise ratios.
Yin, Xinyou; Belay, Daniel W; van der Putten, Peter E L; Struik, Paul C
2014-12-01
Maximum quantum yield for leaf CO2 assimilation under limiting light conditions (Φ CO2LL) is commonly estimated as the slope of the linear regression of net photosynthetic rate against absorbed irradiance over a range of low-irradiance conditions. Methodological errors associated with this estimation have often been attributed either to light absorptance by non-photosynthetic pigments or to some data points being beyond the linear range of the irradiance response, both causing an underestimation of Φ CO2LL. We demonstrate here that a decrease in photosystem (PS) photochemical efficiency with increasing irradiance, even at very low levels, is another source of error that causes a systematic underestimation of Φ CO2LL. A model method accounting for this error was developed, and was used to estimate Φ CO2LL from simultaneous measurements of gas exchange and chlorophyll fluorescence on leaves using various combinations of species, CO2, O2, or leaf temperature levels. The conventional linear regression method under-estimated Φ CO2LL by ca. 10-15%. Differences in the estimated Φ CO2LL among measurement conditions were generally accounted for by different levels of photorespiration as described by the Farquhar-von Caemmerer-Berry model. However, our data revealed that the temperature dependence of PSII photochemical efficiency under low light was an additional factor that should be accounted for in the model.
Fitting program for linear regressions according to Mahon (1996)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trappitsch, Reto G.
2018-01-09
This program takes the users' Input data and fits a linear regression to it using the prescription presented by Mahon (1996). Compared to the commonly used York fit, this method has the correct prescription for measurement error propagation. This software should facilitate the proper fitting of measurements with a simple Interface.
NASA Astrophysics Data System (ADS)
Lockwood, M.; Owens, M. J.; Barnard, L.; Usoskin, I. G.
2016-11-01
We use sunspot-group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups [RB] above a variable cut-off threshold of observed total whole spot area (uncorrected for foreshortening) to simulate what a lower-acuity observer would have seen. The synthesised annual means of RB are then re-scaled to the full observed RGO group number [RA] using a variety of regression techniques. It is found that a very high correlation between RA and RB (r_{AB} > 0.98) does not prevent large errors in the intercalibration (for example sunspot-maximum values can be over 30 % too large even for such levels of r_{AB}). In generating the backbone sunspot number [R_{BB}], Svalgaard and Schatten ( Solar Phys., 2016) force regression fits to pass through the scatter-plot origin, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot-cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile ("Q-Q") plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least-squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot-group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar-cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
Applications of statistics to medical science, III. Correlation and regression.
Watanabe, Hiroshi
2012-01-01
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
Wong, William W.; Strizich, Garrett; Heo, Moonseong; Heymsfield, Steven B.; Himes, John H.; Rock, Cheryl L.; Gellman, Marc D.; Siega-Riz, Anna Maria; Sotres-Alvarez, Daniela; Davis, Sonia M.; Arredondo, Elva M.; Van Horn, Linda; Wylie-Rosett, Judith; Sanchez-Johnsen, Lisa; Kaplan, Robert; Mossavar-Rahmani, Yasmin
2016-01-01
Objective To evaluate the percentage of body fat (%BF)-BMI relationship, identify %BF levels corresponding to adult BMI cut-points, and examine %BF-BMI agreement in a diverse Hispanic/Latino population. Methods %BF by bioelectrical impedance analysis (BIA) was corrected against %BF by 18O dilution in 476 participants of the ancillary Hispanic Community Health/Latinos Studies. Corrected %BF were regressed against 1/BMI in the parent study (n=15,261), fitting models for each age group, by sex and Hispanic/Latino background; predicted %BF was then computed for each BMI cut-point. Results BIA underestimated %BF by 8.7 ± 0.3% in women and 4.6 ± 0.3% in men (P < 0.0001). The %BF-BMI relationshp was non-linear and linear for 1/BMI. Sex- and age-specific regression parameters between %BF and 1/BMI were consistent across Hispanic/Latino backgrounds (P > 0.05). The precision of the %BF-1/BMI association weakened with increasing age in men but not women. The proportion of participants classified as non-obese by BMI but obese by %BF was generally higher among women and older adults (16.4% in women vs. 12.0% in men aged 50-74 y). Conclusions %BF was linearly related to 1/BMI with consistent relationship across Hispanic/Lation backgrounds. BMI cut-points consistently underestimated the proportion of Hispanics/Latinos with excess adiposity. PMID:27184359
Optimized multiple linear mappings for single image super-resolution
NASA Astrophysics Data System (ADS)
Zhang, Kaibing; Li, Jie; Xiong, Zenggang; Liu, Xiuping; Gao, Xinbo
2017-12-01
Learning piecewise linear regression has been recognized as an effective way for example learning-based single image super-resolution (SR) in literature. In this paper, we employ an expectation-maximization (EM) algorithm to further improve the SR performance of our previous multiple linear mappings (MLM) based SR method. In the training stage, the proposed method starts with a set of linear regressors obtained by the MLM-based method, and then jointly optimizes the clustering results and the low- and high-resolution subdictionary pairs for regression functions by using the metric of the reconstruction errors. In the test stage, we select the optimal regressor for SR reconstruction by accumulating the reconstruction errors of m-nearest neighbors in the training set. Thorough experimental results carried on six publicly available datasets demonstrate that the proposed SR method can yield high-quality images with finer details and sharper edges in terms of both quantitative and perceptual image quality assessments.
Simultaneous multiple non-crossing quantile regression estimation using kernel constraints
Liu, Yufeng; Wu, Yichao
2011-01-01
Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating simultaneous non-crossing constraints of QR functions. In this paper, we propose a new kernel-based multiple QR estimation technique, namely simultaneous non-crossing quantile regression (SNQR). We use kernel representations for QR functions and apply constraints on the kernel coefficients to avoid crossing. Both unregularised and regularised SNQR techniques are considered. Asymptotic properties such as asymptotic normality of linear SNQR and oracle properties of the sparse linear SNQR are developed. Our numerical results demonstrate the competitive performance of our SNQR over the original individual QR estimation. PMID:22190842
Lambert, Ronald J W; Mytilinaios, Ioannis; Maitland, Luke; Brown, Angus M
2012-08-01
This study describes a method to obtain parameter confidence intervals from the fitting of non-linear functions to experimental data, using the SOLVER and Analysis ToolPaK Add-In of the Microsoft Excel spreadsheet. Previously we have shown that Excel can fit complex multiple functions to biological data, obtaining values equivalent to those returned by more specialized statistical or mathematical software. However, a disadvantage of using the Excel method was the inability to return confidence intervals for the computed parameters or the correlations between them. Using a simple Monte-Carlo procedure within the Excel spreadsheet (without recourse to programming), SOLVER can provide parameter estimates (up to 200 at a time) for multiple 'virtual' data sets, from which the required confidence intervals and correlation coefficients can be obtained. The general utility of the method is exemplified by applying it to the analysis of the growth of Listeria monocytogenes, the growth inhibition of Pseudomonas aeruginosa by chlorhexidine and the further analysis of the electrophysiological data from the compound action potential of the rodent optic nerve. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Kim, Seong-Gil
2018-01-01
Background The purpose of this study was to investigate the effect of ankle ROM and lower-extremity muscle strength on static balance control ability in young adults. Material/Methods This study was conducted with 65 young adults, but 10 young adults dropped out during the measurement, so 55 young adults (male: 19, female: 36) completed the study. Postural sway (length and velocity) was measured with eyes open and closed, and ankle ROM (AROM and PROM of dorsiflexion and plantarflexion) and lower-extremity muscle strength (flexor and extensor of hip, knee, and ankle joint) were measured. Pearson correlation coefficient was used to examine the correlation between variables and static balance ability. Simple linear regression analysis and multiple linear regression analysis were used to examine the effect of variables on static balance ability. Results In correlation analysis, plantarflexion ROM (AROM and PROM) and lower-extremity muscle strength (except hip extensor) were significantly correlated with postural sway (p<0.05). In simple correlation analysis, all variables that passed the correlation analysis procedure had significant influence (p<0.05). In multiple linear regression analysis, plantar flexion PROM with eyes open significantly influenced sway length (B=0.681) and sway velocity (B=0.011). Conclusions Lower-extremity muscle strength and ankle plantarflexion ROM influenced static balance control ability, with ankle plantarflexion PROM showing the greatest influence. Therefore, both contractile structures and non-contractile structures should be of interest when considering static balance control ability improvement. PMID:29760375
Kim, Seong-Gil; Kim, Wan-Soo
2018-05-15
BACKGROUND The purpose of this study was to investigate the effect of ankle ROM and lower-extremity muscle strength on static balance control ability in young adults. MATERIAL AND METHODS This study was conducted with 65 young adults, but 10 young adults dropped out during the measurement, so 55 young adults (male: 19, female: 36) completed the study. Postural sway (length and velocity) was measured with eyes open and closed, and ankle ROM (AROM and PROM of dorsiflexion and plantarflexion) and lower-extremity muscle strength (flexor and extensor of hip, knee, and ankle joint) were measured. Pearson correlation coefficient was used to examine the correlation between variables and static balance ability. Simple linear regression analysis and multiple linear regression analysis were used to examine the effect of variables on static balance ability. RESULTS In correlation analysis, plantarflexion ROM (AROM and PROM) and lower-extremity muscle strength (except hip extensor) were significantly correlated with postural sway (p<0.05). In simple correlation analysis, all variables that passed the correlation analysis procedure had significant influence (p<0.05). In multiple linear regression analysis, plantar flexion PROM with eyes open significantly influenced sway length (B=0.681) and sway velocity (B=0.011). CONCLUSIONS Lower-extremity muscle strength and ankle plantarflexion ROM influenced static balance control ability, with ankle plantarflexion PROM showing the greatest influence. Therefore, both contractile structures and non-contractile structures should be of interest when considering static balance control ability improvement.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increase of the variance of these parameters. Hence, in case of multicollinearity presents, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are then performed. SIMPLS algorithm is the leading PLSR algorithm because of its speed, efficiency and results are easier to interpret. However, both of the CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) have been presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, firstly, a robust Principal Component Analysis (PCA) method for high-dimensional data on the independent variables is applied, then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of RPCR and RSIMPLS methods on an econometric data set, hence, making a comparison of two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and Robust Component Selection (RCS) statistic.
Plant selection for ethnobotanical uses on the Amalfi Coast (Southern Italy).
Savo, V; Joy, R; Caneva, G; McClatchey, W C
2015-07-15
Many ethnobotanical studies have investigated selection criteria for medicinal and non-medicinal plants. In this paper we test several statistical methods using different ethnobotanical datasets in order to 1) define to which extent the nature of the datasets can affect the interpretation of results; 2) determine if the selection for different plant uses is based on phylogeny, or other selection criteria. We considered three different ethnobotanical datasets: two datasets of medicinal plants and a dataset of non-medicinal plants (handicraft production, domestic and agro-pastoral practices) and two floras of the Amalfi Coast. We performed residual analysis from linear regression, the binomial test and the Bayesian approach for calculating under-used and over-used plant families within ethnobotanical datasets. Percentages of agreement were calculated to compare the results of the analyses. We also analyzed the relationship between plant selection and phylogeny, chorology, life form and habitat using the chi-square test. Pearson's residuals for each of the significant chi-square analyses were examined for investigating alternative hypotheses of plant selection criteria. The three statistical analysis methods differed within the same dataset, and between different datasets and floras, but with some similarities. In the two medicinal datasets, only Lamiaceae was identified in both floras as an over-used family by all three statistical methods. All statistical methods in one flora agreed that Malvaceae was over-used and Poaceae under-used, but this was not found to be consistent with results of the second flora in which one statistical result was non-significant. All other families had some discrepancy in significance across methods, or floras. Significant over- or under-use was observed in only a minority of cases. The chi-square analyses were significant for phylogeny, life form and habitat. Pearson's residuals indicated a non-random selection of woody species for non-medicinal uses and an under-use of plants of temperate forests for medicinal uses. Our study showed that selection criteria for plant uses (including medicinal) are not always based on phylogeny. The comparison of different statistical methods (regression, binomial and Bayesian) under different conditions led to the conclusion that the most conservative results are obtained using regression analysis.
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
Sayago, Ana; Asuero, Agustin G
2006-09-14
A bilogarithmic hyperbolic cosine method for the spectrophotometric evaluation of stability constants of 1:1 weak complexes from continuous variation data has been devised and applied to literature data. A weighting scheme, however, is necessary in order to take into account the transformation for linearization. The method may be considered a useful alternative to methods in which one variable is involved on both sides of the basic equation (i.e. Heller and Schwarzenbach, Likussar and Adsul and Ramanathan). Classical least squares lead in those instances to biased and approximate stability constants and limiting absorbance values. The advantages of the proposed method are: the method gives a clear indication of the existence of only one complex in solution, it is flexible enough to allow for weighting of measurements and the computation procedure yield the best value of logbeta11 and its limit of error. The agreement between the values obtained by applying the weighted hyperbolic cosine method and the non-linear regression (NLR) method is good, being in both cases the mean quadratic error at a minimum.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
NASA Astrophysics Data System (ADS)
Fukuda, Jun'ichi; Johnson, Kaj M.
2010-06-01
We present a unified theoretical framework and solution method for probabilistic, Bayesian inversions of crustal deformation data. The inversions involve multiple data sets with unknown relative weights, model parameters that are related linearly or non-linearly through theoretic models to observations, prior information on model parameters and regularization priors to stabilize underdetermined problems. To efficiently handle non-linear inversions in which some of the model parameters are linearly related to the observations, this method combines both analytical least-squares solutions and a Monte Carlo sampling technique. In this method, model parameters that are linearly and non-linearly related to observations, relative weights of multiple data sets and relative weights of prior information and regularization priors are determined in a unified Bayesian framework. In this paper, we define the mixed linear-non-linear inverse problem, outline the theoretical basis for the method, provide a step-by-step algorithm for the inversion, validate the inversion method using synthetic data and apply the method to two real data sets. We apply the method to inversions of multiple geodetic data sets with unknown relative data weights for interseismic fault slip and locking depth. We also apply the method to the problem of estimating the spatial distribution of coseismic slip on faults with unknown fault geometry, relative data weights and smoothing regularization weight.
Ludbrook, John
2010-07-01
1. There are two reasons for wanting to compare measurers or methods of measurement. One is to calibrate one method or measurer against another; the other is to detect bias. Fixed bias is present when one method gives higher (or lower) values across the whole range of measurement. Proportional bias is present when one method gives values that diverge progressively from those of the other. 2. Linear regression analysis is a popular method for comparing methods of measurement, but the familiar ordinary least squares (OLS) method is rarely acceptable. The OLS method requires that the x values are fixed by the design of the study, whereas it is usual that both y and x values are free to vary and are subject to error. In this case, special regression techniques must be used. 3. Clinical chemists favour techniques such as major axis regression ('Deming's method'), the Passing-Bablok method or the bivariate least median squares method. Other disciplines, such as allometry, astronomy, biology, econometrics, fisheries research, genetics, geology, physics and sports science, have their own preferences. 4. Many Monte Carlo simulations have been performed to try to decide which technique is best, but the results are almost uninterpretable. 5. I suggest that pharmacologists and physiologists should use ordinary least products regression analysis (geometric mean regression, reduced major axis regression): it is versatile, can be used for calibration or to detect bias and can be executed by hand-held calculator or by using the loss function in popular, general-purpose, statistical software.
NASA Astrophysics Data System (ADS)
Hosseini, K.; Ayati, Z.; Ansari, R.
2018-04-01
One specific class of non-linear evolution equations, known as the Tzitzéica-type equations, has received great attention from a group of researchers involved in non-linear science. In this article, new exact solutions of the Tzitzéica-type equations arising in non-linear optics, including the Tzitzéica, Dodd-Bullough-Mikhailov and Tzitzéica-Dodd-Bullough equations, are obtained using the expa function method. The integration technique actually suggests a useful and reliable method to extract new exact solutions of a wide range of non-linear evolution equations.
Ceçen, F; Yangin, C
2000-12-01
This study examined the determination of BOD in landfill leachates by dilution (D-method) and manometric methods (M-method). The differences in results were discussed based on statistical tests. The effects of sample dilution, seeding, chloride and total Kjeldahl nitrogen (TKN) level were examined. The M-method was found to be more sensitive to increases in chloride and TKN concentrations. However, in the M-method the positive interference of nitrogenous BOD (NBOD) to carbonaceous BOD (CBOD) was more successfully prevented. The BOD rate constant k and the ultimate BOD (BODu) were estimated by non-linear regression. With the M-method these parameters could be more reliably estimated than the D-method. Suggestions were made for BOD analyses in landfill leachates in future studies.
Ahmadi, Mehdi; Shahlaei, Mohsen
2015-01-01
P2X7 antagonist activity for a set of 49 molecules of the P2X7 receptor antagonists, derivatives of purine, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of combination of principal component analysis (PCA), as a well-known data reduction method, genetic algorithm (GA), as a variable selection technique, and artificial neural network (ANN), as a non-linear modeling method. First, a linear regression, combined with PCA, (principal component regression) was operated to model the structure-activity relationships, and afterwards a combination of PCA and ANN algorithm was employed to accurately predict the biological activity of the P2X7 antagonist. PCA preserves as much of the information as possible contained in the original data set. Seven most important PC's to the studied activity were selected as the inputs of ANN box by an efficient variable selection method, GA. The best computational neural network model was a fully-connected, feed-forward model with 7-7-1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation, and chemical applicability domain. All validations showed that the constructed quantitative structure-activity relationship model suggested is robust and satisfactory.
Ahmadi, Mehdi; Shahlaei, Mohsen
2015-01-01
P2X7 antagonist activity for a set of 49 molecules of the P2X7 receptor antagonists, derivatives of purine, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of combination of principal component analysis (PCA), as a well-known data reduction method, genetic algorithm (GA), as a variable selection technique, and artificial neural network (ANN), as a non-linear modeling method. First, a linear regression, combined with PCA, (principal component regression) was operated to model the structure–activity relationships, and afterwards a combination of PCA and ANN algorithm was employed to accurately predict the biological activity of the P2X7 antagonist. PCA preserves as much of the information as possible contained in the original data set. Seven most important PC's to the studied activity were selected as the inputs of ANN box by an efficient variable selection method, GA. The best computational neural network model was a fully-connected, feed-forward model with 7−7−1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation, and chemical applicability domain. All validations showed that the constructed quantitative structure–activity relationship model suggested is robust and satisfactory. PMID:26600858
USDA-ARS?s Scientific Manuscript database
Non-linear regression techniques are used widely to fit weed field emergence patterns to soil microclimatic indices using S-type functions. Artificial neural networks present interesting and alternative features for such modeling purposes. In this work, a univariate hydrothermal-time based Weibull m...
Predicting phenotypes of asthma and eczema with machine learning
2014-01-01
Background There is increasing recognition that asthma and eczema are heterogeneous diseases. We investigated the predictive ability of a spectrum of machine learning methods to disambiguate clinical sub-groups of asthma, wheeze and eczema, using a large heterogeneous set of attributes in an unselected population. The aim was to identify to what extent such heterogeneous information can be combined to reveal specific clinical manifestations. Methods The study population comprised a cross-sectional sample of adults, and included representatives of the general population enriched by subjects with asthma. Linear and non-linear machine learning methods, from logistic regression to random forests, were fit on a large attribute set including demographic, clinical and laboratory features, genetic profiles and environmental exposures. Outcome of interest were asthma, wheeze and eczema encoded by different operational definitions. Model validation was performed via bootstrapping. Results The study population included 554 adults, 42% male, 38% previous or current smokers. Proportion of asthma, wheeze, and eczema diagnoses was 16.7%, 12.3%, and 21.7%, respectively. Models were fit on 223 non-genetic variables plus 215 single nucleotide polymorphisms. In general, non-linear models achieved higher sensitivity and specificity than other methods, especially for asthma and wheeze, less for eczema, with areas under receiver operating characteristic curve of 84%, 76% and 64%, respectively. Our findings confirm that allergen sensitisation and lung function characterise asthma better in combination than separately. The predictive ability of genetic markers alone is limited. For eczema, new predictors such as bio-impedance were discovered. Conclusions More usefully-complex modelling is the key to a better understanding of disease mechanisms and personalised healthcare: further advances are likely with the incorporation of more factors/attributes and longitudinal measures. PMID:25077568
The Relationship between TOC and pH with Exchangeable Heavy Metal Levels in Lithuanian Podzols
NASA Astrophysics Data System (ADS)
Khaledian, Yones; Pereira, Paulo; Brevik, Eric C.; Pundyte, Neringa; Paliulis, Dainius
2017-04-01
Heavy metals can have a negative impact on public and environmental health. The objective of this study was to investigate the relationship between total organic carbon (TOC) and pH with exchangeable heavy metals (Pb, Cd, Cu and Zn) in order to predict exchangeable heavy metal content in soils sampled near Panevėžys and Kaunas, Lithuania. Principal component regression (PCR) and nonlinear regression methods were tested to find the statistical relationship between TOC and pH with heavy metals. The results of PCR [R2 = 0.68, RMSE = 0.07] and non-linear regression [R2 = 0.74, RMSE= 0.065] (pH with TOC and exchangeable parameters) were statistically significant. However, this was not observed in the relationships of pH and TOC separately with exchangeable heavy metals. The results indicated that pH had a higher correlation with exchangeable heavy metals (non-linear regression [R2 = 0.72, RMSE= 0.066]) than TOC with heavy metals [R2 = 0.30, RMSE= 0.004]. It can be concluded that even though there was a strong relationship between TOC and pH with exchangeable metals, the metal mobility (exchangeable metals) can be explained by pH better than TOC in this study. Finally, manipulating soil pH could likely be productive to assess and control heavy metals when financial and time limitations exist (Khaledian et al. 2016). Reference(s) Khaledian Y, Pereira P, Brevik E.C, Pundyte N, Paliulis D. 2016. The Influence of Organic Carbon and pH on Heavy Metals, Potassium, and Magnesium Levels in Lithuanian Podzols. Land Degradation and Development. DOI: 10.1002/ldr.2638
Impact of enzymatic and alkaline hydrolysis on CBD concentration in urine.
Bergamaschi, Mateus M; Barnes, Allan; Queiroz, Regina H C; Hurd, Yasmin L; Huestis, Marilyn A
2013-05-01
A sensitive and specific analytical method for cannabidiol (CBD) in urine was needed to define urinary CBD pharmacokinetics after controlled CBD administration, and to confirm compliance with CBD medications including Sativex-a cannabis plant extract containing 1:1 ∆(9)-tetrahydrocannabinol (THC) and CBD. Non-psychoactive CBD has a wide range of therapeutic applications and may also influence psychotropic smoked cannabis effects. Few methods exist for the quantification of CBD excretion in urine, and no data are available for phase II metabolism of CBD to CBD-glucuronide or CBD-sulfate. We optimized the hydrolysis of CBD-glucuronide and/or -sulfate, and developed and validated a GC-MS method for urinary CBD quantification. Solid-phase extraction isolated and concentrated analytes prior to GC-MS. Method validation included overnight hydrolysis (16 h) at 37 °C with 2,500 units β-glucuronidase from Red Abalone. Calibration curves were fit by linear least squares regression with 1/x (2) weighting with linear ranges (r(2) > 0.990) of 2.5-100 ng/mL for non-hydrolyzed CBD and 2.5-500 ng/mL for enzyme-hydrolyzed CBD. Bias was 88.7-105.3 %, imprecision 1.4-6.4 % CV and extraction efficiency 82.5-92.7 % (no hydrolysis) and 34.3-47.0 % (enzyme hydrolysis). Enzyme-hydrolyzed urine specimens exhibited more than a 250-fold CBD concentration increase compared to alkaline and non-hydrolyzed specimens. This method can be applied for urinary CBD quantification and further pharmacokinetics characterization following controlled CBD administration.
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
NASA Astrophysics Data System (ADS)
Pipkins, Daniel Scott
Two diverse topics of relevance in modern computational mechanics are treated. The first involves the modeling of linear and non-linear wave propagation in flexible, lattice structures. The technique used combines the Laplace Transform with the Finite Element Method (FEM). The procedure is to transform the governing differential equations and boundary conditions into the transform domain where the FEM formulation is carried out. For linear problems, the transformed differential equations can be solved exactly, hence the method is exact. As a result, each member of the lattice structure is modeled using only one element. In the non-linear problem, the method is no longer exact. The approximation introduced is a spatial discretization of the transformed non-linear terms. The non-linear terms are represented in the transform domain by making use of the complex convolution theorem. A weak formulation of the resulting transformed non-linear equations yields a set of element level matrix equations. The trial and test functions used in the weak formulation correspond to the exact solution of the linear part of the transformed governing differential equation. Numerical results are presented for both linear and non-linear systems. The linear systems modeled are longitudinal and torsional rods and Bernoulli-Euler and Timoshenko beams. For non-linear systems, a viscoelastic rod and Von Karman type beam are modeled. The second topic is the analysis of plates and shallow shells under-going finite deflections by the Field/Boundary Element Method. Numerical results are presented for two plate problems. The first is the bifurcation problem associated with a square plate having free boundaries which is loaded by four, self equilibrating corner forces. The results are compared to two existing numerical solutions of the problem which differ substantially.
NASA Technical Reports Server (NTRS)
Hall, A. Daniel (Inventor); Davies, Francis J. (Inventor)
2007-01-01
Method and system are disclosed for determining individual string resistance in a network of strings when the current through a parallel connected string is unknown and when the voltage across a series connected string is unknown. The method/system of the invention involves connecting one or more frequency-varying impedance components with known electrical characteristics to each string and applying a frequency-varying input signal to the network of strings. The frequency-varying impedance components may be one or more capacitors, inductors, or both, and are selected so that each string is uniquely identifiable in the output signal resulting from the frequency-varying input signal. Numerical methods, such as non-linear regression, may then be used to resolve the resistance associated with each string.
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
A modified homotopy perturbation method and the axial secular frequencies of a non-linear ion trap.
Doroudi, Alireza
2012-01-01
In this paper, a modified version of the homotopy perturbation method, which has been applied to non-linear oscillations by V. Marinca, is used for calculation of axial secular frequencies of a non-linear ion trap with hexapole and octopole superpositions. The axial equation of ion motion in a rapidly oscillating field of an ion trap can be transformed to a Duffing-like equation. With only octopole superposition the resulted non-linear equation is symmetric; however, in the presence of hexapole and octopole superpositions, it is asymmetric. This modified homotopy perturbation method is used for solving the resulting non-linear equations. As a result, the ion secular frequencies as a function of non-linear field parameters are obtained. The calculated secular frequencies are compared with the results of the homotopy perturbation method and the exact results. With only hexapole superposition, the results of this paper and the homotopy perturbation method are the same and with hexapole and octopole superpositions, the results of this paper are much more closer to the exact results compared with the results of the homotopy perturbation method.
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test.The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website.In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
Non-linear scaling of a musculoskeletal model of the lower limb using statistical shape models.
Nolte, Daniel; Tsang, Chui Kit; Zhang, Kai Yu; Ding, Ziyun; Kedgley, Angela E; Bull, Anthony M J
2016-10-03
Accurate muscle geometry for musculoskeletal models is important to enable accurate subject-specific simulations. Commonly, linear scaling is used to obtain individualised muscle geometry. More advanced methods include non-linear scaling using segmented bone surfaces and manual or semi-automatic digitisation of muscle paths from medical images. In this study, a new scaling method combining non-linear scaling with reconstructions of bone surfaces using statistical shape modelling is presented. Statistical Shape Models (SSMs) of femur and tibia/fibula were used to reconstruct bone surfaces of nine subjects. Reference models were created by morphing manually digitised muscle paths to mean shapes of the SSMs using non-linear transformations and inter-subject variability was calculated. Subject-specific models of muscle attachment and via points were created from three reference models. The accuracy was evaluated by calculating the differences between the scaled and manually digitised models. The points defining the muscle paths showed large inter-subject variability at the thigh and shank - up to 26mm; this was found to limit the accuracy of all studied scaling methods. Errors for the subject-specific muscle point reconstructions of the thigh could be decreased by 9% to 20% by using the non-linear scaling compared to a typical linear scaling method. We conclude that the proposed non-linear scaling method is more accurate than linear scaling methods. Thus, when combined with the ability to reconstruct bone surfaces from incomplete or scattered geometry data using statistical shape models our proposed method is an alternative to linear scaling methods. Copyright © 2016 The Author. Published by Elsevier Ltd.. All rights reserved.
Terza, Joseph V; Bradford, W David; Dismuke, Clara E
2008-01-01
Objective To investigate potential bias in the use of the conventional linear instrumental variables (IV) method for the estimation of causal effects in inherently nonlinear regression settings. Data Sources Smoking Supplement to the 1979 National Health Interview Survey, National Longitudinal Alcohol Epidemiologic Survey, and simulated data. Study Design Potential bias from the use of the linear IV method in nonlinear models is assessed via simulation studies and real world data analyses in two commonly encountered regression setting: (1) models with a nonnegative outcome (e.g., a count) and a continuous endogenous regressor; and (2) models with a binary outcome and a binary endogenous regressor. Principle Findings The simulation analyses show that substantial bias in the estimation of causal effects can result from applying the conventional IV method in inherently nonlinear regression settings. Moreover, the bias is not attenuated as the sample size increases. This point is further illustrated in the survey data analyses in which IV-based estimates of the relevant causal effects diverge substantially from those obtained with appropriate nonlinear estimation methods. Conclusions We offer this research as a cautionary note to those who would opt for the use of linear specifications in inherently nonlinear settings involving endogeneity. PMID:18546544
Wang, LiQiang; Li, CuiFeng
2014-10-01
A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained by a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. The method reached an overall accuracy of 95.4 % in a Jackknife test using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides. The accuracy as a function of protein size ranged between 85.8 and 96.9 %. The overall accuracies of three independent tests were 93, 93.4 and 91.8 %. The observed results of detecting thermophilic proteins suggest that the GA-MLR approach described herein should be a powerful method for selecting features that describe thermostabile machines and be an aid in the design of more stable proteins.
NASA Astrophysics Data System (ADS)
Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.
2017-01-01
One of the most noticeable features of sign-based statistical procedures is an opportunity to build an exact test for simple hypothesis testing of parameters in a regression model. In this article, we expanded a sing-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypothesis not only of regression parameters, but of noise parameters as well.
Nasari, Masoud M; Szyszkowicz, Mieczysław; Chen, Hong; Crouse, Daniel; Turner, Michelle C; Jerrett, Michael; Pope, C Arden; Hubbell, Bryan; Fann, Neal; Cohen, Aaron; Gapstur, Susan M; Diver, W Ryan; Stieb, David; Forouzanfar, Mohammad H; Kim, Sun-Young; Olives, Casey; Krewski, Daniel; Burnett, Richard T
2016-01-01
The effectiveness of regulatory actions designed to improve air quality is often assessed by predicting changes in public health resulting from their implementation. Risk of premature mortality from long-term exposure to ambient air pollution is the single most important contributor to such assessments and is estimated from observational studies generally assuming a log-linear, no-threshold association between ambient concentrations and death. There has been only limited assessment of this assumption in part because of a lack of methods to estimate the shape of the exposure-response function in very large study populations. In this paper, we propose a new class of variable coefficient risk functions capable of capturing a variety of potentially non-linear associations which are suitable for health impact assessment. We construct the class by defining transformations of concentration as the product of either a linear or log-linear function of concentration multiplied by a logistic weighting function. These risk functions can be estimated using hazard regression survival models with currently available computer software and can accommodate large population-based cohorts which are increasingly being used for this purpose. We illustrate our modeling approach with two large cohort studies of long-term concentrations of ambient air pollution and mortality: the American Cancer Society Cancer Prevention Study II (CPS II) cohort and the Canadian Census Health and Environment Cohort (CanCHEC). We then estimate the number of deaths attributable to changes in fine particulate matter concentrations over the 2000 to 2010 time period in both Canada and the USA using both linear and non-linear hazard function models.
Mulier, Jan P; De Boeck, Liesje; Meulders, Michel; Beliën, Jeroen; Colpaert, Jan; Sels, Annabel
2015-01-01
Rationale, aims and objectives What factors determine the use of an anaesthesia preparation room and shorten non-operative time? Methods A logistic regression is applied to 18 751 surgery records from AZ Sint-Jan Brugge AV, Belgium, where each operating room has its own anaesthesia preparation room. Surgeries, in which the patient's induction has already started when the preceding patient's surgery has ended, belong to a first group where the preparation room is used as an induction room. Surgeries not fulfilling this property belong to a second group. A logistic regression model tries to predict the probability that a surgery will be classified into a specific group. Non-operative time is calculated as the time between end of the previous surgery and incision of the next surgery. A log-linear regression of this non-operative time is performed. Results It was found that switches in surgeons, being a non-elective surgery as well as the previous surgery being non-elective, increase the probability of being classified into the second group. Only a few surgery types, anaesthesiologists and operating rooms can be found exclusively in one of the two groups. Analysis of variance demonstrates that the first group has significantly lower non-operative times. Switches in surgeons, anaesthesiologists and longer scheduled durations of the previous surgery increases the non-operative time. A switch in both surgeon and anaesthesiologist strengthens this negative effect. Only a few operating rooms and surgery types influence the non-operative time. Conclusion The use of the anaesthesia preparation room shortens the non-operative time and is determined by several human and structural factors. PMID:25496600
NASA Astrophysics Data System (ADS)
Jintao, Xue; Yufei, Liu; Liming, Ye; Chunyan, Li; Quanwei, Yang; Weiying, Wang; Yun, Jing; Minxiang, Zhang; Peng, Li
2018-01-01
Near-Infrared Spectroscopy (NIRS) was first used to develop a method for rapid and simultaneous determination of 5 active alkaloids (berberine, coptisine, palmatine, epiberberine and jatrorrhizine) in 4 parts (rhizome, fibrous root, stem and leaf) of Coptidis Rhizoma. A total of 100 samples from 4 main places of origin were collected and studied. With HPLC analysis values as calibration reference, the quantitative analysis of 5 marker components was performed by two different modeling methods, partial least-squares (PLS) regression as linear regression and artificial neural networks (ANN) as non-linear regression. The results indicated that the 2 types of models established were robust, accurate and repeatable for five active alkaloids, and the ANN models was more suitable for the determination of berberine, coptisine and palmatine while the PLS model was more suitable for the analysis of epiberberine and jatrorrhizine. The performance of the optimal models was achieved as follows: the correlation coefficient (R) for berberine, coptisine, palmatine, epiberberine and jatrorrhizine was 0.9958, 0.9956, 0.9959, 0.9963 and 0.9923, respectively; the root mean square error of validation (RMSEP) was 0.5093, 0.0578, 0.0443, 0.0563 and 0.0090, respectively. Furthermore, for the comprehensive exploitation and utilization of plant resource of Coptidis Rhizoma, the established NIR models were used to analysis the content of 5 active alkaloids in 4 parts of Coptidis Rhizoma and 4 main origin of places. This work demonstrated that NIRS may be a promising method as routine screening for off-line fast analysis or on-line quality assessment of traditional Chinese medicine (TCM).
Monopole and dipole estimation for multi-frequency sky maps by linear regression
NASA Astrophysics Data System (ADS)
Wehus, I. K.; Fuskeland, U.; Eriksen, H. K.; Banday, A. J.; Dickinson, C.; Ghosh, T.; Górski, K. M.; Lawrence, C. R.; Leahy, J. P.; Maino, D.; Reich, P.; Reich, W.
2017-01-01
We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called T-T plots. Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the nine-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are in good agreement with previously published results. Among the most notable results are a relative dipole between the WMAP and Planck experiments of 10-15μK (depending on frequency), an estimate of the 408 MHz map monopole of 8.9 ± 1.3 K, and a non-zero dipole in the 1420 MHz map of 0.15 ± 0.03 K pointing towards Galactic coordinates (l,b) = (308°,-36°) ± 14°. These values represent the sum of any instrumental and data processing offsets, as well as any Galactic or extra-Galactic component that is spectrally uniform over the full sky.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
NASA Astrophysics Data System (ADS)
Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.
2018-03-01
The three most common techniques to interpolate soil properties at a field scale—ordinary kriging (OK), regression kriging with multiple linear regression drift model (RK + MLR), and regression kriging with principal component regression drift model (RK + PCR)—were examined. The results of the performed study were compiled into an algorithm of choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced, and application of the PCR becomes irrational. In that case, the multiple linear regression should be used instead.
Johnston, K M; Gustafson, P; Levy, A R; Grootendorst, P
2008-04-30
A major, often unstated, concern of researchers carrying out epidemiological studies of medical therapy is the potential impact on validity if estimates of treatment are biased due to unmeasured confounders. One technique for obtaining consistent estimates of treatment effects in the presence of unmeasured confounders is instrumental variables analysis (IVA). This technique has been well developed in the econometrics literature and is being increasingly used in epidemiological studies. However, the approach to IVA that is most commonly used in such studies is based on linear models, while many epidemiological applications make use of non-linear models, specifically generalized linear models (GLMs) such as logistic or Poisson regression. Here we present a simple method for applying IVA within the class of GLMs using the generalized method of moments approach. We explore some of the theoretical properties of the method and illustrate its use within both a simulation example and an epidemiological study where unmeasured confounding is suspected to be present. We estimate the effects of beta-blocker therapy on one-year all-cause mortality after an incident hospitalization for heart failure, in the absence of data describing disease severity, which is believed to be a confounder. 2008 John Wiley & Sons, Ltd
Non-linear aeroelastic prediction for aircraft applications
NASA Astrophysics Data System (ADS)
de C. Henshaw, M. J.; Badcock, K. J.; Vio, G. A.; Allen, C. B.; Chamberlain, J.; Kaynes, I.; Dimitriadis, G.; Cooper, J. E.; Woodgate, M. A.; Rampurawala, A. M.; Jones, D.; Fenwick, C.; Gaitonde, A. L.; Taylor, N. V.; Amor, D. S.; Eccles, T. A.; Denley, C. J.
2007-05-01
Current industrial practice for the prediction and analysis of flutter relies heavily on linear methods and this has led to overly conservative design and envelope restrictions for aircraft. Although the methods have served the industry well, it is clear that for a number of reasons the inclusion of non-linearity in the mathematical and computational aeroelastic prediction tools is highly desirable. The increase in available and affordable computational resources, together with major advances in algorithms, mean that non-linear aeroelastic tools are now viable within the aircraft design and qualification environment. The Partnership for Unsteady Methods in Aerodynamics (PUMA) Defence and Aerospace Research Partnership (DARP) was sponsored in 2002 to conduct research into non-linear aeroelastic prediction methods and an academic, industry, and government consortium collaborated to address the following objectives: To develop useable methodologies to model and predict non-linear aeroelastic behaviour of complete aircraft. To evaluate the methodologies on real aircraft problems. To investigate the effect of non-linearities on aeroelastic behaviour and to determine which have the greatest effect on the flutter qualification process. These aims have been very effectively met during the course of the programme and the research outputs include: New methods available to industry for use in the flutter prediction process, together with the appropriate coaching of industry engineers. Interesting results in both linear and non-linear aeroelastics, with comprehensive comparison of methods and approaches for challenging problems. Additional embryonic techniques that, with further research, will further improve aeroelastics capability. This paper describes the methods that have been developed and how they are deployable within the industrial environment. We present a thorough review of the PUMA aeroelastics programme together with a comprehensive review of the relevant research in this domain. This is set within the context of a generic industrial process and the requirements of UK and US aeroelastic qualification. A range of test cases, from simple small DOF cases to full aircraft, have been used to evaluate and validate the non-linear methods developed and to make comparison with the linear methods in everyday use. These have focused mainly on aerodynamic non-linearity, although some results for structural non-linearity are also presented. The challenges associated with time domain (coupled computational fluid dynamics-computational structural model (CFD-CSM)) methods have been addressed through the development of grid movement, fluid-structure coupling, and control surface movement technologies. Conclusions regarding the accuracy and computational cost of these are presented. The computational cost of time-domain methods, despite substantial improvements in efficiency, remains high. However, significant advances have been made in reduced order methods, that allow non-linear behaviour to be modelled, but at a cost comparable with that of the regular linear methods. Of particular note is a method based on Hopf bifurcation that has reached an appropriate maturity for deployment on real aircraft configurations, though only limited results are presented herein. Results are also presented for dynamically linearised CFD approaches that hold out the possibility of non-linear results at a fraction of the cost of time coupled CFD-CSM methods. Local linearisation approaches (higher order harmonic balance and continuation method) are also presented; these have the advantage that no prior assumption of the nature of the aeroelastic instability is required, but currently these methods are limited to low DOF problems and it is thought that these will not reach a level of maturity appropriate to real aircraft problems for some years to come. Nevertheless, guidance on the most likely approaches has been derived and this forms the basis for ongoing research. It is important to recognise that the aeroelastic design and qualification requires a variety of methods applicable at different stages of the process. The methods reported herein are mapped to the process, so that their applicability and complementarity may be understood. Overall, the programme has provided a suite of methods that allow realistic consideration of non-linearity in the aeroelastic design and qualification of aircraft. Deployment of these methods is underway in the industrial environment, but full realisation of the benefit of these approaches will require appropriate engagement with the standards community so that safety standards may take proper account of the inclusion of non-linearity.
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
Douglas, R K; Nawar, S; Alamar, M C; Mouazen, A M; Coulon, F
2018-03-01
Visible and near infrared spectrometry (vis-NIRS) coupled with data mining techniques can offer fast and cost-effective quantitative measurement of total petroleum hydrocarbons (TPH) in contaminated soils. Literature showed however significant differences in the performance on the vis-NIRS between linear and non-linear calibration methods. This study compared the performance of linear partial least squares regression (PLSR) with a nonlinear random forest (RF) regression for the calibration of vis-NIRS when analysing TPH in soils. 88 soil samples (3 uncontaminated and 85 contaminated) collected from three sites located in the Niger Delta were scanned using an analytical spectral device (ASD) spectrophotometer (350-2500nm) in diffuse reflectance mode. Sequential ultrasonic solvent extraction-gas chromatography (SUSE-GC) was used as reference quantification method for TPH which equal to the sum of aliphatic and aromatic fractions ranging between C 10 and C 35 . Prior to model development, spectra were subjected to pre-processing including noise cut, maximum normalization, first derivative and smoothing. Then 65 samples were selected as calibration set and the remaining 20 samples as validation set. Both vis-NIR spectrometry and gas chromatography profiles of the 85 soil samples were subjected to RF and PLSR with leave-one-out cross-validation (LOOCV) for the calibration models. Results showed that RF calibration model with a coefficient of determination (R 2 ) of 0.85, a root means square error of prediction (RMSEP) 68.43mgkg -1 , and a residual prediction deviation (RPD) of 2.61 outperformed PLSR (R 2 =0.63, RMSEP=107.54mgkg -1 and RDP=2.55) in cross-validation. These results indicate that RF modelling approach is accounting for the nonlinearity of the soil spectral responses hence, providing significantly higher prediction accuracy compared to the linear PLSR. It is recommended to adopt the vis-NIRS coupled with RF modelling approach as a portable and cost effective method for the rapid quantification of TPH in soils. Copyright © 2017 Elsevier B.V. All rights reserved.
Agha, Salah R; Alnahhal, Mohammed J
2012-11-01
The current study investigates the possibility of obtaining the anthropometric dimensions, critical to school furniture design, without measuring all of them. The study first selects some anthropometric dimensions that are easy to measure. Two methods are then used to check if these easy-to-measure dimensions can predict the dimensions critical to the furniture design. These methods are multiple linear regression and neural networks. Each dimension that is deemed necessary to ergonomically design school furniture is expressed as a function of some other measured anthropometric dimensions. Results show that out of the five dimensions needed for chair design, four can be related to other dimensions that can be measured while children are standing. Therefore, the method suggested here would definitely save time and effort and avoid the difficulty of dealing with students while measuring these dimensions. In general, it was found that neural networks perform better than multiple linear regression in the current study. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Nested Conjugate Gradient Algorithm with Nested Preconditioning for Non-linear Image Restoration.
Skariah, Deepak G; Arigovindan, Muthuvel
2017-06-19
We develop a novel optimization algorithm, which we call Nested Non-Linear Conjugate Gradient algorithm (NNCG), for image restoration based on quadratic data fitting and smooth non-quadratic regularization. The algorithm is constructed as a nesting of two conjugate gradient (CG) iterations. The outer iteration is constructed as a preconditioned non-linear CG algorithm; the preconditioning is performed by the inner CG iteration that is linear. The inner CG iteration, which performs preconditioning for outer CG iteration, itself is accelerated by an another FFT based non-iterative preconditioner. We prove that the method converges to a stationary point for both convex and non-convex regularization functionals. We demonstrate experimentally that proposed method outperforms the well-known majorization-minimization method used for convex regularization, and a non-convex inertial-proximal method for non-convex regularization functional.
Ordinary chondrites - Multivariate statistical analysis of trace element contents
NASA Technical Reports Server (NTRS)
Lipschutz, Michael E.; Samuels, Stephen M.
1991-01-01
The contents of mobile trace elements (Co, Au, Sb, Ga, Se, Rb, Cs, Te, Bi, Ag, In, Tl, Zn, and Cd) in Antarctic and non-Antarctic populations of H4-6 and L4-6 chondrites, were compared using standard multivariate discriminant functions borrowed from linear discriminant analysis and logistic regression. A nonstandard randomization-simulation method was developed, making it possible to carry out probability assignments on a distribution-free basis. Compositional differences were found both between the Antarctic and non-Antarctic H4-6 chondrite populations and between two L4-6 chondrite populations. It is shown that, for various types of meteorites (in particular, for the H4-6 chondrites), the Antarctic/non-Antarctic compositional difference is due to preterrestrial differences in the genesis of their parent materials.
Macrocell path loss prediction using artificial intelligence techniques
NASA Astrophysics Data System (ADS)
Usman, Abraham U.; Okereke, Okpo U.; Omizegba, Elijah E.
2014-04-01
The prediction of propagation loss is a practical non-linear function approximation problem which linear regression or auto-regression models are limited in their ability to handle. However, some computational Intelligence techniques such as artificial neural networks (ANNs) and adaptive neuro-fuzzy inference systems (ANFISs) have been shown to have great ability to handle non-linear function approximation and prediction problems. In this study, the multiple layer perceptron neural network (MLP-NN), radial basis function neural network (RBF-NN) and an ANFIS network were trained using actual signal strength measurement taken at certain suburban areas of Bauchi metropolis, Nigeria. The trained networks were then used to predict propagation losses at the stated areas under differing conditions. The predictions were compared with the prediction accuracy of the popular Hata model. It was observed that ANFIS model gave a better fit in all cases having higher R2 values in each case and on average is more robust than MLP and RBF models as it generalises better to a different data.
Tan, Kok Chooi; Lim, Hwee San; Matjafri, Mohd Zubir; Abdullah, Khiruddin
2012-06-01
Atmospheric corrections for multi-temporal optical satellite images are necessary, especially in change detection analyses, such as normalized difference vegetation index (NDVI) rationing. Abrupt change detection analysis using remote-sensing techniques requires radiometric congruity and atmospheric correction to monitor terrestrial surfaces over time. Two atmospheric correction methods were used for this study: relative radiometric normalization and the simplified method for atmospheric correction (SMAC) in the solar spectrum. A multi-temporal data set consisting of two sets of Landsat images from the period between 1991 and 2002 of Penang Island, Malaysia, was used to compare NDVI maps, which were generated using the proposed atmospheric correction methods. Land surface temperature (LST) was retrieved using ATCOR3_T in PCI Geomatica 10.1 image processing software. Linear regression analysis was utilized to analyze the relationship between NDVI and LST. This study reveals that both of the proposed atmospheric correction methods yielded high accuracy through examination of the linear correlation coefficients. To check for the accuracy of the equation obtained through linear regression analysis for every single satellite image, 20 points were randomly chosen. The results showed that the SMAC method yielded a constant value (in terms of error) to predict the NDVI value from linear regression analysis-derived equation. The errors (average) from both proposed atmospheric correction methods were less than 10%.
Forecasting daily patient volumes in the emergency department.
Jones, Spencer S; Thomas, Alun; Evans, R Scott; Welch, Shari J; Haug, Peter J; Snow, Gregory L
2008-02-01
Shifts in the supply of and demand for emergency department (ED) resources make the efficient allocation of ED resources increasingly important. Forecasting is a vital activity that guides decision-making in many areas of economic, industrial, and scientific planning, but has gained little traction in the health care industry. There are few studies that explore the use of forecasting methods to predict patient volumes in the ED. The goals of this study are to explore and evaluate the use of several statistical forecasting methods to predict daily ED patient volumes at three diverse hospital EDs and to compare the accuracy of these methods to the accuracy of a previously proposed forecasting method. Daily patient arrivals at three hospital EDs were collected for the period January 1, 2005, through March 31, 2007. The authors evaluated the use of seasonal autoregressive integrated moving average, time series regression, exponential smoothing, and artificial neural network models to forecast daily patient volumes at each facility. Forecasts were made for horizons ranging from 1 to 30 days in advance. The forecast accuracy achieved by the various forecasting methods was compared to the forecast accuracy achieved when using a benchmark forecasting method already available in the emergency medicine literature. All time series methods considered in this analysis provided improved in-sample model goodness of fit. However, post-sample analysis revealed that time series regression models that augment linear regression models by accounting for serial autocorrelation offered only small improvements in terms of post-sample forecast accuracy, relative to multiple linear regression models, while seasonal autoregressive integrated moving average, exponential smoothing, and artificial neural network forecasting models did not provide consistently accurate forecasts of daily ED volumes. This study confirms the widely held belief that daily demand for ED services is characterized by seasonal and weekly patterns. The authors compared several time series forecasting methods to a benchmark multiple linear regression model. The results suggest that the existing methodology proposed in the literature, multiple linear regression based on calendar variables, is a reasonable approach to forecasting daily patient volumes in the ED. However, the authors conclude that regression-based models that incorporate calendar variables, account for site-specific special-day effects, and allow for residual autocorrelation provide a more appropriate, informative, and consistently accurate approach to forecasting daily ED patient volumes.
Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions.
Drouard, Vincent; Horaud, Radu; Deleforge, Antoine; Ba, Sileye; Evangelidis, Georgios
2017-03-01
Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging, because it must cope with changing illumination conditions, variabilities in face orientation and in appearance, partial occlusions of facial landmarks, as well as bounding-box-to-face alignment errors. We propose to use a mixture of linear regressions with partially-latent output. This regression method learns to map high-dimensional feature vectors (extracted from bounding boxes of faces) onto the joint space of head-pose angles and bounding-box shifts, such that they are robustly predicted in the presence of unobservable phenomena. We describe in detail the mapping method that combines the merits of unsupervised manifold learning techniques and of mixtures of regressions. We validate our method with three publicly available data sets and we thoroughly benchmark four variants of the proposed algorithm with several state-of-the-art head-pose estimation methods.
Modification of 2-D Time-Domain Shallow Water Wave Equation using Asymptotic Expansion Method
NASA Astrophysics Data System (ADS)
Khairuman, Teuku; Nasruddin, MN; Tulus; Ramli, Marwan
2018-01-01
Generally, research on the tsunami wave propagation model can be conducted by using a linear model of shallow water theory, where a non-linear side on high order is ignored. In line with research on the investigation of the tsunami waves, the Boussinesq equation model underwent a change aimed to obtain an improved quality of the dispersion relation and non-linearity by increasing the order to be higher. To solve non-linear sides at high order is used a asymptotic expansion method. This method can be used to solve non linear partial differential equations. In the present work, we found that this method needs much computational time and memory with the increase of the number of elements.
Detecting chaos in particle accelerators through the frequency map analysis method.
Papaphilippou, Yannis
2014-06-01
The motion of beams in particle accelerators is dominated by a plethora of non-linear effects, which can enhance chaotic motion and limit their performance. The application of advanced non-linear dynamics methods for detecting and correcting these effects and thereby increasing the region of beam stability plays an essential role during the accelerator design phase but also their operation. After describing the nature of non-linear effects and their impact on performance parameters of different particle accelerator categories, the theory of non-linear particle motion is outlined. The recent developments on the methods employed for the analysis of chaotic beam motion are detailed. In particular, the ability of the frequency map analysis method to detect chaotic motion and guide the correction of non-linear effects is demonstrated in particle tracking simulations but also experimental data.
Testing for nonlinearity in non-stationary physiological time series.
Guarín, Diego; Delgado, Edilson; Orozco, Álvaro
2011-01-01
Testing for nonlinearity is one of the most important preprocessing steps in nonlinear time series analysis. Typically, this is done by means of the linear surrogate data methods. But it is a known fact that the validity of the results heavily depends on the stationarity of the time series. Since most physiological signals are non-stationary, it is easy to falsely detect nonlinearity using the linear surrogate data methods. In this document, we propose a methodology to extend the procedure for generating constrained surrogate time series in order to assess nonlinearity in non-stationary data. The method is based on the band-phase-randomized surrogates, which consists (contrary to the linear surrogate data methods) in randomizing only a portion of the Fourier phases in the high frequency domain. Analysis of simulated time series showed that in comparison to the linear surrogate data method, our method is able to discriminate between linear stationarity, linear non-stationary and nonlinear time series. Applying our methodology to heart rate variability (HRV) records of five healthy patients, we encountered that nonlinear correlations are present in this non-stationary physiological signals.
Piovesan, Davide; Pierobon, Alberto; DiZio, Paul; Lackner, James R.
2012-01-01
This study presents and validates a Time-Frequency technique for measuring 2-dimensional multijoint arm stiffness throughout a single planar movement as well as during static posture. It is proposed as an alternative to current regressive methods which require numerous repetitions to obtain average stiffness on a small segment of the hand trajectory. The method is based on the analysis of the reassigned spectrogram of the arm's response to impulsive perturbations and can estimate arm stiffness on a trial-by-trial basis. Analytic and empirical methods are first derived and tested through modal analysis on synthetic data. The technique's accuracy and robustness are assessed by modeling the estimation of stiffness time profiles changing at different rates and affected by different noise levels. Our method obtains results comparable with two well-known regressive techniques. We also test how the technique can identify the viscoelastic component of non-linear and higher than second order systems with a non-parametrical approach. The technique proposed here is very impervious to noise and can be used easily for both postural and movement tasks. Estimations of stiffness profiles are possible with only one perturbation, making our method a useful tool for estimating limb stiffness during motor learning and adaptation tasks, and for understanding the modulation of stiffness in individuals with neurodegenerative diseases. PMID:22448233
Marrero-Ponce, Yovani; Martínez-Albelo, Eugenio R; Casañola-Martín, Gerardo M; Castillo-Garit, Juan A; Echevería-Díaz, Yunaimy; Zaldivar, Vicente Romero; Tygat, Jan; Borges, José E Rodriguez; García-Domenech, Ramón; Torrens, Francisco; Pérez-Giménez, Facundo
2010-11-01
Novel bond-level molecular descriptors are proposed, based on linear maps similar to the ones defined in algebra theory. The kth edge-adjacency matrix (E(k)) denotes the matrix of bond linear indices (non-stochastic) with regard to canonical basis set. The kth stochastic edge-adjacency matrix, ES(k), is here proposed as a new molecular representation easily calculated from E(k). Then, the kth stochastic bond linear indices are calculated using ES(k) as operators of linear transformations. In both cases, the bond-type formalism is developed. The kth non-stochastic and stochastic total linear indices are calculated by adding the kth non-stochastic and stochastic bond linear indices, respectively, of all bonds in molecule. First, the new bond-based molecular descriptors (MDs) are tested for suitability, for the QSPRs, by analyzing regressions of novel indices for selected physicochemical properties of octane isomers (first round). General performance of the new descriptors in this QSPR studies is evaluated with regard to the well-known sets of 2D/3D MDs. From the analysis, we can conclude that the non-stochastic and stochastic bond-based linear indices have an overall good modeling capability proving their usefulness in QSPR studies. Later, the novel bond-level MDs are also used for the description and prediction of the boiling point of 28 alkyl-alcohols (second round), and to the modeling of the specific rate constant (log k), partition coefficient (log P), as well as the antibacterial activity of 34 derivatives of 2-furylethylenes (third round). The comparison with other approaches (edge- and vertices-based connectivity indices, total and local spectral moments, and quantum chemical descriptors as well as E-state/biomolecular encounter parameters) exposes a good behavior of our method in this QSPR studies. Finally, the approach described in this study appears to be a very promising structural invariant, useful not only for QSPR studies but also for similarity/diversity analysis and drug discovery protocols.
Kim, Dae-Hee; Choi, Jae-Hun; Lim, Myung-Eun; Park, Soo-Jun
2008-01-01
This paper suggests the method of correcting distance between an ambient intelligence display and a user based on linear regression and smoothing method, by which distance information of a user who approaches to the display can he accurately output even in an unanticipated condition using a passive infrared VIR) sensor and an ultrasonic device. The developed system consists of an ambient intelligence display and an ultrasonic transmitter, and a sensor gateway. Each module communicates with each other through RF (Radio frequency) communication. The ambient intelligence display includes an ultrasonic receiver and a PIR sensor for motion detection. In particular, this system selects and processes algorithms such as smoothing or linear regression for current input data processing dynamically through judgment process that is determined using the previous reliable data stored in a queue. In addition, we implemented GUI software with JAVA for real time location tracking and an ambient intelligence display.
Ito, Hiroshi; Yokoi, Takashi; Ikoma, Yoko; Shidahara, Miho; Seki, Chie; Naganawa, Mika; Takahashi, Hidehiko; Takano, Harumasa; Kimura, Yuichi; Ichise, Masanori; Suhara, Tetsuya
2010-01-01
In positron emission tomography (PET) studies with radioligands for neuroreceptors, tracer kinetics have been described by the standard two-tissue compartment model that includes the compartments of nondisplaceable binding and specific binding to receptors. In the present study, we have developed a new graphic plot analysis to determine the total distribution volume (V(T)) and nondisplaceable distribution volume (V(ND)) independently, and therefore the binding potential (BP(ND)). In this plot, Y(t) is the ratio of brain tissue activity to time-integrated arterial input function, and X(t) is the ratio of time-integrated brain tissue activity to time-integrated arterial input function. The x-intercept of linear regression of the plots for early phase represents V(ND), and the x-intercept of linear regression of the plots for delayed phase after the equilibrium time represents V(T). BP(ND) can be calculated by BP(ND)=V(T)/V(ND)-1. Dynamic PET scanning with measurement of arterial input function was performed on six healthy men after intravenous rapid bolus injection of [(11)C]FLB457. The plot yielded a curve in regions with specific binding while it yielded a straight line through all plot data in regions with no specific binding. V(ND), V(T), and BP(ND) values calculated by the present method were in good agreement with those by conventional non-linear least-squares fitting procedure. This method can be used to distinguish graphically whether the radioligand binding includes specific binding or not.
Wang, T T; Jiang, L
2017-10-01
Objective: To investigate the prognostic value of highly sensitive cardiac Troponin T (hs-cTn T) for sepsis in critically ill patients. Methods: Patients estimated to stay in the ICU of Fuxing Hospital for more than 24h were enrolled at from March 2014 to December 2014. Serum hs-cTn T was tested within two hours. Univariate and multivariate linear regression analyses were used to determine the association of variables with the hs-cTn T. Multivariable logistic regression analysis was used to evaluate the risk factors of 28-day mortality. Results: A total of 125 patients were finally enrolled including 68 patients with sepsis and 57 without. The levels of hs-cTn T in sepsis and non-sepsis groups were significantly different[52.0(32.5, 87.5) ng/L vs 14.0(6.5, 29.0) ng/L respectively, P <0.001]. In sepsis group, hs-cTn T among common sepsis, severe sepsis and septic shock were similar. Hs-cTn T was significantly higher in non-survivors than survivors [27(13, 52)ng/L vs 44.5(28.8, 83.5)ng/L, P <0.001]. Age, sepsis, serum creatinine were independent risk factors affecting hs-cTn T by multivariate linear regression analyses. But hs-cTn T was not a risk factor for death. Conclusion: Patients with sepsis had higher serum hs-cTn T than those without sepsis. but it was not found to be associated with the severity of sepsis.
Kumar, K Vasanth; Sivanesan, S
2005-08-31
Comparison analysis of linear least square method and non-linear method for estimating the isotherm parameters was made using the experimental equilibrium data of safranin onto activated carbon at two different solution temperatures 305 and 313 K. Equilibrium data were fitted to Freundlich, Langmuir and Redlich-Peterson isotherm equations. All the three isotherm equations showed a better fit to the experimental equilibrium data. The results showed that non-linear method could be a better way to obtain the isotherm parameters. Redlich-Peterson isotherm is a special case of Langmuir isotherm when the Redlich-Peterson isotherm constant g was unity.
Limits of detection and decision. Part 3
NASA Astrophysics Data System (ADS)
Voigtman, E.
2008-02-01
It has been shown that the MARLAP (Multi-Agency Radiological Laboratory Analytical Protocols) for estimating the Currie detection limit, which is based on 'critical values of the non-centrality parameter of the non-central t distribution', is intrinsically biased, even if no calibration curve or regression is used. This completed the refutation of the method, begun in Part 2. With the field cleared of obstructions, the true theory underlying Currie's limits of decision, detection and quantification, as they apply in a simple linear chemical measurement system (CMS) having heteroscedastic, Gaussian measurement noise and using weighted least squares (WLS) processing, was then derived. Extensive Monte Carlo simulations were performed, on 900 million independent calibration curves, for linear, "hockey stick" and quadratic noise precision models (NPMs). With errorless NPM parameters, all the simulation results were found to be in excellent agreement with the derived theoretical expressions. Even with as much as 30% noise on all of the relevant NPM parameters, the worst absolute errors in rates of false positives and false negatives, was only 0.3%.
Development of statistical linear regression model for metals from transportation land uses.
Maniquiz, Marla C; Lee, Soyoung; Lee, Eunju; Kim, Lee-Hyung
2009-01-01
The transportation landuses possessing impervious surfaces such as highways, parking lots, roads, and bridges were recognized as the highly polluted non-point sources (NPSs) in the urban areas. Lots of pollutants from urban transportation are accumulating on the paved surfaces during dry periods and are washed-off during a storm. In Korea, the identification and monitoring of NPSs still represent a great challenge. Since 2004, the Ministry of Environment (MOE) has been engaged in several researches and monitoring to develop stormwater management policies and treatment systems for future implementation. The data over 131 storm events during May 2004 to September 2008 at eleven sites were analyzed to identify correlation relationships between particulates and metals, and to develop simple linear regression (SLR) model to estimate event mean concentration (EMC). Results indicate that there was no significant relationship between metals and TSS EMC. However, the SLR estimation models although not providing useful results are valuable indicators of high uncertainties that NPS pollution possess. Therefore, long term monitoring employing proper methods and precise statistical analysis of the data should be undertaken to eliminate these uncertainties.
NASA Astrophysics Data System (ADS)
Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari
2018-01-01
Prediction of suspended sediment discharge in a catchments area is very important because it can be used to evaluation the erosion hazard, management of its water resources, water quality, hydrology project management (dams, reservoirs, and irrigation) and to determine the extent of the damage that occurred in the catchments. Multiple Linear Regression analysis and artificial neural network can be used to predict the amount of daily suspended sediment discharge. Regression analysis using the least square method, whereas artificial neural networks using Radial Basis Function (RBF) and feedforward multilayer perceptron with three learning algorithms namely Levenberg-Marquardt (LM), Scaled Conjugate Descent (SCD) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number neuron of hidden layer is three to sixteen, while in output layer only one neuron because only one output target. The mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2 ) and coefficient of efficiency (CE) of the multiple linear regression (MLRg) value Model 2 (6 input variable independent) has the lowest the value of MAE and RMSE (0.0000002 and 13.6039) and highest R2 and CE (0.9971 and 0.9971). When compared between LM, SCG and RBF, the BFGS model structure 3-7-1 is the better and more accurate to prediction suspended sediment discharge in Jenderam catchment. The performance value in testing process, MAE and RMSE (13.5769 and 17.9011) is smallest, meanwhile R2 and CE (0.9999 and 0.9998) is the highest if it compared with the another BFGS Quasi-Newton model (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics value, MLRg, LM, SCG, BFGS and RBF suitable and accurately for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. The comparison between artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately for to prediction suspended sediment discharge (kg/day) in Jenderan catchment area.
Fialkowski, Marie K; Ettienne, Reynolette; Shvetsov, Yurii B; Rivera, Rebecca L; Van Loan, Marta D; Savaiano, Dennis A; Boushey, Carol J
2015-01-01
Background The prevalence of overweight and obesity among adolescents has increased over the past decade. Prevalence rates are disparate among certain racial and ethnic groups. This study sought to longitudinally examine the relationship between overweight status (≥85th percentile according to the Centers for Disease Control and Prevention growth charts) and ethnic group, as well as acculturation (generation and language spoken in the home) in a sample of adolescent females. Methods Asian (n=160), Hispanic (n=217), and non-Hispanic White (n=304) early adolescent girls participating in the multistate calcium intervention study with complete information on weight, ethnicity, and acculturation were included. Multiple methods of assessing longitudinal relationships (binary logistic regression model, linear regression model, Cox proportional-hazards regression analysis, and Kaplan–Meier survival analysis) were used to examine the relationship. Results The total proportion of girls overweight at baseline was 36%. When examining by ethnic group, the proportion varied with Hispanic girls having the highest percentage (46%) in comparison to their Asian (23%) and Non-Hispanic White (35%) counterparts. Although the total proportion of overweight was 36% at 18 months, the variation across the ethnic groups remained with the proportion of Hispanic girls becoming overweight (55%) being greater than their Asian (18%) and non-Hispanic White (34%) counterparts. However, regardless of the statistical approach used, there were no significant associations between overweight status and acculturation over time. Conclusion These unexpected results warrant further exploration into factors associated with overweight, especially among Hispanic girls, and further investigation of acculturation’s role is warranted. Identifying these risk factors will be important for developing targeted obesity prevention initiatives. PMID:25624775
Huang, Li-Shan; Myers, Gary J.; Davidson, Philip W.; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W.; Cernichiari, Elsa; Shamlaye, Conrad F.; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W.
2007-01-01
Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age nine years. The analyses for the most recent nine-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated nonlinearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of twenty-one endpoints available at age nine years, we chose the Weschler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition we reanalyzed five other nine-year endpoints that in the linear analysis has a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels. PMID:17942158
Mental chronometry with simple linear regression.
Chen, J Y
1997-10-01
Typically, mental chronometry is performed by means of introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that involved a selective effect and with a task (word superiority effect) that involved a global effect, by the dominant theories. The results indicate (1) the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities and (2) the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These ratify the regression approach as a useful method for doing mental chronometry.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
1987-09-01
Edition,. Fail 1986. 33. Neter, John et al. Applied Linear Regression MoceL. Homewood IL: Richard D. Irwin, Incorporated, iJ83. 34. NovicK, David... Linear Regression Models (33) then, for each sample observation (X fh, the method of least squares considers the deviation of Yubms from its expected value...for finding good estimators of b - b5 * In -2raer to explain the procedure, the model Yubms = b0 + b!xfh will be discussed. According to Applied
A comparison of several methods of solving nonlinear regression groundwater flow problems
Cooley, Richard L.
1985-01-01
Computational efficiency and computer memory requirements for four methods of minimizing functions were compared for four test nonlinear-regression steady state groundwater flow problems. The fastest methods were the Marquardt and quasi-linearization methods, which required almost identical computer times and numbers of iterations; the next fastest was the quasi-Newton method, and last was the Fletcher-Reeves method, which did not converge in 100 iterations for two of the problems. The fastest method per iteration was the Fletcher-Reeves method, and this was followed closely by the quasi-Newton method. The Marquardt and quasi-linearization methods were slower. For all four methods the speed per iteration was directly related to the number of parameters in the model. However, this effect was much more pronounced for the Marquardt and quasi-linearization methods than for the other two. Hence the quasi-Newton (and perhaps Fletcher-Reeves) method might be more efficient than either the Marquardt or quasi-linearization methods if the number of parameters in a particular model were large, although this remains to be proven. The Marquardt method required somewhat less central memory than the quasi-linearization metilod for three of the four problems. For all four problems the quasi-Newton method required roughly two thirds to three quarters of the memory required by the Marquardt method, and the Fletcher-Reeves method required slightly less memory than the quasi-Newton method. Memory requirements were not excessive for any of the four methods.
Plant leaf chlorophyll content retrieval based on a field imaging spectroscopy system.
Liu, Bo; Yue, Yue-Min; Li, Ru; Shen, Wen-Jing; Wang, Ke-Lin
2014-10-23
A field imaging spectrometer system (FISS; 380-870 nm and 344 bands) was designed for agriculture applications. In this study, FISS was used to gather spectral information from soybean leaves. The chlorophyll content was retrieved using a multiple linear regression (MLR), partial least squares (PLS) regression and support vector machine (SVM) regression. Our objective was to verify the performance of FISS in a quantitative spectral analysis through the estimation of chlorophyll content and to determine a proper quantitative spectral analysis method for processing FISS data. The results revealed that the derivative reflectance was a more sensitive indicator of chlorophyll content and could extract content information more efficiently than the spectral reflectance, which is more significant for FISS data compared to ASD (analytical spectral devices) data, reducing the corresponding RMSE (root mean squared error) by 3.3%-35.6%. Compared with the spectral features, the regression methods had smaller effects on the retrieval accuracy. A multivariate linear model could be the ideal model to retrieve chlorophyll information with a small number of significant wavelengths used. The smallest RMSE of the chlorophyll content retrieved using FISS data was 0.201 mg/g, a relative reduction of more than 30% compared with the RMSE based on a non-imaging ASD spectrometer, which represents a high estimation accuracy compared with the mean chlorophyll content of the sampled leaves (4.05 mg/g). Our study indicates that FISS could obtain both spectral and spatial detailed information of high quality. Its image-spectrum-in-one merit promotes the good performance of FISS in quantitative spectral analyses, and it can potentially be widely used in the agricultural sector.
Plant Leaf Chlorophyll Content Retrieval Based on a Field Imaging Spectroscopy System
Liu, Bo; Yue, Yue-Min; Li, Ru; Shen, Wen-Jing; Wang, Ke-Lin
2014-01-01
A field imaging spectrometer system (FISS; 380–870 nm and 344 bands) was designed for agriculture applications. In this study, FISS was used to gather spectral information from soybean leaves. The chlorophyll content was retrieved using a multiple linear regression (MLR), partial least squares (PLS) regression and support vector machine (SVM) regression. Our objective was to verify the performance of FISS in a quantitative spectral analysis through the estimation of chlorophyll content and to determine a proper quantitative spectral analysis method for processing FISS data. The results revealed that the derivative reflectance was a more sensitive indicator of chlorophyll content and could extract content information more efficiently than the spectral reflectance, which is more significant for FISS data compared to ASD (analytical spectral devices) data, reducing the corresponding RMSE (root mean squared error) by 3.3%–35.6%. Compared with the spectral features, the regression methods had smaller effects on the retrieval accuracy. A multivariate linear model could be the ideal model to retrieve chlorophyll information with a small number of significant wavelengths used. The smallest RMSE of the chlorophyll content retrieved using FISS data was 0.201 mg/g, a relative reduction of more than 30% compared with the RMSE based on a non-imaging ASD spectrometer, which represents a high estimation accuracy compared with the mean chlorophyll content of the sampled leaves (4.05 mg/g). Our study indicates that FISS could obtain both spectral and spatial detailed information of high quality. Its image-spectrum-in-one merit promotes the good performance of FISS in quantitative spectral analyses, and it can potentially be widely used in the agricultural sector. PMID:25341439
Turkdogan-Aydinol, F Ilter; Yetilmezsoy, Kaan
2010-10-15
A MIMO (multiple inputs and multiple outputs) fuzzy-logic-based model was developed to predict biogas and methane production rates in a pilot-scale 90-L mesophilic up-flow anaerobic sludge blanket (UASB) reactor treating molasses wastewater. Five input variables such as volumetric organic loading rate (OLR), volumetric total chemical oxygen demand (TCOD) removal rate (R(V)), influent alkalinity, influent pH and effluent pH were fuzzified by the use of an artificial intelligence-based approach. Trapezoidal membership functions with eight levels were conducted for the fuzzy subsets, and a Mamdani-type fuzzy inference system was used to implement a total of 134 rules in the IF-THEN format. The product (prod) and the centre of gravity (COG, centroid) methods were employed as the inference operator and defuzzification methods, respectively. Fuzzy-logic predicted results were compared with the outputs of two exponential non-linear regression models derived in this study. The UASB reactor showed a remarkable performance on the treatment of molasses wastewater, with an average TCOD removal efficiency of 93 (+/-3)% and an average volumetric TCOD removal rate of 6.87 (+/-3.93) kg TCOD(removed)/m(3)-day, respectively. Findings of this study clearly indicated that, compared to non-linear regression models, the proposed MIMO fuzzy-logic-based model produced smaller deviations and exhibited a superior predictive performance on forecasting of both biogas and methane production rates with satisfactory determination coefficients over 0.98. 2010 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Shan, X.; Zhang, K.; Zhuang, Y.; Fu, R.; Hong, Y.
2017-12-01
Seasonal prediction of rainfall during the dry-to-wet transition season in austral spring (September-November) over southern Amazonia is central for improving planting crops and fire mitigation in that region. Previous studies have identified the key large-scale atmospheric dynamic and thermodynamics pre-conditions during the dry season (June-August) that influence the rainfall anomalies during the dry to wet transition season over Southern Amazonia. Based on these key pre-conditions during dry season, we have evaluated several statistical models and developed a Neural Network based statistical prediction system to predict rainfall during the dry to wet transition for Southern Amazonia (5-15°S, 50-70°W). Multivariate Empirical Orthogonal Function (EOF) Analysis is applied to the following four fields during JJA from the ECMWF Reanalysis (ERA-Interim) spanning from year 1979 to 2015: geopotential height at 200 hPa, surface relative humidity, convective inhibition energy (CIN) index and convective available potential energy (CAPE), to filter out noise and highlight the most coherent spatial and temporal variations. The first 10 EOF modes are retained for inputs to the statistical models, accounting for at least 70% of the total variance in the predictor fields. We have tested several linear and non-linear statistical methods. While the regularized Ridge Regression and Lasso Regression can generally capture the spatial pattern and magnitude of rainfall anomalies, we found that that Neural Network performs best with an accuracy greater than 80%, as expected from the non-linear dependence of the rainfall on the large-scale atmospheric thermodynamic conditions and circulation. Further tests of various prediction skill metrics and hindcasts also suggest this Neural Network prediction approach can significantly improve seasonal prediction skill than the dynamic predictions and regression based statistical predictions. Thus, this statistical prediction system could have shown potential to improve real-time seasonal rainfall predictions in the future.
A New Sample Size Formula for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.; Barcikowski, Robert S.
The focus of this research was to determine the efficacy of a new method of selecting sample sizes for multiple linear regression. A Monte Carlo simulation was used to study both empirical predictive power rates and empirical statistical power rates of the new method and seven other methods: those of C. N. Park and A. L. Dudycha (1974); J. Cohen…
An evaluation of methods for estimating decadal stream loads
NASA Astrophysics Data System (ADS)
Lee, Casey J.; Hirsch, Robert M.; Schwarz, Gregory E.; Holtschlag, David J.; Preston, Stephen D.; Crawford, Charles G.; Vecchia, Aldo V.
2016-11-01
Effective management of water resources requires accurate information on the mass, or load of water-quality constituents transported from upstream watersheds to downstream receiving waters. Despite this need, no single method has been shown to consistently provide accurate load estimates among different water-quality constituents, sampling sites, and sampling regimes. We evaluate the accuracy of several load estimation methods across a broad range of sampling and environmental conditions. This analysis uses random sub-samples drawn from temporally-dense data sets of total nitrogen, total phosphorus, nitrate, and suspended-sediment concentration, and includes measurements of specific conductance which was used as a surrogate for dissolved solids concentration. Methods considered include linear interpolation and ratio estimators, regression-based methods historically employed by the U.S. Geological Survey, and newer flexible techniques including Weighted Regressions on Time, Season, and Discharge (WRTDS) and a generalized non-linear additive model. No single method is identified to have the greatest accuracy across all constituents, sites, and sampling scenarios. Most methods provide accurate estimates of specific conductance (used as a surrogate for total dissolved solids or specific major ions) and total nitrogen - lower accuracy is observed for the estimation of nitrate, total phosphorus and suspended sediment loads. Methods that allow for flexibility in the relation between concentration and flow conditions, specifically Beale's ratio estimator and WRTDS, exhibit greater estimation accuracy and lower bias. Evaluation of methods across simulated sampling scenarios indicate that (1) high-flow sampling is necessary to produce accurate load estimates, (2) extrapolation of sample data through time or across more extreme flow conditions reduces load estimate accuracy, and (3) WRTDS and methods that use a Kalman filter or smoothing to correct for departures between individual modeled and observed values benefit most from more frequent water-quality sampling.
An evaluation of methods for estimating decadal stream loads
Lee, Casey; Hirsch, Robert M.; Schwarz, Gregory E.; Holtschlag, David J.; Preston, Stephen D.; Crawford, Charles G.; Vecchia, Aldo V.
2016-01-01
Effective management of water resources requires accurate information on the mass, or load of water-quality constituents transported from upstream watersheds to downstream receiving waters. Despite this need, no single method has been shown to consistently provide accurate load estimates among different water-quality constituents, sampling sites, and sampling regimes. We evaluate the accuracy of several load estimation methods across a broad range of sampling and environmental conditions. This analysis uses random sub-samples drawn from temporally-dense data sets of total nitrogen, total phosphorus, nitrate, and suspended-sediment concentration, and includes measurements of specific conductance which was used as a surrogate for dissolved solids concentration. Methods considered include linear interpolation and ratio estimators, regression-based methods historically employed by the U.S. Geological Survey, and newer flexible techniques including Weighted Regressions on Time, Season, and Discharge (WRTDS) and a generalized non-linear additive model. No single method is identified to have the greatest accuracy across all constituents, sites, and sampling scenarios. Most methods provide accurate estimates of specific conductance (used as a surrogate for total dissolved solids or specific major ions) and total nitrogen – lower accuracy is observed for the estimation of nitrate, total phosphorus and suspended sediment loads. Methods that allow for flexibility in the relation between concentration and flow conditions, specifically Beale’s ratio estimator and WRTDS, exhibit greater estimation accuracy and lower bias. Evaluation of methods across simulated sampling scenarios indicate that (1) high-flow sampling is necessary to produce accurate load estimates, (2) extrapolation of sample data through time or across more extreme flow conditions reduces load estimate accuracy, and (3) WRTDS and methods that use a Kalman filter or smoothing to correct for departures between individual modeled and observed values benefit most from more frequent water-quality sampling.
The extinction law from photometric data: linear regression methods
NASA Astrophysics Data System (ADS)
Ascenso, J.; Lombardi, M.; Lada, C. J.; Alves, J.
2012-04-01
Context. The properties of dust grains, in particular their size distribution, are expected to differ from the interstellar medium to the high-density regions within molecular clouds. Since the extinction at near-infrared wavelengths is caused by dust, the extinction law in cores should depart from that found in low-density environments if the dust grains have different properties. Aims: We explore methods to measure the near-infrared extinction law produced by dense material in molecular cloud cores from photometric data. Methods: Using controlled sets of synthetic and semi-synthetic data, we test several methods for linear regression applied to the specific problem of deriving the extinction law from photometric data. We cover the parameter space appropriate to this type of observations. Results: We find that many of the common linear-regression methods produce biased results when applied to the extinction law from photometric colors. We propose and validate a new method, LinES, as the most reliable for this effect. We explore the use of this method to detect whether or not the extinction law of a given reddened population has a break at some value of extinction. Based on observations collected at the European Organisation for Astronomical Research in the Southern Hemisphere, Chile (ESO programmes 069.C-0426 and 074.C-0728).
Impact of enzymatic and alkaline hydrolysis on CBD concentration in urine
Bergamaschi, Mateus M.; Barnes, Allan; Queiroz, Regina H. C.; Hurd, Yasmin L.
2013-01-01
A sensitive and specific analytical method for cannabidiol (CBD) in urine was needed to define urinary CBD pharmacokinetics after controlled CBD administration, and to confirm compliance with CBD medications including Sativex—a cannabis plant extract containing 1:1 Δ9-tetrahydrocannabinol (THC) and CBD. Non-psychoactive CBD has a wide range of therapeutic applications and may also influence psychotropic smoked cannabis effects. Few methods exist for the quantification of CBD excretion in urine, and no data are available for phase II metabolism of CBD to CBD-glucuronide or CBD-sulfate. We optimized the hydrolysis of CBD-glucuronide and/or -sulfate, and developed and validated a GC-MS method for urinary CBD quantification. Solid-phase extraction isolated and concentrated analytes prior to GC-MS. Method validation included overnight hydrolysis (16 h) at 37 °C with 2,500 units β-glucuronidase from Red Abalone. Calibration curves were fit by linear least squares regression with 1/x2 weighting with linear ranges (r2>0.990) of 2.5–100 ng/mL for non-hydrolyzed CBD and 2.5–500 ng/mL for enzyme-hydrolyzed CBD. Bias was 88.7–105.3 %, imprecision 1.4–6.4 % CV and extraction efficiency 82.5–92.7 % (no hydrolysis) and 34.3–47.0 % (enzyme hydrolysis). Enzyme-hydrolyzed urine specimens exhibited more than a 250-fold CBD concentration increase compared to alkaline and non-hydrolyzed specimens. This method can be applied for urinary CBD quantification and further pharmacokinetics characterization following controlled CBD administration. PMID:23494274
Logistic regression for circular data
NASA Astrophysics Data System (ADS)
Al-Daffaie, Kadhem; Khan, Shahjahan
2017-05-01
This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and Linear Pseudo model for nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedatistic regression model. Lemcoff [3] discussed a procedure to get linear pseudo model for nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et.al used empirical process techniques to study the asymptotic of the LSE (Least-squares estimation) for the fitting of nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2016-01-01
Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. Results: From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889
Abusam, A; Keesman, K J; van Straten, G; Spanjers, H; Meinema, K
2001-01-01
When applied to large simulation models, the process of parameter estimation is also called calibration. Calibration of complex non-linear systems, such as activated sludge plants, is often not an easy task. On the one hand, manual calibration of such complex systems is usually time-consuming, and its results are often not reproducible. On the other hand, conventional automatic calibration methods are not always straightforward and often hampered by local minima problems. In this paper a new straightforward and automatic procedure, which is based on the response surface method (RSM) for selecting the best identifiable parameters, is proposed. In RSM, the process response (output) is related to the levels of the input variables in terms of a first- or second-order regression model. Usually, RSM is used to relate measured process output quantities to process conditions. However, in this paper RSM is used for selecting the dominant parameters, by evaluating parameters sensitivity in a predefined region. Good results obtained in calibration of ASM No. 1 for N-removal in a full-scale oxidation ditch proved that the proposed procedure is successful and reliable.
NASA Astrophysics Data System (ADS)
Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang
2017-01-01
Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for the tasks, in which the performance is heavily dependent on the feature selection techniques, like the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while affects the quality of life dramatically. In this paper, we use the data acquired from multi-modal neuroimaging data to diagnose PD by investigating the brain regions, known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions, specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods.
Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang
2017-01-01
Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for the tasks, in which the performance is heavily dependent on the feature selection techniques, like the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while affects the quality of life dramatically. In this paper, we use the data acquired from multi-modal neuroimaging data to diagnose PD by investigating the brain regions, known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions, specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods. PMID:28120883
Elkhoudary, Mahmoud M; Naguib, Ibrahim A; Abdel Salam, Randa A; Hadad, Ghada M
2017-05-01
Four accurate, sensitive and reliable stability indicating chemometric methods were developed for the quantitative determination of Agomelatine (AGM) whether in pure form or in pharmaceutical formulations. Two supervised learning machines' methods; linear artificial neural networks (PC-linANN) preceded by principle component analysis and linear support vector regression (linSVR), were compared with two principle component based methods; principle component regression (PCR) as well as partial least squares (PLS) for the spectrofluorimetric determination of AGM and its degradants. The results showed the benefits behind using linear learning machines' methods and the inherent merits of their algorithms in handling overlapped noisy spectral data especially during the challenging determination of AGM alkaline and acidic degradants (DG1 and DG2). Relative mean squared error of prediction (RMSEP) for the proposed models in the determination of AGM were 1.68, 1.72, 0.68 and 0.22 for PCR, PLS, SVR and PC-linANN; respectively. The results showed the superiority of supervised learning machines' methods over principle component based methods. Besides, the results suggested that linANN is the method of choice for determination of components in low amounts with similar overlapped spectra and narrow linearity range. Comparison between the proposed chemometric models and a reported HPLC method revealed the comparable performance and quantification power of the proposed models.
Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.
Choi, Jae-Seok; Kim, Munchurl
2017-03-01
Super-resolution (SR) has become more vital, because of its capability to generate high-quality ultra-high definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to be implemented for up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between Peak-Signal-to-Noise Ratio (PSNR) performances and computational complexity. However, since SI only utilizes simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: Previously, linear-mapping-based conventional SR methods, including SI only used one simple yet coarse linear mapping to each patch to reconstruct its HR version. On the contrary, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experiment results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower computational complexity when compared with a super-resolution method based on convolutional neural nets (SRCNN15). Compared with the previous SI method that is limited with a scale factor of 2, GLM-SI shows superior performance with average 0.79 dB higher in PSNR, and can be used for scale factors of 3 or higher.
Prediction of human core body temperature using non-invasive measurement methods.
Niedermann, Reto; Wyss, Eva; Annaheim, Simon; Psikuta, Agnes; Davey, Sarah; Rossi, René Michel
2014-01-01
The measurement of core body temperature is an efficient method for monitoring heat stress amongst workers in hot conditions. However, invasive measurement of core body temperature (e.g. rectal, intestinal, oesophageal temperature) is impractical for such applications. Therefore, the aim of this study was to define relevant non-invasive measures to predict core body temperature under various conditions. We conducted two human subject studies with different experimental protocols, different environmental temperatures (10 °C, 30 °C) and different subjects. In both studies the same non-invasive measurement methods (skin temperature, skin heat flux, heart rate) were applied. A principle component analysis was conducted to extract independent factors, which were then used in a linear regression model. We identified six parameters (three skin temperatures, two skin heat fluxes and heart rate), which were included for the calculation of two factors. The predictive value of these factors for core body temperature was evaluated by a multiple regression analysis. The calculated root mean square deviation (rmsd) was in the range from 0.28 °C to 0.34 °C for all environmental conditions. These errors are similar to previous models using non-invasive measures to predict core body temperature. The results from this study illustrate that multiple physiological parameters (e.g. skin temperature and skin heat fluxes) are needed to predict core body temperature. In addition, the physiological measurements chosen in this study and the algorithm defined in this work are potentially applicable as real-time core body temperature monitoring to assess health risk in broad range of working conditions.
González, Juan R; Carrasco, Josep L; Armengol, Lluís; Villatoro, Sergi; Jover, Lluís; Yasui, Yutaka; Estivill, Xavier
2008-01-01
Background MLPA method is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a method for the normalization procedure based on a non-linear mixed-model, as well as a new approach for determining the statistical significance of altered probes based on linear mixed-model. This method establishes a threshold by using different tolerance intervals that accommodates the specific random error variability observed in each test sample. Results Through simulation studies we have shown that our proposed method outperforms two existing methods that are based on simple threshold rules or iterative regression. We have illustrated the method using a controlled MLPA assay in which targeted regions are variable in copy number in individuals suffering from different disorders such as Prader-Willi, DiGeorge or Autism showing the best performace. Conclusion Using the proposed mixed-model, we are able to determine thresholds to decide whether a region is altered. These threholds are specific for each individual, incorporating experimental variability, resulting in improved sensitivity and specificity as the examples with real data have revealed. PMID:18522760
Huang, Jian; Zhang, Cun-Hui
2013-01-01
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
Estimating integrated variance in the presence of microstructure noise using linear regression
NASA Astrophysics Data System (ADS)
Holý, Vladimír
2017-07-01
Using financial high-frequency data for estimation of integrated variance of asset prices is beneficial but with increasing number of observations so-called microstructure noise occurs. This noise can significantly bias the realized variance estimator. We propose a method for estimation of the integrated variance robust to microstructure noise as well as for testing the presence of the noise. Our method utilizes linear regression in which realized variances estimated from different data subsamples act as dependent variable while the number of observations act as explanatory variable. We compare proposed estimator with other methods on simulated data for several microstructure noise structures.
Multi-objective Optimization of Solar Irradiance and Variance at Pertinent Inclination Angles
NASA Astrophysics Data System (ADS)
Jain, Dhanesh; Lalwani, Mahendra
2018-05-01
The performance of photovoltaic panel gets highly affected bychange in atmospheric conditions and angle of inclination. This article evaluates the optimum tilt angle and orientation angle (surface azimuth angle) for solar photovoltaic array in order to get maximum solar irradiance and to reduce variance of radiation at different sets or subsets of time periods. Non-linear regression and adaptive neural fuzzy interference system (ANFIS) methods are used for predicting the solar radiation. The results of ANFIS are more accurate in comparison to non-linear regression. These results are further used for evaluating the correlation and applied for estimating the optimum combination of tilt angle and orientation angle with the help of general algebraic modelling system and multi-objective genetic algorithm. The hourly average solar irradiation is calculated at different combinations of tilt angle and orientation angle with the help of horizontal surface radiation data of Jodhpur (Rajasthan, India). The hourly average solar irradiance is calculated for three cases: zero variance, with actual variance and with double variance at different time scenarios. It is concluded that monthly collected solar radiation produces better result as compared to bimonthly, seasonally, half-yearly and yearly collected solar radiation. The profit obtained for monthly varying angle has 4.6% more with zero variance and 3.8% more with actual variance, than the annually fixed angle.
Schwantes-An, Tae-Hwi; Sung, Heejong; Sabourin, Jeremy A; Justice, Cristina M; Sorant, Alexa J M; Wilson, Alexander F
2016-01-01
In this study, the effects of (a) the minor allele frequency of the single nucleotide variant (SNV), (b) the degree of departure from normality of the trait, and (c) the position of the SNVs on type I error rates were investigated in the Genetic Analysis Workshop (GAW) 19 whole exome sequence data. To test the distribution of the type I error rate, 5 simulated traits were considered: standard normal and gamma distributed traits; 2 transformed versions of the gamma trait (log 10 and rank-based inverse normal transformations); and trait Q1 provided by GAW 19. Each trait was tested with 313,340 SNVs. Tests of association were performed with simple linear regression and average type I error rates were determined for minor allele frequency classes. Rare SNVs (minor allele frequency < 0.05) showed inflated type I error rates for non-normally distributed traits that increased as the minor allele frequency decreased. The inflation of average type I error rates increased as the significance threshold decreased. Normally distributed traits did not show inflated type I error rates with respect to the minor allele frequency for rare SNVs. There was no consistent effect of transformation on the uniformity of the distribution of the location of SNVs with a type I error.
NASA Astrophysics Data System (ADS)
Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.
2009-08-01
In this study two techniques, for modeling electricity consumption of the Jordanian industrial sector, are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, comparison that is based on the square root average squared error of data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.
Spectral Reconstruction Based on Svm for Cross Calibration
NASA Astrophysics Data System (ADS)
Gao, H.; Ma, Y.; Liu, W.; He, H.
2017-05-01
Chinese HY-1C/1D satellites will use a 5nm/10nm-resolutional visible-near infrared(VNIR) hyperspectral sensor with the solar calibrator to cross-calibrate with other sensors. The hyperspectral radiance data are composed of average radiance in the sensor's passbands and bear a spectral smoothing effect, a transform from the hyperspectral radiance data to the 1-nm-resolution apparent spectral radiance by spectral reconstruction need to be implemented. In order to solve the problem of noise cumulation and deterioration after several times of iteration by the iterative algorithm, a novel regression method based on SVM is proposed, which can approach arbitrary complex non-linear relationship closely and provide with better generalization capability by learning. In the opinion of system, the relationship between the apparent radiance and equivalent radiance is nonlinear mapping introduced by spectral response function(SRF), SVM transform the low-dimensional non-linear question into high-dimensional linear question though kernel function, obtaining global optimal solution by virtue of quadratic form. The experiment is performed using 6S-simulated spectrums considering the SRF and SNR of the hyperspectral sensor, measured reflectance spectrums of water body and different atmosphere conditions. The contrastive result shows: firstly, the proposed method is with more reconstructed accuracy especially to the high-frequency signal; secondly, while the spectral resolution of the hyperspectral sensor reduces, the proposed method performs better than the iterative method; finally, the root mean square relative error(RMSRE) which is used to evaluate the difference of the reconstructed spectrum and the real spectrum over the whole spectral range is calculated, it decreses by one time at least by proposed method.
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
Kuesten, Carla; Bi, Jian
2018-06-03
Conventional drivers of liking analysis was extended with a time dimension into temporal drivers of liking (TDOL) based on functional data analysis methodology and non-additive models for multiple-attribute time-intensity (MATI) data. The non-additive models, which consider both direct effects and interaction effects of attributes to consumer overall liking, include Choquet integral and fuzzy measure in the multi-criteria decision-making, and linear regression based on variance decomposition. Dynamics of TDOL, i.e., the derivatives of the relative importance functional curves were also explored. Well-established R packages 'fda', 'kappalab' and 'relaimpo' were used in the paper for developing TDOL. Applied use of these methods shows that the relative importance of MATI curves offers insights for understanding the temporal aspects of consumer liking for fruit chews.
Mutual information estimation for irregularly sampled time series
NASA Astrophysics Data System (ADS)
Rehfeld, K.; Marwan, N.; Heitzig, J.; Kurths, J.
2012-04-01
For the automated, objective and joint analysis of time series, similarity measures are crucial. Used in the analysis of climate records, they allow for a complimentary, unbiased view onto sparse datasets. The irregular sampling of many of these time series, however, makes it necessary to either perform signal reconstruction (e.g. interpolation) or to develop and use adapted measures. Standard linear interpolation comes with an inevitable loss of information and bias effects. We have recently developed a Gaussian kernel-based correlation algorithm with which the interpolation error can be substantially lowered, but this would not work should the functional relationship in a bivariate setting be non-linear. We therefore propose an algorithm to estimate lagged auto and cross mutual information from irregularly sampled time series. We have extended the standard and adaptive binning histogram estimators and use Gaussian distributed weights in the estimation of the (joint) probabilities. To test our method we have simulated linear and nonlinear auto-regressive processes with Gamma-distributed inter-sampling intervals. We have then performed a sensitivity analysis for the estimation of actual coupling length, the lag of coupling and the decorrelation time in the synthetic time series and contrast our results to the performance of a signal reconstruction scheme. Finally we applied our estimator to speleothem records. We compare the estimated memory (or decorrelation time) to that from a least-squares estimator based on fitting an auto-regressive process of order 1. The calculated (cross) mutual information results are compared for the different estimators (standard or adaptive binning) and contrasted with results from signal reconstruction. We find that the kernel-based estimator has a significantly lower root mean square error and less systematic sampling bias than the interpolation-based method. It is possible that these encouraging results could be further improved by using non-histogram mutual information estimators, like k-Nearest Neighbor or Kernel-Density estimators, but for short (<1000 points) and irregularly sampled datasets the proposed algorithm is already a great improvement.
Sahin, Rubina; Tapadia, Kavita
2015-01-01
The three widely used isotherms Langmuir, Freundlich and Temkin were examined in an experiment using fluoride (F⁻) ion adsorption on a geo-material (limonite) at four different temperatures by linear and non-linear models. Comparison of linear and non-linear regression models were given in selecting the optimum isotherm for the experimental results. The coefficient of determination, r², was used to select the best theoretical isotherm. The four Langmuir linear equations (1, 2, 3, and 4) are discussed. Langmuir isotherm parameters obtained from the four Langmuir linear equations using the linear model differed but they were the same when using the nonlinear model. Langmuir-2 isotherm is one of the linear forms, and it had the highest coefficient of determination (r² = 0.99) compared to the other Langmuir linear equations (1, 3 and 4) in linear form, whereas, for non-linear, Langmuir-4 fitted best among all the isotherms because it had the highest coefficient of determination (r² = 0.99). The results showed that the non-linear model may be a better way to obtain the parameters. In the present work, the thermodynamic parameters show that the absorption of fluoride onto limonite is both spontaneous (ΔG < 0) and endothermic (ΔH > 0). Scanning electron microscope and X-ray diffraction images also confirm the adsorption of F⁻ ion onto limonite. The isotherm and kinetic study reveals that limonite can be used as an adsorbent for fluoride removal. In future we can develop new technology for fluoride removal in large scale by using limonite which is cost-effective, eco-friendly and is easily available in the study area.
NASA Technical Reports Server (NTRS)
Barrett, Joe H., III; Roeder, William P.
2010-01-01
The expected peak wind speed for the day is an important element in the daily morning forecast for ground and space launch operations at Kennedy Space Center (KSC) and Cape Canaveral Air Force Station (CCAFS). The 45th Weather Squadron (45 WS) must issue forecast advisories for KSC/CCAFS when they expect peak gusts for >= 25, >= 35, and >= 50 kt thresholds at any level from the surface to 300 ft. In Phase I of this task, the 45 WS tasked the Applied Meteorology Unit (AMU) to develop a cool-season (October - April) tool to help forecast the non-convective peak wind from the surface to 300 ft at KSC/CCAFS. During the warm season, these wind speeds are rarely exceeded except during convective winds or under the influence of tropical cyclones, for which other techniques are already in use. The tool used single and multiple linear regression equations to predict the peak wind from the morning sounding. The forecaster manually entered several observed sounding parameters into a Microsoft Excel graphical user interface (GUI), and then the tool displayed the forecast peak wind speed, average wind speed at the time of the peak wind, the timing of the peak wind and the probability the peak wind will meet or exceed 35, 50 and 60 kt. The 45 WS customers later dropped the requirement for >= 60 kt wind warnings. During Phase II of this task, the AMU expanded the period of record (POR) by six years to increase the number of observations used to create the forecast equations. A large number of possible predictors were evaluated from archived soundings, including inversion depth and strength, low-level wind shear, mixing height, temperature lapse rate and winds from the surface to 3000 ft. Each day in the POR was stratified in a number of ways, such as by low-level wind direction, synoptic weather pattern, precipitation and Bulk Richardson number. The most accurate Phase II equations were then selected for an independent verification. The Phase I and II forecast methods were compared using an independent verification data set. The two methods were compared to climatology, wind warnings and advisories issued by the 45 WS, and North American Mesoscale (NAM) model (MesoNAM) forecast winds. The performance of the Phase I and II methods were similar with respect to mean absolute error. Since the Phase I data were not stratified by precipitation, this method's peak wind forecasts had a large negative bias on days with precipitation and a small positive bias on days with no precipitation. Overall, the climatology methods performed the worst while the MesoNAM performed the best. Since the MesoNAM winds were the most accurate in the comparison, the final version of the tool was based on the MesoNAM winds. The probability the peak wind will meet or exceed the warning thresholds were based on the one standard deviation error bars from the linear regression. For example, the linear regression might forecast the most likely peak speed to be 35 kt and the error bars used to calculate that the probability of >= 25 kt = 76%, the probability of >= 35 kt = 50%, and the probability of >= 50 kt = 19%. The authors have not seen this application of linear regression error bars in any other meteorological applications. Although probability forecast tools should usually be developed with logistic regression, this technique could be easily generalized to any linear regression forecast tool to estimate the probability of exceeding any desired threshold . This could be useful for previously developed linear regression forecast tools or new forecast applications where statistical analysis software to perform logistic regression is not available. The tool was delivered in two formats - a Microsoft Excel GUI and a Tool Command Language/Tool Kit (Tcl/Tk) GUI in the Meteorological Interactive Data Display System (MIDDS). The Microsoft Excel GUI reads a MesoNAM text file containing hourly forecasts from 0 to 84 hours, from one model run (00 or 12 UTC). The GUI then displays e peak wind speed, average wind speed, and the probability the peak wind will meet or exceed the 25-, 35- and 50-kt thresholds. The user can display the Day-1 through Day-3 peak wind forecasts, and separate forecasts are made for precipitation and non-precipitation days. The MIDDS GUI uses data from the NAM and Global Forecast System (GFS), instead of the MesoNAM. It can display Day-1 and Day-2 forecasts using NAM data, and Day-1 through Day-5 forecasts using GFS data. The timing of the peak wind is not displayed, since the independent verification showed that none of the forecast methods performed significantly better than climatology. The forecaster should use the climatological timing of the peak wind (2248 UTC) as a first guess and then adjust it based on the movement of weather features.
El-Kommos, Michael E; El-Gizawy, Samia M; Atia, Noha N; Hosny, Noha M
2014-03-01
The combination of certain non-sedating antihistamines (NSA) such as fexofenadine (FXD), ketotifen (KET) and loratadine (LOR) with pseudoephedrine (PSE) or acetaminophen (ACE) is widely used in the treatment of allergic rhinitis, conjunctivitis and chronic urticaria. A rapid, simple, selective and precise densitometric method was developed and validated for simultaneous estimation of six synthetic binary mixtures and their pharmaceutical dosage forms. The method employed thin layer chromatography aluminum plates precoated with silica gel G 60 F254 as the stationary phase. The mobile phases chosen for development gave compact bands for the mixtures FXD-PSE (I), KET-PSE (II), LOR-PSE (III), FXD-ACE (IV), KET-ACE (V) and LOR-ACE (VI) [Retardation factor (Rf ) values were (0.20, 0.32), (0.69, 0.34), (0.79, 0.13), (0.36, 0.70), (0.51, 0.30) and (0.76, 0.26), respectively]. Spectrodensitometric scanning integration was performed at 217, 218, 218, 233, 272 and 251 nm for the mixtures I-VI, respectively. The linear regression data for the calibration plots showed an excellent linear relationship. The method was validated for precision, accuracy, robustness and recovery. Limits of detection and quantitation were calculated. Statistical analysis proved that the method is reproducible and selective for the simultaneous estimation of these binary mixtures. Copyright © 2013 John Wiley & Sons, Ltd.
Agier, Lydiane; Portengen, Lützen; Chadeau-Hyam, Marc; Basagaña, Xavier; Giorgis-Allemand, Lise; Siroux, Valérie; Robinson, Oliver; Vlaanderen, Jelle; González, Juan R; Nieuwenhuijsen, Mark J; Vineis, Paolo; Vrijheid, Martine; Slama, Rémy; Vermeulen, Roel
2016-12-01
The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. We compared the performances of linear regression-based statistical methods in assessing exposome-health associations. In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity. On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates. Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods. Citation: Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen MJ, Vineis P, Vrijheid M, Slama R, Vermeulen R. 2016. A systematic comparison of linear regression-based statistical methods to assess exposome-health associations. Environ Health Perspect 124:1848-1856; http://dx.doi.org/10.1289/EHP172.
NASA Astrophysics Data System (ADS)
Salmon, B. P.; Kleynhans, W.; Olivier, J. C.; van den Bergh, F.; Wessels, K. J.
2018-05-01
Humans are transforming land cover at an ever-increasing rate. Accurate geographical maps on land cover, especially rural and urban settlements are essential to planning sustainable development. Time series extracted from MODerate resolution Imaging Spectroradiometer (MODIS) land surface reflectance products have been used to differentiate land cover classes by analyzing the seasonal patterns in reflectance values. The proper fitting of a parametric model to these time series usually requires several adjustments to the regression method. To reduce the workload, a global setting of parameters is done to the regression method for a geographical area. In this work we have modified a meta-optimization approach to setting a regression method to extract the parameters on a per time series basis. The standard deviation of the model parameters and magnitude of residuals are used as scoring function. We successfully fitted a triply modulated model to the seasonal patterns of our study area using a non-linear extended Kalman filter (EKF). The approach uses temporal information which significantly reduces the processing time and storage requirements to process each time series. It also derives reliability metrics for each time series individually. The features extracted using the proposed method are classified with a support vector machine and the performance of the method is compared to the original approach on our ground truth data.
ERIC Educational Resources Information Center
Baker, Bruce D.; Richards, Craig E.
1999-01-01
Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
Yoneoka, Daisuke; Henmi, Masayuki
2017-06-01
Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. Especially, we also propose an approach to recover (at most) threecorrelations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Kumar, P; Kumar, Dinesh; Rai, K N
2016-08-01
In this article, a non-linear dual-phase-lag (DPL) bio-heat transfer model based on temperature dependent metabolic heat generation rate is derived to analyze the heat transfer phenomena in living tissues during thermal ablation treatment. The numerical solution of the present non-linear problem has been done by finite element Runge-Kutta (4,5) method which combines the essence of Runge-Kutta (4,5) method together with finite difference scheme. Our study demonstrates that at the thermal ablation position temperature predicted by non-linear and linear DPL models show significant differences. A comparison has been made among non-linear DPL, thermal wave and Pennes model and it has been found that non-linear DPL and thermal wave bio-heat model show almost same nature whereas non-linear Pennes model shows significantly different temperature profile at the initial stage of thermal ablation treatment. The effect of Fourier number and Vernotte number (relaxation Fourier number) on temperature profile in presence and absence of externally applied heat source has been studied in detail and it has been observed that the presence of externally applied heat source term highly affects the efficiency of thermal treatment method. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bomfim, Rafael Aiello; Crosato, Edgard; Mazzilli, Luiz Eugênio Nigro; Frias, Antonio Carlos
2015-01-01
This study evaluates the prevalence and risk factors of non-carious cervical lesions (NCCLs) in a Brazilian population of workers exposed and non-exposed to acid mists and chemical products. One hundred workers (46 exposed and 54 non-exposed) were evaluated in a Centro de Referência em Saúde do Trabalhador - CEREST (Worker's Health Reference Center). The workers responded to questionnaires regarding their personal information and about alcohol consumption and tobacco use. A clinical examination was conducted to evaluate the presence of NCCLs, according to WHO parameters. Statistical analyses were performed by unconditional logistic regression and multiple linear regression, with the critical level of p < 0.05. NCCLs were significantly associated with age groups (18-34, 35-44, 45-68 years). The unconditional logistic regression showed that the presence of NCCLs was better explained by age group (OR = 4.04; CI 95% 1.77-9.22) and occupational exposure to acid mists and chemical products (OR = 3.84; CI 95% 1.10-13.49), whereas the linear multiple regression revealed that NCCLs were better explained by years of smoking (p = 0.01) and age group (p = 0.04). The prevalence of NCCLs in the study population was particularly high (76.84%), and the risk factors for NCCLs were age, exposure to acid mists and smoking habit. Controlling risk factors through preventive and educative measures, allied to the use of personal protective equipment to prevent the occupational exposure to acid mists, may contribute to minimizing the prevalence of NCCLs.
Marcos, Raül; Llasat, Ma Carmen; Quintana-Seguí, Pere; Turco, Marco
2018-01-01
In this paper, we have compared different bias correction methodologies to assess whether they could be advantageous for improving the performance of a seasonal prediction model for volume anomalies in the Boadella reservoir (northwestern Mediterranean). The bias correction adjustments have been applied on precipitation and temperature from the European Centre for Middle-range Weather Forecasting System 4 (S4). We have used three bias correction strategies: two linear (mean bias correction, BC, and linear regression, LR) and one non-linear (Model Output Statistics analogs, MOS-analog). The results have been compared with climatology and persistence. The volume-anomaly model is a previously computed Multiple Linear Regression that ingests precipitation, temperature and in-flow anomaly data to simulate monthly volume anomalies. The potential utility for end-users has been assessed using economic value curve areas. We have studied the S4 hindcast period 1981-2010 for each month of the year and up to seven months ahead considering an ensemble of 15 members. We have shown that the MOS-analog and LR bias corrections can improve the original S4. The application to volume anomalies points towards the possibility to introduce bias correction methods as a tool to improve water resource seasonal forecasts in an end-user context of climate services. Particularly, the MOS-analog approach gives generally better results than the other approaches in late autumn and early winter. Copyright © 2017 Elsevier B.V. All rights reserved.
Arambasić, M B; Jatić-Slavković, D
2004-05-01
This paper presents the application of the regression analysis program and the program for comparing linear regressions (modified method for one-way, analysis of variance), writtens in BASIC program language, for instance, determination of content of Diclofenac-Sodium (active ingredient in DIKLOFEN injections, ampules á 75 mg/3 ml). Stability testing of Diclofenac-Sodium was done by isothermic method of accelerated aging at 4 different temperatures (30 degrees, 40 degrees, 50 degrees and 60 degrees C) as a function of time (4 different duration of treatment: (0-155, 0-145, 0-74 and 0-44 days). The decrease in stability (decrease in the mean value of the content of Diclofenac-Sodium (in %), at different temperatures as a function of time, is possible to describe by, linear dependance. According to the value for regression equation values, the times are assessed in which the content of Diclofenac-Sodium (in %) will decrease by 10%, of the initial value. The times are follows at 30 degrees C 761.02 days, at 40 degrees C 397.26 days, at 50 degrees C 201.96 days and at 60 degrees C 58.85 days. The estimated times (in days) in which the mean value for Diclofenac-Sodium content (in %) will by 10% of the initial values, as a junction of time, are most suitably described by 3rd order parabola. Based on the parameter values which describe the 3rd order parabola, the time was estimated in which Diclofenac-Sodium content mean value (in %) will fall by 10% of the initial one at average ambient temperatures of 20 degrees C and 25 degrees C. The times are: 1409.47 days (20 degrees C) and 1042.39 days (25 degrees C). Based on the value for Fischer's coefficien (F), the comparison of trenf of Diclofenac-Sodium content (in %) shows that, under the influence of different temperatures as a function of time, among them, depending on temperature value, there is: statistically very significant difference (P < .05) at 50 degrees C and lower toward 60 degrees C, i.e. statistically probably significant difference (P > 0.01) at 40 degrees C and lower towards 50 degrees C and there is no statistically significance difference (P > 0.05) at 30 degrees C towards 40 degrees C.
Application of glas laser altimetry to detect elevation changes in East Antarctica
NASA Astrophysics Data System (ADS)
Scaioni, M.; Tong, X.; Li, R.
2013-10-01
In this paper the use of ICESat/GLAS laser altimeter for estimating multi-temporal elevation changes on polar ice sheets is afforded. Due to non-overlapping laser spots during repeat passes, interpolation methods are required to make comparisons. After reviewing the main methods described in the literature (crossover point analysis, cross-track DEM projection, space-temporal regressions), the last one has been chosen for its capability of providing more elevation change rate measurements. The standard implementation of the space-temporal linear regression technique has been revisited and improved to better cope with outliers and to check the estimability of model's parameters. GLAS data over the PANDA route in East Antarctica have been used for testing. Obtained results have been quite meaningful from a physical point of view, confirming the trend reported by the literature of a constant snow accumulation in the area during the two past decades, unlike the most part of the continent that has been losing mass.
NASA Technical Reports Server (NTRS)
Lo, Ching F.
1999-01-01
The integration of Radial Basis Function Networks and Back Propagation Neural Networks with the Multiple Linear Regression has been accomplished to map nonlinear response surfaces over a wide range of independent variables in the process of the Modem Design of Experiments. The integrated method is capable to estimate the precision intervals including confidence and predicted intervals. The power of the innovative method has been demonstrated by applying to a set of wind tunnel test data in construction of response surface and estimation of precision interval.
van de Kassteele, Jan; Zwakhals, Laurens; Breugelmans, Oscar; Ameling, Caroline; van den Brink, Carolien
2017-07-01
Local policy makers increasingly need information on health-related indicators at smaller geographic levels like districts or neighbourhoods. Although more large data sources have become available, direct estimates of the prevalence of a health-related indicator cannot be produced for neighbourhoods for which only small samples or no samples are available. Small area estimation provides a solution, but unit-level models for binary-valued outcomes that can handle both non-linear effects of the predictors and spatially correlated random effects in a unified framework are rarely encountered. We used data on 26 binary-valued health-related indicators collected on 387,195 persons in the Netherlands. We associated the health-related indicators at the individual level with a set of 12 predictors obtained from national registry data. We formulated a structured additive regression model for small area estimation. The model captured potential non-linear relations between the predictors and the outcome through additive terms in a functional form using penalized splines and included a term that accounted for spatially correlated heterogeneity between neighbourhoods. The registry data were used to predict individual outcomes which in turn are aggregated into higher geographical levels, i.e. neighbourhoods. We validated our method by comparing the estimated prevalences with observed prevalences at the individual level and by comparing the estimated prevalences with direct estimates obtained by weighting methods at municipality level. We estimated the prevalence of the 26 health-related indicators for 415 municipalities, 2599 districts and 11,432 neighbourhoods in the Netherlands. We illustrate our method on overweight data and show that there are distinct geographic patterns in the overweight prevalence. Calibration plots show that the estimated prevalences agree very well with observed prevalences at the individual level. The estimated prevalences agree reasonably well with the direct estimates at the municipal level. Structured additive regression is a useful tool to provide small area estimates in a unified framework. We are able to produce valid nationwide small area estimates of 26 health-related indicators at neighbourhood level in the Netherlands. The results can be used for local policy makers to make appropriate health policy decisions.
New method for calculating a mathematical expression for streamflow recession
Rutledge, Albert T.
1991-01-01
An empirical method has been devised to calculate the master recession curve, which is a mathematical expression for streamflow recession during times of negligible direct runoff. The method is based on the assumption that the storage-delay factor, which is the time per log cycle of streamflow recession, varies linearly with the logarithm of streamflow. The resulting master recession curve can be nonlinear. The method can be executed by a computer program that reads a data file of daily mean streamflow, then allows the user to select several near-linear segments of streamflow recession. The storage-delay factor for each segment is one of the coefficients of the equation that results from linear least-squares regression. Using results for each recession segment, a mathematical expression of the storage-delay factor as a function of the log of streamflow is determined by linear least-squares regression. The master recession curve, which is a second-order polynomial expression for time as a function of log of streamflow, is then derived using the coefficients of this function.
Gou, Faxiang; Liu, Xinfeng; He, Jian; Liu, Dongpeng; Cheng, Yao; Liu, Haixia; Yang, Xiaoting; Wei, Kongfu; Zheng, Yunhe; Jiang, Xiaojuan; Meng, Lei; Hu, Wenbiao
2018-01-08
To determine the linear and non-linear interacting relationships between weather factors and hand, foot and mouth disease (HFMD) in children in Gansu, China, and gain further traction as an early warning signal based on weather variability for HFMD transmission. Weekly HFMD cases aged less than 15 and meteorological information from 2010 to 2014 in Jiuquan, Lanzhou and Tianshu, Gansu, China were collected. Generalized linear regression models (GLM) with Poisson link and classification and regression trees (CART) were employed to determine the combined and interactive relationship of weather factors and HFMD in both linear and non-linear ways. GLM suggested an increase in weekly HFMD of 5.9% [95% confidence interval (CI): 5.4%, 6.5%] in Tianshui, 2.8% [2.5%, 3.1%] in Lanzhou and 1.8% [1.4%, 2.2%] in Jiuquan in association with a 1 °C increase in average temperature, respectively. And 1% increase of relative humidity could increase weekly HFMD of 2.47% [2.23%, 2.71%] in Lanzhou and 1.11% [0.72%, 1.51%] in Tianshui. CART revealed that average temperature and relative humidity were the first two important determinants, and their threshold values for average temperature deceased from 20 °C of Jiuquan to 16 °C in Tianshui; and for relative humidity, threshold values increased from 38% of Jiuquan to 65% of Tianshui. Average temperature was the primary weather factor in three areas, more sensitive in southeast Tianshui, compared with northwest Jiuquan; Relative humidity's effect on HFMD showed a non-linear interacting relationship with average temperature.
Revisiting the extended spring indices using gridded weather data and machine learning
NASA Astrophysics Data System (ADS)
Mehdipoor, Hamed; Izquierdo-Verdiguier, Emma; Zurita-Milla, Raul
2016-04-01
The extended spring indices or SI-x [1] have been successfully used to predict the timing of spring onset at continental scales. The SI-x models were created by combining lilac and honeysuckle volunteered phenological observations, temperature data (from weather stations) and latitudinal information. More precisely, these models use a linear regression to predict the day of year of first leaf and first bloom for these two indicator species. In this contribution we revisit both the data and the method used to calibrate the SI-x models to check whether the addition of new input data or the use of non-linear regression methods could lead to improments in the model outputs. In particular, we use a recently published dataset [2] of volunteered observations on cloned and common lilac over longer period of time (1980-2014) and we replace the weather station data by 54 features derived from Daymet [3], which provides 1 by 1 km gridded estimates of daily weather parameters (maximum and minimum temperatures, precipitation, water vapor pressure, solar radiation, day length, snow water equivalent) for North America. These features consist of both daily weather values and their long- and short-term accumulations and elevation. we also replace the original linear regression by a non-linear method. Specifically, we use random forests to both identify the most important features and to predict the day of year of the first leaf of cloned and common lilacs. Preliminary results confirm the importance of the SI-x features (maximum and minimum temperatures and day length). However, our results show that snow water equivalent and water vapor pressure are also necessary to properly model leaf onset. Regarding the predictions, our results indicate that Random Forests yield comparable results to those produced by the SI-x models (in terms of root mean square error -RMSE). For cloned and common lilac, the models predict the day of year of leafing with 16 and 15 days of accuracy respectively. Further research should focus on extensively comparing the features used by both modelling approaches and on analyzing spring onset patterns over continental United States. References 1. Schwartz, M.D., T.R. Ault, and J.L. Betancourt, Spring onset variations and trends in the continental United States: past and regional assessment using temperature-based indices. International Journal of Climatology, 2013. 33(13): p. 2917-2922. 2. Rosemartin, A.H., et al., Lilac and honeysuckle phenology data 1956-2014. Scientific Data, 2015. 2: p. 150038. 3. Thornton, P.E., et al. Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 2. 2014.
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) inMultiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
Multiple imputation of rainfall missing data in the Iberian Mediterranean context
NASA Astrophysics Data System (ADS)
Miró, Juan Javier; Caselles, Vicente; Estrela, María José
2017-11-01
Given the increasing need for complete rainfall data networks, in recent years have been proposed diverse methods for filling gaps in observed precipitation series, progressively more advanced that traditional approaches to overcome the problem. The present study has consisted in validate 10 methods (6 linear, 2 non-linear and 2 hybrid) that allow multiple imputation, i.e., fill at the same time missing data of multiple incomplete series in a dense network of neighboring stations. These were applied for daily and monthly rainfall in two sectors in the Júcar River Basin Authority (east Iberian Peninsula), which is characterized by a high spatial irregularity and difficulty of rainfall estimation. A classification of precipitation according to their genetic origin was applied as pre-processing, and a quantile-mapping adjusting as post-processing technique. The results showed in general a better performance for the non-linear and hybrid methods, highlighting that the non-linear PCA (NLPCA) method outperforms considerably the Self Organizing Maps (SOM) method within non-linear approaches. On linear methods, the Regularized Expectation Maximization method (RegEM) was the best, but far from NLPCA. Applying EOF filtering as post-processing of NLPCA (hybrid approach) yielded the best results.
Convert a low-cost sensor to a colorimeter using an improved regression method
NASA Astrophysics Data System (ADS)
Wu, Yifeng
2008-01-01
Closed loop color calibration is a process to maintain consistent color reproduction for color printers. To perform closed loop color calibration, a pre-designed color target should be printed, and automatically measured by a color measuring instrument. A low cost sensor has been embedded to the printer to perform the color measurement. A series of sensor calibration and color conversion methods have been developed. The purpose is to get accurate colorimetric measurement from the data measured by the low cost sensor. In order to get high accuracy colorimetric measurement, we need carefully calibrate the sensor, and minimize all possible errors during the color conversion. After comparing several classical color conversion methods, a regression based color conversion method has been selected. The regression is a powerful method to estimate the color conversion functions. But the main difficulty to use this method is to find an appropriate function to describe the relationship between the input and the output data. In this paper, we propose to use 1D pre-linearization tables to improve the linearity between the input sensor measuring data and the output colorimetric data. Using this method, we can increase the accuracy of the regression method, so as to improve the accuracy of the color conversion.
A decline in the prevalence of injecting drug users in Estonia, 2005–2009
Uusküla, A; Rajaleid, K; Talu, A; Abel-Ollo, K; Des Jarlais, DC
2013-01-01
Aims and setting Descriptions of behavioural epidemics have received little attention compared with infectious disease epidemics in Eastern Europe. Here we report a study aimed at estimating trends in the prevalence of injection drug use between 2005 and 2009 in Estonia. Design and methods The number of injection drug users (IDUs) aged 15–44 each year between 2005 and 2009 was estimated using capture-recapture methodology based on 4 data sources (2 treatment data bases: drug abuse and non-fatal overdose treatment; criminal justice (drug related offences) and mortality (injection drug use related deaths) data). Poisson log-linear regression models were applied to the matched data, with interactions between data sources fitted to replicate the dependencies between the data sources. Linear regression was used to estimate average change over time. Findings there were 24305, 12292, 238, 545 records and 8100, 1655, 155, 545 individual IDUs identified in the four capture sources (Police, drug treatment, overdose, and death registry, accordingly) over the period 2005 – 2009. The estimated prevalence of IDUs among the population aged 15–44 declined from 2.7% (1.8–7.9%) in 2005 to 2.0% (1.4–5.0%) in 2008, and 0.9% (0.7–1.7%) in 2009. Regression analysis indicated an average reduction of over 1700 injectors per year. Conclusion While the capture-recapture method has known limitations, the results are consistent with other data from Estonia. Identifying the drivers of change in the prevalence of injection drug use warrants further research. PMID:23290632
Ruan, Xiaofang; Zhang, Ruisheng; Yao, Xiaojun; Liu, Mancang; Fan, Botao
2007-03-01
Alkylphenols are a group of permanent pollutants in the environment and could adversely disturb the human endocrine system. It is therefore important to effectively separate and measure the alkylphenols. To guide the chromatographic analysis of these compounds in practice, the development of quantitative relationship between the molecular structure and the retention time of alkylphenols becomes necessary. In this study, topological, constitutional, geometrical, electrostatic and quantum-chemical descriptors of 44 alkylphenols were calculated using a software, CODESSA, and these descriptors were pre-selected using the heuristic method. As a result, three-descriptor linear model (LM) was developed to describe the relationship between the molecular structure and the retention time of alkylphenols. Meanwhile, the non-linear regression model was also developed based on support vector machine (SVM) using the same three descriptors. The correlation coefficient (R(2)) for the LM and SVM was 0.98 and 0. 92, and the corresponding root-mean-square error was 0. 99 and 2. 77, respectively. By comparing the stability and prediction ability of the two models, it was found that the linear model was a better method for describing the quantitative relationship between the retention time of alkylphenols and the molecular structure. The results obtained suggested that the linear model could be applied for the chromatographic analysis of alkylphenols with known molecular structural parameters.
NASA Astrophysics Data System (ADS)
Khansari, Maziyar M.; O'Neill, William; Penn, Richard; Blair, Norman P.; Chau, Felix; Shahidi, Mahnaz
2017-03-01
The conjunctiva is a densely vascularized tissue of the eye that provides an opportunity for imaging of human microcirculation. In the current study, automated fine structure analysis of conjunctival microvasculature images was performed to discriminate stages of diabetic retinopathy (DR). The study population consisted of one group of nondiabetic control subjects (NC) and 3 groups of diabetic subjects, with no clinical DR (NDR), non-proliferative DR (NPDR), or proliferative DR (PDR). Ordinary least square regression and Fisher linear discriminant analyses were performed to automatically discriminate images between group pairs of subjects. Human observers who were masked to the grouping of subjects performed image discrimination between group pairs. Over 80% and 70% of images of subjects with clinical and non-clinical DR were correctly discriminated by the automated method, respectively. The discrimination rates of the automated method were higher than human observers. The fine structure analysis of conjunctival microvasculature images provided discrimination of DR stages and can be potentially useful for DR screening and monitoring.
Body mass index: accounting for full time sedentary occupation and 24-hr self-reported time use.
Tudor-Locke, Catrine; Schuna, John M; Katzmarzyk, Peter T; Liu, Wei; Hamrick, Karen S; Johnson, William D
2014-01-01
We used linked existing data from the 2006-2008 American Time Use Survey (ATUS), the Current Population Survey (CPS, a federal survey that provides on-going U.S. vital statistics, including employment rates) and self-reported body mass index (BMI) to answer: How does BMI vary across full time occupations dichotomized as sedentary/non-sedentary, accounting for time spent in sleep, other sedentary behaviors, and light, moderate, and vigorous intensity activities? We classified time spent engaged at a primary job (sedentary or non-sedentary), sleep, and other non-work, non-sleep intensity-defined behaviors, specifically, sedentary behavior, light, moderate, and vigorous intensity activities. Age groups were defined by 20-29, 30-39, 40-49, and 50-64 years. BMI groups were defined by 18.5-24.9, 25.0-27.4, 27.5-29.9, 30.0-34.9, and ≥35.0 kg/m2. Logistic and linear regression were used to examine the association between BMI and employment in a sedentary occupation, considering time spent in sleep, other non-work time spent in sedentary behaviors, and light, moderate, and vigorous intensity activities, sex, age race/ethnicity, and household income. The analysis data set comprised 4,092 non-pregnant, non-underweight individuals 20-64 years of age who also reported working more than 7 hours at their primary jobs on their designated time use reporting day. Logistic and linear regression analyses failed to reveal any associations between BMI and the sedentary/non-sedentary occupation dichotomy considering time spent in sleep, other non-work time spent in sedentary behaviors, and light, moderate, and vigorous intensity activities, sex, age, race/ethnicity, and household income. We found no evidence of a relationship between self-reported full time sedentary occupation classification and BMI after accounting for sex, age, race/ethnicity, and household income and 24-hours of time use including non-work related physical activity and sedentary behaviors. The various sources of error associated with self-report methods and assignment of generalized activity and occupational intensity categories could compound to obscure any real relationships.
Coelho, Lúcia H G; Gutz, Ivano G R
2006-03-15
A chemometric method for analysis of conductometric titration data was introduced to extend its applicability to lower concentrations and more complex acid-base systems. Auxiliary pH measurements were made during the titration to assist the calculation of the distribution of protonable species on base of known or guessed equilibrium constants. Conductivity values of each ionized or ionizable species possibly present in the sample were introduced in a general equation where the only unknown parameters were the total concentrations of (conjugated) bases and of strong electrolytes not involved in acid-base equilibria. All these concentrations were adjusted by a multiparametric nonlinear regression (NLR) method, based on the Levenberg-Marquardt algorithm. This first conductometric titration method with NLR analysis (CT-NLR) was successfully applied to simulated conductometric titration data and to synthetic samples with multiple components at concentrations as low as those found in rainwater (approximately 10 micromol L(-1)). It was possible to resolve and quantify mixtures containing a strong acid, formic acid, acetic acid, ammonium ion, bicarbonate and inert electrolyte with accuracy of 5% or better.
Ruuska, Salla; Hämäläinen, Wilhelmiina; Kajava, Sari; Mughal, Mikaela; Matilainen, Pekka; Mononen, Jaakko
2018-03-01
The aim of the present study was to evaluate empirically confusion matrices in device validation. We compared the confusion matrix method to linear regression and error indices in the validation of a device measuring feeding behaviour of dairy cattle. In addition, we studied how to extract additional information on classification errors with confusion probabilities. The data consisted of 12 h behaviour measurements from five dairy cows; feeding and other behaviour were detected simultaneously with a device and from video recordings. The resulting 216 000 pairs of classifications were used to construct confusion matrices and calculate performance measures. In addition, hourly durations of each behaviour were calculated and the accuracy of measurements was evaluated with linear regression and error indices. All three validation methods agreed when the behaviour was detected very accurately or inaccurately. Otherwise, in the intermediate cases, the confusion matrix method and error indices produced relatively concordant results, but the linear regression method often disagreed with them. Our study supports the use of confusion matrix analysis in validation since it is robust to any data distribution and type of relationship, it makes a stringent evaluation of validity, and it offers extra information on the type and sources of errors. Copyright © 2018 Elsevier B.V. All rights reserved.
Regional flow duration curves: Geostatistical techniques versus multivariate regression
Pugliese, Alessio; Farmer, William H.; Castellarin, Attilio; Archfield, Stacey A.; Vogel, Richard M.
2016-01-01
A period-of-record flow duration curve (FDC) represents the relationship between the magnitude and frequency of daily streamflows. Prediction of FDCs is of great importance for locations characterized by sparse or missing streamflow observations. We present a detailed comparison of two methods which are capable of predicting an FDC at ungauged basins: (1) an adaptation of the geostatistical method, Top-kriging, employing a linear weighted average of dimensionless empirical FDCs, standardised with a reference streamflow value; and (2) regional multiple linear regression of streamflow quantiles, perhaps the most common method for the prediction of FDCs at ungauged sites. In particular, Top-kriging relies on a metric for expressing the similarity between catchments computed as the negative deviation of the FDC from a reference streamflow value, which we termed total negative deviation (TND). Comparisons of these two methods are made in 182 largely unregulated river catchments in the southeastern U.S. using a three-fold cross-validation algorithm. Our results reveal that the two methods perform similarly throughout flow-regimes, with average Nash-Sutcliffe Efficiencies 0.566 and 0.662, (0.883 and 0.829 on log-transformed quantiles) for the geostatistical and the linear regression models, respectively. The differences between the reproduction of FDC's occurred mostly for low flows with exceedance probability (i.e. duration) above 0.98.
Interpreting Regression Results: beta Weights and Structure Coefficients are Both Important.
ERIC Educational Resources Information Center
Thompson, Bruce
Various realizations have led to less frequent use of the "OVA" methods (analysis of variance--ANOVA--among others) and to more frequent use of general linear model approaches such as regression. However, too few researchers understand all the various coefficients produced in regression. This paper explains these coefficients and their…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Siewicki, T.C.; Chandler, G.T.
1995-12-31
Eastern oysters (Crassostrea virginica) were continuously exposed to suspended {sup 14}C-fluoranthene spiked-sediment for either: (1) five days followed by 24 days deputation, or (2) 28 days exposure. Sediment less than 63 um contained fluoranthene concentrations one or ten times that measured at suburbanized sites in southeastern estuaries (133 or 1,300 ng/g). The data were evaluated both raw and normalized for tissue lipid and sediment organic carbon concentrations. Uptake rate constants were estimated using non-linear regression methods. Depuration rate constants were estimated by linear regression of the deputation phase following five-days exposure and as the second partial derivative of the non-linearmore » regression for the 28-day exposures. Uptake and deputation rate constants, bioconcentration factors and half-lives were similar regardless of exposure time, sediment fluoranthene concentration or use of data normalization. Uptake and deputation rate constants, bioconcentration factors and half-lives (days) were similar and low for all experiments, ranging from 0.02 to 0.10, 0.14 to 0.30, 0.09 to 0.46, and 2.4 to 5.0, respectively. Degradation by the mixed function oxidase system is not expected in oysters allowing the use of radiotracers for measuring very low concentrations of fluoranthene. The results suggest that short-term exposures followed by deputation are effective for estimating kinetic rate constants and that normalization provides little benefit in these controlled studies. The results further show that bioconcentration of sediment-associated fluoranthene, and possibly other polycyclic aromatic hydrocarbons, is very low compared to either dissolved forms or levels commonly used in regulatory actions.« less
Krylov Subspace Methods for Complex Non-Hermitian Linear Systems. Thesis
NASA Technical Reports Server (NTRS)
Freund, Roland W.
1991-01-01
We consider Krylov subspace methods for the solution of large sparse linear systems Ax = b with complex non-Hermitian coefficient matrices. Such linear systems arise in important applications, such as inverse scattering, numerical solution of time-dependent Schrodinger equations, underwater acoustics, eddy current computations, numerical computations in quantum chromodynamics, and numerical conformal mapping. Typically, the resulting coefficient matrices A exhibit special structures, such as complex symmetry, or they are shifted Hermitian matrices. In this paper, we first describe a Krylov subspace approach with iterates defined by a quasi-minimal residual property, the QMR method, for solving general complex non-Hermitian linear systems. Then, we study special Krylov subspace methods designed for the two families of complex symmetric respectively shifted Hermitian linear systems. We also include some results concerning the obvious approach to general complex linear systems by solving equivalent real linear systems for the real and imaginary parts of x. Finally, numerical experiments for linear systems arising from the complex Helmholtz equation are reported.
Psychiatric Characteristics of the Cardiac Outpatients with Chest Pain
Lee, Jea-Geun; Kim, Song-Yi; Kim, Ki-Seok; Joo, Seung-Jae
2016-01-01
Background and Objectives A cardiologist's evaluation of psychiatric symptoms in patients with chest pain is rare. This study aimed to determine the psychiatric characteristics of patients with and without coronary artery disease (CAD) and explore their relationship with the intensity of chest pain. Subjects and Methods Out of 139 consecutive patients referred to the cardiology outpatient department, 31 with atypical chest pain (heartburn, acid regurgitation, dyspnea, and palpitation) were excluded and 108 were enrolled for the present study. The enrolled patients underwent complete numerical rating scale of chest pain and the symptom checklist for minor psychiatric disorders at the time of first outpatient visit. The non-CAD group consisted of patients with a normal stress test, coronary computed tomography angiogram, or coronary angiogram, and the CAD group included those with an abnormal coronary angiogram. Results Nineteen patients (17.6%) were diagnosed with CAD. No differences in the psychiatric characteristics were observed between the groups. "Feeling tense", "self-reproach", and "trouble falling asleep" were more frequently observed in the non-CAD (p=0.007; p=0.046; p=0.044) group. In a multiple linear regression analysis with a stepwise selection, somatization without chest pain in the non-CAD group and hypochondriasis in the CAD group were linearly associated with the intensity of chest pain (β=0.108, R2=0.092, p=0.004; β= -0.525, R2=0.290, p=0.010). Conclusion No differences in psychiatric characteristics were observed between the groups. The intensity of chest pain was linearly associated with somatization without chest pain in the non-CAD group and inversely linearly associated with hypochondriasis in the CAD group. PMID:27014347
Supervised Learning for Dynamical System Learning.
Hefny, Ahmed; Downey, Carlton; Gordon, Geoffrey J
2015-01-01
Recently there has been substantial interest in spectral methods for learning dynamical systems. These methods are popular since they often offer a good tradeoff between computational and statistical efficiency. Unfortunately, they can be difficult to use and extend in practice: e.g., they can make it difficult to incorporate prior information such as sparsity or structure. To address this problem, we present a new view of dynamical system learning: we show how to learn dynamical systems by solving a sequence of ordinary supervised learning problems, thereby allowing users to incorporate prior knowledge via standard techniques such as L 1 regularization. Many existing spectral methods are special cases of this new framework, using linear regression as the supervised learner. We demonstrate the effectiveness of our framework by showing examples where nonlinear regression or lasso let us learn better state representations than plain linear regression does; the correctness of these instances follows directly from our general analysis.
An extended harmonic balance method based on incremental nonlinear control parameters
NASA Astrophysics Data System (ADS)
Khodaparast, Hamed Haddad; Madinei, Hadi; Friswell, Michael I.; Adhikari, Sondipon; Coggon, Simon; Cooper, Jonathan E.
2017-02-01
A new formulation for calculating the steady-state responses of multiple-degree-of-freedom (MDOF) non-linear dynamic systems due to harmonic excitation is developed. This is aimed at solving multi-dimensional nonlinear systems using linear equations. Nonlinearity is parameterised by a set of 'non-linear control parameters' such that the dynamic system is effectively linear for zero values of these parameters and nonlinearity increases with increasing values of these parameters. Two sets of linear equations which are formed from a first-order truncated Taylor series expansion are developed. The first set of linear equations provides the summation of sensitivities of linear system responses with respect to non-linear control parameters and the second set are recursive equations that use the previous responses to update the sensitivities. The obtained sensitivities of steady-state responses are then used to calculate the steady state responses of non-linear dynamic systems in an iterative process. The application and verification of the method are illustrated using a non-linear Micro-Electro-Mechanical System (MEMS) subject to a base harmonic excitation. The non-linear control parameters in these examples are the DC voltages that are applied to the electrodes of the MEMS devices.
NASA Astrophysics Data System (ADS)
Kaplan, Melike; Hosseini, Kamyar; Samadani, Farzan; Raza, Nauman
2018-07-01
A wide range of problems in different fields of the applied sciences especially non-linear optics is described by non-linear Schrödinger's equations (NLSEs). In the present paper, a specific type of NLSEs known as the cubic-quintic non-linear Schrödinger's equation including an anti-cubic term has been studied. The generalized Kudryashov method along with symbolic computation package has been exerted to carry out this objective. As a consequence, a series of optical soliton solutions have formally been retrieved. It is corroborated that the generalized form of Kudryashov method is a direct, effectual, and reliable technique to deal with various types of non-linear Schrödinger's equations.
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Time-varying trends of global vegetation activity
NASA Astrophysics Data System (ADS)
Pan, N.; Feng, X.; Fu, B.
2016-12-01
Vegetation plays an important role in regulating the energy change, water cycle and biochemical cycle in terrestrial ecosystems. Monitoring the dynamics of vegetation activity and understanding their driving factors have been an important issue in global change research. Normalized Difference Vegetation Index (NDVI), an indicator of vegetation activity, has been widely used in investigating vegetation changes at regional and global scales. Most studies utilized linear regression or piecewise linear regression approaches to obtain an averaged changing rate over a certain time span, with an implicit assumption that the trend didn't change over time during that period. However, no evidence shows that this assumption is right for the non-linear and non-stationary NDVI time series. In this study, we adopted the multidimensional ensemble empirical mode decomposition (MEEMD) method to extract the time-varying trends of NDVI from original signals without any a priori assumption of their functional form. Our results show that vegetation trends are spatially and temporally non-uniform during 1982-2013. Most vegetated area exhibited greening trends in the 1980s. Nevertheless, the area with greening trends decreased over time since the early 1990s, and the greening trends have stalled or even reversed in many places. Regions with browning trends were mainly located in southern low latitudes in the 1980s, whose area decreased before the middle 1990s and then increased at an accelerated rate. The greening-to-browning reversals were widespread across all continents except Oceania (43% of the vegetated areas), most of which happened after the middle 1990s. In contrast, the browning-to-greening reversals occurred in smaller area and earlier time. The area with monotonic greening and browning trends accounted for 33% and 5% of the vegetated area, respectively. By performing partial correlation analyses between NDVI and climatic elements (temperature, precipitation and cloud cover) and analyzing the MEEMD-extracted trends of these climatic elements, we discussed possible driving factors of the time-varying trends of NDVI in several specific regions where trend reversals occurred.
NASA Astrophysics Data System (ADS)
Ying, Yibin; Liu, Yande; Fu, Xiaping; Lu, Huishan
2005-11-01
The artificial neural networks (ANNs) have been used successfully in applications such as pattern recognition, image processing, automation and control. However, majority of today's applications of ANNs is back-propagate feed-forward ANN (BP-ANN). In this paper, back-propagation artificial neural networks (BP-ANN) were applied for modeling soluble solid content (SSC) of intact pear from their Fourier transform near infrared (FT-NIR) spectra. One hundred and sixty-four pear samples were used to build the calibration models and evaluate the models predictive ability. The results are compared to the classical calibration approaches, i.e. principal component regression (PCR), partial least squares (PLS) and non-linear PLS (NPLS). The effects of the optimal methods of training parameters on the prediction model were also investigated. BP-ANN combine with principle component regression (PCR) resulted always better than the classical PCR, PLS and Weight-PLS methods, from the point of view of the predictive ability. Based on the results, it can be concluded that FT-NIR spectroscopy and BP-ANN models can be properly employed for rapid and nondestructive determination of fruit internal quality.
O'Leary, Neil; Chauhan, Balwantray C; Artes, Paul H
2012-10-01
To establish a method for estimating the overall statistical significance of visual field deterioration from an individual patient's data, and to compare its performance to pointwise linear regression. The Truncated Product Method was used to calculate a statistic S that combines evidence of deterioration from individual test locations in the visual field. The overall statistical significance (P value) of visual field deterioration was inferred by comparing S with its permutation distribution, derived from repeated reordering of the visual field series. Permutation of pointwise linear regression (PoPLR) and pointwise linear regression were evaluated in data from patients with glaucoma (944 eyes, median mean deviation -2.9 dB, interquartile range: -6.3, -1.2 dB) followed for more than 4 years (median 10 examinations over 8 years). False-positive rates were estimated from randomly reordered series of this dataset, and hit rates (proportion of eyes with significant deterioration) were estimated from the original series. The false-positive rates of PoPLR were indistinguishable from the corresponding nominal significance levels and were independent of baseline visual field damage and length of follow-up. At P < 0.05, the hit rates of PoPLR were 12, 29, and 42%, at the fifth, eighth, and final examinations, respectively, and at matching specificities they were consistently higher than those of pointwise linear regression. In contrast to population-based progression analyses, PoPLR provides a continuous estimate of statistical significance for visual field deterioration individualized to a particular patient's data. This allows close control over specificity, essential for monitoring patients in clinical practice and in clinical trials.
Keshavarz, Alireza; Zilouei, Hamid; Abdolmaleki, Amir; Asadinezhad, Ahmad
2015-07-01
A surface modification method was carried out to enhance the light crude oil sorption capacity of polyurethane foam (PUF) through immobilization of multi-walled carbon nanotube (MWCNT) on the foam surface at various concentrations. The developed sorbent was characterized using scanning electron microscopy, Fourier transform infrared spectroscopy, thermogravimetric analysis, and tensile elongation test. The results obtained from thermogravimetric and tensile elongation tests showed the improvement of thermal and mechanical resistance of surface-modified foam. The experimental data also revealed that the immobilization of MWCNT on PUF surface enhanced the sorption capacity of light crude oil and reduced water sorption. The highest oil removal capacity was obtained for 1 wt% MWCNT on PUF surface which was 21.44% enhancement in light crude oil sorption compared to the blank PUF. The reusability of surface modified PUF was determined through four cycles of chemical regeneration using petroleum ether. The adsorption of light crude oil with 30 g initial mass showed that 85.45% of the initial oil sorption capacity of this modified sorbent was remained after four regeneration cycles. Equilibrium isotherms for adsorption of oil were analyzed by the Freundlich, Langmuir, Temkin, and Redlich-Peterson models through linear and non-linear regression methods. Results of equilibrium revealed that Langmuir isotherm is the best fitting model and non-linear method is a more accurate way to predict the parameters involved in the isotherms. The overall findings suggested the promising potentials of the developed sorbent in order to be efficiently used in large-scale oil spill cleanup. Copyright © 2015 Elsevier Ltd. All rights reserved.
Trends in Timing of Pregnancy Awareness Among US Women.
Branum, Amy M; Ahrens, Katherine A
2017-04-01
Objectives Early pregnancy detection is important for improving pregnancy outcomes as the first trimester is a critical window of fetal development; however, there has been no description of trends in timing of pregnancy awareness among US women. Methods We examined data from the 1995, 2002, 2006-2010 and 2011-2013 National Survey of Family Growth on self-reported timing of pregnancy awareness among women aged 15-44 years who reported at least one pregnancy in the 4 or 5 years prior to interview that did not result in induced abortion or adoption (n = 17, 406). We examined the associations between maternal characteristics and late pregnancy awareness (≥7 weeks' gestation) using adjusted prevalence ratios from logistic regression models. Gestational age at time of pregnancy awareness (continuous) was regressed over year of pregnancy conception (1990-2012) in a linear model. Results Among all pregnancies reported, gestational age at time of pregnancy awareness was 5.5 weeks (standard error = 0.04) and the prevalence of late pregnancy awareness was 23 % (standard error = 1 %). Late pregnancy awareness decreased with maternal age, was more prevalent among non-Hispanic black and Hispanic women compared to non-Hispanic white women, and for unintended pregnancies versus those that were intended (p < 0.01). Mean time of pregnancy awareness did not change linearly over a 23-year time period after adjustment for maternal age at the time of conception (p < 0.16). Conclusions for Practice On average, timing of pregnancy awareness did not change linearly during 1990-2012 among US women and occurs later among certain groups of women who are at higher risk of adverse birth outcomes.
Trends in Timing of Pregnancy Awareness Among US Women
2017-01-01
Objectives Early pregnancy detection is important for improving pregnancy outcomes as the first trimester is a critical window of fetal development; however, there has been no description of trends in timing of pregnancy awareness among US women. Methods We examined data from the 1995, 2002, 2006–2010 and 2011–2013 National Survey of Family Growth on self-reported timing of pregnancy awareness among women aged 15–44 years who reported at least one pregnancy in the 4 or 5 years prior to interview that did not result in induced abortion or adoption (n = 17, 406). We examined the associations between maternal characteristics and late pregnancy awareness (≥7 weeks’ gestation) using adjusted prevalence ratios from logistic regression models. Gestational age at time of pregnancy awareness (continuous) was regressed over year of pregnancy conception (1990–2012) in a linear model. Results Among all pregnancies reported, gestational age at time of pregnancy awareness was 5.5 weeks (standard error = 0.04) and the prevalence of late pregnancy awareness was 23 % (standard error = 1 %). Late pregnancy awareness decreased with maternal age, was more prevalent among non-Hispanic black and Hispanic women compared to non-Hispanic white women, and for unintended pregnancies versus those that were intended (p < 0.01). Mean time of pregnancy awareness did not change linearly over a 23-year time period after adjustment for maternal age at the time of conception (p < 0.16). Conclusions for Practice On average, timing of pregnancy awareness did not change linearly during 1990–2012 among US women and occurs later among certain groups of women who are at higher risk of adverse birth outcomes. PMID:27449777
Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A
2016-08-01
The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrates (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data basis for multivariate analysis methods, equivalent to data resulting from chromatographic separations. The alternative evaluation of very large data series based on linear regression analysis produced information equivalent to results obtained through application of PCA an CA. Copyright © 2016 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.
2017-05-01
Selection in plant breeding could be more effective and more efficient if it is based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most of GP models used linear methods that ignore effects of interaction among genes and effects of higher order nonlinearities. Deep belief network (DBN), one of the architectural in deep learning methods, is able to model data in high level of abstraction that involves nonlinearities effects of the data. This study implemented DBN for developing a GP model utilizing whole-genome Single Nucleotide Polymorphisms (SNPs) as data for training and testing. The case study was a set of traits in maize. The maize dataset was acquisitioned from CIMMYT’s (International Maize and Wheat Improvement Center) Global Maize program. Based on Pearson correlation, DBN is outperformed than other methods, kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), best linear unbiased predictor (BLUP), in case allegedly non-additive traits. DBN achieves correlation of 0.579 within -1 to 1 range.
HOS network-based classification of power quality events via regression algorithms
NASA Astrophysics Data System (ADS)
Palomares Salas, José Carlos; González de la Rosa, Juan José; Sierra Fernández, José María; Pérez, Agustín Agüera
2015-12-01
This work compares seven regression algorithms implemented in artificial neural networks (ANNs) supported by 14 power-quality features, which are based in higher-order statistics. Combining time and frequency domain estimators to deal with non-stationary measurement sequences, the final goal of the system is the implementation in the future smart grid to guarantee compatibility between all equipment connected. The principal results are based in spectral kurtosis measurements, which easily adapt to the impulsive nature of the power quality events. These results verify that the proposed technique is capable of offering interesting results for power quality (PQ) disturbance classification. The best results are obtained using radial basis networks, generalized regression, and multilayer perceptron, mainly due to the non-linear nature of data.
Modeling of ultrasonic degradation of non-volatile organic compounds by Langmuir-type kinetics.
Chiha, Mahdi; Merouani, Slimane; Hamdaoui, Oualid; Baup, Stéphane; Gondrexon, Nicolas; Pétrier, Christian
2010-06-01
Sonochemical degradation of phenol (Ph), 4-isopropylphenol (4-IPP) and Rhodamine B (RhB) in aqueous solutions was investigated for a large range of initial concentrations in order to analyze the reaction kinetics. The initial rates of substrate degradation and H(2)O(2) formation as a function of initial concentrations were determined. The obtained results show that the degradation rate increases with increasing initial substrate concentration up to a plateau and that the sonolytic destruction occurs mainly through reactions with hydroxyl radicals in the interfacial region of cavitation bubbles. The rate of H(2)O(2) formation decreases with increasing substrate concentration and reaches a minimum, followed by almost constant production rate for higher substrate concentrations. Sonolytic degradation data were analyzed by the models of Okitsu et al. [K. Okitsu, K. Iwasaki, Y. Yobiko, H. Bandow, R. Nishimura, Y. Maeda, Sonochemical degradation of azo dyes in aqueous solution: a new heterogeneous kinetics model taking into account the local concentration OH radicals and azo dyes, Ultrason. Sonochem. 12 (2005) 255-262.] and Seprone et al. [N. Serpone, R. Terzian, H. Hidaka, E. Pelizzetti, Ultrasonic induced dehalogenation and oxidation of 2-, 3-, and 4-chlorophenol in air-equilibrated aqueous media. Similarities with irradiated semiconductor particulates, J. Phys. Chem. 98 (1994) 2634-2640.] developed on the basis of a Langmuir-type mechanism. The five linearized forms of the Okitsu et al.'s equation as well as the non-linear curve fitting analysis method were discussed. Results show that it is not appropriate to use the coefficient of determination of the linear regression method for comparing the best-fitting. Among the five linear expressions of the Okitsu et al.'s kinetic model, form-2 expression very well represent the degradation data for Ph and 4-IPP. Non-linear curve fitting analysis method was found to be the more appropriate method to determine the model parameters. An excellent representation of the experimental results of sonolytic destruction of RhB was obtained using the Serpone et al.'s model. The Serpone et al.'s model gives a worse fit for the sonolytic degradation data of Ph and 4-IPP. These results indicate that Ph and 4-IPP undergo degradation predominantly at the bubble/solution interface, whereas RhB undergoes degradation at both bubble/solution interface and in the bulk solution. (c) 2010 Elsevier B.V. All rights reserved.
Zhang, Y J; Wu, S L; Li, H Y; Zhao, Q H; Ning, C H; Zhang, R Y; Yu, J X; Li, W; Chen, S H; Gao, J S
2018-01-24
Objective: To investigate the impact of blood pressure and age on arterial stiffness in general population. Methods: Participants who took part in 2010, 2012 and 2014 Kailuan health examination were included. Data of brachial ankle pulse wave velocity (baPWV) examination were analyzed. According to the WHO criteria of age, participants were divided into 3 age groups: 18-44 years group ( n= 11 608), 45-59 years group ( n= 12 757), above 60 years group ( n= 5 002). Participants were further divided into hypertension group and non-hypertension group according to the diagnostic criteria for hypertension (2010 Chinese guidelines for the managemengt of hypertension). Multiple linear regression analysis was used to analyze the association between systolic blood pressure (SBP) with baPWV in the total participants and then stratified by age groups. Multivariate logistic regression model was used to analyze the influence of blood pressure on arterial stiffness (baPWV≥1 400 cm/s) of various groups. Results: (1)The baseline characteristics of all participants: 35 350 participants completed 2010, 2012 and 2014 Kailuan examinations and took part in baPWV examination. 2 237 participants without blood pressure measurement values were excluded, 1 569 participants with history of peripheral artery disease were excluded, we also excluded 1 016 participants with history of cardiac-cerebral vascular disease. Data from 29 367 participants were analyzed. The age was (48.0±12.4) years old, 21 305 were males (72.5%). (2) Distribution of baPWV in various age groups: baPWV increased with aging. In non-hypertension population, baPWV in 18-44 years group, 45-59 years group, above 60 years group were as follows: 1 299.3, 1 428.7 and 1 704.6 cm/s, respectively. For hypertension participants, the respective values of baPWV were: 1 498.4, 1 640.7 and 1 921.4 cm/s. BaPWV was significantly higher in hypertension group than non-hypertension group of respective age groups ( P< 0.05). (3) Multiple linear regression analysis defined risk factors of baPWV: Multivariate linear regression analysis showed that baPWV was positively correlated with SBP( t= 39.30, P< 0.001), and same results were found in the sub-age groups ( t -value was 37.72, 27.30, 9.15, all P< 0.001, respectively) after adjustment for other confounding factors, including age, sex, pulse pressure(PP), body mass index (BMI), fasting blood glucose (FBG), total cholesterol (TC), smoking, drinking, physical exercise, antihypertensive medications, lipid-lowering medication. (4) Multivariate logistic regression analysis of baPWV-related factors: After adjustment for other confounding factors, including age, sex, PP, BMI, FBG, TC, smoking, drinking, physical exercise, antihypertensive medication, lipid-lowering medication, multivariate logistic regression analysis showed that risks for increased arterial stiffness in hypertension group were higher than those in non-hypertension group, the OR in participants with hypertension was 2.54 (2.35-2.74) in the total participants, and same results were also found in sub-age groups, the OR s were 3.22(2.86-3.63), 2.48(2.23-2.76), and 1.91(1.42-2.56), respectively, in each sub-age group. Conclusion: SBP is positively related to arterial stiffness in different age groups, and hypertension is a risk factor for increased arterial stiffness in different age groups. Clinical Trial Registry Chinese Clinical Trial Registry, ChiCTR-TNC-11001489.
Determining Predictor Importance in Hierarchical Linear Models Using Dominance Analysis
ERIC Educational Resources Information Center
Luo, Wen; Azen, Razia
2013-01-01
Dominance analysis (DA) is a method used to evaluate the relative importance of predictors that was originally proposed for linear regression models. This article proposes an extension of DA that allows researchers to determine the relative importance of predictors in hierarchical linear models (HLM). Commonly used measures of model adequacy in…
Pease, J M; Morselli, M F
1987-01-01
This paper deals with a computer program adapted to a statistical method for analyzing an unlimited quantity of binary recorded data of an independent circular variable (e.g. wind direction), and a linear variable (e.g. maple sap flow volume). Circular variables cannot be statistically analyzed with linear methods, unless they have been transformed. The program calculates a critical quantity, the acrophase angle (PHI, phi o). The technique is adapted from original mathematics [1] and is written in Fortran 77 for easier conversion between computer networks. Correlation analysis can be performed following the program or regression which, because of the circular nature of the independent variable, becomes periodic regression. The technique was tested on a file of approximately 4050 data pairs.
NASA Astrophysics Data System (ADS)
Zia, Haider
2017-06-01
This paper describes an updated exponential Fourier based split-step method that can be applied to a greater class of partial differential equations than previous methods would allow. These equations arise in physics and engineering, a notable example being the generalized derivative non-linear Schrödinger equation that arises in non-linear optics with self-steepening terms. These differential equations feature terms that were previously inaccessible to model accurately with low computational resources. The new method maintains a 3rd order error even with these additional terms and models the equation in all three spatial dimensions and time. The class of non-linear differential equations that this method applies to is shown. The method is fully derived and implementation of the method in the split-step architecture is shown. This paper lays the mathematical ground work for an upcoming paper employing this method in white-light generation simulations in bulk material.
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
NASA Astrophysics Data System (ADS)
Di Giacomo, Domenico; Bondár, István; Storchak, Dmitry A.; Engdahl, E. Robert; Bormann, Peter; Harris, James
2015-02-01
This paper outlines the re-computation and compilation of the magnitudes now contained in the final ISC-GEM Reference Global Instrumental Earthquake Catalogue (1900-2009). The catalogue is available via the ISC website (http://www.isc.ac.uk/iscgem/). The available re-computed MS and mb provided an ideal basis for deriving new conversion relationships to moment magnitude MW. Therefore, rather than using previously published regression models, we derived new empirical relationships using both generalized orthogonal linear and exponential non-linear models to obtain MW proxies from MS and mb. The new models were tested against true values of MW, and the newly derived exponential models were then preferred to the linear ones in computing MW proxies. For the final magnitude composition of the ISC-GEM catalogue, we preferred directly measured MW values as published by the Global CMT project for the period 1976-2009 (plus intermediate-depth earthquakes between 1962 and 1975). In addition, over 1000 publications have been examined to obtain direct seismic moment M0 and, therefore, also MW estimates for 967 large earthquakes during 1900-1978 (Lee and Engdahl, 2015) by various alternative methods to the current GCMT procedure. In all other instances we computed MW proxy values by converting our re-computed MS and mb values into MW, using the newly derived non-linear regression models. The final magnitude composition is an improvement in terms of magnitude homogeneity compared to previous catalogues. The magnitude completeness is not homogeneous over the 110 years covered by the ISC-GEM catalogue. Therefore, seismicity rate estimates may be strongly affected without a careful time window selection. In particular, the ISC-GEM catalogue appears to be complete down to MW 5.6 starting from 1964, whereas for the early instrumental period the completeness varies from ∼7.5 to 6.2. Further time and resources would be necessary to homogenize the magnitude of completeness over the entire catalogue length.
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994
Evaluation of generalized degrees of freedom for sparse estimation by replica method
NASA Astrophysics Data System (ADS)
Sakata, A.
2016-12-01
We develop a method to evaluate the generalized degrees of freedom (GDF) for linear regression with sparse regularization. The GDF is a key factor in model selection, and thus its evaluation is useful in many modelling applications. An analytical expression for the GDF is derived using the replica method in the large-system-size limit with random Gaussian predictors. The resulting formula has a universal form that is independent of the type of regularization, providing us with a simple interpretation. Within the framework of replica symmetric (RS) analysis, GDF has a physical meaning as the effective fraction of non-zero components. The validity of our method in the RS phase is supported by the consistency of our results with previous mathematical results. The analytical results in the RS phase are calculated numerically using the belief propagation algorithm.
A simple method for identifying parameter correlations in partially observed linear dynamic models.
Li, Pu; Vu, Quoc Dong
2015-12-14
Parameter estimation represents one of the most significant challenges in systems biology. This is because biological models commonly contain a large number of parameters among which there may be functional interrelationships, thus leading to the problem of non-identifiability. Although identifiability analysis has been extensively studied by analytical as well as numerical approaches, systematic methods for remedying practically non-identifiable models have rarely been investigated. We propose a simple method for identifying pairwise correlations and higher order interrelationships of parameters in partially observed linear dynamic models. This is made by derivation of the output sensitivity matrix and analysis of the linear dependencies of its columns. Consequently, analytical relations between the identifiability of the model parameters and the initial conditions as well as the input functions can be achieved. In the case of structural non-identifiability, identifiable combinations can be obtained by solving the resulting homogenous linear equations. In the case of practical non-identifiability, experiment conditions (i.e. initial condition and constant control signals) can be provided which are necessary for remedying the non-identifiability and unique parameter estimation. It is noted that the approach does not consider noisy data. In this way, the practical non-identifiability issue, which is popular for linear biological models, can be remedied. Several linear compartment models including an insulin receptor dynamics model are taken to illustrate the application of the proposed approach. Both structural and practical identifiability of partially observed linear dynamic models can be clarified by the proposed method. The result of this method provides important information for experimental design to remedy the practical non-identifiability if applicable. The derivation of the method is straightforward and thus the algorithm can be easily implemented into a software packet.
2012-01-01
Background A link between low parental socioeconomic status and mental health problems in offspring is well established in previous research. The mechanisms that explain this link are largely unknown. The present study investigated whether school performance was a mediating and/or moderating factor in the path between parental socioeconomic status and the risk of hospital admission for non-fatal suicidal behaviour. Methods A national cohort of 447 929 children born during 1973-1977 was followed prospectively in the National Patient Discharge Register from the end of their ninth and final year of compulsory school until 2001. Multivariate Cox proportional hazards and linear regression analyses were performed to test whether the association between parental socioeconomic status and non-fatal suicidal behaviour was mediated or moderated by school performance. Results The results of a series of multiple regression analyses, adjusted for demographic variables, revealed that school performance was as an important mediator in the relationship between parental socioeconomic status and risk of non-fatal suicidal behaviour, accounting for 60% of the variance. The hypothesized moderation of parental socioeconomic status-non-fatal suicidal behaviour relationship by school performance was not supported. Conclusions School performance is an important mediator through which parental socioeconomic status translates into a risk for non-fatal suicidal behaviour. Prevention efforts aimed to reduce socioeconomic inequalities in non-fatal suicidal behaviour among young people will need to consider socioeconomic inequalities in school performance. PMID:22230577
Non-Linear Concentration-Response Relationships between Ambient Ozone and Daily Mortality.
Bae, Sanghyuk; Lim, Youn-Hee; Kashima, Saori; Yorifuji, Takashi; Honda, Yasushi; Kim, Ho; Hong, Yun-Chul
2015-01-01
Ambient ozone (O3) concentration has been reported to be significantly associated with mortality. However, linearity of the relationships and the presence of a threshold has been controversial. The aim of the present study was to examine the concentration-response relationship and threshold of the association between ambient O3 concentration and non-accidental mortality in 13 Japanese and Korean cities from 2000 to 2009. We selected Japanese and Korean cities which have population of over 1 million. We constructed Poisson regression models adjusting daily mean temperature, daily mean PM10, humidity, time trend, season, year, day of the week, holidays and yearly population. The association between O3 concentration and mortality was examined using linear, spline and linear-threshold models. The thresholds were estimated for each city, by constructing linear-threshold models. We also examined the city-combined association using a generalized additive mixed model. The mean O3 concentration did not differ greatly between Korea and Japan, which were 26.2 ppb and 24.2 ppb, respectively. Seven out of 13 cities showed better fits for the spline model compared with the linear model, supporting a non-linear relationships between O3 concentration and mortality. All of the 7 cities showed J or U shaped associations suggesting the existence of thresholds. The range of city-specific thresholds was from 11 to 34 ppb. The city-combined analysis also showed a non-linear association with a threshold around 30-40 ppb. We have observed non-linear concentration-response relationship with thresholds between daily mean ambient O3 concentration and daily number of non-accidental death in Japanese and Korean cities.
Consumption of non-cow's milk beverages and serum vitamin D levels in early childhood.
Lee, Grace J; Birken, Catherine S; Parkin, Patricia C; Lebovic, Gerald; Chen, Yang; L'Abbé, Mary R; Maguire, Jonathon L
2014-11-18
Vitamin D fortification of non-cow's milk beverages is voluntary in North America. The effect of consuming non-cow's milk beverages on serum 25-hydroxyvitamin D levels in children is unclear. We studied the association between non-cow's milk consumption and 25-hydroxyvitamin D levels in healthy preschool-aged children. We also explored whether cow's milk consumption modified this association and analyzed the association between daily non-cow's milk and cow's milk consumption. In this cross-sectional study, we recruited children 1-6 years of age attending routinely scheduled well-child visits. Survey responses, and anthropometric and laboratory measurements were collected. The association between non-cow's milk consumption and 25-hydroxyvitamin D levels was tested using multiple linear regression and logistic regression. Cow's milk consumption was explored as an effect modifier using an interaction term. The association between daily intake of non-cow's milk and cow's milk was explored using multiple linear regression. A total of 2831 children were included. The interaction between non-cow's milk and cow's milk consumption was statistically significant (p = 0.03). Drinking non-cow's milk beverages was associated with a 4.2-nmol/L decrease in 25-hydroxyvitamin D level per 250-mL cup consumed among children who also drank cow's milk (p = 0.008). Children who drank only non-cow's milk were at higher risk of having a 25-hydroxyvitamin D level below 50 nmol/L than children who drank only cow's milk (odds ratio 2.7, 95% confidence interval 1.6 to 4.7). Consumption of non-cow's milk beverages was associated with decreased serum 25-hydroxyvitamin D levels in early childhood. This association was modified by cow's milk consumption, which suggests a trade-off between consumption of cow's milk fortified with higher levels of vitamin D and non-cow's milk with lower vitamin D content. © 2014 Canadian Medical Association or its licensors.
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
The third molar development (TMD) has been widely utilized as one of the radiographic method for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated to the TMD regression model. This study aims to evaluate the performance of dental age estimation in individual method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data was divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² yielded an increment in the linear regressions of combined model as compared to the individual models. The overall decrease in RMSE was detected in combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, low value of adjusted R(2) and high RMSE except in male were exhibited in combined model. Dental age estimation is better predicted using combined model in multiple linear regression models. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Bertrand-Krajewski, J L
2004-01-01
In order to replace traditional sampling and analysis techniques, turbidimeters can be used to estimate TSS concentration in sewers, by means of sensor and site specific empirical equations established by linear regression of on-site turbidity Tvalues with TSS concentrations C measured in corresponding samples. As the ordinary least-squares method is not able to account for measurement uncertainties in both T and C variables, an appropriate regression method is used to solve this difficulty and to evaluate correctly the uncertainty in TSS concentrations estimated from measured turbidity. The regression method is described, including detailed calculations of variances and covariance in the regression parameters. An example of application is given for a calibrated turbidimeter used in a combined sewer system, with data collected during three dry weather days. In order to show how the established regression could be used, an independent 24 hours long dry weather turbidity data series recorded at 2 min time interval is used, transformed into estimated TSS concentrations, and compared to TSS concentrations measured in samples. The comparison appears as satisfactory and suggests that turbidity measurements could replace traditional samples. Further developments, including wet weather periods and other types of sensors, are suggested.
Yin, Xiaoxv; Yan, Shijiao; Tong, Yeqing; Peng, Xin; Yang, Tingting; Lu, Zuxun; Gong, Yanhong
2018-02-01
Tuberculosis (TB) poses a significant challenge to public health worldwide. Stigma is a major obstacle to TB control by leading to delay in diagnosis and treatment non-adherence. This study aimed to evaluate the status of TB-related stigma and its associated factors among TB patients in China. Cross-sectional survey. Thus, 1342 TB patients were recruited from TB dispensaries in three counties in Hubei Province using a multistage sampling method and surveyed using a structured anonymous questionnaire including validated scales to measure TB-related stigma. A generalised linear regression model was used to identify the factors associated with TB-related stigma. The average score on the TB-related Stigma Scale was 9.33 (SD = 4.25). Generalised linear regression analysis revealed that knowledge about TB (ß = -0.18, P = 0.0025), family function (ß = -0.29, P < 0.0001) and doctor-patient communication (ß = -0.32, P = 0.0005) were negatively associated with TB-related stigma. TB-related stigma was high among TB patients in China. Interventions concentrating on reducing TB patients' stigma in China should focus on improving patients' family function and patients' knowledge about TB. © 2017 John Wiley & Sons Ltd.
Construction of mathematical model for measuring material concentration by colorimetric method
NASA Astrophysics Data System (ADS)
Liu, Bing; Gao, Lingceng; Yu, Kairong; Tan, Xianghua
2018-06-01
This paper use the method of multiple linear regression to discuss the data of C problem of mathematical modeling in 2017. First, we have established a regression model for the concentration of 5 substances. But only the regression model of the substance concentration of urea in milk can pass through the significance test. The regression model established by the second sets of data can pass the significance test. But this model exists serious multicollinearity. We have improved the model by principal component analysis. The improved model is used to control the system so that it is possible to measure the concentration of material by direct colorimetric method.
NASA Astrophysics Data System (ADS)
Anick, David J.
2003-12-01
A method is described for a rapid prediction of B3LYP-optimized geometries for polyhedral water clusters (PWCs). Starting with a database of 121 B3LYP-optimized PWCs containing 2277 H-bonds, linear regressions yield formulas correlating O-O distances, O-O-O angles, and H-O-H orientation parameters, with local and global cluster descriptors. The formulas predict O-O distances with a rms error of 0.85 pm to 1.29 pm and predict O-O-O angles with a rms error of 0.6° to 2.2°. An algorithm is given which uses the O-O and O-O-O formulas to determine coordinates for the oxygen nuclei of a PWC. The H-O-H formulas then determine positions for two H's at each O. For 15 test clusters, the gap between the electronic energy of the predicted geometry and the true B3LYP optimum ranges from 0.11 to 0.54 kcal/mol or 4 to 18 cal/mol per H-bond. Linear regression also identifies 14 parameters that strongly correlate with PWC electronic energy. These descriptors include the number of H-bonds in which both oxygens carry a non-H-bonding H, the number of quadrilateral faces, the number of symmetric angles in 5- and in 6-sided faces, and the square of the cluster's estimated dipole moment.
High-resolution proxies for wood density variations in Terminalia superba
De Ridder, Maaike; Van den Bulcke, Jan; Vansteenkiste, Dries; Van Loo, Denis; Dierick, Manuel; Masschaele, Bert; De Witte, Yoni; Mannes, David; Lehmann, Eberhard; Beeckman, Hans; Van Hoorebeke, Luc; Van Acker, Joris
2011-01-01
Background and Aims Density is a crucial variable in forest and wood science and is evaluated by a multitude of methods. Direct gravimetric methods are mostly destructive and time-consuming. Therefore, faster and semi- to non-destructive indirect methods have been developed. Methods Profiles of wood density variations with a resolution of approx. 50 µm were derived from one-dimensional resistance drillings, two-dimensional neutron scans, and three-dimensional neutron and X-ray scans. All methods were applied on Terminalia superba Engl. & Diels, an African pioneer species which sometimes exhibits a brown heart (limba noir). Key Results The use of X-ray tomography combined with a reference material permitted direct estimates of wood density. These X-ray-derived densities overestimated gravimetrically determined densities non-significantly and showed high correlation (linear regression, R2 = 0·995). When comparing X-ray densities with the attenuation coefficients of neutron scans and the amplitude of drilling resistance, a significant linear relation was found with the neutron attenuation coefficient (R2 = 0·986) yet a weak relation with drilling resistance (R2 = 0·243). When density patterns are compared, all three methods are capable of revealing the same trends. Differences are mainly due to the orientation of tree rings and the different characteristics of the indirect methods. Conclusions High-resolution X-ray computed tomography is a promising technique for research on wood cores and will be explored further on other temperate and tropical species. Further study on limba noir is necessary to reveal the causes of density variations and to determine how resistance drillings can be further refined. PMID:21131386
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction.
Crop, F; Van Rompaye, B; Paelinck, L; Vakaet, L; Thierens, H; De Wagter, C
2008-07-21
The purpose of this study was both putting forward a statistically correct model for film calibration and the optimization of this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes, an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve with the dose as a function of OD (inverse regression) or sometimes OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because heteroscedasticity of the data is not taken into account. This could lead to erroneous results originating from the calibration process itself and thus to a lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method to create calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to a higher accuracy for film dosimetry.
DOA Finding with Support Vector Regression Based Forward-Backward Linear Prediction.
Pan, Jingjing; Wang, Yide; Le Bastard, Cédric; Wang, Tianzhen
2017-05-27
Direction-of-arrival (DOA) estimation has drawn considerable attention in array signal processing, particularly with coherent signals and a limited number of snapshots. Forward-backward linear prediction (FBLP) is able to directly deal with coherent signals. Support vector regression (SVR) is robust with small samples. This paper proposes the combination of the advantages of FBLP and SVR in the estimation of DOAs of coherent incoming signals with low snapshots. The performance of the proposed method is validated with numerical simulations in coherent scenarios, in terms of different angle separations, numbers of snapshots, and signal-to-noise ratios (SNRs). Simulation results show the effectiveness of the proposed method.
Face Hallucination with Linear Regression Model in Semi-Orthogonal Multilinear PCA Method
NASA Astrophysics Data System (ADS)
Asavaskulkiet, Krissada
2018-04-01
In this paper, we propose a new face hallucination technique, face images reconstruction in HSV color space with a semi-orthogonal multilinear principal component analysis method. This novel hallucination technique can perform directly from tensors via tensor-to-vector projection by imposing the orthogonality constraint in only one mode. In our experiments, we use facial images from FERET database to test our hallucination approach which is demonstrated by extensive experiments with high-quality hallucinated color faces. The experimental results assure clearly demonstrated that we can generate photorealistic color face images by using the SO-MPCA subspace with a linear regression model.
Samokhvalov, Andriy V.; Rehm, Jürgen; Roerecke, Michael
2015-01-01
Background Pancreatitis is a highly prevalent medical condition associated with a spectrum of endocrine and exocrine pancreatic insufficiencies. While high alcohol consumption is an established risk factor for pancreatitis, its relationship with specific types of pancreatitis and a potential threshold have not been systematically examined. Methods We conducted a systematic literature search for studies on the association between alcohol consumption and pancreatitis based on PRISMA guidelines. Non-linear and linear random-effect dose–response meta-analyses using restricted cubic spline meta-regressions and categorical meta-analyses in relation to abstainers were conducted. Findings Seven studies with 157,026 participants and 3618 cases of pancreatitis were included into analyses. The dose–response relationship between average volume of alcohol consumption and risk of pancreatitis was monotonic with no evidence of non-linearity for chronic pancreatitis (CP) for both sexes (p = 0.091) and acute pancreatitis (AP) in men (p = 0.396); it was non-linear for AP in women (p = 0.008). Compared to abstention, there was a significant decrease in risk (RR = 0.76, 95%CI: 0.60–0.97) of AP in women below the threshold of 40 g/day. No such association was found in men (RR = 1.1, 95%CI: 0.69–1.74). The RR for CP at 100 g/day was 6.29 (95%CI: 3.04–13.02). Interpretation The dose–response relationships between alcohol consumption and risk of pancreatitis were monotonic for CP and AP in men, and non-linear for AP in women. Alcohol consumption below 40 g/day was associated with reduced risk of AP in women. Alcohol consumption beyond this level was increasingly detrimental for any type of pancreatitis. Funding The work was financially supported by a grant from the National Institute on Alcohol Abuse and Alcoholism (R21AA023521) to the last author. PMID:26844279
On the equivalence of case-crossover and time series methods in environmental epidemiology.
Lu, Yun; Zeger, Scott L
2007-04-01
The case-crossover design was introduced in epidemiology 15 years ago as a method for studying the effects of a risk factor on a health event using only cases. The idea is to compare a case's exposure immediately prior to or during the case-defining event with that same person's exposure at otherwise similar "reference" times. An alternative approach to the analysis of daily exposure and case-only data is time series analysis. Here, log-linear regression models express the expected total number of events on each day as a function of the exposure level and potential confounding variables. In time series analyses of air pollution, smooth functions of time and weather are the main confounders. Time series and case-crossover methods are often viewed as competing methods. In this paper, we show that case-crossover using conditional logistic regression is a special case of time series analysis when there is a common exposure such as in air pollution studies. This equivalence provides computational convenience for case-crossover analyses and a better understanding of time series models. Time series log-linear regression accounts for overdispersion of the Poisson variance, while case-crossover analyses typically do not. This equivalence also permits model checking for case-crossover data using standard log-linear model diagnostics.
NASA Astrophysics Data System (ADS)
Shao-Qin, Lin; Xuan, Lin; Shi-Rong, Hu; Li-Qing, Zeng; Yan, Wang; Li, Chen; Jia-Ming, Liu; Long-Di, Li
2005-11-01
A new method for the determination of trace aluminum has been proposed. It is based on the fact that alizarin red can emit strong and stable fluorescence at 80 °C for 30 min and Al 3+ can effectively catalyze potassium chlorate oxidizing alizarin red to form non-fluorescence complex which cause the fluorescence quenching. The linear dynamic range of this method is 0.040-4.00 ng l -1 with a detection limit of 5.3 pg l -1. The regression equation can be expressed as Δ If = 8.731 + 21.73 c (ng l -1), with the correlation coefficient r = 0.9992 ( n = 6). This sensitive, rapid and accurate method has been applied to the determination of trace aluminum(III) in human hair and tea samples successfully. What is more, the mechanism of catalyzing potassium chlorate oxidizing alizarin red by the fluorescence quenching method is also discussed.
Shao-Qin, Lin; Xuan, Lin; Shi-Rong, Hu; Li-Qing, Zeng; Yan, Wang; Li, Chen; Jia-Ming, Liu; Long-Di, Li
2005-11-01
A new method for the determination of trace aluminum has been proposed. It is based on the fact that alizarin red can emit strong and stable fluorescence at 80 degrees C for 30 min and Al(3+) can effectively catalyze potassium chlorate oxidizing alizarin red to form non-fluorescence complex which cause the fluorescence quenching. The linear dynamic range of this method is 0.040-4.00 ngl(-1) with a detection limit of 5.3 pgl(-1). The regression equation can be expressed as DeltaI(f)=8.731+21.73c(Al(3+)) (ngl(-1)), with the correlation coefficient r=0.9992 (n=6). This sensitive, rapid and accurate method has been applied to the determination of trace aluminum(III) in human hair and tea samples successfully. What is more, the mechanism of catalyzing potassium chlorate oxidizing alizarin red by the fluorescence quenching method is also discussed.
Hayashi, Tatsuya; Saitoh, Satoshi; Takahashi, Junji; Tsuji, Yoshinori; Ikeda, Kenji; Kobayashi, Masahiro; Kawamura, Yusuke; Fujii, Takeshi; Inoue, Masafumi; Miyati, Tosiaki; Kumada, Hiromitsu
2017-04-01
The two-point Dixon method for magnetic resonance imaging (MRI) is commonly used to non-invasively measure fat deposition in the liver. The aim of the present study was to assess the usefulness of MRI-fat fraction (MRI-FF) using the two-point Dixon method based on the non-alcoholic fatty liver disease activity score. This retrospective study included 106 patients who underwent liver MRI and MR spectroscopy, and 201 patients who underwent liver MRI and histological assessment. The relationship between MRI-FF and MR spectroscopy-fat fraction was used to estimate the corrected MRI-FF for hepatic multi-peaks of fat. Then, a color FF map was generated with the corrected MRI-FF based on the non-alcoholic fatty liver disease activity score. We defined FF variability as the standard deviation of FF in regions of interest. Uniformity of hepatic fat was visually graded on a three-point scale using both gray-scale and color FF maps. Confounding effects of histology (iron, inflammation and fibrosis) on corrected MRI-FF were assessed by multiple linear regression. The linear correlations between MRI-FF and MR spectroscopy-fat fraction, and between corrected MRI-FF and histological steatosis were strong (R 2 = 0.90 and R 2 = 0.88, respectively). Liver fat variability significantly increased with visual fat uniformity grade using both of the maps (ρ = 0.67-0.69, both P < 0.001). Hepatic iron, inflammation and fibrosis had no significant confounding effects on the corrected MRI-FF (all P > 0.05). The two-point Dixon method and the gray-scale or color FF maps based on the non-alcoholic fatty liver disease activity score were useful for fat quantification in the liver of patients without severe iron deposition. © 2016 The Japan Society of Hepatology.
Biostatistics Series Module 6: Correlation and Linear Regression.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient ( r ). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r 2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation ( y = a + bx ), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
Biostatistics Series Module 6: Correlation and Linear Regression
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous. PMID:27904175
NASA Astrophysics Data System (ADS)
Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.
2008-04-01
Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States
NASA Astrophysics Data System (ADS)
Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin
2016-03-01
From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states.
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States
Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin
2016-01-01
From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states. PMID:26996254
Cámara, María S; Ferroni, Félix M; De Zan, Mercedes; Goicoechea, Héctor C
2003-07-01
An improvement is presented on the simultaneous determination of two active ingredients present in unequal concentrations in injections. The analysis was carried out with spectrophotometric data and non-linear multivariate calibration methods, in particular artificial neural networks (ANNs). The presence of non-linearities caused by the major analyte concentrations which deviate from Beer's law was confirmed by plotting actual vs. predicted concentrations, and observing curvatures in the residuals for the estimated concentrations with linear methods. Mixtures of dextropropoxyphene and dipyrone have been analysed by using linear and non-linear partial least-squares (PLS and NPLSs) and ANNs. Notwithstanding the high degree of spectral overlap and the occurrence of non-linearities, rapid and simultaneous analysis has been achieved, with reasonably good accuracy and precision. A commercial sample was analysed by using the present methodology, and the obtained results show reasonably good agreement with those obtained by using high-performance liquid chromatography (HPLC) and a UV-spectrophotometric comparative methods.
Linear regression models and k-means clustering for statistical analysis of fNIRS data.
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-02-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.
Linear regression models and k-means clustering for statistical analysis of fNIRS data
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-01-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets. PMID:25780751
Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello
2016-05-01
Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite of the advantages of turbidimetric bioassays when compared to other methods, it has limitations concerning the linearity and range of the dose-response curve determination. Here, we proposed to use partial least squares (PLS) regression to solve these limitations and to improve the prediction of relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramacyin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growths were measured as absorbance up to 180 and 300min for apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbances or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression, however there were significant deviation of linearity. Thus, they could not be used for relative potency estimations. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting and it improved the linear range of turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from agar diffusion official methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics with significant advantages when compared to conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.
[Multivariate Adaptive Regression Splines (MARS), an alternative for the analysis of time series].
Vanegas, Jairo; Vásquez, Fabián
Multivariate Adaptive Regression Splines (MARS) is a non-parametric modelling method that extends the linear model, incorporating nonlinearities and interactions between variables. It is a flexible tool that automates the construction of predictive models: selecting relevant variables, transforming the predictor variables, processing missing values and preventing overshooting using a self-test. It is also able to predict, taking into account structural factors that might influence the outcome variable, thereby generating hypothetical models. The end result could identify relevant cut-off points in data series. It is rarely used in health, so it is proposed as a tool for the evaluation of relevant public health indicators. For demonstrative purposes, data series regarding the mortality of children under 5 years of age in Costa Rica were used, comprising the period 1978-2008. Copyright © 2016 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.
Naimi, Ashley I; Cole, Stephen R; Kennedy, Edward H
2017-04-01
Robins' generalized methods (g methods) provide consistent estimates of contrasts (e.g. differences, ratios) of potential outcomes under a less restrictive set of identification conditions than do standard regression methods (e.g. linear, logistic, Cox regression). Uptake of g methods by epidemiologists has been hampered by limitations in understanding both conceptual and technical details. We present a simple worked example that illustrates basic concepts, while minimizing technical complications. © The Author 2016; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association.
NASA Astrophysics Data System (ADS)
Shi, Jinfei; Zhu, Songqing; Chen, Ruwen
2017-12-01
An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method.
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2015-11-18
Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available.
NASA Astrophysics Data System (ADS)
de Andrés, Javier; Landajo, Manuel; Lorca, Pedro; Labra, Jose; Ordóñez, Patricia
Artificial neural networks have proven to be useful tools for solving financial analysis problems such as financial distress prediction and audit risk assessment. In this paper we focus on the performance of robust (least absolute deviation-based) neural networks on measuring liquidity of firms. The problem of learning the bivariate relationship between the components (namely, current liabilities and current assets) of the so-called current ratio is analyzed, and the predictive performance of several modelling paradigms (namely, linear and log-linear regressions, classical ratios and neural networks) is compared. An empirical analysis is conducted on a representative data base from the Spanish economy. Results indicate that classical ratio models are largely inadequate as a realistic description of the studied relationship, especially when used for predictive purposes. In a number of cases, especially when the analyzed firms are microenterprises, the linear specification is improved by considering the flexible non-linear structures provided by neural networks.
Mohammadi Majd, Tahereh; Kalantari, Shiva; Raeisi Shahraki, Hadi; Nafar, Mohsen; Almasi, Afshin; Samavat, Shiva; Parvin, Mahmoud; Hashemian, Amirhossein
2018-03-10
IgA nephropathy (IgAN) is the most common primary glomerulonephritis diagnosed based on renal biopsy. Mesangial IgA deposits along with the proliferation of mesangial cells are the histologic hallmark of IgAN. Non-invasive diagnostic tools may help to prompt diagnosis and therapy. The discovery of potential and reliable urinary biomarkers for diagnosis of IgAN depends on applying robust and suitable models. Applying two multivariate modeling methods on a urine proteomic dataset obtained from IgAN patients, and comparison of the results of these methods were the purpose of this study. Two models were constructed for urinary protein profiles of 13 patients and 8 healthy individuals, based on sparse linear discriminant analysis (SLDA) and elastic net regression methods. A panel of selected biomarkers with the best coefficients were proposed and further analyzed for biological relevance using functional annotation and pathway analysis. Transferrin, α1-antitrypsin, and albumin fragments were the most important up-regulated biomarkers, while fibulin-5, YIP1 family member 3, prasoposin, and osteopontin were the most important down-regulated biomarkers. Pathway analysis revealed that complement and coagulation cascades and extracellular matrix-receptor interaction pathways impaired in the pathogenesis of IgAN. SLDA and elastic net had an equal importance for diagnosis of IgAN and were useful methods for exploring and processing proteomic data. In addition, the suggested biomarkers are reliable candidates for further validation to non-invasive diagnose of IgAN based on urine examination.
Modelling daily water temperature from air temperature for the Missouri River.
Zhu, Senlin; Nyarko, Emmanuel Karlo; Hadzima-Nyarko, Marijana
2018-01-01
The bio-chemical and physical characteristics of a river are directly affected by water temperature, which thereby affects the overall health of aquatic ecosystems. It is a complex problem to accurately estimate water temperature. Modelling of river water temperature is usually based on a suitable mathematical model and field measurements of various atmospheric factors. In this article, the air-water temperature relationship of the Missouri River is investigated by developing three different machine learning models (Artificial Neural Network (ANN), Gaussian Process Regression (GPR), and Bootstrap Aggregated Decision Trees (BA-DT)). Standard models (linear regression, non-linear regression, and stochastic models) are also developed and compared to machine learning models. Analyzing the three standard models, the stochastic model clearly outperforms the standard linear model and nonlinear model. All the three machine learning models have comparable results and outperform the stochastic model, with GPR having slightly better results for stations No. 2 and 3, while BA-DT has slightly better results for station No. 1. The machine learning models are very effective tools which can be used for the prediction of daily river temperature.
NASA Astrophysics Data System (ADS)
Rijal, Santosh
Various military training activities are conducted in more than 11.3 million hectares of land (> 5,500 training sites) in the United States (U.S.). These training activities directly and indirectly degrade the land. Land degradation can impede continuous military training. In order to sustain long term training missions and Army combat readiness, the environmental conditions of the military installations need to be carefully monitored and assessed. Furthermore, the National Environmental Policy Act of 1969 (NEPA) and the U.S. Army Regulation 200-2 require the DoD to minimize the environmental impacts of training and document the environmental consequences of their actions. To achieve these objectives, the Department of Army initiated an Integrated Training Area Management (ITAM) program to manage training lands through assessing their environmental requirements and establishing policies and procedures to achieve optimum, sustainable use of training lands. One of the programs under ITAM, Range and Training Land Assessment (RTLA) was established to collect field-based data for monitoring installation's environmental condition. Due to high cost and inefficiencies involved in the collection of field data, the RTLA program was stopped in several military installations. Therefore, there has been a strong need to develop an efficient and low cost remote sensing based methodology for assessing and monitoring land conditions of military installations. It is also important to make a long-term assessment of installation land condition for understanding cumulative impacts of continuous military training activities. Additionally, it is unclear that compared to non-military land condition, to what extent military training activities have led to the degradation of land condition for military installations. The first paper of this dissertation developed a soil erosion relevant and image derived cover factor (ICF) based on linear spectral mixture (LSM) analysis to assess and monitor the land condition of military land and compare it with non-military land. The results from this study can provide FR land managers with the information of the spatial variation and temporal trend of land condition in FR. Fort Riley land managers can also use this method for monitoring their land condition at a very low cost. This method can thus be applied to other military installations as well as non-military lands. Furthermore, one of the most significant environmental problems in military installations of the U.S. is the formation of gullies due to the intensive use of military vehicle. However, to our knowledge, no remote sensing based method has been developed and used to assess the detection of gullies in military installations. In the second paper of this dissertation, light detection and ranging (LiDAR) derived digital elevation model (DEM) of 2010 and WorldView-2 images of 2010 were used to quantify the gullies in FR. This method can be easily applied to assess gullies in non-military installations. On the other hand, modeling the land condition of military installation is critical to understand the spatial and temporal pattern of military training induced disturbance and land recovery. In the third paper, it was assumed that the military training induced disturbance was spatially auto-correlated and thus four regression models including i) linear stepwise regression (LSR) ii) logistic regression (LR), iii) geographically weighted linear regression (GWR), and iv) geographically weighted logistic regression (GWLR) were developed and compared using remote sensing image derived spectral variables for years 1990, 1997, 1998, 1999, and 2001. It was found that the spatial distribution of the military training disturbance was well demonstrated by all the regression models with higher intensities of military training disturbance in the northwest and central west parts of the installation. Compared to other regression models, GWR accurately estimated the land condition of FR. This result provided the applicability of using local variability based regression model to accurately predict land condition. Different plant communities of military installations respond differently to military training induced disturbance. The information of the spatial distribution of plant species in military installations is important to gain insight of the resilient capacity of the land following disturbances. For the purpose, in the fourth paper, hyperspectral in-situ data were collected from FR and KPBS in the summer of 2015 using a hyperspectral instrument. Principal component analysis (PCA) and band relative importance (BRI) were used to identify relative importance of each of the spectral bands. The results from this study provided useful information about the optimal wavelengths that help distinguish different plant species of FR and can be easily used with high resolution hyperspectral images for mapping the spatial distribution of the plant species. This information will be helpful for the sustainable management of the tallgrass prairie ecosystem. (Abstract shortened by ProQuest.).
Application of Linear and Non-Linear Harmonic Methods for Unsteady Transonic Flow
NASA Astrophysics Data System (ADS)
Gundevia, Rayomand
This thesis explores linear and non-linear computational methods for solving unsteady flow. The eventual goal is to apply these methods to two-dimensional and three-dimensional flutter predictions. In this study the quasi-one-dimensional nozzle is used as a framework for understanding these methods and their limitations. Subsonic and transonic cases are explored as the back-pressure is forced to oscillate with known amplitude and frequency. A steady harmonic approach is used to solve this unsteady problem for which perturbations are said to be small in comparison to the mean flow. The use of a linearized Euler equations (LEE) scheme is good at capturing the flow characteristics but is limited by accuracy to relatively small amplitude perturbations. The introduction of time-averaged second-order terms in the Non-Linear Harmonic (NLH) method means that a better approximation of the mean-valued solution, upon which the linearization is based, can be made. The nonlinear time-accurate Euler solutions are used for comparison and to establish the regimes of unsteadiness for which these schemes fails. The usefulness of the LEE and NLH methods lie in the gains in computational efficiency over the full equations.
Non-linear vibrations of sandwich viscoelastic shells
NASA Astrophysics Data System (ADS)
Benchouaf, Lahcen; Boutyour, El Hassan; Daya, El Mostafa; Potier-Ferry, Michel
2018-04-01
This paper deals with the non-linear vibration of sandwich viscoelastic shell structures. Coupling a harmonic balance method with the Galerkin's procedure, one obtains an amplitude equation depending on two complex coefficients. The latter are determined by solving a classical eigenvalue problem and two linear ones. This permits to get the non-linear frequency and the non-linear loss factor as functions of the displacement amplitude. To validate our approach, these relationships are illustrated in the case of a circular sandwich ring.
Pang, Haowen; Sun, Xiaoyang; Yang, Bo; Wu, Jingbo
2018-05-01
To ensure good quality intensity-modulated radiation therapy (IMRT) planning, we proposed the use of a quality control method based on generalized equivalent uniform dose (gEUD) that predicts absorbed radiation doses in organs at risk (OAR). We conducted a retrospective analysis of patients who underwent IMRT for the treatment of cervical carcinoma, nasopharyngeal carcinoma (NPC), or non-small cell lung cancer (NSCLC). IMRT plans were randomly divided into data acquisition and data verification groups. OAR in the data acquisition group for cervical carcinoma and NPC were further classified as sub-organs at risk (sOAR). The normalized volume of sOAR and normalized gEUD (a = 1) were analyzed using multiple linear regression to establish a fitting formula. For NSCLC, the normalized intersection volume of the planning target volume (PTV) and lung, the maximum diameter of the PTV (left-right, anterior-posterior, and superior-inferior), and the normalized gEUD (a = 1) were analyzed using multiple linear regression to establish a fitting formula for the lung gEUD (a = 1). The r-squared and P values indicated that the fitting formula was a good fit. In the data verification group, IMRT plans verified the accuracy of the fitting formula, and compared the gEUD (a = 1) for each OAR between the subjective method and the gEUD-based method. In conclusion, the gEUD-based method can be used effectively for quality control and can reduce the influence of subjective factors on IMRT planning optimization. © 2018 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
Darsazan, Bahar; Shafaati, Alireza; Mortazavi, Seyed Alireza; Zarghi, Afshin
2017-01-01
A simple and reliable stability-indicating RP-HPLC method was developed and validated for analysis of adefovir dipivoxil (ADV).The chromatographic separation was performed on a C 18 column using a mixture of acetonitrile-citrate buffer (10 mM at pH 5.2) 36:64 (%v/v) as mobile phase, at a flow rate of 1.5 mL/min. Detection was carried out at 260 nm and a sharp peak was obtained for ADV at a retention time of 5.8 ± 0.01 min. No interferences were observed from its stress degradation products. The method was validated according to the international guidelines. Linear regression analysis of data for the calibration plot showed a linear relationship between peak area and concentration over the range of 0.5-16 μg/mL; the regression coefficient was 0.9999and the linear regression equation was y = 24844x-2941.3. The detection (LOD) and quantification (LOQ) limits were 0.12 and 0.35 μg/mL, respectively. The results proved the method was fast (analysis time less than 7 min), precise, reproducible, and accurate for analysis of ADV over a wide range of concentration. The proposed specific method was used for routine quantification of ADV in pharmaceutical bulk and a tablet dosage form.
Buchvold, Hogne Vikanes; Pallesen, Ståle; Waage, Siri; Bjorvatn, Bjørn
2018-05-01
Objectives The aim of this study was to investigate changes in body mass index (BMI) between different work schedules and different average number of yearly night shifts over a four-year follow-up period. Methods A prospective study of Norwegian nurses (N=2965) with different work schedules was conducted: day only, two-shift rotation (day and evening shifts), three-shift rotation (day, evening and night shifts), night only, those who changed towards night shifts, and those who changed away from schedules containing night shifts. Paired student's t-tests were used to evaluate within subgroup changes in BMI. Multiple linear regression analysis was used to evaluate between groups effects on BMI when adjusting for BMI at baseline, sex, age, marital status, children living at home, and years since graduation. The same regression model was used to evaluate the effect of average number of yearly night shifts on BMI change. Results We found that night workers [mean difference (MD) 1.30 (95% CI 0.70-1.90)], two shift workers [MD 0.48 (95% CI 0.20-0.75)], three shift workers [MD 0.46 (95% CI 0.30-0.62)], and those who changed work schedule away from [MD 0.57 (95% CI 0.17-0.84)] or towards night work [MD 0.63 (95% CI 0.20-1.05)] all had significant BMI gain (P<0.01) during the follow-up period. However, day workers had a non-significant BMI gain. Using adjusted multiple linear regressions, we found that night workers had significantly larger BMI gain compared to day workers [B=0.89 (95% CI 0.06-1.72), P<0.05]. We did not find any significant association between average number of yearly night shifts and BMI change using our multiple linear regression model. Conclusions After adjusting for possible confounders, we found that BMI increased significantly more among night workers compared to day workers.
SAND: an automated VLBI imaging and analysing pipeline - I. Stripping component trajectories
NASA Astrophysics Data System (ADS)
Zhang, M.; Collioud, A.; Charlot, P.
2018-02-01
We present our implementation of an automated very long baseline interferometry (VLBI) data-reduction pipeline that is dedicated to interferometric data imaging and analysis. The pipeline can handle massive VLBI data efficiently, which makes it an appropriate tool to investigate multi-epoch multiband VLBI data. Compared to traditional manual data reduction, our pipeline provides more objective results as less human interference is involved. The source extraction is carried out in the image plane, while deconvolution and model fitting are performed in both the image plane and the uv plane for parallel comparison. The output from the pipeline includes catalogues of CLEANed images and reconstructed models, polarization maps, proper motion estimates, core light curves and multiband spectra. We have developed a regression STRIP algorithm to automatically detect linear or non-linear patterns in the jet component trajectories. This algorithm offers an objective method to match jet components at different epochs and to determine their proper motions.
Designing pinhole vacancies in graphene towards functionalization: Effects on critical buckling load
NASA Astrophysics Data System (ADS)
Georgantzinos, S. K.; Markolefas, S.; Giannopoulos, G. I.; Katsareas, D. E.; Anifantis, N. K.
2017-03-01
The effect of size and placement of pinhole-type atom vacancies on Euler's critical load on free-standing, monolayer graphene, is investigated. The graphene is modeled by a structural spring-based finite element approach, in which every interatomic interaction is approached as a linear spring. The geometry of graphene and the pinhole size lead to the assembly of the stiffness matrix of the nanostructure. Definition of the boundary conditions of the problem leads to the solution of the eigenvalue problem and consequently to the critical buckling load. Comparison to results found in the literature illustrates the validity and accuracy of the proposed method. Parametric analysis regarding the placement and size of the pinhole-type vacancy, as well as the graphene geometry, depicts the effects on critical buckling load. Non-linear regression analysis leads to empirical-analytical equations for predicting the buckling behavior of graphene, with engineered pinhole-type atom vacancies.
Dafsari, Haidar Salimi; Weiß, Luisa; Silverdale, Monty; Rizos, Alexandra; Reddy, Prashanth; Ashkan, Keyoumars; Evans, Julian; Reker, Paul; Petry-Schmelzer, Jan Niklas; Samuel, Michael; Visser-Vandewalle, Veerle; Antonini, Angelo; Martinez-Martin, Pablo; Ray-Chaudhuri, K; Timmermann, Lars
2018-02-24
Subthalamic nucleus (STN) deep brain stimulation (DBS) improves quality of life (QoL), motor, and non-motor symptoms (NMS) in advanced Parkinson's disease (PD). However, considerable inter-individual variability has been observed for QoL outcome. We hypothesized that demographic and preoperative NMS characteristics can predict postoperative QoL outcome. In this ongoing, prospective, multicenter study (Cologne, Manchester, London) including 88 patients, we collected the following scales preoperatively and on follow-up 6 months postoperatively: PDQuestionnaire-8 (PDQ-8), NMSScale (NMSS), NMSQuestionnaire (NMSQ), Scales for Outcomes in PD (SCOPA)-motor examination, -complications, and -activities of daily living, levodopa equivalent daily dose. We dichotomized patients into "QoL responders"/"non-responders" and screened for factors associated with QoL improvement with (1) Spearman-correlations between baseline test scores and QoL improvement, (2) step-wise linear regressions with baseline test scores as independent and QoL improvement as dependent variables, (3) logistic regressions using aforementioned "responders/non-responders" as dependent variable. All outcomes improved significantly on follow-up. However, approximately 44% of patients were categorized as "QoL non-responders". Spearman-correlations, linear and logistic regression analyses were significant for NMSS and NMSQ but not for SCOPA-motor examination. Post-hoc, we identified specific NMS (flat moods, difficulties experiencing pleasure, pain, bladder voiding) as significant contributors to QoL outcome. Our results provide evidence that QoL improvement after STN-DBS depends on preoperative NMS characteristics. These findings are important in the advising and selection of individuals for DBS therapy. Future studies investigating motor and non-motor PD clusters may enable stratifying QoL outcomes and help predict patients' individual prospects of benefiting from DBS. Copyright © 2018. Published by Elsevier Inc.
A Solution to Separation and Multicollinearity in Multiple Logistic Regression
Shen, Jianzhao; Gao, Sujuan
2010-01-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study. PMID:20376286
A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.
Inferring gene regression networks with model trees
2010-01-01
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear regressions to separate areas of the search space favoring to infer localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
Robust neural network with applications to credit portfolio data analysis.
Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun
2010-01-01
In this article, we study nonparametric conditional quantile estimation via neural network structure. We proposed an estimation method that combines quantile regression and neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm was developed for optimization. Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in real data application demonstrate the advantage of the newly proposed procedure.
Population response to climate change: linear vs. non-linear modeling approaches.
Ellis, Alicia M; Post, Eric
2004-03-31
Research on the ecological consequences of global climate change has elicited a growing interest in the use of time series analysis to investigate population dynamics in a changing climate. Here, we compare linear and non-linear models describing the contribution of climate to the density fluctuations of the population of wolves on Isle Royale, Michigan from 1959 to 1999. The non-linear self excitatory threshold autoregressive (SETAR) model revealed that, due to differences in the strength and nature of density dependence, relatively small and large populations may be differentially affected by future changes in climate. Both linear and non-linear models predict a decrease in the population of wolves with predicted changes in climate. Because specific predictions differed between linear and non-linear models, our study highlights the importance of using non-linear methods that allow the detection of non-linearity in the strength and nature of density dependence. Failure to adopt a non-linear approach to modelling population response to climate change, either exclusively or in addition to linear approaches, may compromise efforts to quantify ecological consequences of future warming.
Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi
2013-09-01
Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). Furthermore, examples are given as how to evaluate the linearity over the entire concentration range when only two concentration levels are used for linear regression. To conclude, two-concentration linear regression is accurate and robust enough for routine use in regulated LC-MS bioanalysis and it significantly saves time and cost as well. Copyright © 2013 Elsevier B.V. All rights reserved.
Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma
2016-01-01
Aim: This research was conducted to determine the most affecting parameters on hatchability of indigenous and improved local chickens’ eggs. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influencing one on hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. Alexandria strain has the highest significant commercial hatchability (80.70%). Regarding the studied strains, highly significant differences in hatching chick weight among strains were observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666
A wavelet method for modeling and despiking motion artifacts from resting-state fMRI time series.
Patel, Ameera X; Kundu, Prantik; Rubinov, Mikail; Jones, P Simon; Vértes, Petra E; Ersche, Karen D; Suckling, John; Bullmore, Edward T
2014-07-15
The impact of in-scanner head movement on functional magnetic resonance imaging (fMRI) signals has long been established as undesirable. These effects have been traditionally corrected by methods such as linear regression of head movement parameters. However, a number of recent independent studies have demonstrated that these techniques are insufficient to remove motion confounds, and that even small movements can spuriously bias estimates of functional connectivity. Here we propose a new data-driven, spatially-adaptive, wavelet-based method for identifying, modeling, and removing non-stationary events in fMRI time series, caused by head movement, without the need for data scrubbing. This method involves the addition of just one extra step, the Wavelet Despike, in standard pre-processing pipelines. With this method, we demonstrate robust removal of a range of different motion artifacts and motion-related biases including distance-dependent connectivity artifacts, at a group and single-subject level, using a range of previously published and new diagnostic measures. The Wavelet Despike is able to accommodate the substantial spatial and temporal heterogeneity of motion artifacts and can consequently remove a range of high and low frequency artifacts from fMRI time series, that may be linearly or non-linearly related to physical movements. Our methods are demonstrated by the analysis of three cohorts of resting-state fMRI data, including two high-motion datasets: a previously published dataset on children (N=22) and a new dataset on adults with stimulant drug dependence (N=40). We conclude that there is a real risk of motion-related bias in connectivity analysis of fMRI data, but that this risk is generally manageable, by effective time series denoising strategies designed to attenuate synchronized signal transients induced by abrupt head movements. The Wavelet Despiking software described in this article is freely available for download at www.brainwavelet.org. Copyright © 2014. Published by Elsevier Inc.
A wavelet method for modeling and despiking motion artifacts from resting-state fMRI time series
Patel, Ameera X.; Kundu, Prantik; Rubinov, Mikail; Jones, P. Simon; Vértes, Petra E.; Ersche, Karen D.; Suckling, John; Bullmore, Edward T.
2014-01-01
The impact of in-scanner head movement on functional magnetic resonance imaging (fMRI) signals has long been established as undesirable. These effects have been traditionally corrected by methods such as linear regression of head movement parameters. However, a number of recent independent studies have demonstrated that these techniques are insufficient to remove motion confounds, and that even small movements can spuriously bias estimates of functional connectivity. Here we propose a new data-driven, spatially-adaptive, wavelet-based method for identifying, modeling, and removing non-stationary events in fMRI time series, caused by head movement, without the need for data scrubbing. This method involves the addition of just one extra step, the Wavelet Despike, in standard pre-processing pipelines. With this method, we demonstrate robust removal of a range of different motion artifacts and motion-related biases including distance-dependent connectivity artifacts, at a group and single-subject level, using a range of previously published and new diagnostic measures. The Wavelet Despike is able to accommodate the substantial spatial and temporal heterogeneity of motion artifacts and can consequently remove a range of high and low frequency artifacts from fMRI time series, that may be linearly or non-linearly related to physical movements. Our methods are demonstrated by the analysis of three cohorts of resting-state fMRI data, including two high-motion datasets: a previously published dataset on children (N = 22) and a new dataset on adults with stimulant drug dependence (N = 40). We conclude that there is a real risk of motion-related bias in connectivity analysis of fMRI data, but that this risk is generally manageable, by effective time series denoising strategies designed to attenuate synchronized signal transients induced by abrupt head movements. The Wavelet Despiking software described in this article is freely available for download at www.brainwavelet.org. PMID:24657353
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing. PMID:21849043
NIPTmer: rapid k-mer-based software package for detection of fetal aneuploidies.
Sauk, Martin; Žilina, Olga; Kurg, Ants; Ustav, Eva-Liina; Peters, Maire; Paluoja, Priit; Roost, Anne Mari; Teder, Hindrek; Palta, Priit; Brison, Nathalie; Vermeesch, Joris R; Krjutškov, Kaarel; Salumets, Andres; Kaplinski, Lauris
2018-04-04
Non-invasive prenatal testing (NIPT) is a recent and rapidly evolving method for detecting genetic lesions, such as aneuploidies, of a fetus. However, there is a need for faster and cheaper laboratory and analysis methods to make NIPT more widely accessible. We have developed a novel software package for detection of fetal aneuploidies from next-generation low-coverage whole genome sequencing data. Our tool - NIPTmer - is based on counting pre-defined per-chromosome sets of unique k-mers from raw sequencing data, and applying linear regression model on the counts. Additionally, the filtering process used for k-mer list creation allows one to take into account the genetic variance in a specific sample, thus reducing the source of uncertainty. The processing time of one sample is less than 10 CPU-minutes on a high-end workstation. NIPTmer was validated on a cohort of 583 NIPT samples and it correctly predicted 37 non-mosaic fetal aneuploidies. NIPTmer has the potential to reduce significantly the time and complexity of NIPT post-sequencing analysis compared to mapping-based methods. For non-commercial users the software package is freely available at http://bioinfo.ut.ee/NIPTMer/ .
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. Copyright © 2013, The International Biometric Society.
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS data center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. Variation explained for each regression tree by taxa were: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxa and assess validity of resulting predictions from regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data and graphical results indicated models were well fit to the data.
Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression
Peng, Limin; Xu, Jinfeng; Kutner, Nancy
2013-01-01
Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515
NASA Technical Reports Server (NTRS)
Ratnayake, Nalin A.; Koshimoto, Ed T.; Taylor, Brian R.
2011-01-01
The problem of parameter estimation on hybrid-wing-body type aircraft is complicated by the fact that many design candidates for such aircraft involve a large number of aero- dynamic control effectors that act in coplanar motion. This fact adds to the complexity already present in the parameter estimation problem for any aircraft with a closed-loop control system. Decorrelation of system inputs must be performed in order to ascertain individual surface derivatives with any sort of mathematical confidence. Non-standard control surface configurations, such as clamshell surfaces and drag-rudder modes, further complicate the modeling task. In this paper, asymmetric, single-surface maneuvers are used to excite multiple axes of aircraft motion simultaneously. Time history reconstructions of the moment coefficients computed by the solved regression models are then compared to each other in order to assess relative model accuracy. The reduced flight-test time required for inner surface parameter estimation using multi-axis methods was found to come at the cost of slightly reduced accuracy and statistical confidence for linear regression methods. Since the multi-axis maneuvers captured parameter estimates similar to both longitudinal and lateral-directional maneuvers combined, the number of test points required for the inner, aileron-like surfaces could in theory have been reduced by 50%. While trends were similar, however, individual parameters as estimated by a multi-axis model were typically different by an average absolute difference of roughly 15-20%, with decreased statistical significance, than those estimated by a single-axis model. The multi-axis model exhibited an increase in overall fit error of roughly 1-5% for the linear regression estimates with respect to the single-axis model, when applied to flight data designed for each, respectively.
Age and mortality after injury: is the association linear?
Friese, R S; Wynne, J; Joseph, B; Hashmi, A; Diven, C; Pandit, V; O'Keeffe, T; Zangbar, B; Kulvatunyou, N; Rhee, P
2014-10-01
Multiple studies have demonstrated a linear association between advancing age and mortality after injury. An inflection point, or an age at which outcomes begin to differ, has not been previously described. We hypothesized that the relationship between age and mortality after injury is non-linear and an inflection point exists. We performed a retrospective cohort analysis at our urban level I center from 2007 through 2009. All patients aged 65 years and older with the admission diagnosis of injury were included. Non-parametric logistic regression was used to identify the functional form between mortality and age. Multivariate logistic regression was utilized to explore the association between age and mortality. Age 65 years was used as the reference. Significance was defined as p < 0.05. A total of 1,107 patients were included in the analysis. One-third required intensive care unit (ICU) admission and 48 % had traumatic brain injury. 229 patients (20.6 %) were 84 years of age or older. The overall mortality was 7.2 %. Our model indicates that mortality is a quadratic function of age. After controlling for confounders, age is associated with mortality with a regression coefficient of 1.08 for the linear term (p = 0.02) and a regression coefficient of -0.006 for the quadratic term (p = 0.03). The model identified 84.4 years of age as the inflection point at which mortality rates begin to decline. The risk of death after injury varies linearly with age until 84 years. After 84 years of age, the mortality rates decline. These findings may reflect the varying severity of comorbidities and differences in baseline functional status in elderly trauma patients. Specifically, a proportion of our injured patient population less than 84 years old may be more frail, contributing to increased mortality after trauma, whereas a larger proportion of our injured patients over 84 years old, by virtue of reaching this advanced age, may, in fact, be less frail, contributing to less risk of death.
Dynamic decomposition of spatiotemporal neural signals
2017-01-01
Neural signals are characterized by rich temporal and spatiotemporal dynamics that reflect the organization of cortical networks. Theoretical research has shown how neural networks can operate at different dynamic ranges that correspond to specific types of information processing. Here we present a data analysis framework that uses a linearized model of these dynamic states in order to decompose the measured neural signal into a series of components that capture both rhythmic and non-rhythmic neural activity. The method is based on stochastic differential equations and Gaussian process regression. Through computer simulations and analysis of magnetoencephalographic data, we demonstrate the efficacy of the method in identifying meaningful modulations of oscillatory signals corrupted by structured temporal and spatiotemporal noise. These results suggest that the method is particularly suitable for the analysis and interpretation of complex temporal and spatiotemporal neural signals. PMID:28558039
Do non-melanoma skin cancer survivors use tanning beds less often than the general public?
Wiznia, Lauren; Dai, Feng; Chagpar, Anees B
2016-08-15
Purpose Indoor tanning is associated with an increased risk of non-melanoma skin cancers (NMSC), yet little is known about indoor tanning habits of individuals with a history of NMSC. Methods We examined self-reported history of NMSC and tanning bed use among non-Hispanic white respondents in the 2010 National Health Interview Survey (NHIS), a cross-sectional population-based survey designed to be representative of the civilian US population. We computed weighted population estimates and standard errors using the Taylor series linearization method. We then evaluated chi-square tests of independence and conducted weighted logistic regression analyses to evaluate if NMSC status was a predictor of indoor tanning. Results In our analytic sample of 14,400 non-Hispanic white participants, representing 145,287,995 in the population, 543 participants (weighted proportion = 3.45%) self-reported a history of NMSC or "skin cancer type not known." In multivariate analyses, non-melanoma skin cancer survivors were no less likely to use tanning beds in the last 12 months than skin cancer free controls (OR = 0.70, 95% CI: 0.34-1.43, p = 0.33). Conclusions Non-melanoma skin cancer survivors should be educated on their increased risk of recurrence and other skin cancers and in particular the role of indoor tanning in skin tumorigenesis.
Functional Linear Model with Zero-value Coefficient Function at Sub-regions.
Zhou, Jianhui; Wang, Nae-Yuh; Wang, Naisyin
2013-01-01
We propose a shrinkage method to estimate the coefficient function in a functional linear regression model when the value of the coefficient function is zero within certain sub-regions. Besides identifying the null region in which the coefficient function is zero, we also aim to perform estimation and inferences for the nonparametrically estimated coefficient function without over-shrinking the values. Our proposal consists of two stages. In stage one, the Dantzig selector is employed to provide initial location of the null region. In stage two, we propose a group SCAD approach to refine the estimated location of the null region and to provide the estimation and inference procedures for the coefficient function. Our considerations have certain advantages in this functional setup. One goal is to reduce the number of parameters employed in the model. With a one-stage procedure, it is needed to use a large number of knots in order to precisely identify the zero-coefficient region; however, the variation and estimation difficulties increase with the number of parameters. Owing to the additional refinement stage, we avoid this necessity and our estimator achieves superior numerical performance in practice. We show that our estimator enjoys the Oracle property; it identifies the null region with probability tending to 1, and it achieves the same asymptotic normality for the estimated coefficient function on the non-null region as the functional linear model estimator when the non-null region is known. Numerically, our refined estimator overcomes the shortcomings of the initial Dantzig estimator which tends to under-estimate the absolute scale of non-zero coefficients. The performance of the proposed method is illustrated in simulation studies. We apply the method in an analysis of data collected by the Johns Hopkins Precursors Study, where the primary interests are in estimating the strength of association between body mass index in midlife and the quality of life in physical functioning at old age, and in identifying the effective age ranges where such associations exist.
Method and Excel VBA Algorithm for Modeling Master Recession Curve Using Trigonometry Approach.
Posavec, Kristijan; Giacopetti, Marco; Materazzi, Marco; Birk, Steffen
2017-11-01
A new method was developed and implemented into an Excel Visual Basic for Applications (VBAs) algorithm utilizing trigonometry laws in an innovative way to overlap recession segments of time series and create master recession curves (MRCs). Based on a trigonometry approach, the algorithm horizontally translates succeeding recession segments of time series, placing their vertex, that is, the highest recorded value of each recession segment, directly onto the appropriate connection line defined by measurement points of a preceding recession segment. The new method and algorithm continues the development of methods and algorithms for the generation of MRC, where the first published method was based on a multiple linear/nonlinear regression model approach (Posavec et al. 2006). The newly developed trigonometry-based method was tested on real case study examples and compared with the previously published multiple linear/nonlinear regression model-based method. The results show that in some cases, that is, for some time series, the trigonometry-based method creates narrower overlaps of the recession segments, resulting in higher coefficients of determination R 2 , while in other cases the multiple linear/nonlinear regression model-based method remains superior. The Excel VBA algorithm for modeling MRC using the trigonometry approach is implemented into a spreadsheet tool (MRCTools v3.0 written by and available from Kristijan Posavec, Zagreb, Croatia) containing the previously published VBA algorithms for MRC generation and separation. All algorithms within the MRCTools v3.0 are open access and available free of charge, supporting the idea of running science on available, open, and free of charge software. © 2017, National Ground Water Association.
Smooth individual level covariates adjustment in disease mapping.
Huque, Md Hamidul; Anderson, Craig; Walton, Richard; Woolford, Samuel; Ryan, Louise
2018-05-01
Spatial models for disease mapping should ideally account for covariates measured both at individual and area levels. The newly available "indiCAR" model fits the popular conditional autoregresssive (CAR) model by accommodating both individual and group level covariates while adjusting for spatial correlation in the disease rates. This algorithm has been shown to be effective but assumes log-linear associations between individual level covariates and outcome. In many studies, the relationship between individual level covariates and the outcome may be non-log-linear, and methods to track such nonlinearity between individual level covariate and outcome in spatial regression modeling are not well developed. In this paper, we propose a new algorithm, smooth-indiCAR, to fit an extension to the popular conditional autoregresssive model that can accommodate both linear and nonlinear individual level covariate effects while adjusting for group level covariates and spatial correlation in the disease rates. In this formulation, the effect of a continuous individual level covariate is accommodated via penalized splines. We describe a two-step estimation procedure to obtain reliable estimates of individual and group level covariate effects where both individual and group level covariate effects are estimated separately. This distributed computing framework enhances its application in the Big Data domain with a large number of individual/group level covariates. We evaluate the performance of smooth-indiCAR through simulation. Our results indicate that the smooth-indiCAR method provides reliable estimates of all regression and random effect parameters. We illustrate our proposed methodology with an analysis of data on neutropenia admissions in New South Wales (NSW), Australia. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Datamining approaches for modeling tumor control probability.
Naqa, Issam El; Deasy, Joseph O; Mu, Yi; Huang, Ellen; Hope, Andrew J; Lindsay, Patricia E; Apte, Aditya; Alaly, James; Bradley, Jeffrey D
2010-11-01
Tumor control probability (TCP) to radiotherapy is determined by complex interactions between tumor biology, tumor microenvironment, radiation dosimetry, and patient-related variables. The complexity of these heterogeneous variable interactions constitutes a challenge for building predictive models for routine clinical practice. We describe a datamining framework that can unravel the higher order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to unseen data before when applied prospectively. Several datamining approaches are discussed that include dose-volume metrics, equivalent uniform dose, mechanistic Poisson model, and model building methods using statistical regression and machine learning techniques. Institutional datasets of non-small cell lung cancer (NSCLC) patients are used to demonstrate these methods. The performance of the different methods was evaluated using bivariate Spearman rank correlations (rs). Over-fitting was controlled via resampling methods. Using a dataset of 56 patients with primary NCSLC tumors and 23 candidate variables, we estimated GTV volume and V75 to be the best model parameters for predicting TCP using statistical resampling and a logistic model. Using these variables, the support vector machine (SVM) kernel method provided superior performance for TCP prediction with an rs=0.68 on leave-one-out testing compared to logistic regression (rs=0.4), Poisson-based TCP (rs=0.33), and cell kill equivalent uniform dose model (rs=0.17). The prediction of treatment response can be improved by utilizing datamining approaches, which are able to unravel important non-linear complex interactions among model variables and have the capacity to predict on unseen data for prospective clinical applications.
Fisz, Jacek J
2006-12-07
The optimization approach based on the genetic algorithm (GA) combined with multiple linear regression (MLR) method, is discussed. The GA-MLR optimizer is designed for the nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only one strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates considerably the optimization process because the linear parameters are not the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of the optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of recently introduced the GA-NR optimizer to simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to chi(2), obtained from the Taylor series expansion of chi(2), is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi-linear combinations of nonlinear functions, is indicated. The VP algorithm does not distinguish the weakly nonlinear parameters from the nonlinear ones and it does not apply to the model functions which are multi-linear combinations of nonlinear functions.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1972-01-01
Two computer programs developed according to two general types of exponential models for conducting nonlinear exponential regression analysis are described. Least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. Program is written in FORTRAN 5 for the Univac 1108 computer.
NASA Astrophysics Data System (ADS)
Leroux, Romain; Chatellier, Ludovic; David, Laurent
2018-01-01
This article is devoted to the estimation of time-resolved particle image velocimetry (TR-PIV) flow fields using a time-resolved point measurements of a voltage signal obtained by hot-film anemometry. A multiple linear regression model is first defined to map the TR-PIV flow fields onto the voltage signal. Due to the high temporal resolution of the signal acquired by the hot-film sensor, the estimates of the TR-PIV flow fields are obtained with a multiple linear regression method called orthonormalized partial least squares regression (OPLSR). Subsequently, this model is incorporated as the observation equation in an ensemble Kalman filter (EnKF) applied on a proper orthogonal decomposition reduced-order model to stabilize it while reducing the effects of the hot-film sensor noise. This method is assessed for the reconstruction of the flow around a NACA0012 airfoil at a Reynolds number of 1000 and an angle of attack of {20}°. Comparisons with multi-time delay-modified linear stochastic estimation show that both the OPLSR and EnKF combined with OPLSR are more accurate as they produce a much lower relative estimation error, and provide a faithful reconstruction of the time evolution of the velocity flow fields.
Friedrich, Nele; Schneider, Harald J; Spielhagen, Christin; Markus, Marcello Ricardo Paulista; Haring, Robin; Grabe, Hans J; Buchfelder, Michael; Wallaschofski, Henri; Nauck, Matthias
2011-10-01
Prolactin (PRL) is involved in immune regulation and may contribute to an atherogenic phenotype. Previous results on the association of PRL with inflammatory biomarkers have been conflicting and limited by small patient studies. Therefore, we used data from a large population-based sample to assess the cross-sectional associations between serum PRL concentration and high-sensitivity C-reactive protein (hsCRP), fibrinogen, interleukin-6 (IL-6), and white blood cell (WBC) count. From the population-based Study of Health in Pomerania (SHIP), a total of 3744 subjects were available for the present analyses. PRL and inflammatory biomarkers were measured. Linear and logistic regression models adjusted for age, sex, body-mass-index, total cholesterol and glucose were analysed. Multivariable linear regression models revealed a positive association of PRL with WBC. Multivariable logistic regression analyses showed a significant association of PRL with increased IL-6 in non-smokers [highest vs lowest quintile: odds ratio 1·69 (95% confidence interval 1·10-2·58), P = 0·02] and smokers [OR 2·06 (95%-CI 1·10-3·89), P = 0·02]. Similar results were found for WBC in non-smokers [highest vs lowest quintile: OR 2·09 (95%-CI 1·21-3·61), P = 0·01)] but not in smokers. Linear and logistic regression analyses revealed no significant associations of PRL with hsCRP or fibrinogen. Serum PRL concentrations are associated with inflammatory biomarkers including IL-6 and WBC, but not hsCRP or fibrinogen. The suggested role of PRL in inflammation needs further investigation in future prospective studies. © 2011 Blackwell Publishing Ltd.
Investigating the detection of multi-homed devices independent of operating systems
2017-09-01
timestamp data was used to estimate clock skews using linear regression and linear optimization methods. Analysis revealed that detection depends on...the consistency of the estimated clock skew. Through vertical testing, it was also shown that clock skew consistency depends on the installed...optimization methods. Analysis revealed that detection depends on the consistency of the estimated clock skew. Through vertical testing, it was also
Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N
2017-07-01
Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression . Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant role in the calibration while wavelength selection plays a marginal role and the combination of certain pre-processing, wavelength selection, and nonlinear regression methods can achieve superior performance over traditional linear regression-based calibration.
Mathematical Methods in Wave Propagation: Part 2--Non-Linear Wave Front Analysis
ERIC Educational Resources Information Center
Jeffrey, Alan
1971-01-01
The paper presents applications and methods of analysis for non-linear hyperbolic partial differential equations. The paper is concluded by an account of wave front analysis as applied to the piston problem of gas dynamics. (JG)
Non-Linear Structural Dynamics Characterization using a Scanning Laser Vibrometer
NASA Technical Reports Server (NTRS)
Pai, P. F.; Lee, S.-Y.
2003-01-01
This paper presents the use of a scanning laser vibrometer and a signal decomposition method to characterize non-linear dynamics of highly flexible structures. A Polytec PI PSV-200 scanning laser vibrometer is used to measure transverse velocities of points on a structure subjected to a harmonic excitation. Velocity profiles at different times are constructed using the measured velocities, and then each velocity profile is decomposed using the first four linear mode shapes and a least-squares curve-fitting method. From the variations of the obtained modal \\ielocities with time we search for possible non-linear phenomena. A cantilevered titanium alloy beam subjected to harmonic base-excitations around the second. third, and fourth natural frequencies are examined in detail. Influences of the fixture mass. gravity. mass centers of mode shapes. and non-linearities are evaluated. Geometrically exact equations governing the planar, harmonic large-amplitude vibrations of beams are solved for operational deflection shapes using the multiple shooting method. Experimental results show the existence of 1:3 and 1:2:3 external and internal resonances. energy transfer from high-frequency modes to the first mode. and amplitude- and phase- modulation among several modes. Moreover, the existence of non-linear normal modes is found to be questionable.
ERIC Educational Resources Information Center
Drabinová, Adéla; Martinková, Patrícia
2017-01-01
In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items with presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of method based on logistic regression. As a non-IRT approach, NLR can…
NASA Astrophysics Data System (ADS)
Wang, Qingzhi; Tan, Guanzheng; He, Yong; Wu, Min
2017-10-01
This paper considers a stability analysis issue of piecewise non-linear systems and applies it to intermittent synchronisation of chaotic systems. First, based on piecewise Lyapunov function methods, more general and less conservative stability criteria of piecewise non-linear systems in periodic and aperiodic cases are presented, respectively. Next, intermittent synchronisation conditions of chaotic systems are derived which extend existing results. Finally, Chua's circuit is taken as an example to verify the validity of our methods.
NASA Astrophysics Data System (ADS)
Kuchar, A.; Sacha, P.; Miksovsky, J.; Pisoft, P.
2015-06-01
This study focusses on the variability of temperature, ozone and circulation characteristics in the stratosphere and lower mesosphere with regard to the influence of the 11-year solar cycle. It is based on attribution analysis using multiple nonlinear techniques (support vector regression, neural networks) besides the multiple linear regression approach. The analysis was applied to several current reanalysis data sets for the 1979-2013 period, including MERRA, ERA-Interim and JRA-55, with the aim to compare how these types of data resolve especially the double-peaked solar response in temperature and ozone variables and the consequent changes induced by these anomalies. Equatorial temperature signals in the tropical stratosphere were found to be in qualitative agreement with previous attribution studies, although the agreement with observational results was incomplete, especially for JRA-55. The analysis also pointed to the solar signal in the ozone data sets (i.e. MERRA and ERA-Interim) not being consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. The results obtained by linear regression were confirmed by the nonlinear approach through all data sets, suggesting that linear regression is a relevant tool to sufficiently resolve the solar signal in the middle atmosphere. The seasonal evolution of the solar response was also discussed in terms of dynamical causalities in the winter hemispheres. The hypothetical mechanism of a weaker Brewer-Dobson circulation at solar maxima was reviewed together with a discussion of polar vortex behaviour.
The weighted priors approach for combining expert opinions in logistic regression experiments
Quinlan, Kevin R.; Anderson-Cook, Christine M.; Myers, Kary L.
2017-04-24
When modeling the reliability of a system or component, it is not uncommon for more than one expert to provide very different prior estimates of the expected reliability as a function of an explanatory variable such as age or temperature. Our goal in this paper is to incorporate all information from the experts when choosing a design about which units to test. Bayesian design of experiments has been shown to be very successful for generalized linear models, including logistic regression models. We use this approach to develop methodology for the case where there are several potentially non-overlapping priors under consideration.more » While multiple priors have been used for analysis in the past, they have never been used in a design context. The Weighted Priors method performs well for a broad range of true underlying model parameter choices and is more robust when compared to other reasonable design choices. Finally, we illustrate the method through multiple scenarios and a motivating example. Additional figures for this article are available in the online supplementary information.« less
The weighted priors approach for combining expert opinions in logistic regression experiments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quinlan, Kevin R.; Anderson-Cook, Christine M.; Myers, Kary L.
When modeling the reliability of a system or component, it is not uncommon for more than one expert to provide very different prior estimates of the expected reliability as a function of an explanatory variable such as age or temperature. Our goal in this paper is to incorporate all information from the experts when choosing a design about which units to test. Bayesian design of experiments has been shown to be very successful for generalized linear models, including logistic regression models. We use this approach to develop methodology for the case where there are several potentially non-overlapping priors under consideration.more » While multiple priors have been used for analysis in the past, they have never been used in a design context. The Weighted Priors method performs well for a broad range of true underlying model parameter choices and is more robust when compared to other reasonable design choices. Finally, we illustrate the method through multiple scenarios and a motivating example. Additional figures for this article are available in the online supplementary information.« less
Qu, Mingkai; Wang, Yan; Huang, Biao; Zhao, Yongcun
2018-06-01
The traditional source apportionment models, such as absolute principal component scores-multiple linear regression (APCS-MLR), are usually susceptible to outliers, which may be widely present in the regional geochemical dataset. Furthermore, the models are merely built on variable space instead of geographical space and thus cannot effectively capture the local spatial characteristics of each source contributions. To overcome the limitations, a new receptor model, robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR), was proposed based on the traditional APCS-MLR model. Then, the new method was applied to the source apportionment of soil metal elements in a region of Wuhan City, China as a case study. Evaluations revealed that: (i) RAPCS-RGWR model had better performance than APCS-MLR model in the identification of the major sources of soil metal elements, and (ii) source contributions estimated by RAPCS-RGWR model were more close to the true soil metal concentrations than that estimated by APCS-MLR model. It is shown that the proposed RAPCS-RGWR model is a more effective source apportionment method than APCS-MLR (i.e., non-robust and global model) in dealing with the regional geochemical dataset. Copyright © 2018 Elsevier B.V. All rights reserved.
High-Dimensional Intrinsic Interpolation Using Gaussian Process Regression and Diffusion Maps
Thimmisetty, Charanraj A.; Ghanem, Roger G.; White, Joshua A.; ...
2017-10-10
This article considers the challenging task of estimating geologic properties of interest using a suite of proxy measurements. The current work recast this task as a manifold learning problem. In this process, this article introduces a novel regression procedure for intrinsic variables constrained onto a manifold embedded in an ambient space. The procedure is meant to sharpen high-dimensional interpolation by inferring non-linear correlations from the data being interpolated. The proposed approach augments manifold learning procedures with a Gaussian process regression. It first identifies, using diffusion maps, a low-dimensional manifold embedded in an ambient high-dimensional space associated with the data. Itmore » relies on the diffusion distance associated with this construction to define a distance function with which the data model is equipped. This distance metric function is then used to compute the correlation structure of a Gaussian process that describes the statistical dependence of quantities of interest in the high-dimensional ambient space. The proposed method is applicable to arbitrarily high-dimensional data sets. Here, it is applied to subsurface characterization using a suite of well log measurements. The predictions obtained in original, principal component, and diffusion space are compared using both qualitative and quantitative metrics. Considerable improvement in the prediction of the geological structural properties is observed with the proposed method.« less
High-Dimensional Intrinsic Interpolation Using Gaussian Process Regression and Diffusion Maps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thimmisetty, Charanraj A.; Ghanem, Roger G.; White, Joshua A.
This article considers the challenging task of estimating geologic properties of interest using a suite of proxy measurements. The current work recast this task as a manifold learning problem. In this process, this article introduces a novel regression procedure for intrinsic variables constrained onto a manifold embedded in an ambient space. The procedure is meant to sharpen high-dimensional interpolation by inferring non-linear correlations from the data being interpolated. The proposed approach augments manifold learning procedures with a Gaussian process regression. It first identifies, using diffusion maps, a low-dimensional manifold embedded in an ambient high-dimensional space associated with the data. Itmore » relies on the diffusion distance associated with this construction to define a distance function with which the data model is equipped. This distance metric function is then used to compute the correlation structure of a Gaussian process that describes the statistical dependence of quantities of interest in the high-dimensional ambient space. The proposed method is applicable to arbitrarily high-dimensional data sets. Here, it is applied to subsurface characterization using a suite of well log measurements. The predictions obtained in original, principal component, and diffusion space are compared using both qualitative and quantitative metrics. Considerable improvement in the prediction of the geological structural properties is observed with the proposed method.« less
GWAS with longitudinal phenotypes: performance of approximate procedures
Sikorska, Karolina; Montazeri, Nahid Mostafavi; Uitterlinden, André; Rivadeneira, Fernando; Eilers, Paul HC; Lesaffre, Emmanuel
2015-01-01
Analysis of genome-wide association studies with longitudinal data using standard procedures, such as linear mixed model (LMM) fitting, leads to discouragingly long computation times. There is a need to speed up the computations significantly. In our previous work (Sikorska et al: Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 2012; 32.1: 165–180), we proposed the conditional two-step (CTS) approach as a fast method providing an approximation to the P-value for the longitudinal single-nucleotide polymorphism (SNP) effect. In the first step a reduced conditional LMM is fit, omitting all the SNP terms. In the second step, the estimated random slopes are regressed on SNPs. The CTS has been applied to the bone mineral density data from the Rotterdam Study and proved to work very well even in unbalanced situations. In another article (Sikorska et al: GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14: 166), we suggested semi-parallel computations, greatly speeding up fitting many linear regressions. Combining CTS with fast linear regression reduces the computation time from several weeks to a few minutes on a single computer. Here, we explore further the properties of the CTS both analytically and by simulations. We investigate the performance of our proposal in comparison with a related but different approach, the two-step procedure. It is analytically shown that for the balanced case, under mild assumptions, the P-value provided by the CTS is the same as from the LMM. For unbalanced data and in realistic situations, simulations show that the CTS method does not inflate the type I error rate and implies only a minimal loss of power. PMID:25712081
NASA Astrophysics Data System (ADS)
Choudhury, Anustup; Farrell, Suzanne; Atkins, Robin; Daly, Scott
2017-09-01
We present an approach to predict overall HDR display quality as a function of key HDR display parameters. We first performed subjective experiments on a high quality HDR display that explored five key HDR display parameters: maximum luminance, minimum luminance, color gamut, bit-depth and local contrast. Subjects rated overall quality for different combinations of these display parameters. We explored two models | a physical model solely based on physically measured display characteristics and a perceptual model that transforms physical parameters using human vision system models. For the perceptual model, we use a family of metrics based on a recently published color volume model (ICT-CP), which consists of the PQ luminance non-linearity (ST2084) and LMS-based opponent color, as well as an estimate of the display point spread function. To predict overall visual quality, we apply linear regression and machine learning techniques such as Multilayer Perceptron, RBF and SVM networks. We use RMSE and Pearson/Spearman correlation coefficients to quantify performance. We found that the perceptual model is better at predicting subjective quality than the physical model and that SVM is better at prediction than linear regression. The significance and contribution of each display parameter was investigated. In addition, we found that combined parameters such as contrast do not improve prediction. Traditional perceptual models were also evaluated and we found that models based on the PQ non-linearity performed better.
ERIC Educational Resources Information Center
Bates, Reid A.; Holton, Elwood F., III; Burnett, Michael F.
1999-01-01
A case study of learning transfer demonstrates the possible effect of influential observation on linear regression analysis. A diagnostic method that tests for violation of assumptions, multicollinearity, and individual and multiple influential observations helps determine which observation to delete to eliminate bias. (SK)
Saliba, Christopher M; Clouthier, Allison L; Brandon, Scott C E; Rainbow, Michael J; Deluzio, Kevin J
2018-05-29
Abnormal loading of the knee joint contributes to the pathogenesis of knee osteoarthritis. Gait retraining is a non-invasive intervention that aims to reduce knee loads by providing audible, visual, or haptic feedback of gait parameters. The computational expense of joint contact force prediction has limited real-time feedback to surrogate measures of the contact force, such as the knee adduction moment. We developed a method to predict knee joint contact forces using motion analysis and a statistical regression model that can be implemented in near real-time. Gait waveform variables were deconstructed using principal component analysis and a linear regression was used to predict the principal component scores of the contact force waveforms. Knee joint contact force waveforms were reconstructed using the predicted scores. We tested our method using a heterogenous population of asymptomatic controls and subjects with knee osteoarthritis. The reconstructed contact force waveforms had mean (SD) RMS differences of 0.17 (0.05) bodyweight compared to the contact forces predicted by a musculoskeletal model. Our method successfully predicted subject-specific shape features of contact force waveforms and is a potentially powerful tool in biofeedback and clinical gait analysis.
NASA Technical Reports Server (NTRS)
Wilson, Edward (Inventor)
2006-01-01
The present invention is a method for identifying unknown parameters in a system having a set of governing equations describing its behavior that cannot be put into regression form with the unknown parameters linearly represented. In this method, the vector of unknown parameters is segmented into a plurality of groups where each individual group of unknown parameters may be isolated linearly by manipulation of said equations. Multiple concurrent and independent recursive least squares identification of each said group run, treating other unknown parameters appearing in their regression equation as if they were known perfectly, with said values provided by recursive least squares estimation from the other groups, thereby enabling the use of fast, compact, efficient linear algorithms to solve problems that would otherwise require nonlinear solution approaches. This invention is presented with application to identification of mass and thruster properties for a thruster-controlled spacecraft.
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.
Probabilistic dual heuristic programming-based adaptive critic
NASA Astrophysics Data System (ADS)
Herzallah, Randa
2010-02-01
Adaptive critic (AC) methods have common roots as generalisations of dynamic programming for neural reinforcement learning approaches. Since they approximate the dynamic programming solutions, they are potentially suitable for learning in noisy, non-linear and non-stationary environments. In this study, a novel probabilistic dual heuristic programming (DHP)-based AC controller is proposed. Distinct to current approaches, the proposed probabilistic (DHP) AC method takes uncertainties of forward model and inverse controller into consideration. Therefore, it is suitable for deterministic and stochastic control problems characterised by functional uncertainty. Theoretical development of the proposed method is validated by analytically evaluating the correct value of the cost function which satisfies the Bellman equation in a linear quadratic control problem. The target value of the probabilistic critic network is then calculated and shown to be equal to the analytically derived correct value. Full derivation of the Riccati solution for this non-standard stochastic linear quadratic control problem is also provided. Moreover, the performance of the proposed probabilistic controller is demonstrated on linear and non-linear control examples.
NASA Astrophysics Data System (ADS)
Bonelli, Maria Grazia; Ferrini, Mauro; Manni, Andrea
2016-12-01
The assessment of metals and organic micropollutants contamination in agricultural soils is a difficult challenge due to the extensive area used to collect and analyze a very large number of samples. With Dioxins and dioxin-like PCBs measurement methods and subsequent the treatment of data, the European Community advises the develop low-cost and fast methods allowing routing analysis of a great number of samples, providing rapid measurement of these compounds in the environment, feeds and food. The aim of the present work has been to find a method suitable to describe the relations occurring between organic and inorganic contaminants and use the value of the latter in order to forecast the former. In practice, the use of a metal portable soil analyzer coupled with an efficient statistical procedure enables the required objective to be achieved. Compared to Multiple Linear Regression, the Artificial Neural Networks technique has shown to be an excellent forecasting method, though there is no linear correlation between the variables to be analyzed.
Effects of acceleration in the Gz axis on human cardiopulmonary responses to exercise.
Bonjour, Julien; Bringard, Aurélien; Antonutto, Guglielmo; Capelli, Carlo; Linnarsson, Dag; Pendergast, David R; Ferretti, Guido
2011-12-01
The aim of this paper was to develop a model from experimental data allowing a prediction of the cardiopulmonary responses to steady-state submaximal exercise in varying gravitational environments, with acceleration in the G(z) axis (a (g)) ranging from 0 to 3 g. To this aim, we combined data from three different experiments, carried out at Buffalo, at Stockholm and inside the Mir Station. Oxygen consumption, as expected, increased linearly with a (g). In contrast, heart rate increased non-linearly with a (g), whereas stroke volume decreased non-linearly: both were described by quadratic functions. Thus, the relationship between cardiac output and a (g) was described by a fourth power regression equation. Mean arterial pressure increased with a (g) non linearly, a relation that we interpolated again with a quadratic function. Thus, total peripheral resistance varied linearly with a (g). These data led to predict that maximal oxygen consumption would decrease drastically as a (g) is increased. Maximal oxygen consumption would become equal to resting oxygen consumption when a (g) is around 4.5 g, thus indicating the practical impossibility for humans to stay and work on the biggest Planets of the Solar System.
Vitello, Dominic J; Ripper, Richard M; Fettiplace, Michael R; Weinberg, Guy L; Vitello, Joseph M
2015-01-01
Purpose. The gravimetric method of weighing surgical sponges is used to quantify intraoperative blood loss. The dry mass minus the wet mass of the gauze equals the volume of blood lost. This method assumes that the density of blood is equivalent to water (1 gm/mL). This study's purpose was to validate the assumption that the density of blood is equivalent to water and to correlate density with hematocrit. Methods. 50 µL of whole blood was weighed from eighteen rats. A distilled water control was weighed for each blood sample. The averages of the blood and water were compared utilizing a Student's unpaired, one-tailed t-test. The masses of the blood samples and the hematocrits were compared using a linear regression. Results. The average mass of the eighteen blood samples was 0.0489 g and that of the distilled water controls was 0.0492 g. The t-test showed P = 0.2269 and R (2) = 0.03154. The hematocrit values ranged from 24% to 48%. The linear regression R (2) value was 0.1767. Conclusions. The R (2) value comparing the blood and distilled water masses suggests high correlation between the two populations. Linear regression showed the hematocrit was not proportional to the mass of the blood. The study confirmed that the measured density of blood is similar to water.
Liu, Yan; Salvendy, Gavriel
2009-05-01
This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on five most widely used statistical analysis tools have been discussed and illustrated: correlation; ANOVA; linear regression; factor analysis; linear discriminant analysis. It has been shown that measurement errors can greatly attenuate correlations between variables, reduce statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanation contributions of the most important factors in factor analysis and depreciate the significance of discriminant function and discrimination abilities of individual variables in discrimination analysis. The discussions will be restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have issues of measurement errors, but they are beyond the scope of this paper. As there has been increasing interest in the development and testing of theories in ergonomics research, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experiment results, which the authors believe is very critical to research progress in theory development and cumulative knowledge in the ergonomics field.
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
Scale of association: hierarchical linear models and the measurement of ecological systems
Sean M. McMahon; Jeffrey M. Diez
2007-01-01
A fundamental challenge to understanding patterns in ecological systems lies in employing methods that can analyse, test and draw inference from measured associations between variables across scales. Hierarchical linear models (HLM) use advanced estimation algorithms to measure regression relationships and variance-covariance parameters in hierarchically structured...
Acoustic-articulatory mapping in vowels by locally weighted regression
McGowan, Richard S.; Berger, Michael A.
2009-01-01
A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method uses principal components analysis on the articulatory and acoustic variables, and mapping between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979). J. Am. Stat. Assoc. 74, 829–836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties than the inverse mappings indicating that this method is better suited for the forward mappings than the inverse mappings, at least for the data chosen for the current study. Some preliminary results on sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented. PMID:19813812
Kapogiannis, Bill G.; Leister, Erin; Siberry, George K.; Van Dyke, Russell B.; Rudy, Bret; Flynn, Patricia; Williams, Paige L.
2015-01-01
Objective To longitudinally characterize non-invasive markers of liver disease in HIV-infected youth. Design HIV infection, without viral hepatitis co-infection, may contribute to liver disease. Non-invasive markers of liver disease [FIB-4 (Fibrosis-4) and APRI (aspartate aminotransferase-to-platelet ratio index)] have been evaluated in adults with concomitant HIV and hepatitis C, but are less studied in children. Methods In prospective cohorts of HIV-infected and HIV-uninfected youth, we used linear regression models to compare log-transformed FIB-4 and APRI measures by HIV status based on a single visit at ages 15–20 years. We also longitudinally modeled trends in these measures in HIV-infected youth with ≥2 visits to compare those with behavioral vs perinatal HIV infection (PHIV) using mixed effect linear regression, adjusting for age, gender, body mass index, and race/ethnicity. Results Of 1785 participants, 41% were male, 57% black non-Hispanic and 27% Hispanic. More HIV-infected than uninfected youth had an APRI score >0.5 (13% vs 3%, p<0.001). Among 1307 HIV-infected participants with longitudinal measures, FIB-4 scores increased 6% per year (p<0.001) among all HIV-infected youth, whereas APRI scores increased 2% per year (p=0.007) only among PHIV youth. The incidence rates (95% CI) of progression of APRI to >0.5 and >1.5 were 7.5 (6.5–8.7) and 1.4 (1.0–1.9) cases per 100 person-years of follow up, respectively. The incidence of progression of FIB-4 to >1.5 and >3.25 were 1.6 (1.2–2.2) and 0.3 (0.2–0.6) cases per 100 person-years, respectively. Conclusions APRI and FIB-4 scores were higher among HIV-infected youth. Progression to scores suggesting subclinical fibrosis or worse was common. PMID:26959353
NASA Technical Reports Server (NTRS)
Stolzer, Alan J.; Halford, Carl
2007-01-01
In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
Lundblad, Runar; Abdelnoor, Michel; Svennevig, Jan Ludvig
2004-09-01
Simple linear resection and endoventricular patch plasty are alternative techniques to repair postinfarction left ventricular aneurysm. The aim of the study was to compare these 2 methods with regard to early mortality and long-term survival. We retrospectively reviewed 159 patients undergoing operations between 1989 and 2003. The epidemiologic design was of an exposed (simple linear repair, n = 74) versus nonexposed (endoventricular patch plasty, n = 85) cohort with 2 endpoints: early mortality and long-term survival. The crude effect of aneurysm repair technique versus endpoint was estimated by odds ratio, rate ratio, or relative risk and their 95% confidence intervals. Stratification analysis by using the Mantel-Haenszel method was done to quantify confounders and pinpoint effect modifiers. Adjustment for multiconfounders was performed by using logistic regression and Cox regression analysis. Survival curves were analyzed with the Breslow test and the log-rank test. Early mortality was 8.2% for all patients, 13.5% after linear repair and 3.5% after endoventricular patch plasty. When adjusted for multiconfounders, the risk of early mortality was significantly higher after simple linear repair than after endoventricular patch plasty (odds ratio, 4.4; 95% confidence interval, 1.1-17.8). Mean follow-up was 5.8 +/- 3.8 years (range, 0-14.0 years). Overall 5-year cumulative survival was 78%, 70.1% after linear repair and 91.4% after endoventricular patch plasty. The risk of total mortality was significantly higher after linear repair than after endoventricular patch plasty when controlled for multiconfounders (relative risk, 4.5; 95% confidence interval, 2.0-9.7). Linear repair dominated early in the series and patch plasty dominated later, giving a possible learning-curve bias in favor of patch plasty that could not be adjusted for in the regression analysis. Postinfarction left ventricular aneurysm can be repaired with satisfactory early and late results. Surgical risk was lower and long-term survival was higher after endoventricular patch plasty than simple linear repair. Differences in outcome should be interpreted with care because of the retrospective study design and the chronology of the 2 repair methods.
Valuation of financial models with non-linear state spaces
NASA Astrophysics Data System (ADS)
Webber, Nick
2001-02-01
A common assumption in valuation models for derivative securities is that the underlying state variables take values in a linear state space. We discuss numerical implementation issues in an interest rate model with a simple non-linear state space, formulating and comparing Monte Carlo, finite difference and lattice numerical solution methods. We conclude that, at least in low dimensional spaces, non-linear interest rate models may be viable.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harlim, John, E-mail: jharlim@psu.edu; Mahdi, Adam, E-mail: amahdi@ncsu.edu; Majda, Andrew J., E-mail: jonjon@cims.nyu.edu
2014-01-15
A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partialmore » noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.« less
A pocket-sized metabolic analyzer for assessment of resting energy expenditure.
Zhao, Di; Xian, Xiaojun; Terrera, Mirna; Krishnan, Ranganath; Miller, Dylan; Bridgeman, Devon; Tao, Kevin; Zhang, Lihua; Tsow, Francis; Forzani, Erica S; Tao, Nongjian
2014-04-01
The assessment of metabolic parameters related to energy expenditure has a proven value for weight management; however these measurements remain too difficult and costly for monitoring individuals at home. The objective of this study is to evaluate the accuracy of a new pocket-sized metabolic analyzer device for assessing energy expenditure at rest (REE) and during sedentary activities (EE). The new device performs indirect calorimetry by measuring an individual's oxygen consumption (VO2) and carbon dioxide production (VCO2) rates, which allows the determination of resting- and sedentary activity-related energy expenditure. VO2 and VCO2 values of 17 volunteer adult subjects were measured during resting and sedentary activities in order to compare the metabolic analyzer with the Douglas bag method. The Douglas bag method is considered the Gold Standard method for indirect calorimetry. Metabolic parameters of VO2, VCO2, and energy expenditure were compared using linear regression analysis, paired t-tests, and Bland-Altman plots. Linear regression analysis of measured VO2 and VCO2 values, as well as calculated energy expenditure assessed with the new analyzer and Douglas bag method, had the following linear regression parameters (linear regression slope LRS0, and R-squared coefficient, r(2)) with p = 0: LRS0 (SD) = 1.00 (0.01), r(2) = 0.9933 for VO2; LRS0 (SD) = 1.00 (0.01), r(2) = 0.9929 for VCO2; and LRS0 (SD) = 1.00 (0.01), r(2) = 0.9942 for energy expenditure. In addition, results from paired t-tests did not show statistical significant difference between the methods with a significance level of α = 0.05 for VO2, VCO2, REE, and EE. Furthermore, the Bland-Altman plot for REE showed good agreement between methods with 100% of the results within ±2SD, which was equivalent to ≤10% error. The findings demonstrate that the new pocket-sized metabolic analyzer device is accurate for determining VO2, VCO2, and energy expenditure. Copyright © 2013 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
Device and method for generating a beam of acoustic energy from a borehole, and applications thereof
Vu, Cung Khac; Sinha, Dipen N.; Pantea, Cristian; Nihei, Kurt T.; Schmitt, Denis P.; Skelt, Chirstopher
2013-10-15
In some aspects of the invention, a method of generating a beam of acoustic energy in a borehole is disclosed. The method includes generating a first acoustic wave at a first frequency; generating a second acoustic wave at a second frequency different than the first frequency, wherein the first acoustic wave and second acoustic wave are generated by at least one transducer carried by a tool located within the borehole; transmitting the first and the second acoustic waves into an acoustically non-linear medium, wherein the composition of the non-linear medium produces a collimated beam by a non-linear mixing of the first and second acoustic waves, wherein the collimated beam has a frequency based upon a difference between the first frequency range and the second frequency, and wherein the non-linear medium has a velocity of sound between 100 m/s and 800 m/s.
Omnibus Risk Assessment via Accelerated Failure Time Kernel Machine Modeling
Sinnott, Jennifer A.; Cai, Tianxi
2013-01-01
Summary Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai et al., 2011). In this paper, we derive testing and prediction methods for KM regression under the accelerated failure time model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. PMID:24328713
Verification of spectrophotometric method for nitrate analysis in water samples
NASA Astrophysics Data System (ADS)
Kurniawati, Puji; Gusrianti, Reny; Dwisiwi, Bledug Bernanti; Purbaningtias, Tri Esti; Wiyantoko, Bayu
2017-12-01
The aim of this research was to verify the spectrophotometric method to analyze nitrate in water samples using APHA 2012 Section 4500 NO3-B method. The verification parameters used were: linearity, method detection limit, level of quantitation, level of linearity, accuracy and precision. Linearity was obtained by using 0 to 50 mg/L nitrate standard solution and the correlation coefficient of standard calibration linear regression equation was 0.9981. The method detection limit (MDL) was defined as 0,1294 mg/L and limit of quantitation (LOQ) was 0,4117 mg/L. The result of a level of linearity (LOL) was 50 mg/L and nitrate concentration 10 to 50 mg/L was linear with a level of confidence was 99%. The accuracy was determined through recovery value was 109.1907%. The precision value was observed using % relative standard deviation (%RSD) from repeatability and its result was 1.0886%. The tested performance criteria showed that the methodology was verified under the laboratory conditions.
Cer, Regina Z; Herrera-Galeano, J Enrique; Anderson, Joseph J; Bishop-Lilly, Kimberly A; Mokashi, Vishwesh P
2014-01-01
Understanding the biological roles of microRNAs (miRNAs) is a an active area of research that has produced a surge of publications in PubMed, particularly in cancer research. Along with this increasing interest, many open-source bioinformatics tools to identify existing and/or discover novel miRNAs in next-generation sequencing (NGS) reads become available. While miRNA identification and discovery tools are significantly improved, the development of miRNA differential expression analysis tools, especially in temporal studies, remains substantially challenging. Further, the installation of currently available software is non-trivial and steps of testing with example datasets, trying with one's own dataset, and interpreting the results require notable expertise and time. Subsequently, there is a strong need for a tool that allows scientists to normalize raw data, perform statistical analyses, and provide intuitive results without having to invest significant efforts. We have developed miRNA Temporal Analyzer (mirnaTA), a bioinformatics package to identify differentially expressed miRNAs in temporal studies. mirnaTA is written in Perl and R (Version 2.13.0 or later) and can be run across multiple platforms, such as Linux, Mac and Windows. In the current version, mirnaTA requires users to provide a simple, tab-delimited, matrix file containing miRNA name and count data from a minimum of two to a maximum of 20 time points and three replicates. To recalibrate data and remove technical variability, raw data is normalized using Normal Quantile Transformation (NQT), and linear regression model is used to locate any miRNAs which are differentially expressed in a linear pattern. Subsequently, remaining miRNAs which do not fit a linear model are further analyzed in two different non-linear methods 1) cumulative distribution function (CDF) or 2) analysis of variances (ANOVA). After both linear and non-linear analyses are completed, statistically significant miRNAs (P < 0.05) are plotted as heat maps using hierarchical cluster analysis and Euclidean distance matrix computation methods. mirnaTA is an open-source, bioinformatics tool to aid scientists in identifying differentially expressed miRNAs which could be further mined for biological significance. It is expected to provide researchers with a means of interpreting raw data to statistical summaries in a fast and intuitive manner.
Degradation modeling of mid-power white-light LEDs by using Wiener process.
Huang, Jianlin; Golubović, Dušan S; Koh, Sau; Yang, Daoguo; Li, Xiupeng; Fan, Xuejun; Zhang, G Q
2015-07-27
The IES standard TM-21-11 provides a guideline for lifetime prediction of LED devices. As it uses average normalized lumen maintenance data and performs non-linear regression for lifetime modeling, it cannot capture dynamic and random variation of the degradation process of LED devices. In addition, this method cannot capture the failure distribution, although it is much more relevant in reliability analysis. Furthermore, the TM-21-11 only considers lumen maintenance for lifetime prediction. Color shift, as another important performance characteristic of LED devices, may also render significant degradation during service life, even though the lumen maintenance has not reached the critical threshold. In this study, a modified Wiener process has been employed for the modeling of the degradation of LED devices. By using this method, dynamic and random variations, as well as the non-linear degradation behavior of LED devices, can be easily accounted for. With a mild assumption, the parameter estimation accuracy has been improved by including more information into the likelihood function while neglecting the dependency between the random variables. As a consequence, the mean time to failure (MTTF) has been obtained and shows comparable result with IES TM-21-11 predictions, indicating the feasibility of the proposed method. Finally, the cumulative failure distribution was presented corresponding to different combinations of lumen maintenance and color shift. The results demonstrate that a joint failure distribution of LED devices could be modeled by simply considering their lumen maintenance and color shift as two independent variables.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Image interpolation via regularized local linear regression.
Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang
2011-12-01
The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting with the linear regression model where we replace the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l(2)-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
ERIC Educational Resources Information Center
Hester, Yvette
Least squares methods are sophisticated mathematical curve fitting procedures used in all classical parametric methods. The linear least squares approximation is most often associated with finding the "line of best fit" or the regression line. Since all statistical analyses are correlational and all classical parametric methods are least…
New analysis methods to push the boundaries of diagnostic techniques in the environmental sciences
NASA Astrophysics Data System (ADS)
Lungaroni, M.; Murari, A.; Peluso, E.; Gelfusa, M.; Malizia, A.; Vega, J.; Talebzadeh, S.; Gaudio, P.
2016-04-01
In the last years, new and more sophisticated measurements have been at the basis of the major progress in various disciplines related to the environment, such as remote sensing and thermonuclear fusion. To maximize the effectiveness of the measurements, new data analysis techniques are required. First data processing tasks, such as filtering and fitting, are of primary importance, since they can have a strong influence on the rest of the analysis. Even if Support Vector Regression is a method devised and refined at the end of the 90s, a systematic comparison with more traditional non parametric regression methods has never been reported. In this paper, a series of systematic tests is described, which indicates how SVR is a very competitive method of non-parametric regression that can usefully complement and often outperform more consolidated approaches. The performance of Support Vector Regression as a method of filtering is investigated first, comparing it with the most popular alternative techniques. Then Support Vector Regression is applied to the problem of non-parametric regression to analyse Lidar surveys for the environments measurement of particulate matter due to wildfires. The proposed approach has given very positive results and provides new perspectives to the interpretation of the data.
Nonlinear multivariate and time series analysis by neural network methods
NASA Astrophysics Data System (ADS)
Hsieh, William W.
2004-03-01
Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.
NASA Technical Reports Server (NTRS)
Ottander, John A.; Hall, Robert A.; Powers, J. F.
2018-01-01
A method is presented that allows for the prediction of the magnitude of limit cycles due to adverse control-slosh interaction in liquid propelled space vehicles using non-linear slosh damping. Such a method is an alternative to the industry practice of assuming linear damping and relying on: mechanical slosh baffles to achieve desired stability margins; accepting minimal slosh stability margins; or time domain non-linear analysis to accept time periods of poor stability. Sinusoidal input describing functional analysis is used to develop a relationship between the non-linear slosh damping and an equivalent linear damping at a given slosh amplitude. In addition, a more accurate analytical prediction of the danger zone for slosh mass locations in a vehicle under proportional and derivative attitude control is presented. This method is used in the control-slosh stability analysis of the NASA Space Launch System.
Application of XGBoost algorithm in hourly PM2.5 concentration prediction
NASA Astrophysics Data System (ADS)
Pan, Bingyue
2018-02-01
In view of prediction techniques of hourly PM2.5 concentration in China, this paper applied the XGBoost(Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentration. The monitoring data of air quality in Tianjin city was analyzed by using XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentration using three measures of forecast accuracy. The XGBoost method is also compared with the random forest algorithm, multiple linear regression, decision tree regression and support vector machines for regression models using computational results. The results demonstrate that the XGBoost algorithm outperforms other data mining methods.
Chaurasia, Ashok; Harel, Ofer
2015-02-10
Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
Relationship between Gender Roles and Sexual Assertiveness in Married Women
Azmoude, Elham; Firoozi, Mahbobe; Sadeghi Sahebzad, Elahe; Asgharipour, Neghar
2016-01-01
ABSTRACT Background: Evidence indicates that sexual assertiveness is one of the important factors affecting sexual satisfaction. According to some studies, traditional gender norms conflict with women’s capability in expressing sexual desires. This study examined the relationship between gender roles and sexual assertiveness in married women in Mashhad, Iran. Methods: This cross-sectional study was conducted on 120 women who referred to Mashhad health centers through convenient sampling in 2014-15. Data were collected using Bem Sex Role Inventory (BSRI) and Hulbert index of sexual assertiveness. Data were analyzed using SPSS 16 by Pearson and Spearman’s correlation tests and linear Regression Analysis. Results: The mean scores of sexual assertiveness was 54.93±13.20. According to the findings, there was non-significant correlation between Femininity and masculinity score with sexual assertiveness (P=0.069 and P=0.080 respectively). Linear regression analysis indicated that among the predictor variables, only Sexual function satisfaction was identified as the sexual assertiveness summary predictor variables (P=0.001). Conclusion: Based on the results, sexual assertiveness in married women does not comply with gender role, but it is related to Sexual function satisfaction. So, counseling psychologists need to consider this variable when designing intervention programs for modifying sexual assertiveness and find other variables that affect sexual assertiveness. PMID:27713899
Coping Styles in Heart Failure Patients with Depressive Symptoms
Trivedi, Ranak B.; Blumenthal, James A.; O'Connor, Christopher; Adams, Kirkwood; Hinderliter, Alan; Sueta-Dupree, Carla; Johnson, Kristy; Sherwood, Andrew
2009-01-01
Objective Elevated depressive symptoms have been linked to poorer prognosis in heart failure (HF) patients. Our objective was to identify coping styles associated with depressive symptoms in HF patients. Methods 222 stable HF patients (32.75% female, 45.4% non-Hispanic Black) completed multiple questionnaires. Beck Depression Inventory (BDI) assessed depressive symptoms, Life Orientation Test (LOT-R) assessed optimism, ENRICHD Social Support Inventory (ESSI) and Perceived Social Support Scale (PSSS) assessed social support, and COPE assessed coping styles. Linear regression analyses were employed to assess the association of coping styles with continuous BDI scores. Logistic regression analyses were performed using BDI scores dichotomized into BDI<10 versus BDI≥10, to identify coping styles accompanying clinically significant depressive symptoms. Results In linear regression models, higher BDI scores were associated with lower scores on the acceptance (β=-.14), humor (β=-.15), planning (β=-.15), and emotional support (β=-.14) subscales of the COPE, and higher scores on the behavioral disengagement (β=.41), denial (β=.33), venting (β=.25), and mental disengagement (β=.22) subscales. Higher PSSS and ESSI scores were associated with lower BDI scores (β=-.32 and -.25, respectively). Higher LOT-R scores were associated with higher BDI scores (β=.39, p<.001). In logistical regression models, BDI≥10 was associated with greater likelihood of behavioral disengagement (OR=1.3), denial (OR=1.2), mental disengagement (OR=1.3), venting (OR=1.2), and pessimism (OR=1.2), and lower perceived social support measured by PSSS (OR=.92) and ESSI (OR=.92). Conclusion Depressive symptoms in HF patients are associated with avoidant coping, lower perceived social support, and pessimism. Results raise the possibility that interventions designed to improve coping may reduce depressive symptoms. PMID:19773027
NASA Astrophysics Data System (ADS)
Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo
2016-11-01
The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. The dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for 99th and 90th percentiles and performed better in autumn and winter for 10th percentile. And QR models also performed better in suburban and rural areas for 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. And we found the effect of solar radiation would be enhanced during extremely ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone had no significant changes before and after the 2010 Asian Games.
Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas
2015-01-01
Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
Hassan, A K
2015-01-01
In this work, O/W emulsion sets were prepared by using different concentrations of two nonionic surfactants. The two surfactants, tween 80(HLB=15.0) and span 80(HLB=4.3) were used in a fixed proportions equal to 0.55:0.45 respectively. HLB value of the surfactants blends were fixed at 10.185. The surfactants blend concentration is starting from 3% up to 19%. For each O/W emulsion set the conductivity was measured at room temperature (25±2°), 40, 50, 60, 70 and 80°. Applying the simple linear regression least squares method statistical analysis to the temperature-conductivity obtained data determines the effective surfactants blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by applying the physical stability centrifugation testing and the phase inversion temperature range measurements. The results indicated that, the relation which represents the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity. This relationship is linear up to 80°. This work proves that, the most stable O/W emulsion is determined via the determination of the maximum R² value by applying of the simple linear regression least squares method to the temperature-conductivity obtained data up to 80°, in addition to, the true maximum slope is represented by the equation which has the maximum R² value. Because the conditions would be changed in a more complex formulation, the method of the determination of the effective surfactants blend concentration was verified by applying it for more complex formulations of 2% O/W miconazole nitrate cream and the results indicate its reproducibility.